Why does re.sub in Python not work correctly on this test case?

Try this code.

import re

test = ' az z bz z z stuff z  z '
re.sub(r'(\W)(z)(\W)', r'\1_\2\3', test)

This should replace all stand-alone z's with _z

However, the result is:

' az _z bz _z z stuff _z  _z '

Notice that one z is left unreplaced. I theorize that it's because the grouping can't grab the space between the two z's for both matches at once (as the trailing whitespace of one and the leading whitespace of the other). Is there a way to fix this?

4 Answers

#1


4  

This happens because the two matches would have to overlap, and re.sub only replaces non-overlapping matches: the space between the two z's is consumed by the first match, so it is no longer available as the leading character of the next one. The fix is to not consume the extra characters. There are two ways to do this: one is \b, the word boundary, as suggested by others; the other is a lookbehind assertion plus a lookahead assertion. (If reasonable, as it probably is, use \b instead of this solution. This is mainly here for educational purposes.)

>>> re.sub(r'(?<!\w)(z)(?!\w)', r'_\1', test)
' az _z bz _z _z stuff _z  _z '

(?<!\w) makes sure there wasn't \w before.

(?!\w) makes sure there isn't \w after.

The (?<!...) and (?!...) constructs are zero-width assertions, not capturing groups, so the (z) is \1.
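
A quick way to check the group numbering, as a minimal sketch (the variable names are only illustrative): a compiled pattern reports how many capturing groups it has, and the lookarounds are not counted.

```python
import re

# Pattern from the question: three capturing groups.
original = re.compile(r'(\W)(z)(\W)')
# Pattern from this answer: the lookarounds assert but do not capture.
with_lookarounds = re.compile(r'(?<!\w)(z)(?!\w)')

print(original.groups)          # 3
print(with_lookarounds.groups)  # 1

m = with_lookarounds.search(' z ')
# The whole match and group 1 are the same text, because the
# surrounding spaces were only checked, never included in the match.
print(m.group(0) == m.group(1) == 'z')  # True
```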


As for a graphical explanation of why it fails:

The regex is going through the string doing replacement; it's at these three characters:

' az _z bz z z stuff z  z '
          ^^^

It does that replacement. The final character has been acted upon, so its next step is approximately this:

' az _z bz _z z stuff z  z '
              ^^^ <- It starts matching here.
             ^ <- Not this character, it's been consumed by the last match
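
The same picture can be reproduced with re.finditer, which reports where each match starts and ends. This is only a sketch to make the consumed characters visible; the variable names are not from the question.

```python
import re

test = ' az z bz z z stuff z  z '

# Pattern from the question: consumes one non-word character on each side of z.
consuming = re.compile(r'(\W)(z)(\W)')
# Lookaround version: the neighbours are checked, not consumed.
lookaround = re.compile(r'(?<!\w)(z)(?!\w)')

# span() is the (start, end) slice of the whole match, neighbours included.
consuming_spans = [m.span() for m in consuming.finditer(test)]
# span(1) is the slice covered by the z itself.
lookaround_spans = [m.span(1) for m in lookaround.finditer(test)]

print(consuming_spans)   # [(3, 6), (8, 11), (18, 21), (21, 24)]
print(lookaround_spans)  # [(4, 5), (9, 10), (11, 12), (19, 20), (22, 23)]
```

The second consuming match ends at index 11, so the search resumes on the z at index 11 itself; the space at index 10 that it would need as its leading \W is already part of the previous match. With the lookarounds, each match covers only the z, so the z at index 11 is found.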

#2


6  

If your goal is to make sure you only match z when it's a standalone word, use \b to match word boundaries without actually consuming the whitespace:

>>> re.sub(r'\b(z)\b', r'_\1', test)
' az _z bz _z _z stuff _z  _z '
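
To see that consuming the trailing character is what causes the miss, here is a variation that is not from any of the answers: it keeps the question's leading group and only turns the trailing group into a lookahead. Treat it as an illustration rather than a recommendation; \b is simpler.

```python
import re

test = ' az z bz z z stuff z  z '

# The leading (\W) is still consumed and put back via \1,
# but the trailing non-word character is only asserted with (?=\W),
# so it stays available as the leading character of the next match.
result = re.sub(r'(\W)(z)(?=\W)', r'\1_\2', test)

print(repr(result))  # ' az _z bz _z _z stuff _z  _z '
```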

#3


5  

You want to avoid consuming the whitespace. Try the zero-width word boundary \b, like this:

re.sub(r'\bz\b', '_z', test)
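
One more difference worth knowing, shown as a small sketch with made-up sample strings and hypothetical helper names: the question's pattern needs an actual character on both sides of the z, while \b also matches at the very start and end of the string. Note too that \b treats the underscore as a word character.

```python
import re

def mark_consuming(text):
    # Pattern from the question: needs a real non-word character on both sides.
    return re.sub(r'(\W)(z)(\W)', r'\1_\2\3', text)

def mark_boundary(text):
    # \b is zero-width and also matches at the start and end of the string.
    return re.sub(r'\bz\b', '_z', text)

for sample in ['z at start', 'ends with z', 'z_ z']:
    print(repr(mark_consuming(sample)), repr(mark_boundary(sample)))
# 'z at start' '_z at start'
# 'ends with z' 'ends with _z'
# 'z_ z' 'z_ _z'
```

In the last sample the first z is followed by an underscore, so there is no word boundary after it and it is left alone.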

#4


1  

Use this:

test = ' az z bz z z stuff z  z '
re.sub(r'\b(z)\b', r'_\1', test)
