Try this code.
import re
test = ' az z bz z z stuff z z '
re.sub(r'(\W)(z)(\W)', r'\1_\2\3', test)
This should replace all stand-alone z's with _z
However, the result is:
' az _z bz _z z stuff _z _z '
One standalone z did not get the underscore. I suspect this happens because the grouping can't use the space between two z's for both matches at once (as the trailing whitespace of one match and the leading whitespace of the next). Is there a way to fix this?
4 solutions
#1
The reason is that the matches would have to overlap: each match consumes the character on both sides of the z, so the space between two adjacent z's can belong to only one match. You need to avoid consuming those surrounding characters. There are two ways to do this: use \b, the word boundary, as others have suggested, or use a lookbehind assertion together with a lookahead assertion. (Where possible, prefer \b over this solution; it is shown here mainly for educational purposes.)
>>> re.sub(r'(?<!\w)(z)(?!\w)', r'_\1', test)
' az _z bz _z _z stuff _z _z '
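(?<!\w) is a negative lookbehind: it asserts that the z is not preceded by a word character (\w), which also holds at the start of the string.
(?!\w) is a negative lookahead: it asserts that the z is not followed by a word character, which also holds at the end of the string.
Lookarounds use the special (?...) syntax and are not capturing groups, so (z) is group \1.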
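A quick check with the standard `re` module (a sketch, using the same `test` string) shows that the lookarounds match zero characters: every match spans only the z itself, and the compiled pattern reports a single capturing group.

```python
import re

test = ' az z bz z z stuff z z '

# (?<!\w) and (?!\w) only inspect the neighbouring characters; they are
# zero-width, so they never become part of the match.
pattern = re.compile(r'(?<!\w)(z)(?!\w)')

for m in pattern.finditer(test):
    # group(0) is the whole match, group(1) is the (z) capture; both are 'z'.
    print(m.span(), repr(m.group(0)), repr(m.group(1)))

# The lookarounds are not capturing groups, so (z) is the only group.
print(pattern.groups)
```

Since each span has length 1, no space is ever consumed, and two adjacent z's can both match.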
As for a graphical explanation of why it fails:
The regex scans the string and replaces as it goes. At this point it has matched these three characters:
' az _z bz z z stuff z z '
          ^^^
It replaces that match. The trailing space was consumed as part of this match, so the next attempt looks roughly like this:
' az _z bz _z z stuff z z '
              ^^^ <- It starts matching here.
             ^ <- Not this character; it was consumed by the last match.
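To see the consumption described in the diagram, `re.finditer` can list each match of the question's original pattern and which standalone z's it never reaches. This is a sketch that uses the `\bz\b` pattern from the other answers only to locate the standalone z's for comparison.

```python
import re

test = ' az z bz z z stuff z z '

# The question's pattern: a non-word character, a z, a non-word character.
# Both (\W) groups consume their character, so two matches cannot share one space.
original = re.compile(r'(\W)(z)(\W)')
matches = list(original.finditer(test))

for m in matches:
    print(m.span(), repr(m.group(0)))

# Positions of every standalone z, located with the zero-width \b pattern.
standalone = [m.start() for m in re.finditer(r'\bz\b', test)]

# Position of the z (group 2) inside each match of the original pattern.
replaced = [m.start(2) for m in matches]

# Standalone z's that the original pattern never reached.
skipped = [i for i in standalone if i not in replaced]
print(skipped)
```

With the single spaces in `test` exactly as typed here, the final z in `stuff z z` is skipped as well, for the same reason as the z after `bz z`.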
#2
If your goal is to make sure you only match z when it's a standalone word, use \b to match word boundaries without actually consuming the whitespace:
>>> re.sub(r'\b(z)\b', r'_\1', test)
' az _z bz _z _z stuff _z _z '
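Because z is itself a word character, `\b` on either side of it behaves the same as the lookbehind/lookahead pair from the first answer. The sketch below compares the two on the same `test` string, and also shows a case that the question's `(\W)(z)(\W)` cannot handle: a z at the very start or end of the string, where there is no extra character to consume.

```python
import re

test = ' az z bz z z stuff z z '

# \b asserts a boundary between a \w and a non-\w character (or a string
# edge). Like the lookarounds, it is zero-width and consumes nothing.
with_boundary = re.sub(r'\b(z)\b', r'_\1', test)

# Lookaround version from the first answer, for comparison.
with_lookaround = re.sub(r'(?<!\w)(z)(?!\w)', r'_\1', test)

print(repr(with_boundary))
print(with_boundary == with_lookaround)

# At the string edges \b still matches, whereas (\W) would need a character there.
print(re.sub(r'\b(z)\b', r'_\1', 'z z'))
print(re.sub(r'(\W)(z)(\W)', r'\1_\2\3', 'z z'))
```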
#3
You want to avoid consuming the whitespace. Try the zero-width word boundary \b, like this:
re.sub(r'\bz\b', '_z', test)
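Here no capturing group is needed, because the replacement is just the literal string `'_z'`. One side effect worth knowing: in Python's `re`, `\w` includes the underscore, so an already-prefixed `_z` has no word boundary before its z. A minimal check suggests that running the substitution a second time leaves the string unchanged.

```python
import re

test = ' az z bz z z stuff z z '

# Literal replacement; the pattern itself has no groups.
once = re.sub(r'\bz\b', '_z', test)

# '_' is a word character, so in '_z' there is no \b between '_' and 'z',
# and the second pass finds nothing to replace.
twice = re.sub(r'\bz\b', '_z', once)

print(repr(once))
print(once == twice)
```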
#4
Use this:
test = ' az z bz z z stuff z z '
re.sub(r'\b(z)\b', r'_\1', test)