I have this piece of code:
for n in range(1, 10):
    new = re.sub(r'(regex(group)regex)?regex', r'something'+str(n)+r'\1', old, count=1)
It throws an "unmatched group" error. When the group is unmatched, I want an empty string inserted there instead of an error. How can I achieve this?
Note: My full code is much more complicated than this example. If you know a better way to iterate over the matches and insert a number into each one, please share it. My full code:
for n in range(1, text.count('soutez') + 1):
    text = re.sub(r'(?i)(\s*\{{2}infobox medaile reprezentant(ka)?\s*\|\s*([^\}]*)\s*\}{2}\s*)?\{{2}infobox medaile soutez\s*\|\s*([^\}]*)\s*\}{2}\s*', r"\n | reprezentace"+str(n)+r" = \3\n | soutez"+str(n)+r" = \4\n | medaile"+str(n)+r" = \n", text, count=1)
3 Answers
#1
8
Root cause
Before Python 3.5, a backreference in a re.sub replacement to a capture group that did not participate in the match was not replaced with an empty string; it raised an error. See the description of Bug 1519638 at bugs.python.org.
There are two ways to fix that issue.
Solution 1: Adding empty alternatives to make optional groups obligatory
You can replace every optional capturing group (constructs like (\d+)?) with an obligatory group that has an empty alternative (i.e. (\d+|)). The group then always participates in the match and captures an empty string when nothing else fits.
Here is an example of the failure:
import re
old = 'regexregex'
new = re.sub(r'regex(group)?regex', r'something\1something', old)
print(new)
Replace the re.sub line with:
new = re.sub(r'regex(group|)regex', r'something\1something', old)
It works.
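A minimal sketch of Solution 1, using the same placeholder pattern as above. The second input string, 'regexgroupregex', is an assumption added only to show that the group still captures text when it is present.

```python
import re

# Obligatory group with an empty alternative: group 1 always participates
# in the match and captures '' when 'group' is absent.
pattern = re.compile(r'regex(group|)regex')

without_group = pattern.sub(r'something\1something', 'regexregex')
with_group = pattern.sub(r'something\1something', 'regexgroupregex')

print(without_group)  # somethingsomething
print(with_group)     # somethinggroupsomething
```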
Solution 2: Using lambda expression in the replacement and checking if the group is not None
This approach is necessary if you have optional groups inside another optional group.
You can pass a lambda as the replacement to check whether a group was initialized (is not None): lambda m: m.group(n) or ''. Use this solution in your case: the replacement pattern has two backreferences, \3 and \4, but some matches (see Match 1 and 3) leave capture group 3 uninitialized. That happens because the whole first part, (\s*\{{2}funcA(ka|)\s*\|\s*([^}]*)\s*\}{2}\s*|), matches through its empty alternative, so the inner capture group 3 (i.e. ([^}]*)) is never populated, even after the empty alternative has been added.
re.sub(r'(?i)(\s*\{{2}funcA(ka|)\s*\|\s*([^\}]*)\s*\}{2}\s*|)\{{2}funcB\s*\|\s*([^\}]*)\s*\}{2}\s*',
r"\n | funcA"+str(n)+r" = \3\n | funcB"+str(n)+r" = \4\n | string"+str(n)+r" = \n",
text,
count=1)
should be rewritten as
re.sub(r'(?i)(\s*{{funcA(ka|)\s*\|\s*([^}]*)\s*}}\s*|){{funcB\s*\|\s*([^}]*)\s*}}\s*',
       # plain "\n" literals: a function's return value is inserted as-is, without escape processing
       lambda m: "\n | funcA"+str(n)+" = " + (m.group(3) or '') + "\n | funcB" + str(n) + " = " + (m.group(4) or '') + "\n | string" + str(n) + " = \n",
       text,
       count=1)
See IDEONE demo
import re
text = r'''
{{funcB|param1}}
*some string*
{{funcA|param2}}
{{funcB|param3}}
*some string2*
{{funcB|param4}}
*some string3*
{{funcAka|param5}}
{{funcB|param6}}
*some string4*
'''
for n in range(1, text.count('funcB') + 1):
    text = re.sub(r'(?i)(\s*\{{2}funcA(ka|)\s*\|\s*([^\}]*)\s*\}{2}\s*|)\{{2}funcB\s*\|\s*([^\}]*)\s*\}{2}\s*',
                  # plain "\n" literals: a function's return value is inserted as-is, without escape processing
                  lambda m: "\n | funcA"+str(n)+" = "+(m.group(3) or '')+"\n | funcB"+str(n)+" = "+(m.group(4) or '')+"\n | string"+str(n)+" = \n",
                  text,
                  count=1)
expected = r'''
| funcA1 =
| funcB1 = param1
| string1 =
*some string*
| funcA2 = param2
| funcB2 = param3
| string2 =
*some string2*
| funcA3 =
| funcB3 = param4
| string3 =
*some string3*
| funcA4 = param5
| funcB4 = param6
| string4 =
*some string4*
'''
# Compare line by line, ignoring blank lines and leading/trailing whitespace
def lines(s):
    return [ln.strip() for ln in s.splitlines() if ln.strip()]
assert lines(text) == lines(expected)
print('ok')
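As a smaller illustration of why Solution 2 is needed for nested groups, here is a sketch with a simplified pattern. The `{{A|...}}` and `{{B|...}}` templates and the helper name `fill` are assumptions made for this example, not part of the original code. The outer group gets an empty alternative, yet the group nested inside it stays None whenever the empty alternative is the one that matches.

```python
import re

# Group 1 is obligatory (empty alternative); group 2 is nested inside it.
pattern = re.compile(r'(\{\{A\|([^}]*)\}\}\s*|)\{\{B\|([^}]*)\}\}')

groups = pattern.search('{{B|x}}').groups()
print(groups)  # ('', None, 'x'): group 1 is empty, group 2 is still None

def fill(m):
    # Fall back to '' for any group that did not participate in the match.
    return 'A=' + (m.group(2) or '') + ' B=' + (m.group(3) or '')

only_b = pattern.sub(fill, '{{B|x}}')
a_and_b = pattern.sub(fill, '{{A|y}} {{B|z}}')
print(only_b)   # A= B=x
print(a_and_b)  # A=y B=z
```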
#2
0
I looked at this again.
It is unfortunate that you have to deal with NULLs,
but here are the rules you must follow.
Each of the patterns below successfully matches the empty string.
You have to run them like this to find out the rules.
It's not as simple as you may think. Take a close look at the results.
There is no obvious way to tell from the form of a pattern alone whether a group will come back NULL or EMPTY.
However, on closer inspection the rules emerge, and they are fairly simple.
These rules must be followed if you care about NULL.
There are only two rules:
Rule #1 - Any GROUP that can't be reached will result in NULL
(?<Alt_1> # (1 start)
(?<a> a )? # (2)
(?<b> b? ) # (3)
)? # (1 end)
|
(?<Alt_2> # (4 start)
(?<c> c? ) # (5)
(?<d> d? ) # (6)
) # (4 end)
** Grp 0 - ( pos 0 , len 0 ) EMPTY
** Grp 1 [Alt_1] - ( pos 0 , len 0 ) EMPTY
** Grp 2 [a] - NULL
** Grp 3 [b] - ( pos 0 , len 0 ) EMPTY
** Grp 4 [Alt_2] - NULL
** Grp 5 [c] - NULL
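The same rule can be checked in Python, where a NULL group is reported as None. This is a rough sketch with a much simpler pattern than the one above. Python's re module is a different engine from the one that produced the dump, so only this simple case is claimed here.

```python
import re

# The first alternative succeeds on the empty string,
# so the second alternative is never tried.
m = re.match(r'(a?)|(b?)', '')

print(m.group(1) is None)  # False: group 1 matched the empty string
print(m.group(2) is None)  # True: group 2 was never reached
```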
Rule #2 - Any GROUP that can't be matched on the INSIDE will result in NULL
(?<A_1> # (1 start)
(?<a1> a? ) # (2)
)? # (1 end)
(?<A_2> # (3 start)
(?<a2> a )? # (4)
)? # (3 end)
(?<A_3> # (5 start)
(?<a3> a ) # (6)
)? # (5 end)
(?<A_4> # (7 start)
(?<a4> a )? # (8)
) # (7 end)
** Grp 0 - ( pos 0 , len 0 ) EMPTY
** Grp 1 [A_1] - ( pos 0 , len 0 ) EMPTY
** Grp 2 [a1] - ( pos 0 , len 0 ) EMPTY
** Grp 3 [A_2] - ( pos 0 , len 0 ) EMPTY
** Grp 4 [a2] - NULL
** Grp 5 [A_3] - NULL
** Grp 6 [a3] - NULL
** Grp 7 [A_4] - ( pos 0 , len 0 ) EMPTY
** Grp 8 [a4] - NULL
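A hedged Python sketch of two of the cases above (A_3/a3 and A_4/a4), again with NULL showing up as None. The variable names are made up for the example, and the other two cases are left out because Python's engine is not guaranteed to agree with the dump on them.

```python
import re

# Like A_4 / a4: the outer group is obligatory, the inner group is optional.
inner_optional = re.match(r'((a)?)', '')
# Like A_3 / a3: the outer group is optional, the inner group is obligatory.
outer_optional = re.match(r'((a))?', '')

print(inner_optional.groups())  # ('', None)
print(outer_optional.groups())  # (None, None)
```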
#3
0
To simplify:
Problem
- You are getting the error "sre_constants.error: unmatched group" from a Python 2.7 regex.
- Your regex pattern has optional groups (with or without nested expressions), and you are using those groups in the replacement argument of sub: re.sub(pattern, *repl*, string) or compiled.sub(*repl*, string).
Solution:
In the replacement, return match.group(1) instead of using \1 (likewise for groups 2, 3, etc.). That's it; no "or ''" fallback is needed. The group result(s) can be returned with a function or a lambda.
Example
You are using a common regex to strip C-style comments. Its design uses an optional group 1 to pass through pseudo-comments (comment markers inside string literals), which should not be deleted if they exist.
pattern = r'//.*|/\*[\s\S]*?\*/|("(\\.|[^"])*")'
regex = re.compile(pattern)
Using \1 fails with the error "sre_constants.error: unmatched group":
return regex.sub(r'\1', string)
Using .group(1) succeeds:
return regex.sub(lambda m: m.group(1), string)
For those not familiar with lambda, this solution is equivalent to:
def optgroup(match):
    return match.group(1)
return regex.sub(optgroup, string)
See the accepted answer for an excellent discussion of why \1 fails due to Bug 1519638. While the accepted answer is authoritative, it has two shortcomings: 1) the example from the original question is so convoluted that the example solution is difficult to read, and 2) it suggests returning a group or an empty string, which is not required; you may simply call .group() on each match.
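For completeness, here is a self-contained sketch of the comment stripper built from the pattern above. The names `COMMENT_OR_STRING` and `strip_comments` and the sample input are assumptions made for this example. The "or ''" is kept on purpose so that the empty-string fallback is explicit and does not depend on how re.sub treats a None return value.

```python
import re

# Comments match the first two alternatives (group 1 stays None);
# string literals match the third alternative and are captured in group 1.
COMMENT_OR_STRING = re.compile(r'//.*|/\*[\s\S]*?\*/|("(\\.|[^"])*")')

def strip_comments(source):
    # Keep string literals, replace comments with an empty string.
    return COMMENT_OR_STRING.sub(lambda m: m.group(1) or '', source)

code = 'int x; // counter\nchar *s = "a//b"; /* done */'
stripped = strip_comments(code)
print(stripped)
```

The `//` inside the string literal survives because the string alternative matches first at that position, while the two real comments are removed.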