I need to match a regex that uses backreferences (e.g. \1) in my Go code.
That's not so easy, because Go's official regexp package uses the RE2 engine, which deliberately does not support backreferences (and some other lesser-known features) so that it can guarantee linear-time execution and thereby avoid regex denial-of-service attacks. Enabling backreference support is not an option with RE2.
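To make the limitation concrete, here is a minimal sketch using only the standard regexp package. The helper name compiles is made up for this illustration; the point is simply that a pattern containing \1 is rejected when it is compiled, before any matching takes place.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles is a hypothetical helper for this illustration: it reports
// whether Go's standard regexp package accepts the given pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// Plain capture groups are accepted.
	fmt.Println(compiles(`(a+)b(a+)`))
	// The backreference \1 is rejected at compile time.
	fmt.Println(compiles(`(a+)b\1`))
}
```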
In my code, there is no risk of malicious exploitation by attackers, and I need backreferences.
What should I do?
3 Solutions
#1
7
Answering my own question here: I solved this using golang-pkg-pcre, which uses libpcre++ to provide Perl-compatible regexes that do support backreferences. Note that its API is not the same as that of the standard regexp package.
#2
7
Regular expressions are great for working with regular grammars, but if your grammar isn't regular (i.e. it requires backreferences and the like), you should probably switch to a better tool. There are a lot of good tools available for parsing context-free grammars, including yacc, which shipped with the Go distribution by default (since Go 1.8 it is available separately as goyacc in golang.org/x/tools). Alternatively, you can write your own parser; recursive descent parsers, for example, are easy to write by hand.
I think regular expressions are overused in scripting languages (like Perl, Python, Ruby, ...) because their C/ASM-powered implementations are usually more optimized than those languages themselves, but Go isn't such a language. Regular expressions are usually quite slow and are often not suited to the problem at all.
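As a small illustration of the hand-written approach, here is a sketch that finds a doubled word, roughly what a backreference pattern such as \b(\w+)\s+\1\b would detect in text without punctuation. The function name hasRepeatedWord and the choice of task are assumptions made for this example, not something taken from the question.

```go
package main

import (
	"fmt"
	"strings"
)

// hasRepeatedWord reports whether s contains two identical adjacent
// whitespace-separated words. It compares each word with its predecessor,
// which is the job a backreference would otherwise do.
func hasRepeatedWord(s string) bool {
	words := strings.Fields(s)
	for i := 1; i < len(words); i++ {
		if words[i] == words[i-1] {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasRepeatedWord("this is is a test"))
	fmt.Println(hasRepeatedWord("this is a test"))
}
```

For a check this simple, a loop over strings.Fields is arguably easier to read than the equivalent regex, and it needs no third-party engine.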
#3
2
When I had the same problem, I solved it using a two-step regular expression match. The original code is:
if m := match(pkgname, `^(.*)\$\{DISTNAME:S(.)(\\^?)([^:]*)(\\$?)\2([^:]*)\2(g?)\}(.*)$`); m != nil {
	before, _, left, from, right, to, mod, after := m[1], m[2], m[3], m[4], m[5], m[6], m[7], m[8]
	// ...
}
The code is supposed to parse a string of the form ${DISTNAME:S|from|to|g}, which itself is a little pattern language using the familiar substitution syntax S|replace|with|.
The two-stage code looks like this:
if m, before, sep, subst, after := match4(pkgname, `^(.*)\$\{DISTNAME:S(.)([^\\}:]+)\}(.*)$`); m {
	qsep := regexp.QuoteMeta(sep)
	if m, left, from, right, to, mod := match5(subst, `^(\^?)([^:]*)(\$?)`+qsep+`([^:]*)`+qsep+`(g?)$`); m {
		// ...
	}
}
match, match4 and match5 are my own wrappers around the regexp package; they cache the compiled regular expressions so that at least the compilation time is not wasted.
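Since the wrappers themselves are not shown, here is a self-contained sketch of how the two-stage approach could look using only the standard library. Everything named here (reCache, this version of match, the substitution struct, parseSubstitution) is an assumption made for illustration and not the original implementation. The first stage captures the separator; the second stage splices the quoted separator into a new pattern, which takes the place of the \2 backreference.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// reCache maps pattern strings to compiled regular expressions.
var reCache sync.Map

// match is a hypothetical stand-in for the wrapper described above: it
// caches compiled patterns and returns the submatches, or nil when s
// does not match.
func match(s, pattern string) []string {
	cached, ok := reCache.Load(pattern)
	if !ok {
		cached, _ = reCache.LoadOrStore(pattern, regexp.MustCompile(pattern))
	}
	return cached.(*regexp.Regexp).FindStringSubmatch(s)
}

// substitution holds the pieces of ${DISTNAME:S<sep>from<sep>to<sep>[g]}.
type substitution struct {
	before, left, from, right, to, mod, after string
}

// parseSubstitution performs the two-stage match.
func parseSubstitution(pkgname string) (substitution, bool) {
	// Stage 1: find the modifier and capture its separator and body.
	outer := match(pkgname, `^(.*)\$\{DISTNAME:S(.)([^\\}:]+)\}(.*)$`)
	if outer == nil {
		return substitution{}, false
	}
	before, sep, body, after := outer[1], outer[2], outer[3], outer[4]

	// Stage 2: build a pattern around the quoted separator, so the same
	// character must appear in both places (the role of \2 in the original).
	qsep := regexp.QuoteMeta(sep)
	inner := match(body, `^(\^?)([^:]*)(\$?)`+qsep+`([^:]*)`+qsep+`(g?)$`)
	if inner == nil {
		return substitution{}, false
	}
	return substitution{
		before: before, left: inner[1], from: inner[2], right: inner[3],
		to: inner[4], mod: inner[5], after: after,
	}, true
}

func main() {
	s, ok := parseSubstitution("foo-${DISTNAME:S|a|b|g}-bar")
	fmt.Println(ok, s.before, s.from, s.to, s.mod, s.after)
}
```

One trade-off of this design is that the second-stage pattern depends on the separator, so the cache holds one compiled expression per distinct separator character rather than a single one.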