Regex to split a string on capital letters but ignore TLAs (three-letter acronyms)

I'm using the regex

System.Text.RegularExpressions.Regex.Replace(stringToSplit, "([A-Z])", " $1").Trim()

to split strings by capital letter, for example:

'MyNameIsSimon' becomes 'My Name Is Simon'
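
To make the problem concrete, here is the one-liner above wrapped in a small C# helper. The class and method names (OriginalSplit, SplitOnEveryCapital) are hypothetical; only the pattern and the Replace/Trim calls come from the question.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the question's original approach:
// insert a space before every capital letter, then trim the leading space.
public static class OriginalSplit
{
    public static string SplitOnEveryCapital(string stringToSplit) =>
        Regex.Replace(stringToSplit, "([A-Z])", " $1").Trim();
}
```

With this sketch, SplitOnEveryCapital("MyNameIsSimon") returns "My Name Is Simon", while SplitOnEveryCapital("USAToday") returns "U S A Today", because every capital in the acronym gets its own space.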

I find this incredibly useful when working with enumerations. What I would like to do is change it slightly so that strings are only split if the next letter is a lowercase letter, for example:

'USAToday' would become 'USA Today'

Can this be done?

EDIT: Thanks to all for responding. I may not have thought this through entirely: in some cases 'A' and 'I' would need to be ignored, but this is not possible (at least not in a meaningful way). In my case, though, the answers below do what I need. Thanks!

7 solutions

#1


40  

((?<=[a-z])[A-Z]|[A-Z](?=[a-z]))

or its Unicode-aware cousin

((?<=\p{Ll})\p{Lu}|\p{Lu}(?=\p{Ll}))

when replaced globally with

" $1"

handles

TodayILiveInTheUSAWithSimon
USAToday
IAmSOOOBored

yielding

 Today I Live In The USA With Simon
USA Today
I Am SOOO Bored

In a second step you'd have to trim the string (note the leading space in the first result).
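
Here is a minimal C# sketch of this answer, assuming .NET's System.Text.RegularExpressions; the AcronymAwareSplit class and its method names are made up for illustration. It compiles both patterns once and trims the result, as suggested above.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper applying answer #1's patterns: a space goes before a capital
// that follows a lowercase letter, or before a capital that is followed by one.
public static class AcronymAwareSplit
{
    // ASCII-only version of the pattern.
    private static readonly Regex AsciiPattern =
        new Regex("((?<=[a-z])[A-Z]|[A-Z](?=[a-z]))");

    // Unicode-aware version: \p{Lu} is any uppercase letter, \p{Ll} any lowercase letter.
    private static readonly Regex UnicodePattern =
        new Regex(@"((?<=\p{Ll})\p{Lu}|\p{Lu}(?=\p{Ll}))");

    // Replace every match with " $1", then trim the leading space.
    public static string Split(string text) => AsciiPattern.Replace(text, " $1").Trim();

    public static string SplitUnicode(string text) => UnicodePattern.Replace(text, " $1").Trim();
}
```

In this sketch Split("IAmSOOOBored") returns "I Am SOOO Bored". The ASCII pattern leaves non-ASCII capitals alone, so only SplitUnicode splits "ÜberÄrger" into "Über Ärger".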

#2


11  

Any uppercase character that is not followed by an uppercase character:

Regex.Replace(stringToSplit, "([A-Z])(?![A-Z])", " $1")
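
This pattern is shorter than answer #1's, but it treats the edges of an acronym differently. The comparison below is a sketch (the CompareAnswers class and its method names are hypothetical) that runs both patterns on the same inputs and trims the results.

```csharp
using System.Text.RegularExpressions;

// Hypothetical side-by-side check of the two answers' patterns.
public static class CompareAnswers
{
    // Answer #2: space before any capital NOT followed by another capital.
    public static string NotFollowedByCapital(string text) =>
        Regex.Replace(text, "([A-Z])(?![A-Z])", " $1").Trim();

    // Answer #1: space before a capital preceded by lowercase or followed by lowercase.
    public static string Answer1(string text) =>
        Regex.Replace(text, "((?<=[a-z])[A-Z]|[A-Z](?=[a-z]))", " $1").Trim();
}
```

For "USAToday" both return "USA Today". For "IAmSOOOBored", NotFollowedByCapital gives "I AmSOOO Bored" and for "SaveAsPDF" it gives "Save AsPD F", whereas Answer1 gives "I Am SOOO Bored" and "Save As PDF". A capital that follows a lowercase letter but precedes another capital (the S in SOOO, the P in PDF) is only split by answer #1, and a trailing capital (the F) is only split by this pattern.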

Edit:

I just noticed that you're using this for enumerations. I really don't encourage using string representations of enumerations like this, and the problem at hand is a good reason why. Have a look at this instead: https://www.refactoring.com/catalog/replaceTypeCodeWithClass.html
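
The linked refactoring, Replace Type Code with Class, replaces the enum with a class whose instances carry their own data. A minimal C# sketch of that idea for display text might look like this; Newspaper, its members, All, and DisplayName are hypothetical names, not part of the answer.

```csharp
using System.Collections.Generic;

// Hypothetical type-code class: each instance stores its own display text,
// so nothing has to be reconstructed from the identifier's casing.
public sealed class Newspaper
{
    public static readonly Newspaper UsaToday = new Newspaper("USA Today");
    public static readonly Newspaper TheTimes = new Newspaper("The Times");

    // Static initializers run in textual order, so both instances exist here.
    public static IReadOnlyList<Newspaper> All { get; } = new[] { UsaToday, TheTimes };

    public string DisplayName { get; }

    // Private constructor: the static fields above are the only instances.
    private Newspaper(string displayName) => DisplayName = displayName;

    public override string ToString() => DisplayName;
}
```

Here Newspaper.UsaToday.ToString() returns "USA Today" with no regular expression involved, and the display text can be anything, including names that no casing rule could produce.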

#3


1  

You might consider changing the enumerations instead: Microsoft's coding guidelines suggest Pascal-casing acronyms as though they were words, e.g. XmlDocument, HtmlWriter. Two-letter acronyms don't follow this rule, though, e.g. System.IO.

So you should be using UsaToday, and your problem will disappear.
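
As a sketch of what this looks like in practice (the Publication enum and the PublicationNames helper are hypothetical): with Pascal-cased acronyms, the question's original pattern already puts exactly one space before each word.

```csharp
using System.Text.RegularExpressions;

// Hypothetical enum whose acronyms are Pascal-cased (UsaToday, not USAToday).
public enum Publication
{
    UsaToday,
    XmlDocumentation,
}

public static class PublicationNames
{
    // The question's original "space before every capital" pattern, then trim.
    public static string ToDisplayText(Publication value) =>
        Regex.Replace(value.ToString(), "([A-Z])", " $1").Trim();
}
```

In this sketch ToDisplayText(Publication.UsaToday) returns "Usa Today" and ToDisplayText(Publication.XmlDocumentation) returns "Xml Documentation". The trade-off is that the acronym is displayed as "Usa" rather than "USA".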

#4


1  

I hope this helps with splitting a string on its capital letters, and with much more. You can try Humanizer, a free NuGet package that can save you trouble with letters, sentences, numbers, quantities and much more, in many languages. Check it out at: https://www.nuget.org/packages/Humanizer/

#5


0  

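Tomalak's expression (the one in answer #1) worked for me, but not with VB's built-in Replace function, which replaces literal text rather than interpreting the pattern as a regular expression. Regex.Replace(), however, did work.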

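Imports System.Text.RegularExpressions

For i As Integer = 0 To names.Length - 1
  ' Worked: Regex.Replace interprets the pattern as a regular expression
  names(i) = Regex.Replace(names(i), "((?<=[a-z])[A-Z]|[A-Z](?=[a-z]))", " $1").TrimStart()

  ' Didn't work: the built-in Replace treats the pattern as literal text.
  ' (Even with Regex.Replace, this pattern captures its second alternative in
  ' group 2, so " $1" would drop that letter.)
  'names(i) = Replace(names(i), "([A-Z])(?=[a-z])|(?<=[a-z])([A-Z])", " $1").TrimStart()
Next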

BTW, I'm using this to split the words in enumeration names for display in the UI and it works beautifully.
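
A C# sketch of the same idea, applying Tomalak's pattern to every member name of an enum. The EnumDisplay helper and the Channel enum are hypothetical; Enum.GetNames returns the member names ordered by their underlying values.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EnumDisplay
{
    // Tomalak's pattern, compiled once.
    private static readonly Regex WordBoundary =
        new Regex("((?<=[a-z])[A-Z]|[A-Z](?=[a-z]))");

    // Returns UI-friendly text for every member of TEnum (requires C# 7.3+).
    public static string[] DisplayNames<TEnum>() where TEnum : struct, Enum =>
        Enum.GetNames(typeof(TEnum))
            .Select(name => WordBoundary.Replace(name, " $1").TrimStart())
            .ToArray();
}

// Hypothetical enum used only for illustration.
public enum Channel { USAToday, WakeOnBoot, IAmSOOOBored }
```

With this sketch, EnumDisplay.DisplayNames<Channel>() returns "USA Today", "Wake On Boot" and "I Am SOOO Bored".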

#6


0  

Note: I didn't read the question closely enough: for "USAToday" this returns only "Today", so this answer doesn't solve the problem as asked.

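    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    public static List<string> SplitOnCamelCase(string text)
    {
        // Collect every capital letter followed by one or more lowercase letters
        List<string> list = new List<string>();
        Regex regex = new Regex(@"(\p{Lu}\p{Ll}+)");
        foreach (Match match in regex.Matches(text))
        {
            list.Add(match.Value);
        }
        return list;
    }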

This splits "WakeOnBoot" into "Wake", "On" and "Boot", but returns nothing for all-caps input such as "NMI" or "TLA".
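
One way to keep this list-of-words approach while also returning acronyms is to add a second alternative that matches a run of capitals not followed by a lowercase letter. This is a sketch under that assumption, not the answer's own code; the CamelCaseSplitter name is hypothetical, and characters that match neither alternative (digits, punctuation, spaces) are simply dropped.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CamelCaseSplitter
{
    // Alternative 1: optional capital followed by lowercase letters ("Today", "my").
    // Alternative 2: run of capitals not followed by a lowercase letter ("USA", "NMI");
    // backtracking leaves the last capital of "USAT" for the next word.
    private static readonly Regex Word = new Regex(@"\p{Lu}?\p{Ll}+|\p{Lu}+(?!\p{Ll})");

    public static List<string> Split(string text)
    {
        var words = new List<string>();
        foreach (Match match in Word.Matches(text))
        {
            words.Add(match.Value);
        }
        return words;
    }
}
```

In this sketch Split("USAToday") returns ["USA", "Today"], Split("IAmSOOOBored") returns ["I", "Am", "SOOO", "Bored"], and Split("NMI") returns ["NMI"].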

#7


0  

My version that also handles simple arithmetic expressions:

using System.Text.RegularExpressions;

private string InjectSpaces(string s)
{
    var patterns = new string[] {
        @"(?<=[^A-Z,&])[A-Z]",          // capital preceded by any character except a capital, comma, or ampersand
        @"(?<=[A-Z])[A-Z](?=[a-z])",    // capital preceded by a capital and followed by a lowercase letter
        @"[\+\-\*\/\=]",                // arithmetic operators and '='
        @"(?<=[\+\-\*\/\=])[0-9,\(]"    // digit, comma, or open paren preceded by an operator or '='
    };
    // Wrap all alternatives in one capturing group so " $1" re-inserts each match after a space
    var pattern = $"({string.Join("|", patterns)})";
    return Regex.Replace(s, pattern, " $1");
}
