Regular expression to tokenize a string into a class

msiinv  2015\03\22  14:58:10

SSMSBoost for SSMS 2012
    Product code:   {94EDFFE7-E4F4-4C9B-A57E-C7267BB4A777}
    Product state:  (5) Installed.
    Assignment: per user
    Package code:   {5D9DA43D-E03A-4420-A8AF-3D2FFBA1A696}
    Version:    2.15.5473.18051
    Publisher:  Solutions Crew
    Language:   1033
    Installed from: C:\Users\EffiaSoft\Downloads\
        Package:    SSMSBoostInstaller2012_2.15.5473.18051.msi
    Product Icon:   %APPDATA%\Microsoft\Installer\{94EDFFE7-E4F4-4C9B-A57E-C7267BB4A777}\icon.ico
    Instance type:  0
    Local package:  C:\Windows\Installer\58b9554a.msi
    Install date:   2015\01\22
    0 patch packages.

Microsoft Application Error Reporting
    Product code:   {95120000-00B9-0409-0000-0000000FF1CE}
    Product state:  (5) Installed.
    Assignment: per machine
    Package code:   {420F351B-33A5-4A58-A856-69B2EDEDC8F7}
    Version:    12.0.6012.5000
    Publisher:  Microsoft Corporation
    Language:   1033
    Installed from: c:\f04684676d077419cb\redist\watson\
        Package:    dw20shared.msi
    About link: https://support.microsoft.com
    Help link:  https://support.microsoft.com
    Instance type:  0
    Local package:  c:\Windows\Installer\913d6.msi
    Install date:   2014\03\19
    0 patch packages.

I'm trying to tokenize this text. The result I expect is a class called Software, with properties such as ProductCode, ProductState and all the other properties defined in the text, populated with the values after the colon. Parsing this file would then give me a list of Software objects. How do you think I should proceed?

1 answer

#1



I'm not able to comment because of my rep (which is a bit stupid) but here is my suggestion.

It probably isn't going to be a clean solution, but if that is your only output then you could always split the string on line breaks, loop over the resulting array, and use the following regex to get the property name, i.e. the text before the first colon:

^\D+(?=:\s)

You would then need some sort of switch to work out which property of the Software class the value belongs to. It may be messy, but judging by the output it seems pretty safe to assume the format is going to stay largely the same.

Because the amount of whitespace (spaces, tabs) and the characters before each value vary, I would simply use a variant of the regex above that also consumes the colon and the whitespace after it (`^\D+:\s+`) to replace the property name with nothing; the rest of that line is then your value. This keeps the amount of regex you need to a minimum, which is generally a good thing.
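As a rough illustration of how the two regexes behave on a single line, here is a minimal sketch. Python is my own choice (the question does not name a language) and `split_line` is a hypothetical helper, not something from the question. Note that `\D` also matches the leading indentation, so the matched name has to be trimmed, and that a property name containing a digit would not be matched at all.

```python
import re

NAME_RE = re.compile(r"^\D+(?=:\s)")   # property name: text before the first ": "
PREFIX_RE = re.compile(r"^\D+:\s+")    # name, colon and the padding that follows


def split_line(line):
    """Return (name, value) for a 'Name:   value' line, or None otherwise."""
    match = NAME_RE.match(line)
    if match is None:
        return None                    # titles, the msiinv header, "0 patch packages."
    name = match.group(0).strip()      # \D+ swallowed the indent, so trim it
    value = PREFIX_RE.sub("", line, count=1).strip()
    return name, value
```

Colons inside a value, as in `C:\Windows\...` or `https://...`, do not confuse the match because they are not followed by whitespace.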

In pseudocode, that would be:

Split the string on newline characters
loop through the resulting collection of strings
    run the regex ^\D+(?=:\s) to get the property name
    switch on the matched name to find the property
        strip the property name from the line using the regex ^\D+:\s+ (replace with an empty string)
        set the property, using the rest of the string as the value
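The steps above could be sketched in Python roughly as follows. This is only an illustration under a few assumptions drawn from the sample output: product titles are unindented lines, property lines are indented, and the header line starts with `msiinv`. The `Software` dataclass and `parse_msiinv` are hypothetical names, and a dictionary keyed by the property label stands in for the switch statement.

```python
import re
from dataclasses import dataclass, field

NAME_RE = re.compile(r"^\D+(?=:\s)")   # property name before the first ": "
PREFIX_RE = re.compile(r"^\D+:\s+")    # name, colon and trailing padding


@dataclass
class Software:
    name: str
    properties: dict = field(default_factory=dict)


def parse_msiinv(text):
    """Parse msiinv-style output into a list of Software records."""
    products = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue                               # skip blank lines
        if not line[0].isspace():
            if line.startswith("msiinv"):          # assumed header line
                current = None
                continue
            current = Software(name=line.strip())  # unindented line: product title
            products.append(current)
            continue
        match = NAME_RE.match(line)
        if match is None or current is None:
            continue                               # e.g. "0 patch packages."
        key = match.group(0).strip()
        current.properties[key] = PREFIX_RE.sub("", line, count=1).strip()
    return products
```

Calling `parse_msiinv` on the text from the question should give one `Software` per product, with values reachable as, for example, `products[0].properties["Product code"]`. Mapping labels such as "Product code" onto typed properties like ProductCode would be a further step on top of this.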

I don't know why you have been downvoted; I suspect it's because you mentioned regex. If possible, the nicer solution (and one which may not get downvoted) would be to convert the text file into an XML file. I don't know how feasible that is for your output, but it would make for a much better solution.

EDIT: updated the regex to work with the exceptions in the comment.

