SQL Server full-text search on a numeric string containing commas

I have a full-text indexed nvarchar(max) column on a SQL Server 2012 install. Suppose one row of the column contains 'blah blah $1,234,567 blah blah'. I ran the following queries; each one is marked with whether it returns that row:

SELECT ftext FROM dbo.Test WHERE Contains([ftext], '"1,234,567*"') --true
SELECT ftext FROM dbo.Test WHERE Contains([ftext], '"1234567*"') --true
SELECT ftext FROM dbo.Test WHERE Contains([ftext], '"1,234*"') --true
SELECT ftext FROM dbo.Test WHERE Contains([ftext], '"1234*"') --true
SELECT ftext FROM dbo.Test WHERE Contains([ftext], '"1,234,5*"') --false
SELECT ftext FROM dbo.Test WHERE Contains([ftext], '"12345*"') --true
SELECT ftext FROM dbo.Test WHERE Contains([ftext], '"1,234,56*"') --false
SELECT ftext FROM dbo.Test WHERE Contains([ftext], '"123456*"') --true

At first I just assumed the comma was treated as noise, but that doesn't seem to be the case as "1,234,567*" and "1,234*" return a result while "1,234,5*" and "1,234,56*" do not. Why is this?

1 Solution

#1


This behavior is due to a combination of how numeric values are treated and how the word breaker is applied to the search term. In short, if the text without the wildcard looks like a number, it is treated like a number; otherwise it is treated like a string.

When searching on a valid number with commas, the full text engine will treat it as both a string and a number. You can see this in action by using sys.dm_fts_parser which is used by the engine to parse the search string. For example, here are the results of SELECT display_term FROM sys.dm_fts_parser (' "1,234,567*" ', 1033, 0, 0):

display_term
---------------------
1,234,567      <-- string
nn1234567      <-- number

I'm a little unsure of how 1,234,567 is stored in the full text index -- it will be one of the values listed above, or both -- but regardless, it's easy to see how "1,234,567*" will find a match in the index.

Now let's try "1,234,56*". The results of SELECT * FROM sys.dm_fts_parser (' "1,234,56*" ', 1033, 0, 0) are:

display_term
---------------------
1
nn1
234
nn234
56
nn56

Whoa, what happened? Well, 1,234,56 is not a valid number, so it is treated like a string. Thus it is split by the commas and the individual values (1, 234, 56) are identified as being strings or numbers. It's the same as if you searched for "1" AND "234" AND "56*".
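
To see how any other term is tokenized, the same parser call can be pointed at a variable and asked for a few more columns. This is only a minimal sketch: it assumes a login with the permissions sys.dm_fts_parser requires (sysadmin membership, per the documentation), and @term is just an illustrative name.

```sql
-- @term is a hypothetical variable name; put any search condition here.
DECLARE @term nvarchar(4000) = N'"1,234,5*"';

-- 1033 is the English (US) LCID; the last two arguments are copied
-- unchanged from the parser calls shown earlier in this answer.
SELECT occurrence,     -- position of the token within the parsed term
       special_term,   -- how the parser classified the token
       display_term,   -- human-readable form, e.g. nn234 for a number
       source_term     -- the piece of the input the token came from
FROM sys.dm_fts_parser(@term, 1033, 0, 0)
ORDER BY occurrence;
```

If the term comes back as several tokens rather than one string/number pair, it has been split at the commas in the way described above.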

Some ideas for working around this behavior:

  • Use a LIKE query instead: SELECT ftext FROM dbo.Test WHERE [ftext] LIKE '%1,234,56%' (the leading % is needed because the number appears in the middle of the text; note that LIKE with a leading wildcard does not use the full-text index)

  • Pre-process the search string to remove commas from numbers.
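
The second workaround could be sketched in T-SQL itself. This assumes the search term is a single number typed with thousands separators; @input, @digits and @search are illustrative names, and the blanket REPLACE would also strip commas that separate ordinary words, so it is only suitable for purely numeric terms.

```sql
-- Hypothetical user input: a partial number typed with commas.
DECLARE @input  nvarchar(100) = N'1,234,56';

-- Remove the commas so the term is a plain run of digits.
DECLARE @digits nvarchar(100) = REPLACE(@input, N',', N'');

-- Wrap the digits in double quotes and append * to form a prefix term.
DECLARE @search nvarchar(110) = N'"' + @digits + N'*"';

SELECT ftext
FROM dbo.Test
WHERE CONTAINS([ftext], @search);
```

With the input above, @search becomes "123456*", which is one of the forms the question reports as returning the row.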

