Okay, so I'm trying to make a full text search in multiple columns, something simple like this:
SELECT * FROM pages WHERE MATCH(head, body) AGAINST('some words' IN BOOLEAN MODE)
Now I want to order by relevance (how many of the words are found?), which I have been able to do with something like this:
SELECT * , MATCH (head, body) AGAINST ('some words' IN BOOLEAN MODE) AS relevance
FROM pages
WHERE MATCH (head, body) AGAINST ('some words' IN BOOLEAN MODE)
ORDER BY relevance DESC
Now here comes the part where I get lost: I want to prioritize the relevance in the head column.
I guess I could make two relevance columns, one for head and one for body, but at that point I'd be doing more or less the same search in the table three times. For what I'm building this function for, performance is important, since the query will be both joined and matched against other tables.
So, my main question is: is there a faster way to search for relevance and prioritize certain columns? (And, as a bonus, possibly even make relevance count the number of times the words occur in the columns?)
Any suggestions or advice would be great.
Note: I will be running this on a LAMP-server. (WAMP in local testing)
4 Answers
#1
128
This might give the head part the increased relevance that you want. It won't double it, but it might be good enough for your purposes:
SELECT pages.*,
MATCH (head, body) AGAINST ('some words') AS relevance,
MATCH (head) AGAINST ('some words') AS title_relevance
FROM pages
WHERE MATCH (head, body) AGAINST ('some words')
ORDER BY title_relevance DESC, relevance DESC
-- alternatively, order by the combined score instead:
-- ORDER BY title_relevance + relevance DESC
An alternative that you might also want to investigate, if you have the flexibility to switch DB engines, is Postgres. It lets you assign weights to different parts of a document (the title versus the body, for example) and play around with the ranking.
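For what it's worth, here is a minimal sketch of how that could look in Postgres. It assumes the same pages table with head and body columns as in the question, and it assumes the 'english' text search configuration. The 'A' and 'D' labels are picked purely for illustration; with the default weights of ts_rank, 'A' counts 1.0 and 'D' counts 0.1.

```sql
-- PostgreSQL sketch, not MySQL. Table and column names follow the question;
-- the 'english' configuration and the A/D weight labels are assumptions.
SELECT p.*,
       ts_rank(
           setweight(to_tsvector('english', coalesce(p.head, '')), 'A') ||
           setweight(to_tsvector('english', coalesce(p.body, '')), 'D'),
           plainto_tsquery('english', 'some words')
       ) AS relevance
FROM pages AS p
WHERE to_tsvector('english', coalesce(p.head, '') || ' ' || coalesce(p.body, ''))
      @@ plainto_tsquery('english', 'some words')
ORDER BY relevance DESC;
```

Note that plainto_tsquery requires all of the words to be present. For anything beyond a quick test you would probably store the weighted tsvector in its own column with a GIN index rather than compute it per query.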
#2
9
I have never done so, but it seems like
MATCH (head, head, body) AGAINST ('some words' IN BOOLEAN MODE)
should give double weight to matches found in the head.
I just read this comment on the docs page and thought it might be of value to you:
Posted by Patrick O'Lone on December 9 2002 6:51am
It should be noted in the documentation that IN BOOLEAN MODE will almost always return a relevance of 1.0. In order to get a relevance that is meaningful, you'll need to:
SELECT MATCH (Content) AGAINST ('keyword1 keyword2') AS Relevance
FROM table_name
WHERE MATCH (Content) AGAINST ('+keyword1 +keyword2' IN BOOLEAN MODE)
HAVING Relevance > 0.2
ORDER BY Relevance DESC
Notice that you are doing a regular relevance query to obtain relevance factors combined with a WHERE clause that uses BOOLEAN MODE. The BOOLEAN MODE gives you the subset that fulfills the requirements of the BOOLEAN search, the relevance query fulfills the relevance factor, and the HAVING clause (in this case) ensures that the document is relevant to the search (i.e. documents that score less than 0.2 are considered irrelevant). This also allows you to order by relevance.
This may or may not be a bug in the way that IN BOOLEAN MODE operates, although the comments I've read on the mailing list suggest that IN BOOLEAN MODE's relevance ranking is not very complicated, thus lending itself poorly for actually providing relevant documents. BTW - I didn't notice a performance loss for doing this, since it appears MySQL only performs the FULLTEXT search once, even though the two MATCH clauses are different. Use EXPLAIN to prove this.
So it would seem you may not need to worry about calling the fulltext search twice, though you should still "use EXPLAIN to prove this".
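Checking that with EXPLAIN could look roughly like this, using the pages table from the question. The '+some +words' boolean string is only an illustration. In the output, a value of fulltext in the type column for pages means the FULLTEXT index is being used; what else it tells you depends on your MySQL version, so treat this as a starting point rather than proof.

```sql
-- Prefix the query with EXPLAIN to see how MySQL plans to execute it.
-- Assumes a FULLTEXT index on (head, body), as the original query already needs.
EXPLAIN
SELECT MATCH (head, body) AGAINST ('some words') AS relevance
FROM pages
WHERE MATCH (head, body) AGAINST ('+some +words' IN BOOLEAN MODE)
ORDER BY relevance DESC;
```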
#3
8
Just adding this for anyone who might need it: don't forget to alter the table!
ALTER TABLE table_name ADD FULLTEXT(column_name);
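For the pages table in the question, that could look something like the sketch below. The index names are made up. The column list in MATCH() has to match the column list of a FULLTEXT index exactly (MyISAM is the exception for IN BOOLEAN MODE searches, which can run without an index but slowly), so matching on head alone, as in answer #1, needs its own index.

```sql
-- Index for MATCH (head, body); the name ft_head_body is hypothetical.
ALTER TABLE pages ADD FULLTEXT ft_head_body (head, body);

-- Separate index for MATCH (head); the name ft_head is hypothetical.
ALTER TABLE pages ADD FULLTEXT ft_head (head);
```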
#4
3
I was just playing around with this, too. One way you can add extra weight is in the ORDER BY clause of the query.
For example, if you were matching 3 different columns and wanted to weight certain columns more heavily:
SELECT search.*,
MATCH (name) AGAINST ('black' IN BOOLEAN MODE) AS name_match,
MATCH (keywords) AGAINST ('black' IN BOOLEAN MODE) AS keyword_match,
MATCH (description) AGAINST ('black' IN BOOLEAN MODE) AS description_match
FROM search
WHERE MATCH (name, keywords, description) AGAINST ('black' IN BOOLEAN MODE)
ORDER BY (name_match * 3 + keyword_match * 2 + description_match) DESC LIMIT 0,100;
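Combining this with the doc comment quoted in #2, here is a sketch for the pages table from the question: natural-language MATCH() scores do the weighting, and IN BOOLEAN MODE is used only as the filter. The factor 2 is arbitrary, and the query assumes separate FULLTEXT indexes on (head), (body), and (head, body).

```sql
-- Weighted relevance: head counts twice as much as body (factor chosen arbitrarily).
-- Natural-language mode (no IN BOOLEAN MODE) is used for the scores.
SELECT pages.*,
       MATCH (head) AGAINST ('some words') * 2
     + MATCH (body) AGAINST ('some words') AS weighted_relevance
FROM pages
WHERE MATCH (head, body) AGAINST ('some words' IN BOOLEAN MODE)
ORDER BY weighted_relevance DESC;
```

Since natural-language scores take word frequency into account, this may also get you part of the way toward the "count the occurrences" bonus in the question.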