MySQL全文检索解析器ngram

MySQL全文检索解析器ngram自版本5.7.6可用。

ngram_token_size变量

ngram_token_size默认为2,表示按最小两个字来分词,比如"十六",单个词如"六"是检索不出来的,需要将ngram_token_size设置为1。

ngram_token_size变量的两种设置方式:

1.启动mysqld命令时指定

mysqld --ngram_token_size=2

2.修改MySQL配置文件

[mysqld] 
ngram_token_size=2

改完之后重启MySQL Server才能生效。

FULLTEXT索引 ngram parser

创建一个带全文检索索引的表

CREATE TABLE `content` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `data_id` int DEFAULT '0',
  `body` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
  PRIMARY KEY (`id`),
  KEY `data_id` (`data_id`) USING BTREE,
  FULLTEXT KEY `body` (`body`) WITH PARSER `ngram`
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

建表时如果没创建全文索引,还可以新增

ALTER TABLE content ADD FULLTEXT INDEX body (body) WITH PARSER ngram;

# 等价于

CREATE FULLTEXT INDEX body ON body (body) WITH PARSER ngram;

如果所使用的MySQL版本不支持ngram,则会报错

1128 - Function 'ngram' is not defined

这个时候只能升级到较高版本了(5.7.6+)。

查询

1. 自然语言模式检索

包含论语孟子的查询

select *, MATCH (body) AGAINST ('论语 孟子' IN NATURAL LANGUAGE MODE) as score from content where MATCH (body) AGAINST ('论语 孟子' IN NATURAL LANGUAGE MODE) order by score desc

score是相似度打分。

2. Boolean模式检索

包含论语, 同时包含孟子的查询

select *, MATCH (body) AGAINST ('+论语 +孟子' IN BOOLEAN MODE) as score from content where MATCH (body) AGAINST ('+论语 +孟子' IN BOOLEAN MODE) order by score desc

包含论语,但不包含孟子的查询

select *, MATCH (body) AGAINST ('+论语 -孟子' IN BOOLEAN MODE) as score from content where MATCH (body) AGAINST ('+论语 -孟子' IN BOOLEAN MODE) order by score desc

包含论语,或包含孟子的查询

select *, MATCH (body) AGAINST ('论语 孟子' IN BOOLEAN MODE) as score from content where MATCH (body) AGAINST ('论语 孟子' IN BOOLEAN MODE) order by score desc

Leave a Comment

豫ICP备19001387号-1