一、插件网址
https://github.com/medcl/elasticsearch-analysis-ik/
版本对应关系
| IK version | ES version |
|---|---|
| master | 7.x -> master |
| 6.x | 6.x |
| 5.x | 5.x |
| 1.10.6 | 2.4.6 |
| 1.9.5 | 2.3.5 |
| 1.8.1 | 2.2.1 |
| 1.7.0 | 2.1.1 |
| 1.5.0 | 2.0.0 |
| 1.2.6 | 1.0.0 |
| 1.2.5 | 0.90.x |
| 1.1.3 | 0.20.x |
| 1.0.0 | 0.16.2 -> 0.19.0 |
二、安装插件
- 下载插件(选择对应的版本):https://github.com/medcl/elasticsearch-analysis-ik/releases
- 减压插件到 your-es-root/plugins/ik
- 重启es
三、测试插件分词效果
默认分词器
请求
GET http://127.0.0.1:9200/user5/_analyze
参数
{"text":"中华人民共和国"}
响应
{"tokens": [{"token": "中","start_offset": 0,"end_offset": 1,"type": "<IDEOGRAPHIC>","position": 0},{"token": "华","start_offset": 1,"end_offset": 2,"type": "<IDEOGRAPHIC>","position": 1},{"token": "人","start_offset": 2,"end_offset": 3,"type": "<IDEOGRAPHIC>","position": 2},{"token": "民","start_offset": 3,"end_offset": 4,"type": "<IDEOGRAPHIC>","position": 3},{"token": "共","start_offset": 4,"end_offset": 5,"type": "<IDEOGRAPHIC>","position": 4},{"token": "和","start_offset": 5,"end_offset": 6,"type": "<IDEOGRAPHIC>","position": 5},{"token": "国","start_offset": 6,"end_offset": 7,"type": "<IDEOGRAPHIC>","position": 6}]}
ik分词器
请求
GET http://127.0.0.1:9200/user5/_analyze?analyzer=ik_max_word&pretty=true
参数
{"text":"中华人民共和国"}
响应
{"tokens": [{"token": "中华人民共和国","start_offset": 0,"end_offset": 7,"type": "CN_WORD","position": 0},{"token": "中华人民","start_offset": 0,"end_offset": 4,"type": "CN_WORD","position": 1},{"token": "中华","start_offset": 0,"end_offset": 2,"type": "CN_WORD","position": 2},{"token": "华人","start_offset": 1,"end_offset": 3,"type": "CN_WORD","position": 3},{"token": "人民共和国","start_offset": 2,"end_offset": 7,"type": "CN_WORD","position": 4},{"token": "人民","start_offset": 2,"end_offset": 4,"type": "CN_WORD","position": 5},{"token": "共和国","start_offset": 4,"end_offset": 7,"type": "CN_WORD","position": 6},{"token": "共和","start_offset": 4,"end_offset": 6,"type": "CN_WORD","position": 7},{"token": "国","start_offset": 6,"end_offset": 7,"type": "CN_CHAR","position": 8}]}
四、ik分词器介绍
ik_max_word 和 ik_smart 什么区别?
- ik_max_word: 会将文本做最细粒度的拆分,比如会将“中华人民共和国国歌”拆分为“中华人民共和国,中华人民,中华,华人,人民共和国,人民,人,民,共和国,共和,和,国国,国歌”,会穷尽各种可能的组合,适合 Term Query;*
- ik_smart: 会做最粗粒度的拆分,比如会将“中华人民共和国国歌”拆分为“中华人民共和国,国歌”,适合 Phrase 查询。
