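Full-Text Search
MongoDB's built-in full-text search handles Chinese poorly: its text index is built from words, that is, runs of characters separated by delimiters, and Chinese text has no delimiters between words. Elasticsearch (ES) is therefore used instead. Here the Python module mongo-connector syncs the MongoDB data into ES, and queries are then run against ES.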
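To make the tokenization problem concrete, here is a minimal sketch. It uses Python's whitespace split as a stand-in for delimiter-based tokenization; that is an assumption for illustration only, not MongoDB's actual text-index tokenizer, and the sample strings are made up.

```python
def delimiter_tokens(text):
    """Split text on whitespace, a stand-in for delimiter-based tokenization."""
    return text.split()


english = "full text search engine"
chinese = "搜狗输入法"

# English words are separated by spaces, so each word becomes its own token.
english_tokens = delimiter_tokens(english)

# Chinese has no spaces between words, so the whole phrase stays a single
# token and the word "输入法" never becomes a token of its own.
chinese_tokens = delimiter_tokens(chinese)

print(english_tokens)
print(chinese_tokens)
```

A word-aware analyzer is what closes this gap, and that is the role ES plays in the rest of these notes.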
Installation
Installing Elasticsearch
Option 1: download the official prebuilt archive
https://github.com/elastic/elasticsearch
https://www.elastic.co/downloads/elasticsearch
Note: requires Java 8
wget -c https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.2-linux-x86_64.tar.gz
tar xf elasticsearch-7.6.2-linux-x86_64.tar.gz
cd elasticsearch-7.6.2
Option 2: install from the official yum repository (requires root)
https://www.elastic.co/guide/en/elasticsearch/reference/7.6/rpm.html#rpm-repo
# 1. Import the GPG key
rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
# 2. Add the yum repository
cat > /etc/yum.repos.d/elasticsearch.repo << EOF
[elasticsearch]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=0
autorefresh=1
type=rpm-md
EOF
# 3. Install from that repository
yum install --enablerepo=elasticsearch elasticsearch
Option 3: install the RPM package directly
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.2-x86_64.rpm
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.2-x86_64.rpm.sha512
shasum -a 512 -c elasticsearch-7.6.2-x86_64.rpm.sha512
rpm -ivh elasticsearch-7.6.2-x86_64.rpm
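On a machine without shasum, the checksum step can be approximated in Python. This is a rough sketch, assuming the .sha512 file has the usual `<hex digest>  <filename>` layout that `shasum -c` reads; the function names are hypothetical helpers, not part of any Elastic tooling.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(package_path, checksum_path):
    """Compare a file's digest with the first field of its checksum file."""
    expected = Path(checksum_path).read_text().split()[0].lower()
    return sha512_of(package_path) == expected


if __name__ == "__main__":
    rpm = Path("elasticsearch-7.6.2-x86_64.rpm")
    sha = Path("elasticsearch-7.6.2-x86_64.rpm.sha512")
    # Only check when both downloads are present in the current directory.
    if rpm.exists() and sha.exists():
        print("OK" if verify_checksum(rpm, sha) else "FAILED")
```

Run it from the directory that holds both downloaded files, and only install the package if it prints OK.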
Installing mongo-connector
pip install mongo-connector
Installing elastic2-doc-manager
pip install elastic2-doc-manager[elastic5]
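Usage
Enable a replica set in MongoDB (mongo-connector follows the oplog, which exists only on replica set members)
Start ES
/path/to/elasticsearch-7.6.2/bin/elasticsearch -d
Syncing data
mongo-connector \
  -m localhost:27015 \
  -t localhost:9200 \
  -d elastic2_doc_manager
Using a configuration file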
https://github.com/yougov/mongo-connector/wiki/Configuration-Options
mongo-connector -c config.json
{
  "__comments": "fields starting with __ are ignored",
  "mainAddress": "localhost:27015",
  "docManagers": [
    {
      "docManager": "elastic2_doc_manager",
      "targetURL": "localhost:9200",
      "autoCommitInterval": 0,
      "bulkSize": 5000,
      "args": {
        "clientOptions": {"timeout": 100}
      }
    }
  ]
}
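Hand-edited JSON is easy to break with a stray comma, so one option is to generate config.json from Python. The sketch below reuses the keys and values from the configuration above; `write_config` is a hypothetical helper, and the read-back only confirms the file is valid JSON, not that mongo-connector accepts every option.

```python
import json

# Same settings as the configuration file shown above.
config = {
    "__comments": "fields starting with __ are ignored",
    "mainAddress": "localhost:27015",
    "docManagers": [
        {
            "docManager": "elastic2_doc_manager",
            "targetURL": "localhost:9200",
            "autoCommitInterval": 0,
            "bulkSize": 5000,
            "args": {"clientOptions": {"timeout": 100}},
        }
    ],
}


def write_config(path="config.json"):
    """Write the config as JSON, then read it back to confirm it parses."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Calling `write_config()` produces a file that can be passed to `mongo-connector -c config.json`.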
Querying
curl localhost:9200/_cat/indices # list the indices
curl 'localhost:9200/pubmed?pretty' # show the field information etc. of the pubmed index
curl 'localhost:9200/pubmed/_search?pretty' # full-text search
curl 'localhost:9200/pubmed/article/5eb64effc3b702070a873076' # look up by _index/_type/_id
curl 'localhost:9200/pubmed/article/_search?pretty'
curl 'localhost:9200/pubmed/_search?pretty' \
  -d '{"query": {"match": {"pmid": 123}}}' \
  -H "Content-Type: application/json"
# URI query
curl 'localhost:9200/pubmed/article/_search?q=pmid:1234&pretty'
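The match query can also be issued from Python using only the standard library. This is a minimal sketch, assuming ES listens on localhost:9200 as in the commands above; `build_match_request` and `run` are hypothetical helper names. Building the request is kept separate from sending it so it can be inspected without a running ES.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed address, matching the curl examples


def build_match_request(index, field, value, base_url=ES_URL):
    """Build a POST request carrying a match query for index/_search."""
    body = json.dumps({"query": {"match": {field: value}}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search?pretty",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run(request):
    """Send the request and decode the JSON response (needs a running ES)."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


request = build_match_request("pubmed", "pmid", 123)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

With ES running, `run(request)` returns the decoded search response as a dict.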
Plugins
ik
ES's default analyzer, standard, handles Chinese poorly (it splits the text into individual characters)
The ik analyzer has two modes:
- ik_smart: coarse-grained
- ik_max_word: fine-grained
Test:
curl 'localhost:9200/_analyze?pretty' \
  -H "Content-Type: application/json" \
  -d '{"analyzer": "ik_smart", "text": "搜狗输入法"}'
# tokens: ['搜狗', '输入法']
curl 'localhost:9200/_analyze?pretty' \
  -H "Content-Type: application/json" \
  -d '{"analyzer": "ik_max_word", "text": "搜狗输入法"}'
# tokens: ['搜狗', '输入法', '输入', '法']
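The difference between the analyzers can be summarized by comparing token lists. In this sketch the ik token lists are copied from the test output above, and the per-character split is only an approximation of what standard does to Chinese text; nothing here calls ES.

```python
text = "搜狗输入法"

# Approximation of the standard analyzer on Chinese: one token per character.
standard_like = list(text)

# Token lists copied from the _analyze output above.
ik_smart = ["搜狗", "输入法"]
ik_max_word = ["搜狗", "输入法", "输入", "法"]


def has_term(tokens, term):
    """Return True if the exact term appears among the tokens."""
    return term in tokens


for name, tokens in [
    ("standard-like", standard_like),
    ("ik_smart", ik_smart),
    ("ik_max_word", ik_max_word),
]:
    # The word "输入法" is a term of its own only under the ik analyzers.
    print(name, tokens, has_term(tokens, "输入法"))
```

ik_max_word keeps every ik_smart token and adds finer-grained ones, which is the difference between the coarse and fine modes.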
kibana
./bin/kibana # default configuration file: config/kibana.yml

