Stop Token Filter

Original link : https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-stop-tokenfilter.html

Translation link : http://www.apache.wiki/pages/viewpage.action?pageId=10028519

A token filter of type stop removes stop words from the token stream.

The following are optional settings for the stop token filter:

The stopwords parameter accepts an array of stop words:

PUT /my_index
{
    "settings": {
        "analysis": {
            "filter": {
                "my_stop": {
                    "type":       "stop",
                    "stopwords": ["and", "is", "the"]
                }
            }
        }
    }
}

or a predefined language-specific list:

PUT /my_index
{
    "settings": {
        "analysis": {
            "filter": {
                "my_stop": {
                    "type":       "stop",
                    "stopwords":  "_english_"
                }
            }
        }
    }
}
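
A filter defined in the index settings only takes effect once an analyzer references it. The following is a minimal sketch of that wiring, not a definitive setup: the analyzer name my_analyzer, the choice of the standard tokenizer, and the sample text are assumptions made for illustration. Stop-word matching is case-sensitive by default, so the built-in lowercase filter is placed before my_stop to let "The" match "the". The request is meant to be used in place of the index-creation requests above, because an index can only be created once.

```console
PUT /my_index
{
    "settings": {
        "analysis": {
            "filter": {
                "my_stop": {
                    "type":       "stop",
                    "stopwords": ["and", "is", "the"]
                }
            },
            "analyzer": {
                "my_analyzer": {
                    "type":      "custom",
                    "tokenizer": "standard",
                    "filter":    ["lowercase", "my_stop"]
                }
            }
        }
    }
}

GET /my_index/_analyze
{
    "analyzer": "my_analyzer",
    "text":     "The quick fox is fast and clever"
}
```

With this configuration the _analyze response should list only the tokens quick, fox, fast and clever, since the, is and and are dropped by my_stop.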

Elasticsearch provides the following predefined language lists:

_arabic_, _armenian_, _basque_, _brazilian_, _bulgarian_, _catalan_, _czech_, _danish_, _dutch_, _english_, _finnish_, _french_, _galician_, _german_, _greek_, _hindi_, _hungarian_, _indonesian_, _irish_, _italian_, _latvian_, _norwegian_, _persian_, _portuguese_, _romanian_, _russian_, _sorani_, _spanish_, _swedish_, _thai_, _turkish_.

For an empty stop word list (to disable stop words), use: _none_.
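
As a sketch of how that value might be used, the request below reuses the my_index and my_stop names from the earlier examples and passes _none_ in place of a word list. A filter configured this way would let every token through unchanged.

```console
PUT /my_index
{
    "settings": {
        "analysis": {
            "filter": {
                "my_stop": {
                    "type":       "stop",
                    "stopwords":  "_none_"
                }
            }
        }
    }
}
```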