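- I. Introduction to aggregation analysis
- II. Metric aggregations
- III. Bucket aggregations
I. Introduction to aggregation analysis
1. What is aggregation analysis in ES?
Aggregation analysis is an important database feature: it performs aggregate computations over the data set returned by a query, such as finding the maximum or minimum of a field (or of a computed expression), or computing a sum or an average. As both a search engine and a data store, ES provides powerful aggregation capabilities as well.
Aggregations that compute metrics such as the maximum, minimum, sum, or average over a data set are called metric aggregations (metric) in ES.
Besides aggregate functions, relational databases can also group the query results with GROUP BY and then compute metrics per group. In ES, the equivalent of GROUP BY is called bucketing, or bucket aggregation (bucketing).
ES also provides matrix aggregations (matrix) and pipeline aggregations (pipeline), but these were still being refined at the time of writing.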
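2. How to write an aggregation query in ES
Aggregations are defined in the search request body under the aggregations node, using the following syntax:
"aggregations" : {
    "<aggregation_name>" : {                              <!-- name of the aggregation -->
        "<aggregation_type>" : {                          <!-- type of the aggregation -->
            <aggregation_body>                            <!-- aggregation body: which fields to aggregate on -->
        }
        [,"meta" : { [<meta_data_body>] } ]?              <!-- metadata -->
        [,"aggregations" : { [<sub_aggregation>]+ } ]?    <!-- sub-aggregations defined inside this aggregation -->
    }
    [,"<aggregation_name_2>" : { ... } ]*                 <!-- further aggregations, each with its own name -->
}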
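3. Value sources for aggregations
The values to aggregate can come from a document field, from a script, or from a field whose values a script then transforms; the examples in section II cover each case.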
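The following Python sketch builds aggregation bodies for the value sources used by the examples in section II: a plain field, a script, a field combined with a `_value` script, and a field with a `missing` default. The helper `metric_agg` is a hypothetical name introduced only for illustration; it just assembles dictionaries in the shape of the syntax in item 2 and does not talk to a cluster.

```python
import json


def metric_agg(name, agg_type, field=None, script=None, missing=None):
    """Return {name: {agg_type: body}} for one metric aggregation.

    Hypothetical helper: it only assembles the JSON structure.
    """
    body = {}
    if field is not None:
        body["field"] = field                # value source: a document field
    if script is not None:
        body["script"] = {"source": script}  # value source: a script
    if missing is not None:
        body["missing"] = missing            # default for documents without the field
    if not body:
        raise ValueError("an aggregation needs a field or a script")
    return {name: {agg_type: body}}


aggs = {}
aggs.update(metric_agg("maxage", "max", field="age"))
aggs.update(metric_agg("avg_age10", "avg", script="doc.age.value + 10"))
aggs.update(metric_agg("sum_age_x2", "sum", field="age", script="_value * 2"))
aggs.update(metric_agg("avg_age_default", "avg", field="age", missing=15))

# "size": 0 asks for aggregation results only, without the matching hits.
request_body = {"size": 0, "aggs": aggs}
print(json.dumps(request_body, indent=2))
```

The printed JSON can be pasted as the body of a `POST /book1/_search` request; the aggregation names other than `maxage` and `avg_age10` are made up for this sketch.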
II. Metric aggregations
1. max min sum avg
Example 1: find the maximum age across all records
POST /book1/_search?pretty
{"size": 0,"aggs": {"maxage": {"max": {"field": "age"}}}}
Result 1:
{"took": 4,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 41,"max_score": 0,"hits": []},"aggregations": {"maxage": {"value": 54}}}
Example 2: add a query condition and find the maximum age among records whose name contains 'test':
POST /book1/_search?pretty
{"query":{"term":{"name":"test"}},"size": 2,"sort": [{"age": {"order": "desc"}}],"aggs": {"maxage": {"max": {"field": "age"}}}}
Result 2:
{"took": 3,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 5,"max_score": null,"hits": [{"_index": "book1","_type": "english","_id": "6IUkUmUBRzBxBrDgFok2","_score": null,"_source": {"name": "test goog my money","age": [14,54,45,34],"class": "dsfdsf","addr": "中国"},"sort": [54]},{"_index": "book1","_type": "english","_id": "54UiUmUBRzBxBrDgfIl9","_score": null,"_source": {"name": "test goog my money","age": [11,13,14],"class": "dsfdsf","addr": "中国"},"sort": [14]}]},"aggregations": {"maxage": {"value": 54}}}
Example 3: values come from a script; compute the average age across all records, and the average age plus 10
POST /book1/_search?pretty
{"size":0,"aggs": {"avg_age": {"avg": {"script": {"source": "doc.age.value"}}},"avg_age10": {"avg": {"script": {"source": "doc.age.value + 10"}}}}}
Result 3:
{"took": 3,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 41,"max_score": 0,"hits": []},"aggregations": {"avg_age": {"value": 7.585365853658536},"avg_age10": {"value": 17.585365853658537}}}
Example 4: specify a field and use _value in the script to read the field's value
POST /book1/_search?pretty
{"size":0,"aggs": {"sun_age": {"sum": {"field":"age","script": {"source": "_value * 2"}}}}}
Result 4:
{"took": 4,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 41,"max_score": 0,"hits": []},"aggregations": {"sun_age": {"value": 942}}}
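Example 5: supply a default value for documents that have no value for the field. If no default is given, documents missing the field are ignored:
POST /book1/_search?pretty
{"size":0,"aggs": {"sun_age": {"avg": {"field":"age","missing":15}}}}
Result 5: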
{"took": 12,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 41,"max_score": 0,"hits": []},"aggregations": {"sun_age": {"value": 12.847826086956522}}}
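The average in Result 5 can be checked by hand. The sketch below assumes the figures reported elsewhere in this article: the stats aggregation gives 38 values with a sum of 471, and the Missing aggregation in section III reports 8 documents without an age. Each of those 8 documents is treated as if it carried the default value 15. The function name is hypothetical.

```python
def avg_with_missing(value_sum, value_count, docs_missing, missing_value):
    """Average when each document lacking the field contributes missing_value."""
    total = value_sum + docs_missing * missing_value
    count = value_count + docs_missing
    return total / count


# 471 and 38 come from the stats result, 8 from the Missing aggregation result.
result = avg_with_missing(value_sum=471, value_count=38, docs_missing=8, missing_value=15)
print(result)  # (471 + 8 * 15) / (38 + 8) = 591 / 46
```

The quotient 591 / 46 is approximately 12.8478, which agrees with the value shown in Result 5.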
2. Document count (count)
Example 1: count the documents in the book1 index whose age is 12
POST book1/english/_count
{"query":{"match":{"age":12}}}
Result 1:
{"count": 16,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0}}
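3. Value count: counts the values found for a field (each value of a multi-valued field is counted, so the result can exceed the number of documents that have the field)
Example 1:
POST /book1/_search?size=0
{"aggs":{"age_count":{"value_count":{"field":"age"}}}}
Result 1: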
{"took": 1,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 41,"max_score": 0,"hits": []},"aggregations": {"age_count": {"value": 38}}}
4. cardinality: counts distinct values
Example 1:
POST /book1/_search?size=0
{"aggs":{"age_count":{"value_count":{"field":"age"}},"name_count":{"cardinality":{"field":"age"}}}}
Result 1:
{"took": 16,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 41,"max_score": 0,"hits": []},"aggregations": {"name_count": {"value": 11},"age_count": {"value": 38}}}
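To make the difference between the two counts concrete, here is a small Python sketch. The list `AGE_VALUES` is an assumption: it is reconstructed from the terms buckets shown in section III (for example sixteen 12s and eleven 1s) plus the maximum 54, and it reproduces the count of 38 and the sum of 471 reported by the stats aggregation.

```python
# Age values reconstructed from the terms buckets in section III (an assumption).
AGE_VALUES = [1] * 11 + [11] + [12] * 16 + [13] * 2 + [14] * 2 + [16, 21, 33, 34, 45, 54]

value_count = len(AGE_VALUES)       # every value counts, duplicates included
cardinality = len(set(AGE_VALUES))  # only distinct values count

print(value_count, cardinality)
```

Both numbers agree with the result above (38 and 11). Note that Elasticsearch computes cardinality with an approximate algorithm (HyperLogLog++), so on large data sets its answer may deviate slightly from an exact distinct count like this one.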
5. stats: returns five values: count, max, min, avg, sum
Example 1:
POST /book1/_search?size=0
{"aggs":{"age_count":{"stats":{"field":"age"}}}}
Result 1:
{"took": 12,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 41,"max_score": 0,"hits": []},"aggregations": {"age_count": {"count": 38,"min": 1,"max": 54,"avg": 12.394736842105264,"sum": 471}}}
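6. Extended stats
Extended statistics return four more results than stats: the sum of squares, the variance, the standard deviation, and the interval of the average plus/minus two standard deviations.
Example 1:
POST /book1/_search?size=0
{"aggs":{"age_stats":{"extended_stats":{"field":"age"}}}}
Result 1: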
{"took": 8,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 41,"max_score": 0,"hits": []},"aggregations": {"age_stats": {"count": 38,"min": 1,"max": 54,"avg": 12.394736842105264,"sum": 471,"sum_of_squares": 11049,"variance": 137.13365650969527,"std_deviation": 11.710408041981085,"std_deviation_bounds": {"upper": 35.81555292606743,"lower": -11.026079241856905}}}}
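The extra figures can be derived from count, sum, and sum of squares alone. The sketch below assumes the population variance formula (mean of squares minus square of the mean) and bounds at two standard deviations, which is what the numbers in Result 1 are consistent with. The function name is hypothetical.

```python
import math


def extended_stats(count, total, sum_of_squares, sigma=2.0):
    """Derive variance, standard deviation and bounds from running totals."""
    avg = total / count
    # Population variance: E[x^2] - (E[x])^2, clamped against rounding error.
    variance = max(0.0, sum_of_squares / count - avg * avg)
    std_deviation = math.sqrt(variance)
    return {
        "avg": avg,
        "variance": variance,
        "std_deviation": std_deviation,
        "std_deviation_bounds": {
            "upper": avg + sigma * std_deviation,
            "lower": avg - sigma * std_deviation,
        },
    }


# count, sum and sum_of_squares are taken from Result 1 above.
stats = extended_stats(count=38, total=471, sum_of_squares=11049)
print(stats)
```

Up to floating-point rounding, the output matches the variance, std_deviation, and std_deviation_bounds values in Result 1.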
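7. Percentiles: the values at given percentiles
Example 1:
For the specified field (or script), the values are ordered from smallest to largest and the cumulative share of documents (as a percentage of all matching documents) is tracked; the aggregation returns the value at each requested percentile. By default it returns the values at the [ 1, 5, 25, 50, 75, 95, 99 ] percentiles. The middle entry in the result below can be read as: 50% of the documents have an age value <= 12, or conversely, documents with age <= 12 make up 50% of all matching documents.
POST /book1/_search?size=0
{"aggs":{"age_percentiles":{"percentiles":{"field":"age"}}}}
Result 1: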
{"took": 16,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 41,"max_score": 0,"hits": []},"aggregations": {"age_percentiles": {"values": {"1.0": 1,"5.0": 1,"25.0": 1,"50.0": 12,"75.0": 13,"95.0": 40.600000000000016,"99.0": 54}}}}
Example 2: specify the percentiles (the values at the 50%, 96%, and 99% marks)
POST /book1/_search?size=0
{"aggs":{"age_percentiles":{"percentiles":{"field":"age","percents" : [50,96,99]}}}}
Result 2:
{"took": 6,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 41,"max_score": 0,"hits": []},"aggregations": {"age_percentiles": {"values": {"50.0": 12,"96.0": 44.779999999999966,"99.0": 54}}}}
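Note: 50% of the values are <= 12, 96% of the values are <= 44.78, and 99% of the values are <= 54
8. Percentile ranks: the share of documents whose value is less than or equal to a given value
Example 1: compute the share of documents with age at most 12 and at most 35; this is the inverse of item 7
POST /book1/_search?size=0
{"aggs":{"aggs_perc_rank":{"percentile_ranks":{"field":"age","values" : [12,35]}}}}
Result 1: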
{"took": 8,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 41,"max_score": 0,"hits": []},"aggregations": {"aggs_perc_rank": {"values": {"12.0": 71.05263157894737,"35.0": 92.76315789473685}}}}
Result explanation: about 71% of the documents have an age of at most 12, and about 93% have an age of at most 35.
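For comparison, an exact percentile rank is easy to compute on a small list. The sketch below uses `AGE_VALUES`, an assumed list reconstructed from the terms buckets in section III (it reproduces the reported count of 38 and sum of 471). Elasticsearch estimates percentiles and percentile ranks with an approximate algorithm (TDigest by default), so its figures can differ somewhat from the exact ones.

```python
# Age values reconstructed from the terms buckets in section III (an assumption).
AGE_VALUES = [1] * 11 + [11] + [12] * 16 + [13] * 2 + [14] * 2 + [16, 21, 33, 34, 45, 54]


def percentile_rank(values, threshold):
    """Exact percentage of values that are <= threshold."""
    at_or_below = sum(1 for v in values if v <= threshold)
    return 100.0 * at_or_below / len(values)


for threshold in (12, 35):
    print(threshold, round(percentile_rank(AGE_VALUES, threshold), 2))
```

On this list the exact ranks are 28/38 and 36/38 (roughly 73.7% and 94.7%), close to but not identical with the approximate 71.05% and 92.76% in Result 1.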
9. Geo Bounds aggregation: computes the bounding range of the geo coordinates in the document set
See the official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-geobounds-aggregation.html
10. Geo Centroid aggregation: computes the coordinates of the geographic center point
See the official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-geocentroid-aggregation.html
III. Bucket aggregations
1. Terms Aggregation: group by the terms (values) of a field
Example 1:
POST /book1/_search?size=0
{"aggs":{"age_terms":{"terms":{"field":"age"}}}}
Note: this is equivalent to group by age
Result 1:
{"took": 4,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 41,"max_score": 0,"hits": []},"aggregations": {"age_terms": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 1,"buckets": [{"key": 12,"doc_count": 16},{"key": 1,"doc_count": 11},{"key": 13,"doc_count": 2},{"key": 14,"doc_count": 2},{"key": 11,"doc_count": 1},{"key": 16,"doc_count": 1},{"key": 21,"doc_count": 1},{"key": 33,"doc_count": 1},{"key": 34,"doc_count": 1},{"key": 45,"doc_count": 1}]}}}
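Result explanation:
"doc_count_error_upper_bound": 0: an upper bound on the error of the document counts
"sum_other_doc_count": 1: the total document count of the groups that were not returned, that is, documents not in any of the listed buckets
By default the top 10 groups are returned, ordered by document count from high to low.
Example 2: size sets how many groups to return
POST /book1/_search?size=0
{"aggs":{"age_terms":{"terms":{"field":"age","size":5}}}}
Result 2: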
{"took": 4,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 41,"max_score": 0,"hits": []},"aggregations": {"age_terms": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 6,"buckets": [{"key": 12,"doc_count": 16},{"key": 1,"doc_count": 11},{"key": 13,"doc_count": 2},{"key": 14,"doc_count": 2},{"key": 11,"doc_count": 1}]}}}
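The mechanics of size and sum_other_doc_count can be mimicked in a few lines of Python. This is a single-process sketch, not how Elasticsearch executes the aggregation across shards. `AGE_VALUES` is an assumed list reconstructed from the bucket counts above plus the maximum 54, and each value is treated as one hit in its bucket; ties in the count are broken by ascending key, which is consistent with the order shown in the results.

```python
from collections import Counter

# Age values reconstructed from the bucket counts above (an assumption).
AGE_VALUES = [1] * 11 + [11] + [12] * 16 + [13] * 2 + [14] * 2 + [16, 21, 33, 34, 45, 54]


def terms_aggregation(values, size=10):
    """Group by value, keep the `size` largest groups, sum up the rest."""
    counts = Counter(values)
    # Order by count descending, then by key ascending for ties.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    buckets = [{"key": key, "doc_count": count} for key, count in ordered[:size]]
    other = sum(count for _, count in ordered[size:])
    return {"sum_other_doc_count": other, "buckets": buckets}


print(terms_aggregation(AGE_VALUES, size=5))
```

With size=5 the sketch returns the keys 12, 1, 13, 14, 11 and a sum_other_doc_count of 6, as in Result 2; with the default size of 10 only the group for 54 is left out, giving a sum_other_doc_count of 1 as in Result 1.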
Example 3: show the count error for each group
POST /book1/_search?size=0
{"aggs":{"age_terms":{"terms":{"field":"age","size":5,"show_term_doc_count_error": true}}}}
Result 3:
{"took": 5,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 41,"max_score": 0,"hits": []},"aggregations": {"age_terms": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 6,"buckets": [{"key": 12,"doc_count": 16,"doc_count_error_upper_bound": 0},{"key": 1,"doc_count": 11,"doc_count_error_upper_bound": 0},{"key": 13,"doc_count": 2,"doc_count_error_upper_bound": 0},{"key": 14,"doc_count": 2,"doc_count_error_upper_bound": 0},{"key": 11,"doc_count": 1,"doc_count_error_upper_bound": 0}]}}}
Example 4: shard_size sets how many groups each shard returns
POST /book1/_search?size=0
{"aggs":{"age_terms":{"terms":{"field":"age","size":3,"shard_size": 20}}}}
Result 4:
{"took": 3,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 41,"max_score": 0,"hits": []},"aggregations": {"age_terms": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 9,"buckets": [{"key": 12,"doc_count": 16},{"key": 1,"doc_count": 11},{"key": 13,"doc_count": 2}]}}}
Example 5: order by the group value "_key"
POST /book1/_search?size=0
{"aggs":{"age_terms":{"terms":{"field":"age","size":3,"order":{"_key":"desc"}}}}}
Result 5:
{"took": 6,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 41,"max_score": 0,"hits": []},"aggregations": {"age_terms": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 35,"buckets": [{"key": 54,"doc_count": 1},{"key": 45,"doc_count": 1},{"key": 34,"doc_count": 1}]}}}
Example 6: order by the document count "_count"
POST /book1/_search?size=0
{"aggs":{"age_terms":{"terms":{"field":"age","size":3,"order":{"_count":"desc"}}}}}
Result 6:
{"took": 91,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 41,"max_score": 0,"hits": []},"aggregations": {"age_terms": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 9,"buckets": [{"key": 12,"doc_count": 16},{"key": 1,"doc_count": 11},{"key": 13,"doc_count": 2}]}}}
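Example 7: order by a metric computed per group
POST /book1/_search?size=0
{"aggs":{"age_terms":{"terms":{"field":"age","order":{"max_age":"desc"}},"aggs":{"max_age":{"max":{"field":"age"}},"min_age":{"min":{"field":"age"}}}}}}
Note: documents are first grouped by age, then the maximum and minimum are computed for each group, and finally the groups are sorted by the maximum in descending order
Example 8: filter groups by matching values against a regular expression
POST book1/_search?size=0
{"aggs":{"tags":{"terms":{"field":"name","include":"里*","exclude":"test*"}}}}
Note: include and exclude are regular expressions here, so 里* matches zero or more repetitions of 里; a prefix match would be written 里.*
Result 8: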
{"took": 22,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 41,"max_score": 0,"hits": []},"aggregations": {"tags": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "里","doc_count": 13}]}}}
Example 9: filter groups by an explicit list of values
POST book1/_search?size=0
{"aggs":{"Chinese":{"terms":{"field":"name","include":["里","国"]}},"Test":{"terms":{"field":"name","exclude":["test","the"]}}}}
Result 9:
{"took": 23,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 41,"max_score": 0,"hits": []},"aggregations": {"Test": {"doc_count_error_upper_bound": 6,"sum_other_doc_count": 559,"buckets": [{"key": "里","doc_count": 12},{"key": "否","doc_count": 11},{"key": "a","doc_count": 7},{"key": "default","doc_count": 7},{"key": "document","doc_count": 7},{"key": "for","doc_count": 7},{"key": "absolute","doc_count": 6},{"key": "account","doc_count": 6},{"key": "accurate","doc_count": 6},{"key": "documents","doc_count": 6}]},"Chinese": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "国","doc_count": 4}]}}}
Example 10: group by a value computed by a script
POST book1/_search?size=0
{"aggs":{"name":{"terms":{"script":{"source":"doc['age'].value + doc.age.value","lang": "painless"}}}}}
Note: a script can read the field value as doc['age'].value or as doc.age.value
Result 10:
{"took": 18,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 41,"max_score": 0,"hits": []},"aggregations": {"name": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "24","doc_count": 16},{"key": "2","doc_count": 11},{"key": "0","doc_count": 8},{"key": "22","doc_count": 1},{"key": "26","doc_count": 1},{"key": "28","doc_count": 1},{"key": "32","doc_count": 1},{"key": "42","doc_count": 1},{"key": "66","doc_count": 1}]}}}
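2. filter Aggregation: aggregate over the documents that satisfy a filter query
Example 1: from the documents matched by the query, select those that satisfy the filter and aggregate over them, that is, filter first and then aggregate (in contrast to Example 9 above, filtering groups, which aggregates first and then filters)
POST book1/_search?size=0
{"aggs":{"age_terms":{"filter":{"match":{"name":"test"}},"aggs":{"avg_age":{"avg":{"field":"age" }}}}}}
Result 1: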
{"took": 152,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 41,"max_score": 0,"hits": []},"aggregations": {"age_terms": {"doc_count": 5,"avg_age": {"value": 19.9}}}}
3. Filters Aggregation: aggregate over several filter groups
Example 1: count the documents containing 'test' and those containing '里'
POST book1/_search?size=0
{"aggs":{"age_terms":{"filters":{"filters":{"test":{"match":{"name":"test"}},"china":{"match":{"name":"里"}}}}}}}
Result 1:
{"took": 3,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 41,"max_score": 0,"hits": []},"aggregations": {"age_terms": {"buckets": {"china": {"doc_count": 13},"test": {"doc_count": 5}}}}}
For example, counting the error and warning entries in a log index, for log alerting:
GET logs/_search
{"size": 0,"aggs": {"messages": {"filters": {"filters": {"errors": {"match": {"body": "error"}},"warnings": {"match": {"body": "warning"}}}}}}}
Example 2: specify a key for the group of all other values
POST book1/_search?size=0
{"aggs":{"age_terms":{"filters":{"other_bucket_key": "other_messages","filters":{"test":{"match":{"name":"test"}},"china":{"match":{"name":"里"}}}}}}}
Result 2:
{"took": 9,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 41,"max_score": 0,"hits": []},"aggregations": {"age_terms": {"buckets": {"china": {"doc_count": 13},"test": {"doc_count": 5},"other_messages": {"doc_count": 23}}}}}
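The behaviour of named filters with an "other" bucket can be sketched in Python as follows. The documents and the word-based predicates are made up for illustration and only stand in for the match queries above. Note that a document may land in more than one named bucket, while the other bucket collects only documents that match none of the filters.

```python
def filters_aggregation(docs, filters, other_bucket_key=None):
    """Count documents per named filter, plus an optional catch-all bucket."""
    buckets = {name: 0 for name in filters}
    if other_bucket_key is not None:
        buckets[other_bucket_key] = 0
    for doc in docs:
        matched = False
        for name, predicate in filters.items():
            if predicate(doc):
                buckets[name] += 1
                matched = True
        if not matched and other_bucket_key is not None:
            buckets[other_bucket_key] += 1
    return buckets


# Hypothetical documents, not taken from the book1 index.
docs = [
    {"name": "test goog my money"},
    {"name": "里 test"},
    {"name": "hello world"},
    {"name": "里"},
]
filters = {
    "test": lambda d: "test" in d["name"].split(),
    "china": lambda d: "里" in d["name"].split(),
}
result = filters_aggregation(docs, filters, other_bucket_key="other_messages")
print(result)
```

Here the document "里 test" is counted in both named buckets, and only "hello world" falls into other_messages.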
4. Range Aggregation: group by value ranges
Example 1:
POST book1/_search?size=0
{"aggs":{"age_range":{"range":{"field":"age","keyed":true,"ranges":[{"to":20,"key":"TW"},{"from":25,"to":40,"key":"TH"},{"from":60,"key":"SIX"}]}}}}
Result 1:
{"took": 3,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 41,"max_score": 0,"hits": []},"aggregations": {"age_range": {"buckets": {"TW": {"to": 20,"doc_count": 31},"TH": {"from": 25,"to": 40,"doc_count": 2},"SIX": {"from": 60,"doc_count": 0}}}}}
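In a range aggregation each range includes its from value and excludes its to value. The Python sketch below applies that rule to `AGE_VALUES`, an assumed list reconstructed from the terms buckets in item 1 (count 38, sum 471). It counts values rather than documents, which is a simplification.

```python
# Age values reconstructed from the terms buckets in item 1 (an assumption).
AGE_VALUES = [1] * 11 + [11] + [12] * 16 + [13] * 2 + [14] * 2 + [16, 21, 33, 34, 45, 54]


def range_aggregation(values, ranges):
    """Count values per range; `from` is inclusive, `to` is exclusive."""
    buckets = {}
    for r in ranges:
        low = r.get("from", float("-inf"))
        high = r.get("to", float("inf"))
        buckets[r["key"]] = sum(1 for v in values if low <= v < high)
    return buckets


ranges = [
    {"to": 20, "key": "TW"},
    {"from": 25, "to": 40, "key": "TH"},
    {"from": 60, "key": "SIX"},
]
result = range_aggregation(AGE_VALUES, ranges)
print(result)
```

The TH and SIX counts agree with Result 1. The TW count comes out as 33 values, while Elasticsearch reports 31 documents; a likely reason is that a document with several ages below 20 (such as the one with age [11,13,14] shown in section II) is counted once per bucket.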
5. Date Range Aggregation: group by date ranges
Example 1:
POST /bank/_search?size=0
{"aggs": {"range": {"date_range": {"field": "date","format": "MM-yyy","ranges": [{"to": "now-10M/M"},{"from": "now-10M/M"}]}}}}
Result 1:
{"took": 115,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 1000,"max_score": 0,"hits": []},"aggregations": {"range": {"buckets": [{"key": "*-2017-08-01T00:00:00.000Z","to": 1501545600000,"to_as_string": "2017-08-01T00:00:00.000Z","doc_count": 0},{"key": "2017-08-01T00:00:00.000Z-*","from": 1501545600000,"from_as_string": "2017-08-01T00:00:00.000Z","doc_count": 0}]}}}
6. Date Histogram Aggregation: date histogram (bar chart) aggregation
This aggregates by day, month, year, and so on. The interval can be year (1y), quarter (1q), month (1M), week (1w), day (1d), hour (1h), minute (1m), second (1s), or a specified time interval.
Example 1:
POST /bank/_search?size=0
{"aggs": {"sales_over_time": {"date_histogram": {"field": "date","interval": "month"}}}}
Result 1:
{"took": 9,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 1000,"max_score": 0,"hits": []},"aggregations": {"sales_over_time": {"buckets": []}}}
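Since the result above contains no buckets, here is a small Python sketch of what a monthly date histogram does conceptually: every date is rounded down to the first day of its month and the documents per month are counted. The dates are hypothetical, and the sketch does not fill in empty months between buckets.

```python
from collections import Counter
from datetime import date


def month_histogram(dates):
    """Bucket dates by calendar month and count the entries per bucket."""
    counts = Counter(d.replace(day=1) for d in dates)  # round down to month start
    return [
        {"key_as_string": month.isoformat(), "doc_count": counts[month]}
        for month in sorted(counts)
    ]


# Hypothetical dates, not taken from the bank index.
sample_dates = [date(2018, 1, 5), date(2018, 1, 20), date(2018, 3, 2)]
histogram = month_histogram(sample_dates)
print(histogram)
```

The two January dates share one bucket keyed by 2018-01-01, and the March date gets its own bucket keyed by 2018-03-01.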
7. Missing Aggregation: a bucket for documents with a missing value
Example 1: count the documents that have no value for the field
POST /book/_search?size=0
{"aggs" : {"account_without_age" : {"missing" : { "field" : "age" }}}}
Result 1:
{"took": 10,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 41,"max_score": 0,"hits": []},"aggregations": {"account_without_age": {"doc_count": 8}}}

