How to speed up this Elasticsearch query?
I'm trying to build an auto-suggest feature based on document titles. For example, if the user types 'south', auto-suggest should offer 'south korea'. I used a shingle filter to break each title into two-word shingles. Here is my mapping:
{ "settings":{ "analysis":{ "filter":{ "suggestions_shingle":{ "type":"shingle", "min_shingle_size":2, "max_shingle_size":2 } }, "analyzer":{ "suggestions":{ "tokenizer":"standard", "filter":[ "suggestions_shingle" ] } } } }, "mappings":{ "docs":{ "properties":{ "docs_title":{ "type":"multi_field", "fields":{ "docs_title":{ "type":"string" }, "suggestions":{ "type":"string", "analyzer":"suggestions", "search_analyzer":"simple" } } } } } } }
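To make the effect of the mapping concrete, here is a rough Python sketch of what a two-word shingle filter produces from a title. The function name `two_word_shingles` is made up for illustration, `str.split` is only a crude stand-in for the standard tokenizer, and the sketch leaves out the single-word tokens that the shingle filter also emits by default.

```python
def two_word_shingles(title):
    """Return adjacent word pairs, roughly what a shingle filter of size 2 emits."""
    words = title.split()  # crude stand-in for the standard tokenizer
    # Pair each word with its successor; case is preserved because the
    # analyzer in the mapping has no lowercase filter.
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


print(two_word_shingles("South Korea travel guide"))
# ['South Korea', 'Korea travel', 'travel guide']
```

Every title contributes several such terms, which is why the docs_title.suggestions field ends up with a very large number of distinct values.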
And here is my query:
{ "explain":true, "aggs":{ "description_suggestions":{ "terms":{ "field":"docs_title.suggestions", "size":10, "include":"south .*" } } }, "size":0 }
Here is the response to that query:
{ "took": 2764, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 453526, "max_score": 0, "hits": [] }, "aggregations": { "description_suggestions": { "doc_count_error_upper_bound": 10, "sum_other_doc_count": 2363, "buckets": [ { "key": "south korea", "doc_count": 274 }, { "key": "south india", "doc_count": 179 }, { "key": "south carolina", "doc_count": 179 } ] } } }
As you can see, the query took 2764 ms to complete. How can I speed it up?
I was thinking of running the aggregation only on the last 2000 docs, using a filter, to speed it up. However, I noticed that Elasticsearch seems to ignore the filter and runs the aggregation on all docs. Here is the query:
{ "explain":true, "aggs":{ "recent_suggestions":{ "filter":{ "range":{ "docs_date":{ "gte":1453886958 } } }, "aggs":{ "description_suggestions":{ "terms":{ "field":"docs_title.suggestions", "size":10, "include":"south .*" } } } } }, "size":0 }
And here is the response:
{ "took": 2216, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 453526, "max_score": 0, "hits": [] }, "aggregations": { "recent_suggestions": { "doc_count": 27240, "description_suggestions": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 173, "buckets": [ { "key": "south korea", "doc_count": 19 }, { "key": "south india", "doc_count": 17 }, { "key": "south carolina", "doc_count": 17 } ] } } } }
As you can see, the total hits are the same.
How can I make these two queries faster?
I'm using AWS Elasticsearch v1.5.2 with Lucene v4.10.4 on a single instance.
The problem here is that all documents are considered for the aggregation, which is expensive, and hence it takes time.
1) First query:
{ "query": { "match": { "docs_title": "south" } }, "aggs": { "unique": { "terms": { "field": "docs_title.suggestions", "size": 10, "include": "(?i)south .*", "execution_hint": "map" } } }, "size": 0 }
Here we only consider documents that have "south" in them for the aggregation. You did not specify a query, so it defaulted to a match_all query. I have also added the (?i) case-insensitive flag to include, so that it matches both "South Korea" and "south korea".
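As a quick illustration of what the include pattern accepts, here is a small sketch using Python's `re` module. This is only an analogy: Elasticsearch evaluates the pattern with its own regex engine, and the sample terms are invented. The sketch assumes the pattern has to match the whole term, hence `fullmatch`.

```python
import re

# Same pattern as in the query above; (?i) makes it case-insensitive.
include = re.compile(r"(?i)south .*")

# Invented sample terms, as they might appear in docs_title.suggestions.
terms = ["South Korea", "south india", "Korea travel", "southern cross", "south"]

# Keep only terms that the whole pattern matches.
accepted = [t for t in terms if include.fullmatch(t)]
print(accepted)
# ['South Korea', 'south india']
```

Without the (?i) flag, "South Korea" would be rejected, because the analyzer in the mapping does not lowercase the shingles.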
2) Second query:
Again, we need to narrow down the set of documents that satisfy our criteria before running the aggregation.
{ "query": { "filtered": { "query": { "match": { "docs_title": "south" } }, "filter": { "range": { "docs_date": { "gte": 1453886958 } } } } }, "aggs": { "unique": { "terms": { "field": "docs_title.suggestions", "size": 10, "include": "(?i)south .*", "execution_hint": "map" } } }, "size": 0 }
In this case, filtering for recent documents should be done inside the query, not in the aggregation.
You should see a considerable difference now. Previously the aggregation was done on all 450k docs; now the set it runs on should be much smaller.
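If you build these requests from application code, both variants can come from one helper. The following is a minimal sketch, not a definitive implementation: `build_suggest_query` and its parameters are hypothetical names, and it assumes the user's input is a plain word with no regex metacharacters. It only builds the request body as a dict; sending it to the cluster is left out.

```python
import json


def build_suggest_query(prefix, since_epoch=None, size=10):
    """Build the search body for the first query, or the second if since_epoch is set."""
    # Narrow the document set first, so the aggregation sees fewer docs.
    match = {"match": {"docs_title": prefix}}
    if since_epoch is None:
        query = match
    else:
        # ES 1.x "filtered" query: the date range goes in the query, not in the aggs.
        query = {
            "filtered": {
                "query": match,
                "filter": {"range": {"docs_date": {"gte": since_epoch}}},
            }
        }
    return {
        "query": query,
        "aggs": {
            "unique": {
                "terms": {
                    "field": "docs_title.suggestions",
                    "size": size,
                    # Assumes prefix contains no regex metacharacters.
                    "include": "(?i)" + prefix + " .*",
                    "execution_hint": "map",
                }
            }
        },
        "size": 0,
    }


print(json.dumps(build_suggest_query("south", since_epoch=1453886958)))
```

Calling it without `since_epoch` gives the body of the first query; with it, the body of the second.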
EDIT 1: This issue provides more details on why include/exclude is costly on high-cardinality fields, which docs_title.suggestions is (shingles increase the cardinality even more). @markharwood commented on the issue:
The root cause is that the IncludeExclude.acceptedGlobalOrdinals() method enumerates terms eagerly, for all terms in the index, rather than lazily, for the terms in the result set. For a high cardinality field this can take a long time.
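So basically, the aggregation is going through all the terms in the index. The solution is to use "execution_hint": "map" in the aggregation to avoid loading global ordinals. See the terms aggregation docs for more on that. There is no 100% assurance that the hint will be used, though. From the docs: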
Please note that Elasticsearch will ignore this execution hint if it is not applicable and that there is no backward compatibility guarantee on these hints.
It should only be considered when very few documents match the query, which is the case here.
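To see why this matters, here is a toy model in Python of the two strategies. It is not how Elasticsearch is implemented: `count_regex_checks` and the tiny `docs` index are invented, and the model only counts how many distinct terms the include pattern would have to be tested against in each case.

```python
import re


def count_regex_checks(docs, matching_ids, pattern):
    """Count terms tested eagerly (whole index) vs lazily (matching docs only)."""
    include = re.compile(pattern)
    # Eager, global-ordinals-like: every distinct term in the whole index.
    all_terms = {t for terms in docs.values() for t in terms}
    eager_checks = len(all_terms)
    # Lazy, map-like: only terms that occur in docs matching the query.
    seen = {t for doc_id in matching_ids for t in docs[doc_id]}
    lazy_checks = len(seen)
    # The resulting buckets depend only on the matching docs either way.
    buckets = {}
    for doc_id in matching_ids:
        for t in set(docs[doc_id]):
            if include.fullmatch(t):
                buckets[t] = buckets.get(t, 0) + 1
    return eager_checks, lazy_checks, buckets


# Invented four-document index: doc id -> shingle terms.
docs = {
    1: ["South Korea", "Korea travel"],
    2: ["south korea", "korea news"],
    3: ["North Pole", "Pole expedition"],
    4: ["East Coast", "Coast rail"],
}
eager, lazy, buckets = count_regex_checks(docs, [1, 2], r"(?i)south .*")
print(eager, lazy, buckets)
```

In this toy index the eager strategy tests 8 terms and the lazy one 4. On a real index with hundreds of thousands of shingles and only a few hundred matching docs, the gap is far larger.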
Note: This might be unrelated, but you might want to look at the completion suggester, although it only works when the string begins with the specific letters.