How to speed up this ElasticSearch query?

- June 15, 2015

I'm trying to build an auto-suggest based on document titles. If a user types 'south', the auto-suggest should suggest 'south korea', for example. I used a shingle filter to break each title into two-word phrases. Here is my mapping:

{
  "settings": {
    "analysis": {
      "filter": {
        "suggestions_shingle": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2
        }
      },
      "analyzer": {
        "suggestions": {
          "tokenizer": "standard",
          "filter": [
            "suggestions_shingle"
          ]
        }
      }
    }
  },
  "mappings": {
    "docs": {
      "properties": {
        "docs_title": {
          "type": "multi_field",
          "fields": {
            "docs_title": {
              "type": "string"
            },
            "suggestions": {
              "type": "string",
              "analyzer": "suggestions",
              "search_analyzer": "simple"
            }
          }
        }
      }
    }
  }
}

And here is my query:

{
  "explain": true,
  "aggs": {
    "description_suggestions": {
      "terms": {
        "field": "docs_title.suggestions",
        "size": 10,
        "include": "south .*"
      }
    }
  },
  "size": 0
}

Here is the response to that query:

{
  "took": 2764,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 453526,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "description_suggestions": {
      "doc_count_error_upper_bound": 10,
      "sum_other_doc_count": 2363,
      "buckets": [
        { "key": "south korea", "doc_count": 274 },
        { "key": "south india", "doc_count": 179 },
        { "key": "south carolina", "doc_count": 179 }
      ]
    }
  }
}

As you can see, the query took 2764 ms to complete. How can I speed up this query?

I was thinking of running the aggregation only on the last 2000 docs, using a filter, to speed it up. But I noticed that Elasticsearch seems to ignore the filter and runs the aggs on all docs. Here is the query:

{
  "explain": true,
  "aggs": {
    "recent_suggestions": {
      "filter": {
        "range": {
          "docs_date": {
            "gte": 1453886958
          }
        }
      },
      "aggs": {
        "description_suggestions": {
          "terms": {
            "field": "docs_title.suggestions",
            "size": 10,
            "include": "south .*"
          }
        }
      }
    }
  },
  "size": 0
}

And here is the response:

{
  "took": 2216,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 453526,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "recent_suggestions": {
      "doc_count": 27240,
      "description_suggestions": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 173,
        "buckets": [
          { "key": "south korea", "doc_count": 19 },
          { "key": "south india", "doc_count": 17 },
          { "key": "south carolina", "doc_count": 17 }
        ]
      }
    }
  }
}

As you can see, the total hits are the same.

How can I make these 2 queries faster?

I'm using AWS Elasticsearch v1.5.2 and Lucene v4.10.4 on a single instance.

The problem here is that all documents are considered for the aggregations, which is expensive and hence takes time.

1) First query:

{
  "query": {
    "match": {
      "docs_title": "south"
    }
  },
  "aggs": {
    "unique": {
      "terms": {
        "field": "docs_title.suggestions",
        "size": 10,
        "include": "(?i)south .*",
        "execution_hint": "map"
      }
    }
  },
  "size": 0
}

Here we only consider documents that have south in them for the aggregations. You did not specify a query, so it defaulted to a match_all query. I have also added the (?i) case-insensitive flag in include so that it matches both South Korea and south korea.
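As a rough illustration of why the case-insensitive flag can matter, here is a small Python sketch that approximates what the suggestions analyzer emits for a title. It is not Elasticsearch code: it assumes the shingle filter's default of also emitting single words, uses a simple regex split as a stand-in for the standard tokenizer, and treats the include pattern as a whole-term match. The function names are made up for this example.

```python
import re

def shingle_tokens(title, size=2, output_unigrams=True):
    """Approximate the 'suggestions' analyzer: words plus 2-word shingles."""
    # Rough stand-in for the standard tokenizer: split on non-word characters.
    # No lowercasing happens, because the analyzer has no lowercase filter.
    words = re.findall(r"\w+", title)
    tokens = list(words) if output_unigrams else []
    # Each shingle joins `size` adjacent words with a single space.
    for i in range(len(words) - size + 1):
        tokens.append(" ".join(words[i:i + size]))
    return tokens

def matches_include(term, pattern):
    # Assumption: the include pattern has to match the whole term.
    return re.fullmatch(pattern, term) is not None

tokens = shingle_tokens("South Korea economy")
print(tokens)
# ['South', 'Korea', 'economy', 'South Korea', 'Korea economy']
print([t for t in tokens if matches_include(t, r"south .*")])      # []
print([t for t in tokens if matches_include(t, r"(?i)south .*")])  # ['South Korea']
```

If the titles are already indexed in lower case, both patterns behave the same and the flag is harmless.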

2) Second query:

Again, we need to narrow down the set of documents that satisfy our criteria before the aggregation runs.

{
  "query": {
    "filtered": {
      "query": {
        "match": {
          "docs_title": "south"
        }
      },
      "filter": {
        "range": {
          "docs_date": {
            "gte": 1453886958
          }
        }
      }
    }
  },
  "aggs": {
    "unique": {
      "terms": {
        "field": "docs_title.suggestions",
        "size": 10,
        "include": "(?i)south .*",
        "execution_hint": "map"
      }
    }
  },
  "size": 0
}

In this case, filtering for recent documents should be done inside the query, not in the aggregation.

You should see a considerable difference now. Previously the aggregation was done on all ~450k docs; now the set should be much smaller.
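Since an auto-suggest has to run this for whatever the user typed, the two request bodies above can be generated from one helper. The following is only a sketch in Python: the function name and parameters are hypothetical, and it assumes that Python's `re.escape` output is acceptable to the regex engine Elasticsearch uses for `include`.

```python
import json
import re

def build_suggest_query(prefix, since_epoch=None, size=10):
    """Build a request body like the two queries above for any prefix."""
    # Escape regex metacharacters so the user's input is matched literally.
    include = "(?i)" + re.escape(prefix) + " .*"
    query = {"match": {"docs_title": prefix}}
    if since_epoch is not None:
        # ES 1.x "filtered" query: the range filter narrows the documents
        # before the aggregation sees them.
        query = {
            "filtered": {
                "query": query,
                "filter": {"range": {"docs_date": {"gte": since_epoch}}},
            }
        }
    return {
        "query": query,
        "aggs": {
            "unique": {
                "terms": {
                    "field": "docs_title.suggestions",
                    "size": size,
                    "include": include,
                    "execution_hint": "map",
                }
            }
        },
        "size": 0,
    }

# Prints the same body as the second query above.
print(json.dumps(build_suggest_query("south", since_epoch=1453886958), indent=2))
```

Calling `build_suggest_query("south")` without a timestamp gives the first query.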

Edit 1: This issue provides more details on why include/exclude is costly on high-cardinality fields such as docs_title.suggestions (shingles increase the cardinality even more). @markharwood commented on the issue:

The root cause is that the IncludeExclude.acceptedGlobalOrdinals() method enumerates terms eagerly for all terms in the index rather than lazily for those in the result set. For a high-cardinality field this can take a long time.

So basically, the aggs were going through all the terms in the index. The solution is to use "execution_hint": "map" in the aggregation to avoid loading global ordinals; the terms aggregation documentation has more on that. There is no 100% assurance, though. From the docs:

Please note that Elasticsearch will ignore this execution hint if it is not applicable and that there is no backward compatibility guarantee on these hints.

The map hint should only be considered when very few documents match the query, which is the case here.
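To make the eager versus lazy difference concrete, here is a toy Python model. It is not how Elasticsearch is implemented; it only counts how many terms the include pattern has to be tested against when every term in a made-up index is enumerated, compared with only the terms of the documents that matched the query.

```python
import re

def count_pattern_checks(terms, pattern):
    """Return (accepted terms, number of regex evaluations)."""
    regex = re.compile(pattern)
    accepted = {t for t in terms if regex.fullmatch(t)}
    return accepted, len(terms)

# Hypothetical index: 3 documents, each with its own shingle terms.
docs = {
    1: {"south korea", "korea economy"},
    2: {"north pole", "pole expedition"},
    3: {"west coast", "coast guard"},
}
matching_doc_ids = [1]  # documents selected by the query

# Eager: every distinct term in the index is checked up front.
all_terms = set().union(*docs.values())
eager_accepted, eager_checks = count_pattern_checks(all_terms, r"south .*")

# Lazy (map-like): only terms of the matching documents are checked.
seen_terms = set().union(*(docs[i] for i in matching_doc_ids))
lazy_accepted, lazy_checks = count_pattern_checks(seen_terms, r"south .*")

print(eager_checks, lazy_checks)        # 6 2
print(eager_accepted == lazy_accepted)  # True
```

With hundreds of thousands of titles and their shingles, the gap between the two counts grows accordingly, which is why narrowing the query first matters.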

Note: This might be unrelated, but you might want to look at the completion suggester, although it only works when the string begins with the specific letters typed.
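For reference, a minimal sketch of what the completion suggester could look like here, written as Python dicts that mirror the ES 1.x request bodies. The field name `title_suggest` and the suggestion name `title-suggest` are hypothetical; nothing like them exists in the mapping above, and each document would have to populate the new field at index time.

```python
import json

# Hypothetical field name; the original mapping has no such field.
SUGGEST_FIELD = "title_suggest"

def completion_mapping():
    # A "completion" field added to the docs type, next to docs_title.
    return {"docs": {"properties": {SUGGEST_FIELD: {"type": "completion"}}}}

def document_source(title, phrases):
    # Each document lists the phrases it should be suggested for.
    return {"docs_title": title, SUGGEST_FIELD: {"input": phrases}}

def completion_request(prefix, size=10):
    # Body for the _suggest endpoint: return entries starting with `prefix`.
    return {
        "title-suggest": {
            "text": prefix,
            "completion": {"field": SUGGEST_FIELD, "size": size},
        }
    }

print(json.dumps(completion_mapping(), indent=2))
print(json.dumps(document_source("South Korea economy", ["South Korea"]), indent=2))
print(json.dumps(completion_request("south"), indent=2))
```

Because matching is by prefix of each input, typing 'korea' would not suggest 'South Korea' unless that phrase is also indexed with an input that starts with 'korea'.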
