Find concatenated words in Elasticsearch
Say I have indexed data like
song:{ title:"laser game" }
but the user is searching for
lasergame
How do I go about mapping/indexing/querying this?
This is kind of a tricky problem.
1) I guess the most effective way might be to use a compound token filter, with a word list made of the words you think the user might concatenate.
"settings": {
  "analysis": {
    "analyzer": {
      "concatenate_split": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["lowercase", "myfilter"]
      }
    },
    "filter": {
      "myfilter": {
        "type": "dictionary_decompounder",
        "word_list": ["laser", "game", "lean", "on", "die", "hard"]
      }
    }
  }
}
After applying this analyzer, lasergame is split into laser and game, alongside the original lasergame token, so the search will give you results that contain any of those words.
2) Another approach is concatenating the whole title with a pattern replace char filter, replacing all spaces.
{
  "index": {
    "analysis": {
      "char_filter": {
        "my_pattern": {
          "type": "pattern_replace",
          "pattern": "\\s+",
          "replacement": ""
        }
      },
      "analyzer": {
        "custom_with_char_filter": {
          "tokenizer": "standard",
          "char_filter": ["my_pattern"]
        }
      }
    }
  }
}
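You need to use multi fields with this approach, so that the title is indexed both normally and in its concatenated form. With this pattern, laser game will be indexed as lasergame, and your query will work. The problem here is that laser game play will be indexed as lasergameplay, and a search for lasergame won't return it. You might want to consider using a prefix query or a wildcard query for this.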
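Here is a small Python sketch of that second approach. The request bodies are plain dicts you could send with any HTTP client. The sub-field name concatenated is hypothetical, the mapping layout assumes a recent Elasticsearch version with the text field type, and concatenated_token only imitates the char filter locally so you can see why the exact lookup misses and the prefix lookup hits.

```python
import json
import re


def concatenated_token(title):
    """Imitate the pattern_replace char filter: drop every run of whitespace."""
    return re.sub(r"\s+", "", title)


# Hypothetical mapping: "title" keeps its normal analysis, while the
# "title.concatenated" sub-field uses the custom_with_char_filter analyzer.
mapping = {
    "properties": {
        "title": {
            "type": "text",
            "fields": {
                "concatenated": {
                    "type": "text",
                    "analyzer": "custom_with_char_filter",
                }
            },
        }
    }
}

# Prefix query against the concatenated sub-field. Prefix is a term-level
# query, so the value should already look like the indexed token.
prefix_query = {
    "query": {"prefix": {"title.concatenated": {"value": "lasergame"}}}
}

indexed = concatenated_token("laser game play")
print(indexed)                          # lasergameplay
print(indexed == "lasergame")           # False: an exact term lookup misses
print(indexed.startswith("lasergame"))  # True: a prefix lookup hits
print(json.dumps(prefix_query))
```

A wildcard query would also cover titles where the concatenated words are not at the start, but it is usually the more expensive of the two.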
3) This might not make sense, but you could also use a synonym filter, if you think users are concatenating words.
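If you go the synonym route, the rules could be generated from the titles you already have. This is just a sketch: build_synonym_rules, concatenation_synonyms and concatenation_search are made-up names, and the rules use the "a => b" explicit mapping form of the synonym filter.

```python
def build_synonym_rules(titles):
    """Build 'lasergame => laser game' style rules from multi-word titles."""
    rules = []
    for title in titles:
        words = title.lower().split()
        if len(words) > 1:  # single words have nothing to concatenate
            rules.append("".join(words) + " => " + " ".join(words))
    return rules


rules = build_synonym_rules(["laser game", "die hard", "lean on"])

# Hypothetical analysis settings that plug the generated rules into a
# synonym token filter.
synonym_settings = {
    "analysis": {
        "filter": {
            "concatenation_synonyms": {"type": "synonym", "synonyms": rules}
        },
        "analyzer": {
            "concatenation_search": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "concatenation_synonyms"],
            }
        },
    }
}

print(rules)
```

The obvious downside is that the rule list has to be kept in sync with the titles, which is why this only seems worth it for a small, known set of words.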
Hope this helps!