Elasticsearch 2.4.1 and Kuromoji plugin: specifying a field in the search query

Question

I started using Elasticsearch (version 2.4.1) in my project two weeks ago, and I have a problem when I specify a field in the query string.
I want to use the Kuromoji plugin and an n-gram tokenizer to search Japanese data.

Step 1: if I don't specify a field (for example, "Content") in my query, I get 2 records in the result.

{
    "query" : {
        "bool" : {
            "must": {
                "query_string": {
                    "query":"Software"
                 /* , "fields": ["Content"]  <-- this field is not specified here */
                }
            }
        }
    }
}

Step 2: when I add the "Content" field to the query above, the result is empty. (In my project, I want to search on the "Content" field.)
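
For reference, the query with the field specified would presumably look like the following. This is a sketch reconstructed from the commented-out line in the first query, not a request copied from the original post:

```json
{
    "query": {
        "bool": {
            "must": {
                "query_string": {
                    "query": "Software",
                    "fields": ["Content"]
                }
            }
        }
    }
}
```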

I also use the "highlight" attribute in step 1, but the result doesn't contain a "highlight" block:

{...
    "highlight": {
        "pre_tags" : ["<tag1>"],
        "post_tags" : ["</tag1>"],
        "fields" : {
            "*" : {} /* or use "_all" */
        }
    }
}
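
One possible explanation for the missing "highlight" block, which is not confirmed anywhere in this thread: since Elasticsearch 2.0, `require_field_match` defaults to `true`, so a query that runs against `_all` (no field specified) does not produce highlights on other fields. A sketch of the step 1 request with that option turned off, assuming the rest of the request is unchanged:

```json
{
    "query": {
        "query_string": {
            "query": "Software"
        }
    },
    "highlight": {
        "pre_tags": ["<tag1>"],
        "post_tags": ["</tag1>"],
        "require_field_match": false,
        "fields": {
            "*": {}
        }
    }
}
```

Whether this actually yields highlighted fragments may still depend on how each field is mapped and analyzed.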

My question: in step 2 above, which field name should be specified in the query string? product.Content, or something else?

If I don't use the Kuromoji plugin, the query in step 2 returns 2 records, so I think the Kuromoji plugin is related to the problem. Can anybody help me with this?

Here are my mapping and my analysis config (YAML):

{...
    "mappings": {
        "product" : {
            "properties" : {
                "Content" : {
                    "index": "not_analyzed",
                    "search_analyzer": "ja",
                    "analyzer": "ja",
                    "type": "string",
                    "store": true
                } ...
            }
        }
    }
}

index :
 analysis :
  analyzer :
   ja :
    type : custom
    tokenizer : ja_tokenizer
    char_filter : [
     html_strip,
     kuromoji_iteration_mark
    ]
    filter : [
     lowercase,
     cjk_width,
     katakana_stemmer,
     kuromoji_part_of_speech
     ]
   ja_ngram :
    type : custom
    tokenizer : ngram_ja_tokenizer
    char_filter : [html_strip]
    filter : [
     cjk_width,
     lowercase
     ]
  tokenizer :
   ja_tokenizer :
    type : kuromoji_tokenizer
    mode : search
    user_dictionary : userdict_ja.txt
   ngram_ja_tokenizer :
    type : nGram
    min_gram : 2
    max_gram : 3
    token_chars : [letter, digit]
  filter :
   katakana_stemmer :
    type : kuromoji_stemmer
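
To see which tokens the `ja` analyzer actually produces, the analyze API can be called on the index. A minimal sketch, assuming the body below is sent to `GET /my_index/_analyze`, where `my_index` is a hypothetical index name that does not appear in the original post:

```json
{
    "analyzer": "ja",
    "text": "Software"
}
```

Replacing `"analyzer": "ja"` with `"field": "Content"` should show the tokens produced under that field's mapping instead, which makes it easier to compare what is indexed with what is searched.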

score 0 · Answer 1 · answered Nov 01 '16 at 10:04


I found the problem in my mapping. I had set "Content": {"index": "not_analyzed"}, so the field value was indexed as a single exact term without analysis, and a search on the "Content" field could not match individual words. I changed it to {"index": "analyzed"} and that fixed the problem.
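
The corrected field mapping then looks roughly like this. It is a sketch based on the mapping in the question: only the "index" value is changed, and the other properties elided with "..." in the original are left out:

```json
{
    "mappings": {
        "product": {
            "properties": {
                "Content": {
                    "type": "string",
                    "index": "analyzed",
                    "analyzer": "ja",
                    "search_analyzer": "ja",
                    "store": true
                }
            }
        }
    }
}
```

Note that the "index" option of an existing field cannot be changed in place, so the index has to be recreated with the new mapping and the documents indexed again.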


Tinh Huynh
