When I run a simple search query on an email address, it returns nothing unless I remove everything from the "@" onward. Why?

I want to run fuzzy and autocomplete queries on the email addresses.

ELASTICSEARCH INFO:

{
  "name" : "ZZZ",
  "cluster_name" : "YYY",
  "cluster_uuid" : "XXX",
  "version" : {
    "number" : "6.5.2",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "WWW",
    "build_date" : "2018-11-29T23:58:20.891072Z",
    "build_snapshot" : false,
    "lucene_version" : "7.5.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

MAPPING :

PUT users
{
  "mappings":
  {
    "_doc": { "properties": { "mail": { "type": "text" } } }
  }
}

ALL DATA :

[
    { "mail": "firstname.lastname@company.com" },
    { "mail": "john.doe@company.com" }
]

WORKING QUERY :

This term query works, but the value stored in mail is "firstname.lastname@company.com", not "firstname.lastname"...

QUERY :
GET users/_search
{ "query": { "term": { "mail": "firstname.lastname" } }}

RETURN :
{
  "took": 7,
  "timed_out": false,
  "_shards": { "total": 6, "successful": 6, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 4.336203,
    "hits": [
      {
        "_index": "users",
        "_type": "_doc",
        "_id": "H1dQ4WgBypYasGfnnXXI",
        "_score": 4.336203,
        "_source": {
          "mail": "firstname.lastname@company.com"
        }
      }
    ]
  }
}

FAILING QUERY :

QUERY :
GET users/_search
{ "query": { "term": { "mail": "firstname.lastname@company.com" } }}

RETURN :
{
  "took": 0,
  "timed_out": false,
  "_shards": { "total": 6, "successful": 6, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

SOLUTION :

Define a custom analyzer based on the uax_url_email tokenizer and use it for the mail field (reindex after changing the mapping).

PUT users
{
  "settings":
  {
    "index": { "analysis": { "analyzer": { "mail": { "tokenizer":"uax_url_email" } } } }
  },
  "mappings":
  {
    "_doc": { "properties": { "mail": { "type": "text", "analyzer":"mail" } } }
  }
}
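
One way to check the new mapping, as a sketch that assumes the users index was recreated with the settings above, is to run the custom mail analyzer through the _analyze API. It should return the whole address as a single token, which is why a term query on the full address can then match. Since this analyzer has no lowercase filter, the comparison is presumably case-sensitive.

```json
# Assumes the users index was recreated with the "mail" analyzer defined above
GET users/_analyze
{
  "analyzer": "mail",
  "text": "firstname.lastname@company.com"
}
```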
Liberateur

1 Answer

If you don't configure an analyzer for your indexed text field, Elasticsearch uses the standard analyzer, whose standard tokenizer splits on the @ symbol (I don't have a source for this, but the _analyze output below shows it).

If you use a term query rather than a match query, the exact term you pass is looked up in the inverted index without being analyzed (see "elasticsearch match vs term query").

The tokens stored in your inverted index for that address look like this:

GET users/_analyze
{
  "text": "firstname.lastname@company.com"
}

{
  "tokens": [
    {
      "token": "firstname.lastname",
      "start_offset": 0,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "company.com",
      "start_offset": 19,
      "end_offset": 30,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
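
Because a term query looks its value up verbatim among these tokens, the second token should be searchable on its own as well. A quick check, assuming the original mapping and the two sample documents from the question:

```json
# Both sample addresses end in @company.com, so both documents should match
GET users/_search
{ "query": { "term": { "mail": "company.com" } } }
```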

To resolve this, you could specify your own analyzer for the mail field, or you could use the match query, which analyzes your search text the same way it analyzes the indexed text.

GET users/_search
{
  "query": {
    "match": {
      "mail": "firstname.lastname@company.com"
    }
  }
}
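
If the address has to match exactly, another option is a keyword sub-field, which indexes the whole value as one unanalyzed term. This is only a sketch: the sub-field name raw is a hypothetical choice, and the index has to be recreated (or the data reindexed) for the mapping to take effect.

```json
# "raw" is a hypothetical sub-field name; the 6.x "_doc" type follows the question's mapping
PUT users
{
  "mappings": {
    "_doc": {
      "properties": {
        "mail": {
          "type": "text",
          "fields": { "raw": { "type": "keyword" } }
        }
      }
    }
  }
}

# Exact lookup against the unanalyzed value
GET users/_search
{ "query": { "term": { "mail.raw": "firstname.lastname@company.com" } } }
```

The text field stays available for match queries, while mail.raw serves exact lookups.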
Kosi
  • I cannot use match because the mail must match exactly, but I will look for more information about tokenizers. Thank you very much. – Liberateur Feb 21 '19 at 07:16
  • You've probably figured it out already, but the uax_url_email tokenizer might be what you want: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-uaxurlemail-tokenizer.html – Kosi Feb 21 '19 at 15:06
  • Yes it is, I will add the solution to my post, thank you again – Liberateur Feb 21 '19 at 15:53