I have a mapping like this:

"properties": {
    "id": {"type": "long", "index": "not_analyzed"},
    "name": {"type": "string", "index": "not_analyzed"},
    "skills": {"type": "string", "index": "not_analyzed"}
}

I want to store students' profiles in Elasticsearch using the given mapping. skills is a list of computer skills they specified on their profiles (python, javascript, ...).

Given a skill set like ['html', 'css', 'sass', 'javascript', 'django', 'bootstrap', 'angularjs', 'backbone'], I want to find all profiles that have at least 3 of the skills in this set. I don't need to know which skills they have in common with the desired list, only how many. Is there a way to do this in Elasticsearch?

AliBZ
  • If I have time later I'll dig into this more, but I would check out http://www.elastic.co/guide/en/elasticsearch/reference/1.x/query-dsl-function-score-query.html. The idea is that you could give higher scores to documents with more matches and then cut off at a given threshold (say, a score that maps to 3 matches). – Andrew White Apr 07 '15 at 20:30

2 Answers

There might be a better way I'm not thinking of, but you can do it with a script filter.

I set up a simplified version of your index, with a few docs:

PUT /test_index
{
   "settings": {
      "number_of_shards": 1
   },
   "mappings": {
      "doc": {
         "properties": {
            "skills": {
               "type": "string",
               "index": "not_analyzed"
            }
         }
      }
   }
}

POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"skills":["html","css","javascript"]}
{"index":{"_id":2}}
{"skills":["bootstrap", "angularjs", "backbone"]}
{"index":{"_id":3}}
{"skills":["python", "javascript", "ruby","java"]}

Then ran this query:

POST /test_index/_search
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "script": {
               "script": "count=0; for(s: doc['skills'].values){ for(x: skills){ if(s == x){ count +=1 } } } count >= 3",
               "params": {
                  "skills": ["html", "css", "sass", "javascript", "django", "bootstrap", "angularjs", "backbone"]
               }
            }
         }
      }
   }
}

and got back what I expected:

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 1,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "1",
            "_score": 1,
            "_source": {
               "skills": [
                  "html",
                  "css",
                  "javascript"
               ]
            }
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 1,
            "_source": {
               "skills": [
                  "bootstrap",
                  "angularjs",
                  "backbone"
               ]
            }
         }
      ]
   }
}
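
To sanity-check the counting logic outside Elasticsearch, here is a minimal Python sketch that mirrors what the script filter computes on the three sample docs above. The names `count_matches` and `matching_ids` are made up for illustration, and it uses a set lookup instead of the nested loop, which gives the same count as long as a document does not list the same skill twice.

```python
def count_matches(doc_skills, wanted):
    """Count how many of a document's skills appear in the wanted list."""
    wanted_set = set(wanted)
    return sum(1 for skill in doc_skills if skill in wanted_set)


def matching_ids(docs, wanted, minimum=3):
    """Return ids of docs that share at least `minimum` skills with `wanted`."""
    return [
        doc_id
        for doc_id, skills in docs.items()
        if count_matches(skills, wanted) >= minimum
    ]


# Same skill set and sample documents as in the index above.
WANTED = ["html", "css", "sass", "javascript", "django",
          "bootstrap", "angularjs", "backbone"]
DOCS = {
    1: ["html", "css", "javascript"],
    2: ["bootstrap", "angularjs", "backbone"],
    3: ["python", "javascript", "ruby", "java"],
}

if __name__ == "__main__":
    print(matching_ids(DOCS, WANTED))  # [1, 2]
```

Doc 3 only shares javascript with the list, so it falls below the threshold, which matches the two hits returned by the query.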

Here's the code all together:

http://sense.qbox.io/gist/1018a01f1df29cb793ea15661f22bc8b25ed3476

Sloan Ahrens
  • Looks promising. I'm gonna wait to see if there is an answer without raw scripting. If not, I will be more than happy to accept your answer. Thanks. – AliBZ Apr 07 '15 at 20:04

You could use a query_string query with the minimum_should_match option.

Example:

POST <index>/_search 
{
        "query": {
            "filtered": {
               "filter": {
                   "query": { 
                        "query_string": {
                            "default_field": "skills",
                            "query": "html css sass javascript django bootstrap angularjs backbone \"ruby on rails\" ",
                            "minimum_should_match" : "3"
                        }
                   }
               }
            }
        }  
}
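
If the skill list comes from application code, the request body can be generated rather than written by hand. The following is a rough Python sketch, not part of any client library: `build_skills_query` is a hypothetical helper, and it assumes skills contain no double quotes or other query_string reserved characters (those would need escaping). Multi-word skills are wrapped in quotes so they are treated as phrases, as in the example above.

```python
import json


def build_skills_query(skills, minimum=3, field="skills"):
    """Build a filtered query_string body requiring `minimum` matching skills."""
    # Quote any skill containing whitespace so it is searched as one phrase.
    terms = ['"%s"' % s if " " in s else s for s in skills]
    return {
        "query": {
            "filtered": {
                "filter": {
                    "query": {
                        "query_string": {
                            "default_field": field,
                            "query": " ".join(terms),
                            "minimum_should_match": str(minimum),
                        }
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    body = build_skills_query(["html", "css", "ruby on rails"], minimum=2)
    # This JSON can be sent as the body of POST <index>/_search.
    print(json.dumps(body, indent=2))
```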
keety
  • What about skills with more than one word like `ruby on rails`? – AliBZ Apr 08 '15 at 18:17
  • @AliBZ Good point. I have updated the example query to cover this use case. Essentially, you can use the phrase-match feature of query_string described here: http://stackoverflow.com/questions/24550103/how-to-do-multiple-match-or-match-phrase-values-in-elasticsearch/24559886#24559886 – keety Apr 08 '15 at 19:23