MongoDB vs MySQL Performance - Simple Query

Question

I am comparing MongoDB with MySQL and have imported the MySQL data into a MongoDB collection (>500000 records). The collection looks like this:

{
    "_id" : ObjectId(""),
    "idSequence" : ,
    "TestNumber" : ,
    "TestName" : "",
    "S1" : ,
    "S2" : ,
    "Slottxt" : "",
    "DUT" : ,
    "DUTtxt" : "",
    "DUTver" : "",
    "Voltage" : ,
    "Temperature" : ,
    "Rate" : ,
    "ParamX" : "",
    "ParamY" : "",
    "Result" : ,
    "TimeStart" : new Date(""),
    "TimeStop" : new Date(""),
    "Operator" : "",
    "ErrorNumber" : ,
    "ErrorText" : "",
    "Comments" : "",
    "Pos" : ,
    "SVNURL" : "",
    "SVNRev" : ,
    "Valid" : 
}

When comparing the queries (which both return 15 records):

mysql -> SELECT TestNumber FROM db WHERE Valid=0 AND DUT=68 GROUP BY TestNumber

with

mongodb -> db.results.distinct("TestNumber", {Valid:0, DUT:68}).sort()

The results are equivalent, but MongoDB takes in the region of 17 seconds, compared with 0.03 seconds for MySQL.

I appreciate that it is difficult to compare the two database architectures, and I further appreciate that one of the skills of a MongoDB admin is to organise the data structure accordingly (so it is not a fair test to just import the MySQL structure). Ref: MySQL vs MongoDB 1000 reads

But the difference in response time is too great to be a tuning issue. My (default) MongoDB log file reads:

Wed Mar 05 04:56:36.415 [conn4089] command NTV_Results.$cmd command: { distinct: "results", key: "TestNumber", query: { Valid: 0.0, DUT: 68.0 } } ntoreturn:1 keyUpdates:0 numYields: 6 locks(micros) r:21764672 reslen:250 16525ms

I have also tried the query:

db.results.group( {
               key: { "TestNumber": 1 },
               cond: {"Valid": 0, "DUT": 68 },
               reduce: function ( curr, result ) { },
               initial: { }
            } )

This gives similar results (17 seconds). Any clues as to what I am doing wrong? Both services are running on the same i7 3770 desktop PC (four cores, eight threads) with Windows 7 and 16 GB RAM.

Neither of those functions is a good example of the type of operation you are trying to perform. They are older implementations that are considered to be superseded by the [aggregation pipeline](http://docs.mongodb.org/manual/core/aggregation-pipeline/). Read up on that for further information. — Neil Lunn, Mar 05 '14 at 04:55
FIXED: I needed to read up on indexes in MongoDB, especially when importing an existing MySQL DB, as they may not be created by default. — TRx Studio, Mar 05 '14 at 11:20

Neil Lunn · Accepted Answer · 2014-03-07T12:24:32.903

There can be many reasons for slow performance, many of which need too much detail to go into here. But I can offer you a "starter pack", as it were.

Creating an index on your Valid and DUT fields is going to improve results for this and other queries. Consider this compound form for this case, using the ensureIndex command:

db.collection.ensureIndex({ "Valid": 1, "DUT": 1})
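
To see why the compound index matters, here is a minimal sketch in plain JavaScript. It is a toy in-memory model only: the sample documents and the function names (`scan`, `buildIndex`, `lookup`) are made up for illustration, and MongoDB's real indexes are B-trees rather than a `Map`. The point it shows is that an unindexed query inspects every document, while an indexed one touches only the matching entries.

```javascript
// Hypothetical sample documents using the field names from the question.
const docs = [
  { TestNumber: 3, Valid: 0, DUT: 68 },
  { TestNumber: 1, Valid: 0, DUT: 68 },
  { TestNumber: 3, Valid: 0, DUT: 68 },
  { TestNumber: 7, Valid: 1, DUT: 68 },
  { TestNumber: 9, Valid: 0, DUT: 12 },
];

// Without an index: every document is examined on every query.
function scan(collection, valid, dut) {
  let examined = 0;
  const hits = [];
  for (const doc of collection) {
    examined += 1;
    if (doc.Valid === valid && doc.DUT === dut) hits.push(doc);
  }
  return { hits, examined };
}

// Toy stand-in for a compound index on (Valid, DUT):
// documents are bucketed once, up front, by the pair of values.
function buildIndex(collection) {
  const index = new Map();
  for (const doc of collection) {
    const key = `${doc.Valid}|${doc.DUT}`;
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc);
  }
  return index;
}

// With the index: only the matching bucket is touched.
function lookup(index, valid, dut) {
  const hits = index.get(`${valid}|${dut}`) || [];
  return { hits, examined: hits.length };
}

const index = buildIndex(docs);
const scanned = scan(docs, 0, 68);
const indexed = lookup(index, 0, 68);
console.log(scanned.examined, indexed.examined); // 5 3
```

With 500000 documents and only a handful of matches, that gap between "examined" counts is the difference you are seeing. Note that in MongoDB 3.0 and later the same shell call is spelled `createIndex`; `ensureIndex` was deprecated.
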

Also the use of aggregate is recommended for these types of operations:

db.collection.aggregate([
    {$match: { "Valid": 0, "DUT": 68 }},
    {$group: { _id: "$TestNumber" }}
])

This should be the equivalent of the SQL you are referring to.
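
If it helps to see what the two stages compute, here is a rough sketch in plain JavaScript over an in-memory array. The sample data and the function name `distinctTestNumbers` are hypothetical, and this is not how the server executes a pipeline; it only mirrors the logic of `$match` followed by `$group`.

```javascript
// Hypothetical sample documents using the field names from the question.
const sample = [
  { TestNumber: 3, Valid: 0, DUT: 68 },
  { TestNumber: 10, Valid: 0, DUT: 68 },
  { TestNumber: 1, Valid: 0, DUT: 68 },
  { TestNumber: 3, Valid: 0, DUT: 68 },
  { TestNumber: 7, Valid: 1, DUT: 68 },
  { TestNumber: 9, Valid: 0, DUT: 12 },
];

function distinctTestNumbers(collection, valid, dut) {
  // Mirrors {$match: { "Valid": valid, "DUT": dut }}: keep only matching documents.
  const matched = collection.filter(
    (doc) => doc.Valid === valid && doc.DUT === dut
  );

  // Mirrors {$group: { _id: "$TestNumber" }}: one output document per distinct key.
  const groups = new Map();
  for (const doc of matched) {
    if (!groups.has(doc.TestNumber)) {
      groups.set(doc.TestNumber, { _id: doc.TestNumber });
    }
  }

  // $group promises no particular order, so sort explicitly and numerically.
  // (A bare Array.prototype.sort() compares as strings and would put 10 before 3.)
  return [...groups.values()].sort((a, b) => a._id - b._id);
}

console.log(distinctTestNumbers(sample, 0, 68));
// [ { _id: 1 }, { _id: 3 }, { _id: 10 } ]
```

In the real pipeline, the ordering step would be a `{$sort: { _id: 1 }}` stage appended after the `$group`.
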

There is a SQL to Aggregation Mapping Chart that may give you some assistance with the thinking. It is also worth familiarizing yourself with the different aggregation operators in order to write effective queries.

I have spent many years writing very complex SQL for advanced tasks. And I find the aggregation framework a breath of fresh air for various problem solving cases.

Worth your time to learn.

Also worth noting: your "default" MongoDB log file is reporting those operations because they are considered to be "slow queries" (by default, anything that takes longer than 100 ms) and so are brought to your attention. You can see more or less information, as you require, by tuning the database profiler to meet your needs.

Thanks for the tips regarding aggregation. The actual fix was creating indexes. When I imported the MySQL DB, no indexes were created. I strongly suspect I am not the first (or the last) to be unaware of this. So I thank you for your help and swift response. — TRx Studio, Mar 05 '14 at 11:09
I'm not so sure about the aggregate {$match, $group} command though (specifically for my task), as the db.collection.distinct() I used is now the fastest, and it is a type of aggregation, I read :) But you have definitely given me a vector to join, so once again thanks for that. — TRx Studio, Mar 05 '14 at 11:17
@TRxStudio Well, the answer did say to create an index, and specifically one that would be used by your query. And you should **not** be using the group or distinct methods that you were showing. They are more or less considered deprecated. The aggregation parsing is all in native C++ and will run rings around the other methods. Hence the detail. — Neil Lunn, Mar 05 '14 at 11:19
