16

I have the following select statement (using sqlite3 and the pysqlite module):

self.cursor.execute("SELECT precursor_id FROM MSMS_precursor "+
  "JOIN spectrum ON spectrum_id = spectrum_spectrum_id "+
  "WHERE spectrum_id = spectrum_spectrum_id "+
  "AND ROUND(ion_mz,9) = ? AND ROUND(scan_start_time,4) = ? "+
  "AND msrun_msrun_id = ?", select_inputValues)

Which takes 55 seconds when running in Python. When running it directly on the SQLite command line it only takes 15ms. Now, I noticed that when it's in this step the Python program goes into uninterrupted sleep (31283 ndeklein 18 0 126m 24m 3192 D 1.0 0.0 2:02.50 python, The D in top output) and it goes down from 100% CPU to around 1% CPU. Now that I noticed it during this query, I also looked at the top output when running the query I asked about here. During this time top also shows it going into uninterrupted sleep, although it switches back and forth between R and D and only slows down to around 50% (it fluctuates depending on if it's in D or R status).

So now I think that this is what is slowing my querying down (please correct me if uninterrupted sleep has nothing to do with programs speed). If this is true, how can I make sure a program does not go into this status?


Update 1:

The EXPLAIN QUERY PLAN using Python returned:

(0, 0, 1, u'SCAN TABLE spectrum (~50000 rows)')

The EXPLAIN QUERY PLAN using sqlite's command line returned:

0|0|1|SCAN TABLE spectrum (~50000 rows)
0|1|0|SEARCH TABLE MSMS_precursor USING INDEX fk_MSMS_precursor_spectrum_spectrum_id_1 (spectrum_spectrum_id=?) (~2 rows)

The EXPLAIN using Python returned:

(0, u'Trace', 0, 0, 0, u'', u'00', None)

The EXPLAIN using sqlite returned:

0|Trace|0|0|0||00|
1|Real|0|1|0|438.718658447|00|
2|Real|0|2|0|692.6345000000001|00|
3|Integer|1|3|0||00|
4|Goto|0|39|0||00|
5|OpenRead|1|33|0|13|00|
6|OpenRead|0|39|0|5|00|
7|OpenRead|2|41|0|keyinfo(1,BINARY)|00|
8|Rewind|1|35|0||00|
9|Column|1|8|5||00|
10|RealAffinity|5|0|0||00|
11|Integer|4|6|0||00|
12|Function|2|5|4|round(2)|02|
13|Ne|2|34|4||6a|
14|Column|1|12|4||00|
15|Ne|3|34|4|collseq(BINARY)|6c|
16|Column|1|0|8||00|
17|IsNull|8|34|0||00|
18|Affinity|8|1|0|d|00|
19|SeekGe|2|34|8|1|00|
20|IdxGE|2|34|8|1|01|
21|IdxRowid|2|7|0||00|
22|Seek|0|7|0||00|
23|Column|1|0|9||00|
24|Column|2|0|10||00|
25|Ne|10|33|9|collseq(BINARY)|6b|
26|Column|0|1|5||00|
27|RealAffinity|5|0|0||00|
28|Integer|9|6|0||00|
29|Function|2|5|11|round(2)|02|
30|Ne|1|33|11||6a|
31|Column|0|0|13||00|
32|ResultRow|13|1|0||00|
33|Next|2|20|0||00|
34|Next|1|9|0||01|
35|Close|1|0|0||00|
36|Close|0|0|0||00|
37|Close|2|0|0||00|
38|Halt|0|0|0||00|
39|Transaction|0|0|0||00|
40|VerifyCookie|0|31|0||00|
41|TableLock|0|33|0|spectrum|00|
42|TableLock|0|39|0|MSMS_precursor|00|
43|Goto|0|5|0||00|

And iostat returned:

io-bash-3.2$ iostat
Linux 2.6.18-194.26.1.el5 (ningal.cluster.lifesci.ac.uk)         06/04/2012

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           14.35    0.00    0.30    0.01    0.00   85.34

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               1.16         4.55        17.22    1520566    5752802
sda1              0.00         0.02         0.00       5074         34
sda2              1.16         4.53        17.22    1515184    5752768
sdb               0.00         0.02         0.00       5108          0
dm-0              2.29         3.88        16.70    1297226    5579336
dm-1              0.00         0.00         0.00        928          0
dm-2              0.11         0.65         0.52     216106     173432

Update 2

I migrated the database to MySQL and here the query only takes about 0.001 second, even though for all the other queries I'm doing it is actually slower than sqlite (I optimized for sqlite so this might or might not be surprising).

Community
  • 1
  • 1
Niek de Klein
  • 8,524
  • 20
  • 72
  • 143
  • 1
    Do you use the same version of the sqlite library from python and in the sqlite shell? Usually the Python pysqlite version is pretty ancient. – schlenk Jun 02 '12 at 16:16
  • Can you verify that the analysis from `EXPLAIN` is the same for each query, the one from within Python and the one from the command-line tool? (You'll probably have to write some custom Python for this debugging.) – ghoti Jun 03 '12 at 17:41
  • Just checking (because you don't say): this is against the same database? :-) – Donal Fellows Jun 03 '12 at 17:49
  • A process in "uninterruptible sleep" usually means that it is waiting for I/O. Try using `iostat` and `lsof` for an idea of what I/O is going on in your system. – Michael Hoffman Jun 03 '12 at 17:59
  • @ghoti I know it's the same query, because I copy the query directly from the python query and just remove the self.cursor.execute(. "'s and +'s. – Niek de Klein Jun 03 '12 at 19:54
  • @Donal Fellows Yes it is the same database. – Niek de Klein Jun 03 '12 at 19:54
  • @Michael Hoffman I figured it was doing that, it's just weird that it only happens with 2 out of many queries, and with one more sever than the other. I will try the iostat and lsof tomorrow. – Niek de Klein Jun 03 '12 at 19:54
  • @NiekdeKlein - I'm glad you're making the same query, but that wasn't my question. Sometimes, pre-existing conditions may affect your database's behaviour regardless of your query. For example, are you having a table locking problem? – ghoti Jun 03 '12 at 22:04
  • @NiekdeKlein, are you threading in your python script? – Mike Pennington Jun 03 '12 at 22:10
  • 1
    @ghoti I updated my question with the results from the `EXPLAIN` and `EXPLAIN QUERY PLAN` – Niek de Klein Jun 04 '12 at 10:09
  • But when you execute the query from the sqlite command line you are not using parameters and a prepared statement, you are likely using constants instead. And that's a big difference. Can you test the query from python using an all literal string, without any `?`. – rodrigo Jun 04 '12 at 10:30
  • @rodrigo using an all literal string took 41 seconds. – Niek de Klein Jun 04 '12 at 11:01
  • *How* did you verify that the same sqlite library version is in use in both places? The suggestion by @schlenk seems the most probable of what's been given here thus far. – Charles Duffy Jun 04 '12 at 11:14
  • @schlenk by printing pysqlite2.dbapi2.sqlite_version in python and .version in the sqlite command line – Niek de Klein Jun 04 '12 at 12:01
  • Uninterruptable sleep is related to I/O -- if you were accessing data files on different disks between the command line and Python versions (particularly, if one of them was running against NFS), this would be consistent with the symptoms described. – Charles Duffy Jun 04 '12 at 15:45
  • @Charles Duffy they both access the same datafile, the sqlite database, and it's from the same cluster. – Niek de Klein Jun 04 '12 at 15:55

2 Answers2

2

As I mentioned in an answer to a prior question you asked, did you give the sqlite module apsw a try? From the website:

APSW is a Python wrapper for the SQLite embedded relational database engine. In contrast to other wrappers such as pysqlite it focuses on being a minimal layer over SQLite attempting just to translate the complete SQLite API into Python. The documentation has a section on the differences between APSW and pysqlite.

I tried it myself and it seems to indeed reflect better how SQL statements are executed by the "real" Sqlite (i.e. the client or the C library).

Community
  • 1
  • 1
  • I have not tried that, as I can't find out how to install it against an already existing amalgation instead of the default installed one or using -fetch to download a new one – Niek de Klein Jun 11 '12 at 14:00
  • I'm the author of APSW. If you want to use your existing SQLite libraries then just leave out the --fetch parameter when building. This is documented at http://apidoc.apsw.googlecode.com/hg/build.html#finding-sqlite-3 – Roger Binns Jul 05 '12 at 15:55
0

There is a performance issue with SQLite and Python. Read this thread for more information. There are a few suggestions there - try them, it might work - like adding an index to your join fields or using pysqlite.

http://www.mail-archive.com/python-list@python.org/msg253067.html

kjp
  • 3,086
  • 22
  • 30
  • 1
    I am using pysqlite and I have indexes on the JOINS. Besides, it works fast for both MySQL through python and SQLite commandline. – Niek de Klein Jun 05 '12 at 16:26
  • You have an index on spectrum_spectrum_id, but I don't see one on spectrum_id of spectrum table. The issue here is specific to Python+SQLite, so it is OK that it works fast in SQLite command-line. Also which version of pysqlite are you using? – kjp Jun 05 '12 at 16:32
  • I do have an index on spectrum_id of the spectrum table. My pysqlite version is 2.6.3 – Niek de Klein Jun 05 '12 at 16:45