
I did a simple integer update performance test. SQLite did only 15 updates per second while PostgreSQL did 1500 updates per second.

The SQLite number seems to be normal.

The FAQ on the SQLite site explains it as a fundamental limitation of rotational disks:

Actually, SQLite will easily do 50,000 or more INSERT statements per second on an average desktop computer. But it will only do a few dozen transactions per second. Transaction speed is limited by the rotational speed of your disk drive. A transaction normally requires two complete rotations of the disk platter, which on a 7200RPM disk drive limits you to about 60 transactions per second. Transaction speed is limited by disk drive speed because (by default) SQLite actually waits until the data really is safely stored on the disk surface before the transaction is complete. That way, if you suddenly lose power or if your OS crashes, your data is still safe. For details, read about atomic commit in SQLite.

By default, each INSERT statement is its own transaction. But if you surround multiple INSERT statements with BEGIN...COMMIT then all the inserts are grouped into a single transaction. The time needed to commit the transaction is amortized over all the enclosed insert statements and so the time per insert statement is greatly reduced.

Another option is to run PRAGMA synchronous=OFF. This command will cause SQLite to not wait on data to reach the disk surface, which will make write operations appear to be much faster. But if you lose power in the middle of a transaction, your database file might go corrupt.
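For reference, here is a minimal sketch of how that pragma could be issued through the clojure.java.jdbc setup in the test code below (pragmas apply per connection, so with a c3p0 pool it would have to be run on every pooled connection):

;; relax SQLite's durability guarantee, for benchmarking only; the `false` asks
;; db-do-commands not to wrap the pragma in a transaction
(jdbc/db-do-commands sqlite false "PRAGMA synchronous=OFF")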

Is this description accurate? If so, how can PostgreSQL perform so much faster than SQLite? (I set both the fsync and synchronous_commit options to on in PostgreSQL.)

UPDATE:

Here's the full test code, written in Clojure. The project.clj comes first, followed by the test namespace:

(defproject foo "0.1.0-SNAPSHOT"
  :repositories {"sonatype-oss-public" "https://oss.sonatype.org/content/groups/public/"}
  :dependencies [[org.clojure/clojure "1.5.1"]
                 [org.clojure/java.jdbc "0.3.0-SNAPSHOT"]
                 [com.mchange/c3p0 "0.9.2.1"]
                 [org.xerial/sqlite-jdbc "3.7.2"]
                 [postgresql "9.1-901.jdbc4"]])
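
;; test code (src/foo/core.clj, assuming the standard Leiningen layout for foo.core)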
(ns foo.core
  (:require [clojure.java.jdbc :as jdbc]
            [clojure.java.jdbc.ddl :as ddl])
  (:import  [com.mchange.v2.c3p0 ComboPooledDataSource]))

(def sqlite
  (let [spec {:classname "org.sqlite.JDBC"
              :subprotocol "sqlite"
              :subname "test.db"}]
    {:datasource (doto (ComboPooledDataSource.)
                   (.setDriverClass (:classname spec))
                   (.setJdbcUrl (str "jdbc:" (:subprotocol spec) ":" (:subname spec)))
                   (.setMaxIdleTimeExcessConnections (* 30 60))
                   (.setMaxIdleTime (* 3 60 60)))}))

(def postgres
  (let [spec {:classname "org.postgresql.Driver"
              :subprotocol "postgresql"
              :subname "//localhost:5432/testdb"
              :user "postgres"
              :password "uiop"}]
    {:datasource (doto (ComboPooledDataSource.)
                   (.setDriverClass (:classname spec))
                   (.setJdbcUrl (str "jdbc:" (:subprotocol spec) ":" (:subname spec)))
                   (.setUser (:user spec))
                   (.setPassword (:password spec))
                   (.setMaxIdleTimeExcessConnections (* 30 60))
                   (.setMaxIdleTime (* 3 60 60)))}))

(doseq [x [sqlite postgres]]
  (jdbc/db-do-commands x
    (ddl/create-table :foo [:id :int "PRIMARY KEY"] [:bar :int])))

(doseq [x [sqlite postgres]]
  (jdbc/insert! x :foo {:id 1 :bar 1}))

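;; bench: run `expr` `n` times for three rounds, printing the elapsed time in
;; milliseconds and the resulting operations per second for each round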
(defmacro bench
  [expr n]
  `(dotimes [_# 3]
     (let [start# (. System (nanoTime))]
       (dotimes [_# ~n]
         ~expr)
       (let [end#               (. System (nanoTime))
             elapsed#           (/ (double (- end# start#)) 1000000.0)
             operation-per-sec# (long (/ (double ~n) (/ (double (- end# start#)) 1000000000)))]
         (prn (str "Elapsed time: " elapsed# " ms (" (format "%,d" operation-per-sec#) " ops)"))))))

(bench (jdbc/query sqlite ["select * from foo"]) 20000)
(bench (jdbc/execute! sqlite ["update foo set bar=bar+1 where id=?" 1]) 100)

(bench (jdbc/query postgres ["select * from foo"]) 20000)
(bench (jdbc/execute! postgres ["update foo set bar=bar+1 where id=?" 1]) 5000)

And the output is:

; Running "select * from foo" 20000 times in SQLite

"Elapsed time: 1802.426963 ms (11,096 ops)"
"Elapsed time: 1731.118831 ms (11,553 ops)"
"Elapsed time: 1749.842658 ms (11,429 ops)"

; Running "update foo set bar=bar+1 where id=1" 100 times in SQLite

"Elapsed time: 6362.829057 ms (15 ops)"
"Elapsed time: 6405.25075 ms (15 ops)"
"Elapsed time: 6352.943553 ms (15 ops)"

; Running "select * from foo" 20000 times in PostgreSQL

"Elapsed time: 2898.636079 ms (6,899 ops)"
"Elapsed time: 2824.77372 ms (7,080 ops)"
"Elapsed time: 2837.622659 ms (7,048 ops)"

; Running "update foo set bar=bar+1 where id=1" 5000 times in PostgreSQL

"Elapsed time: 3213.120219 ms (1,556 ops)"
"Elapsed time: 3564.249492 ms (1,402 ops)"
"Elapsed time: 3280.128708 ms (1,524 ops)"

pg_test_fsync result:

C:\temp>"C:\Program Files\PostgreSQL\9.3\bin\pg_test_fsync"
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                   81199.920 ops/sec      12 usecs/op
        fdatasync                                     n/a
        fsync                              45.337 ops/sec   22057 usecs/op
        fsync_writethrough                 46.470 ops/sec   21519 usecs/op
        open_sync                                     n/a

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                   41093.981 ops/sec      24 usecs/op
        fdatasync                                     n/a
        fsync                              38.569 ops/sec   25927 usecs/op
        fsync_writethrough                 36.970 ops/sec   27049 usecs/op
        open_sync                                     n/a

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
         1 * 16kB open_sync write                     n/a
         2 *  8kB open_sync writes                    n/a
         4 *  4kB open_sync writes                    n/a
         8 *  2kB open_sync writes                    n/a
        16 *  1kB open_sync writes                    n/a

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        write, fsync, close                45.564 ops/sec   21947 usecs/op
        write, close, fsync                33.373 ops/sec   29964 usecs/op

Non-Sync'ed 8kB writes:
        write                             889.800 ops/sec    1124 usecs/op
alice
  • It is not about the speed (or delay) of writing (committing) to disk. It is about **ordered writes** – wildplasser Oct 02 '13 at 11:06
  • How many updates per transaction are you doing with PosgreSQL? – CL. Oct 02 '13 at 11:54
  • _a simple integer update performance test_: OK, so we have no idea what you actually did. – Daniel Vérité Oct 02 '13 at 11:54
  • @CL. I didn't explicitly use a transaction, so it's one update per transaction. I know I can wrap multiple updates in a transaction to boost speed, but that's not what I'm asking now. – alice Oct 02 '13 at 13:43
  • @DanielVérité It's a table with 2 integer columns with the query being `update foo set bar=bar+1 where id=1` – alice Oct 02 '13 at 13:46
  • What OS? What disk type? Do you have a battery-backed write cache? fsync behavior is OS-dependent. On Windows, open_datasync is fastest because it is able to use the hardware write cache. – Amit Naidu Oct 22 '13 at 00:44

6 Answers


It comes down to how they implement snapshot isolation.

SQLite uses file locking to isolate transactions, allowing writes to proceed only once all reads are done.

Postgres, in contrast, uses a more sophisticated approach called multiversion concurrency control (MVCC), which allows multiple writes to occur in parallel with multiple reads.

http://www.sqliteconcepts.org/SI_index.html

http://www.postgresql.org/docs/current/static/mvcc-intro.html

http://wiki.postgresql.org/wiki/MVCC

Denis de Bernardy
  • It's a table with 2 integer columns with the query being `update foo set bar=bar+1 where id=1` executed serially. In other words, there's no concurrency involved, so I doubt it has to do with MVCC. – alice Oct 02 '13 at 13:36
  • It actually might: SQLite would (internally) update the existing row. In fact, all rows, leading it to rewrite the entire table (and, de facto, the entire database file). Whereas Postgres would (again, internally) mark rows as deleted at that txid, and insert new rows that are valid from that txid onward. You might see a slight performance difference for Postgres if you play around with the table's fill factor, e.g. changing it from the default (60 or 70, if memory serves) to 10 (slightly faster, since pages are split less often) or 95 (resp. slightly slower). – Denis de Bernardy Oct 02 '13 at 15:13

You are correct to be suspicious. PostgreSQL with the settings you indicate should not be able to perform anywhere near 1500 updates per second in separate sequential transactions against rotating media.

Probably something in your IO stack is lying or buggy about how it implements sync. Which means your data is at risk of serious corruption after an unexpected power outage or OS failure.

Looking at the result of pg_test_fsync, this is indeed the case. open_datasync, which is the default under Windows, appears to be unrealistically fast and so must be unsafe. I see the same thing when I run pg_test_fsync on my Windows 7 machine.
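One way to make the comparison apples-to-apples is to pick a wal_sync_method that actually flushes through the Windows write cache; a sketch for postgresql.conf (plain fsync has the same effect, as the comments below show):

# make commits wait for a real flush instead of the drive's write cache
wal_sync_method = fsync_writethrough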

jjanes
  • I thought `fsync` is the default. After changing it to `fsync`, tps dropped to 50~60. – alice Oct 14 '13 at 10:37
  • I also changed the `journal_mode` in SQLite to `WAL`, and now I see a similar tps number, so the performance of the two databases is almost the same in this simple case, as it should be. – alice Oct 14 '13 at 11:15

Denis' answer has all the links you need. I will go for a less detailed but possibly more understandable answer.

SQLite does not use a sophisticated transaction manager; there is no advanced multitasking logic hidden in it. It executes what you tell it to execute, in exactly that order. If you tried using the same database from two processes, you would encounter problems.

PostgreSQL, on the other hand, is a very complex database: it efficiently supports multiple concurrent reads and writes. Think of it as an asynchronous system: you only schedule work to be done; you do not actually control it in its details, Postgres does that for you.

What can you do about efficiency? Group several, dozens, or even hundreds of updates/inserts into one transaction. For a simple table, you will get very good performance.
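With the clojure.java.jdbc setup from the question, that could look roughly like this (a sketch; the macro is named with-db-transaction in current java.jdbc releases, and db-transaction in some 0.3.0 snapshots):

(jdbc/with-db-transaction [tx postgres]
  ;; 1000 updates share a single COMMIT, so the fsync cost is paid once per batch
  (dotimes [_ 1000]
    (jdbc/execute! tx ["update foo set bar=bar+1 where id=?" 1])))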

Dariusz

Actually, any write on a rotating disk is on the order of 10 ms (a typical number is 8 ms).

That means a little more than 100 writes per second (1 s / 8 ms ≈ 125), IF you are writing to the same position on the disk, which is a very odd case for a database. See "You Don't Know Jack about Disks" from ACM Queue; typically a disk can schedule 10 reads or writes in a single rotation.

http://queue.acm.org/detail.cfm?id=864058

So a database can perform 1,000 writes per second or even more. I've seen apps performing 1,500 transactions per second on desktop computers 10 years ago.

user133536

Assuming you are using a normal hard disk (i.e. no SSD), you can expect a maximum of 50-100 synchronous writes per second. 15 writes per second seems slightly low, but not impossible.

So if Postgres is doing 1500 updates per second, they are either written to some buffer/cache or collapsed into a single update. Without knowing more about the actual test it is difficult to say which is the actual reason, but if you were to open a transaction, update a single row 1500 times, and commit after that, then Postgres should be smart enough to execute only a single "real" write to the disk.

Wolph
  • I believe it's actually related to transaction isolation issues. – Denis de Bernardy Oct 02 '13 at 12:05
  • @Denis: very true, the main thing is that the guarantee of when everything has to be written to disk is only given after the commit. In the meantime it depends on the transaction isolation level. Read uncommitted would give you possibly incorrect results which have been written within the transaction. – Wolph Oct 02 '13 at 12:14

Modern servers and storage arrays implement RAID technology, in-memory write caches, and tiering.

To increase the number of transactions per second, there are some hardware possibilities:

- multiple drives receiving the writes
- storage with gigabytes of memory cache, acknowledging the transaction in advance
- a disk controller with a large memory cache (2 GB), configured for performance
- tier 1 made of SSD disks (with data moved to tier 2/3 later)

Meurer