how long do these pages usually stay dirty in memory?
It's variable. InnoDB has a background thread that flushes dirty pages to disk. It flushes a modest batch of pages, then does it again about 1 second later.
So if you do a lot of updates in a short space of time, you would make a lot of pages dirty. Then the flushing thread would gradually flush them to disk. The idea is that this helps to stretch the work out over time, so a sudden spike of updates doesn't overwhelm your disk.
But it means that "how long do these pages stay dirty in memory" can vary quite a bit. I think in typical cases, it would be done in a few minutes.
Different versions of MySQL flush in different ways. Years ago, the main background thread flushed a fixed number of pages every 1 second. Then they came up with adaptive flushing, so it would increase the flush rate automatically if it detected you were making a lot of changes. Then they came up with a dedicated thread called the page cleaner. I think it's even possible to configure MySQL to run multiple page cleaner threads, but that's not necessary for most applications.
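If you want to see how your own server is set up for flushing, you can check the relevant settings. This is just a quick inspection sketch; the exact set of variables depends on your MySQL version (innodb_page_cleaners, for example, only exists in 5.7 and later):

    -- Flushing-related settings (names as of MySQL 5.6/5.7):
    SHOW GLOBAL VARIABLES LIKE 'innodb_adaptive_flushing%';
    SHOW GLOBAL VARIABLES LIKE 'innodb_io_capacity%';
    SHOW GLOBAL VARIABLES LIKE 'innodb_page_cleaners';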
You might also be interested in my answers to these past questions:
Let's say ... the buffer size is like 1 GB or something (not sure how much database servers usually have)
It really varies and depends on the app. The default InnoDB buffer pool size (innodb_buffer_pool_size) out of the box is 128MB, but that's too small for most applications unless it's a test instance.
At my company, we try to keep the buffer pool at least 10% of the size of the data on disk. Some apps need more. The most common size we have is 24GB, but the smallest is 1GB and the largest is 200GB. We manage over 4,000 production MySQL instances.
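If you want to apply the same rule of thumb to your own server, you can compare the buffer pool size against a rough estimate of your InnoDB data size. The sizes reported by information_schema are estimates, not exact byte counts, but they're close enough for this:

    -- Buffer pool size in GB:
    SELECT @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS buffer_pool_gb;

    -- Approximate size of InnoDB data + indexes in GB:
    SELECT SUM(data_length + index_length) / 1024 / 1024 / 1024 AS innodb_data_gb
    FROM information_schema.tables
    WHERE engine = 'InnoDB';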
how much of this 1 GB could be dirty pages?
All of them, in theory. MySQL has a config variable called innodb_max_dirty_pages_pct, which you might assume blocks any further dirty pages if you have too many. But it doesn't. You can still modify more pages even if the buffer pool is already dirtier (percentage-wise) than that variable.
What the variable really does: if more than that percentage of the buffer pool is full of dirty pages, the rate of flushing is increased (IIRC, it doubles the number of pages flushed per cycle) until the dirty percentage falls below the threshold again.
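You can check both the threshold and the live dirty-page ratio yourself:

    -- The configured threshold (and its low water mark, on 5.6+):
    SHOW GLOBAL VARIABLES LIKE 'innodb_max_dirty_pages_pct%';

    -- Divide pages_dirty by pages_total for the current dirty percentage:
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_total';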
If the power is lost with no backup power, then all of the changes to these dirty pages get lost, correct?
Yes, the in-memory copies of those pages are lost, but you won't lose the changes, because they can be reconstructed from the InnoDB redo log -- those two files, ib_logfile0 and ib_logfile1, you may have seen in your data dir. Any transaction that created a dirty page must be written to the redo log during commit.
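To see how big your redo log is and where it lives (this assumes a version before MySQL 8.0.30, which replaced these files with a #innodb_redo directory sized by innodb_redo_log_capacity):

    SHOW GLOBAL VARIABLES LIKE 'datadir';
    SHOW GLOBAL VARIABLES LIKE 'innodb_log_file%';  -- size per file, and file count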
If you have a power loss (or any other kind of restart of the mysqld process), the first thing InnoDB does on startup is scan the redo log to check that every logged change was flushed to the data files before the crash; if not, it loads the original page and reapplies the change from the log to make the dirty page again. That's what InnoDB calls crash recovery.
You can watch this happening. Tail the error log on a test instance of MySQL Server while you kill -9 the mysqld process. mysqld_safe will restart the mysqld process, which will spew a bunch of information into the error log as it performs crash recovery.
If there were only a small number of dirty pages to recover, this will be pretty quick, perhaps only a few seconds. If the buffer pool was large and had a lot of dirty pages, it'll take longer. The MySQL Server isn't fully started, and cannot take new client connections, until crash recovery is complete. This has caused many MySQL DBAs many minutes of anxiety while watching the progress of crash recovery. There's no way to predict how long it will take after a crash.
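If you try this on a test instance, find the error log first so you know which file to tail:

    SHOW GLOBAL VARIABLES LIKE 'log_error';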
Since the redo log is needed for crash recovery, if the redo log fills up, MySQL must flush some dirty pages. It won't let a page stay dirty (unflushed) once the redo log entries needed to recover it are about to be overwritten. If this happens, you'll actually see writes paused by InnoDB until it can do a kind of "emergency flush" of the oldest dirty pages. This used to be a real problem for MySQL, but with improvements like adaptive flushing and the page cleaner, it keeps up with the pace of changes much better. You'd have to have a really extraordinary rate of writes, and an undersized redo log, to experience a hard stop on InnoDB while it does a sync flush.
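You can get a rough sense of how close you are to that forced flush from the LOG section of the InnoDB status output. The difference between "Log sequence number" and "Last checkpoint at" is the checkpoint age, which has to stay comfortably below the total redo log size:

    SHOW ENGINE INNODB STATUS\G
    -- In the LOG section: checkpoint age = Log sequence number - Last checkpoint at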
Here's a good blog about flushing: https://www.percona.com/blog/2011/04/04/innodb-flushing-theory-and-solutions/
P.S.: For an obligatory bash against MyISAM, I'll point out that MyISAM doesn't have a redo log, doesn't have crash recovery, and relies on the host OS file buffer during writes to its data files. If your host has a power failure while there are pending writes in the file buffer and not yet written to disk, you will lose them. MyISAM does not have any real support for the Durability property of ACID.
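If you're not sure whether you still have any MyISAM tables exposed to this risk, a quick query against information_schema will list them (system schemas excluded, since older MySQL versions keep some internal MyISAM tables there):

    SELECT table_schema, table_name
    FROM information_schema.tables
    WHERE engine = 'MyISAM'
      AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema');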
Re your comment:
A page will probably be flushed by the time the redo log recycles. That is, if you have 2x 48MB redo log files (the default size), and you write enough transactions to cycle completely through them and start over at the beginning, any pages in the buffer pool made dirty during that time will need to be flushed. A page cannot remain dirty in the buffer pool if the redo log entries for its transaction are about to be overwritten by new transactions.
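If you want a rough idea of how quickly your own workload cycles through the redo log, you can sample the bytes written to it over an interval. The status counter and SLEEP() are standard; the arithmetic is just a back-of-the-envelope estimate:

    SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';
    SELECT SLEEP(60);
    SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';
    -- (second value - first value) = redo bytes written per minute.
    -- Total redo log size / that rate = roughly how long one full cycle takes
    -- (e.g. 2 x 48MB = ~100MB with the old defaults).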
As far as I understand, it would be virtually impossible for a dirty page to stay dirty in the buffer pool without being flushed for 12-24 hours.
The possible exception, and I'm just speculating about this, is that a given page gets updated again and again before it's flushed. Therefore it remains a recent dirty page for a long time. Again, I don't know for sure if this overcomes the need to flush a page when the redo log recycles.
Regardless, I think it's highly unlikely.
Also, I'm not sure what you mean by forensic. There's no direct way to examine page versions from the buffer pool. To get info about recent changes from InnoDB, you'd need to examine the undo segments to find previous versions of the data, and correlate them with redo log entries. The dirty page and its previous versions can both be in the buffer pool, or on disk. There are no commands, APIs, or data structures to do any of that correlation. So you'd be doing manual dumps of both disk images and memory images, and following pointers manually.
A much easier way of tracing data changes is by examining the stream of changes in the binary log. That's independent of InnoDB.
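For example, you can list the binary logs and peek at recent events right from the client (the file name below is just a placeholder; use one from your own SHOW BINARY LOGS output). With binlog_format=ROW, the mysqlbinlog command-line tool with --verbose will even decode the before/after row images:

    SHOW BINARY LOGS;
    SHOW BINLOG EVENTS IN 'binlog.000042' LIMIT 10;  -- placeholder file name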