3

By database pages i mean :

https://dev.mysql.com/doc/internals/en/innodb-page-structure.html

Now these pages get loaded to memory when we issue a query against it, and it gets changed there only and get marked as dirty

I'm not sure whether this depends on O.S or Database, but my question is how long do these pages usually stay dirty in memory?

Lets say we have a database for a high load web server with a lot traffic, and the buffer size is like 1gb or something(not sure how much database servers usually have), now how much of these 1gb could be dirty pages?

and if the power is lost with no backup power, then all of the changes to these dirty pages get lost correct? (Basically i want to know if a power outage occurs, if there is no power backup and there are a lot of inserts and queries happening, what is the estimated percentage of dirty data inside memory that is going to get lost?)

For example is there a chance that these dirty pages ever stay more than 12 or 24 hours on busy servers?

EDIT: by dirty pages i mean the page is modified in memory, for example one row inside it is updated or deleted

OneAndOnly
  • 1,048
  • 1
  • 13
  • 33
  • what do you mean `marked as dirty`? – treyBake Dec 21 '18 at 14:03
  • @treyBake meaning the page is modified, for example one row inside it is updated or deleted – OneAndOnly Dec 21 '18 at 14:10
  • Then it's not dirty .. I'm confuddled as to what you're trying to ask ... – treyBake Dec 21 '18 at 14:10
  • @treyBake https://dev.mysql.com/doc/refman/5.6/en/glossary.html#glos_dirty_page – OneAndOnly Dec 21 '18 at 14:11
  • @treyBake it is dirty, they call a page dirty when its modified. – OneAndOnly Dec 21 '18 at 14:12
  • Are they .. in my years, I've never heard it called dirty when modified ... – treyBake Dec 21 '18 at 14:13
  • @OneAndOnly They stay in memory until they get [flushed](https://dev.mysql.com/doc/refman/5.6/en/glossary.html#glos_flush). – Progman Dec 21 '18 at 14:14
  • @Progman but long does it usually take for this flushing to happen by DBMS? if i have 1k pages inside memory and 500 in dirty, how long will it approximatly take for them to get flushed ( how often DBMS flushes and how many pages does it flush) – OneAndOnly Dec 21 '18 at 14:16
  • @OneAndOnly "Flushing can happen because a memory area becomes full and the system needs to free some space [...]" – Progman Dec 21 '18 at 14:17
  • @Progman let me ask it this way : how long can a dirty page usually stay in maximum in busy servers? can we even find a good database server that has 12 or even 24 hour old pages? – OneAndOnly Dec 21 '18 at 18:26
  • 1
    After discussion, I have the opinion that this question is an [XY Problem](http://xyproblem.info). The OP wants to analyze database changes, and they think that directly reading dirty pages is the way to do this. **It's not.** – Bill Karwin Dec 22 '18 at 00:39

1 Answers1

8

how long do these pages usually stay dirty in memory?

It's variable. InnoDB has a background thread that flushed dirty pages to disk. It flushes a modest number of pages, then does it again after 1 second.

So if you do a lot of updates in a short space of time, you would make a lot of pages dirty. Then the flushing thread would gradually flush them to disk. The idea is that this helps to stretch the work out over time, so a sudden spike of updates doesn't overwhelm your disk.

But it means that "how long do these pages stay dirty in memory" can vary quite a bit. I think in typical cases, it would be done in a few minutes.

Different versions of MySQL flush in different ways. Years ago, the main background thread flushed a fixed number of pages every 1 second. Then they came up with adaptive flushing, so it would increase the flush rate automatically if it detected you were making a lot of changes. Then they came up with a dedicated thread called the page cleaner. I think it's even possible to configure MySQL to run multiple page cleaner threads, but that's not necessary for most applications.

You might also be interested in my answers to these past questions:

Lets say ... the buffer size is like 1gb or something(not sure how much database servers usually have)

It really varies and depends on the app. The default innodb buffer pool size out of the box is 128MB, but that's too small for most applications unless it's a test instance.

At my company, we try to maintain the buffer pool at least 10% of the size of data on disk. Some apps need more. The most common size we have is 24GB, but the smallest is 1GB and the largest is 200GB. We manage over 4,000 production MySQL instances.

how much of these 1gb could be dirty pages?

All of them, in theory. MySQL has a config variable calls innodb_max_dirty_pages_pct which you might assume blocks any further dirty pages if you have too many. But it doesn't. You can still modify more pages even if the buffer pool is more dirty (percentage-wise) than that variable.

What the variable really does is if the buffer pool is more than that percent full of dirty pages, the rate of flushing dirty pages is increased (IIRC, it doubles the number of pages it flushes per cycle), until the number falls below that percentage threshold again.

if the power is lost with no backup power, then all of the changes to these dirty pages get lost correct?

Yes, but you won't lose the changes, because they can be reconstructed from the InnoDB redo log -- those two files iblogfile_0 and iblogfile_1 you may have seen in your data dir. Any transaction that created a dirty page must be logged in the redo log during commit.

If you have a power loss (or other kind of restart of the mysqld process), the first thing InnoDB does is scan the redo log to check that every change logged was either flushed before the crash, or else if not, load the original page and reapply the change from the log to make the dirty page again. That's what InnoDB calls crash recovery.

You can watch this happening. Tail the error log on a test instance of MySQL Server, while you kill -9 the mysqld process. mysqld_safe will restart the mysqld process, which will spew a bunch of information into the error log as it performs crash recovery.

If there was only a small amount of dirty pages to recover, this will be pretty quick, perhaps only seconds. If the buffer pool was large and had a lot of dirty pages, it'll take longer. The MySQL Server isn't fully started, and cannot take new client connections, until crash recovery is complete. This has caused many MySQL DBA's many minutes of anxiety while watching the progress of the crash recovery. There's no way to predict how long it takes after a crash.

Since the redo log is needed for crash recovery, if the redo log fills up, MySQL must flush some dirty pages. It won't allow dirty pages to be un-flushed and also unrecoverable from the redo log. If this happens, you'll actually see writes paused by InnoDB until it can do a kind of "emergency flush" of the oldest dirty pages. This used to be a problem for MySQL, but with improvements like adaptive flushing and the page cleaner, it can keep up with the pace of changes much better. You'd have to have a really extraordinary number of writes, and an undersized redo log to experience a hard stop on InnoDB while it does a sync flush.

Here's a good blog about flushing: https://www.percona.com/blog/2011/04/04/innodb-flushing-theory-and-solutions/

P.S.: For an obligatory bash against MyISAM, I'll point out that MyISAM doesn't have a redo log, doesn't have crash recovery, and relies on the host OS file buffer during writes to its data files. If your host has a power failure while there are pending writes in the file buffer and not yet written to disk, you will lose them. MyISAM does not have any real support for the Durability property of ACID.


Re your comment:

A page will probably be flushed by the time the redo log recycles. That is, if you have 2x 48MB redo log files (the default size), and you write enough transactions to it to cycle completely through it and start over at the beginning, any pages in the buffer pool made dirty during that time will need to be flushed. A page cannot remain dirty in the BP if the respective transaction in the redo log is overwritten with new transactions.

As far as I understand, it would be virtually impossible for a dirty page to stay dirty in the buffer pool without being flushed for 12-24 hours.

The possible exception, and I'm just speculating about this, is that a given page gets updated again and again before it's flushed. Therefore it remains a recent dirty page for a long time. Again, I don't know for sure if this overcomes the need to flush a page when the redo log recycles.

Regardless, I think it's highly unlikely.

Also, I'm not sure what you mean by forensic. There's no direct way to examine page versions from the buffer pool. To get info about recent changes from InnoDB, you'd need to examine the undo segment to find previous versions of pages, and correlated them with redo log entries. The dirty page and its previous versions can both be in the buffer pool, or on disk. There's no commands or API or any data structure to do any of that correlation. So you'd be doing manual dumps of both disk images and memory images, and following pointers manually.

A much easier way of tracing data changes is by examining the stream of changes in the binary log. That's independent of InnoDB.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Thanks for this great answer, one question : is there a chance that these dirty pages ever stay more than 12 or 24 hours on busy servers? this is important for forensic because these dirty pages could provide important info about recent changes, so when we arrive at a server how old can we hope these dirty pages to be if we are lucky? (lets say its a good designed database server, not badly configured) – OneAndOnly Dec 21 '18 at 18:23
  • So how long it can usually stay if 12 hour is almost impossible? 2-3 hours? (and yes these are important for forensic because we can compare dirty pages with their pair in the disk to find out changes, google DBcarver) – OneAndOnly Dec 21 '18 at 19:11
  • It's most likely for a page to be flushed promptly, probably in a few seconds after it's made dirty, or a small number of minutes. If the db is busy as you say, the redo log will wrap around quickly and that will force the dirty page to be flushed. Are there exceptions? Maybe. As I said, I don't know. But I'm sure it would be a "long-tail" scenario, very uncommon if it's possible at all. – Bill Karwin Dec 21 '18 at 19:50
  • Even if there were _no_ activity after the page is made dirty, normal flushing rate will continue. The idea is that InnoDB tries to flush dirty pages as soon as possible. They don't just hang out in a dirty state for hours on an idle system. – Bill Karwin Dec 21 '18 at 19:52
  • 1
    @OneAndOnly - It seems like folly to compare dirty pages to disk pages -- You will miss some, many, or even all, of the changes, depending on how fast InnoDB happens to flush them. – Rick James Dec 21 '18 at 21:03
  • I agree with @RickJames. Pulling internal data structures out of InnoDB is not intended usage of MySQL. It undocumented and unsupported, and you risk the data structures changing between versions. – Bill Karwin Dec 21 '18 at 21:45