Some fun around history list
Why this article?
First of all because I was having fun in digging in the code.
Then I was reading a lot about the improvements we will have in MySQL 5.6, and of some already present in 5.5.
Most of them are well cover by people for sure more expert then me, so I read and read, but after a while I start to be also curious, and I start to read the code, and do tests.
I start to do comparison between versions, like 5.1 - 5.5. - 5.6
One of the things I was looking to was how the new Purge thread mechanism works and his implications.
I have to say that it seems working better then the previous versions, and the Dimitry blog (see reference) seems confirm that.
So again why the article? Because I think there are some traps here and there and I feel the need to write about them.
The worse behaviour is when using MySQL 5.5, and this is because in 5.5 we have an intermediate situation, where the purge is not fully rewrite as in 5.6, but also not bound to the main thread.
What is the history list?
MySQL uses (from Innodb 1.1 MySQL 5.5) 1 to 128 Rollback segments, each able to support 1023 transactions.
Each transaction is assigned to one of the rollback segments, and remains tied to that rollback segment for the duration.
This enhancement improves both scalability (higher number of concurrent transactions) and performance (less contention when different transactions access the rollback segments).
History list is tightly bound to undo log representing the number of Header Pages related to undo log segments, segments that are related to finalize transactions,commit or roll back.
That's it, History list represent the number of not yet flush segments in the undo log.
Old and New
Previously, the purge thread was directly controlled by the main thread in InnoDB causing serious side effects for description of which read ( http://dimitrik.free.fr/blog/archives/2010/04/mysql-performance-why-purge-thread-in-innodb.html).
The main change was to move out the purge thread and allow it to run it isolated, such that it will not impact other process.
The move was done in Innodb 1.1 (present in 5.1 and 5.5)
But that is not all, the new version of purge has a mechanism that allow it to be more or less aggressive, in relation to the increase or decrease of the History List length.
This option is enable in the 5.6 release and is taking the innodb_purge_threads as the "up to value" it can use to increase the number of threads for purging.
Different behaviour
What is quite obvious is that the behaviour of the new Purge mechanism, is quite different from the previous one, ie 5.0 (innoDB 1.0) or 5.1.
In the previous versions of InnoDB, we were educated to consider the History List something that should always be close to 0, not always but as soon as possible.
Frankly speaking that was always a little bit confuse, but as said the MANUAL, was suggesting in several place to keep it between reduced numbers:
I.e. (http://dev.mysql.com/doc/refman/5.5/en/innodb-multi-versioning.html)
"If you insert and delete rows in smallish batches at about the same rate in the table, the purge thread can start to lag behind and the table can grow bigger and bigger because of all the “dead” rows, making everything disk-bound and very slow In such a case, throttle new row operations, and allocate more resources to the purge thread by tuning the innodb_max_purge_lag system variable."
A classic scenario for such issue, is an application taking track of the activity on the network, that require to write huge number of small insert into the database.
From my tests I have seen an incredible number of entry in the history list in 5.5, that were not present in the previous InnoDB version, and that are not present again in 5.6.
The point is it could happen to have so many transactions, doing INSERT,UPDATE or DELETE that the History grows too much, and the un-flushed undo log as well. To prevent issues, we should tune the value of the innodb_max_purge_lag in order to allow InnoDB to complete the PURGE operations.
Innodb_max_purge_lag is the maximum number in history list we want to have, above which Innodb will start to apply an indiscriminate delay in pushing the operations.
the formula is quiet simple:
1 |
((purge_lag/innodb_max_purge_lag)×10)–5 milliseconds.
|
Or following the code we have:
float ratio = (float) trx_sys->rseg_history_len/ srv_max_purge_lag; ((ratio - .5) * 10000);
If we have a History list of 1200
and we have set innodb_max_purge_lag to 1000
result should be:
1 2 |
((1200/1000)X10)-5= 7 ms delay for operation. following the manual ((1200/100) -5) * 10000 = 7000 <-- Microsecond following the code |
All match and delay will be 7 ms.
Also the max limit in previous version for the delay was of 4295 seconds!!
While in 5.5/5.6 we see a different way of managing the max number of seconds:
1 2 3 4 |
if (delay > srv_max_purge_lag_delay) { delay = srv_max_purge_lag_delay; } |
Where srv_max_purge_lag_delay max value is 10 seconds.
So the max delay, the worse case will be 10 seconds.
In the past as said we were use to see the History list going up and down (relatively) fast, so the force delay was playing his role efficiently.
At the same time, we knew that all operations in the Main threads where slowed down, so the forced delay, was something we had to leave with, or worse things could happen, like the need to perform an emergency page flush from the buffer pool, to allow REDO to recover space.
But something has changed...
... in better obviously... but we must be careful.
Better because now the purge thread works independently, and that it could scale, pending undo flushes do not slow down the other operations.
Also in 5.6, MySQL could be more or less aggressive in relation to the History list to purge.
Those operation remain something we should monitor and tune, for two main reasons:
- space taken by undo log segments is till an issue, and now that the number is increase, it could be even worse.
- Setting a wrong value for innodb_max_purge_lag could kill our performance.
Let us start digging a bit.
First of all History list and purge log thread are still very bad documented in the InnoDB Status Report.
In 5.5 we can count on the History list information, number of transaction purge has being up to, then the number of purged record up to, finally in 5.6 we have the generic state.
Not too much here, better info like number or running threads, real segments used (1-128), number of Header Pages in the segments, and dimension (at least pages as Percona does) would be nice to have.
Undo log is currently stored inside the system tablespace, this means that IT WILL NOT BE POSSIBLE to shrink the size of the table space, once undo log have taken huge amount of space.
That's it, the 80% or more of the size of a system table space is because the undo log, when using innodb_file_per_table. and this was already true when InnoDB was using a single set of segments (1023), now that it can scale up to 130944, and that it supports better more transactions, the size on disk can explode.
Some numbers to have better understanding,
Considering that normal block size is 4K in file system each page is 4 operation, we will have 1,716,504 operation, assuming that each page will be sequential, this means 3 ms for Seek+half rotation then, 1 ms transfer for the first operation then 2ms for the remaining will be 12ms for each page delete on disk.
Meaning 5149.512 seconds (85 minutes) at 6.7 Mb/s given the partial random write, to flush the whole.
Obviously this number changes in respect of the available cache and available spindles.
Also more threads more capacity in write, less time, so the option innodb_purge_threads is more then welcome.
Setting the right value for innodb_max_purge_lag
In setting this value we must keep into consideration the following:
- the value is the number of head pages representing an undo log relation to a running/executed transaction.
- Purge can apply only to complete transaction
- delay apply to all write operation inside InnoDB
Nowadays is not uncommon to have high number in history list in 5.5, in this case "just" 359,585 head pages, is then very important to correctly balance the delay point of start with the real load, like transactions/sec and the kind of operations are ongoing on the server.
To clarify, the relevance also in case of reads, not only of writes, let me cite:
<snip>
Running Very Long Transaction If you’re running very long transaction, be it even SELECT, Innodb will be unable to purge records for changes which are done after this transaction has started, in default REPEATABLE-READ isolation mode. This means very long transactions are very bad causing a lot of garbage to be accommodated in the database. It is not limited to undo slots. When we’re speaking about Long Transactions the time is a bad measure. Having transaction in read only database open for weeks does no harm, however if database has very high update rate, say 10K+ rows are modified every second even 5 minute transaction may be considered long as it will be enough to accumulate about 3 million of row changes.
<snip>
(Peter Z)
But what can really harm your the system is the dealy define by the purge lag to improve the flushing.
Assume we define it to innodb_max_purge_lag=200,000, and we do have the number of pending flush as for the above 359585.
Doing calculation as for the previous formula
((359585/200000)X10)-5= 12.97925 ms delay for operation
Hey that's not too bad, I will delay only 12.97925 ms to operation/Insert to help the purge.
But what is not clear is what an operation is for the delay, or more correctly where do the delay really apply?
Ready? Are you sit comfortable?
2) row0mysql.c
1 2 3 4 5 6 7 8 9 10 11 12 |
/*******************************************************************//** Delays an INSERT, DELETE or UPDATE operation if the purge is lagging. */ static void row_mysql_delay_if_needed(void) /*===========================*/ { if (srv_dml_needed_delay) { os_thread_sleep(srv_dml_needed_delay); } } |
3)os0thread.c
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
/*****************************************************************//** The thread sleeps at least the time given in microseconds. */ UNIV_INTERN void os_thread_sleep( /*============*/ ulint tm) /*!< in: time in microseconds */ { #ifdef __WIN__ Sleep((DWORD) tm / 1000); #else struct timeval t; t.tv_sec = tm / 1000000; t.tv_usec = tm % 1000000; select(0, NULL, NULL, NULL, &t); #endif } |
Do you get it?
delay is per ROW.
So assume you have a system checking connections status and traffic, collecting statistics every minute for your 100000 connected users, each usert generate at least 8 insert, plus a consolidation operation to get average 10 minutes each insert. Each insert per user taking, 0.002 second. All traffic manage by 300 threads.
100000 x 8 = 800000/300 = 2666 insert for transactions x 0.002ms each = 5.3sec to run all the operations.
Now what if we apply the delay given we have History list quite above as say before?
I have to sum the 12ms to the 0.002 which will give us 0.122 sec, which means 325 seconds (5.4 minutes) for each thread!!!!
Do not forget the consolidation process, who needs to run each 10 minutes.
So it has to process (1 reads for user per minute x 100000 users) x 10 minutes, split in 10 threads doing it by user id set, assuming each read per user will take 0.001 ms (given already in memory) and 0.002 ms for write.
Without delay we will have = 1,000,000 / 10 = 100,000 x 0.003 = 300sec (5 minutes) for thread.
With delay it will be for each operation 0.122 = 12200 seconds (203 minutes).
Last but not least the consolidation will collide with the inserts, causing possible increase of the delay because the REPEATBLE_READ, and another possible issue is... the consolidation will cause pages to remain in a dirty state for too long, possibly causing serious issue in the REDO log in case of need to free space.
I did push some number a little bit but not too much and only to make the scenario more clear.
Now just to remove some possible doubt:
Are you sure that it really push it by ROW?
let us create a working test.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
CREATE TABLE `City_test2` ( `ID` int(11) NOT NULL DEFAULT '0', `Name` char(35) NOT NULL DEFAULT '', `CountryCode` char(3) NOT NULL DEFAULT '', `District` char(20) NOT NULL DEFAULT '', `Population` int(11) NOT NULL DEFAULT '0', `satisfaction_grade` char(50) DEFAULT NULL, `previous_satgrade` varchar(50) DEFAULT NULL, KEY `satisfaction_grade` (`satisfaction_grade`,`previous_satgrade`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 insert into City_test2 select * from city; mysql> truncate table City_test2;insert into City_test2 select * from city limit 1000; Query OK, 0 rows affected (0.24 sec) Query OK, 1000 rows affected (0.10 sec) Records: 1000 Duplicates: 0 Warnings: 0 |
So running the query we see the History going up and down but never to 0 as before.
but changing the innodb_max_purge_lag and setting it to a lower value then the history say history is 137, set the lag to 100
in theory there should be the following delay as for instructions:
((purge_lag/innodb_max_purge_lag)×10)–5 milliseconds.
result should be
((137/100)X10)-5=8.7 ms
1 2 3 4 5 6 7 8 |
mysql> set global innodb_max_purge_lag=100; Query OK, 0 rows affected (0.00 sec) mysql> truncate table City_test2;insert into City_test2 select * from city limit 1000; Query OK, 0 rows affected (0.10 sec) Query OK, 1000 rows affected (7.40 sec) Records: 1000 Duplicates: 0 Warnings: 0 |
Close enough from my point of view, and it increase at the increasing of the number of rows.
Changing :
innodb_purge_batch_size
innodb_rollback_segments
innodb_purge_threads
Will change how the purge will work but not the delay, innodb_max_purge_lag is the only possible parameter to use, and is not enough.
Conclusions
The new mechanism for the purge is much more flexible, and capable to scale. The fact that is now separate from the main thread reduce a lot the negative effects.
Never the less the capacity that the undo log purge have now, has also possible risks, huge pending flushes means possible huge space on disk and/or huge delay.
The delay behaviour is not new and it was already present MySQL 5.0, but the new capacity bring it to a higher level of risk, specially in 5.5.
In 5.6 the purging thread is much more efficient and I was having really hard time to do get huge history list, but when I get it I had same behaviour.
Whenever we need to tune the purge lag settings, the value needs to be set not in relation to the history list, but in relation to the maximum acceptable delay in the inserts as cumulative value.
References
http://www.mysqlperformanceblog.com/2011/01/12/innodb-undo-segment-siz-and-transaction-isolation/
http://www.mysqlperformanceblog.com/2010/06/10/reasons-for-run-away-main-innodb-tablespace/
http://dimitrik.free.fr/blog/archives/2010/04/mysql-performance-why-purge-thread-in-innodb.html
http://www.mysqlperformanceblog.com/2011/06/09/aligning-io-on-a-hard-disk-raid-the-benchmarks/
http://blogs.innodb.com/wp/2011/04/mysql-5-6-multi-threaded-purge/
http://mysqldump.azundris.com/feeds/categories/1-Strangeness.rss
{joscommentenable}