
10 Apr 2017
InnoDB Page Merging and Page Splitting
Written by Marco Tusa   

If you were to meet one of the (few) MySQL consultants around the globe and ask them to review your queries and/or schemas, I am sure they would tell you something about the importance of good primary key design. Especially in the case of InnoDB, they would probably start to explain index merges and page splits. These two notions are closely related to performance, and you should take this relationship into consideration when designing any index (not just PKs).

That may sound like mumbo jumbo to you, and you may be right. This is not easy stuff, especially when talking about internals. This is not something you deal with on a regular basis, and often you don’t want to deal with it at all.

But sometimes it’s a necessity. If so, this article is for you.

In this article, I want to shed some light on some of the most obscure, behind-the-scenes operations in InnoDB: index page creation, page merging, and page splitting.

In InnoDB, all data is an index. You’ve probably heard that as well, right? But what exactly does that mean?

 

File-Table Components

Let's say you have MySQL installed, the latest 5.7 version (Percona Server for MySQL, right?), and you have a table named wmills in the schema windmills. In the data directory (normally /var/lib/mysql/) you will see that it contains:

 

data/  
windmills/
     wmills.ibd
     wmills.frm

 

This is because the parameter innodb_file_per_table has been set to 1 by default since MySQL 5.6. With that setting, each table in your schema is represented by one file (or many files, if the table is partitioned).

What is important here is that the physical container is a file named wmills.ibd. This file contains N segments, where each segment is associated with an index.

While a file’s size does not shrink with row deletions, a segment itself can grow or shrink in relation to a sub-element named an extent. An extent can only exist inside a segment and has a fixed size of 1MB (in the case of the default page size). A page is a sub-element of an extent and has a default size of 16KB.

Given that, an extent can contain a maximum of 64 pages. A page can contain two to N rows. The number of rows a page can contain is related to the size of the rows, as defined by your table schema. There is a rule within InnoDB that says, at minimum, two rows must fit into a page. Therefore, we have a row-size limit of 8000 bytes.
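
You can verify these numbers on your own server. A minimal check, assuming a default MySQL 5.6/5.7 configuration:

-- Page size: 16384 bytes (16KB) by default
SHOW GLOBAL VARIABLES LIKE 'innodb_page_size';

-- One file per table: ON by default since 5.6
SHOW GLOBAL VARIABLES LIKE 'innodb_file_per_table';

-- With 16KB pages, a 1MB extent holds 1048576 / 16384 = 64 pages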

If you think this sounds like Matryoshka dolls, you are right! An image might help:

[Image: segment_extent]

InnoDB uses B-trees to organize your data inside pages across extents, within segments.

Roots, Branches, and Leaves

Each leaf page contains 2 to N rows, organized by the primary key. The tree has special pages used to manage the different branches. These are known as internal nodes (INodes).

[Image: Bplustree]

This image is just an example, and is not indicative of the real-world output below.

Let’s see the details:

 

ROOT NODE #3: 4 records, 68 bytes
 NODE POINTER RECORD ≥ (id=2) → #197
 INTERNAL NODE #197: 464 records, 7888 bytes
 NODE POINTER RECORD ≥ (id=2) → #5
 LEAF NODE #5: 57 records, 7524 bytes
 RECORD: (id=2)(uuid="884e471c-0e82-11e7-8bf6-08002734ed50", millid=139, kwatts_s=1956, date="2017-05-01", lo

 

 

Below is the table structure:

 

CREATE TABLE `wmills` (
  `id` bigint(11) NOT NULL AUTO_INCREMENT,
  `uuid` char(36) COLLATE utf8_bin NOT NULL,
  `millid` smallint(6) NOT NULL,
  `kwatts_s` int(11) NOT NULL,
  `date` date NOT NULL,
  `location` varchar(50) COLLATE utf8_bin DEFAULT NULL,
  `active` tinyint(2) NOT NULL DEFAULT '1',
  `time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `strrecordtype` char(3) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`),
  KEY `IDX_millid` (`millid`)
) ENGINE=InnoDB;

 

All B-trees have a point of entry known as the root node; we've identified it here as page #3. The root page contains information such as the index ID, the number of INodes, etc. INode pages contain information about the pages themselves, their value ranges, etc. Finally, we have the leaf nodes, which are where we find our data. In this example, we can see that leaf node #5 has 57 records for a total of 7524 bytes. Below that line is a record, where you can see the row data.

The concept here is that while you organize your data in tables and rows, InnoDB organizes it in branches, pages, and records. It is very important to keep in mind that InnoDB does not work on a single-row basis: it always operates on pages. Once a page is loaded, it scans the page for the requested row/record.
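
You can observe this page orientation from SQL as well. As a sketch (note that querying INNODB_BUFFER_PAGE can be expensive on a large buffer pool, so avoid it on busy production servers), the following shows how many 16KB pages of each index of our table are currently loaded in the buffer pool:

SELECT TABLE_NAME, INDEX_NAME, COUNT(*) AS pages_in_buffer
FROM INFORMATION_SCHEMA.INNODB_BUFFER_PAGE
WHERE TABLE_NAME LIKE '%wmills%'
GROUP BY TABLE_NAME, INDEX_NAME;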

Is that clear up to now? Good. Let’s continue.

Page Internals

A page can be empty or filled up to 100%. Row records are organized by PK. For example, if your table uses an AUTO_INCREMENT PK, you will have the sequence ID = 1, 2, 3, 4, etc.

[Image: Locality_1]

A page also has another important attribute: MERGE_THRESHOLD. The default value of this parameter is 50% of the page, and it plays a very important role in InnoDB merge activity:

[Image: Locality_2]
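
Since MySQL 5.7, MERGE_THRESHOLD can also be lowered from its 50% default, per table or per index, through the COMMENT clause. A sketch (the 45/40 values are just examples):

-- Set a 45% merge threshold for the table and all its indexes
ALTER TABLE windmills.wmills COMMENT='MERGE_THRESHOLD=45';

-- Or set it for a single index only
ALTER TABLE windmills.wmills
  DROP KEY IDX_millid,
  ADD KEY IDX_millid (millid) COMMENT 'MERGE_THRESHOLD=40';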

While you insert data, the page is filled up sequentially if the incoming record can be accommodated inside the page.

When a page is full, the next record will be inserted into the NEXT page:

[Image: Locality_4]

Given the nature of B-trees, the structure is browsable not only top-down following the branches, but also horizontally across the leaf nodes. This is because each leaf node page has a pointer to the page that contains the NEXT record value in the sequence.

For example, Page #5 has a reference to the next page, Page #6. Page #6 has a backward reference to the previous page (Page #5) and a forward reference to the next page (Page #7).

This mechanism of a linked list allows for fast, in-order scans (i.e., Range Scans). As mentioned before, this is what happens when you are inserting and have a PK based on AUTO_INCREMENT. But what happens if I start to delete values?

 

Page Merging

When you delete a record, the record is not physically deleted. Instead, InnoDB flags the record as deleted, and the space it used becomes reclaimable.

[Image: Locality_3]

When a page has received enough deletes to fall below the MERGE_THRESHOLD (50% of the page size by default), InnoDB starts to look at the closest pages (NEXT and PREVIOUS) to see if there is any chance to optimize space utilization by merging the two pages.

[Image: Locality_4]

In this example, Page #6 is utilizing less than half of its space. Page #5 received many deletes and is also now less than 50% used. From InnoDB’s perspective, they are mergeable:

[Image: Locality_5]

The merge operation results in Page #5 containing its previous data plus the data from Page #6. Page #6 becomes an empty page, usable for new data.

[Image: Locality_6]

The same process also happens when we update a record and the size of the new record brings the page below the threshold.

The rule is: Merges happen on delete and update operations involving close linked pages.

If a merge operation is successful, the index_page_merge_successful metric in INFORMATION_SCHEMA.INNODB_METRICS is incremented.
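
These counters belong to the INNODB_METRICS index module, which is disabled by default. A quick way to check them, assuming you can enable the monitor:

-- Enable the index page merge/split counters (disabled by default)
SET GLOBAL innodb_monitor_enable = 'module_index';

-- Compare merge attempts against successful merges
SELECT NAME, COUNT
FROM INFORMATION_SCHEMA.INNODB_METRICS
WHERE NAME LIKE 'index_page_merge%';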

 

Page Splits

As mentioned above, a page can be filled up to 100%. When this happens, the next page takes new records.

But what if we have the following situation?

[Image: Locality_7]

Page #10 doesn’t have enough space to accommodate the new (or updated) record. Following the next page logic, the record should go on Page #11. However:

[Image: Locality_9]

Page #11 is also full, and data cannot be inserted out of order. So what can be done?

Remember the linked list we spoke about? At this moment Page #10 has Prev=9 and Next=11.

What InnoDB will do is (simplifying):

  1. Create a new page
  2. Identify where the original page (Page #10) can be split (at the record level)
  3. Move records
  4. Redefine the page relationships

[Image: Locality_8]

A new Page #12 is created:

[Image: Locality_10]

Page #11 stays as it is. The thing that changes is the relationship between the pages:

  • Page #10 will have Prev=9 and Next=12
  • Page #12 Prev=10 and Next=11
  • Page #11 Prev=12 and Next=13

The B-tree path still sees consistency, since it follows the logical organization. Physically, however, the page is now located out of order, and in most cases in a different extent.

As a rule we can say: page splits happen on inserts or updates, and cause page dislocation (in many cases onto different extents).

InnoDB tracks the number of page splits in INFORMATION_SCHEMA.INNODB_METRICS. Look for index_page_splits and index_page_reorg_attempts/successful metrics.

Once the split page is created, the only way to move back is for the new page to drop below the merge threshold. When that happens, InnoDB moves the data from the split page with a merge operation.

The other way is to reorganize the data with OPTIMIZE TABLE. This can be a very heavy and long process, but it is often the only way to recover from a situation where too many pages are located in sparse extents.
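
For InnoDB, OPTIMIZE TABLE maps to a table rebuild plus an index statistics update, so expect it to rewrite the whole table:

OPTIMIZE TABLE windmills.wmills;
-- InnoDB responds with:
-- "Table does not support optimize, doing recreate + analyze instead"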

Another aspect to keep in mind is that during merge and split operations, InnoDB acquires an x-latch on the index tree. On a busy system, this can easily become a source of concern, causing index latch contention. If no merges or splits are needed (i.e., the write touches only a single page), this is called an “optimistic” update in InnoDB, and the latch is only taken in S (shared) mode. Merges and splits are called “pessimistic” updates, and take the latch in X (exclusive) mode.

 

My Primary Key

A good Primary Key (PK) is not only important for retrieving data, but also for correctly distributing the data across the extents while writing (which is also relevant for split and merge operations).

In the first case, I have a simple auto-increment. In the second, my PK is based on an ID (1-200 range) plus an auto-increment value. In the third, I have the same ID (1-200 range) but associated with a UUID.
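
The post does not show the exact test schemas; as a hypothetical sketch, the three PK designs could look like this:

-- Case 1: plain auto-increment (sequential inserts, compact pages)
CREATE TABLE wmills_ai (
  id     BIGINT   NOT NULL AUTO_INCREMENT,
  millid SMALLINT NOT NULL,   -- 1-200 range
  uuid   CHAR(36) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Case 2: ID (1-200) plus auto-increment (mostly sequential within each ID)
CREATE TABLE wmills_id_ai (
  id     BIGINT   NOT NULL AUTO_INCREMENT,
  millid SMALLINT NOT NULL,
  uuid   CHAR(36) NOT NULL,
  PRIMARY KEY (millid, id),
  KEY (id)                    -- the AUTO_INCREMENT column must stay in a key
) ENGINE=InnoDB;

-- Case 3: ID (1-200) plus a UUID (semi-random inserts, sparse pages, more splits)
CREATE TABLE wmills_id_uuid (
  millid SMALLINT NOT NULL,
  uuid   CHAR(36) NOT NULL,
  PRIMARY KEY (millid, uuid)
) ENGINE=InnoDB;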

When inserting, InnoDB must add pages. This shows up as a SPLIT operation:

[Image: Page Merging and Page Splitting]

The behavior is quite different depending on the kind of Primary Key I use.

The first two cases have a more “compact” data distribution. This means they will also have better space utilization, while the semi-random nature of the UUID causes a significantly “sparse” page distribution (resulting in a higher number of pages and related split operations).

In the case of merges, the number of merge attempts differs even more by PK type.

[Image: merges_1]

On insert/update/delete operations, auto-increment has fewer page merge attempts and a lower success ratio (9.45%) than the other two types. The UUID-based PK (on the other side of the spectrum) has a higher number of merge attempts, but at the same time also a significantly higher success ratio, at 22.34%, given that the “sparse” distribution left many pages partially empty.

Similar numbers also come from the secondary index.

Conclusion

MySQL/InnoDB constantly performs these operations, and you have very limited visibility of them. But they can bite you, and bite hard, especially if you are using spinning disks rather than SSDs (which have different issues, by the way).

The sad story is that there is very little we can do to optimize this on the server side with parameters or some other magic. But the good news is there is A LOT that can be done at design time.

Use a proper Primary Key and design your secondary indexes keeping in mind that you shouldn’t abuse them. Plan proper maintenance windows on the tables that you know will have very high levels of inserts/deletes/updates.

This is an important point to keep in mind. In InnoDB you cannot have fragmented records, but you can have a nightmare at the page/extent level. Ignoring table maintenance will cause extra work at the I/O level, in memory, and in the InnoDB buffer pool.

You must rebuild some tables at regular intervals. Use whatever tricks it requires, including partitioning and external tools (pt-osc). Do not let a table become gigantic and fully fragmented.

Wasting disk space? Needing to load three pages instead of one to retrieve the record set you need? Each search causing significantly more reads?
That’s your fault; there is no excuse for being sloppy!

Happy MySQL to everyone!

 

Acknowledgments

Laurynas Biveinis: who had the time and patience to explain some internals to me.

Jeremy Cole: for his project InnoDB_ruby (that I use constantly).

Last Updated on Monday, 10 April 2017 22:57
 
15 Jan 2017
Setup ProxySQL as High Available (and not a SPOF)
Written by Marco Tusa   

During the last few months we have had many opportunities to present and discuss ProxySQL, a very powerful tool that will become more and more widely used in architectures supporting MySQL.

ProxySQL is becoming more flexible, solid, performant, and widely adopted every day (http://www.proxysql.com/ and the recent http://www.proxysql.com/compare).

 

That is it: the tool is a winner when compared with similar ones, and we all need a clear(er) idea of how to integrate it into our architectures in order to achieve the best results.

 

The first thing to keep in mind is that ProxySQL does not natively support any high availability solution. In short, we can set up a cluster of MySQL servers and achieve four or even five nines of HA, but if we include ProxySQL as-is, as a single block, our HA will include a single point of failure (SPOF) that can drag us down in case of a crash.

 

To solve this, the most common solution so far has been to set up ProxySQL as part of a tile architecture, where application and ProxySQL are deployed together.

[Image: tileProxy]

 

This is a good solution for some cases, and it certainly reduces network hops, but it may be less than practical when our architecture has a very high number of tiles.
Say 100 or 400 application servers, not so unusual nowadays.
In that case, managing the ProxySQL instances will be challenging; but even more problematic is the fact that each ProxySQL must perform several checks against the destination servers (MySQL), and with 400 instances of ProxySQL we will end up keeping our databases busy just with the checks.

In short... it is not a smart move.

 

Another possible approach used so far is to have two layers of ProxySQL: one close to the application, and another in the middle that finally connects to the database.

I personally don't like this approach for many reasons, but the most relevant are that it creates additional complexity in the management of the platform, and it adds network hops.

[Image: ProxyCascade]

 

So what can be done?

I like to have things simple. I love the KISS principle, and because I am lazy I love to reuse the wheel instead of re-inventing things that someone else has already invented.

Last thing: I like my customers not to depend on me or any other colleague. Once I am done and gone, they must be able to manage, understand, and fix their things by themselves.

 

Anyhow, as I said, I like simple things. So my point here is the following. Excluding:

  • the cases where a tile (application/ProxySQL) makes sense;
  • cloud deployments where tools like ELB (Elastic Load Balancer) exist;
  • architectures that already include a balancer;

what can I use for the remaining cases?

The answer comes from combining existing blocks: Keepalived + ProxySQL + MySQL.

[Image: keepalived_logo]

For an explanation of Keepalived, visit http://www.keepalived.org/.

Short description
"Keepalived is a routing software written in C. The main goal of this project is to provide simple and robust facilities for loadbalancing and high-availability to Linux system and Linux based infrastructures. Loadbalancing framework relies on well-known and widely used Linux Virtual Server (IPVS) kernel module providing Layer4 loadbalancing. Keepalived implements a set of checkers to dynamically and adaptively maintain and manage loadbalanced server pool according their health. On the other hand high-availability is achieved by VRRP protocol. VRRP is a fundamental brick for router failover. In addition, Keepalived implements a set of hooks to the VRRP finite state machine providing low-level and high-speed protocol interactions. Keepalived frameworks can be used independently or all together to provide resilient infrastructures."

Bingo! This is exactly what we need for our ProxySQL setup.

Below I will show how to set up:

  • a simple solution based on a single VIP;
  • a more complex solution using multiple VIPs;
  • an even more complex solution using virtual VIPs and virtual servers.

Just remember that what we want to achieve is to prevent ProxySQL from becoming a SPOF, that's it.

While achieving that, we need to reduce the network hops as much as possible and keep the solution SIMPLE.

 

Another important concept to keep in mind is that a ProxySQL (re)start takes place in less than a second.

This means that if it crashes and can be restarted by its angel process, having it do so and recover the service is much more efficient than having any kind of failover mechanism take place.

As such, whenever you plan your solution, keep the ~1 second ProxySQL restart time in mind as a baseline.

 

Ready?

Let's go.

Setup

Choose three machines that will host the combination of Keepalived and ProxySQL.

In the following example I will use three machines for ProxySQL and Keepalived and three hosting PXC, but you can place Keepalived+ProxySQL wherever you like, even on the same PXC boxes.

For the following examples we will have:

PXC
node1 192.168.0.5 galera1h1n5
node2 192.168.0.21 galera2h2n21
node3 192.168.0.231 galera1h3n31
 
ProxySQL-Keepalived
test1 192.168.0.11
test2 192.168.0.12
test3 192.168.0.235
 
VIP 192.168.0.88 /89/90
 

 

 

For the checks I will use this table; please create it on your MySQL server:

DROP TABLE IF EXISTS test.`testtable2`;
CREATE TABLE test.`testtable2` (
  `autoInc` bigint(11) NOT NULL AUTO_INCREMENT,
  `a` varchar(100) COLLATE utf8_bin NOT NULL,
  `b` varchar(100) COLLATE utf8_bin NOT NULL,
  `host` varchar(100) COLLATE utf8_bin NOT NULL,
  `userhost` varchar(100) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`autoInc`)
) ENGINE=InnoDB ROW_FORMAT=DYNAMIC;

 

 

And use this bash TEST command later:

while [ 1 ]; do
  export mydate=$(date +'%Y-%m-%d %H:%M:%S.%6N')
  mysql --defaults-extra-file=./my.cnf -h 192.168.0.88 -P 3311 --skip-column-names \
    -b -e "BEGIN;set @userHost='a';select concat(user,'_', host) into @userHost from information_schema.processlist where user = 'load_RW' limit 1;insert into test.testtable2 values(NULL,'$mydate',SYSDATE(6),@@hostname,@userHost);commit;select * from test.testtable2 order by 1 DESC limit 1"
  sleep 1
done

 

  1. Install ProxySQL (refer to https://github.com/sysown/proxysql/wiki#installation)
  2. Install Keepalived (yum install keepalived; apt-get install keepalived)
  3. Setup ProxySQL users and servers

Once you have your ProxySQL up (run the same on all ProxySQL nodes, it is much simpler), connect to the Admin interface and:

 

DELETE FROM mysql_replication_hostgroups WHERE writer_hostgroup=500 ;
DELETE FROM mysql_servers WHERE hostgroup_id IN (500,501);
 
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.0.5',500,3306,1000000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.0.5',501,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.0.21',500,3306,1000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.0.21',501,3306,1000000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.0.231',500,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.0.231',501,3306,1000000000);
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
 
DELETE FROM mysql_users WHERE username='load_RW';
INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) VALUES ('load_RW','test',1,500,'test',1);
LOAD MYSQL USERS TO RUNTIME;SAVE MYSQL USERS TO DISK;
 
DELETE FROM mysql_query_rules WHERE rule_id IN (200,201);
INSERT INTO mysql_query_rules (rule_id,username,destination_hostgroup,active,retries,match_digest,apply) VALUES(200,'load_RW',500,1,3,'^SELECT.*FOR UPDATE',1);
INSERT INTO mysql_query_rules (rule_id,username,destination_hostgroup,active,retries,match_digest,apply) VALUES(201,'load_RW',501,1,3,'^SELECT ',1); 
 
LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;
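
Optionally, verify that the runtime configuration matches what you just loaded (the runtime_* tables show what ProxySQL is actually using):

SELECT hostgroup_id, hostname, port, status, weight FROM runtime_mysql_servers;
SELECT rule_id, active, match_digest, destination_hostgroup FROM runtime_mysql_query_rules;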

 

 

Create a my.cnf file in your default dir with:

[mysql]
user=load_RW
password=test

 

Simple setup using a single VIP, 3 ProxySQL and 3 Galera nodes

[Image: proxy_keep_single]

 

First, set up the Keepalived configuration file (/etc/keepalived/keepalived.conf):

global_defs {
  # Keepalived process identifier
  lvs_id proxy_HA
}
# Script used to check if Proxy is running
vrrp_script check_proxy {
  script "killall -0 proxysql"
  interval 2
  weight 2
}
# Virtual interface
# The priority specifies the order in which nodes take over the VIP in a failover
vrrp_instance VI_01 {
  state MASTER 
  interface em1
  virtual_router_id 51
  priority <calculate on the WEIGHT for each node>
 
  # The virtual ip address shared between the two loadbalancers
  virtual_ipaddress {
    192.168.0.88 dev em1
  }
  track_script {
    check_proxy
  }
}
 

 

 

Given the above, and given that I want test1 to be the main node, the priorities will be set as:

 

test1 = 101
test2 = 100
test3 = 99 

 

 

Modify the config on each node with the values above and (re)start Keepalived.

If all is set correctly, you will see the following in the system log of the TEST1 machine:

 

Jan 10 17:56:56 mysqlt1 systemd: Started LVS and VRRP High Availability Monitor.
Jan 10 17:56:56 mysqlt1 Keepalived_healthcheckers[6183]: Configuration is using : 6436 Bytes
Jan 10 17:56:56 mysqlt1 Keepalived_healthcheckers[6183]: Using LinkWatch kernel netlink reflector...
Jan 10 17:56:56 mysqlt1 Keepalived_vrrp[6184]: Configuration is using : 63090 Bytes
Jan 10 17:56:56 mysqlt1 Keepalived_vrrp[6184]: Using LinkWatch kernel netlink reflector...
Jan 10 17:56:56 mysqlt1 Keepalived_vrrp[6184]: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)]
Jan 10 17:56:56 mysqlt1 Keepalived_vrrp[6184]: VRRP_Script(check_proxy) succeeded
Jan 10 17:56:57 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) Transition to MASTER STATE
Jan 10 17:56:57 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) Received lower prio advert, forcing new election
Jan 10 17:56:57 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) Received higher prio advert
Jan 10 17:56:57 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) Entering BACKUP STATE
Jan 10 17:56:58 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) forcing a new MASTER election
...
Jan 10 17:57:00 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) Transition to MASTER STATE
Jan 10 17:57:01 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) Entering MASTER STATE <-- MASTER
Jan 10 17:57:01 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) setting protocol VIPs.
Jan 10 17:57:01 mysqlt1 Keepalived_healthcheckers[6183]: Netlink reflector reports IP 192.168.0.88 added
Jan 10 17:57:01 mysqlt1 avahi-daemon[937]: Registering new address record for 192.168.0.88 on em1.IPv4.
Jan 10 17:57:01 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) Sending gratuitous ARPs on em1 for 192.168.0.88
 

 

 

While in the other two:

 

Jan 10 17:56:59 mysqlt2 Keepalived_vrrp[13107]: VRRP_Instance(VI_01) Entering BACKUP STATE <--- 

 

Which means the node is there as... :D backup.

 

Now it is time to test our connection to the ProxySQL pool, from an application node or just from your laptop.

Open three terminals and in each one run:

 

 watch -n 1 'mysql -h <IP OF THE REAL PROXY (test1|test2|test3)> -P 3310 -uadmin -padmin -t -e "select * from stats_mysql_connection_pool where hostgroup in (500,501,9500,9501) order by hostgroup,srv_host ;" -e " select srv_host,command,avg(time_ms), count(ThreadID) from stats_mysql_processlist group by srv_host,command;" -e "select * from stats_mysql_commands_counters where  Total_Time_us > 0;"'

 

 

You will see that, unless you are already sending queries to the proxies, they are just doing nothing.
Time to start the test bash command indicated above.
If everything is working correctly, you will see it reporting output like this:

 

+----+----------------------------+----------------------------+-------------+----------------------------+
| 49 | 2017-01-10 18:12:07.739152 | 2017-01-10 18:12:07.733282 | galera1h1n5 | load_RW_192.168.0.11:33273 |
+----+----------------------------+----------------------------+-------------+----------------------------+
  (columns: ID, execution time in the bash, execution time inside MySQL, node hostname, user and the host the connection came from)

 

 

The other three terminals (running the watch commands) will show that ONLY the ProxySQL on TEST1 is currently getting/serving requests, because it is the one holding the VIP.

Like:

 

+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host      | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500       | 192.168.0.21  | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 629        |
| 500       | 192.168.0.231 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 510        |
| 500       | 192.168.0.5   | 3306     | ONLINE | 0        | 0        | 3      | 0       | 18      | 882             | 303             | 502        |
| 501       | 192.168.0.21  | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 629        |
| 501       | 192.168.0.231 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 510        |
| 501       | 192.168.0.5   | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 502        |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
+---------+---------------+-----------+-----------+-----------+---------+---------+----------+----------+-----------+-----------+--------+--------+---------+----------+
| Command | Total_Time_us | Total_cnt | cnt_100us | cnt_500us | cnt_1ms | cnt_5ms | cnt_10ms | cnt_50ms | cnt_100ms | cnt_500ms | cnt_1s | cnt_5s | cnt_10s | cnt_INFs |
+---------+---------------+-----------+-----------+-----------+---------+---------+----------+----------+-----------+-----------+--------+--------+---------+----------+
| BEGIN   | 9051          | 3         | 0         | 0         | 0       | 3       | 0        | 0        | 0         | 0         | 0      | 0      | 0       | 0        |
| COMMIT  | 47853         | 3         | 0         | 0         | 0       | 0       | 0        | 3        | 0         | 0         | 0      | 0      | 0       | 0        |
| INSERT  | 3032          | 3         | 0         | 0         | 1       | 2       | 0        | 0        | 0         | 0         | 0      | 0      | 0       | 0        |
| SELECT  | 8216          | 9         | 3         | 0         | 3       | 3       | 0        | 0        | 0         | 0         | 0      | 0      | 0       | 0        |
| SET     | 2154          | 3         | 0         | 0         | 3       | 0       | 0        | 0        | 0         | 0         | 0      | 0      | 0       | 0        |
+---------+---------------+-----------+-----------+-----------+---------+---------+----------+----------+-----------+-----------+--------+--------+---------+----------+

 

 

So, nothing special, all as expected.

Time to see if failover/failback works along the chain.

Let us kill ProxySQL on TEST1 while the test bash command is running.

 

killall -9 proxysql 

 

 

Here is what you will get:

 

+----+----------------------------+----------------------------+-------------+----------------------------+
| 91 | 2017-01-10 18:19:06.188233 | 2017-01-10 18:19:06.183327 | galera1h1n5 | load_RW_192.168.0.11:33964 |
+----+----------------------------+----------------------------+-------------+----------------------------+
ERROR 2003 (HY000): Can't connect to MySQL server on '192.168.0.88' (111)
+----+----------------------------+----------------------------+-------------+----------------------------+
| 94 | 2017-01-10 18:19:08.250093 | 2017-01-10 18:19:11.250927 | galera1h1n5 | load_RW_192.168.0.12:39635 | <-- note 
+----+----------------------------+----------------------------+-------------+----------------------------+
 

 

 

The source has changed, but not the PXC node.

If you check the system log for TEST1:

 

Jan 10 18:19:06 mysqlt1 Keepalived_vrrp[6184]: VRRP_Script(check_proxy) failed
Jan 10 18:19:07 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) Received higher prio advert
Jan 10 18:19:07 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) Entering BACKUP STATE
Jan 10 18:19:07 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) removing protocol VIPs.
Jan 10 18:19:07 mysqlt1 Keepalived_healthcheckers[6183]: Netlink reflector reports IP 192.168.0.88 removed

 

 

 

While on TEST2:

 

Jan 10 18:19:08 mysqlt2 Keepalived_vrrp[13107]: VRRP_Instance(VI_01) Transition to MASTER STATE
Jan 10 18:19:09 mysqlt2 Keepalived_vrrp[13107]: VRRP_Instance(VI_01) Entering MASTER STATE
Jan 10 18:19:09 mysqlt2 Keepalived_vrrp[13107]: VRRP_Instance(VI_01) setting protocol VIPs.
Jan 10 18:19:09 mysqlt2 Keepalived_healthcheckers[13106]: Netlink reflector reports IP 192.168.0.88 added
Jan 10 18:19:09 mysqlt2 Keepalived_vrrp[13107]: VRRP_Instance(VI_01) Sending gratuitous ARPs on em1 for 192.168.0.88

 

 

Simple... and elegant. No need to re-invent the wheel, and it works smoothly.


The total recovery time after the ProxySQL crash was 5.06 seconds, considering the wider window (last application write before the crash and first write after recovery in PXC: 2017-01-10 18:19:06.188233 | 2017-01-10 18:19:11.250927).

That is the worst-case scenario, keeping in mind that we run the ProxySQL check every 2 seconds (real recovery window: 5 - 2 = 3 sec).

 

OK, what about failback?

Let us restart the proxysql service:

 

/etc/init.d/proxysql start (or systemctl)

 

 

Here is the output:

 

+-----+----------------------------+----------------------------+-------------+----------------------------+
| 403 | 2017-01-10 18:29:34.550304 | 2017-01-10 18:29:34.545970 | galera1h1n5 | load_RW_192.168.0.12:40330 |
+-----+----------------------------+----------------------------+-------------+----------------------------+
+-----+----------------------------+----------------------------+-------------+----------------------------+
| 406 | 2017-01-10 18:29:35.597984 | 2017-01-10 18:29:38.599496 | galera1h1n5 | load_RW_192.168.0.11:34640 |
+-----+----------------------------+----------------------------+-------------+----------------------------+

 

 

Worst-case recovery time = 4.04 seconds, of which 2 are delay due to the check interval.

 

Of course, the test runs every second and performs a single operation; as such, the impact is minimal (no errors during failback) and the recovery appears longer.

But I think I have made the concept clear.

Let's check another thing... does the failover work as expected? Test1 -> 2 -> 3??

 

Let us kill test1 and then test2, and see:

 

Kill Test1 :
+-----+----------------------------+----------------------------+-------------+----------------------------+
| 448 | 2017-01-10 18:35:43.092878 | 2017-01-10 18:35:43.086484 | galera1h1n5 | load_RW_192.168.0.11:35240 |
+-----+----------------------------+----------------------------+-------------+----------------------------+
+-----+----------------------------+----------------------------+-------------+----------------------------+
| 451 | 2017-01-10 18:35:47.188307 | 2017-01-10 18:35:50.191465 | galera1h1n5 | load_RW_192.168.0.12:40935 |
+-----+----------------------------+----------------------------+-------------+----------------------------+
...
Kill Test2
+-----+----------------------------+----------------------------+-------------+----------------------------+
| 463 | 2017-01-10 18:35:54.379280 | 2017-01-10 18:35:54.373331 | galera1h1n5 | load_RW_192.168.0.12:40948 |
+-----+----------------------------+----------------------------+-------------+----------------------------+
+-----+----------------------------+----------------------------+-------------+-----------------------------+
| 466 | 2017-01-10 18:36:08.603754 | 2017-01-10 18:36:09.602075 | galera1h1n5 | load_RW_192.168.0.235:33268 |
+-----+----------------------------+----------------------------+-------------+-----------------------------+

 

This image shows where you should be at the end:

[Image: proxy_keep_single_failover]

 

In this case, given that I performed one kill immediately after the other, Keepalived took a bit longer to fail over, but it still did so correctly, following the planned chain.

Fail-back as smooth as usual:

+-----+----------------------------+----------------------------+-------------+-----------------------------+
| 502 | 2017-01-10 18:39:18.749687 | 2017-01-10 18:39:18.749688 | galera1h1n5 | load_RW_192.168.0.235:33738 |
+-----+----------------------------+----------------------------+-------------+-----------------------------+
+-----+----------------------------+----------------------------+-------------+----------------------------+
| 505 | 2017-01-10 18:39:19.794888 | 2017-01-10 18:39:22.800800 | galera1h1n5 | load_RW_192.168.0.11:35476 |
+-----+----------------------------+----------------------------+-------------+----------------------------+

 

Now let us see another case.

The case above is nice and simple, but it has a caveat: I can access only one ProxySQL at a time, which may or may not be good.

In any case, it may be nice to have the possibility to choose, and with Keepalived you can.

We can actually set up X number of VIPs and associate one with each test box.

The result will be that each server hosting ProxySQL will also host a VIP, and each will eventually be able to fail over to any of the other two servers.

[Image: proxy_keep_multiple]

 

Failover/failback will be fully managed by Keepalived, checking as before whether ProxySQL is running.
An example configuration for one node is below:

 

global_defs {
  # Keepalived process identifier
  lvs_id proxy_HA
}
# Script used to check if Proxy is running
vrrp_script check_proxy {
  script "killall -0 proxysql"
  interval 2
  weight 3
}
 
# Virtual interface 1
# The priority specifies the order in which nodes take over the VIP in a failover
vrrp_instance VI_01 {
  state MASTER
  interface em1
  virtual_router_id 51
  priority 102
 
  # The virtual ip address shared between the two loadbalancers
  virtual_ipaddress {
    192.168.0.88 dev em1
  }
  track_script {
    check_proxy
  }
}
 
# Virtual interface 2
# The priority specifies the order in which nodes take over the VIP in a failover
vrrp_instance VI_02 {
  state MASTER
  interface em1
  virtual_router_id 52
  priority 100
 
  # The virtual ip address shared between the two loadbalancers
  virtual_ipaddress {
    192.168.0.89 dev em1
  }
  track_script {
    check_proxy
  }
}
 
# Virtual interface 3
# The priority specifies the order in which nodes take over the VIP in a failover
vrrp_instance VI_03 {
  state MASTER
  interface em1
  virtual_router_id 53
  priority 99
 
  # The virtual ip address shared between the two loadbalancers
  virtual_ipaddress {
    192.168.0.90 dev em1
  }
  track_script {
    check_proxy
  }
}

 

 

The tricky part in this case is to play with the PRIORITY of each VIP on each server, so that you do NOT assign the same IP twice.
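
For example, a possible priority matrix across the three servers (hypothetical values, keeping a unique highest priority per VIP) could be:

         VIP1 (.88)   VIP2 (.89)   VIP3 (.90)
test1    102          100          99     <-- owns .88
test2    100          102          100    <-- owns .89
test3    99           99           102    <-- owns .90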

The whole set of configs can be found here.

 

Performing the check with the test bash command as above, we have:

 

Test 1 crash
+-----+----------------------------+----------------------------+-------------+----------------------------+
| 422 | 2017-01-11 18:30:14.411668 | 2017-01-11 18:30:14.344009 | galera1h1n5 | load_RW_192.168.0.11:55962 |
+-----+----------------------------+----------------------------+-------------+----------------------------+
ERROR 2003 (HY000): Can't connect to MySQL server on '192.168.0.88' (111)
+-----+----------------------------+----------------------------+-------------+----------------------------+
| 426 | 2017-01-11 18:30:18.531279 | 2017-01-11 18:30:21.473536 | galera1h1n5 | load_RW_192.168.0.12:49728 | <-- new server
+-----+----------------------------+----------------------------+-------------+----------------------------+
....
Test 2 crash
+-----+----------------------------+----------------------------+-------------+----------------------------+
| 450 | 2017-01-11 18:30:27.885213 | 2017-01-11 18:30:27.819432 | galera1h1n5 | load_RW_192.168.0.12:49745 |
+-----+----------------------------+----------------------------+-------------+----------------------------+
ERROR 2003 (HY000): Can't connect to MySQL server on '192.168.0.88' (111)
+-----+----------------------------+----------------------------+-------------+-----------------------------+
| 454 | 2017-01-11 18:30:30.971708 | 2017-01-11 18:30:37.916263 | galera1h1n5 | load_RW_192.168.0.235:33336 | <-- new server
+-----+----------------------------+----------------------------+-------------+-----------------------------+
 

 

 

Final state of IPs on Test3:

 

enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:c2:16:3f brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.235/24 brd 192.168.0.255 scope global enp0s8   <-- Real IP
       valid_lft forever preferred_lft forever
    inet 192.168.0.90/32 scope global enp0s8    <--- VIP 3
       valid_lft forever preferred_lft forever
    inet 192.168.0.89/32 scope global enp0s8    <--- VIP 2
       valid_lft forever preferred_lft forever
    inet 192.168.0.88/32 scope global enp0s8    <--- VIP 1
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fec2:163f/64 scope link 
       valid_lft forever preferred_lft forever
 

 

And this is the image:

[Image: proxy_keep_multiple_full_failover]

 

 

Recovery times:

 

  test 1 crash = 7.06 sec (worst-case scenario)
  test 2 crash = 10.03 sec (worst-case scenario)

 

 

Conclusions

In this example I just used a check that tests for the process, but a check can be anything that returns 0|1; the limit is defined only by what you need.

The failover times can be significantly shorter if you reduce the check interval and consider only the time taken to move the VIP. I preferred to show the worst-case scenario, with an application running at a one-second interval, but that is a pessimistic view of what normally happens with real traffic.

I was looking for a simple, simple, simple way to add HA to ProxySQL, something that can be easily integrated with automation and that is also well established and maintained.

In my opinion, using Keepalived is a good solution because it matches all of the above expectations.

Implementing a set of ProxySQL instances and having Keepalived manage the failover between them is pretty easy, but you can expand the usage (and the complexity) if needed, counting on tools that are already part of the Linux stack; there is no need to re-invent the wheel with crazy mechanisms.

If you want to have fun doing crazy things... at least start from something that helps you go beyond the basics.

For instance, I was also playing a bit with Keepalived and virtual servers, creating sets of redundant ProxySQL instances with load balancers and... but that is another story (blog).

 

Great MySQL & ProxySQL to all!

Last Updated on Saturday, 14 January 2017 21:47
 