Sharding with ProxySQL
Recently a colleague of mine ask me to provide a simple example on how ProxySQL can perform sharding.
ProxySQL is a very powerful platform that allow us to manipulate and manage our connections and query in a simple but effective way.
In this article I will show you how.
{autotoc enabled=yes}
Before starting is better to clarify some basic concepts.
ProxySQL organize its internal set of servers in Host Groups (HG), each HG can be associate to users and to Query Rules (QR).
Each QR can be final (apply = 1) or can let ProxySQL continue to parse other QR.
A QR can be a rewrite action, or can be a simple match, it can have a specific target HG, or be generic, finally QR are defined using regex.
You can see QR as a sequence of filters and transformation that you can arrange as you like.
These simple basic rules give us enormous flexibility, and allow us to create very simple actions, like a simple query re-write or very complex chains that could see dozens of QR concatenated.
Documentation can be found here
The information related to HG or QR is easily accessible using the the ProxySQL administrator interface, in the tables mysql_servers, mysql_query_rules and stats.stats_mysql_query_rules; the last one allow us to evaluate if and how the rule(s) is used.
About sharding, what ProxySQL can do for us to achieve what we need in a (relatively easy) way?
Some people/company include sharding logic in the application, and use multiple connection to reach the different targets, or have some logic to split the load across several schemas/tables.
ProxySQL allow us to simplify the way connectivity and query distribution is suppose to work reading data in the query or accepting HINTS.
No matter which kind of requirements the sharding exercise can be summarize in few different categories.
- By split the data inside the same container (like having a shard by State where each State is a schema)
- By physical data location (this can have multiple mysql servers in the same room, as well as having them geographically distributed)
- Combination of the two, where I do split by state using a server dedicated and again split by schema/table by whatever (say by gender)
In the following examples I will show how to use ProxySQL to cover the three different scenario defined above and a bit more.
The example below will report text from the Admin ProxySQL interface and from MySQL console.I will mark each one as follow:
- Mc for MySQL console
- Pa for ProxySQL Admin
Please note that mysql console MUST use the -c flag to pass the comments in the query. This because the default behaviour, in mysql console, is to remove the comments.
I am going to illustrate procedures that you can replicate on your laptop.
This because I want you to test directly the ProxySQL functionalities.
For the example describe below I have used PrxySQL v1.2.2 that is going to become the master in few days.
You can download it from :
- git clone https://github.com/sysown/proxysql.git
- git checkout v1.2.2
- Then to compile :
- cd make
- make install
If you need full instructions on how to install and configure ProxySQL than read here and here
Finally you need to have the WORLD test db loaded, world test DB can be found here
First example/exercise is :
Shard inside the same MySQL Server using 3 different schemas split by continent.
Obviously you can have any number of shards and number of relative schemas.
What is relevant here is to demonstrate how traffic can be redirect to different targets (schemas) maintaining the same structure (tables).
This discriminating the target on the base of some relevant information in the Data or pass by the application.
Ok let us roll the ball.
Having :
[Mc] +---------------+-------------+ | Continent | count(Code) | +---------------+-------------+ | Asia | 51 | <-- | Europe | 46 | <-- | North America | 37 | | Africa | 58 | <-- | Oceania | 28 | | Antarctica | 5 | | South America | 14 | +---------------+-------------+
For this exercise you can use single host or multiple servers in replica.
Summarizing you will need:
- 3 hosts: 192.168.1.[5-6-7] (only one needed now but the others are for future use)
- 3 schemas: Continent X + world schema
- 1 user : user_shardRW
- 3 hostgroups: 10, 20, 30 (for future use)
We will create the following Schemas Asia, Africa, Europe first.
[Mc] CREATE schema [Asia|Europe|Africa]; CREATE TABLE Asia.City AS SELECT a.* FROM world.City a JOIN Country ON a.CountryCode = Country.code WHERE Continent='Asia' ; CREATE TABLE Europe.City AS SELECT a.* FROM world.City a JOIN Country ON a.CountryCode = Country.code WHERE Continent='Europe' ; CREATE TABLE Africa.City AS SELECT a.* FROM world.City a JOIN Country ON a.CountryCode = Country.code WHERE Continent='Africa' ; CREATE TABLE Asia.Country AS SELECT * FROM world.Country WHERE Continent='Asia' ; CREATE TABLE Europe.Country AS SELECT * FROM world.Country WHERE Continent='Europe' ; CREATE TABLE Africa.Country AS SELECT * FROM world.Country WHERE Continent='Africa' ;
Create the user:
[Mc] GRANT ALL ON *.* TO user_shardRW@'%' IDENTIFIED BY 'test';
Now let us start to configure the ProxySQL:
[Pa] INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema) VALUES ('user_shardRW','test',1,10,'test_shard1'); LOAD MYSQL USERS TO RUNTIME;SAVE MYSQL USERS TO DISK; INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.5',10,3306,100); INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.6',20,3306,100); INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.7',30,3306,100); LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
With this we have