In the last year we had heard a lot of talking around ProxySQL and mirroring features. Frankly speaking the need to have a “mirroring” tool is something that comes up over and over. Question is “why so”?
The answer is that Mirroring is not an easy task, aside simplistic approaches, if we think carefully about it, we can easily realize that mirroring is not just pushing data from A to B.
When you look at yourself in in the mirror your image is not partial, you don’t miss an eye or part of the nose. What you see is the reflection of the whole face. If you smile or move your head, you will see the same behavior in the mirror, and if not, you will look for a poltergeist.
This is it, mirroring is not only reporting the image back, but also the movements, in some way the behavior.
If you smile a specific set of muscle will contract and other release to allow you that specific expression. A mirror will not do that, but it will mimic your face reflecting the light and showing what your body is doing in full.
Well that in Database transactions is not going to work so easily. We cannot easily mirror server A on server B, unless we will successful replicate B as a copy of A (easy?), and B also receiving exactly the same load in the same moment/time. Also if we succeed there initially, there are various non deterministic factor that will bring the two to diverge in the behavior, like network package transmission.
In datastore we have some level of Mirroring, but that is mainly passive mirroring. A volume is created and maintained as an exact copy of another, all independent behavior is suppressed, data is pass from A to B as it is, reads may happen.
In short, talking of mirroring for database behavior in real production environments, it is not only misleading but conceptually wrong, because we will always have a broken mirror image, like a face reflected without eyes and moving after or before we do. Who wants to have a mirror drive containing different data of its master?
But here the great things. Because the point is that we may not need mirroring at all. We may need something else, something actually more realistic and useful.
In the last years talking with customer and colleagues, what I heard over and over was the need to have of a tool that can allow to replicate, mimic the load of server A on server B.
This is slightly different from a Mirroring functionality, and another good news is that, this is actually achievable. Probably no tool will be able to replicate with 100% accuracy the load performed on a server to another but it would “good enough” to cover most of the cases.
What at then end we are talking about is to use a traffic/Load dispatcher. The tool does not pretend to mirror or mimic the original behavior, but it will use the original load to replicate it on the destination.
Just to mention Percona Playback was one of the tools that was going in that direction.
Anyhow, in my opinion there are 3 main cases where this “good enough” make sense:
- Functional tests (Like to check if a query behave the same on server A and B)
- Load comparison tests (to identify which server between A and B is managing the traffic better)
- Empiric scalability test (Using traffic from A to multiply it and progressively Load B until saturation)
Also in this “simplified” approach we still have a lot to consider and be concerned.
For instance, in our point 1, also for a simple SELECT, we will need to be sure that collecting or redirecting the SELECT itself will not modify the behavior in A.
Secondly, we will receive a dataset from A, the application behavior will, eventually, adjust in regards to the received data.
But what about B? If the server B will send back the data set faster, should it wait for A? Or if for any reason (optimizer) we receive a different data set? Should we consider the differentiation as optimization or an error?
The easiest solution is to ignore B as whole but that may raise some consistency concern when performing full CRUD operations.
Another issue, and this is more related to point 2-3, what about transactions and data consistency?
Let say I have a transaction applying N different queries (again full CRUD model), should we ensure to have the same execution order in B as in A?
Should we respect the transaction isolation, the data validation against UK/FK, rollbacks and commit?
In my opinion the answer is YES, at least for whatever may change the data, we must ensure consistency.
Select can be “skip” but update/delete/insert cannot, and not to be too strict but for simple basic data consistency.
I think that at this point I had made my opinion clear.
Thanks to everybody.