TusaCentral - Group Replication VS Percona XtraDB Cluster: The True Cost of Consistency

Overview

When building high-availability MySQL environments, the choice between MySQL Group Replication (GR) and Percona XtraDB Cluster (PXC) often comes down to how they handle the eternal database dilemma: data consistency versus performance. dolphin vs goath small

While both provide "synchronous-like" replication, they approach the problem of stale reads—reading data that has been committed on one node but not yet applied on another—in distinct ways. Understanding these differences, and the performance penalties associated with fixing them, is critical for any production environment.

Technology Overviews

MySQL Group Replication (GR)

Group Replication is the native, albeit more recent, high-availability solution built by Oracle for MySQL. It is based on a distributed state machine architecture and uses the Paxos consensus protocol.

Mechanism: When a transaction is committed, it is sent to all group members. The members must agree (consensus) on the order of transactions. Once a majority agrees, the transaction is "certified" and committed on the originator.
Replication Type: Virtually synchronous. The consensus ensures the data is received and ordered across nodes, but the actual applying of the data to the database happens asynchronously in the background.

Percona XtraDB Cluster (PXC)

PXC is an open-source enterprise solution based on Percona Server for MySQL and the Galera Replication library, which is the first and most mature virtually synchronous solution for MySQL.

Mechanism: When a node commits a transaction, it sends it to all other members of the Primary component (active group). All nodes must certify the transaction (check for conflicts), this is done on each node in the cluster, including the node that originates the write-set, before the originating node can finalize the commit.
Replication Type: Strictly synchronous (up to the certification level), asynchronous afterward. If the certification test fails, the node drops the write-set and the cluster rolls back the original transaction. If the test succeeds, however, the transaction commits and the write-set is applied to the rest of the cluster.

The Battle Against "Stale Reads": Why It Matters

The most critical distinction for developers is whether a SELECT query on Node B will immediately see the INSERT just performed on Node A.

In a distributed system, there is a microsecond-to-millisecond gap between a transaction being globally ordered (everyone knows it happened) and being locally applied (the data is physically readable in the table). Reading executed on a secondary during this gap results in a stale read.

Why is avoiding stale reads so critical?

While a stale read might just mean a user temporarily sees their old profile picture after updating it, in many business cases, it breaks the application's core logic:

Financial Transactions: A user deposits $100 on the Primary node and immediately refreshes their balance page, which reads from a Replica. If the read is stale, the balance hasn't updated. The user panics, thinking their money is lost.
E-commerce & Inventory: A customer buys the last item in stock. The next user immediately loads the product page. A stale read tells the second user the item is still available, leading to a cancelled order and a frustrated customer.
Security & Access: A user changes their password or updates a critical permission. If the next authentication request hits a node lagging by just a fraction of a second, their valid login might be rejected, or a revoked session might still be active.

To prevent these scenarios, we must tell the database to enforce strict consistency. But how do GR and PXC handle this, and what does it cost?

Consistency Controls Comparison

Both Group Replication and Percona XtraDB Cluster provide built-in mechanisms to enforce consistency and eliminate stale reads when your application demands it. However, they approach this problem using entirely different variables and distinct levels of granularity. The table below breaks down the specific controls each technology offers, highlighting exactly what it takes to force a node to serve fresh data.

Feature	MySQL Group Replication	Percona XtraDB Cluster
Default Behavior	Reads on secondaries may be stale because the applier thread might be lagging after consensus.	Reads on secondaries may be stale due to asynchronous background applying.
Stale Read Fix	Uses the group_replication_consistency variable.	Uses the wsrep-sync-wait variable.
Consistency Levels	Offers EVENTUAL, BEFORE, AFTER, and BEFORE_AND_AFTER.	Offers granular levels from 0 (default, no checks) up to 7 (checks on all READ, UPDATE, DELETE, INSERT, and REPLACE statements).
The Fix	Setting to AFTER ensures the next read is fresh.	Setting to 7 ensures we have a comparable scenario with GR. However in PXC setting wsrep_sync_wait = 1 will be enough to avoid stale reads.

The True Cost of Being Consistent

If we know stale reads are bad, why don't we just enforce strict consistency everywhere?

An image can help to understand:

dirty comparative2

Because in distributed databases, consistency is incredibly expensive. To test this, we used a 3-node internal lab environment to run a Sysbench-based TPC-C derivative test (50/50 read/write split, running for 600 seconds, scaling from 1 to 1024 threads).

You can find the detailed machine specifications here. The benchmarks were executed using a TPC-C derivative test based on sysbench. Finally—and crucially—you can review the configuration files used for the tests. I maintained the same baseline MySQL configuration across the board, only adjusting the parameters specific to each replication technology.

Scenario 1: Default (Relaxed) Consistency

(GR = EVENTUAL, PXC = wsrep-sync-wait 0)

I want to remind, that MySQL CE and Percona Server are running using Group Replication, while PXC is using galera.

With default settings, both systems allow stale reads.

Both technologies scales well up to 128 threads:

Group Replication performs exceptionally well, handling up to 15K operations/sec before dropping off after 128 threads.
PXC (Galera) is slightly less efficient at peak but scales very nicely and predictably.

At this level, the lag between the moment of commit and the moment the server returns the answer is minimal. But we are entirely exposed to stale reads.

Scenario 2: Enforced Consistency (The Cost)

(GR = AFTER, PXC = wsrep-sync-wait 7)

When we configure the servers to prevent stale reads, the systems must wait for transactions to be fully applied before returning a read. This is where the architectural differences become glaringly apparent:

PXC (Galera): Performance drops but not too much from a peak of ~9K ops/sec (in the previous test) to roughly ~8.5K ops/sec. This is a hit but not huge and the database remains highly functional and stable.
Group Replication: Performance catastrophically drops from ~15K ops/sec (in the previous test) to a staggering ~3.8K ops/sec.

This is the crucial takeaway

Enforcing strict consistency in Group Replication results in a massive ~75% performance penalty. The latency between the commit and the server response increases significantly compared to PXC.

The intermediate way

There is another approach which is to inject the higher consistency only when it is really needed.

The Solution: Session-Level Consistency You do not need, and should not use, full consistency at the global level for general cases. Instead, force consistency only when and where it is critical.

While for Group Replication there is no support for SQL injection hints like SELECT /*+ SET_VAR(...) */, you can enforce this at the session level right before a critical read:

SET SESSION group_replication_consistency = 'AFTER';
-- OR for PXC:
SET SESSION wsrep_sync_wait = 7;

To note that PXC offers more flexibility and you can use hints:

select /*+ SET_VAR(wsrep_sync_wait=7) */ @@session.wsrep_sync_wait ,@@global.wsrep_sync_wait;
+---------------------------+--------------------------+
| @@session.wsrep_sync_wait | @@global.wsrep_sync_wait |
+---------------------------+--------------------------+
|                         7 |                        0 |
+---------------------------+--------------------------+

By isolating these variables to specific sessions (like the immediate redirect after a password change or a checkout process), you ensure data integrity exactly where the business requires it, while allowing the rest of your application to enjoy the high-speed performance of relaxed consistency.

PXC: The performance drop is minimal and the solution is able to provide a consistent delivery with nice scalability up to 256 threads.

Group Replication: The solution suffers from a significant drop, not as if we set the AFTER condition at global level, but still we see a drop of ~52%.

Comparing the two solutions we can see that PXC is able to deal with the additional requested consistency better.

Additional differences

But these are not the only differences we can immediately see. Performing a comparison about resources utilization, we can see that while both solutions move the same amount of data as IO operations:

Yes, for exactly the same load and traffic Group Replication consumes 8GB more than PXC, which in this environment represents 26% memory more, over total available.

Cost that is reflected also as CPU utilization.

Conclusion: How to Survive the Cost

How impactful is enforcing strict consistency at a global level in a production environment? Massively. If you blindly enforce strict consistency globally without understanding your architecture, you will decimate your database throughput. Here is the reality of how the two solutions handle that tax:

The Group Replication Reality: By default (using EVENTUAL consistency), MySQL Group Replication behaves essentially as semi-synchronous replication paired with an automated topology manager (see The Failover Brownout: Rethinking High Availability in MySQL Group Replication). The Primary is allowed to forge ahead and serve traffic even if the Secondaries are lagging significantly behind. The moment you demand strict consistency, the Primary is violently tethered back to the rest of the cluster, and its performance drops off a cliff as it waits for the slowest node.
The PXC Advantage: Percona XtraDB Cluster (PXC) absorbs the "consistency penalty" much more gracefully. While varying consistency levels exist in PXC, adjusting them does not cause the same dramatic throughput shock seen in MGR. This is because PXC enforces a virtually synchronous, high-consistency baseline from the start. It simply does not allow the node receiving writes to deviate too far from the rest of the cluster. You pay a baseline performance tax upfront, but in exchange, you get guaranteed, ironclad High Availability out of the box.

The Final Verdict Modifying consistency values at the global server level should only be done after rigorous load testing and a complete understanding of the performance tax you are about to pay.

Ultimately, it comes down to choosing the right tool for your specific SLA:

If your architecture demands a true, virtually synchronous solution with strict High Availability out of the box, PXC is the purpose-built engine for the job.
If you are looking for a highly automated, semi-synchronous solution, Group Replication delivers excellent default performance—but tuning it to mimic PXC's strict consistency will cost you heavily in throughput.

References

https://www.google.com/url?q=https://mariadb.com/docs/galera-cluster/galera-architecture/certification-based-replication&sa=D&source=docs&ust=1777342808813139&usg=AOvVaw3SAf2g7NO9d681ZJ0VVEMB

https://docs.percona.com/percona-xtradb-cluster/5.7/wsrep-system-index.html#wsrep_sync_wait

Sidebar

Main Menu Mobile

MySQL Blogs

Group Replication VS Percona XtraDB Cluster: The True Cost of Consistency