PHP Developer News

How To Inject an Empty XA Transaction in MySQL

If you are using XA transactions, then you’ve likely run into a few replication issues with the 2PCs (2 Phase Commits). Here is a common error we see in Percona’s Managed Services and a few ways to handle it, including injecting an empty XA transaction.
Last_Error: Error 'XAER_NOTA: Unknown XID' on query. Default database: 'punisher'. Query: 'XA COMMIT X'1a',X'a1',1'

What Does it Mean?
It means that replication has tried to commit an XID (XA transaction ID) that does not exist on the server. We can verify that it does not exist by checking:
replica1 [localhost:20002] {msandbox} ((none)) > XA RECOVER CONVERT XID;
| formatID | gtrid_length | bqual_length | data   |
|        1 |            1 |            1 | 0x2BB2 |
1 row in set (0.00 sec)
In this case, there is a prepared XA transaction on the server but it is XID X’2B’,X’B2’,1? not X’1a’,X’a1’,1?. So indeed, the XID does not exist.
How Do We Fix It?
A few ways… When using anonymous replication, it can be skipped like any other error:
When using GTIDs, it can be skipped by the typical injecting an empty GTID:
Another option is that we can inject an empty XA transaction, much like we do with GTID. Then we can resume replication so it can naturally commit that XID.
To prepare an empty XA, first copy the SQL + XID from the error. In this case “XA COMMIT X’1a’,X’a1?,1”.
Now transform it into three statements, and run them on the erred replica.
XA START X'1a',X'a1',1;
XA END X'1a',X'a1',1;
XA PREPARE X'1a',X'a1',1;
This will have created a prepared XA transaction on the server. We can verify by running:
replica1 [localhost:20002] {msandbox} ((none)) > XA RECOVER CONVERT XID;
| formatID | gtrid_length | bqual_length | data   |
|        1 |            1 |            1 | 0x2BB2 |
|        1 |            1 |            1 | 0x1AA1 | <--- this is the transaction we just created
2 rows in set (0.00 sec)
So, let’s start replication:
replica1 [localhost:20002] {msandbox} ((none)) > START SLAVE;
ERROR 1399 (XAE07): XAER_RMFAIL: The command cannot be executed when global transaction is in the  PREPARED state
Uh oh, now what? When you prepare an XA transaction on a server, your session cannot execute any other SQL. You must disconnect from MySQL, reconnect, then start replication.
replica1 [localhost:20002] {msandbox} ((none)) > exit
replica1 [localhost:20002] {msandbox} ((none)) > START SLAVE;
Query OK, 0 rows affected (0.02 sec)
Regardless of how you handled the error, it is recommended to run a checksum to validate data consistency.
How Does This Happen?
2PCs write to the binlogs in two…phases ???? The first phase contains the {XA START/transaction SQL/XA END/XA PREPARE}. Think of all those statements as a single GTID. Once the XA PREPARE command has run, that whole transaction is written to the binary log so these statements will always be written together. Example:
# at 903
#200908 20:53:35 server id 100  end_log_pos 1004 CRC32 0xd2f9e5c0       Query   thread_id=4     exec_time=0     error_code=0
SET TIMESTAMP=1599598415/*!*/;
XA START X'1a',X'a1',1
# at 1004
#200908 20:53:35 server id 100  end_log_pos 1055 CRC32 0xad24c30d       Table_map: `punisher`.`t1` mapped to number 108
# at 1055
#200908 20:53:35 server id 100  end_log_pos 1100 CRC32 0xf7100e24       Write_rows: table id 108 flags: STMT_END_F
### INSERT INTO `punisher`.`t1`
### SET
###   @1=2
###   @2='2020-09-08 20:53:35'
# at 1100
#200908 20:53:44 server id 100  end_log_pos 1191 CRC32 0x314c857d       Query   thread_id=4     exec_time=0     error_code=0
SET TIMESTAMP=1599598424/*!*/;
XA END X'1a',X'a1',1
# at 1191
#200908 20:53:44 server id 100  end_log_pos 1229 CRC32 0x829495e8       XA PREPARE X'1a',X'a1',1
XA PREPARE X'1a',X'a1',1
Now we have a prepared XA on the source and replicas (which can hold row locks and block other transactions). The transaction can now be committed or rolled back, this depends on the second phase and this is where the problems come in.
The second phase commit/rollback can come seconds later, minutes later, days later, or even never. It all depends on when/if the Transaction Manager issues the command. In this case, it was 4 minutes later:
# at 1294
#200908 20:57:37 server id 100  end_log_pos 1388 CRC32 0xe38c4e46       Query   thread_id=4     exec_time=0     error_code=0
SET TIMESTAMP=1599598477/*!*/;
XA COMMIT X'1a',X'a1',1
There could be hundreds or thousands of other transactions written to the binary log in between the first and second phases. They could even be written to different binlogs.
This explanation is just to show how 2PCs work to understand the separate parts of an XA transaction.
Now to try and give some answer to “how does this happen?”…it could be from restoring a backup but MySQL did not get told to prepare some XID on the server. Now replication starts and it is reading events from the source’s binlog. Then it comes across an XA COMMIT but that XID was not prepared so it errors. Ultimately, these issues usually come down to some bug (here is one for example).
Do you have XAs blocking other transactions? Check out Planes, Trains, and Automobiles: MySQL XA Transactions.
Here is another post on how to troubleshoot XA recovery.
Percona Server for MySQL is also working on making XA RECOVER CONVERT XID more helpful!

Most Popular in Database