PHP Developer News

Attack No-PK Replication Lag with MySQL/Percona Server 8 Invisible Columns!

The most common issue when using row-based replication (RBR) is replication lag due to the lack of Primary keys.
The problem is that any replicated DML will do a full table scan for each modified row on the replica. This bug report explains it more in-depth: https://bugs.mysql.com/bug.php?id=53375
For example, if a delete is executed on the following table definition:CREATE TABLE `joinit` (
  `i` int NOT NULL,
  `s` varchar(64) DEFAULT NULL,
  `t` time NOT NULL,
  `g` int NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 
With this amount of rows:mysql> select count(*) from joinit;
+----------+
| count(*) |
+----------+
|  1048576 |
+----------+ 
The delete being:mysql> flush status ;

mysql> delete from joinit where i > 5 and i < 150;
Query OK, 88 rows affected (0.04 sec)

mysql> show status like '%handler%';
+----------------------------+---------+
| Variable_name              | Value   |
+----------------------------+---------+
| Handler_commit             | 2       |
| Handler_delete             | 1       |

| Handler_read_rnd_next      | 1048577 |
…It can be seen that the delete on the Primary requires a full table scan (Handler_read_rnd_next matches row amount + 1) to delete 88 rows.
The additional problem is that each of the rows being deleted will be recorded in the binary log individually like this:#220112 18:29:05 server id 1  end_log_pos 3248339 CRC32 0xdd9d1cb2 Delete_rows: table id 106 flags: STMT_END_F
### DELETE FROM `test2`.`joinit`
### WHERE
###   @1=6
###   @2='764d302b-73d5-11ec-afc8-00163ef3b519'
###   @3='18:28:39'
###   @4=27
### DELETE FROM `test2`.`joinit`
### WHERE
###   @1=7
###   @2='764d30bc-73d5-11ec-afc8-00163ef3b519'
###   @3='18:28:39'
###   @4=5

{88 items}Which will result in 88 full table scans on the replica, and hence the performance degradation.
For these cases, the recommendation is to add a primary key to the table, but sometimes adding a PK might not be easy because:
There are no existing columns that could be considered a PK.
Or adding a new column (as the PK) is not possible as it might impact queries from a 3rd party tool that we have no control over (or too complex to fix with query rewrite plugin).
The solution is to use MySQL/Percona Server for MySQL 8 and add an invisible column! 
Adding a new column (named “newc”) invisible as a primary key can be done with the following line:ALTER TABLE joinit ADD COLUMN newc INT UNSIGNED NOT NULL AUTO_INCREMENT INVISIBLE PRIMARY KEY FIRST;Note, adding a PK is an expensive operation that requires a table rebuild as shown here.
After adding an invisible PK, the table will look like this:CREATE TABLE `joinit` (
  `newc` int NOT NULL AUTO_INCREMENT /*!80023 INVISIBLE */,
  `i` int NOT NULL,
  `s` varchar(64) DEFAULT NULL,
  `t` time NOT NULL,
  `g` int NOT NULL,
  PRIMARY KEY (`newc`)
) ENGINE=InnoDB AUTO_INCREMENT=1048576 DEFAULT CHARSET=latin1 
Deleting a row now will be recorded in the binary log like this:### DELETE FROM `test`.`joinit`
### WHERE
###   @1=1048577
###   @2=1
###   @3='string'
###   @4='17:23:04'
###   @5=5
# at 430
#220112 17:24:56 server id 1  end_log_pos 461 CRC32 0x826f3af6 Xid = 71
COMMIT/*!*/;Where @1 is the first column ( the PK in this case) which the replica can use to find the matching row without having to do a full table scan.
The operation executed on the replica would be similar to the following which requires only one scan to find the matching row:mysql> flush status ;
Query OK, 0 rows affected (0.01 sec)

mysql> delete from joinit where newc = 1048578;
Query OK, 1 row affected (0.00 sec)

mysql> show status like '%handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 2     |

| Handler_read_key           | 1     |

| Handler_read_rnd_next      | 0     |
… 
Also as the name suggests, an invisible column won’t show nor it needs to be referenced when doing operations over the table, i.e:mysql> select * from joinit limit 2; 
+---+--------------------------------------+----------+----+
| i | s                                    | t        | g  |
+---+--------------------------------------+----------+----+
| 2 | ecc6cbed-73c9-11ec-afc8-00163ef3b519 | 17:06:03 | 58 |
| 3 | ecc7d9bb-73c9-11ec-afc8-00163ef3b519 | 17:06:03 | 56 |
+---+--------------------------------------+----------+----+
2 rows in set (0.00 sec)

mysql> insert into joinit values (4, "string", now(), 5);
Query OK, 1 row affected (0.01 sec) 
But if needed, the new column (newc) can be fetched if explicitly queried:mysql> select newc, i, s, t, g from joinit limit 2; 
+------+---+--------------------------------------+----------+----+
| newc | i | s                                    | t        | g  |
+------+---+--------------------------------------+----------+----+
|    1 | 2 | ecc6cbed-73c9-11ec-afc8-00163ef3b519 | 17:06:03 | 58 |
|    2 | 3 | ecc7d9bb-73c9-11ec-afc8-00163ef3b519 | 17:06:03 | 56 |
+------+---+--------------------------------------+----------+----+
2 rows in set (0.00 sec) 
What If…?
What if MySQL automatically detects that the PK is missing for InnoDB tables and adds the invisible PK?
Taking into account that an internal six bytes PK is already added when the PK is missing, it might be a good idea to allow the possibility of making the PK visible if you need to. 
This means that when you execute this CREATE TABLE statement:CREATE TABLE `joinit` (
  `i` int NOT NULL,
  `s` varchar(64) DEFAULT NULL,
  `t` time NOT NULL,
  `g` int NOT NULL,
  PRIMARY KEY (`newc`)
) ENGINE=InnoDB AUTO_INCREMENT=1048576 DEFAULT CHARSET=latin1 
Will be automatically translated to:CREATE TABLE `joinit` (
  `newc` int NOT NULL AUTO_INCREMENT /*!80023 INVISIBLE */,
  `i` int NOT NULL,
  `s` varchar(64) DEFAULT NULL,
  `t` time NOT NULL,
  `g` int NOT NULL,
  PRIMARY KEY (`newc`)
) ENGINE=InnoDB AUTO_INCREMENT=1048576 DEFAULT CHARSET=latin1 
And then we can execute this command:ALTER TABLE joint ALTER COLUMN newc SET VISIBLE;To make it visible.
Conclusion
Missing primary keys is a problem for scaling databases, as replication will require a full table scan for each updated/delete row, and the more data the more lag.
Adding a PK might not be always possible because of 3rd party tools or restrictions, but adding an invisible primary key will do the trick and have the benefits of adding a PK without impacting syntax and operations from 3rd party clients/tools. What will be awesome is to make MySQL able to detect the missing PK, add it automatically, and change it to visible if you need to.