pt-table-checksum does not detect differences

Time: 2017-09-07 15:03:01

Tags: checksum database-replication percona

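I have a simple master -> slave setup with MariaDB:

Master: Ubuntu 16.04 LTS with MariaDB 10.2.8 and percona-toolkit 3.0.4

Slave: Ubuntu 16.04 LTS with MariaDB 10.2.7

Replication is running fine, and now I want to check whether the data on the master and the slave is identical.

I installed percona-toolkit on the master and created a checksum user: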

MariaDB> GRANT REPLICATION SLAVE,PROCESS,SUPER, SELECT ON *.* TO `pt_checksum`@'%' IDENTIFIED BY 'password';
MariaDB> GRANT ALL PRIVILEGES ON percona.* TO `pt_checksum`@'%';
MariaDB> FLUSH PRIVILEGES;

I also added report_host to the slave's configuration so that it announces itself to the master:

MariaDB [(none)]> show slave hosts;
+-----------+-----------+------+-----------+
| Server_id | Host      | Port | Master_id |
+-----------+-----------+------+-----------+
|         2 | 10.0.0.49 | 3306 |         1 |
+-----------+-----------+------+-----------+
1 row in set (0.00 sec)

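To test pt-table-checksum, I deleted one row from the Tickets table in a test database on the slave. I have verified that the row is indeed missing on the slave but still present on the master.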
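The exact check is not shown here. A minimal sketch of one way to confirm the drift by hand, assuming the table is small enough that a full count is cheap, is to run the same statement on both servers and compare the results:

```sql
-- Run once on the master and once on the slave, then compare the two numbers.
-- shop_test.Tickets is the table from this question; the column alias is arbitrary.
SELECT COUNT(*) AS row_count
FROM shop_test.Tickets;
```

A differing count only proves that rows are missing or extra. It says nothing about rows whose contents differ, which is what the checksum is meant to catch.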

But pt-table-checksum does not report this difference:

# pt-table-checksum --databases=shop_test --tables=Tickets --host=localhost --user=pt_checksum --password=... --no-check-binlog-format --no-check-replication-filters
        TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
09-07T16:15:02      0      0       14       1       0   0.013 shop_test.Tickets

So I set PTDEBUG=1 in my environment, but the master seems to connect to the slave just fine. I tried to pick out the relevant bits of the output:

# MasterSlave:5175 9725 Connected to h=localhost,p=...,u=pt_checksum
# MasterSlave:5184 9725 SELECT @@SERVER_ID
# MasterSlave:5186 9725 Working on server ID 1
# MasterSlave:5219 9725 Looking for slaves on h=localhost,p=...,u=pt_checksum using methods processlist hosts
# MasterSlave:5226 9725 Finding slaves with _find_slaves_by_processlist
# MasterSlave:5288 9725 DBI::db=HASH(0x31c5190) SHOW GRANTS FOR CURRENT_USER()
# MasterSlave:5318 9725 DBI::db=HASH(0x31c5190) SHOW FULL PROCESSLIST
# DSNParser:1417 9725 Parsing h=10.0.0.49
[...]
# MasterSlave:5231 9725 Found 1 slaves
# MasterSlave:5208 9725 Recursing from h=localhost,p=...,u=pt_checksum to h=10.0.0.49,p=...,u=pt_checksum
# MasterSlave:5155 9725 Recursion methods: processlist hosts
[...]
# MasterSlave:5175 9725 Connected to h=10.0.0.49,p=...,u=pt_checksum
# MasterSlave:5184 9725 SELECT @@SERVER_ID
# MasterSlave:5186 9725 Working on server ID 2
# MasterSlave:5097 9725 Found slave: h=10.0.0.49,p=...,u=pt_checksum
[...]
# pt_table_checksum:9793 9725 Exit status 0 oktorun 1
# Cxn:3764 9725 Destroying cxn
# Cxn:3774 9725 DBI::db=HASH(0x31cd218) Disconnecting dbh on slaveserver h=10.0.0.49
# Cxn:3764 9725 Destroying cxn
# Cxn:3774 9725 DBI::db=HASH(0x31c5190) Disconnecting dbh on masterserver h=localhost

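I'm out of ideas. Why is the missing row not detected?

1 Answer:

Answer 0 (score: 1)

Over the weekend I noticed a new bug report, and today I confirmed that it is indeed the problem I was running into.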

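The workaround is to add --set-vars binlog_format=statement

With this option set, the difference shows up after the second run.

During the first run, the checksums table on the slave changed from: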

MariaDB [percona]> select tbl, this_crc, this_cnt, master_crc,master_cnt from checksums where tbl = 'Tickets' and db = 'shop_test';
+---------+----------+----------+------------+------------+
| tbl     | this_crc | this_cnt | master_crc | master_cnt |
+---------+----------+----------+------------+------------+
| Tickets | f30abebe |       14 | f30abebe   |         14 |
+---------+----------+----------+------------+------------+

...to...

MariaDB [percona]> select tbl, this_crc, this_cnt, master_crc,master_cnt from checksums where tbl = 'Tickets' and db = 'shop_test';
+---------+----------+----------+------------+------------+
| tbl     | this_crc | this_cnt | master_crc | master_cnt |
+---------+----------+----------+------------+------------+
| Tickets | 284ec207 |       13 | f30abebe   |         14 |
+---------+----------+----------+------------+------------+
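The mismatch in the table above can also be queried directly on the slave instead of being read off by eye. The following is a sketch modelled on the query suggested in the pt-table-checksum documentation. It assumes the default checksum table percona.checksums, as used in this thread:

```sql
-- Run on the slave. Lists every table that has at least one chunk whose
-- local count or CRC differs from the values recorded by the master.
SELECT db,
       tbl,
       SUM(this_cnt) AS total_rows,
       COUNT(*)      AS chunks
FROM percona.checksums
WHERE master_cnt <> this_cnt
   OR master_crc <> this_crc
   OR ISNULL(master_crc) <> ISNULL(this_crc)
GROUP BY db, tbl;
```

For the state shown above, this should list shop_test.Tickets, because this_cnt and this_crc no longer match the master's values.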

After the second run, the difference also appears in the pt-table-checksum output:

# pt-table-checksum --tables=shop_test.Tickets --host=localhost --user=pt_checksum --password=... --no-check-binlog-format --no-check-replication-filters --set-vars binlog_format=statement
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
09-11T11:17:37      0      1       14       1       0   0.022 shop_test.Tickets

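I checked with SHOW VARIABLES LIKE 'binlog_format' that binlog_format is still "MIXED", so apparently it is only changed for the duration of the session. According to the documentation, as far as I understand it, this should happen automatically:

  "This works only with statement-based replication (pt-table-checksum will switch the binlog format to STATEMENT for the duration of the session if your server uses row-based replication)."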
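The session-only scope described above can be observed from any client connection. This is a minimal sketch, not what pt-table-checksum runs internally. It assumes an account that is allowed to change binlog_format for its own session (on MariaDB 10.2 that generally means SUPER, which the pt_checksum user has):

```sql
-- Before the change: both values match (MIXED in this setup).
SELECT @@GLOBAL.binlog_format  AS global_format,
       @@SESSION.binlog_format AS session_format;

-- Change the format for this connection only.
SET SESSION binlog_format = 'STATEMENT';

-- After the change: session_format is STATEMENT, global_format is unchanged.
SELECT @@GLOBAL.binlog_format  AS global_format,
       @@SESSION.binlog_format AS session_format;
```

SHOW VARIABLES without the GLOBAL keyword reports the session value of the connection that runs it, so a separate client session keeps seeing MIXED even while the tool's own session uses STATEMENT.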

Bug report: {{3}}
