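Kinds of failures
• Application cannot get a connection from the connection pool
• Database responds slowly
• Slow SQL
• High server load
• SWAP
• Table disappeared
• MySQL crash
• Host hung
• …
Observe your system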
• MySQL
– Active processes (process list)
– Log files (slow log, alert log, general query log, binlog)
– Status variables (Com_select, Com_insert, etc.)
– InnoDB (physical reads, logical reads, innodb status)
– Parameter configuration
– Stack trace(plus source code)
• SQL
– Execution plan, explain
• OS
– Memory, SWAP, /proc/meminfo
– CPU, load, ps
– IO (disk, network)
• Iostat
• Profile
– Oprofile
– gprof
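Status variables such as Com_select and Com_insert are cumulative counters, so a single reading says little; the useful number is the change between two readings. A minimal sketch of that arithmetic follows. It assumes the two snapshots were collected some other way (for example from SHOW GLOBAL STATUS); the function name and the sample values are hypothetical.

```python
def counter_rates(before, after, interval_seconds):
    """Turn two snapshots of cumulative counters into per-second rates."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rates = {}
    for name, new_value in after.items():
        # Only counters present in both snapshots can be compared.
        if name in before:
            rates[name] = (new_value - before[name]) / interval_seconds
    return rates


# Hypothetical sample values, taken 10 seconds apart.
first = {"Com_select": 1000, "Com_insert": 200}
second = {"Com_select": 1500, "Com_insert": 230}
print(counter_rates(first, second, 10))
```

With these made-up numbers the sketch reports 50 selects and 3 inserts per second.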
Case 1: System XXX reports that the connection pool is full
iostat
orzdba
slowlog
What's in the slow log?
mk-query-digest
Full analysis of the slow log with mk-query-digest
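To illustrate the kind of summary such a digest gives, here is a minimal sketch that groups slow log entries by a normalised query text and totals their Query_time. It is not mk-query-digest's own algorithm: the fingerprint rule is a crude assumption, only the first line of each statement is used, and the sample log lines are hypothetical.

```python
import re
from collections import defaultdict

QUERY_TIME = re.compile(r"^# Query_time: ([0-9.]+)")


def fingerprint(sql):
    """Crude normalisation: literals become '?', whitespace is collapsed."""
    sql = re.sub(r"'[^']*'|\"[^\"]*\"", "?", sql)
    sql = re.sub(r"\b\d+\b", "?", sql)
    return re.sub(r"\s+", " ", sql).strip().lower()


def summarize(lines):
    """Return {fingerprint: [count, total_query_time]} for slow log lines."""
    totals = defaultdict(lambda: [0, 0.0])
    current_time = None
    for line in lines:
        match = QUERY_TIME.match(line)
        if match:
            current_time = float(match.group(1))
        elif line.startswith("#") or current_time is None:
            continue  # other header lines, or continuation of a statement
        elif line.upper().startswith(("SET TIMESTAMP", "USE ")):
            continue  # bookkeeping statements written by the server
        else:
            entry = totals[fingerprint(line.rstrip(";\n"))]
            entry[0] += 1
            entry[1] += current_time
            current_time = None
    return dict(totals)


# Hypothetical slow log excerpt.
sample = [
    "# Time: 111007 10:00:00",
    "# User@Host: app[app] @ [192.168.0.1]",
    "# Query_time: 2.5  Lock_time: 0.0 Rows_sent: 1  Rows_examined: 100000",
    "SET timestamp=1317952800;",
    "select * from t where id = 1;",
    "# Query_time: 1.5  Lock_time: 0.0 Rows_sent: 1  Rows_examined: 100000",
    "SET timestamp=1317952801;",
    "select * from t where id = 2;",
]
print(summarize(sample))
```

Both sample statements collapse to the same fingerprint, so the sketch reports one query class with a count of 2 and 4.0 seconds in total.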
explain
View the execution plan
– A poor index was chosen
Which SQL statements are running?
• Slow log
– SET GLOBAL long_query_time=0
• General log
• Binlog
– For DML, parse the binlog with mysqlbinlog
• Processlist
– If some query is really slow
• Tcpdump
– Tcpdump + mk-query-digest
Case 2: Many MySQL threads are stuck
Processlist
Id: 1842782 User: provide Host: 192.168.0.1:59068 db: provide Command: Query
Time: 2326
State: Waiting for table
Info: update table_xxxx set sold=sold+1, money=money+39800, Gmt_create=now() where xxxx_id=1 and day="2011-10-07 00:00:00
Id: 1657130 User: provide Host: 192.168.0.2:40093 db: provide Command: Query
Time: 184551
State: Sending data
Info: select xxxx_id, sum(sold) as sold from table_xxxx where xxxx_id in (select xxxx_id from table_xxxx where Gmt_create >= "2011-10-05 08:59:00") group by xxxx_id
Id: 1044 User: system user Command: Connect
Time: 27406
State: Flushing tables
Info: FLUSH TABLES
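Processlist analysis
– Which is the cause, and which is the effect?
• System user executes flush tables
– Who is system user? The MySQL master-slave replication threads (io thread, sql thread)
– Binlog
• Who executed flush tables first?
– Run manually?
– App? It has no privilege
– Scheduled job, backup
• Xtrabackup executes flush tables with read lock, which is not recorded in the binlog
• Mysqldump in theory does not execute flush tables, but what if there is a bug?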
(http://bugs.mysql.com/bug.php?id=35157)
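One rough way to approach the cause-and-effect question is to order the stuck threads by their Time column and read the list from oldest to newest. The sketch below does that with the three threads shown in the processlist. Treating the longest-running thread as the head of the chain is only a heuristic and an assumption here, and the Thread class and oldest_first function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    thread_id: int
    user: str
    time_seconds: int
    state: str


def oldest_first(threads):
    """Sort threads so the one that has been in its state longest comes first."""
    return sorted(threads, key=lambda t: t.time_seconds, reverse=True)


# Values copied from the processlist output in this case.
threads = [
    Thread(1842782, "provide", 2326, "Waiting for table"),
    Thread(1657130, "provide", 184551, "Sending data"),
    Thread(1044, "system user", 27406, "Flushing tables"),
]
for t in oldest_first(threads):
    print(f"{t.time_seconds:>7}s  {t.thread_id:>8}  {t.user:<12} {t.state}")
```

Read this way, one plausible interpretation is that the long-running select came first, the flush tables arrived later and is waiting behind it, and the update is waiting behind the flush. The ordering is a starting point for the questions above, not proof.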
Case 3: High server load
Investigating the problem
– No obvious anomaly at the SQL level
– No business change, no release
– No obvious change in call volume
Iostat
– r/s, w/s
– await, svctm
– avgrq-sz
Blktrace, btt
IO scheduler
– cfq -> deadline
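Before changing the scheduler it helps to confirm which one a device is currently using. On Linux the file /sys/block/<device>/queue/scheduler lists the available schedulers and marks the active one with square brackets. The sketch below parses that format; the function names are hypothetical, and the device name passed to read_scheduler is whatever disk applies on the host.

```python
import re


def active_scheduler(text):
    """Return the scheduler name inside [brackets], or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def read_scheduler(device):
    # Assumes Linux sysfs; `device` is a block device name such as "sda".
    with open(f"/sys/block/{device}/queue/scheduler") as handle:
        return active_scheduler(handle.read())


# Example of the file's format with cfq active.
print(active_scheduler("noop deadline [cfq]"))
```

The parsing is kept separate from the file read so it can be checked against sample strings on any machine.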
Case 4: DDL lost table
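alert.log shows a large number of errors
– After this went on for more than 10 minutes, the table was lost.
• Several hundred processes were blocked in "opening tables", and none of those tables was the one the DDL was running on
alert.log at the time the table was lost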
Pstack: master thread
Pstack: alter table
Case 5: MyISAM
Orzdba
vmstat
strace mysqld
Oprofile global
Oprofile mysqld
pstack
Summary