You don’t hear this often: a MySQL server crashed, and the sites hosted on that server went offline for hours on end. But when a similar thing happened on my VPS, that wasn’t the case. My site stayed online while I was troubleshooting the MySQL server. Ultimately, I could not figure out the issue and had to purge the entire MySQL installation. Even so, I stayed calm during the entire process. You may ask how. Here is what happened and how you can prevent the same on your own VPS…
First Things First – The MySQL Logs
As part of my regular MySQL tweaking, I changed a few things in my.cnf and then restarted mysqld. Bump! It didn’t start. I reverted the changes. Still, it didn’t budge. I knew right away that the MySQL server had crashed for some unknown reason. Upon checking the log, here is what I found…
120525 06:54:11 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
120525 6:54:11 [Note] Plugin 'FEDERATED' is disabled.
06:54:11 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
key_buffer_size=31457280
read_buffer_size=524288
max_used_connections=0
max_threads=100
thread_count=0
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 133850 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0xffffffffbfb01d80
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = ffffffffbfb03ab8 thread_stack 0x20000
/usr/libexec/mysqld(my_print_stacktrace+0x2d)[0x83b70e2]
/usr/libexec/mysqld(handle_fatal_signal+0x4a2)[0x829d156]
[0xb777c400]
/usr/libexec/mysqld(_Z18ha_resolve_by_nameP3THDPK19st_mysql_lex_string+0xad)[0x82a104c]
/usr/libexec/mysqld(_Z14open_table_defP3THDP11TABLE_SHAREj+0x1822)[0x82213c2]
/usr/libexec/mysqld(_Z15get_table_shareP3THDP10TABLE_LISTPcjjPij+0x197)[0x8169869]
/usr/libexec/mysqld(_Z10open_tableP3THDP10TABLE_LISTP11st_mem_rootP18Open_table_context+0x545)[0x817010a]
/usr/libexec/mysqld(_Z11open_tablesP3THDPP10TABLE_LISTPjjP19Prelocking_strategy+0x456)[0x8171683]
/usr/libexec/mysqld(_Z20open_and_lock_tablesP3THDP10TABLE_LISTbjP19Prelocking_strategy+0x54)[0x8171f86]
/usr/libexec/mysqld[0x81adf41]
/usr/libexec/mysqld(_Z11plugin_initPiPPci+0x8dc)[0x81b0db9]
/usr/libexec/mysqld[0x8133a34]
/usr/libexec/mysqld(_Z11mysqld_mainiPPc+0x42a)[0x8136cc8]
/usr/libexec/mysqld(main+0x27)[0x812ce33]
/lib/i686/nosegneg/libc.so.6(__libc_start_main+0xe6)[0xb7288ce6]
/usr/libexec/mysqld[0x812cd95]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 0
Status: NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
120525 06:54:11 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
I tried everything I could for about 90 minutes. Then I gave up, purged the entire MySQL installation and installed it again. Voilà! It started fine! After that, it was only a matter of taking the latest database backup and restoring it.
Where is the Magic?
During this entire period, I didn’t get a single notification from Pingdom about my site being down, thanks to Varnish. If you don’t already know, Varnish is a caching reverse proxy that sits in front of WordPress (or any other application) and answers requests from its cache. So, even if your Nginx, php-fpm or MySQL server fails, visitors still get the pages that are already cached and your site stays online. It works like magic, even when the backend fails for some reason.
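To make the idea concrete, here is a minimal Python sketch of a cache that keeps serving a stored page while the backend is down. It is not how Varnish is implemented or configured. The class name StaleOnErrorCache, the ttl and grace parameters and the fetch callback are all made up for illustration, loosely following the idea behind Varnish's grace mode.

```python
import time
from typing import Callable, Dict, Tuple


class StaleOnErrorCache:
    """Toy cache in front of a backend (hypothetical, for illustration only).

    Fresh copies are served straight from memory. When a copy has expired,
    the backend is asked again; if the backend fails, the expired copy is
    still served for up to `grace` seconds past its TTL.
    """

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl: float,
        grace: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.fetch = fetch    # calls the backend; may raise on failure
        self.ttl = ttl        # seconds a stored copy counts as fresh
        self.grace = grace    # extra seconds a stale copy may be served
        self.clock = clock    # injectable so the behaviour can be tested
        self.store: Dict[str, Tuple[str, float]] = {}  # url -> (body, stored_at)

    def get(self, url: str) -> str:
        now = self.clock()
        cached = self.store.get(url)

        # Fresh hit: the backend is not contacted at all.
        if cached is not None and now - cached[1] <= self.ttl:
            return cached[0]

        try:
            body = self.fetch(url)
        except Exception:
            # Backend is down: fall back to the stale copy if it is
            # still inside the grace window, otherwise give up.
            if cached is not None and now - cached[1] <= self.ttl + self.grace:
                return cached[0]
            raise

        self.store[url] = (body, now)
        return body
```

With something like StaleOnErrorCache(fetch_page, ttl=60, grace=3600), where fetch_page is whatever function talks to your backend, a page fetched once would keep being served for roughly an hour after the backend dies. A page that was never cached would still fail, which is why only already-visited pages survive an outage like mine.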
I wonder why more web hosting companies don’t use Varnish (Gandi.net already does, though)!