We have now finished the scheduled upgrades and software installations, as well as fixing the ongoing problems found on the 64.74.112.74 server, and
I'm here to answer your questions.
Why we had downtime. Well, the maintenance was announced on Friday, and the announcement said there would be short downtimes during the maintenance.
The downtime turned out to be longer because some hackers found a hole in the XMB 1.8 emailfriend.php script and started sending a huge stream of spam through the server. To keep the server from going down completely, we had to stop Apache to bring the server load down so that we could investigate and close the hole.
As a result, we ran chmod 0 on emailfriend.php from the XMB 1.8 package to prevent it from being launched, and filtered the spammers out on the firewall once we had located their IP range.
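For anyone wondering what "chmod 0" does here, this is a minimal sketch in Python of the same idea: it strips every permission bit from a file so the web server can no longer read or run it. The function name is only illustrative, and any path passed to it is a placeholder, not the real location on our server.

```python
import os
import stat

def disable_script(path):
    """Remove every permission bit from a file (same effect as `chmod 0 path`)."""
    os.chmod(path, 0)
    # Read the permission bits back so the caller can confirm they are all cleared.
    return stat.S_IMODE(os.stat(path).st_mode)
```

Calling disable_script on a forum's copy of emailfriend.php would return 0 when it succeeds. The file stays on disk, so it can be re-enabled later once a fixed version is available.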
We also had to tune the default kernel we started out with when restoring accounts on Thursday, to address load problems reported by our clients and to improve Dual Xeon hyper-threading support. If a new kernel doesn't come up, it takes 5-10 minutes to realize it and reboot the box with the old kernel, so that was the second factor in the downtime.
Third, we had to fix the recently discovered FrontPage extensions problem and recompile Apache/PHP with all required modules. When the compilation process hits an error, that also means a short downtime. During compilation we had several partially working Apache installations, and as they crashed they filled the /usr partition with core dumps, so some users may have noticed 'disk full' messages, though we were erasing the dumps immediately.
Server load. We have had increased server load lately because of:
1. The spammer. When the email queue holds 1000+ messages, exim maxes out CPU usage. This load lasted about 20 minutes and went away once the spammer was stopped.
2. Software installation. Compiling and installing software created load bursts lasting 30-50 minutes.
3. The default kernel. That was explained above.
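As an illustration of the first point, here is a rough sketch of how a queue of that size could be spotted early. It assumes the exim binary is on the PATH and relies on `exim -bpc`, which prints the number of queued messages. The function names and the 1000-message limit are just taken from the description above for the example, not a tool we actually run.

```python
import subprocess

def queue_size(exim_cmd="exim"):
    """Ask Exim how many messages are queued (`exim -bpc` prints a count)."""
    out = subprocess.run([exim_cmd, "-bpc"], capture_output=True, text=True, check=True)
    return int(out.stdout.strip())

def queue_is_suspicious(count, limit=1000):
    """True once the queue reaches the size at which exim maxed out the CPU."""
    return count >= limit
```

On a server with exim installed, queue_is_suspicious(queue_size()) could be run from cron to raise an alert before the load climbs.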
Result.
After maintenance, the server load looks like this:
QUOTE
Linux 2.4.20-19.7smp (server10.fastbighost.com) 07/29/2003
12:00:00 AM CPU %user %nice %system %idle
12:10:00 AM all 2.39 0.10 2.55 94.97
12:20:00 AM all 2.04 0.02 2.15 95.79
12:30:00 AM all 2.23 0.02 2.29 95.45
12:40:00 AM all 2.31 0.02 2.43 95.24
12:50:00 AM all 3.22 0.03 2.80 93.95
01:00:00 AM all 2.33 0.03 2.08 95.56
01:10:00 AM all 3.46 0.03 3.24 93.27
01:20:00 AM all 3.34 0.06 2.45 94.16
01:30:00 AM all 2.86 0.02 2.32 94.80
01:40:00 AM all 2.55 0.02 2.33 95.10
01:50:00 AM all 2.68 0.01 2.19 95.12
02:00:00 AM all 3.31 0.02 2.61 94.06
02:10:01 AM all 2.50 0.03 2.20 95.27
02:20:00 AM all 18.72 0.43 7.97 72.88
02:30:00 AM all 22.78 0.03 7.90 69.30
02:40:00 AM all 2.19 0.05 2.13 95.63
02:50:00 AM all 1.54 1.47 2.38 94.61
03:00:00 AM all 2.12 0.24 2.35 95.29
03:10:00 AM all 2.80 0.69 2.70 93.82
03:20:00 AM all 1.67 0.75 2.38 95.19
03:30:01 AM all 2.51 0.11 2.79 94.59
03:40:01 AM all 3.57 0.43 3.31 92.69
03:40:01 AM CPU %user %nice %system %idle
03:50:01 AM all 1.74 0.07 3.27 94.91
04:00:00 AM all 1.73 0.21 2.98 95.08
04:10:00 AM all 2.28 0.61 3.45 93.67
04:20:00 AM all 1.52 1.13 2.47 94.87
04:30:00 AM all 1.68 0.74 2.37 95.22
04:40:00 AM all 3.49 0.54 4.10 91.87
04:50:00 AM all 1.84 0.32 2.05 95.79
05:00:00 AM all 5.71 1.13 2.32 90.85
05:10:00 AM all 1.82 1.30 2.17 94.71
05:20:00 AM all 2.17 1.02 2.07 94.74
05:30:00 AM all 1.92 0.08 2.26 95.73
05:40:00 AM all 2.23 0.03 2.31 95.43
05:50:00 AM all 1.74 0.09 2.60 95.57
06:00:00 AM all 1.42 0.02 1.83 96.73
06:10:00 AM all 1.65 0.02 2.09 96.24
06:20:00 AM all 1.42 0.05 1.95 96.59
06:30:00 AM all 1.87 0.02 1.77 96.35
06:40:00 AM all 1.35 0.02 1.92 96.72
06:50:00 AM all 1.16 0.03 1.80 97.02
07:00:00 AM all 1.63 0.02 2.65 95.70
07:10:00 AM all 1.99 0.02 2.06 95.92
07:20:00 AM all 1.92 0.02 2.23 95.83
07:20:00 AM CPU %user %nice %system %idle
07:30:00 AM all 1.67 0.02 2.18 96.14
07:40:00 AM all 2.33 0.02 2.31 95.33
07:50:00 AM all 6.96 0.02 2.84 90.18
08:00:00 AM all 2.98 0.02 2.44 94.56
08:10:00 AM all 5.21 0.05 3.09 91.65
08:20:00 AM all 4.59 0.07 3.05 92.29
08:30:00 AM all 29.29 0.04 20.05 50.62
08:40:00 AM all 17.38 0.02 15.41 67.18
08:50:00 AM all 1.31 0.02 1.52 97.15
09:00:00 AM all 4.21 0.02 4.17 91.60
09:10:00 AM all 16.28 0.03 10.27 73.42
09:20:00 AM all 3.71 0.03 3.29 92.97
09:30:00 AM all 3.73 0.26 3.71 92.30
09:40:00 AM all 1.85 0.04 2.06 96.04
09:50:00 AM all 3.03 0.02 2.13 94.82
10:00:24 AM all 17.05 0.06 13.88 69.02
10:10:00 AM all 15.94 0.02 8.74 75.29
10:20:00 AM all 4.31 0.04 3.01 92.64
10:30:00 AM all 2.73 0.07 2.83 94.37
10:40:00 AM all 2.33 0.02 1.93 95.73
10:50:00 AM all 4.68 0.03 6.38 88.91
11:00:00 AM all 15.95 0.02 12.17 71.86
11:00:00 AM CPU %user %nice %system %idle
11:10:00 AM all 16.03 0.12 8.78 75.07
11:20:00 AM all 15.98 0.09 8.83 75.09
11:30:00 AM all 3.78 0.12 4.09 92.01
11:40:00 AM all 2.96 0.06 5.93 91.05
11:50:00 AM all 2.82 0.02 2.10 95.06
12:00:00 PM all 5.24 0.09 2.15 92.52
12:10:00 PM all 2.32 0.02 2.14 95.51
12:20:00 PM all 3.69 0.05 3.04 93.21
12:30:00 PM all 2.30 0.14 2.17 95.39
12:40:00 PM all 3.28 0.03 3.27 93.42
12:50:00 PM all 2.95 0.02 2.43 94.60
01:00:00 PM all 2.09 0.02 2.31 95.58
01:10:00 PM all 2.39 0.02 2.36 95.24
01:20:00 PM all 2.79 0.02 2.37 94.82
01:30:00 PM all 3.13 0.02 2.69 94.15
01:40:00 PM all 2.62 0.04 2.37 94.97
01:50:00 PM all 3.10 0.02 3.16 93.73
02:00:00 PM all 2.80 0.03 2.44 94.73
02:10:00 PM all 2.94 0.02 2.41 94.63
02:20:00 PM all 2.46 0.08 2.35 95.11
02:30:00 PM all 2.61 0.02 2.45 94.91
02:40:00 PM all 2.71 0.02 2.72 94.54
02:40:00 PM CPU %user %nice %system %idle
02:50:00 PM all 2.41 0.02 2.37 95.21
Current server time is 02:52:36 PM
You can see short bursts when we were compiling new software, but overall the load is quite moderate.
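If you want to pick the bursts out of a report like the one above yourself, the following is a small sketch in Python. It assumes rows in the same layout as the quoted output (time, AM/PM, "all", then %user, %nice, %system, %idle). The 80% idle threshold is an arbitrary choice for the example.

```python
def find_bursts(sar_text, idle_threshold=80.0):
    """Return (time, idle) pairs for samples whose %idle is below the threshold."""
    bursts = []
    for line in sar_text.splitlines():
        fields = line.split()
        # Data rows look like: 08:30:00 AM all 29.29 0.04 20.05 50.62
        # Header rows have "CPU" in the third column and are skipped.
        if len(fields) != 7 or fields[2] != "all":
            continue
        try:
            idle = float(fields[6])
        except ValueError:
            continue
        if idle < idle_threshold:
            bursts.append((fields[0] + " " + fields[1], idle))
    return bursts

sample = """\
08:20:00 AM all 4.59 0.07 3.05 92.29
08:30:00 AM all 29.29 0.04 20.05 50.62
08:40:00 AM all 17.38 0.02 15.41 67.18
08:50:00 AM all 1.31 0.02 1.52 97.15
"""
print(find_bursts(sample))
# prints [('08:30:00 AM', 50.62), ('08:40:00 AM', 67.18)]
```

Run against the full report, the same function flags the handful of 10-minute samples where idle time dropped below the threshold.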
Email backups. An announcement about this will be posted shortly.
Not being responsive. Sorry for that, but we had a huge number of problems yesterday and today. A cPanel automatic update caused the exim and FTP user database to go away, so on some servers people were not able to log in to mail/FTP. We had to deal with those problems and fix mail and FTP for our customers. Also, 3 servers had FTP extensions not working. DarkOrb has already released a patch for this problem, but it is rather complex and takes some time to apply, with a risk of downtime when the Apache compilation doesn't go through. We have yet to finish it on all our servers.
With a large number of requests, we are concentrating on server-wide problems first so as not to be overwhelmed with helpdesk requests and to get response times back to normal.
Thank you all for your understanding. I also understand those who decided to cancel and go elsewhere, and I wish you good luck with your new hosting company, though I doubt you'll ever find 100% uptime: there is no way to perform the server upgrades/maintenance we had to do without downtime, and every company has to do software upgrades/recompilation periodically.