Defaced web site

Each morning I check all of my websites. I found the easiest way is to use the tabbed interface of Firefox. I open all of the sites and then tell Firefox to use the current pages as my home page. The next time I start Firefox it automagically loads each page in a new tab. On Saturday I was surprised and shocked to find that one of my sites had been defaced. Instead of my normal drab page I found a semi-nude female and some sort of political announcement. After the initial shock wore off I was kind of bemused. The site gets little traffic, so in a weird way I was surprised they chose it. Well, after going through shock, bemusement, and surprise I got down to business: changing passwords, restoring the original home page, and checking for any other changed files.

My hosting provider supplies daily, weekly, and monthly backups. Now that I had a problem, I started looking closely at all of the administrative issues I had been ignoring. Some of the things I noticed were:

  1. The backup seemed to be very large relative to the size of the website.
  2. WinZip complained about trailing garbage when I opened the backups.

The second problem distracted me from the first for a while. I worked around it by using Cygwin and gunzip to expand the file, which produced no error messages. I eventually found out that the trailing garbage warning is not unusual and can be ignored. Assured I was using a good backup, I used WinZip to sort the files in the backup by size. I quickly found the culprit: SpamAssassin's auto-whitelist. It was 45 MB. From there it was not hard to find out that the backup also included files I had deleted. It is nice to know they are there, but it is a pain when you are anxious for the download to complete.
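For anyone who would rather script the "sort by size" step than click through an archive tool, here is a minimal sketch in Python. It is not what I ran at the time; it assumes the backup is an ordinary tar or tar.gz file, and the function name and the file name in the usage note are made up for illustration.

```python
import tarfile


def largest_members(archive_path, count=10):
    """Return (size_in_bytes, name) pairs for the largest regular files."""
    # "r:*" lets tarfile detect the compression (plain tar, gzip, bzip2, ...).
    with tarfile.open(archive_path, "r:*") as archive:
        sizes = [(m.size, m.name) for m in archive.getmembers() if m.isfile()]
    # Largest first; ties are broken by name, also in descending order.
    return sorted(sizes, reverse=True)[:count]
```

Calling `largest_members("weekly-backup.tar.gz", 5)` (a hypothetical file name) returns the five biggest files, which is usually enough to spot something like a bloated whitelist database.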

After a little playing around I found a way to pull out only the website directories. I extracted the directory tree from the weekly backup and then created MD5 digests for the files in that tree. I repeated the extraction with a daily backup and checked the files in the new tree against the digests from the weekly one. I found only a few changes and I could explain all of them. Whew!
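The same extract-and-compare routine can be sketched in Python. This is an illustration under assumptions rather than the scripts I actually wrote: the function names are hypothetical, the backups are assumed to be tar files you created yourself and therefore trust, and member names are assumed to be stored without a leading "./".

```python
import hashlib
import os
import tarfile


def extract_subtree(archive_path, prefix, dest):
    """Extract only the members at or below `prefix` into `dest`."""
    prefix = prefix.rstrip("/")
    with tarfile.open(archive_path, "r:*") as archive:
        wanted = [m for m in archive.getmembers()
                  if m.name == prefix or m.name.startswith(prefix + "/")]
        # Only do this with archives you trust, such as your own backups.
        archive.extractall(dest, members=wanted)


def md5_tree(root):
    """Map each file path (relative to root, '/' separators) to its MD5 hex digest."""
    digests = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            md5 = hashlib.md5()
            with open(full, "rb") as handle:
                # Read in 1 MB chunks so large files do not fill memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    md5.update(chunk)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            digests[rel] = md5.hexdigest()
    return digests


def compare_trees(old, new):
    """Return sorted lists of added, removed, and changed paths."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, removed, changed
```

Extract the same directory from the weekly and the daily backup into two folders, run `md5_tree` on each, and hand the two results to `compare_trees`. Anything in the changed list that you cannot explain deserves a closer look.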

Okay, here’s the roll call of utilities that helped me. Although I used Cygwin’s utilities and WinZip to figure things out, I found that the command-line version of 7-Zip is a faster and more convenient solution. I never did figure out a convenient way to extract just one directory from a tar file with WinZip. I ended up creating shell scripts for Cygwin and a batch file for Windows so I will not have to reinvent the wheel next time. Although I did work briefly with Fsum and it may be faster, md5summer is the more convenient tool for creating and comparing MD5 digests. For those who are curious, it takes about six minutes for my P3-700 to calculate the MD5 digest (638K) of the directory tree.