Exchange Server Event ID Descriptions Are Missing

This week I had the pleasure of trying to figure out why the Exchange 2003 server was having problems again. It has been a long time since I looked closely at Exchange but the event log was not helpful. All of the Exchange Server Event ID Descriptions were missing. I did not need the descriptions to tell me we had run out of space so I set about purging old email messages and setting the “Deleted Item Retention” to zero. After the regular Exchange maintenance completed I still was getting some messages, so I set out to fix the description problem.

In my case I found out that the event ID descriptions were missing as described in XGEN: Exchange Server Event ID Descriptions Are Missing. Unfortunately this KB did not provide a sample to work from so I had to go elsewhere. Eventually I found a sample registry and manually entered the keys for MSExchangeIs Mailbox Store, MSExchangeIs Public Store, MSExchangeIS, and MSExchangeSA. Now I can look in the event log to see how close we are to filling up the Exchange database.

Pivot Table Analysis of the Event Log

Recently I had to investigate a problem with our SMTP server. One of the things I wanted to know was when the SMTP problems started. Like most computer problems, multiple event IDs were being triggered each time a problem occurred. The Pivot Table Wizard is a great tool for quickly summarizing the event log data. Here is how I did it.

  1. Open the Event Viewer, filter your view to the event source you are interested in, export the list, and transfer the exported list back to your work station.
  2. Open a new blank worksheet in Excel and import the data using the Import External Data Wizard.
  3. Open the Pivot Table Wizard. Drop the “Date” into the row area. Drop the “Event” field into the column area. Drop any other field into the data area. I used the “Source” field. You should now have a pivot table that has columns for each event ID and a count of the number of events per day per event ID.
  4. I prefer the data to be sorted in descending order so I went into the Advanced Field properties for the “Date” field and set it to descending.

In my case with the pivot table analysis I could see that one event ID, 4000, was the primary event. The rest of the event IDs were secondary events.
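The same per-day, per-event-ID count can be sketched in a few lines of Python for anyone without Excel at hand. This is only a minimal sketch, not what I used. It assumes the exported list is comma-separated with “Date” and “Event” columns, matching the field names above. The sample rows, the SMTPSVC source name, event ID 4006, and the function names are made up for illustration.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical sample of an exported event list; a real export has more columns.
SAMPLE = """Date,Source,Event
4/24/2007,SMTPSVC,4000
4/24/2007,SMTPSVC,4006
4/25/2007,SMTPSVC,4000
4/25/2007,SMTPSVC,4000
4/25/2007,SMTPSVC,4006
"""

def pivot_counts(csv_text):
    """Count events per (date, event ID), like the pivot table's data area."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumes month/day/year dates, as in a US-locale export.
        day = datetime.strptime(row["Date"], "%m/%d/%Y").date()
        counts[(day, row["Event"])] += 1
    return counts

def pivot_rows(counts):
    """Return (header, rows): dates in descending order, one column per event ID."""
    event_ids = sorted({event for _, event in counts})
    days = sorted({day for day, _ in counts}, reverse=True)
    rows = [[day.isoformat()] + [counts[(day, e)] for e in event_ids] for day in days]
    return ["Date"] + event_ids, rows

header, rows = pivot_rows(pivot_counts(SAMPLE))
print(header)
for row in rows:
    print(row)
```

Parsing the dates before sorting matters: sorted as plain text, 4/3/2007 would land after 4/25/2007.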

Expanding a RAID1 array with bigger disk drives

Problem: You have an existing RAID1 array and now you need more disk space. You have purchased two identical 300 GB disk drives to replace the existing 147 GB disk drives. What is the quickest way to replace the disk drives with the least amount of down time?

Answer: I ran into this situation this week. The easy part of the answer was to replace one disk drive with a new 300 GB drive, let the RAID controller synchronize the drives, and then replace the last 147 GB drive with the second 300 GB drive. The hard part of the question was whether you could partition the remaining disk space into a logical volume without rebooting. The answer is yes.

It took about two and a half hours to mirror the first disk. During the first hour Exchange was really sluggish. For the next hour and a half the response time was okay. It took about an hour and a half to mirror the second drive, and the response time was okay during the entire mirroring operation.

When the mirroring was complete I used the Compaq/HP disk array software to check the disk drives. My research on the Internet said that it was unlikely that the disk array software would show the disk space that was not part of the existing RAID1 array as being available. I was mildly amused to see that it showed that 292 GB was available (i.e. 146 GB per drive). I used the disk array software to create a 146 GB RAID1 volume. When I went into Disk Management I could see 146 GB was available to be partitioned and formatted. Except for the first hour of mirroring this whole operation was pretty painless and did not require a reboot.

Server 500 error, Codeplex, and ISA 2004

I recently tried to visit Codeplex and got an error page with a Server 500 error. It did not take too long to figure out that there was a configuration problem on my firewall, ISA 2004. There were several proposed fixes but the one that worked for me came from a Techarena forum, and it said to toggle the HTTP Compression filter, either on or off. I turned it on and it worked.

I think I had turned off the compression filter back in the ISA 2004 SP1 days. According to Lazyadmin, HTTP Compression started working in SP2, and he has recommendations for configuring it in his post, Enabling HTTP Compression in ISA 2004.

Changing ownership and deleting unknown accounts from objects

Yesterday I decided to fix an old problem. I had some directories and files with unknown accounts in their access control lists (ACLs). This can occur when you migrate user files to a new server. The easy way to fix this problem is to right-click on the directory and follow the menus to change ownership, delete the unknown account, and grant full access to the new owner. Another way of changing ownership is to use the command line utility, SubInACL. That is what I chose to use yesterday.

Some time ago I had downloaded and installed the Windows 2003 Resource Kit, which includes SubInACL. After a lot of attempts and re-reading the help multiple times, I gave up. It did not work. So I downloaded FileACL and after a few attempts I figured out the command line to change the object. As an example, the following command will grant full access to user1, revoke access for the unknown account, and change ownership of the directory, its subdirectories, and files.

fileacl "Pinnacle Studio" /s user1:f /r S-1-5-21-73586283-1644491937-682003330-1123 /o user1 /sub /files

It bothered me that SubInACL did not work properly so I decided to spend a few minutes to find out why. After a little searching I found that the version (4.0) included in the resource kit did not work for several people and that there was a newer version, listed at Download details: SubInACL (SubInACL.exe). The new version (5.2) works on Windows 2003. I wonder how this slipped by quality control.

When Microsoft’s recommendations do not fix your userdata persistence error (0x800A0046)

About once a month I go to the Windows Update and let it check my computer. If Windows Update is working properly, the Windows Update cupboard will be bare. Sometime in December Windows Update stopped working for me and it started giving me a userdata persistence error. The help system said that all of my problems would disappear if I would just enable userdata persistence in my browser. So what do you do when your browser already has userdata persistence enabled? While I pondered that problem I ran Microsoft Baseline Security Analyzer to get my updates.

Today I found my solution. While I was investigating another problem, I found KB943144 – Updates are not installed successfully from Windows Update…. In this article it tells you how to manually re-install Windows Update. This was just what the doctor ordered!

Implementing the Change Password feature with Outlook Web Access

SUMMARY

This article discusses how to implement the Change Password feature in Microsoft Outlook Web Access (OWA) to allow OWA users to change their domain passwords. This article also describes some of the common troubleshooting scenarios where you might use this feature.

Source: Implementing the Change Password feature with Outlook Web Access

If you are running SBS and ISA (aka SBS Premium) this can get a bit more complicated. With the default ISA setup you will get the infamous 403 page when you click on the Change Password button under OWA Options. This 403 is coming from ISA. To fix it you need to go to the SBS OWA Publishing Rule properties in ISA. I clicked on the Paths tab and added a new path, “/iisadmpwd/*”. When I tried the Change Password button again, I got a new and different 403. This time it was a 403.6 from IIS.

My solution for getting rid of the second 403 was to go to IIS Administration and bring up the properties for IISADMPWD. I clicked on the Directory Security tab and then clicked on the IP Address and domain restrictions button. I added a new exception using my external static IP address. I used the external IP address since the ISA log said that was the client address. I tried to use the IP address for the WAN adapter but it did not work.

Enjoy! 🙂

A little bit of 529’s

Susan says:

So let’s say you want to be alerted when someone does a password attempt on your system. Go into the health monitor, copy the Account Lockout alert service and edit it to look for event 529 in the event logs. Adjust the Actions to not only log to the system but to email you when someone does a bad password attempt and voila… you now have a early warning system when someone from remote is banging on things.

I personally limit the access to port 25 to only those ports that need access to the servers at ExchangeDefender.com and don’t get drive bys… but if you are concerned…..

Source: A little bit of 529’s

Tracking down the cause of a 537 login problem

Recently I re-installed my desktop software on to a different disk drive and I started getting several event ID 537 error messages in the server security log. The workstation was not showing any operational problems so this was a low priority problem. The biggest problem was that the sheer number of errors distorted my server monitoring report. It was annoying me about a non-critical problem. Somewhere in my past I had fixed this problem but unfortunately I could not remember how I did it. In fact I am not sure I ever knew how I fixed it! This particular error occurs with a status code of 0xC00002EE. Microsoft has an ambiguous explanation for this error message. Suffice to say their explanation was not helpful. Yesterday I got annoyed enough to solve the problem.

If you search the web for the 537 event ID, you will find several proposed solutions. Here are some of the solutions:

  1. Check to see if your workstation has a different time than the server.
  2. Remove HP printer monitoring software.
  3. Remove network interface card monitoring software.

None of these solutions applied to me so I had to dig deeper into the problem. The first thing I found out was that these errors appeared in clusters of three at periodic intervals. It looked like something at the workstation was triggering this error on a schedule so I started browsing the event logs on the workstation for an error at the same times. I found that the automatic certificate enrollment was failing. Hmm….
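Spotting a pattern like that by eye gets tedious when the log is long. As a rough sketch in Python, the grouping could be done as shown below. It assumes the event times have already been pulled out of the security log as datetime values. The sample times, the two-minute gap, and the function name are made up for illustration.

```python
from datetime import datetime, timedelta

def cluster_events(times, max_gap=timedelta(minutes=2)):
    """Group timestamps into clusters; a gap larger than max_gap starts a new cluster."""
    clusters = []
    for t in sorted(times):
        if clusters and t - clusters[-1][-1] <= max_gap:
            clusters[-1].append(t)
        else:
            clusters.append([t])
    return clusters

# Hypothetical event times: three errors, one second apart, every eight hours.
base = datetime(2007, 6, 1, 6, 0, 0)
times = [base + timedelta(hours=8 * i, seconds=s) for i in range(3) for s in (0, 1, 2)]

clusters = cluster_events(times)
sizes = [len(c) for c in clusters]
intervals = [b[0] - a[0] for a, b in zip(clusters, clusters[1:])]
print(sizes)      # number of events in each cluster
print(intervals)  # time between the starts of consecutive clusters
```

A constant cluster size and a constant interval point at a scheduled task rather than something a user is doing.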

When you look at the error message for the automatic certificate enrollment, you can see that it was failing because the “RPC server is unavailable”. So I cranked up the certificate snap-in and saw that the computer did not have a computer certificate. I also figured out that I could trigger the enrollment error by manually asking for a new certificate. Although there are a lot of causes of enrollment errors, I quickly focused on problems related to the ISA 2004 server and the RPC filter. ISA is a great firewall but sometimes it can be overprotective. One of the recommended fixes was to turn off “strict RPC Compliance”. So I disabled that option, restarted the ISA firewall, and manually asked for a certificate. No luck! I got the same error.

The next trick that some people advised was to disable the RPC filter in ISA. So I disabled the RPC filter, restarted the firewall, and manually asked for the certificate again. This time it worked and I have a computer certificate that is valid for a year. I enabled the RPC filter and restarted the firewall to put everything back to normal.

This morning I checked the log file on both the server and the workstation. Both log files look normal. The errors are gone and no new errors have appeared. Since I believe that I am running a fully patched ISA server, the RPC problem is a curious problem. For kicks I checked the date on the rpcflter.dll. It looks like the latest and greatest version so I suspect that the RPC problem probably lies elsewhere. Oh well! At least I know how to get rid of the problem for a year.

Template Menu for Word

Today I was reading the SBS Diva’s blog about a custom toolbar menu she has been using in Office 2003 for ten years. It sounded like a great idea for the Word 2003 templates I use to create business letters. My basic template prompts me to select a name from my Contacts in Outlook 2003 and then a small macro fills in the name and address fields in the letter and envelope. The template is a real time saver for me so I have created three versions of the same basic template with different letterheads and footers. I suspect this menu will be helpful if you have three or more templates you need to use on a semi-regular basis.

Recently I have been writing more letters so I am naturally interested in saving a few steps in bringing up the template. My natural work flow is to start Word, click on File-New, and then click on one of the recently used templates. This is not too bad but a custom toolbar menu is a little more intuitive and quicker. Here is how I created my toolbar menu.

  1. The first step is to create a macro that loads the template. I recorded a macro since my VBA memory is fuzzy. The macro has only one command in it so you may want to create the module from scratch and save it in normal.dot.

    Documents.Add Template:="\\srv1\Users\whuber\My Documents\Templates\WEHC Ltr.dot", _
            NewTemplate:=False, DocumentType:=0

  2. To create the custom toolbar, click on View-Toolbars-Customize menu item. A Customize popup window will appear. Click on the New button to create the custom toolbar. I called my toolbar, My Templates.
  3. Now we are going to place a menu in the toolbar. With the Customize popup still open click on the Commands tab. Go to the bottom of the Categories list and click on New Menu. In the Commands area drag and drop the New Menu command on to your newly created toolbar.
  4. Now we are going to fill in the menu items. The first thing we have to do is to go over to the toolbar and click on the New Menu icon to open the menu up. Go back to the Customize popup and click on the Macros item in the Categories list. You should see the macros you created in step 1. You can now drag and drop the macros on to the open menu in your custom toolbar. When you have all of the menu items created you can right click on the menu items and change the menu labels to something more descriptive.

I think I spent more time writing about how to create the menu than actually creating the menu. Oh well! Enjoy!

Installing the Messaging Security Agent from the Security Dashboard

This week I upgraded the Trend Micro SMB installation on my “dog food” server to version 3.6. It kind of worked. The virus checking stuff upgraded nicely but the Messaging Security portion did not. I got this message, “Error 1923.Service Trend Micro Messaging Security Agent Remote Configuration Server(ScanMail_RemoteConfig) could not be installed”.

I researched the problem and it said I should check my privileges. After researching what privileges it was complaining about, I figured out that the privileges for the Administrator userid were just fine. So I rebooted and tried to install the Messaging Security portion again. I was unsuccessful but this time it told me to install it from the Security Dashboard. I don’t remember seeing that message before but I was game. After a little research I found these instructions on how to do this.

Installing the Messaging Security Agent from the Security Dashboard

These instructions were a little too short for me since the installation process asked me a few more questions than were included in the instructions. The installation process asked me which directory to install Messaging Security in and the “shared” directory. I was not sure what they wanted for the shared directory since this field was prefilled with C$. C$ looks like a “share” to me and I was clueless about a shared directory. If Trend Micro has a shared directory they want me to use, they hid it well. Since I was installing these files on my “H” drive, I assumed they wanted the “share” for the drive, H$. Anyway that is what I gave it. When I pressed the enter key, a screen showing the installation status popped up. The status screen updated several times over the next ten minutes before it finally completed. Now when I check the “Live Status” and “Security Settings” screens they show me that the Anti-spam is working. Since Microsoft’s Intelligent Messaging Filter catches most of the spam for my “dog food” server I got through this unscathed.

SYDI and System Documentation

SYDI is a program to document your system. There are a lot of programs you can use to document your systems. Some programs are very sophisticated and provide lots of detail. Although these programs do not cost much, they inevitably have licensing issues and they provide a lot more detail than I care about. SYDI is a bunch of Visual Basic scripts that probe the system using WMI to create an XML file. At this moment SYDI provides enough documentation for me. With another script you can transform this XML file into either a Word document or an HTML file. The documentation is not fancy but it is sufficient.

Recently I was updating the documentation for my server at home and decided that I was going to start saving versions of the server documentation. I initially changed the scripts to embed the date in the filename. I have since changed my mind and decided to store the XML files in an SVN repository. This way I can keep multiple versions of the XML file and compare these versions with the built-in Diff program or WinMerge. I still like the idea of embedding the date in the file name on the latest Word or HTML file.
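Embedding the date in the file name takes only a few lines in most scripting languages. The sketch below is in Python rather than the Visual Basic script that SYDI uses. The file name srv1.html and the helper dated_name are hypothetical, and it only illustrates the naming scheme.

```python
from datetime import date
from pathlib import PurePath

def dated_name(filename, day):
    """Insert an ISO date before the extension of a file name."""
    path = PurePath(filename)
    return str(path.with_name(f"{path.stem}_{day.isoformat()}{path.suffix}"))

# Hypothetical output file name for the HTML version of the documentation.
print(dated_name("srv1.html", date(2007, 5, 1)))
```

Using the year-month-day order means the files also sort chronologically in a directory listing.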

As I was mucking about the scripts I decided to make a small contribution to the SYDI project and modify the XSL to generate valid XHTML code. I sent the XSL file to the developers and I will let them decide if they want to include it in the next release.

WSUS 3.0 problem partially fixed…Hmm

To make sure an upgrade works I install it first at home. I remain a little suspicious until I see that everything is working. As a test I released an unimportant patch to make sure WSUS still worked. After a day I noticed that the patch had not been applied. A little checking showed me that none of my computers had checked in. That’s not good. A little more checking with the client diagnostic utility confirmed that the clients were talking but the server was responding with an error message. My setup under WSUS 2.0 used port 8531 for HTTPS and port 8530 for HTTP. I switched the GPO to tell the clients to use the non-SSL port 8530 instead of 8531 and the clients could communicate with the WSUS server. Using 8530 is a temporary solution but I think I have the problem narrowed down. Hopefully this will be the only problem I have with WSUS 3.0.

WSUS 3.0 and ISA 2004 SP3 Updates

Windows Server Update Services(WSUS)

I was feeling a little adventurous yesterday and decided to update the Windows Server Update Services (WSUS). This package is a great tool for managing and tracking the updates to Windows computers in a small business server environment. There are other products that may do a little better job but you cannot beat the price (free). My version of WSUS was working okay but the console had always been very slow and occasionally I yearned for a little custom reporting. I could read between the lines, too. Microsoft really wants us to upgrade to 3.0 so you better be ready soon! I opted to get it done when it fit in my schedule. I cannot really complain about the slowness since I am running it on a server with less than the recommended CPU power. Before I could upgrade I had to install two packages:

  1. Microsoft .Net Framework Version 2.0
  2. Microsoft Report Viewer 2005 SP1

Although my version of WSUS had been migrated over to SQL Server, I did not need to alter the registry as indicated in the README file. With all of the prerequisites in place, I invoked the upgrade. The in-place upgrade took a long time but it completed without error. That is always a good sign. The only part of the upgrade I had not paid attention to was that the new console had completely replaced the old web-based console. The old console was no longer available. The good news is that I could run the upgrade on my workstation and install the new console there as long as I had met the prerequisites (i.e. .Net 2.0 and Report Viewer 2005). After it looked like it was finished I went back to see what it had left behind. The SUSDB was gone. I did find a new SQL Server instance called “Microsoft ##SSEE” that was visible in the Server Management console. It probably is a SQL Server 2005 Express database since it wants the SQL Server Management Studio to manage it.

Internet Security and Acceleration Server 2004 SP3

This service pack was released today, 5/1/2007. I did not see any advance warning in the mailing lists. Since I started updating the server yesterday and it was still in good condition for more updates, I went ahead and applied this one. This one installed without problems. I will add a new server configuration report for my records.

More ideas on freeing up space on a windows server boot partition

Recently I have been plagued with low disk space on the boot partition of my SBS2K3 server. I was confused why this was occurring since I thought I had this under control. When I found some spare time I started to re-examine the typical culprits (e.g. log files and temp files). I found some files but they were not big enough to cause the problem I was seeing. Since I needed about 1 GB of free space for some major upgrades, I decided to go ahead and move the SBSMonitoring database to another disk drive and delete old patch files. This almost got me to 1 GB. I even decided to replace Acrobat with the lighter weight PDF reader, Foxit Reader.

Today I found the problem! I had previously set up shadow copies to store the copies for all disk drives on my backup drive. While I was using JDiskReport to look at the disk space I noticed that there were three large directories in the System Volume Information folder for the boot partition. After a little snooping I figured out my shadow copies were being stored on the boot partition and they were being duplicated. As an example, I was getting a copy at 7:00 and another at 7:02. I do not know how things got screwed up but I suspect it occurred during a power outage when the backup drive went offline before the server. Fixing it was easy. I deleted the existing shadow copies and changed the settings to point to my backup drive. To fix the scheduling problem, I went into the Tasks control panel and deleted the extra jobs.

I was thinking of getting rid of Java and JDiskReport until it led me to the source of my problem. I probably will remove it but not right now. TreeSize Free is a free, lightweight alternative. The Professional version is even better. It was much easier to rationalize removing Acrobat from the server. For those interested in using Foxit Reader to replace Adobe Acrobat on your server, the link is listed below. Acrobat uses about 80 MB of disk space when installed. Foxit uses about 2 MB, launches much quicker than Acrobat, and does not need to be installed. I do not browse the Internet from the server but I do look at Trend Micro reports while logged in as Administrator on the server.

Now with Foxit Reader 2.0, you don’t have to endure such pain any more. The following is a list of compelling advantages of Foxit Reader 2.0:

  • Incredibly small: The download size of Foxit Reader is only 1.5 M which is a fraction of Acrobat Reader 20 M size
  • Breezing-fast: When you run Foxit Reader, it launches instantly without any delay. You are not forced to view an annoying splash window displaying company logo, author names, etc.
  • Annotation tool: Have you ever wished to annotate (or comment on) a PDF document when you are reading it? Foxit Reader 2.0 allows you to draw graphics, highlight text, type text and make notes on a PDF document and then print out or save the annotated document.
  • Text converter: You may convert the whole PDF document into a simple text file.
  • High security and privacy: Foxit Reader highly respects the security and privacy of users and will never connect to Internet without users’ permission. While other PDF Reader often silently connects to the Internet in the background. Foxit PDF Reader does not contain any spyware or adware.

Source: Foxit Software

Publishing ISA Reports on your Sharepoint site

Here’s the problem. You want to look at your firewall reports regularly. You have gone so far as to set up ISA to publish the daily and monthly reports to a directory on the server but getting to yesterday’s report is a real pain in the butt. It would be nice to send the report via email as a PDF like Trend does or to have it appear on the home page of your SharePoint site. Although I may do the email option in the near future I have already completed the second option. Here is how I did it.

  1. Publish the ISA reports you are interested in to a directory if you have not already done it.
  2. Add a virtual directory to your default web site and point it at your report directory. For this example I will use srv1 as the server name and isa as the virtual directory name. This virtual directory points to my ISA reports directory located at h:\reports. To get to the Daily report for 4/25/2007 I would use the following URL, http://srv1/isa/Daily_(4.25.2007-4.25.2007)/report.htm. As you can see, entering this URL can get pretty tedious.
  3. To solve this problem I created a small web page with some JavaScript that calculates the URL to yesterday’s ISA report and then redirects you there. I called that page daily.htm and put it in the Reports directory. So if I wanted to see yesterday’s ISA report, I would enter the following URL into my browser, http://srv1/isa/daily.htm, and the latest ISA daily report would pop up.
  4. Now that we have a URL that will always point to the latest ISA daily report, the Page Viewer Web Part becomes a simple solution to the problem. The Page Viewer Web Part gives me a peek at the Daily report and it makes it easy for me to browse the rest of the report. I created a similar web page for the Monthly report. I put links to both pages and the directory in my SharePoint Links list and My Favorites.

Although I used this technique for looking at firewall reports it could be easily modified to show a web page with key business indicators that you create daily, weekly, or monthly.

Here is the code for the daily.htm web page.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en_US" lang="en_US">
<head>
<meta name="generator" content="HTML Tidy" />
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<script type="text/javascript">
//<![CDATA[
function getRptDate()
{
var now = new Date();
var ydate = new Date(now.getTime() - 86400000);
var yday = ydate.getDate();
var ymon = ydate.getMonth() + 1;
var yyear = ydate.getFullYear();
var datetext = ymon + '.' + yday + '.' + yyear + '-' + ymon + '.' + yday + '.' + yyear;
return datetext;
}
//]]>
</script>
</head>
<body>
<script type="text/javascript">
<!--
var d = getRptDate();
var path = 'http://srv1/isa/';
window.location = path +'Daily_(' + d + ')/report.htm';
//-->
</script>
</body>
</html>
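
For the email option the same date arithmetic would have to run outside the browser. Here is a minimal sketch in Python, assuming the same srv1 server name and isa virtual directory used above. The function name is hypothetical and this is not part of the page I deployed.

```python
from datetime import date, timedelta

BASE_URL = "http://srv1/isa/"  # server name and virtual directory from the example above

def daily_report_url(today):
    """Build the URL of the ISA daily report for the day before `today`."""
    yesterday = today - timedelta(days=1)
    # ISA names the folder with unpadded month.day.year, repeated for start and end.
    stamp = f"{yesterday.month}.{yesterday.day}.{yesterday.year}"
    return f"{BASE_URL}Daily_({stamp}-{stamp})/report.htm"

print(daily_report_url(date(2007, 4, 26)))
# http://srv1/isa/Daily_(4.25.2007-4.25.2007)/report.htm
```

Subtracting a whole day from a date, rather than 86400000 milliseconds from a timestamp, also sidesteps the odd hour around daylight saving changes.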

Colligo Reader and Contributor – Your Offline SharePoint Solution

Colligo Reader is free for individual use and a great way to try the rich offline experience that Colligo for SharePoint delivers. Reader provides read-only access to SharePoint content offline, including documents, lists, and metadata. It is so simple to use that training is not required. Download Colligo Reader for free today!

Source: Colligo Reader and Contributor – Your Offline SharePoint Solution

The good news is that our board is asking me questions. The bad news is that I do not always have the answer on the tip of my tongue and I need to look it up. For the last year I had been bringing my laptop to the meetings so that I could answer those unexpected questions. About two months ago our nonprofit moved its board meeting to an offsite location. It is a nice conference room in a good location but it does not have public Wi-Fi access. This is where Colligo Reader has been a great help. I store all of my nonprofit reports, letters, and worksheets in a SharePoint site I run. Before I go to the meeting I let Colligo synchronize the files. This is a great solution for the mobile workforce and especially for those people trying to keep their nonprofit work from consuming the rest of the day.

2003 SP2 is okay!

I installed SP2 late on Friday without any obvious issues besides that it took a long time to install. Late in the evening IIS shut down for an unknown reason. I did not catch it until the following day. It caused several problems in programs dependent on IIS. After I restarted IIS everything has been stable for the last two days. Everything I tried still works.

Win 2003 sp2 on Microsoft update now

On Microsoft update and apparently soon on WSUS is Windows 2003 sp2. I want you to go over to your servers right now and turn off auto updates if you have then enabled on the servers. For R2 WSUS boxes you won’t get SP2 automatically as SP’s are offered up but not auto installed. But I don’t want you to accidentally install this service pack as for us SBSers we need to read KB932600 …but as of right now the link isn’t yet live..

Windows Small Business Server 2003 R2 (SBS 2003 customers – Please read Knowledge Base Article 932600 before installing SP2) …

I did not see this one coming. I am busy with other things and do not want to deal with SP2 until someone else figures out the problems. I do not have any problems that SP2 will fix. In fact I am pretty happy with my servers. With the tax season peak just around the corner this is the wrong time to drop in unannounced with SP2. Fortunately my servers will only update when I bless it. So I should be in good shape.

Hard Disk MTBF: Flap or Farce?

Data sheets for hard drives have always included a specification for reliability expressed in hours: commonly known as MTBF (mean time between failures), or sometimes the mean time to failure. Same difference: One way assumes that a drive will be fixed, and the other, replaced. Nowadays, this number is around a million hours for an “enterprise” hard drive. Some drives are rated at 1.5 million hours.

Now, that’s a good stretch to time. After all, a year is only 8,760 hours. One million hours comes to a bit more than 114 years. Some may be scratching their heads, since the hard drive itself has only been around for 50 years (IBM’s giant 350 Disk Storage Unit for its RAMAC computer). This can be confusing.

Instead, the MTBF is a statistical measure based on a calculation extrapolated from less-lengthy readings. It all means that drives are very reliable, with a failure rate well under 1 percent per year. Go Team Storage!

However, several papers covering large-scale storage presented at FAST ’07, the USENIX conference on File and Storage Technologies, held recently in San Jose, Calif., are kicking up a stir online about MTBF.

The Best Paper award was handed to “Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to You?” by Bianca Schroeder and Garth Gibson of Carnegie Mellon University in Pittsburgh.

Their study tracked a whopping set of drives used at large-scale storage sites, including high-performance computing and Web servers. The data suggests that a number of common wisdoms surrounding disk reliability are wrong.

For example, they found that annual disk replacements rates were more in the range of 2 to 4 percent and were as high as 13 percent for some sites. Yikes.

Source: Hard Disk MTBF: Flap or Farce?

I found this fascinating article about MTBF and disk failures yesterday. I have known for some time that you must take the MTBF figures with a grain of salt. Disk drives appear to fail more often than the MTBF figures would lead you to believe. The differences between “enterprise” disk drives and “retail” disk drives appear to be indistinguishable in the real world. Yet as IT professionals we will always recommend the component with the higher perceived quality even though we have misgivings about the statistics. For most businesses the cost of down time due to a disk failure is much higher than the additional cost for quality. Although we hate to admit it, there is a significant subjective component to our component recommendation.
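To put numbers on that gap, here is a back-of-the-envelope sketch in Python. It assumes a constant failure rate (the exponential model usually implied by an MTBF figure), which is my assumption and not something the article spells out. The function name is made up.

```python
import math

HOURS_PER_YEAR = 8760

def annual_failure_rate(mtbf_hours):
    """Probability that a drive fails within one year, assuming a constant failure rate."""
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

# The two MTBF ratings mentioned in the article.
for mtbf in (1_000_000, 1_500_000):
    years = mtbf / HOURS_PER_YEAR
    print(f"MTBF {mtbf:,} hours = {years:.0f} years, "
          f"annual failure rate = {annual_failure_rate(mtbf):.2%}")
```

Both ratings work out to less than 1 percent per year, well short of the 2 to 4 percent replacement rates the study observed.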