Arithmetic 101

This note is intended to serve two purposes: give you some idea of the limitations in recording data files and show you how simple arithmetic can be used to investigate problems you might otherwise ascribe to hardware, software, media, or an evil spell.

Problem: I have a simple layout with a lot of files to burn as data to a CD-R - and it doesn't work! Somebody said to defrag the hard drive, but why should I do that? Do I need an AV SCSI drive for this sort of simple job? No - all you need is to think about what's going on and to apply some logic.

Suppose we start with 40,000 small files. How small? A first guess would be about 16K each; if we make them all just under that, they will fit onto a standard blank. In fact, making them smaller than 16K makes little difference to the time needed to fetch them and can make the situation worse, as you will see. Let's assume that I'm writing them from a defragged drive and that I have good caching of that drive so that its directory stays in RAM. Then retrieving each file takes a check of the directory cache, one seek to the file start and a 16K data transfer.

On a typical hard drive, you can transfer 3 MB/sec or more and seek time is around 10 msec. At 3 MB/sec, 16K transfers in 16/3 or about 5 msec, so fetching one file takes 10 + 16/3 or about 15 msec. If the files are smaller, the number could be as low as 10 msec - the seek time. That means that the drive will deliver roughly 65-100 files per second and that 40,000 files will take 400-600 seconds (7-10 minutes) to fetch.
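The per-file arithmetic above can be sketched in a few lines of Python. This is only an illustration under the figures used in this note (10 msec seek, 3 MB/sec transfer, 1 MB counted as 1000 KB); the function names are mine, and your drive's numbers will differ.

```python
def fetch_time_ms(file_kb, seek_ms=10.0, rate_mb_per_s=3.0):
    """Time to fetch one unfragmented file: one seek plus the transfer.

    With 1 MB taken as 1000 KB, a rate of R MB/sec moves R KB per msec,
    so the transfer takes file_kb / R msec.
    """
    transfer_ms = file_kb / rate_mb_per_s
    return seek_ms + transfer_ms


def total_fetch_seconds(n_files, file_kb, seek_ms=10.0, rate_mb_per_s=3.0):
    """Time to fetch n_files files of file_kb each, one after another."""
    return n_files * fetch_time_ms(file_kb, seek_ms, rate_mb_per_s) / 1000.0


if __name__ == "__main__":
    print(fetch_time_ms(16))               # about 15.3 msec per 16K file
    print(total_fetch_seconds(40000, 16))  # about 613 seconds
    print(total_fetch_seconds(40000, 0))   # 400 seconds: seek time alone
```

Notice that the seek dominates: shrinking the files toward nothing only takes the total from about 613 seconds down to 400.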

Now some more arithmetic. If you write a 650 MB disc at 8x, you will be writing for a bit more than 9 minutes. It takes 10 minutes to fetch the data and 9 minutes to write them without stopping - underrun! Suppose the files are smaller? Then you will be writing less data, so the write will take less time, but each file still costs at least the 10 msec seek to fetch - underrun again!
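Here is a rough sketch of that comparison. It assumes 1x means 150 KB/sec of data, reuses the 10 msec seek and 3 MB/sec figures from above, and ignores the recorder's buffer, which only delays the trouble. The function names and the 4K example are my own choices for illustration.

```python
KB_PER_SEC_AT_1X = 150.0  # CD data rate at 1x (assumed for this sketch)


def write_seconds(total_kb, speed):
    """Time the recorder needs to write total_kb at the given speed (8 for 8x)."""
    return total_kb / (KB_PER_SEC_AT_1X * speed)


def fetch_seconds(n_files, file_kb, seek_ms=10.0, rate_mb_per_s=3.0):
    """Time the hard drive needs to deliver the files: one seek plus transfer each."""
    return n_files * (seek_ms + file_kb / rate_mb_per_s) / 1000.0


def shortfall_seconds(n_files, file_kb, speed):
    """Fetch time minus write time. A positive result means the drive
    cannot keep up with the recorder over the whole job."""
    return fetch_seconds(n_files, file_kb) - write_seconds(n_files * file_kb, speed)


if __name__ == "__main__":
    # 40,000 files of 16K at 8x: fetching takes longer than writing.
    print(shortfall_seconds(40000, 16, 8))
    # 40,000 files of 4K at 8x: less to write, but almost as long to fetch.
    print(shortfall_seconds(40000, 4, 8))
```

With these assumptions the 16K case falls short by about 80 seconds and the 4K case by about 320 seconds, which is why smaller files make matters worse rather than better.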

Now, suppose the hard drive holding the files is fragmented. Each fragment requires a seek. If it's fragmented badly, then many fewer files will cause the same problems as a lot of files on a defragged drive. If we make an image on a defragged drive, there is only one seek required (to the start of the image) and after that the drive just streams data. At 3 MB/sec it can feed a recorder running at 20x, so you could write at 20x - regardless of the number of files or their fragmentation.
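The same arithmetic extends to fragments and to an image file. The sketch below assumes one seek per fragment, the same 10 msec and 3 MB/sec figures, and 150 KB/sec for 1x; the names and the example file counts are hypothetical.

```python
def fetch_seconds(n_files, file_kb, fragments_per_file=1,
                  seek_ms=10.0, rate_mb_per_s=3.0):
    """Fetch time when every fragment of every file costs one seek."""
    per_file_ms = fragments_per_file * seek_ms + file_kb / rate_mb_per_s
    return n_files * per_file_ms / 1000.0


def max_image_speed(rate_mb_per_s=3.0, kb_per_sec_at_1x=150.0):
    """Fastest write speed a single contiguous image can feed.

    After the one seek to the start of the image, only the transfer
    rate matters (1 MB taken as 1000 KB).
    """
    return rate_mb_per_s * 1000.0 / kb_per_sec_at_1x


if __name__ == "__main__":
    # 40,000 unfragmented 16K files...
    print(fetch_seconds(40000, 16))
    # ...take as long as 10,000 files of 64K split into four fragments each.
    print(fetch_seconds(10000, 64, fragments_per_file=4))
    # A contiguous image at 3 MB/sec keeps up with about 20x.
    print(max_image_speed())
```

The first two results come out the same because the number of seeks and the amount of data are the same; the drive does not care how many files the fragments belong to.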

Please notice the following:

We can carry this a bit further still and move from the HD to a fixed-length packet disc. Here, each packet is 32K and they are deliberately separated on the disc to reduce scrubbing when files are deleted or replaced. So each 32K read requires one seek to the start of the packet and 32K or less of data reading. On a CD-ROM, seek time is relatively long. Your assignment, should you choose to take it, is to find the seek time on your drive(s), do the arithmetic and see if you can understand why it takes so long to read a fixed-length packet disc and why the rotation speed is not the issue.
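If you want to do the assignment with a calculator that remembers, here is a minimal sketch. It assumes one seek per 32K packet and 150 KB/sec for 1x. The 100 msec seek time in the example is only a placeholder, not a measurement; substitute the figure for your own drive.

```python
KB_PER_SEC_AT_1X = 150.0  # CD data rate at 1x (assumed for this sketch)


def packet_read_kb_per_s(seek_ms, speed, packet_kb=32.0):
    """Effective read rate when every packet costs one seek plus its transfer."""
    transfer_ms = packet_kb / (KB_PER_SEC_AT_1X * speed) * 1000.0
    return packet_kb / ((seek_ms + transfer_ms) / 1000.0)


if __name__ == "__main__":
    placeholder_seek_ms = 100.0  # hypothetical; look up or measure your drive
    for speed in (8, 16, 32):
        rate = packet_read_kb_per_s(placeholder_seek_ms, speed)
        print(f"{speed}x drive: about {rate:.0f} KB/sec, "
              f"or {rate / KB_PER_SEC_AT_1X:.1f}x effective")
```

Try a few speeds with your own seek time and compare how little the effective rate changes as the rated speed goes up.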


E-mail me at cdrecording@mrichter.com