Nearly all documents created in offices these days begin their lives on a computer, as a computer file. Computer files are quite dynamic in nature. They change over time as they are accessed. Computer files are not immortal, but the act of deleting a file does not destroy it. Nonetheless, the very act of using a computer overwrites computer files.
This document describes how documents are created and what happens to them after their creation or attempted destruction, and explains the following important points:
- Deleted documents are not necessarily destroyed and, as such, may be recoverable by a professional computer forensics examiner.
- Files retrieved by commercially available data recovery programs are not likely to include all relevant documents or information.
- Continued use of a computer after file deletion may cause the destruction of previously deleted files or documents: Time is of the essence.
- Copying of computer media can be performed without disrupting the data on the computer.
Document Creation and Storage
Microsoft Word is the leading word processing program for office computers. When a document is begun in Microsoft Word, three things happen:
- The new document is displayed on the screen.
- A “temporary” work file is created on the hard disk. Let us call this file “Work File A.” This file is invisible to the user.
- Data begins to churn through the virtual memory file, which is a physical file on the hard disk. Let us call this file the “VM file.” This file is also invisible to the user.
When the author saves the document he or she is writing, a fourth thing happens:
- A file with a name given to it by the author is created on the computer’s hard disk. Let us call this file the “User Document.”
As the writer continues to write or update the document, changes occur within the User Document; these changes are reflected in Work File A, and much of this data is written into the VM file. As the writer changes and updates the User Document, much of the previous edition is invisibly archived into the User Document as well as into the other two files we have mentioned. When the file is closed, “Work File A” is not saved as a document accessible to the user, but it continues to exist as a deleted file on the computer’s hard disk.
When the User Document is opened again, a new temporary and invisible Work File is created. It may be named “Work File B.” There may be several iterations of the creation of a Work File on a given hard disk, one corresponding to each time the User Document is opened and viewed and/or modified, and correspondingly named “Work File C,” “Work File D,” etc.
If the User Document is saved with a different name, the document is still maintained on the hard disk with its original name, as well as with the new name.
Email and other documents behave in much the same way, although the specifics differ somewhat from program to program. Microsoft Outlook saves its email files somewhat differently from Microsoft Outlook Express, for instance.
Saving a File
When a document is named, it is saved. It may be saved with a name such as “Untitled” even if not given a unique name by the author. When the file is saved, there are several attributes saved with it. One is the date the file was created; one is the date the file was last changed, or modified; one is the date the file was last accessed. This information is kept as part of a file listing called a “directory.” This file listing is viewed as a “folder” by the computer user. The computer saves a long version and a short version of the name as two adjacent directory listings as well.
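The three dates described above can be read back from the operating system. The following is a minimal sketch in Python using the standard `os.stat` call; the function name `file_dates` is illustrative only, and the creation date is not exposed on every platform, in which case the sketch reports it as `None`.

```python
import os
import tempfile
from datetime import datetime, timezone


def file_dates(path):
    """Return the modified, accessed, and (where available) created dates of a file."""
    info = os.stat(path)
    dates = {
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "accessed": datetime.fromtimestamp(info.st_atime, tz=timezone.utc),
    }
    # st_birthtime exists only on some platforms; on older Windows versions
    # of Python, st_ctime carries the creation time instead.
    birth = getattr(info, "st_birthtime", None)
    if birth is None and os.name == "nt":
        birth = info.st_ctime
    dates["created"] = (
        datetime.fromtimestamp(birth, tz=timezone.utc) if birth is not None else None
    )
    return dates


# Demonstration on a throwaway file standing in for the User Document.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"User Document")
    sample_path = handle.name

for label, value in file_dates(sample_path).items():
    print(label, value)

os.remove(sample_path)
```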
The space on a computer’s hard disk is divided up into pieces called “sectors.” Each of these sectors contains 512 bytes of space, and a character (such as a letter or a number) generally takes up 2 bytes. Therefore, a sector can hold about 256 characters. When a file is about to be saved, the computer sets aside a “cluster” of space for it. A cluster is generally about 64 sectors. This cluster is assigned to the file whether or not the file needs all of the space in the cluster, and cannot be assigned to another file (except through malfunction) as long as the file still exists as a file. So, even if a file consists of one letter, which is 2 bytes in size, the computer allocates approximately 32,000 (actually 32,768) bytes of space. The file is then actually written to the first 2 bytes of the cluster, leaving the great majority of the cluster unchanged. Whenever a file exceeds one cluster in size, the computer sets aside another entire cluster for it.
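The arithmetic in this paragraph can be checked with a short calculation. This is a sketch that assumes the figures given above (512-byte sectors, 64 sectors per cluster); the function name `allocation` is hypothetical, and actual cluster sizes vary from disk to disk.

```python
SECTOR_BYTES = 512           # sector size given in the text
SECTORS_PER_CLUSTER = 64     # "generally about 64 sectors", per the text
CLUSTER_BYTES = SECTOR_BYTES * SECTORS_PER_CLUSTER  # 32,768 bytes


def allocation(file_bytes, cluster_bytes=CLUSTER_BYTES):
    """Return (clusters, allocated_bytes, unused_bytes) for a file of the given size."""
    # Round up: a partly filled cluster is still assigned in full.
    clusters = (file_bytes + cluster_bytes - 1) // cluster_bytes
    allocated = clusters * cluster_bytes
    return clusters, allocated, allocated - file_bytes


print(allocation(2))        # one 2-byte character -> (1, 32768, 32766)
print(allocation(32_768))   # exactly one cluster  -> (1, 32768, 0)
print(allocation(32_769))   # one byte more        -> (2, 65536, 32767)
```

The third value in each result is the part of the allocated space that the file does not itself fill, which is where older data can persist.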
When a file is deleted, the file does not simply go away. It remains on the hard disk, invisible to the user. Furthermore, the deletion of the file does not affect the preexisting Work Files, and has little one-to-one correspondence with any changes to the VM file. The computer keeps track of the physical locations where a data file may be written. When a file is written, the computer (actually, the operating system) makes one or more entries in its index of file information, which records whether or not a specific spot on the hard disk contains a file. When a file is deleted, the computer marks the physical location of the file as available to be used. It also changes the name of the file by altering its first character. If a file was named “Computer file,” its name would change to “σomputer file.” Having the “σ” character at the beginning of its name tells the computer that this file listing is available to be overwritten. But until another file is saved to that directory, and saved at that spot in the directory, the file name is not overwritten. Furthermore, if the name of the new file that is written to the same location in the directory is shorter than the original name, only part of the original name is overwritten. For instance, if the original file were named “Computer file” and the new file were named “Joe,” then while the directory entry would appear to the user to be “Joe,” the name actually in the directory listing would look something like “Joeputer file.”
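The renaming and partial overwriting of a directory listing can be modeled with a few lines of Python. This is a toy model of the simplified description above, not a parser for any real file system. The function names are hypothetical. The marker character is an assumption: on FAT file systems the first byte of a deleted entry is set to hexadecimal E5, which many tools display as σ.

```python
DELETED_MARK = "\u03c3"  # σ, a common display of the FAT deletion byte 0xE5


def mark_deleted(name_field):
    """Flag a directory listing as reusable by replacing its first character."""
    return DELETED_MARK + name_field[1:]


def reuse_entry(name_field, new_name):
    """Write a new name over the start of the field, leaving the remainder as it was."""
    return new_name + name_field[len(new_name):]


original = "Computer file"
deleted = mark_deleted(original)
reused = reuse_entry(deleted, "Joe")

print(deleted)             # σomputer file
print(reused)              # Joeputer file
print(reused[:len("Joe")]) # Joe  (what the user would see)
```

The leftover characters after “Joe” are the kind of residue an examiner can read directly from the raw directory data.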
Similarly, when a file is overwritten, much of the previous content of the file may remain intact. If, for instance, a file that took up 4 entire clusters is deleted, and another file that fills only half of one cluster is written in its place, then 3 1/2 clusters, or 7/8 of the original data, is maintained and may be recoverable. This is the kind of work a computer forensic examiner performs. When a file is simply deleted, and not overwritten, it is fairly trivial for a computer forensic examiner to recover, or recreate, the file.
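How much of the old file survives depends on how much of a cluster the new file fills. The sketch below works through that proportion under two stated assumptions: the new file is written starting at the beginning of the old file's first cluster, and nothing else on the disk is overwritten. The function name `surviving_fraction` is hypothetical, and the 32,768-byte cluster size is the figure used earlier in this document.

```python
from fractions import Fraction


def surviving_fraction(old_clusters, new_file_bytes, cluster_bytes):
    """Fraction of the old file's allocated bytes not overwritten by the new file."""
    old_bytes = old_clusters * cluster_bytes
    overwritten = min(new_file_bytes, old_bytes)
    return Fraction(old_bytes - overwritten, old_bytes)


# A new file filling half of one cluster, written over a four-cluster file.
print(surviving_fraction(4, 16_384, 32_768))       # 7/8

# A much smaller 256-byte file written over the same four clusters.
print(float(surviving_fraction(4, 256, 32_768)))   # 0.998046875
```

The smaller the new file relative to the cluster size, the more of the earlier content remains in place.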
Some email programs work in a slightly different fashion. In Microsoft Outlook, all emails are kept in one large file. When emails are deleted and even when they are purged, the content of the deleted email is not necessarily deleted from the large Outlook file. These deleted emails may be recovered through a manual process. In Microsoft Outlook Express and other email programs, such as Qualcomm Eudora Pro, each mailbox has its own file, and all emails from a given mailbox are kept in that one file. Again, deleted emails may not necessarily be removed from that file, and may be recoverable by a manual process.
Dangers of Continuing Computer Use
Whenever new data, files, or documents are written to the computer, there is a danger that the data remaining from previously deleted files will be overwritten. Overwritten data is not recoverable through any means available to most computer forensic examiners. The very act of starting up a computer, shutting it down, or even looking for files will alter the actual contents of the hard disk. Therefore, the longer the computer is in use after a file has been written or deleted, the greater are the odds that data of interest will be spoiled or destroyed.
Concerns Regarding Privacy of Data
It may, of course, be important that private information that is not relevant to the case not be revealed to unauthorized persons. I have been the examiner of record in several cases wherein all information I deemed to be relevant to the case was first produced to the court and/or to the other side before such information was allowed to be revealed to my client. In my experience, this is common practice in the field. It is not uncommon (although it is more expensive) for a neutral expert to be a gatekeeper for such information.
Summary
- When a document is written, multiple files are created.
- When a document is deleted, the original is not destroyed, and the additional, invisible files may not be affected at all.
- The computer(s) in question may contain information that is relevant to discovery but has not been produced, if only because a standard data recovery tool cannot easily, or in some cases at all, produce all such documents.
- Continued use of the computer(s) in question in the case is likely to spoil evidence.
- A computer forensic specialist can make identical copies of the hard disks in question without disrupting the data on the computer. These copies may then be properly examined in a lab.
- A computer forensic specialist is likely to be able to produce relevant documents or portions thereof by examining the hard disk copies.
- It is possible to keep any data not relevant to the case from being revealed to the plaintiffs through the vetting process described above.
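One common way to show that a copy is identical to its source is to compare cryptographic hashes of the two. The following is a minimal illustration in Python, assuming an ordinary file stands in for the disk; the function name `copy_and_verify` is hypothetical, and the sketch is not a substitute for the dedicated imaging tools an examiner would use. The source is opened for reading only, so the sketch never writes to it.

```python
import hashlib


def copy_and_verify(source_path, copy_path, chunk_bytes=1024 * 1024):
    """Copy source to copy_path, then re-read the copy and compare SHA-256 digests.

    Returns (digests_match, source_digest_hex).
    """
    source_hash = hashlib.sha256()
    # "rb" opens the source read-only; only the copy is written to.
    with open(source_path, "rb") as src, open(copy_path, "wb") as dst:
        while chunk := src.read(chunk_bytes):
            source_hash.update(chunk)
            dst.write(chunk)

    # Hash the copy independently by reading it back from disk.
    copy_hash = hashlib.sha256()
    with open(copy_path, "rb") as copied:
        while chunk := copied.read(chunk_bytes):
            copy_hash.update(chunk)

    return source_hash.hexdigest() == copy_hash.hexdigest(), source_hash.hexdigest()
```

A call such as `copy_and_verify("disk.img", "disk_copy.img")` (both file names are placeholders) returns `True` along with the digest when the copy matches byte for byte.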
Copyright, Steven Burgess, 2004