Both computer forensics experts and data recovery technicians seek to recover deleted data. Data recovery is primarily interested in bringing back files, while computer forensics tends to dig deeper, looking not just for deleted documents, but also for metadata (data about data – such as file attributes, descriptions, dates, and other information) and meaningful snippets of unrecoverable files. One area of particular interest is email.
When most documents are written to a computer’s hard disk, each newly created document has its own directory entry (what the user sees as a listing in a folder). If a file has been deleted but not yet overwritten by another document, recovering it is a relatively trivial part of e-discovery or data recovery. When the data of interest is deleted email, however, the discovery process is likely to differ significantly from ordinary data recovery. Individual emails are not stored the way individual files are, and different types of email programs store their data differently on the user’s hard disk, so each requires its own scheme for finding useful information. As a result, deleting email and recovering deleted email differ not only from the same operations on other types of documents, but also from one type of email program to another.
There are three main types of email in common usage – Microsoft Outlook (often paired with a Microsoft Exchange Server), text-based email client programs, and web-based email, or webmail.
In Microsoft Outlook, all emails are kept in one large, encrypted, non-text file – the PST, or Personal Folders file. Outlook has additional functions and additional content as well. There is an integrated address book, multiple mailboxes, a calendar, and a scheduler, all of which are contained within the PST file. When one looks into a PST file with a file editor or word processing application, there is little or nothing intelligible to the human eye. The file content looks like nearly random characters.
In general, the PST file must be loaded into Outlook to be read. When an email is deleted, or even when it is purged, it may remain within the body of the single large file but become inaccessible to the program. Some deleted data may be recovered by manipulating the file through a manual process, repairing the resulting file, and then loading it back into Outlook.
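Although the contents of a PST file are unintelligible to the eye, the file can still be identified by its first few bytes. The following is a minimal sketch, not a recovery tool. It assumes the header layout in Microsoft's published PST format description: a four-byte signature "!BDN" at the start of the file and a two-byte format version at byte offset 10. The function name and the returned labels are hypothetical, chosen for illustration.

```python
import struct

# Assumed signature bytes at the very start of a PST file.
PST_MAGIC = b"!BDN"


def describe_pst_header(header: bytes):
    """Return basic facts about a PST header, or None if the signature is absent."""
    if len(header) < 12 or header[:4] != PST_MAGIC:
        return None
    # Assumption: the format version is a little-endian 16-bit value at offset 10.
    (version,) = struct.unpack_from("<H", header, 10)
    if version in (14, 15):
        layout = "ANSI"
    elif version >= 23:
        layout = "Unicode"
    else:
        layout = "unknown"
    return {"version": version, "layout": layout}


def describe_pst_file(path):
    """Read only the first 12 bytes of a file and describe them."""
    with open(path, "rb") as handle:
        return describe_pst_header(handle.read(12))
```

A matching signature only suggests that a file, or a fragment found elsewhere on the disk, began life as an Outlook data file. It says nothing about whether any deleted messages inside it can be recovered.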
Text-based email programs include Microsoft Outlook Express, Qualcomm Eudora Pro, Mozilla Thunderbird, Macintosh Mail, and others.
In text-based mail applications, each mailbox has its own file, and all emails from a given mailbox are kept in that one file. For instance, there is likely to be a single file for all of the emails in the Inbox, one for all in the Outbox, one for each user-generated mailbox, and so on.
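As an illustration of the one-file-per-mailbox layout, the sketch below lists the messages still present in a single mailbox file. It assumes the file uses the common mbox layout, which not every client named above uses, and it relies on the mailbox module in Python's standard library. The function name is hypothetical.

```python
import mailbox


def summarize_mbox(path):
    """Return (sender, date, subject) for each message still in an mbox file."""
    box = mailbox.mbox(path, create=False)  # do not create the file if it is missing
    try:
        return [
            (msg.get("From"), msg.get("Date"), msg.get("Subject"))
            for msg in box
        ]
    finally:
        box.close()
```

A listing like this shows only what the mailbox currently holds. Messages that have been deleted from the mailbox will not appear in it, and an examiner would run it against a copy of the file rather than the original.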
These mailbox files are primarily text files. When an individual email is deleted, the text may be “orphaned,” or released from the body of the file, but may still be recoverable as a file remnant that may contain the body of the email as well as information such as dates, times, and sender.
A standard data recovery process would not recover such deleted emails, because the mailbox file that contained them might still be intact; it simply no longer holds the specific deleted emails. Part of electronic discovery would therefore include searching the unallocated portion of the hard disk for specific terms or phrases that are likely to appear in the body of suspect emails. (When a file is written, the operating system allocates a specific area of the hard disk to that file; when the file is deleted, that space is de-allocated and is referred to as unallocated space.) A search may also be performed for email headers, which are also text-based. The resulting data may then be gathered and displayed as text files.
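The kind of term search described above can be sketched in a few lines. This is a simplified illustration, not a forensic tool: it assumes the unallocated space has already been extracted to a block of bytes by other means, and that the search terms are plain ASCII. The function name and the context parameter are hypothetical.

```python
import re


def find_remnants(raw: bytes, terms, context=80):
    """Return (offset, snippet) pairs for each case-insensitive hit of any term."""
    pattern = re.compile(
        b"|".join(re.escape(term.encode("ascii")) for term in terms),
        re.IGNORECASE,
    )
    hits = []
    for match in pattern.finditer(raw):
        start = max(0, match.start() - context)
        end = min(len(raw), match.end() + context)
        # Replace non-printable bytes with dots so the snippet displays as text.
        snippet = "".join(
            chr(byte) if 32 <= byte < 127 else "." for byte in raw[start:end]
        )
        hits.append((match.start(), snippet))
    return hits
```

Searching for header names such as "Subject:" or "From:" alongside case-specific phrases tends to surface both the metadata and the body of an orphaned message. The byte offsets let an examiner go back to the same location in the source image.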
A third form of email is web-accessed email. Many, if not most, commercial email providers offer users the opportunity to access email via a web browser. America Online is an example of an email provider that generally does not store email on the user’s computer by default. Instead, email is stored on a remote computer, or distributed across many remote computers, that may be anywhere on the Internet. Much as watching a television program does not store it on your TV, most webmail is not stored on your local computer.
As webmail servers host hundreds or even millions of users and their email, the storage of such email is extremely dynamic. When emails are erased in such an environment, remnants of individual emails and files tend to be overwritten quickly and repeatedly. Some remnants may nevertheless be found on the user’s computer, in virtual memory or in a buffer file. The recent US Attorney scandal highlighted the use of such web-based email (see Why Email Matters: the Science Behind the US Attorney Scandal, by Steve Burgess).
There is always a chance that remaining deleted files, or remnants thereof, may be overwritten. Because of this possibility, it is best to immediately turn off any computer where the recoverability of data is in question. The longer the computer remains in use, the greater the likelihood that useful data will be irreparably destroyed. If a user’s computer is likely to be used or inspected during legal matters, or if document discovery is expected, the computer should be turned off to avoid spoliation of evidence.
If precautions are taken once a file is deleted, the file is likely to be recoverable. The same is true of email. While deleted or trashed email may not be recoverable as a complete mailbox file, the content of said email and its metadata might be discoverable or recoverable through the different methodologies available to computer forensics specialists.