Audio File Challenges for Computer Forensics & eDiscovery

By Steve Burgess

Unified communications is the term used for integrating all communications, both data and voice, over the Internet. The data can take myriad forms, such as email, instant messages, output from business computer applications, faxes, and text messages. Key sources also include voice sent over networks or stored on digital devices, such as VoIP (Voice over Internet Protocol), voice mail, audio-video, web conferencing, whiteboarding, and .wav files. Such integrated communications can trim operating budgets.

Savings accrue from, among other things, eliminating long-distance charges by using VoIP, dispensing with travel to meetings that can be held in a virtual environment, and avoiding travel to far-away classes when an instructor or team can share a whiteboard from disparate physical locations. Savings like these accrue to the 26% of businesses that have adopted unified communications. But when litigation demands discoverable data, .wav and other voice-based files can be difficult and costly for a computer forensics expert or an e-discovery system to search and index.

There are many tools designed for searching text files, and even for recovering text from deleted files. These range from computer forensic suites such as EnCase and AccessData's Forensic Toolkit (FTK), each of which costs thousands of dollars, to open source tools, including hex editors, that cost the user nothing at all. The more extensive packages may be less expensive in the long run once billable human hours are added to the mix.

There are many wildly expensive e-discovery systems in place to assist in storing and indexing the large masses of data that are generated daily in the corporate environment. Services may be outsourced or brought in-house. Again, the cost of putting the systems and procedures into place may pale against the sanctions and fines that could result from not being ready for litigation, should it arise.

There are also many effective tools for scanning paper documents into text files, which are then searchable.

While many of the tools for searching and storing data are effective and accurate, no such level of accuracy or ease yet exists for searching audio for specific information. There are currently three means of searching audio: phonetic search, manual transcription, and machine transcription.

Phonetic search technology matches wave patterns in the audio to a library of known wave patterns for phonemes, the basic units of speech sound. For example, the acronym “B2B” would be represented by the following phonemes: “_B _IY _T _UW _B _IY” (Wikipedia example from Nexidia, a company involved in speech recognition systems). Given the wide variation in modes of speaking, pronunciation, accents and dialects, the accuracy of this method is spotty, and it produces many false hits. And while it may identify sections and phrases that are of interest, it doesn’t transcribe the audio into text, so the audio must still be listened to.
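
To make the phoneme matching concrete, here is a minimal Python sketch. It is not Nexidia's or any other vendor's actual algorithm. It assumes the audio has already been decoded into a sequence of phoneme labels, and every name in it (PRONUNCIATIONS, to_phonemes, phonetic_search, max_mismatches) is hypothetical. The query's phonemes are slid along the stream, and a small number of mismatches is tolerated to allow for accents and unclear speech.

```python
from typing import Dict, List, Sequence

# Hypothetical pronunciation table covering only the "B2B" example.
# A real system would rely on a full pronunciation lexicon.
PRONUNCIATIONS: Dict[str, List[str]] = {
    "B": ["B", "IY"],
    "2": ["T", "UW"],
}


def to_phonemes(term: str) -> List[str]:
    """Spell a search term out as phonemes, one character at a time."""
    phonemes: List[str] = []
    for ch in term.upper():
        phonemes.extend(PRONUNCIATIONS[ch])
    return phonemes


def count_mismatches(a: Sequence[str], b: Sequence[str]) -> int:
    """Count positions where two equal-length phoneme runs differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def phonetic_search(stream: Sequence[str], query: Sequence[str],
                    max_mismatches: int = 1) -> List[int]:
    """Return every start index where the query roughly matches the stream."""
    hits: List[int] = []
    n = len(query)
    for start in range(len(stream) - n + 1):
        window = stream[start:start + n]
        if count_mismatches(window, query) <= max_mismatches:
            hits.append(start)
    return hits


if __name__ == "__main__":
    query = to_phonemes("B2B")  # ['B', 'IY', 'T', 'UW', 'B', 'IY']
    # Assumed decoder output for a phrase containing "B2B".
    stream = ["DH", "AH", "B", "IY", "T", "UW", "B", "IY", "D", "IY", "L"]
    print(phonetic_search(stream, query))  # [2]
```

The tolerance cuts both ways. With max_mismatches=1 the search above would also flag a stream beginning “P IY T UW B IY”, a false hit of the kind just described, while setting it to 0 would miss any speaker whose pronunciation is decoded slightly differently.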

Manual transcription of audio, so that the transcribed text can then be searched automatically, is time-consuming. Because it depends on a listener typing the words as they are heard, this labor-intensive task can also be very expensive. There may also be security concerns when the audio goes outside the company (or perhaps the country) to be transcribed.

Machine transcription is the one automated means of converting audio to text, but it suffers from accuracy issues. It compares “heard” audio with known libraries, again facing issues of differing pronunciations, terms not in existing libraries, and poor recording clarity. While high-quality recordings can yield recognition rates of 85% or so (a positive-looking number until compared with the nearly 100% accuracy of pure text searches), accuracy with voice mail can dip as low as 40%.
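
The article does not say how these recognition rates are measured. One common convention, assumed here, is word-level accuracy: one minus the number of word errors (substitutions, insertions, and deletions) divided by the number of words actually spoken. The Python sketch below uses hypothetical function names and a made-up sample sentence to show how such a figure could be computed against a hand-checked reference transcript.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level edit distance between a reference and a machine transcript."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from transcript
            insertion = curr[j - 1] + 1  # extra word in transcript
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - (word errors / reference length), floored at zero."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference transcript is empty")
    return max(0.0, 1 - word_errors(reference, hypothesis) / ref_len)


if __name__ == "__main__":
    # Made-up example: two of eight words are misrecognized.
    spoken = "please call the vendor about the contract today"
    machine = "please call the vender about the contact today"
    print(word_accuracy(spoken, machine))  # 0.75
```

Note that the two misrecognized words in this sample, “vendor” and “contract”, are exactly the kind of terms a keyword search would target. Under the simplifying assumption that word errors are independent, a three-word phrase survives intact only about 61% of the time at 85% word accuracy (0.85 × 0.85 × 0.85), and about 6% of the time at 40%.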

The new Federal Rules of Civil Procedure (FRCP) require companies to have a means of identifying key communications and data sources. That data must then be saved. For the sake of efficiency, both in optimizing the amount of storage required and in reducing the volume of data that must be identified and produced for litigation, it is also important to be able to accurately identify data that is unnecessary.

While requirements for retention of data increase and storage costs go down, identifying what audio should be kept and what should be deleted can be costly. Yet because such information is digitized, it must still be stored and indexed (or searched after the fact). The technology is not yet mature and is still evolving. There may be an opening for an innovative company to prosper here, especially if it can produce some kind of breakthrough in voice-to-text technology. In the meantime, companies face a difficult issue in deciding what stays and what goes.

Steve Burgess is a freelance technology writer, a practicing computer forensics specialist as the principal of Burgess Forensics, and a contributor to the recently released Scientific Evidence in Civil and Criminal Cases, 5th Edition by Moenssens, et al. Mr. Burgess may be reached via email at steve at burgessforensics dot com.
