Audio File Challenges for Computer Forensics & eDiscovery

by Steve Burgess | Aug 17, 2015 | Uncategorized


Unified communications is the term used for integrating all communications, data and voice alike, over the Internet. This can include data in its myriad forms, such as email, instant messages, data generated by business applications, faxes, and text messages. But key sources also include voice sent over networks or stored on digital devices, such as VoIP (Voice over Internet Protocol) calls, voice mail, audio-video and web conferencing, whiteboarding sessions, and .wav files. Integrating these communications can reduce operating costs.

Savings come from, among other things, eliminating long-distance charges by using VoIP, dispensing with travel to meetings that can instead be held in a virtual environment, and avoiding travel to distant classes when an instructor or team can share a whiteboard from separate physical locations. The 26% of businesses that have adopted unified communications enjoy savings like these. But when litigation demands discoverable data, .wav and other voice-based files can be difficult and costly for a computer forensics expert or an e-discovery system to search and index.

There are many tools designed for searching text files, and even for recovering text from deleted files. They range from computer forensic suites such as EnCase and AccessData's Forensic Toolkit (FTK), each costing thousands of dollars, to free and open source tools, such as hex editors, that cost the user nothing at all. The more extensive packages may prove less expensive in the long run once the billable hours of human examiners are added to the mix.
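At bottom, the keyword search these tools perform is a byte-pattern search. Here is a minimal Python sketch, not the method of any particular product, that scans raw bytes (for example, a disk image that includes unallocated space, where the contents of deleted files may linger) for a keyword in both ASCII and UTF-16LE, an encoding common in Windows data. The function name `find_keyword_offsets` is hypothetical, and the sketch assumes the data fits in memory; real forensic suites stream large images, handle fragmented files, and support many more encodings.

```python
def find_keyword_offsets(data: bytes, keyword: str) -> list[tuple[int, str]]:
    """Return sorted (byte_offset, encoding) pairs for every occurrence of
    keyword in data, searching ASCII and UTF-16LE (case-sensitive).

    Assumes keyword is plain ASCII; encode() raises UnicodeEncodeError otherwise.
    """
    hits = []
    for encoding in ("ascii", "utf-16-le"):
        needle = keyword.encode(encoding)
        offset = data.find(needle)
        while offset != -1:
            hits.append((offset, encoding))
            # Resume one byte later so overlapping occurrences are also found.
            offset = data.find(needle, offset + 1)
    return sorted(hits)


# Hypothetical raw bytes: the word appears once as ASCII and once as UTF-16LE text.
sample = b"\x00\x00invoice\x00" + "invoice".encode("utf-16-le") + b"\xff"
print(find_keyword_offsets(sample, "invoice"))  # [(2, 'ascii'), (10, 'utf-16-le')]
```

A search like this finds nothing useful in a .wav file, even one in which the keyword is plainly spoken, because the recording stores the speech as numeric audio samples rather than characters.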

There are many wildly expensive e-discovery systems in place to help store and index the vast amounts of data generated daily in the corporate environment. These services may be outsourced or brought in-house. Again, the cost of putting such systems and procedures in place may pale beside the sanctions and fines that could result from not being ready for litigation, should it arise.

There are also many effective tools for scanning paper documents and converting them, through optical character recognition (OCR), into searchable text files.

While many tools for searching and storing data are effective and accurate, no comparable level of accuracy or ease yet exists for finding specific information in audio. There are currently three means of searching audio: phonetic search, manual transcription, and automatic (machine) transcription.

Phonetic search technology matches wave patterns in the audio against a library of known patterns for phonemes, the distinct units of sound in a language, and then looks for the sequence of phonemes that makes up the search term. For example, the acronym “B2B” would be represented by the phonemes “_B _IY _T _UW _B _IY” (a Wikipedia example from Nexidia, a company involved in speech recognition systems). Given the wide variation in speaking styles, pronunciations, accents, and dialects, the accuracy of this method is spotty, and it produces many false hits. And while it may identify passages and phrases of interest, it does not transcribe the audio into text, so those passages must then be listened to.
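The matching step can be sketched in a few lines of Python. This is a deliberate simplification, not how Nexidia or any other vendor actually implements phonetic search: it assumes the audio has already been converted into a stream of phoneme labels (the hard part, and the source of most errors), then slides the query's phonemes along that stream and scores each window by edit distance. The function names and the sample phoneme stream are hypothetical. Allowing even one mismatched phoneme, to tolerate accents and mispronunciation, is enough to produce a false hit: “B2C” (_B _IY _T _UW _S _IY) matches a search for “B2B”.

```python
def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # phoneme deleted
                curr[j - 1] + 1,           # phoneme inserted
                prev[j - 1] + (pa != pb),  # substitution (free if phonemes match)
            ))
        prev = curr
    return prev[-1]


def phonetic_search(stream: list[str], query: list[str],
                    max_errors: int = 1) -> list[tuple[int, int]]:
    """Slide a query-length window along the phoneme stream and return
    (start_index, distance) for every window within max_errors of the query."""
    n = len(query)
    hits = []
    for start in range(len(stream) - n + 1):
        distance = edit_distance(stream[start:start + n], query)
        if distance <= max_errors:
            hits.append((start, distance))
    return hits


# "B2B" as in the Wikipedia/Nexidia example, with the underscores dropped.
query = ["B", "IY", "T", "UW", "B", "IY"]

# Hypothetical recognizer output for a recording that says "the B2C and B2B".
stream = ["DH", "AH", "B", "IY", "T", "UW", "S", "IY",       # "the B2C"
          "AH", "N", "D", "B", "IY", "T", "UW", "B", "IY"]   # "and B2B"

print(phonetic_search(stream, query, max_errors=0))  # [(11, 0)]: exact match only
print(phonetic_search(stream, query, max_errors=1))  # [(2, 1), (11, 0)]: "B2C" is a false hit
```

Raising `max_errors` catches more variant pronunciations at the price of more false hits, and every hit is only a position in the recording that someone must still play back.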

Manual transcription, in which audio is typed out so that the resulting text can then be searched automatically, is time-consuming. Because it depends on a listener typing the words as they are heard, this labor-intensive task can also be very expensive. There may be security concerns as well, since the audio may leave the company (or even the country) to be transcribed.

Machine transcription is the one automated means of converting audio to text, but it suffers from accuracy problems. It compares “heard” audio against known libraries, and so again faces differing pronunciations, terms missing from existing libraries, and poor recording clarity. High-quality recordings may yield recognition rates of 85% or so, a positive-looking number until it is compared with the nearly 100% accuracy of pure text searches; with voice mail, accuracy can dip as low as 40%.
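Recognition accuracy is commonly reported as one minus the word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the machine transcript into a correct human transcript, divided by the number of words in the correct one. The figures above do not say which metric they use, so treat this Python sketch (the function name and sample sentences are hypothetical) as an illustration of the arithmetic only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein distance.

    Case-insensitive; splits on whitespace only, with no punctuation handling.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed so far
    # and the first j hypothesis words (one row of the usual DP table at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # reference word dropped
                curr[j - 1] + 1,                       # extra word inserted
                prev[j - 1] + (ref_word != hyp_word),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the recognizer mishears "call" and drops "the".
reference = "Please call me back about the contract"
hypothesis = "please fall me back about contract"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.1%}, accuracy: {1 - wer:.1%}")  # WER: 28.6%, accuracy: 71.4%
```

One misheard word and one dropped word in a seven-word sentence bring accuracy down to about 71%, and a keyword search of this transcript for “call” would miss it entirely.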

The new Federal Rules of Civil Procedure (FRCP) require companies to have a means of identifying key communications and data sources, and that data must then be preserved. For the sake of efficiency, both in optimizing the amount of storage required and in reducing the volume of data that must be identified and produced for litigation, it is also important to be able to accurately identify data that is unnecessary.

While data retention requirements increase and storage costs fall, deciding which audio should be kept and which should be deleted can be costly. Once digitized, such information must nonetheless be stored and indexed (or searched after the fact). The technology is immature and still evolving, which may leave an opening for an innovative company to prosper, especially one able to produce a breakthrough in voice-to-text technology. In the meantime, companies face a difficult decision about what stays and what goes.

Steve Burgess is a freelance technology writer, a practicing computer forensics specialist as the principal of Burgess Forensics, and a contributor to the recently released Scientific Evidence in Civil and Criminal Cases, 5th Edition, by Moenssens et al. Mr. Burgess may be reached at https://www.burgessforensics.com or by email at steve at burgessforensics dot com.
