How do I search content within audio files/streams?


If you want to search for text (i.e. what is being said) inside an audio stream, you would have to process it with a speech recognition algorithm and store the resulting text as metadata associated with the files. For video, you could also run text recognition on any text that appears in the picture. Evernote already does this for text inside image files, but has no support for audio as far as I know.
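Once the recognized text is stored with timestamps, the search itself is simple. Here is a minimal Python sketch of that step, assuming some speech recognizer has already produced `(start_seconds, text)` segments for each file. The sample data, the `Segment` shape and the function names are all made up for illustration; the recognition step is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical recognizer output: (start time in seconds, recognized text).
Segment = Tuple[float, str]


def format_time(seconds: float) -> str:
    """Render a number of seconds as H:MM:SS."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def search_transcripts(
    transcripts: Dict[str, List[Segment]], phrase: str
) -> Dict[str, List[float]]:
    """Map each file to the start times of segments containing the phrase."""
    needle = phrase.lower()
    hits: Dict[str, List[float]] = {}
    for filename, segments in transcripts.items():
        times = [start for start, text in segments if needle in text.lower()]
        if times:
            hits[filename] = times
    return hits


def format_hits(hits: Dict[str, List[float]]) -> List[str]:
    """Build one result line per file, listing the matching time indexes."""
    lines = []
    for filename, times in hits.items():
        stamps = ", ".join(format_time(t) for t in times)
        lines.append(
            f"{filename} - {len(times)} result(s) at time index(es) - {stamps}"
        )
    return lines


# Made-up sample metadata standing in for real recognizer output.
transcripts = {
    "podcast1.mp3": [
        (981.0, "today we talk about game programming"),
        (1500.0, "a word from our sponsor"),
        (2625.0, "Game programming in practice"),
    ],
    "podcast2.mp3": [(10.0, "some cooking tips")],
}

for line in format_hits(search_transcripts(transcripts, "game programming")):
    print(line)
```

A plain substring match like this misses a phrase that is split across two segments, and it is only as good as the transcript; a real tool would probably want a proper full-text index and some tolerance for recognition errors.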

Something similar is possible when using audio to search for audio. I don't know the details of these algorithms, but I'm guessing they involve some kind of frequency analysis. Shazam uses this kind of technology to identify songs from short audio clips.
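To make the frequency-analysis guess concrete, here is a toy Python sketch. It is not Shazam's algorithm, just an illustration of the general idea under strong assumptions: the audio is cut into fixed-size frames, each frame is reduced to its strongest frequency bin (using a naive DFT), and the resulting sequence of bins is treated as a fingerprint that can be searched for inside a longer recording. All function names and the synthetic `tone` helper are invented for this example.

```python
import cmath
import math


def dominant_bin(frame):
    """Return the index of the strongest frequency bin in one frame (naive DFT)."""
    n = len(frame)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):  # skip the DC bin
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin


def fingerprint(samples, frame_size=64):
    """Reduce a sample list to one dominant bin per non-overlapping frame."""
    return [
        dominant_bin(samples[i:i + frame_size])
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def find_clip(recording_fp, clip_fp):
    """Return the frame offset where clip_fp occurs in recording_fp, or -1."""
    m = len(clip_fp)
    for offset in range(len(recording_fp) - m + 1):
        if recording_fp[offset:offset + m] == clip_fp:
            return offset
    return -1


def tone(bin_index, frame_size=64):
    """Synthetic test signal: one frame of a sine wave centred on a given bin."""
    return [math.sin(2 * math.pi * bin_index * t / frame_size) for t in range(frame_size)]


# A made-up "recording" of four tones and a "clip" taken from its middle.
recording = tone(3) + tone(7) + tone(12) + tone(5)
clip = tone(7) + tone(12)

print(find_clip(fingerprint(recording), fingerprint(clip)))
```

This only works because the synthetic clip is noise-free and lines up exactly with the frame boundaries. Real recordings would need overlapping windows, an FFT instead of the slow DFT, and a fuzzier comparison than exact list equality.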

Here are some Wikipedia articles that may be useful:

Author by

Pascal

Technology enthusiast Programmer (Multiple languages) Geek

Updated on June 03, 2022

Comments

  • Pascal
    Pascal almost 2 years

    I have always wondered how many different search techniques existed, for searching text, for searching images and even for videos.

    However, I have never come across a solution that searched for content within audio files.

    For example, let us assume that I have about 200 podcasts downloaded to my PC as mp3, wav and ogg files. They are all named generically, say podcast1.mp3, podcast2.mp3, etc., so it is not possible to know what the content is without actually listening to them. Let's say I am interested in finding out which of the podcasts talk about 'game programming'. I want the results to be shown as:

    • Podcast1.mp3 - 3 result(s) at time index(es) - 0:16:21, 0:43:45, 1:12:31
    • Podcast21.ogg - 1 result(s) at time index(es) - 0:12:01

    So my questions:

    • How could one approach this problem?
    • Are there any suitable algorithms developed to do something like this?

    One idea that cropped up in my mind was to use 'speech-to-text' software to get transcripts along with time indexes for each of the audio files, then parse the transcripts to get the output.

    I was considering this as one of my hobby projects. Thanks!