AVMS Webinar: Enhancing Access to Audiovisual Resources with the AI Model Whisper
This webinar provides an in-depth exploration of OpenAI's Whisper, an advanced AI-powered speech recognition model that can transcribe speech in 97 languages and translate it into English. Whisper's capabilities stem from its training on a diverse and extensive dataset of 680,000 hours of audio recordings. Its robustness comes from a deep neural network architecture that handles complex speech patterns and contextual nuances with a high degree of accuracy. Notably, Whisper is released under the open-source MIT license, allowing its integration into other services.
We will demonstrate Whisper's practical applications in digital library services, focusing on the TIB AV-Portal (av.tib.eu) and the Serbian portal zavicajna.digitalna.rs. We will show how Whisper improves the searchability and findability of content, strengthens the linking of named entities, generates video subtitles, and supports multilingual understanding and searching.
However, Whisper is not without its challenges. We will discuss 'hallucinations', where the model generates text that was never spoken in the audio, often as nonsensical repetitive loops. We will also address the problem that less represented languages in the training data, such as Serbian, often yield suboptimal results. Strategies to mitigate these issues will be explored, such as filtering out transcripts with excessive repetitive loops and fine-tuning the model to enhance the accuracy of Serbian texts.
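One simple way to detect hallucination loops of the kind described above is to check how often the same segment text recurs in a transcript. The sketch below is an illustrative heuristic, not the filter actually used by the portals; the function name and the `max_ratio` threshold are our own choices.

```python
from collections import Counter

def has_repetitive_loop(segment_texts, max_ratio=0.3):
    """Flag a transcript whose segments repeat excessively,
    a typical symptom of a Whisper hallucination loop.

    `segment_texts` is a list of segment strings (e.g. the 'text'
    fields of Whisper's output segments). The 0.3 threshold is an
    illustrative value; a real pipeline would tune it on its data.
    """
    texts = [t.strip().lower() for t in segment_texts if t.strip()]
    if len(texts) < 4:
        # Too short to judge; keep the transcript.
        return False
    # Share of the transcript taken up by the single most common line.
    most_common_count = Counter(texts).most_common(1)[0][1]
    return most_common_count / len(texts) > max_ratio
```

A transcript dominated by one repeated phrase (a classic Whisper failure mode on silence or music) would be flagged for review or exclusion, while ordinary varied speech passes through.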
Concluding the session, the webinar will introduce Subtitle Edit, an open-source video subtitle editor. This tool can create transcripts with both official and fine-tuned Whisper models, even on low-resource computers. Participants will learn how to use Whisper for speech recognition and then import the resulting texts into Subtitle Edit for further refinement and processing.
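The hand-off from Whisper to Subtitle Edit typically goes through a subtitle file such as SRT, which Subtitle Edit opens directly. As a minimal sketch, the converter below turns Whisper-style segments (dictionaries with `start`, `end`, and `text` keys, as returned by Whisper's `transcribe` output) into SRT text; the function names are our own, and this is one possible bridge, not the webinar's prescribed tooling.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Render Whisper-style segments as an SRT document that
    Subtitle Edit can open for refinement."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

Saving the returned string to a `.srt` file makes the transcript ready for correction and timing adjustments in Subtitle Edit.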