Transcription Teacup Club

This document demonstrates how to build an automated workflow that transcribes audio files and exports the results to ELAN. By chaining processors, each handling one step, we can break this complex task down into simple, efficient steps. The audio corpus is taken from the first chapter of The Teacup Club by Eliza Armstrong.

Transcription of a 60-second extract

For testing, we want to work on a small sample, because speech-to-text is very slow. If the sample is transcribed correctly, feel free to disable this processor to get the full transcription.
  • Starting point of the cut, in seconds
  • Duration of the cut segment (60 seconds for this extract)
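To make the two parameters concrete, here is a minimal sketch of such a cut using Python's standard `wave` module. It assumes a PCM WAV input, and the function `cut_wav` is hypothetical, not the processor's actual implementation.

```python
import wave


def cut_wav(src, dst, start_s: float, duration_s: float) -> int:
    """Copy `duration_s` seconds of audio starting at `start_s` from src to dst.

    src and dst may be file paths or binary file objects. Returns the number
    of frames written, which is smaller than requested if the input ends early.
    """
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        rate = reader.getframerate()
        total = reader.getnframes()
        # Convert seconds to frame positions, clamped to the file length.
        first = min(int(start_s * rate), total)
        count = min(int(duration_s * rate), total - first)
        reader.setpos(first)
        frames = reader.readframes(count)
    with wave.open(dst, "wb") as writer:
        # Same channels, sample width and rate as the input; the frame count
        # in the header is corrected automatically when fewer frames are written.
        writer.setparams(params)
        writer.writeframes(frames)
    return count
```

For example, `cut_wav("chapter1.wav", "extract.wav", 30.0, 60.0)` (hypothetical file names) would keep one minute of audio starting 30 seconds in.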
We use OpenAI Whisper for the actual speech-to-text. This populates the corpus with one new annotation per transcribed sentence.
  • The language of the audio, which we specify rather than leave to Whisper's automatic detection
  • The model size: we want the largest model for the best transcription, at the cost of speed
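Whisper returns its transcription as timed segments. Assuming the processor turns each segment into one annotation on the "stt-whisper" tier, the conversion could look like the sketch below. The `Annotation` type and `whisper_to_annotations` function are hypothetical; only the commented-out calls belong to the `openai-whisper` package.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    value: str
    tier: str


def whisper_to_annotations(result: dict, tier: str = "stt-whisper") -> list:
    """Turn the dict returned by whisper's transcribe() into annotations.

    Each entry of result["segments"] carries "start", "end" and "text";
    segments whose text is empty after stripping are skipped.
    """
    return [
        Annotation(float(seg["start"]), float(seg["end"]), seg["text"].strip(), tier)
        for seg in result["segments"]
        if seg["text"].strip()
    ]


# With openai-whisper installed, `result` could be obtained like this
# (file name and language code are placeholders):
#   import whisper
#   model = whisper.load_model("large")
#   result = model.transcribe("extract.wav", language="en")
```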
Whisper only transcribes; it does not distinguish between speakers. So although we now have the transcription as annotations, they are not yet associated with their speakers. Pyannote can make this association through speaker diarization (determining who spoke when). This populates the corpus with new annotations, each placed on its speaker's tier.
  • Your Hugging Face access token, required to download Pyannote's gated models
  • The number of speakers present in the audio
We pass the Whisper annotations to Pyannote via a simple selection.
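How exactly the processor combines Pyannote's output with the Whisper annotations is not described here. A common approach, sketched below under that assumption, gives each transcribed segment to the speaker whose diarization turns overlap it the most. `Turn` and `assign_speakers` are hypothetical names; the commented pyannote.audio 3.x calls show where the token and the number of speakers from the parameters above would come in.

```python
from typing import NamedTuple


class Turn(NamedTuple):
    start: float
    end: float
    speaker: str  # generic label from the diarization model, e.g. "SPEAKER_00"


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """For each (start, end, text) segment, pick the speaker with the most overlap.

    Returns (speaker, start, end, text) tuples; speaker is None when no turn
    overlaps the segment at all.
    """
    assigned = []
    for start, end, text in segments:
        totals = {}  # speaker label -> total overlapping duration
        for turn in turns:
            o = overlap(start, end, turn.start, turn.end)
            if o > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + o
        speaker = max(totals, key=totals.get) if totals else None
        assigned.append((speaker, start, end, text))
    return assigned


# With pyannote.audio 3.x, turns could be collected roughly like this
# (token and speaker count are placeholders):
#   from pyannote.audio import Pipeline
#   pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1",
#                                       use_auth_token="hf_...")
#   diarization = pipeline("extract.wav", num_speakers=2)
#   turns = [Turn(seg.start, seg.end, label)
#            for seg, _, label in diarization.itertracks(yield_label=True)]
```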
Pyannote cannot guess the speakers' names (it only assigns generic labels), so we set them manually.
  • We simply provide a JSON object mapping each label to a name
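Pyannote labels speakers generically (for example `SPEAKER_00`). Assuming the JSON object maps those labels to display names, applying it is a simple lookup, as in this sketch. The names and the `rename_speakers` helper are placeholders, not taken from the corpus or the tool.

```python
import json

# Hypothetical mapping: keys are Pyannote labels, values are display names.
NAMES_JSON = '{"SPEAKER_00": "Alice", "SPEAKER_01": "Bob"}'


def rename_speakers(labels, mapping_json):
    """Replace each label by its mapped name, keeping unmapped labels unchanged."""
    mapping = json.loads(mapping_json)
    return [mapping.get(label, label) for label in labels]
```

Keeping unmapped labels as they are means a speaker missing from the JSON object stays visible instead of disappearing.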
We insert a density view to see all annotations and their media at once.
  • Whether to display the annotation value on mouseover
  • Whether to display the media snapshot on mouseover
  • Whether to display the media player on click
We display a different tier name for Whisper, since the original "stt-whisper" is not human-friendly.
We want to see all tiers in the density view, so a simple selection will do.
We also insert a timeline view, which is ideal for visualizing dialogs.
  • Whether to display the annotation's video clip
In the timeline, we want to see all tiers except the Whisper one.
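The view itself is configured declaratively, but the selection semantics (every tier except the Whisper one, shown as one row per tier in time order) can be sketched as follows. `timeline_rows` is a hypothetical helper, and annotations are assumed to be `(tier, start, end, value)` tuples.

```python
from collections import defaultdict


def timeline_rows(annotations, exclude=("stt-whisper",)):
    """Group (tier, start, end, value) annotations into per-tier rows.

    Tiers listed in `exclude` are skipped; each row is sorted by start time,
    and rows appear in the order their tier is first encountered.
    """
    rows = defaultdict(list)
    for tier, start, end, value in annotations:
        if tier not in exclude:
            rows[tier].append((start, end, value))
    return {tier: sorted(items) for tier, items in rows.items()}
```

With one row per speaker tier, the turns of a dialog line up side by side along the time axis, which is what makes this view suited to conversations.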
For the final step, we export the transcription so that we can continue working on it in ELAN.
  • We want the audio segment to be copied next to the EAF file
After running this example, a folder should be created next to this XML document. It will contain the EAF file, ready to be opened in ELAN, along with the audio file.
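For readers curious about what such an export contains, here is a minimal sketch that builds an EAF 3.0 document with Python's `xml.etree.ElementTree`. It illustrates the format under stated assumptions and is not the processor's code: `build_eaf` is hypothetical, every tier uses a single time-alignable linguistic type, and files saved by ELAN itself contain more elements (such as constraint definitions) than shown here. The `RELATIVE_MEDIA_URL` attribute is what lets ELAN find the audio file copied next to the EAF file.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from pathlib import Path


def build_eaf(tiers, media_path, mime_type="audio/x-wav"):
    """Build a minimal EAF tree.

    tiers maps a tier name to a list of (start_s, end_s, value) tuples;
    media_path is the audio file, expected to sit next to the EAF file.
    """
    root = ET.Element("ANNOTATION_DOCUMENT", {
        "AUTHOR": "",
        "DATE": datetime.now(timezone.utc).replace(microsecond=0).isoformat(),
        "FORMAT": "3.0",
        "VERSION": "3.0",
        "xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
        "xsi:noNamespaceSchemaLocation": "http://www.mpi.nl/tools/elan/EAFv3.0.xsd",
    })
    header = ET.SubElement(root, "HEADER", {"MEDIA_FILE": "", "TIME_UNITS": "milliseconds"})
    media = Path(media_path)
    ET.SubElement(header, "MEDIA_DESCRIPTOR", {
        "MEDIA_URL": media.resolve().as_uri(),
        "MIME_TYPE": mime_type,
        "RELATIVE_MEDIA_URL": "./" + media.name,  # audio copied next to the EAF
    })
    # TIME_ORDER must precede the tiers; time slots are added to it as we go.
    time_order = ET.SubElement(root, "TIME_ORDER")
    slot = ann = 0
    for tier_name, items in tiers.items():
        tier = ET.SubElement(root, "TIER", {"TIER_ID": tier_name, "LINGUISTIC_TYPE_REF": "default-lt"})
        for start_s, end_s, value in items:
            refs = []
            for t in (start_s, end_s):
                slot += 1
                ET.SubElement(time_order, "TIME_SLOT", {
                    "TIME_SLOT_ID": f"ts{slot}",
                    "TIME_VALUE": str(round(t * 1000)),  # seconds -> milliseconds
                })
                refs.append(f"ts{slot}")
            ann += 1
            wrapper = ET.SubElement(tier, "ANNOTATION")
            aligned = ET.SubElement(wrapper, "ALIGNABLE_ANNOTATION", {
                "ANNOTATION_ID": f"a{ann}",
                "TIME_SLOT_REF1": refs[0],
                "TIME_SLOT_REF2": refs[1],
            })
            ET.SubElement(aligned, "ANNOTATION_VALUE").text = value
    ET.SubElement(root, "LINGUISTIC_TYPE", {
        "LINGUISTIC_TYPE_ID": "default-lt",
        "TIME_ALIGNABLE": "true",
        "GRAPHIC_REFERENCES": "false",
    })
    return ET.ElementTree(root)
```

Usage, with hypothetical paths: `build_eaf({"Alice": [(0.5, 4.0, "Hello")]}, "output/extract.wav").write("output/extract.eaf", encoding="UTF-8", xml_declaration=True)`.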