Transcription Teacup Club

Transcription of a 60-second extract

For testing, we want to work on a small sample because speech-to-text is really slow. If the sample works correctly, feel free to disable this processor to get the full transcription.

Starting point of the cut, in seconds
Duration of the cut segment
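
As an illustration of what such a cut amounts to, here is a minimal sketch using only Python's standard `wave` module. The function name `cut_wav` and its parameters are hypothetical, and it assumes an uncompressed WAV input with the duration given in seconds; the processor itself may work differently.

```python
import wave


def cut_wav(src_path, dst_path, start_s, duration_s):
    """Copy `duration_s` seconds of audio, starting at `start_s`, into a new WAV file.

    Returns the number of frames actually written (fewer than requested
    if the source ends before the cut does).
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        first_frame = int(start_s * params.framerate)
        frame_count = int(duration_s * params.framerate)
        # Clamp the start so a cut beyond the end yields an empty segment.
        src.setpos(min(first_frame, params.nframes))
        frames = src.readframes(frame_count)
    with wave.open(dst_path, "wb") as dst:
        # The frame count in the header is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)
    return len(frames) // (params.sampwidth * params.nchannels)
```

For example, `cut_wav("full.wav", "sample.wav", 0, 60)` would keep the first 60 seconds (both file names are placeholders).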

We will use OpenAI Whisper to do the actual speech-to-text. This will populate the corpus with one new annotation per transcribed sentence.

We should specify the language
We want the largest model for the best transcription
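
For readers curious about what happens underneath, the sketch below shows roughly how the `openai-whisper` Python package can be called directly and how its output could be turned into annotations. The helper names, the annotation dictionary layout, and the use of the "stt-whisper" tier name are assumptions made for illustration; this is not necessarily how the processor is implemented. Note that Whisper returns timed segments, which usually but not always correspond to sentences.

```python
def transcribe(audio_path, language, model_name="large"):
    """Run Whisper on an audio file and return its raw result dictionary."""
    # Imported lazily: requires the third-party package `openai-whisper`.
    import whisper

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path, language=language)


def segments_to_annotations(result, tier="stt-whisper"):
    """Turn Whisper segments into simple annotation dicts (hypothetical layout)."""
    return [
        {
            "tier": tier,
            "start": segment["start"],  # seconds
            "end": segment["end"],      # seconds
            "value": segment["text"].strip(),
        }
        for segment in result["segments"]
    ]
```

A typical call would be `segments_to_annotations(transcribe("sample.wav", language="en"))`, where the file name and language code are placeholders.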

Whisper only transcribes; it does not differentiate between speakers. So we have the transcription as annotations, but they are not yet associated with their speaker. We can use Pyannote to do the association (known as diarization). This will populate the corpus with new annotations associated with their speaker tier.

Your HuggingFace token, required to download pyannote gated models
The number of speakers present in the audio

We pass our Whisper annotations to Pyannote via a simple selection
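
One common way to associate transcribed annotations with diarized speakers is to give each annotation to the speaker whose turns overlap it the most. The sketch below illustrates that idea only; the function names, the `(start, end, speaker)` turn format, and the annotation layout are assumptions, and the actual processor may use a different strategy.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(annotations, turns):
    """Attach each annotation to the speaker with the largest total overlap.

    `annotations` is a list of dicts with "start" and "end" in seconds;
    `turns` is a list of (start, end, speaker) tuples from diarization.
    Annotations that overlap no turn get the tier None.
    """
    assigned = []
    for annotation in annotations:
        totals = {}
        for turn_start, turn_end, speaker in turns:
            shared = overlap(annotation["start"], annotation["end"], turn_start, turn_end)
            if shared > 0:
                totals[speaker] = totals.get(speaker, 0.0) + shared
        best = max(totals, key=totals.get) if totals else None
        assigned.append({**annotation, "tier": best})
    return assigned
```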

Pyannote can't guess the names of the speakers, so we will set them manually

We simply provide a JSON object mapping the names
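
As a rough illustration, renaming speaker tiers from a JSON object could look like the sketch below. The labels `SPEAKER_00` and `SPEAKER_01` follow Pyannote's usual default naming, while the names "Alice" and "Bob", the function name, and the annotation layout are made up for the example.

```python
import json


def rename_tiers(annotations, mapping_json):
    """Replace tier names according to a JSON object; unmapped tiers are kept as-is."""
    mapping = json.loads(mapping_json)
    return [
        {**annotation, "tier": mapping.get(annotation["tier"], annotation["tier"])}
        for annotation in annotations
    ]


# Hypothetical mapping from diarization labels to real names.
SPEAKER_NAMES = '{"SPEAKER_00": "Alice", "SPEAKER_01": "Bob"}'
```

Calling `rename_tiers(annotations, SPEAKER_NAMES)` returns new annotation dictionaries and leaves the originals untouched.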

We insert a density view, so we can see all annotations and their media at once

Whether to display the annotation value on mouse over
Whether to display the media snapshot on mouse over
Whether to display the media player on click

Let's display a different tier name for Whisper, since the original "stt-whisper" is not human-friendly

We want to see all tiers in the density view, so a simple select will do

We also insert a timeline view, perfect for visualizing dialogues

Whether to display the annotation's video clip

In the timeline we want to see all tiers...

...except whisper:

Now for the final step, we want to export this transcription so we can do further work with ELAN

We want the audio segment to be copied next to the EAF file

After running this example, a folder should be created next to this XML document file. It will contain the EAF file ready to be opened in ELAN, along with the audio file.
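
To give an idea of what such an export folder contains, here is a simplified sketch that copies the audio file and writes a bare-bones EAF document next to it. It is an assumption-laden illustration: the function name, the annotation layout, the `default-lt` linguistic type, and the WAV MIME type are all chosen for the example, and the output has not been validated against the official EAF schema.

```python
import shutil
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from pathlib import Path


def export_eaf(annotations, audio_path, out_dir, name="transcription"):
    """Copy the audio into `out_dir` and write a minimal EAF file beside it."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    audio_copy = out / Path(audio_path).name
    shutil.copy(audio_path, audio_copy)

    root = ET.Element(
        "ANNOTATION_DOCUMENT", AUTHOR="", FORMAT="3.0", VERSION="3.0",
        DATE=datetime.now(timezone.utc).isoformat(timespec="seconds"),
    )
    header = ET.SubElement(root, "HEADER", TIME_UNITS="milliseconds")
    ET.SubElement(
        header, "MEDIA_DESCRIPTOR", MIME_TYPE="audio/x-wav",  # assumes WAV audio
        MEDIA_URL=audio_copy.resolve().as_uri(),
        RELATIVE_MEDIA_URL="./" + audio_copy.name,
    )
    time_order = ET.SubElement(root, "TIME_ORDER")

    tiers = {}
    slot_count = 0
    for index, annotation in enumerate(annotations, start=1):
        # Each annotation gets its own pair of time slots, in milliseconds.
        slot_ids = []
        for seconds in (annotation["start"], annotation["end"]):
            slot_count += 1
            slot_id = f"ts{slot_count}"
            ET.SubElement(time_order, "TIME_SLOT", TIME_SLOT_ID=slot_id,
                          TIME_VALUE=str(round(seconds * 1000)))
            slot_ids.append(slot_id)
        tier = tiers.get(annotation["tier"])
        if tier is None:
            tier = ET.SubElement(root, "TIER", TIER_ID=annotation["tier"],
                                 LINGUISTIC_TYPE_REF="default-lt")
            tiers[annotation["tier"]] = tier
        wrapper = ET.SubElement(tier, "ANNOTATION")
        aligned = ET.SubElement(wrapper, "ALIGNABLE_ANNOTATION", ANNOTATION_ID=f"a{index}",
                                TIME_SLOT_REF1=slot_ids[0], TIME_SLOT_REF2=slot_ids[1])
        ET.SubElement(aligned, "ANNOTATION_VALUE").text = annotation["value"]

    ET.SubElement(root, "LINGUISTIC_TYPE", LINGUISTIC_TYPE_ID="default-lt",
                  TIME_ALIGNABLE="true", GRAPHIC_REFERENCES="false")
    eaf_path = out / f"{name}.eaf"
    ET.ElementTree(root).write(eaf_path, encoding="UTF-8", xml_declaration=True)
    return eaf_path
```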