The Audio Video Annotations Analysis Toolkit 
Corpus analysis is a complex task: it requires learning editors for different file formats and juggling multiple tools, many of them command-line based or requiring programming knowledge.
AVAA Toolkit makes it easy to create pipelines that connect these ecosystems to process raw data (automated transcription, format conversion, and so on) and to query large corpora of annotations coming from various sources, extracting advanced statistics and generating beautiful, always up-to-date charts and timelines.
AVAA Toolkit is also a flexible converter: it takes as input XML files describing the style and operations needed to generate an HTML document, and it exports only the relevant portions of videos and their thumbnail snapshots, minimizing the final document size and the load times if the document is hosted online.
Annotation Formats
AVAA Toolkit understands the following file formats:
- AZP: Advene
- Celluloid: Huma-Num Celluloid Platform
- CHA: CLAN (Computerized Language ANalysis)
- EAF: ELAN (EUDICO Linguistic Annotator)
- MKV: Matroska embedded subtitles
- NoScribe: NoScribe HTML transcriptions
- OTR: oTranscribe
- SRT: SubRip
- TEI: Text Encoding Initiative
- TEXTGRID: Praat
- YouTube: YouTube video platform
TEI, CHA and TEXTGRID formats are available thanks to the TEI-CORPO project.
Media Formats
AVAA Toolkit can also process the following media types:
- audio: MP3, AAC, OGG, WAV, OPUS, FLAC
- video: MOV, MKV, MP4, AVI, MTS
Most media processing is made possible by FFmpeg.
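To give a rough idea of what exporting only a relevant portion of a video and its thumbnail can look like with FFmpeg, here is a minimal Python sketch. It is not AVAA Toolkit's actual implementation: the function names `clip_command` and `thumbnail_command` and the file names are hypothetical, and only standard FFmpeg options (`-ss`, `-t`, `-i`, `-c copy`, `-frames:v`) are assumed.

```python
import shlex


def clip_command(source, start, duration, output):
    """Build an FFmpeg command extracting `duration` seconds from `start`.

    Hypothetical helper for illustration only. With `-c copy` the streams
    are not re-encoded, so the cut snaps to the nearest keyframe.
    """
    return [
        "ffmpeg", "-y",            # overwrite the output file if it exists
        "-ss", f"{start:.3f}",     # seek to the start of the annotation
        "-i", source,
        "-t", f"{duration:.3f}",   # keep only the annotated duration
        "-c", "copy",              # copy streams without re-encoding
        output,
    ]


def thumbnail_command(source, at, output):
    """Build an FFmpeg command saving a single frame at time `at` (seconds)."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{at:.3f}",
        "-i", source,
        "-frames:v", "1",          # write exactly one video frame
        output,
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; pass the lists to
    # subprocess.run(...) to actually invoke FFmpeg.
    print(shlex.join(clip_command("interview.mp4", 12.5, 4.0, "clip.mp4")))
    print(shlex.join(thumbnail_command("interview.mp4", 12.5, "thumb.jpg")))
```

Building the commands as argument lists rather than shell strings avoids quoting problems with file names that contain spaces. Dropping `-c copy` makes FFmpeg re-encode the clip, which is slower but gives frame-accurate cuts.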