The Audio Video Annotations Analysis Toolkit
Corpus analysis is a complex task: it requires learning editors for different file formats and multiple tools, often command-line based or demanding programming knowledge as a prerequisite.
AVAA Toolkit makes it easy to create pipelines connecting ecosystems to process raw data (automated transcription, format conversion...), and to query large corpora of annotations coming from various sources to extract advanced statistics and generate beautiful, always up-to-date charts and timelines.
AVAA Toolkit is also a flexible converter: it takes as input XML files describing the style and operations used to generate an HTML document, and takes care of exporting only the relevant portions of videos and their thumbnail snapshots, minimizing the final document size and potential load times if hosted online.
AVAA Toolkit understands the following file formats:
AVAA Toolkit can also process the following media types:
The following software must be installed:
Simply extract the latest release zip
To start AVAA Toolkit, simply double-click the launcher file avaa-toolkit.exe
In case AVAA fails to start, check the troubleshooting section.
On Linux and macOS, AVAA Toolkit can be started by running the shell script avaa-toolkit.sh, but it must be set as executable first:
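Assuming the launcher keeps its default name, the one-time setup looks like this:

```shell
# Mark the launcher as executable (required once), then start it.
chmod +x avaa-toolkit.sh
./avaa-toolkit.sh
```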
Note that Java is required; you might need to install it from java.com/download
When AVAA Toolkit is already installed, follow these steps to update:
An editor for AVAA Toolkit's XML documents is available in the browser.
To begin, start AVAA Toolkit by running the launcher (avaa-toolkit.exe on Windows or avaa-toolkit.sh on Linux/macOS),
then navigate with your browser to avaa-toolkit.org
If no internet connection is available, use the offline editor provided in your installation folder (open index.html).
By default, the editor is allowed to create and edit files in the projects folder.
It is possible to give the editor access to other folders; just edit the avaa-config.xml file.
AVAA Toolkit is all about querying and filtering annotations. Complex queries can be expressed to extract only specific annotations.
This is done via the SELECT tag; various attributes can be combined to make a curated selection of annotations:
Attributes of type regexp (*-match) have additional options:
When multiple attributes are used, the selection will consist only of the annotations fulfilling all the constraints.
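As a sketch of the idea (the attribute names here are assumptions based on the *-match convention mentioned above, not a definitive reference):

```xml
<!-- tier-match / value-match are hypothetical names following the *-match regexp convention -->
<SELECT tier-match="Speaker-.*" value-match="^wh" />
```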
A view defines how annotations are rendered in the page. Each view has its specific attributes that can alter annotations' display and final HTML output. While all views come with basic default style, it is possible to change any visual aspect via CSS to fit custom needs. Because everyone will have different visual requirements, styling is entirely left to the document authors.
The concordancer view displays a table of annotations and their cotext annotations. Attributes allow time-range and count limits, to extract only meaningful related annotations. The display is configurable: each cotext annotation can appear in its own column, or all can be combined into one clip.
Input
<VIEW type="concordancer" max-time-diff="15" cotext="1">
...
</VIEW>
<VIEW type="concordancer" max-time-diff="15" cotext="2" show-video="true" video-with-cotext="false" cotext-split="true" cotext-video="true">
...
</VIEW>
The density view plots annotations as filled bars in a timeline, revealing the frequency and duration of interactions between tiers.
Input
<VIEW type="density" zoom-factor="500">
...
</VIEW>
The density timeline plots annotations as horizontal filled bars, revealing the frequency and duration of interactions between tiers. Flexible time-collapsing options allow compacting empty space between annotations. Currently not compatible with alt-tiers.
Input
<VIEW type="density-timeline" zoom-factor="500">
...
</VIEW>
The form view renders HTML input forms, allowing easy online sharing for collecting external data. Form results are simple JSON files which can then be imported back as virtual tiers for further analysis.
Input
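No example appears in this section; by analogy with the other views, a minimal form view declaration presumably looks like the following (the type value "form" is an assumption):

```xml
<VIEW type="form">
    ...
</VIEW>
```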
The intercoding view makes it easy to process JSON files resulting from forms, and display them in a meaningful way for intercoding validation and statistics.
Input
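By analogy with the other views, a minimal declaration presumably looks like the following (the type value "intercoding" is an assumption):

```xml
<VIEW type="intercoding">
    ...
</VIEW>
```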
Displays JSON-serializable data as an interactive tree
Input
<VIEW type="json">
...
</VIEW>
The list view simply renders annotations one after another. It is possible to specify a custom class for switching display to grid mode.
Input
<VIEW type="list" show-tier="false" style="grid">
...
</VIEW>
The table view can display various properties of annotations (or objects) in table columns. It also supports the Extra protocol, allowing user-defined properties to be displayed as columns.
Input
<VIEW type="table">
<SET column="tier" name="Actor" />
<SET column="value" name="Transcript" />
<SET column="video" name="Video Clip" />
...
</VIEW>
A special view to display testcase results
The timeline view displays annotations vertically with time markers, using one column per tier. Ideal for dialogues between two or more participants.
Input
<VIEW type="timeline" collapse="60">
...
</VIEW>
The wordcloud view helps visualize word frequency and has many customization attributes. Wordclouds can slow down PDF generation and can take some time to show up.
Input
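By analogy with the other views, a minimal declaration presumably looks like the following (the type value "wordcloud" is an assumption):

```xml
<VIEW type="wordcloud">
    ...
</VIEW>
```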
Use charts to visualize data through meaningful representations. Powered by D3.js and Observable
An operation takes input data and transforms it. Operations can also modify or filter the current selection of annotations.
Loads JSON files resulting from forms, as virtual annotations. This operation is deprecated; load-annotations-from-form should be used instead.
Input
Output
<OP annotations-from-form="my form (*).json" />
Clones each annotation in the selection, so they can be modified without affecting the originals.
Input
Output
<OP clone="" />
Counts the elements in each array element of the input array/object, replacing each array element with its integer count.
Input
Output
<OP count="" />
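As an illustration (the data here is invented for the example), given this input object:

```json
{ "questions": ["who", "what", "where"], "answers": ["yes"] }
```

the count operation would leave:

```json
{ "questions": 3, "answers": 1 }
```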
Counts by a specific property
Input
<OP count-by="tier" />
Counts keys across all input objects
Input
Output
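This section does not show the operation's name; assuming it follows the naming pattern of the other counting operations, an invocation might look like this (the name count-keys is hypothetical):

```xml
<!-- "count-keys" is a hypothetical name; check the operations reference for the real one -->
<OP count-keys="" />
```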
Detects sequences by grouping annotations that are close to each other.
Input
Output
<OP detect-sequences="" _.range="15" _.output="object" />
Calculates the sum of annotations' durations, grouping by a property (value/tier/group/participant)
Input
<OP duration-by="tier" _.factor="60000" _.truncate="2" />
Iterates over the values of the input array(s), and executes the provided function
Input
<OP each="o => o.x = (o.x * 42) + 7" />
Iterates over each file, and executes the provided function
Input
Output
<OP each-file="file => log.info(file.filename)" />
Takes each input annotation and extends its duration by changing its start and/or stop times. By default this clones all input annotations, so the originals stay unaffected.
Input
Output
<OP extend-duration="2" />
Filters the array(s) keeping only items passing the filter expression
Input
<OP filter="annotation => annotation.tier.id.contains('x')" />
Filters the array(s) keeping only annotations whose value matches the provided regexp
Input
<OP filter-by-value="wh." />
Transforms nested objects into a flat array of objects
Input
Output
<OP flatten="{name,value}" />
<OP flatten="{group,name,value}" />
Groups the array elements by the value of a specific element's property. The result is an object whose keys are the property values, mapped to arrays of elements.
Input
Output
<OP group-by="tier" />
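To illustrate, with simplified annotation objects where tier is a plain string (real annotations expose a richer tier object, as seen in the other examples), group-by="tier" would turn:

```json
[ { "tier": "MOM", "value": "hello" },
  { "tier": "CHI", "value": "hi" },
  { "tier": "MOM", "value": "look" } ]
```

into:

```json
{ "MOM": [ { "tier": "MOM", "value": "hello" }, { "tier": "MOM", "value": "look" } ],
  "CHI": [ { "tier": "CHI", "value": "hi" } ] }
```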
Loads JSON files resulting from forms, as virtual annotations. The JSON files must be in the folder of the processed XML file.
Input
Output
<OP load-annotations-from-form="my form (*).json" />
Loads a JSON file or set data directly from embedded JSON. The JSON file must be in the folder of the processed XML file.
Input
Output
<OP data-from-json="" _.file="myfile.json" />
<OP data-from-json="{foo:'bar', x:42}" />
<OP data-from-json="">{json5:"works"}</OP>
Loads an XLS or CSV file.
Input
Output
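This section does not show the operation's name; modeled on the data-from-json operation, an invocation might look like this (the name data-from-xls is hypothetical):

```xml
<!-- "data-from-xls" is a hypothetical name modeled on data-from-json -->
<OP data-from-xls="" _.file="myfile.xls" />
```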
Creates a new array populated with the results of calling the provided function on each element of the input array(s). Special pseudo-object syntax can be used to facilitate direct mapping of object properties (see examples below).
Input
<OP map="annotation => { return {value: annotation.value, tier: annotation.tier.id} }" />
<OP map="{value: .value, tier: .tier.id}" />
<OP map="{value, tier: .tier.id}" />
Clears a MongoDB Collection (drops the collection)
Loads objects from a MongoDB Collection
Output
Inserts input objects into a MongoDB Collection
Input
Removes objects from a MongoDB Collection
Input
Calculates the percentage of occurrences of each value of an annotation property.
Input
<OP percent-by="tier" />
Randomizes the input selection with a PRNG, allowing reproducible results based on an initial seed.
Input
Output
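This section does not show the operation's name; an invocation might look like this (both the name randomize and the _.seed parameter are hypothetical, only the seeded-PRNG behavior is documented):

```xml
<!-- "randomize" and "_.seed" are hypothetical names -->
<OP randomize="" _.seed="42" />
```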
Executes a "reducer" function on each element of the input array(s), in order, passing in the return value from the calculation on the preceding element. The final result of running the reducer across all elements of the array is a single value. The attribute takes the form: (accumulator, currentValue) => accumulator + currentValue, initialValue
Input
<OP reduce="(totalChars, annotation) => totalChars + annotation.value.length, 0" />
Considers each input annotation as a sequence, and selects those from another tier (of the same file) that are included in the sequences.
Input
Output
<OP replace-with-annotations-in-sequence-from-tier="tier id" _.overlap="false" />
Replaces each input annotation with one from another tier (of the same file): the first one found whose start time is after the input annotation's start time.
Input
Output
<OP replace-with-next-timecode-annotations-from-tier="tier id" _.range="60" />
Replaces each input annotation with one from another tier that has the same start time.
Input
Output
<OP replace-with-same-timecode-annotations-from-tier="tier id" _.range="0.5" />
Removes XML tags from annotations' values
Input
Output
Saves current data to a CSV file. The file will be saved in the processed XML file's folder, overwriting any existing file.
Input
<OP save-data-to-csv="table.csv" />
Saves current data to a JSON file. The file will be saved in the processed XML file's folder, overwriting any existing file.
Input
<OP save-data-to-json="foobar.json" />
Loads a URL (or a file) and extracts data
Input
Output
<OP scrape="" _.url="https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)" />
Transforms input annotations into an array of objects
Input
Output
Sets a global variable which becomes available in HTML blocks as {{varname}}. Global variables can also be used in scripts with variables.varname
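An invocation might look like this (the op name and parameter syntax are hypothetical; only the {{varname}} and variables.varname usages are documented above):

```xml
<!-- Hypothetical op name and parameters -->
<OP set-variable="" _.name="title" _.value="My Corpus" />
```

A later HTML block could then reference {{title}}, and scripts could read variables.title.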
Adds or removes annotation's tag
Input
Output
<OP set-tag="+mytag" />
<OP set-tag="-mytag" />
Changes the tier of each input annotation. By default it will clone annotations, so the originals are not affected.
Input
Output
<OP set-tier="FAKE TIER" />
Sorts input array by a given field
Input
Output
<OP sort="-start" />
Sorts annotations first by their file, and then by the provided field or compare function. When providing a field, the first character must indicate the sorting order with +/- (ascending/descending)
Input
Output
<OP sort-by-file="" />
<OP sort-by-file="+start" />
<OP sort-by-file="-start" />
<OP sort-by-file="(a1,a2) => a1.start > a2.start ? -1 : 1" />
Sorts annotations by their group, and then inside each group by their relative start time (using the file's time offset when one is set)
Input
Output
<OP sort-by-group="" />
Calculates the sum of an annotation property's values. (Not working yet.)
Input
Computes the type-token of input annotations.
Input
Output
Use processors to analyze or convert raw data such as audio or video, and to manipulate the corpus.
Audio Anonymizer modifies media files by applying audio filters on each input annotation segment. Available modes are:
- silence: replaces each segment with complete silence (default)
- noise: replaces each segment with a configurable noise
- beep: replaces each segment with a configurable beep
- voice: replaces each segment with a synthesized voice
- file: replaces each segment with a custom audio file
Input
Demucs can separate voice and instruments from an audio track
Copy current corpus media files to a folder. The folder will be created in AVAA's temp directory.
Exports current corpus to EAF format.
Exports a selection of annotations to SRT format
Input
Cuts a segment from each corpus media file. This processor also accepts an array of annotations to cut multiple segments. In this case, the corpus will be reduced to the relevant annotation files, and each media file will be replaced by its cuts, or by one file merged from all cuts when the "concat" attribute is set to true.
Input
This processor calls ffmpeg's denoise feature.
- FFT: denoises audio with FFT.
- NLM: reduces broadband noise using a Non-Local Means algorithm.
- RNN: reduces noise from speech using a Recurrent Neural Networks model. Learn more about the RNN models at https://github.com/GregorR/rnnoise-models
This processor calls ffmpeg with a user-defined audio filter. Learn more about what filters can do at https://www.ffmpeg.org/ffmpeg-filters.html
This processor calls ffmpeg with a user-defined filter-complex. Learn more about what filters can do at https://www.ffmpeg.org/ffmpeg-filters.html
Applies a frei0r filter on each corpus media file.
Hardcodes annotations as subtitles on top of video. This processor will automatically use the values of the annotations that generated the clips, whenever they are available. It is possible to use different annotations from other tiers in range, by adding "source-tier" parameters.
Use Media Converter to convert video and audio files into other formats
Input
This processor calls R with user-provided data, and integrates the resulting data into the final HTML page. The R output can be graphic files (jpg, png, gif, svg) or tabular text data. The inputs provided to R as arguments are, in order:
- a temp directory path to work in and create result files
- the path to a JSON file consisting of the selection (annotations) or data provided to the processor
R scripts must follow a specific input/output syntax to be compatible with AVAA (see "Calling R" in the scripting guide).
Reduce the corpus to specific files. Useful to work on a subset of the corpus without modifying the corpus itself. This processor also accepts a selection of annotations, in which case only corpus files of these annotations will be kept.
Input
Takes a selection of annotations and modifies the corpus, using the input annotations as sequences: each sequence is removed from the corpus file, along with its associated media segment and all the annotations included in that sequence.
Input
Rename tiers in all the corpus files.
Takes a selection of annotations and recreates the corpus, using the input annotations as sequences: each sequence is transformed into one corpus file, with its associated media file and all the annotations/tiers included in that sequence.
Input
Speaker diarization is the process of marking segments of voice with their speaker. This processor takes a selection of annotations and adds new annotations to the corpus, associated with their speaker tier.
Input
Output
A speech-to-text processor using OpenAI Whisper to transcribe audio and automatically create annotations
Output
Silero's Voice Activity Detector processor creates annotations for each segment of input audio containing voice.
Output
Anonymize videos with these special effects:
- deface: automatically detects and blurs faces
- cartoon: cartoonizes the video
- cartoon-blur: cartoonizes and blurs the video
AVAA Toolkit features an advanced pipeline system that eases the automation of complex tasks.
A pipeline is created for each section of the document, and initially contains a virtual copy of the corpus and its associated media files.
The corpus and its media files are then modified sequentially by each processor inside the pipeline.
The pipeline can be fed different initial media files, by defining the processor-pipeline-input setting.
The corpus mode is useful to process corpus files directly (audio-anonymization, formats conversion...), while for instance all-assets mode could be used to apply effects only on the exported media of the document intended for sharing with peers.
Processors inside a pipeline (that is for now, a section of the document) are executed one after another, each processor using the results of the previous one to work on.
Complex chains of processors can be built to automate heavy tasks, alleviating the burden of manually running each step and verifying its consistency.
Views placed after a processor (in the same section) will inherit its modified media files when exporting clips and snapshots.
This can be helpful for extracting annotations from cuts of raw media files, to avoid processing long corpus media files when testing samples, or for preprocessing a media file before it is exported into clips during later view generation.
Processors generating annotations will make these annotations immediately available in the main corpus (and not only for the current pipeline), hence for all subsequent views and processors in the document.
Settings can be modified at any time via the Local Settings block.
It is possible to change the style via CSS. The generated HTML code makes it easy to target specific elements or apply styling rules to the whole page. Each view has its own structure of elements, and a simple "Inspect Element" from the browser will reveal selectors.
Styles can be defined directly in the XML file, by using a STYLE tag.
These styles will only apply to this specific HTML document.
<STYLE>
.view-timeline td {
border-color:red;
}
.view-timeline tr.tier-header {
text-align:right;
}
</STYLE>
Styles can be defined in a separate CSS file, that must be placed in the include folder.
All the generated HTML documents will load this file and have these styles in common.
h2 {
color:green;
}
section {
border-left: 2px solid gray;
}
Views generate simple HTML code and try to follow common guidelines, so that applying styles is straightforward.
Annotations' text labels always have the annotation class, so for instance to change the color of all annotations:
.view .annotation {
color:red;
}
AVAA Toolkit can also generate PDF, though interactive features like videos or dynamic charts won't work in this format, for obvious reasons.
Chrome (or Chromium) must be installed on the system, and the CLI argument --pdf must be specified.
The Chrome executable should be detected automatically; if that fails, it is required to provide its path with the --chrome-exe argument.
If everything works correctly, a file.pdf should be generated along the file.html document.
AVAA Toolkit is made for the command line and can integrate seamlessly in any tool chain.
Usage: [options] XML files or folders to process

Options:
  --lang  Language of the generated document, if translations are available. Default:
  --watch  Watch for XML changes and regenerate documents. Default: false
  --combine  Combine documents into one final HTML file. Default: false
  --pdf  Convert HTML into PDF (Chrome required). Default: false
  --zip-all  Zip all generated documents together. Default: false
  --zip-each  Zip each generated document separately. Default: false
  --deployer-user  Deployer user name. Default:
  --deployer-pass  Deployer password. Default:
  --deploy  Upload zip to deployer. Default: false
  --deployer-url  Specify a custom deployer URL to upload the zip to. Default:
  --chrome-exe  Chrome executable path, if autodetection fails. Default:
  --debug  Debug mode. Default: false
  --verbose  Display more information when converting. Default: true
  --path-temp  Path for temporary files. Defaults to ./temp/
  --path  Path of application for includes. Defaults to the working directory
  --test  Run an XML document as a test suite. Default:
  --gendoc  Generate all documentation and exit. Default: false
  --dev  Reload scripts before building. Default: true
  --cache-af  Cache annotations file in memory for faster execution. Default: true
  --server-allowed-origin  A custom origin URL allowed to connect to the server. Default:
  --server  WebSocket server for editor and interactive sessions. Default: false
  --server-port  WebSocket server port. Default: 42042
  --mongo-host  Address of the MongoDB server. Default: 127.0.0.1
  --mongo-port  Port of the MongoDB server. Default: 27017
  --mongo-db  Name of the database to work with. Default: avaa
  --download-remote-corpus  Whether to automatically download a referenced remote corpus. Default: false
  --conf  Custom config file to load. Default: avaa-config.xml
Error: A JNI error has occurred, please check your installation and try again Exception in thread "main" java.lang.UnsupportedClassVersionError: org/avaatoolkit/Main has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to X
Solution: Your Java runtime version is outdated; follow these steps:
java.net.BindException: Couldn't bind to any port in the range `42042:42042`. at org.glassfish.grizzly.AbstractBindingHandler.bind(AbstractBindingHandler.java) at org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java) at org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:) at org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java) at org.avaatoolkit.server.Daemon.start(Daemon.java) at org.avaatoolkit.Main.main(Main.java)
Solution: The toolkit is already started with the --server argument, close it before running a new instance.
Solution: Your firewall has a strict policy regarding localhost port bindings, add a rule to allow localhost:42042
On some operating systems, the installed Java runtime might not be up to date, preventing AVAA Toolkit from executing properly.
To run AVAA Toolkit, Java 11 or later is required. To install a valid runtime for AVAA Toolkit only:
Some processors require a full FFmpeg version to work.
When generating very short clips (under 1 second), the clips may consist of a single frozen image.
This is because, by default, FFmpeg is instructed to copy the video stream (vcopy), which saves considerable processing time at the expense of less accurate clipping.
When perfect accuracy is required for clips, it is recommended to force FFmpeg re-encoding, for instance by defining the setting video-codec = h264
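The exact syntax of the Local Settings block is not shown here; the relevant key/value pair, as stated above, is:

```
video-codec = h264
```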
This documentation includes attributions to licensed material such as libraries and software modules.
These notices are written explicitly in each relevant component and for convenience listed again below.
Some modules are not included in AVAA Toolkit but rather installed on demand whenever a component requires it.
Other modules are included or integrated in AVAA Toolkit to provide a better overall user experience.
This behavior is indicated by a little icon preceding the license, as well as a tooltip describing its inclusion method.
Additional libraries are packaged with the produced HTML document, and therefore redistributed by the end user.
jQuery simplifies DOM manipulation, some components use it to initialize content in the browser.
D3 has unparalleled flexibility in building custom and dynamic visualizations.
Charts generated by AVAA Toolkit are actually rendered right in the browser with D3.
GSAP is incredible and we deemed its inclusion valuable for providing a robust interactivity and animation framework for future AVAA Toolkit components.
Tipped features easy to use and customizable tooltips.
AVAA Toolkit views sometimes use these tooltips for instance to show snapshots or videos in a small popup when an annotation is clicked or hovered.
FileSaver.js provides a simple interface to save (as a "download") files created directly in the browser.
We believe AVAA Toolkit components can benefit from the presence of the FileSaver library.
AVAA Toolkit itself is built with Java, and makes use of various libraries (via Maven) which are compiled into the final JAR executable distributed to the end user.
A conversion tool between Elan, Clan, Transcriber and Praat formats with TEI as pivot.
A library for extracting things from streaming sites, AVAA Toolkit includes this library to provide an easy API for downloading PeerTube videos.
A FFmpeg CLI Wrapper for Java, used to execute FFmpeg and read progress feedback.
Rhino is the JavaScript engine used to execute all components' scripts.
The best library for parsing command-line arguments.
This library is used to spawn server sockets, and brings WebSocket sessions (that's how the Editor can interact with AVAA Toolkit).
Jsoup simplifies HTML/XML parsing via a CSS selectors syntax.
An artifact of fully-specified annotations to power static-analysis checks, beginning with nullness analysis.
Jchardet is a Java port of the source of Mozilla's automatic charset detection algorithm.
A JNA-based (native) Operating System and Hardware Information library, to get processes details and CPU usage.
Apache Commons is a set of commonly needed features implemented as reusable Java components.
A simple facade abstraction for various logging frameworks.
A reliable, generic, fast and flexible logging framework.
Automate Java boilerplate code via annotations.
OkHttp is an efficient HTTP client.
The MongoDB Synchronous Driver provides an easy API for interacting with a MongoDB Server.
We are currently working on that.