The Audio Video Annotations Analysis Toolkit
Corpora analysis is a complex task: it requires learning different editors and file formats, and often involves command-line tools or prior programming knowledge.
AVAA Toolkit makes it easy to create pipelines connecting ecosystems to process raw data (automated transcription, format conversion...), and to query large corpora of annotations coming from various sources to extract advanced statistics and generate beautiful, always up-to-date charts and timelines.
AVAA Toolkit is also a flexible converter: it takes as input XML files describing the style and operations to generate an HTML document, and takes care of exporting only the relevant portions of videos and their thumbnail snapshots, minimizing the final document size and potential load times if hosted online.
AVAA Toolkit understands the following file formats:
AVAA Toolkit can also process the following media types:
Simply extract the latest release zip
To start AVAA Toolkit, simply double-click the launcher file avaa-toolkit.exe
The executable is not signed, so Windows will ask for confirmation before starting it.
In case AVAA Toolkit fails to start, check the troubleshooting section.
On Linux and macOS, AVAA Toolkit can be started by running the shell script avaa-toolkit.sh, but it must be set as executable first:
chmod +x avaa-toolkit.sh
Note that at least Java 11 is required; you can install the latest version using the trusted adoptium.net binaries.
When AVAA Toolkit is already installed, follow these steps to update:
An editor for AVAA Toolkit's XML documents is available in the browser.
To begin, start AVAA Toolkit by running the launcher (avaa-toolkit.exe on Windows or avaa-toolkit.sh on Linux/macOS),
then navigate with your browser to avaa-toolkit.org
If no internet connection is available, use the offline editor provided in your installation folder (open index.html).
By default, the editor is allowed to create and edit files in the projects folder.
It is possible to add other folders to the editor by editing the avaa-config.xml file.
AVAA Toolkit is all about querying and filtering annotations. Complex queries can be expressed to extract only specific annotations.
This is done via the SELECT tag; various attributes can be combined to make a curated selection of annotations:
Attributes of type regexp (*-match) have additional options:
When multiple attributes are used, the selection will consist only of the annotations fulfilling all the constraints.
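Conceptually, combining SELECT attributes behaves like a logical AND over the annotation list. A minimal JavaScript sketch of that semantics (the property names and the use of regular expressions here are illustrative, not AVAA's actual internals):

```javascript
// Keep only annotations that satisfy ALL constraints (logical AND),
// mirroring how multiple SELECT attributes narrow the selection.
function selectAnnotations(annotations, constraints) {
  return annotations.filter(a =>
    Object.entries(constraints).every(([prop, regexp]) => regexp.test(a[prop]))
  );
}

const annotations = [
  { tier: "Mother", value: "hello" },
  { tier: "Child",  value: "hello" },
  { tier: "Mother", value: "bye" },
];

// Both constraints must hold for an annotation to be selected.
const selected = selectAnnotations(annotations, {
  tier:  /^Mother$/,
  value: /^hel/,
});
// selected contains only the first annotation
```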
A view defines how annotations are rendered in the page. Each view has its specific attributes that can alter annotations' display and final HTML output. While all views come with basic default style, it is possible to change any visual aspect via CSS to fit custom needs. Because everyone will have different visual requirements, styling is entirely left to the document authors.
The concordancer view displays a table of annotations and their cotext annotations. Attributes allow timerange and count limits, to extract only meaningful relative annotations. The display is configurable: each cotext annotation can be shown in its own column, or all can be combined into one clip.
Input
The density view plots annotations as filled bars in a timeline, revealing interaction frequency and duration between tiers.
Input
The density timeline plots annotations as filled bars horizontally, revealing interaction frequency and duration between tiers. Flexible time-collapsing options allow compacting empty space between annotations. Currently not compatible with alt-tiers.
Input
The form view renders HTML input forms, allowing easy online sharing for collecting external data. Form results are simple JSON files which can then be imported back as virtual tiers for further analysis.
Input
A special view with no output, useful for instance for running live queries.
The intercoding view makes it easy to process JSON files resulting from forms, and display them in a meaningful way for intercoding validation and statistics.
Input
Displays JSON-serializable data as an interactive tree.
Input
The list view simply renders annotations one after another. It is possible to specify a custom class for switching display to grid mode.
Input
The table view can display various annotations' (or objects) properties into table columns. It also supports the Extra protocol allowing user-defined properties to be displayed as columns.
Input
A special view to display testcase results.
The timeline view displays annotations vertically with time markers, and using one column per tier. Ideal for dialogs between 2+ participants.
Input
Displays annotations as a simple transcript; it can also export the transcript to CSV files.
Input
The wordcloud helps visualize word frequency and has many customisation attributes. Wordclouds can slow down PDF generation and can take some time to show up.
Input
Use charts to visualize data through meaningful representations. Powered by D3.js and Observable.
An operation takes input data and transforms it. Operations can also modify or filter the current selection of annotations.
Calculates the sum/average of annotations' durations (in milliseconds), grouping by a property (value/tier/group/participant). If the property is not specified, the input object will be used directly as groups.
Input
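The grouping logic can be sketched in JavaScript as follows (a simplified model, assuming annotations expose start/stop times in milliseconds; the actual operation's attributes may differ):

```javascript
// Group annotations by a property and sum their durations (stop - start).
function durationByProperty(annotations, prop) {
  const groups = {};
  for (const a of annotations) {
    const key = a[prop];
    groups[key] = (groups[key] || 0) + (a.stop - a.start);
  }
  return groups;
}

const annotations = [
  { tier: "A", start: 0,    stop: 1500 },
  { tier: "A", start: 2000, stop: 2500 },
  { tier: "B", start: 0,    stop: 800  },
];
const sums = durationByProperty(annotations, "tier");
// sums → { A: 2000, B: 800 }
```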
Calculates sum/average of annotations' pause duration (in milliseconds).
Input
Calculates the percentage occurrence of an annotation property's values.
Input
Clones each annotation in the selection, so they can be modified without affecting the originals.
Input
Output
Combines all overlapping annotations into one annotation.
Input
Output
Combines all consecutive annotations of the same tier into one annotation.
Input
Output
Counts the elements in each array element of the input array/object, replacing each array element with an integer count.
Input
Output
Counts elements by a specific property.
Input
Counts keys across all input objects.
Input
Output
Detects sequences by grouping annotations that are close to each other.
Input
Output
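A minimal sketch of gap-based sequence detection, assuming a maximum-gap threshold in milliseconds (the actual attribute names and defaults may differ):

```javascript
// Annotations whose gap to the previous one is <= maxGap ms are grouped
// into the same sequence; a larger gap starts a new sequence.
function detectSequences(annotations, maxGap) {
  const sorted = [...annotations].sort((a, b) => a.start - b.start);
  const sequences = [];
  let current = null;
  for (const a of sorted) {
    if (current && a.start - current[current.length - 1].stop <= maxGap) {
      current.push(a);
    } else {
      current = [a];
      sequences.push(current);
    }
  }
  return sequences;
}

const annotations = [
  { start: 0,    stop: 1000 },
  { start: 1200, stop: 2000 },  // 200 ms gap → same sequence
  { start: 9000, stop: 9500 },  // 7000 ms gap → new sequence
];
const seqs = detectSequences(annotations, 500);
// seqs.length → 2
```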
Iterates over the array(s)' values and executes the provided function.
Input
Iterates over each file, and executes the provided function
Input
Output
Takes each input annotation and extends its duration by changing its start and/or stop times. By default, all input annotations are cloned so the originals stay unaffected.
Input
Output
Filters the array(s) keeping only items passing the filter expression
Input
Filters the array(s) keeping only annotations whose value matches the provided regexp
Input
Transforms nested objects into a flat array of objects
Input
Output
Groups the array elements by the value of a specific element's property. The result is an object whose keys are the property values, mapped to arrays of elements.
Input
Output
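The group-by operation described above is conceptually equivalent to this JavaScript sketch:

```javascript
// Build an object whose keys are property values, each mapped to the
// array of elements sharing that value.
function groupBy(array, prop) {
  const result = {};
  for (const item of array) {
    const key = item[prop];
    (result[key] = result[key] || []).push(item);
  }
  return result;
}

const annotations = [
  { tier: "Mother", value: "hi" },
  { tier: "Child",  value: "oh" },
  { tier: "Mother", value: "yes" },
];
const grouped = groupBy(annotations, "tier");
// grouped → { Mother: [two items], Child: [one item] }
```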
Groups the annotations by specific file tags (comma separated or JSON array of strings). Corpus files can have multiple tags, therefore an annotation could appear in multiple groups. The result is an object whose keys are the chosen tags, mapped to arrays of annotations.
Input
Output
Improves transcribed content with various heuristic techniques.
Input
Output
Loads JSON files resulting from forms, as virtual annotations. The JSON files must be in the folder of the processed XML file.
Input
Output
Loads a JSON file or set data directly from embedded JSON. The JSON file must be in the folder of the processed XML file.
Input
Output
Runs a JS function whose result will be set as the current selection/data. Global variables can also be used in scripts via variables.varname
Loads an XLS or CSV file.
Input
Output
Creates a new array populated with the results of calling the provided function on each element of the input array(s). A special pseudo-object syntax can be used to facilitate direct mapping of object properties; the following three forms are equivalent:
annotation => ({value: annotation.value, tier: annotation.tier.id})
{value: .value, tier: .tier.id}
{value, tier: .tier.id}
Input
Clears a MongoDB Collection (drops the collection)
Loads objects from a MongoDB Collection
Output
Inserts input objects into a MongoDB Collection
Input
Removes objects from a MongoDB Collection
Input
Randomizes the input selection with a PRNG allowing reproducible results based on an initial seed.
Input
Output
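A sketch of how reproducible shuffling can work (AVAA's actual PRNG may differ): a mulberry32 generator drives a Fisher-Yates shuffle, so the same seed always yields the same order.

```javascript
// Small deterministic PRNG: same seed → same sequence of numbers in [0, 1).
function mulberry32(seed) {
  return function () {
    seed |= 0; seed = (seed + 0x6D2B79F5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle driven by the seeded PRNG; the input is not mutated.
function seededShuffle(array, seed) {
  const rng = mulberry32(seed);
  const result = [...array];
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

// Reproducible: the same seed gives the same permutation.
const a = seededShuffle([1, 2, 3, 4, 5], 42);
const b = seededShuffle([1, 2, 3, 4, 5], 42);
```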
Executes a "reducer" function on each element of the input array(s), in order, passing in the return value from the calculation on the preceding element. The final result of running the reducer across all elements of the array is a single value. Example reducer:
(accumulator, currentValue) => accumulator + currentValue, initialValue
Input
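The reducer signature quoted above, applied to a concrete array in plain JavaScript:

```javascript
// Sum an array of numbers, starting the accumulator at initialValue.
const values = [1, 2, 3, 4];
const initialValue = 10;
const total = values.reduce(
  (accumulator, currentValue) => accumulator + currentValue,
  initialValue
);
// total → 20
```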
Considers each input annotation as a sequence, and selects those from another tier (of the same file) that are included in the sequences.
Input
Output
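The containment selection can be sketched as follows (simplified: it ignores file boundaries and time offsets):

```javascript
// Keep only candidate annotations that fall entirely within one of the
// input "sequence" annotations.
function selectInSequences(sequences, candidates) {
  return candidates.filter(c =>
    sequences.some(s => c.start >= s.start && c.stop <= s.stop)
  );
}

const sequences = [{ start: 0, stop: 5000 }];
const other = [
  { start: 1000, stop: 2000 },  // fully inside → kept
  { start: 4000, stop: 6000 },  // overlaps the end → dropped
];
const inside = selectInSequences(sequences, other);
// inside.length → 1
```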
Replaces each input annotation with the first annotation found from another tier (of the same annotation's file) whose start time is after the input annotation's start time.
Input
Output
Replaces each input annotation with one from another tier that has the same start time.
Input
Output
Removes XML tags from annotations' values.
Input
Output
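A simplified sketch of tag stripping with a regular expression (the real operation may handle entities and malformed markup differently):

```javascript
// Remove anything between < and > from an annotation value.
function stripTags(value) {
  return value.replace(/<[^>]*>/g, "");
}

const cleaned = stripTags("a <b>bold</b> word");
// cleaned → "a bold word"
```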
Saves current data to a CSV file. The file will be saved in the processed XML file's folder, overwriting any existing file.
Input
Saves current data to a JSON file. The file will be saved in the processed XML file's folder, overwriting any existing file.
Input
Loads a URL (or a file) and extracts data.
Input
Output
Transforms input annotations to an array of objects
Input
Output
Sets a global variable which becomes available in HTML blocks as {{varname}}. It can then be used in scripts with variables.varname, and in attributes via the ${} syntax, like ${0.05*variables.counter}. If the "value" attribute is not defined, the saved value will be the current selection/data.
Adds or removes an annotation's tag.
Input
Output
Changes the tier of each input annotation. By default, annotations are cloned so the originals are not affected.
Input
Output
Sorts the input array by a given field.
Input
Output
Sorts annotations first by their file, and then by the provided field or compare function. When providing a field, its first character must indicate the sorting order with +/- (ascending/descending).
Input
Output
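The +/- field syntax can be modeled with a comparator factory like this sketch (the field name is illustrative, and the prefix is assumed to always be present):

```javascript
// "+start" → ascending by start, "-start" → descending by start.
function makeComparator(field) {
  const order = field[0] === "-" ? -1 : 1;
  const name = field.slice(1);
  return (a, b) =>
    (a[name] < b[name] ? -1 : a[name] > b[name] ? 1 : 0) * order;
}

const annotations = [{ start: 300 }, { start: 100 }, { start: 200 }];
const ascending = [...annotations].sort(makeComparator("+start"));
const descending = [...annotations].sort(makeComparator("-start"));
// ascending starts → 100, 200, 300
```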
Sorts annotations by their group, and then inside each group by their relative start time (using the file time-offset, if any). If no field is provided, sorting will be done around //TODO
Input
Output
Calculates the sum of the annotations' property values. (not working yet)
Input
Transforms an object of structure {A:{a:1,b:2}} into structure {a:{A:1},b:{A:2}}
Input
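The transposition described above, sketched in JavaScript:

```javascript
// Swap the outer and inner keys of a two-level object:
// {A:{a:1,b:2}} → {a:{A:1},b:{A:2}}
function transpose(obj) {
  const result = {};
  for (const [outer, inner] of Object.entries(obj)) {
    for (const [key, value] of Object.entries(inner)) {
      (result[key] = result[key] || {})[outer] = value;
    }
  }
  return result;
}

const out = transpose({ A: { a: 1, b: 2 } });
// out → { a: { A: 1 }, b: { A: 2 } }
```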
Computes the type-token ratio of input annotations.
Input
Output
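Assuming "type-token" refers to the classic type-token ratio (TTR), a minimal sketch: distinct word forms divided by total word count across the annotation values (the real tokenization rules may differ):

```javascript
// TTR = number of distinct lowercase word forms / total word count.
function typeTokenRatio(annotations) {
  const tokens = annotations
    .flatMap(a => a.value.toLowerCase().split(/\s+/))
    .filter(w => w.length > 0);
  const types = new Set(tokens);
  return tokens.length === 0 ? 0 : types.size / tokens.length;
}

const ttr = typeTokenRatio([
  { value: "the cat sat" },
  { value: "the cat ran" },
]);
// 4 types (the, cat, sat, ran) / 6 tokens
```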
Use processors to analyze or convert raw data such as audio or video, and to manipulate the corpus.
Audio Anonymizer modifies media files by applying audio filters on each input annotation segment. Available modes are:
- silence: replaces each segment with complete silence (default)
- noise: replaces each segment with a configurable noise
- beep: replaces each segment with a configurable beep
- voice: replaces each segment with a synthesized voice
- file: replaces each segment with a custom audio file
Input
Demucs can separate voice and instruments from an audio track.
Exports current corpus media files to a folder. The folder will be created in AVAA's temp directory.
Exports current corpus media files together with a copy of the ORIGINAL corpus files, edited to reference the exported media files. This produces a standalone corpus folder which can be easily shared because it no longer contains absolute paths.
Exports current corpus to EAF format.
Exports a selection of annotations to SRT format.
Input
Cuts a segment from each corpus media file. This processor also accepts an array of annotations to cut multiple segments; in this case, the corpus will be reduced to the relevant annotation files and each media file will be replaced by its cuts, or by one merged file from all cuts when the "concat" attribute is set to true.
Input
This processor calls ffmpeg's denoise feature. Available modes:
- FFT: denoises audio with FFT.
- NLM: reduces broadband noise using a Non-Local Means algorithm.
- RNN: reduces noise from speech using a Recurrent Neural Networks model.
Learn more about the RNN models at https://github.com/GregorR/rnnoise-models
This processor calls ffmpeg with a user-defined audio filter. Learn more about what filters can do at https://www.ffmpeg.org/ffmpeg-filters.html
This processor calls ffmpeg with a user-defined filter-complex. Learn more about what filters can do at https://www.ffmpeg.org/ffmpeg-filters.html
Applies a frei0r filter on each corpus media file.
Hardcodes annotations as subtitles on top of video. This processor will automatically use the values of the annotations that generated the clips, whenever they are available. It is possible to use different annotations from other tiers in range, by adding "source-tier" parameters.
Use Media Converter to convert video and audio files into other formats
Input
This processor executes an R program and integrates the resulting data into the final HTML page. The R output can be graphic files (jpg, png, gif, svg) or tabular text data. Arguments provided to R are, in order:
- the temp directory path to work with and create result files in
- the path to a JSON file consisting of the selection (annotations) or data provided to the processor
R scripts must follow a specific input/output syntax to be compatible with AVAA (see "Calling R" in the scripting guide).
Reduces the corpus to specific files. Useful to work on a subset of the corpus without modifying the corpus itself. This processor also accepts a selection of annotations, in which case only corpus files of these annotations will be kept.
Input
Filters media files from the corpus, keeping only files that match a specific criterion. The "exclude" attribute can alternatively be used to exclude these files from the corpus. Useful for working on a subset of the corpus media files without modifying the corpus itself.
Takes a selection of annotations and modifies the corpus, using the input annotations as sequences, each sequence being removed from the corpus file, with its associated media segment and all the annotations included in that sequence.
Input
Rename tiers in all or specific corpus files.
Resets the pipeline corpus to its original state. Useful when working with loops.
Takes a selection of annotations and recreates the corpus, using the input annotations as sequences, each sequence being transformed into one corpus file with its associated media file and all the annotations/tiers included in that sequence.
Input
Speaker diarization is the process of marking segments of voice with their speaker. This processor takes a selection of annotations and adds new annotations to the corpus, associated with their speaker tier.
Input
Output
A speech-to-text processor using SYSTRAN Faster Whisper to transcribe audio and automatically create annotations.
Output
A speech-to-text processor using OpenAI Whisper to transcribe audio and automatically create annotations.
Output
A variation of OpenAI Whisper designed to extract audio events from the 527-class AudioSet, the Whisper-AT processor outputs general audio events as annotations.
Output
Silero's Voice Activity Detector processor creates annotations for each segment of input audio containing voice.
Output
Anonymize videos with these special effects:
- deface: automatically detects and blurs faces
- cartoon: cartoonizes the video
- cartoon-blur: cartoonizes and blurs the video
AVAA Toolkit features an advanced pipeline system easing automation of complex tasks.
A pipeline is created for each section of the document, and initially contains a virtual copy of the corpus and its associated media files.
The corpus and its media files are then modified sequentially by each processor inside the pipeline.
The pipeline can be fed different initial media files, by defining the processor-pipeline-input setting.
The corpus mode is useful to process corpus files directly (audio-anonymization, formats conversion...), while for instance all-assets mode could be used to apply effects only on the exported media of the document intended for sharing with peers.
Processors inside a pipeline (that is for now, a section of the document) are executed one after another, each processor using the results of the previous one to work on.
Complex chains of processors can be built to automate heavy tasks, alleviating the burden of manually running each step and verifying its consistency.
Views placed after a processor (in the same section) will inherit its modified media files when exporting clips and snapshots.
This can be helpful for extracting annotations from cuts of raw media files, to avoid processing long corpus media files when testing samples; or for preprocessing a media file before it is exported into clips during later view generation.
Processors generating annotations will make these annotations immediately available in the main corpus (and not only for the current pipeline), hence for all subsequent views and processors in the document.
Settings can be modified at any time via the Local Settings block.
It is possible to change the style via CSS. The HTML code generated makes it easy to target specific elements or apply styling rules for the whole page. Each view has its own structure of elements, and a simple "Inspect Element" from browser will reveal selectors.
Styles can be defined directly in the XML file, by using a STYLE tag.
These styles will only apply to this specific HTML document.
<STYLE>
.view-timeline td {
border-color:red;
}
.view-timeline tr.tier-header {
text-align:right;
}
</STYLE>
Styles can be defined in a separate CSS file, that must be placed in the include folder.
All the generated HTML documents will load this file and have these styles in common.
h2 {
color:green;
}
section {
border-left: 2px solid gray;
}
Views generate simple HTML code and try to follow common guidelines so that applying styles is straightforward.
Annotations' text labels always have the annotation class, so for instance to change the color of all annotations:
.view .annotation {
color:red;
}
AVAA Toolkit can also generate PDF, though interactive features like videos or dynamic charts won't work in this format, for obvious reasons.
Chrome (or Chromium) must be installed on the system (alternatively on Windows AVAA Toolkit will try to use Edge).
The Chrome/Edge executable should be detected automatically; if detection fails, its path must be provided in avaa-config.xml.
If Chrome/Edge is not available, it is recommended to install Chrome Headless Shell and then provide its path in avaa-config.xml
AVAA Toolkit is made for the command line and can integrate seamlessly in any tool chain.
Usage: [options] XML files or folders to process

Options:
  --lang                     Language of the generated document, if translations are available. Default:
  --watch                    Watch for xml changes and regenerate documents. Default: false
  --combine                  Combine documents into one final html file. Default: false
  --pdf                      Also convert HTML to PDF. Default: false
  --zip-all                  Zip all generated documents together. Default: false
  --zip-each                 Zip each generated document separately. Default: false
  --deployer-user            Deployer user name. Default:
  --deployer-pass            Deployer password. Default:
  --deploy                   Upload zip to deployer. Default: false
  --deployer-url             Specify a custom deployer URL to upload zip to. Default:
  --debug                    Debug mode. Default: false
  --verbose                  Display more information when converting. Default: true
  --path                     Path of application for includes. Default to working directory
  --path-temp                Path for temporary files. Default to ./temp/
  --test                     Run a XML document as a test suite. Default:
  --gendoc                   Generate all documentation and exit. Default: false
  --dev                      Reload scripts before building. Default: true
  --cache-af                 Cache annotations file in memory for faster exec. Default: true
  --server-allowed-origin    A custom origin URL allowed to connect to the server. Default:
  --server                   Websocket server for editor and interactive sessions. Default: false
  --server-port              Websocket server port. Default: 42042
  --server-ssl               Use SSL certificate (for Server Mode). Default: false
  --mongo-host               Address of the mongodb server. Default: 127.0.0.1
  --mongo-port               Port of the mongodb server. Default: 27017
  --mongo-db                 Name of database to work with. Default: avaa
  --download-remote-corpus   Whether to automatically download a referenced remote corpus. Default: false
  --conf                     Custom config file to load. Default: avaa-config.xml
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/avaatoolkit/Main has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to X
Solution: Your Java runtime is outdated; follow these steps:
java.net.BindException: Couldn't bind to any port in the range `42042:42042`.
  at org.glassfish.grizzly.AbstractBindingHandler.bind(AbstractBindingHandler.java)
  at org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java)
  at org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:)
  at org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java)
  at org.avaatoolkit.server.Daemon.start(Daemon.java)
  at org.avaatoolkit.Main.main(Main.java)
Solution: The toolkit is already started with the --server argument, close it before running a new instance.
Solution: Your firewall has a strict policy regarding localhost port bindings, add a rule to allow localhost:42042
On some operating systems, the installed Java runtime might not be up-to-date, preventing AVAA Toolkit from executing properly.
To run AVAA Toolkit, at least Java 11 is required. To install a valid runtime:
Alternatively using the OpenJDK archives:
Some processors require a full FFmpeg version to work.
When generating really short clips (under 1 sec), it is possible that the clips will consist of only one frozen image.
This is because by default FFmpeg will be instructed to do a copy of the video stream (vcopy), which saves considerable processing time, at the expense of less accurate clipping.
When perfect accuracy is required for clips, it is recommended to force FFmpeg re-encoding, for instance by defining the Setting video-codec = h264
This documentation includes attributions to licensed material such as libraries and software modules.
These notices are written explicitly in each relevant component and for convenience listed again below.
Some modules are not included in AVAA Toolkit but rather installed on demand whenever a component requires it.
Other modules are included or integrated in AVAA Toolkit to provide a better overall user experience.
This behavior is indicated by a little icon preceding the license, as well as a tooltip describing its inclusion method.
Additional libraries are packaged with the produced HTML document, and therefore redistributed by the end user.
jQuery simplifies DOM manipulation, some components use it to initialize content in the browser.
D3 has unparalleled flexibility in building custom and dynamic visualizations.
Charts generated by AVAA Toolkit are actually rendered right in the browser with D3.
GSAP is incredible and we deemed its inclusion valuable for providing a robust interactivity and animation framework for future AVAA Toolkit components.
Tipped features easy to use and customizable tooltips.
AVAA Toolkit views sometimes use these tooltips for instance to show snapshots or videos in a small popup when an annotation is clicked or hovered.
FileSaver.js provides a simple interface to save (as a "download") files created directly in the browser.
We believe AVAA Toolkit components can benefit from the presence of the FileSaver library.
AVAA Toolkit itself is built with Java, and makes use of various libraries (via Maven) which are compiled into the final JAR executable distributed to the toolkit users.
A library for extracting things from streaming sites; AVAA Toolkit includes it to provide an easy API for downloading PeerTube videos.
An FFmpeg CLI wrapper for Java, used to execute FFmpeg and read progress feedback.
Rhino is the JavaScript engine used to execute all components' scripts.
The best library for parsing command-line arguments.
This library is used to spawn server sockets, and brings WebSocket sessions (that's how the Editor can interact with AVAA Toolkit).
Jsoup simplifies HTML/XML parsing via a CSS selectors syntax.
An artifact of fully-specified annotations to power static-analysis checks, beginning with nullness analysis.
Jchardet is a Java port of the source from Mozilla's automatic charset detection algorithm.
A JNA-based (native) Operating System and Hardware Information library, to get processes details and CPU usage.
Apache Commons is a set of commonly needed features implemented as reusable Java components.
A simple facade abstraction for various logging frameworks.
A reliable, generic, fast and flexible logging framework.
Automate Java boilerplate code via annotations.
OkHttp is an efficient HTTP client.
The MongoDB Synchronous Driver provides an easy API for interacting with a MongoDB Server.
We are currently working on that.