The Audio Video Annotations Analysis Toolkit
Corpora analysis is a complex task: it requires learning different editors and file formats, and often involves command-line tools or prior programming knowledge.
AVAA Toolkit makes it easy to create pipelines connecting ecosystems to process raw data (automated transcription, format conversion...), and to query large corpora of annotations coming from various sources to extract advanced statistics and generate beautiful, always up-to-date charts and timelines.
AVAA Toolkit is also a flexible converter: it takes as input XML files describing the style and operations to generate an HTML document, and takes care of exporting only the relevant portions of videos and their thumbnail snapshots, minimizing the final document size and potential load times if hosted online.
AVAA Toolkit understands the following file formats:
AVAA Toolkit can also process the following media types:
Simply extract the latest release zip
To start AVAA Toolkit, simply double-click the launcher file avaa-toolkit.exe
The executable is not signed, so Windows will ask for confirmation before starting it.
In case AVAA Toolkit fails to start, check the troubleshooting section.
On Linux and macOS, AVAA Toolkit can be started by running the shell script avaa-toolkit.sh, but it must be set as executable first:
chmod +x avaa-toolkit.sh
Note that at least Java 11 is required; you can install the latest version using the trusted adoptium.net binaries.
When AVAA Toolkit is already installed, follow these steps to update:
An editor for AVAA Toolkit's XML documents is available in the browser.
To begin, start AVAA Toolkit by running the launcher (avaa-toolkit.exe on Windows or avaa-toolkit.sh on Linux/macOS),
then navigate with your browser to avaa-toolkit.org
If no internet connection is available, use the offline editor provided in your installation folder (open index.html).
By default, the editor is allowed to create and edit files in the projects folder.
It is possible to add other folders to the editor by editing the avaa-config.xml file.
AVAA Toolkit is all about querying and filtering annotations. Complex queries can be expressed to extract only specific annotations.
This is done via the SELECT tag; various attributes can be combined to make a curated selection of annotations:
Attributes of type regexp (*-match) have additional options:
When multiple attributes are used, the selection will consist only of the annotations fulfilling all the constraints.
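Conceptually, combining SELECT attributes behaves like a logical AND over the annotation list. A minimal JavaScript sketch of that semantics (the property names and the use of regular expressions here are illustrative, not AVAA's actual internals):

```javascript
// Keep only annotations that satisfy ALL constraints (logical AND),
// mirroring how multiple SELECT attributes narrow the selection.
function selectAnnotations(annotations, constraints) {
  return annotations.filter(a =>
    Object.entries(constraints).every(([prop, regexp]) => regexp.test(a[prop]))
  );
}

const annotations = [
  { tier: "Mother", value: "hello" },
  { tier: "Child",  value: "hello" },
  { tier: "Mother", value: "bye" },
];

// Both constraints must hold for an annotation to be selected.
const selected = selectAnnotations(annotations, {
  tier:  /^Mother$/,
  value: /^hel/,
});
// selected contains only the first annotation
```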
A view defines how annotations are rendered in the page. Each view has its specific attributes that can alter annotations' display and final HTML output. While all views come with basic default style, it is possible to change any visual aspect via CSS to fit custom needs. Because everyone will have different visual requirements, styling is entirely left to the document authors.
The concordancer view displays a table of annotations and their cotext annotations. Attributes allow timerange and count limits, to extract only meaningful relative annotations. The display is configurable: each cotext annotation can be shown in its own column, or all can be combined into one clip.
Input
The density view plots annotations as filled bars in a timeline, revealing interaction frequency and duration between tiers.
Input
The density timeline plots annotations as filled bars horizontally, revealing interaction frequency and duration between tiers. Flexible time-collapsing options allow compacting empty space between annotations. Currently not compatible with alt-tiers.
Input
The form view renders HTML input forms, allowing easy online sharing for collecting external data. Form results are simple JSON files which can then be imported back as virtual tiers for further analysis.
Input
A special view with no output, useful for instance for running live queries.
The intercoding view makes it easy to process JSON files resulting from forms, and display them in a meaningful way for intercoding validation and statistics.
Input
Displays JSON-serializable data as an interactive tree.
Input
The list view simply renders annotations one after another. It is possible to specify a custom class for switching display to grid mode.
Input
The table view can display various annotations' (or objects) properties into table columns. It also supports the Extra protocol allowing user-defined properties to be displayed as columns.
Input
A special view to display testcase results.
The timeline view displays annotations vertically with time markers, and using one column per tier. Ideal for dialogs between 2+ participants.
Input
Displays annotations as a simple transcript; it can also export the transcript to CSV files.
Input
The wordcloud helps visualize word frequency and has many customisation attributes. Wordclouds can slow down PDF generation and can take some time to show up.
Input
Use charts to visualize data through meaningful representations. Powered by D3.js and Observable.
An operation takes input data and transforms it. Operations can also modify or filter the current selection of annotations.
Calculates the sum/average of annotations' durations (in milliseconds), grouping by a property (value/tier/group/participant). If the property is not specified, the input object will be used directly as groups.
Input
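The grouping logic can be sketched in JavaScript as follows (a simplified model, assuming annotations expose start/stop times in milliseconds; the actual operation's attributes may differ):

```javascript
// Group annotations by a property and sum their durations (stop - start).
function durationByProperty(annotations, prop) {
  const groups = {};
  for (const a of annotations) {
    const key = a[prop];
    groups[key] = (groups[key] || 0) + (a.stop - a.start);
  }
  return groups;
}

const annotations = [
  { tier: "A", start: 0,    stop: 1500 },
  { tier: "A", start: 2000, stop: 2500 },
  { tier: "B", start: 0,    stop: 800  },
];
const sums = durationByProperty(annotations, "tier");
// sums → { A: 2000, B: 800 }
```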
Calculates sum/average of annotations' pause duration (in milliseconds).
Input
Calculates the percentage occurrence of an annotation property's values.
Input
Clones each annotation in the selection, so they can be modified without affecting the originals.
Input
Output
Combines all overlapping annotations into one annotation.
Input
Output
Combines all consecutive annotations of the same tier into one annotation.
Input
Output
Counts the elements in each array element of the input array/object, replacing each array element with an integer count.
Input
Output
Counts elements by a specific property.
Input
Counts keys across all input objects.
Input
Output
Detects sequences by grouping annotations that are close to each other.
Input
Output
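A minimal sketch of gap-based sequence detection, assuming a maximum-gap threshold in milliseconds (the actual attribute names and defaults may differ):

```javascript
// Annotations whose gap to the previous one is <= maxGap ms are grouped
// into the same sequence; a larger gap starts a new sequence.
function detectSequences(annotations, maxGap) {
  const sorted = [...annotations].sort((a, b) => a.start - b.start);
  const sequences = [];
  let current = null;
  for (const a of sorted) {
    if (current && a.start - current[current.length - 1].stop <= maxGap) {
      current.push(a);
    } else {
      current = [a];
      sequences.push(current);
    }
  }
  return sequences;
}

const annotations = [
  { start: 0,    stop: 1000 },
  { start: 1200, stop: 2000 },  // 200 ms gap → same sequence
  { start: 9000, stop: 9500 },  // 7000 ms gap → new sequence
];
const seqs = detectSequences(annotations, 500);
// seqs.length → 2
```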
Iterates over the array(s)' values and executes the provided function.
Input
Iterates over each file, and executes the provided function
Input
Output
Takes each input annotation and extends its duration by changing its start and/or stop times. By default, all input annotations are cloned so the originals stay unaffected.
Input
Output
Filters the array(s) keeping only items passing the filter expression
Input
Filters the array(s) keeping only annotations whose value matches the provided regexp
Input
Transforms nested objects into a flat array of objects
Input
Output
Groups the array elements by the value of a specific element's property. The result is an object whose keys are the property values, mapped to arrays of elements.
Input
Output
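The group-by operation described above is conceptually equivalent to this JavaScript sketch:

```javascript
// Build an object whose keys are property values, each mapped to the
// array of elements sharing that value.
function groupBy(array, prop) {
  const result = {};
  for (const item of array) {
    const key = item[prop];
    (result[key] = result[key] || []).push(item);
  }
  return result;
}

const annotations = [
  { tier: "Mother", value: "hi" },
  { tier: "Child",  value: "oh" },
  { tier: "Mother", value: "yes" },
];
const grouped = groupBy(annotations, "tier");
// grouped → { Mother: [two items], Child: [one item] }
```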
Groups the annotations by specific file tags (comma separated or JSON array of strings). Corpus files can have multiple tags, therefore an annotation could appear in multiple groups. The result is an object whose keys are the chosen tags, mapped to arrays of annotations.
Input
Output
Improves transcribed content with various heuristic techniques.
Input
Output
Loads JSON files resulting from forms, as virtual annotations. The JSON files must be in the folder of the processed XML file.
Input
Output
Loads a JSON file or set data directly from embedded JSON. The JSON file must be in the folder of the processed XML file.
Input
Output
Runs a JS function whose result will be set as the current selection/data. Global variables can also be used in scripts via variables.varname
Loads an XLS or CSV file.
Input
Output
Creates a new array populated with the results of calling the provided function on each element of the input array(s). A special pseudo-object syntax can be used to facilitate direct mapping of object properties; the following three forms are equivalent:
annotation => ({value: annotation.value, tier: annotation.tier.id})
{value: .value, tier: .tier.id}
{value, tier: .tier.id}
Input
Clears a MongoDB Collection (drops the collection)
Loads objects from a MongoDB Collection
Output
Inserts input objects into a MongoDB Collection
Input
Removes objects from a MongoDB Collection
Input
Randomizes the input selection with a PRNG allowing reproducible results based on an initial seed.
Input
Output
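A sketch of how reproducible shuffling can work (AVAA's actual PRNG may differ): a mulberry32 generator drives a Fisher-Yates shuffle, so the same seed always yields the same order.

```javascript
// Small deterministic PRNG: same seed → same sequence of numbers in [0, 1).
function mulberry32(seed) {
  return function () {
    seed |= 0; seed = (seed + 0x6D2B79F5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle driven by the seeded PRNG; the input is not mutated.
function seededShuffle(array, seed) {
  const rng = mulberry32(seed);
  const result = [...array];
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

// Reproducible: the same seed gives the same permutation.
const a = seededShuffle([1, 2, 3, 4, 5], 42);
const b = seededShuffle([1, 2, 3, 4, 5], 42);
```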
Executes a "reducer" function on each element of the input array(s), in order, passing in the return value from the calculation on the preceding element. The final result of running the reducer across all elements of the array is a single value. Example reducer:
(accumulator, currentValue) => accumulator + currentValue, initialValue
Input
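The reducer signature quoted above, applied to a concrete array in plain JavaScript:

```javascript
// Sum an array of numbers, starting the accumulator at initialValue.
const values = [1, 2, 3, 4];
const initialValue = 10;
const total = values.reduce(
  (accumulator, currentValue) => accumulator + currentValue,
  initialValue
);
// total → 20
```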
Considers each input annotation as a sequence, and selects those from another tier (of the same file) that are included in the sequences.
Input
Output
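The containment selection can be sketched as follows (simplified: it ignores file boundaries and time offsets):

```javascript
// Keep only candidate annotations that fall entirely within one of the
// input "sequence" annotations.
function selectInSequences(sequences, candidates) {
  return candidates.filter(c =>
    sequences.some(s => c.start >= s.start && c.stop <= s.stop)
  );
}

const sequences = [{ start: 0, stop: 5000 }];
const other = [
  { start: 1000, stop: 2000 },  // fully inside → kept
  { start: 4000, stop: 6000 },  // overlaps the end → dropped
];
const inside = selectInSequences(sequences, other);
// inside.length → 1
```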
Replaces each input annotation with the first annotation found from another tier (of the same annotation's file) whose start time is after the input annotation's start time.
Input
Output
Replaces each input annotation with one from another tier that has the same start time.
Input
Output
Removes XML tags from annotations' values.
Input
Output
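A simplified sketch of tag stripping with a regular expression (the real operation may handle entities and malformed markup differently):

```javascript
// Remove anything between < and > from an annotation value.
function stripTags(value) {
  return value.replace(/<[^>]*>/g, "");
}

const cleaned = stripTags("a <b>bold</b> word");
// cleaned → "a bold word"
```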
Saves current data to a CSV file. The file will be saved in the processed XML file's folder, overwriting any existing file.
Input
Saves current data to a JSON file. The file will be saved in the processed XML file's folder, overwriting any existing file.
Input
Loads a URL (or a file) and extracts data.
Input
Output
Transforms input annotations to an array of objects
Input
Output
Sets a global variable which becomes available in HTML blocks as {{varname}}. It can then be used in scripts with variables.varname, and in attributes via the ${} syntax, like ${0.05*variables.counter}. If the "value" attribute is not defined, the saved value will be the current selection/data.
Adds or removes an annotation's tag.
Input
Output
Changes the tier of each input annotation. By default, annotations are cloned so the originals are not affected.
Input
Output
Sorts the input array by a given field.
Input
Output
Sorts annotations first by their file, and then by the provided field or compare function. When providing a field, its first character must indicate the sorting order with +/- (ascending/descending).
Input
Output
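The +/- field syntax can be modeled with a comparator factory like this sketch (the field name is illustrative, and the prefix is assumed to always be present):

```javascript
// "+start" → ascending by start, "-start" → descending by start.
function makeComparator(field) {
  const order = field[0] === "-" ? -1 : 1;
  const name = field.slice(1);
  return (a, b) =>
    (a[name] < b[name] ? -1 : a[name] > b[name] ? 1 : 0) * order;
}

const annotations = [{ start: 300 }, { start: 100 }, { start: 200 }];
const ascending = [...annotations].sort(makeComparator("+start"));
const descending = [...annotations].sort(makeComparator("-start"));
// ascending starts → 100, 200, 300
```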
Sorts annotations by their group, and then inside each group by their relative start time (using the file time-offset, if any). If no field is provided, sorting will be done around //TODO
Input
Output
Calculates the sum of the annotations' property values. (not working yet)
Input
Transforms an object of structure {A:{a:1,b:2}} into structure {a:{A:1},b:{A:2}}
Input
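The transposition described above, sketched in JavaScript:

```javascript
// Swap the outer and inner keys of a two-level object:
// {A:{a:1,b:2}} → {a:{A:1},b:{A:2}}
function transpose(obj) {
  const result = {};
  for (const [outer, inner] of Object.entries(obj)) {
    for (const [key, value] of Object.entries(inner)) {
      (result[key] = result[key] || {})[outer] = value;
    }
  }
  return result;
}

const out = transpose({ A: { a: 1, b: 2 } });
// out → { a: { A: 1 }, b: { A: 2 } }
```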
Computes the type-token ratio of input annotations.
Input
Output
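Assuming "type-token" refers to the classic type-token ratio (TTR), a minimal sketch: distinct word forms divided by total word count across the annotation values (the real tokenization rules may differ):

```javascript
// TTR = number of distinct lowercase word forms / total word count.
function typeTokenRatio(annotations) {
  const tokens = annotations
    .flatMap(a => a.value.toLowerCase().split(/\s+/))
    .filter(w => w.length > 0);
  const types = new Set(tokens);
  return tokens.length === 0 ? 0 : types.size / tokens.length;
}

const ttr = typeTokenRatio([
  { value: "the cat sat" },
  { value: "the cat ran" },
]);
// 4 types (the, cat, sat, ran) / 6 tokens
```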
Use processors to analyze or convert raw data such as audio or video, and to manipulate the corpus.
Audio Anonymizer modifies media files by applying audio filters on each input annotation segment. Available modes are:
- silence: replaces each segment with complete silence (default)
- noise: replaces each segment with a configurable noise
- beep: replaces each segment with a configurable beep
- voice: replaces each segment with a synthesized voice
- file: replaces each segment with a custom audio file
Input
Demucs can separate voice and instruments from an audio track.
Exports current corpus media files to a folder. The folder will be created in AVAA's temp directory.
Exports current corpus media files together with a copy of the ORIGINAL corpus files, edited to reference the exported media files. This produces a standalone corpus folder which can be easily shared because it no longer contains absolute paths.
Exports current corpus to EAF format.
Exports a selection of annotations to SRT format.
Input
Cuts a segment from each corpus media file. This processor also accepts an array of annotations to cut multiple segments; in this case, the corpus will be reduced to the relevant annotation files and each media file will be replaced by its cuts, or by one merged file from all cuts when the "concat" attribute is set to true.
Input
This processor calls ffmpeg's denoise feature. Available modes:
- FFT: denoises audio with FFT.
- NLM: reduces broadband noise using a Non-Local Means algorithm.
- RNN: reduces noise from speech using a Recurrent Neural Networks model.
Learn more about the RNN models at https://github.com/GregorR/rnnoise-models
This processor calls ffmpeg with a user-defined audio filter. Learn more about what filters can do at https://www.ffmpeg.org/ffmpeg-filters.html
This processor calls ffmpeg with a user-defined filter-complex. Learn more about what filters can do at https://www.ffmpeg.org/ffmpeg-filters.html
Applies a frei0r filter on each corpus media file.
Hardcodes annotations as subtitles on top of video. This processor will automatically use the values of the annotations that generated the clips, whenever they are available. It is possible to use different annotations from other tiers in range, by adding "source-tier" parameters.
Use Media Converter to convert video and audio files into other formats
Input
This processor executes an R program and integrates the resulting data into the final HTML page. The R output can be graphic files (jpg, png, gif, svg) or tabular text data. Arguments provided to R are, in order:
- the temp directory path to work with and create result files in
- the path to a JSON file consisting of the selection (annotations) or data provided to the processor
R scripts must follow a specific input/output syntax to be compatible with AVAA (see "Calling R" in the scripting guide).
Reduces the corpus to specific files. Useful to work on a subset of the corpus without modifying the corpus itself. This processor also accepts a selection of annotations, in which case only corpus files of these annotations will be kept.
Input
Filters media files from the corpus, keeping only files that match a specific criterion. The "exclude" attribute can alternatively be used to exclude these files from the corpus. Useful for working on a subset of the corpus media files without modifying the corpus itself.
Takes a selection of annotations and modifies the corpus, using the input annotations as sequences, each sequence being removed from the corpus file, with its associated media segment and all the annotations included in that sequence.
Input
Rename tiers in all or specific corpus files.
Resets the pipeline corpus to its original state. Useful when working with loops.
Takes a selection of annotations and recreates the corpus, using the input annotations as sequences, each sequence being transformed into one corpus file with its associated media file and all the annotations/tiers included in that sequence.
Input
Speaker diarization is the process of marking segments of voice with their speaker. This processor takes a selection of annotations and adds new annotations to the corpus, associated with their speaker tier.
Input
Output
A speech-to-text processor using SYSTRAN Faster Whisper to transcribe audio and automatically create annotations.
Output
A speech-to-text processor using OpenAI Whisper to transcribe audio and automatically create annotations.
Output
A variation of OpenAI Whisper designed to extract audio events from the 527-class AudioSet, the Whisper-AT processor outputs general audio events as annotations.
Output
Silero's Voice Activity Detector processor creates annotations for each segment of input audio containing voice.
Output
Anonymize videos with these special effects:
- deface: automatically detects and blurs faces
- cartoon: cartoonizes the video
- cartoon-blur: cartoonizes and blurs the video
AVAA Toolkit features an advanced pipeline system easing automation of complex tasks.
A pipeline is created for each section of the document, and initially contains a virtual copy of the corpus and its associated media files.
The corpus and its media files are then modified sequentially by each processor inside the pipeline.
The pipeline can be fed different initial media files, by defining the processor-pipeline-input setting.
The corpus mode is useful to process corpus files directly (audio-anonymization, formats conversion...), while for instance all-assets mode could be used to apply effects only on the exported media of the document intended for sharing with peers.
Processors inside a pipeline (that is for now, a section of the document) are executed one after another, each processor using the results of the previous one to work on.
Complex chains of processors can be built to automate heavy tasks, alleviating the burden of manually running each step and verifying its consistency.
Views placed after a processor (in the same section) will inherit its modified media files when exporting clips and snapshots.
This can be helpful for extracting annotations from cuts of raw media files, to avoid processing long corpus media files when testing samples; or for preprocessing a media file before it is exported into clips during later view generation.
Processors generating annotations will make these annotations immediately available in the main corpus (and not only for the current pipeline), hence for all subsequent views and processors in the document.
Settings can be modified at any time via the Local Settings block.
It is possible to change the style via CSS. The HTML code generated makes it easy to target specific elements or apply styling rules for the whole page. Each view has its own structure of elements, and a simple "Inspect Element" from browser will reveal selectors.
Styles can be defined directly in the XML file, by using a STYLE tag.
These styles will only apply to this specific HTML document.
<STYLE>
.view-timeline td {
border-color:red;
}
.view-timeline tr.tier-header {
text-align:right;
}
</STYLE>
Styles can be defined in a separate CSS file, that must be placed in the include folder.
All the generated HTML documents will load this file and have these styles in common.
h2 {
color:green;
}
section {
border-left: 2px solid gray;
}
Views generate simple HTML code and try to follow common guidelines so that applying styles is straightforward.
Annotations' text labels always have the annotation class, so for instance to change the color of all annotations:
.view .annotation {
color:red;
}
AVAA Toolkit can also generate PDF, though interactive features like videos or dynamic charts won't work in this format, for obvious reasons.
Chrome (or Chromium) must be installed on the system (alternatively on Windows AVAA Toolkit will try to use Edge).
The Chrome/Edge executable should be detected automatically; if detection fails, its path must be provided in avaa-config.xml.
If Chrome/Edge is not available, it is recommended to install Chrome Headless Shell and then provide its path in avaa-config.xml
AVAA Toolkit is made for the command line and can integrate seamlessly in any tool chain.
Usage: [options] XML files or folders to process

Options:
  --lang                     Language of the generated document, if translations are available. Default:
  --watch                    Watch for xml changes and regenerate documents. Default: false
  --combine                  Combine documents into one final html file. Default: false
  --pdf                      Also convert HTML to PDF. Default: false
  --zip-all                  Zip all generated documents together. Default: false
  --zip-each                 Zip each generated document separately. Default: false
  --deployer-user            Deployer user name. Default:
  --deployer-pass            Deployer password. Default:
  --deploy                   Upload zip to deployer. Default: false
  --deployer-url             Specify a custom deployer URL to upload zip to. Default:
  --debug                    Debug mode. Default: false
  --verbose                  Display more information when converting. Default: true
  --path                     Path of application for includes. Default to working directory
  --path-temp                Path for temporary files. Default to ./temp/
  --test                     Run a XML document as a test suite. Default:
  --gendoc                   Generate all documentation and exit. Default: false
  --dev                      Reload scripts before building. Default: true
  --cache-af                 Cache annotations file in memory for faster exec. Default: true
  --server-allowed-origin    A custom origin URL allowed to connect to the server. Default:
  --server                   Websocket server for editor and interactive sessions. Default: false
  --server-port              Websocket server port. Default: 42042
  --server-ssl               Use SSL certificate (for Server Mode). Default: false
  --mongo-host               Address of the mongodb server. Default: 127.0.0.1
  --mongo-port               Port of the mongodb server. Default: 27017
  --mongo-db                 Name of database to work with. Default: avaa
  --download-remote-corpus   Whether to automatically download a referenced remote corpus. Default: false
  --conf                     Custom config file to load. Default: avaa-config.xml
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/avaatoolkit/Main has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to X
Solution: Your Java runtime is outdated; follow these steps:
java.net.BindException: Couldn't bind to any port in the range `42042:42042`.
  at org.glassfish.grizzly.AbstractBindingHandler.bind(AbstractBindingHandler.java)
  at org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java)
  at org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:)
  at org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java)
  at org.avaatoolkit.server.Daemon.start(Daemon.java)
  at org.avaatoolkit.Main.main(Main.java)
Solution: The toolkit is already started with the --server argument, close it before running a new instance.
Solution: Your firewall has a strict policy regarding localhost port bindings, add a rule to allow localhost:42042
On some operating systems, the installed Java runtime might not be up-to-date, preventing AVAA Toolkit from executing properly.
To run AVAA Toolkit, at least Java 11 is required. To install a valid runtime:
Alternatively using the OpenJDK archives:
Some processors require a full FFmpeg version to work.
When generating really short clips (under 1 sec), it is possible that the clips will consist of only one frozen image.
This is because by default FFmpeg will be instructed to do a copy of the video stream (vcopy), which saves considerable processing time, at the expense of less accurate clipping.
When perfect accuracy is required for clips, it is recommended to force FFmpeg re-encoding, for instance by defining the Setting video-codec = h264
This documentation includes attributions to licensed material such as libraries and software modules.
These notices are written explicitly in each relevant component and for convenience listed again below.
Some modules are not included in AVAA Toolkit but rather installed on demand whenever a component requires it.
Other modules are included or integrated in AVAA Toolkit to provide a better overall user experience.
This behavior is indicated by a little icon preceding the license, as well as a tooltip describing its inclusion method.
Additional libraries are packaged with the produced HTML document, and therefore redistributed by the end user.
jQuery simplifies DOM manipulation, some components use it to initialize content in the browser.
D3 has unparalleled flexibility in building custom and dynamic visualizations.
Charts generated by AVAA Toolkit are actually rendered right in the browser with D3.
GSAP is incredible and we deemed its inclusion valuable for providing a robust interactivity and animation framework for future AVAA Toolkit components.
Tipped features easy to use and customizable tooltips.
AVAA Toolkit views sometimes use these tooltips for instance to show snapshots or videos in a small popup when an annotation is clicked or hovered.
FileSaver.js provides a simple interface to save (as a "download") files created directly in the browser.
We believe AVAA Toolkit components can benefit from the presence of the FileSaver library.
AVAA Toolkit itself is built with Java, and makes use of various libraries (via Maven) which are compiled into the final JAR executable distributed to the toolkit users.
A library for extracting things from streaming sites; AVAA Toolkit includes it to provide an easy API for downloading PeerTube videos.
An FFmpeg CLI wrapper for Java, used to execute FFmpeg and read progress feedback.
Rhino is the JavaScript engine used to execute all components' scripts.
The best library for parsing command-line arguments.
This library is used to spawn server sockets, and brings WebSocket sessions (that's how the Editor can interact with AVAA Toolkit).
Jsoup simplifies HTML/XML parsing via a CSS selectors syntax.
An artifact of fully-specified annotations to power static-analysis checks, beginning with nullness analysis.
Jchardet is a Java port of the source from Mozilla's automatic charset detection algorithm.
A JNA-based (native) Operating System and Hardware Information library, to get processes details and CPU usage.
Apache Commons is a set of commonly needed features implemented as reusable Java components.
A simple facade abstraction for various logging frameworks.
A reliable, generic, fast and flexible logging framework.
Automate Java boilerplate code via annotations.
OkHttp is an efficient HTTP client.
The MongoDB Synchronous Driver provides an easy API for interacting with a MongoDB Server.
We are currently working on that.