AVAA Toolkit 0.62

The Audio Video Annotations Analysis Toolkit

Introduction

Corpus analysis is a complex task: it requires learning editors for different file formats, along with multiple tools that are often command-line based or assume programming knowledge.

AVAA Toolkit makes it easy to create pipelines connecting ecosystems to process raw data (automated transcription, format conversion, ...), and to query large corpora of annotations coming from various sources in order to extract advanced statistics and generate beautiful, always up-to-date charts and timelines.

AVAA Toolkit is also a flexible converter: it takes as input XML files describing the style and operations used to generate an HTML document, and takes care of exporting only the relevant portions of videos and their thumbnail snapshots, minimizing the final document size and potential load times if hosted online.

Annotations Formats

AVAA Toolkit understands the following file formats:

⭳ GPL-3.0 License: TEI, CHA and TEXTGRID formats are available thanks to the TEI-CORPO project

Raw Media Formats

AVAA Toolkit can also process the following media types:

⭳ Multiple Licenses: Most media processing made possible by FFmpeg

Installation

Simply extract the latest release zip.

Windows

To start AVAA Toolkit, simply double-click the launcher file avaa-toolkit.exe.

The executable is not signed, so Windows will ask for confirmation before starting it.

In case AVAA Toolkit fails to start, check the troubleshooting section.

Linux / macOS

On these systems, AVAA Toolkit can be started by running the shell script avaa-toolkit.sh, but the script must be set as executable first:

  1. Open a terminal and navigate to AVAA Toolkit's directory with the cd command.
  2. Run the following command to make AVAA Toolkit's launcher executable: sudo chmod +x avaa-toolkit.sh
  3. Start AVAA Toolkit by running the command ./avaa-toolkit.sh or double-click the launcher file.

Note that at least Java 11 is required; you can install the latest version using the trusted adoptium.net binaries.

Update

When AVAA Toolkit is already installed, follow these steps to update:

  1. Close AVAA Toolkit if it is running.
  2. Delete these folders: scripts, editor, tests.
  3. Download the latest release zip.
  4. From the zip, copy avaa-toolkit.jar and the include, scripts, editor and tests folders to your installation folder (replacing existing files).
  5. Restart the toolkit.

Editor

An editor for AVAA Toolkit's XML documents is available in the browser.

To begin, start AVAA Toolkit by running the launcher (avaa-toolkit.exe on Windows or avaa-toolkit.sh on Linux/macOS),
then navigate with your browser to avaa-toolkit.org.
If internet is not available, use the offline editor provided in your installation folder (open index.html).

By default, the editor is allowed to create and edit files in the projects folder.

It is possible to make other folders available to the editor; just edit the avaa-config.xml file.

XML Structure

AVAA Toolkit processes XML files and converts them to HTML.
It expects a document with the following structure:

Queries

AVAA Toolkit is all about querying and filtering annotations. Complex queries can be expressed to extract only specific annotations.
This is done via the SELECT tag; various attributes can be combined to make a curated selection of annotations:

Attributes of type regexp (*-match) have additional options:

When multiple attributes are used, the selection will consist only of the annotations fulfilling all the constraints.

Views

A view defines how annotations are rendered in the page. Each view has its specific attributes that can alter the annotations' display and the final HTML output. While all views come with a basic default style, it is possible to change any visual aspect via CSS to fit custom needs. Because everyone will have different visual requirements, styling is entirely left to the document authors.

concordancer

The concordancer view displays a table of annotations and their cotext annotations. Attributes allow time-range and count limits, to extract only meaningful related annotations. The display is configurable, showing each cotext annotation in its own column or all combined into one clip.

  • show-value (bool = true) Whether to display the annotation's text value
  • show-video (bool = false) Whether to display the video clip of the processed annotation
  • show-tier (bool = false) Whether to display the annotation's tier
  • show-group (bool = false) Whether to display the annotation's group
  • show-file (bool = false) Whether to display the annotation's file
  • show-index (bool = false) Whether to display the index of each row
  • video-with-cotext (bool = true) Whether to include cotext annotations in the video
  • cotext (int = 1) Number of annotations to extract before and after the processed annotation
  • cotext-before (int = cotext) Number of annotations to extract before the processed annotation
  • cotext-after (int = cotext) Number of annotations to extract after the processed annotation
  • cotext-video (bool = false) Whether to display a video for cotext annotations
  • cotext-split (bool = false) Whether to display each cotext annotation in its own column
  • max-time-diff (int = 60) Skip cotext annotations that are out of this range
  • label-before (string = Before) Label for table header
  • label-pivot (string = Pivot) Label for table header
  • label-after (string = After) Label for table header
  • label-index (string = ) Label for table header
  • label-tier (string = Tier) Label for table header
  • label-group (string = Group) Label for table header
  • label-file (string = File) Label for table header
  • label-none (string = ) Label for no matches

Input

  • Array<Annotation>


density

The density view plots annotations as filled bars in a timeline, revealing the frequency and duration of interactions between tiers.

  • group (bool = false) Whether to separate annotations based on their group instead of their file
  • sequence (string) Separate annotations based on another tier's annotations (used as sequences)
  • time-offset (bool = false) Whether to use files' time-offset
  • time-collapse (number = 0) Number of seconds after which empty space is collapsed (0 = no collapse)
  • zoom-factor (int = 100) Milliseconds to pixel factor
  • show-alt-tiers (bool = false) Whether to display alternative tiers in the main table
  • show-alt-tier-in-empty-cells (bool = false) Whether to display the alternative tier if the cell is empty
  • show-value-on-hover (bool = false) Whether to display the annotation value on mouse over
  • show-snap-on-hover (bool = false) Whether to display the video snapshot on mouse over
  • show-video-on-click (bool = false) Whether to display the video player on click
  • video-player-width (string = 300)
  • video-player-height (string)
  • SET tier="id" name="" parent="" color="" alt-tier="" Customize tier display
  • SET group="id" name="Custom Name" Set a custom header name for each group
  • SET file="id" name="Custom Name" Set a custom header name for each file
  • SET alt-tier="id" label="" css="" Customize alternative tier display and legend

Input

  • Array<Annotation>

density-timeline 🧪

The density timeline plots annotations as filled bars horizontally, revealing the frequency and duration of interactions between tiers. Flexible time-collapsing options allow compacting the empty space between annotations. Currently not compatible with alt-tiers.

  • group (bool = false) Whether to separate annotations based on their group instead of their file
  • sequence (tier) Separate annotations based on another tier's annotations (used as sequences)
  • relate-to-corpus (bool = false) Whether to show the whole corpus regardless of input selection
  • time-offset (bool = false) Whether to use files' time-offset
  • time-between (bool = false) Whether to show the time between separators
  • time-collapse-threshold (number = 0) Number of seconds after which empty space is collapsed (0 = no collapse)
  • time-collapse-display (string = show-duration) How the collapsed zone should look (show-duration, short, compact, dense, minimum, none)
  • time-collapse-alt-display (string = compact) How the collapsed zone should look if it collapsed less than alt-display-threshold (show-duration, short, compact, dense, minimum, none)
  • time-collapse-alt-display-threshold (number = 0) Number of seconds collapsed below which alt-display is used
  • time-collapse-hide-first-start (bool = false) Whether to hide the first (starting) collapse, if any
  • time-collapse-hide-each-start (bool = false) Whether to hide the first (starting) collapse of each group/column, if any
  • time-collapse-hide-each-last (bool = false) Whether to hide the last collapse of each group/column, if any
  • time-collapse-hide-borders (bool = false) Whether to hide the left and right borders of collapsed zones
  • time-collapse-absolute-start (bool = false) Whether to show the duration of the first collapse of each group, as an absolute duration (relative to corpus beginning when time-offset is active)
  • time-collapse-debug (bool = false) Whether to show the debug info of collapses
  • zoom-factor (int = 100) Milliseconds to pixel factor (how many milliseconds one pixel will represent)
  • color-values (bool = false) Whether to use CSS for specific values, with colors defined in Style or as parameters
  • color-values-opacity (bool = false) Whether to toggle the opacity of the colored values
  • show-alt-tiers-in-table (bool = false) Whether to display alternative tiers in the main table
  • show-alt-tier-in-empty-cells (bool = false) Whether to display the alternative tier if the cell is empty
  • show-value-on-hover (bool = false) Whether to display the annotation value on mouse over
  • show-snap-on-hover (bool = false) Whether to display the media snapshot on mouse over
  • show-media-on-click (bool = false) Whether to display the media player on click
  • video-player-width (string = 300)
  • video-player-height (string)
  • SET tier="id" name="" parent="" color="" alt-tier="" Customize tier display
  • SET group="id" name="Custom Name" Set a custom header name for each group
  • SET file="id" name="Custom Name" Set a custom header name for each file
  • SET alt-tier="id" label="" css="" Customize alternative tier display and legend
  • SET value="value" label="" color="" css="" Customize specific value display and legend

Input

  • Array<Annotation>

form

The form view renders HTML input forms, allowing easy online sharing for collecting external data. Form results are simple JSON files which can then be imported back as virtual tiers for further analysis.

  • show-header (bool = false) Whether to display table header
  • show-index (bool = true) Whether to display index column
  • show-value (bool = true) Whether to display annotation's text value
  • show-video (bool = true) Whether to display annotation's video clip
  • show-tier (bool = true) Whether to display tier's name
  • show-file (bool = false) Whether to display annotation file name
  • cotext (int = 0) Number of cotext annotations to show before and after (0 = no cotext)
  • cotext-range (number = 0) Max number of seconds to consider a neighbor annotation as a valid cotext annotation (0 = no limit)
  • cotext-media (bool = false) Whether to extend the clip to include the cotext annotations (make sure to use a reasonable cotext-range to avoid super long clips)
  • extend-duration (number = 0) Number of seconds to add before and after when cutting the clip
  • label-save (string = SAVE) Save button label
  • name (string) A name to identify this form; it will be used as the filename and accepts field parameter interpolation
  • SET dimension="name" label="label" cv="controlled vocabulary id"
  • SET dimension="name" label="label" choices="list,of,choices"
  • SET dimension="name" label="label" type="text"
  • SET field="name" label="label" type="text"

Input

  • Array<Annotation>

hidden

A special view with no output, for instance to do live queries.

intercoding

The intercoding view makes it easy to process JSON files resulting from forms, and display them in a meaningful way for intercoding validation and statistics.

  • reference (tier) The tier used as reference, if any.
  • show-video (bool = true) Whether to display the annotation's video clip
  • show-index (bool = true) Whether to display the index
  • show-start (bool = false) Whether to display the start timecode
  • values-replacement (js) A JSON object used to replace a given value with another one, like {"a1":"A", "a2":"A"}
  • weight-matrix (bool = false) Whether to use the weight matrix
  • weight-matrix-values (matrix) A custom matrix defining the weight of each pair of values
  • weight-matrix-func (js) A custom JS function that should return the weight for a given pair provided as arguments, like (a,b) => a==b ? 1 : 0 (see the sketch after the Input list)
  • save-weight-matrix-csv (string) Name of a CSV file to export the weight matrix
  • save-coincidence-matrix-csv (string) Name of a CSV file to export the coincidence matrix
  • save-krippendorff-alpha-json (string) Name of a JSON file to export Krippendorff's alpha, as {"alpha":X}
  • SET column="tier id" name="" Set custom column order and header names
  • SET exclude-tag="tag" Exclude from the intercoding calculations all annotations that have this tag
  • SET name-of-tier="id" name="Custom Name" Set a custom header name for each tier
  • SET name-of-group="id" name="Custom Name" Set a custom header name for each group
  • SET name-of-file="id" name="Custom Name" Set a custom header name for each file
  • SET stat="score" name="Score (Points)" Add a stat column (score alone)
  • SET stat="score%" name="Score (Percentage)" Add a stat column (score percentage)
  • SET stat="score/" name="Score (Points/Total)" Add a stat column (score on total)
  • SET stat="count" name="Similarity percentage count" Add a stat column
  • SET stat="weight-matrix" name="Weight Matrix" Show the weight matrix
  • SET stat="coincidence-matrix" name="Coincidence Matrix" Show the coincidence matrix
  • SET stat="krippendorff-alpha" name="Krippendorff's Alpha" Show Krippendorff's Alpha

Input

  • Array<Annotation>
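
For example, weight-matrix-func receives a pair of coded values and returns their agreement weight. A minimal sketch for ordinal codes (assuming codes "1" to "3"; the exact invocation details are toolkit-internal):

```js
// Hypothetical ordinal weighting: closer codes agree more.
// Arguments are the two coded values of a pair.
(a, b) => {
  const distance = Math.abs(Number(a) - Number(b));
  const maxDistance = 2; // assumption: codes range from "1" to "3"
  return 1 - distance / maxDistance; // 1 = identical, 0 = maximally distant
}
```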

json

Displays JSON serializable data as an interactive tree.

Input

  • Object
  • Array

list

The list view simply renders annotations one after another. It is possible to specify a custom class for switching display to grid mode.

  • show-value (bool = true) Whether to display annotation's text value
  • show-media (bool = true) Whether to display annotation's media clip
  • show-snap (bool = false) Whether to display annotation's snapshot
  • show-tier (bool = true) Whether to display tier's name
  • class (string = list) Alternative display style (list, grid)
  • file-header (html) Raw html to insert as a header before each file
  • file-separator (html) Raw html to insert as a separator between each file
  • file-footer (html) Raw html to insert as a footer after each file
  • SET file="" name="" Set a custom name for each file

Input

  • Array<Annotation>

table

The table view can display various annotation (or object) properties as table columns. It also supports the Extra protocol, allowing user-defined properties to be displayed as columns.

  • header-name-replace (js) A custom function to replace the column header names (see the sketch after the Input list)
  • SET column="index" name="Index"
  • SET column="tier" name="Tier"
  • SET column="value" name="Annotation Value"
  • SET column="video" name="Video"
  • SET column="snap" name="Snapshot"
  • SET column="start" name="Start timecode"
  • SET column="stop" name="Stop timecode"
  • SET column="group" name="Group"
  • SET column="participant" name="Participant"
  • SET column="file" name="File"
  • SET column="extra" extra-id="" name="" as="[value,video]"
  • SET column="property" property="" name="" as="[text,link]"

Input

  • Array<Annotation>
  • Array<Object>
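
As an illustration of header-name-replace, a sketch assuming the function receives the default header name and returns the text to display (the exact signature may differ):

```js
// Hypothetical: rename the "value" column, leave the rest unchanged.
(name) => (name === "value" ? "Utterance" : name)
```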

testcase

A special view to display testcase results.

timeline

The timeline view displays annotations vertically with time markers, using one column per tier. Ideal for dialogues between 2+ participants.

  • collapse (int = 60) Max number of seconds between 2 annotations beyond which a separator indicating collapsed duration is displayed
  • show-value (bool = true) Whether to display the annotation's text value
  • show-video (bool = true) Whether to display the annotation's video clip
  • show-timestamp (bool = true) Whether to display the timestamp
  • group (bool = false) Whether to separate annotations based on their group instead of their file
  • sequence (string) Separate annotations based on another tier's annotations (used as sequences)
  • SET tier="id" name="Custom Name" Set a custom header name for each tier
  • SET group="id" name="Custom Name" Set a custom header name for each group
  • SET file="id" name="Custom Name" Set a custom header name for each file
  • SET column="tier id" Set a custom order for each tier

Input

  • Array<Annotation>

transcript

Displays annotations as a simple transcript; can also export the transcript to CSV files.

  • style (select = monospace) The layout to use (monospace, fluid)
  • separator (string = : ) The separator to use between the tier name and the text
  • max-line-width (int = 0) Maximum number of characters in one line of text (0 = disabled)
  • min-tier-width (int = 0) Minimum width (in characters) of the tier column
  • export-to-csv (bool = false) Whether to also export a CSV file of each transcript

Input

  • Array<Annotation>

wordcloud

The wordcloud helps visualize word frequency and has many customisation attributes. The wordcloud can slow down PDF generation and can take some time to show up.

  • length-threshold (int = 3) Exclude words with less than length-threshold letters
  • frequency-threshold (int = 0) Exclude words with a lower frequency
  • size-factor (int = 0) The size factor to draw words (0 = auto)
  • min-size (int = 0) (0 to disable)
  • weight-factor (number = 1)
  • font-family (string)
  • font-weight (string = normal)
  • color (string = random-dark)
  • background-color (string = #fff)
  • draw-out-of-bound (bool = false)
  • shrink-to-fit (bool = false)
  • origin (array) origin of the “cloud” in [x, y]
  • min-rotation (number = -Math.PI / 2)
  • max-rotation (number = Math.PI / 2)
  • rotation-steps (number = 0)
  • rotate-ratio (number = 0.1)
  • shape (string = circle) (circle, cardioid, diamond, square, triangle-forward, triangle, pentagon, star)
  • ellipticity (number = 0.65)
  • shuffle (bool = true)
  • list-words (int = 0) How many words to show in a table, 0 to disable

Input

  • Array<Annotation>
  • Array<Array> an array of ["word", weight] pairs
🗀 MIT License: Wordcloud2 by Timothy Guan-tin Chien and contributors (github.com/timdream/wordcloud2.js)

Charts

Use charts to visualize data through meaningful representations. Powered by D3.js and Observable.

🗀 ISC License: D3 by Mike Bostock (github.com/d3/d3)

donut

  • name (string = ([x]) => x) given d in data, returns the (ordinal) label
  • value (string = ([, y]) => y) given d in data, returns the (quantitative) value
  • title (string) given d in data, returns the title text
  • width (string = 640) outer width, in pixels
  • height (string = 400) outer height, in pixels
  • innerRadius (string = 0) inner radius of pie, in pixels (non-zero for donut)
  • outerRadius (string = Math.min(width, height) / 2) outer radius of pie, in pixels
  • labelRadius (string = (innerRadius * 0.2 + outerRadius * 0.8)) center radius of labels
  • format (string = ',') a format specifier for values (in the label)
  • names (string) array of names (the domain of the color scale)
  • colors (string) array of colors for names
  • stroke (string = innerRadius > 0 ? 'none' : 'white') stroke separating wedges
  • strokeWidth (string = 1) width of stroke separating wedges
  • strokeLinejoin (string = 'round') line join of stroke separating wedges
  • padAngle (string = stroke === 'none' ? 1 / outerRadius : 0) angular separation between wedges
🗀 ISC License: Adapted from observablehq.com/@d3/donut-chart

grouped-bar

  • x (string = (d, i) => i) given d in data, returns the (ordinal) x-value
  • y (string = d => d) given d in data, returns the (quantitative) y-value
  • z (string = () => 1) given d in data, returns the (categorical) z-value
  • title (string) given d in data, returns the title text
  • marginTop (string = 30) top margin, in pixels
  • marginRight (string = 0) right margin, in pixels
  • marginBottom (string = 30) bottom margin, in pixels
  • marginLeft (string = 40) left margin, in pixels
  • width (string = 640) outer width, in pixels
  • height (string = 400) outer height, in pixels
  • xDomain (string = d3.groupSort(data, g => d3.median(g, ${x}), ${x})) the x domain [xmin, xmax]
  • xRange (string = [marginLeft, width - marginRight])
  • xPadding (string = 0.1) amount of x-range to reserve to separate groups
  • yType (string = d3.scaleLinear) type of y-scale
  • yDomain (string)
  • yRange (string = [height - marginBottom, marginTop])
  • zDomain (string = d3.groupSort(data, g => -d3.median(g, ${y}), ${z})) the z domain
  • zPadding (string = 0.05) amount of x-range to reserve to separate bars
  • yFormat (string) a format specifier string for the y-axis
  • yLabel (string) a label for the y-axis
  • filterZ (string = false) other way
  • separatorWidth (string = 0)
  • separatorOffset (string = 0)
  • separatorColor (string = 'gainsboro')
  • colors (string) array of colors
🗀 ISC License: Adapted from observablehq.com/@d3/grouped-bar-chart

horizontal-bar

  • x (string = d => d) given d in data, returns the (quantitative) x-value
  • y (string = (d, i) => i) given d in data, returns the (ordinal) y-value
  • title (string) given d in data, returns the title text
  • marginTop (string = 30) the top margin, in pixels
  • marginRight (string = 0) the right margin, in pixels
  • marginBottom (string = 10) the bottom margin, in pixels
  • marginLeft (string = 80) the left margin, in pixels
  • width (string = 640) the outer width of the chart, in pixels
  • height (string) outer height, in pixels
  • xType (string = d3.scaleLinear) type of x-scale
  • xDomain (string)
  • xRange (string = [marginLeft, width - marginRight])
  • xFormat (string) a format specifier string for the x-axis
  • xLabel (string) a label for the x-axis
  • yPadding (string = 0.1) amount of y-range to reserve to separate bars
  • yDomain (string) an array of (ordinal) y-values
  • yRange (string)
  • colors (string) array of colors
  • titleColor (string = "white") title fill color when atop bar
  • titleAltColor (string = "currentColor") title fill color when atop background
🗀 ISC License: Adapted from observablehq.com/@d3/horizontal-bar-chart

inline

  • x (string = ([x]) => x) given d in data, returns the (temporal) x-value
  • y (string = ([, y]) => y) given d in data, returns the (quantitative) y-value
  • z (string = () => 1) given d in data, returns the (categorical) z-value
  • defined (string) for gaps in data
  • curve (string = d3.curveLinear) method of interpolation between points
  • marginTop (string = 30) top margin, in pixels
  • marginRight (string = 50) right margin, in pixels
  • marginBottom (string = 30) bottom margin, in pixels
  • marginLeft (string = 30) left margin, in pixels
  • width (string = 640) outer width, in pixels
  • height (string = 400) outer height, in pixels
  • xType (string = d3.scaleOrdinal) type of x-scale
  • xDomain (string = d3.groupSort(data, g => d3.median(g, ${x}), ${x})) the x domain [xmin, xmax]
  • xRange (string)
  • yType (string = d3.scaleLinear) type of y-scale
  • yDomain (string)
  • yRange (string = [height - marginBottom, marginTop])
  • zDomain (string) array of z-values
  • yFormat (string) a format specifier string for the labels
  • colors (string = d3.schemeCategory10) stroke color of line
  • strokeLinecap (string = 'round') stroke line cap of the line
  • strokeLinejoin (string = 'round') stroke line join of the line
  • strokeWidth (string = 2) stroke width of line, in pixels
  • strokeOpacity (string = 1) stroke opacity of line
  • halo (string = '#fff') color of label halo
  • haloWidth (string = 6) padding around the labels
🗀 ISC License: Adapted from observablehq.com/@d3/inline-labels

pie

  • name (string = ([x]) => x) given d in data, returns the (ordinal) label
  • value (string = ([, y]) => y) given d in data, returns the (quantitative) value
  • title (string) given d in data, returns the title text
  • width (string = 640) outer width, in pixels
  • height (string = 400) outer height, in pixels
  • innerRadius (string = 0) inner radius of pie, in pixels (non-zero for donut)
  • outerRadius (string = Math.min(width, height) / 2) outer radius of pie, in pixels
  • labelRadius (string = (innerRadius * 0.2 + outerRadius * 0.8)) center radius of labels
  • format (string = ',') a format specifier for values (in the label)
  • names (string) array of names (the domain of the color scale)
  • colors (string) array of colors for names
  • stroke (string = innerRadius > 0 ? 'none' : 'white') stroke separating wedges
  • strokeWidth (string = 1) width of stroke separating wedges
  • strokeLinejoin (string = 'round') line join of stroke separating wedges
  • padAngle (string = stroke === 'none' ? 1 / outerRadius : 0) angular separation between wedges
🗀 ISC License: Adapted from observablehq.com/@d3/pie-chart

Operations

An operation takes input data and transforms it. Operations can also modify or filter the current selection of annotations.

calculate-duration-by

Calculates the sum/average of annotations' durations (in milliseconds), grouping by a property (value/tier/group/participant). If the property is not specified, the input object will be used directly as groups.

  • mode (select = sum) (sum, average, min, max, all)
  • factor (int = 1) Convert milliseconds to other units
  • truncate (int = -1) Truncate factored value to X decimal places

Input

  • Array<Annotation>
  • Object<Array<Annotation>>

calculate-duration-of-pause

Calculates the sum/average of the pause durations between consecutive annotations (in milliseconds). A sketch of the logic follows the Input list.

  • mode (string = sum) (sum, average, median, min, max, all)
  • factor (int = 1) Convert milliseconds to other units
  • truncate (int = -1) Truncate factored value to X decimal places
  • min-threshold (int = 0) Min pause in milliseconds between 2 consecutive annotations (pauses shorter than this will not be counted)
  • max-threshold (int = 500) Max pause in milliseconds between 2 consecutive annotations (pauses longer than this will not be counted)

Input

  • Array<Annotation>
  • Object<Array<Annotation>>
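
The underlying logic, as a sketch (assuming annotations expose start/stop times in milliseconds): a pause is the gap between two consecutive annotations, counted only when it falls within the two thresholds.

```js
// Sketch: collect pauses between consecutive annotations.
function pauseDurations(annotations, minThreshold = 0, maxThreshold = 500) {
  const sorted = [...annotations].sort((a, b) => a.start - b.start);
  const pauses = [];
  for (let i = 1; i < sorted.length; i++) {
    const gap = sorted[i].start - sorted[i - 1].stop; // milliseconds
    if (gap >= minThreshold && gap <= maxThreshold) pauses.push(gap);
  }
  return pauses; // the mode (sum, average, median, ...) is then applied
}
```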

calculate-percent-by

Calculates the percentage of occurrence of each value of an annotation property.

  • truncate (int = 0) Truncate factored value to X decimal places

Input

  • Object<String,Array<Annotation>>

clone

Clones each annotation in the selection, so they can be modified without affecting the originals.

Input

  • Array<Annotation>

Output

  • Array<Annotation>

combine-overlapping-annotations

Combines all overlapping annotations into one annotation

Input

  • Array<Annotation>

Output

  • Array<Annotation>

combine-same-tier-consecutive-annotations

Combines all consecutive annotations of the same tier into one annotation

  • value-separator (string = , ) The separator to use when combining annotations' values that are missing punctuation
  • sentence-separator (string) A separator to insert between sentences

Input

  • Array<Annotation>

Output

  • Array<Annotation>

count

Counts the elements in each array element of the input array/object, replacing each array element with an integer value (see the sketch after the Output list)

  • depth (int = 0) How deep to go in the input tree before counting
  • remove-uncountable (bool = false) Whether to remove the properties that are not countable
  • remove-uncountable-elements (bool = false) Whether to remove the uncountable elements from array, resulting in a shorter array
  • count-strings (bool = false) Whether to consider strings as arrays for counting

Input

  • Array<Array>
  • Object<Array>

Output

  • Array<Integer>
  • Object<Integer>
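
A sketch of the counting rule at depth 0 (plain objects/arrays assumed; attribute handling simplified):

```js
// Each array-valued entry is replaced by its length; strings count
// as arrays only when count-strings is active.
function countValues(input, countStrings = false) {
  const isCountable = (v) =>
    Array.isArray(v) || (countStrings && typeof v === "string");
  const out = Array.isArray(input) ? [] : {};
  for (const [key, v] of Object.entries(input))
    out[key] = isCountable(v) ? v.length : v; // kept unless remove-uncountable
  return out;
}
```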

count-by

Counts by a specific property

Input

  • Array<Annotation|Object>
  • Object<String,Array<Annotation|Object>>

count-object-keys

Counts keys across all input objects

Input

  • Array<Object>

Output

  • Object

detect-sequences

Detects sequences by grouping annotations that are close to each other (a sketch of the grouping rule follows the Output list).

  • range (number = 30) Minimum number of seconds between two annotations for the second one to be considered the start of a new sequence
  • to-tier (string = sequences) The created tier which will contain the sequences' annotations
  • output (choices = array) The output type, which can be an Array of the sequences' annotations, or an Object mapping each sequence to its list of annotations (array, object)

Input

  • Array<Annotation>

Output

  • Array<Annotation>
  • Object<String,Array<Annotation>>
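
A sketch of the grouping rule (assuming annotations expose start/stop times in milliseconds):

```js
// A gap of `range` seconds or more between two consecutive
// annotations starts a new sequence.
function detectSequences(annotations, rangeSeconds = 30) {
  const sorted = [...annotations].sort((a, b) => a.start - b.start);
  const sequences = [];
  let current = [];
  for (const a of sorted) {
    const last = current[current.length - 1];
    if (last && a.start - last.stop >= rangeSeconds * 1000) {
      sequences.push(current);
      current = [];
    }
    current.push(a);
  }
  if (current.length) sequences.push(current);
  return sequences; // each group becomes one annotation on to-tier
}
```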

each

Iterates over the array(s)' values and executes the provided function

Input

  • Array
  • Object<String,Array>

each-file

Iterates over each file and executes the provided function

Input

Output

extend-duration

Takes each input annotation and extends its duration by changing its start and/or stop times. By default this clones all input annotations, so the originals stay unaffected.

  • before (number) Seconds to add before start (defaults to arg)
  • after (number) Seconds to add after stop (defaults to arg)
  • clone (bool = true) Whether to clone the annotation before changing its duration

Input

  • Array<Annotation>

Output

  • Array<Annotation>

filter

Filters the array(s), keeping only items passing the filter expression

Input

  • Array
  • Object<String,Array>

filter-by-value

Filters the array(s), keeping only annotations whose value matches the provided regexp

Input

  • Array<Annotation>
  • Object<String,Array<Annotation>>

flatten

Transforms nested objects into a flat array of objects (see the sketch after the Output list)

Input

  • Object

Output

  • Array<Object>
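
A sketch of the idea, assuming two levels of nesting and hypothetical property names for the output rows:

```js
// {A: {a: 1}, B: {b: 2}} becomes
// [{key: "A", subkey: "a", value: 1}, {key: "B", subkey: "b", value: 2}].
function flatten(nested) {
  const rows = [];
  for (const [key, inner] of Object.entries(nested))
    for (const [subkey, value] of Object.entries(inner))
      rows.push({ key, subkey, value });
  return rows;
}
```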

group-by

Groups the array elements by the value of a specific element's property. The result is an object whose keys are the property values, mapped to arrays of elements.

Input

  • Array
  • Object<Array>

Output

  • Object<Array>
  • Object<Object<Array>>

group-by-file-tags

Groups the annotations by specific file tags (comma separated or a JSON array of strings). Corpus files can have multiple tags, therefore an annotation can appear in multiple groups. The result is an object whose keys are the chosen tags, mapped to arrays of annotations.

Input

  • Array<Annotation>

Output

  • Object<String,Array>

group-by-ref-value

improve-transcript 🧪

Improves transcribed content with various heuristic techniques.

  • hallucination-char-factor (int = 1) todo
  • hallucination-time-diff (int = 1000) Maximum pause time between 2 hallucinated annotations (in milliseconds)
  • merge-comma-end (bool = true) Merge 2 annotations if the first one ends with a comma
  • merge-lowercase-start (bool = true) Merge 2 annotations if the second one starts with a lower-case character

Input

  • Array<Annotation>

Output

  • Array<Annotation>

load-annotations-from-forms

Loads JSON files resulting from forms, as virtual annotations. The JSON files must be in the folder of the processed XML file.

  • dimension (string) The dimension to extract from the form
  • exclude-from-intercoding (bool = false) Whether to tag the extracted annotations so they are excluded from intercoding calculations
  • extract-original-annotations (bool = false) Whether to extract only the original annotations from the form, so the results are safe from corpus changes. When this attribute is specified, no dimension will be extracted: use the operation again to extract a dimension.
  • match-to-original-annotations (bool = false) Try to correct the loaded annotations so they all have a match with the original ones

Input

  • None

Output

  • Array<Annotation>

load-data-from-json

Loads a JSON file or sets data directly from embedded JSON. The JSON file must be in the folder of the processed XML file.

  • json (js) JSON data
  • file (string) file name

Input

  • None

Output

  • Object
  • Array

load-data-from-script

Runs a JS function whose result will be set as the current selection/data. Global variables can also be used in scripts via variables.varname

  • script (js) a function called with the current selection/data, which should return data

load-data-from-xls

Loads an XLS or CSV file.

  • file (file) file name

Input

  • None

Output

  • Object
  • Array

map

Creates a new array populated with the results of calling the provided function on each element of the input array(s). A special pseudo-object syntax can be used to facilitate direct mapping of object properties; the three equivalent forms are shown in the sketch below.
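
The following sketch shows the full arrow function together with its two pseudo-object equivalents (the shorthands are toolkit syntax, not plain JS, so they appear as comments):

```js
// Full form:
annotation => ({ value: annotation.value, tier: annotation.tier.id })
// Pseudo-object shorthand, same result:
//   {value: .value, tier: .tier.id}
// Shorter still, when the property name matches:
//   {value, tier: .tier.id}
```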

Input

  • Array
  • Object<String,Array>

mongo-clear

Clears a MongoDB Collection (drops the collection)

mongo-find

Loads objects from a MongoDB Collection

  • query (js) The mongo query, a plain JS object like {property:"value"}
  • projection (js) The query projection, a plain array like ['prop1','prop2'] listing the properties to return
  • limit (int = 1000) The maximum number of documents to return

Output

  • Array<Object>

mongo-insert

Inserts input objects into a MongoDB Collection

Input

  • Array<Object>
  • Object

mongo-remove

Removes objects from a MongoDB Collection

Input

  • Array<Object>
  • Object

randomize

Randomizes the input selection with a PRNG, allowing reproducible results based on an initial seed (a sketch of a seeded shuffle follows the Output list).

  • limit (int = 0) Maximum number of elements for the output array
  • limit-per-file (int = 0) Maximum number of annotations to take from one particular annotation file (if the input array contains annotations)
  • seed (int = 1) The initial seed for reproducible randomness
  • prng (string = LCG) The algorithm for random number generation

Input

  • Array

Output

  • Array
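
A minimal sketch of a seeded shuffle with an LCG (the actual constants and algorithm used by AVAA may differ): the same seed always yields the same order.

```js
function seededShuffle(array, seed = 1) {
  let state = seed >>> 0;
  const next = () => {
    // classic LCG step (Numerical Recipes constants)
    state = (1664525 * state + 1013904223) >>> 0;
    return state / 2 ** 32; // uniform in [0, 1)
  };
  const out = [...array];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(next() * (i + 1)); // Fisher-Yates swap
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}
```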

reduce

Executes a "reducer" function on each element of the input array(s), in order, passing in the return value from the calculation on the preceding element. The final result of running the reducer across all elements of the array is a single value, like (accumulator, currentValue) => accumulator + currentValue, with an optional initialValue (see the example below).

Input

  • Array
  • Object<String,Array>
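
For example, summing the durations of the selected annotations (assuming start/stop in milliseconds), with 0 as the initialValue:

```js
// Reducer: accumulate the duration of each annotation.
(total, annotation) => total + (annotation.stop - annotation.start)
```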

replace-with-annotations-in-sequence-from-tier

Considers each input annotation as a sequence, and selects the annotations from another tier (of the same file) that are included in the sequences.

  • overlap (bool = true) Whether to include overlapping annotations (not fully contained in the sequence)
  • range (number = 0) Number of seconds to add before and after the sequence, for considering annotations in sequence
  • range-before (number = 0) Additional seconds to add before
  • range-after (number = 0) Additional seconds to add after
  • distinct (bool = false) Whether to remove duplicate annotations from the resulting list
  • combine (bool = false) Whether to combine all the annotations found in the sequence into one final annotation
  • limit (int = 0) Max number of annotations to return (0 = all annotations found in sequence, 1 = first annotation found)
  • reverse (bool = false) Whether to reverse the list of annotations found in sequence before applying the limit. This can be used to return the last annotation found in sequence (via limit=1).
  • separator (string = | ) A separator to insert when combining values of multiple annotations
  • default-to-null (bool = false) Whether to add a null element in the output array if no corresponding annotation is found for a given input annotation (defaults to false, or true if the operation runs in an EXTRA block)

Input

  • Array<Annotation>

Output

  • Array<Annotation>

replace-with-next-timecode-annotations-from-tier

Replaces each input annotation with one from another tier (of the same annotation's file): the first one found whose start time is after the input annotation's start time.

  • range (int = 0) Maximum time range to find the next annotation (0 = no maximum)

Input

  • Array<Annotation>

Output

  • Array<Annotation>

replace-with-same-timecode-annotations-from-tier

Replaces each input annotation with one from another tier that has the same start time.

  • multiple (bool = false) Whether to select multiple annotations if more than one is in range
  • range (number = 0) Acceptable range in seconds to consider 2 timecodes as equivalent
  • default-to-null (bool) Whether to add a null element in the output array if no corresponding annotation is found for a given input annotation (defaults to false, or true if the operation runs in an EXTRA block)

Input

  • Array<Annotation>

Output

  • Array<Annotation>

sanitize-strip-xml-tags

Removes XML tags from annotations' values

Input

  • Array<Annotation>

Output

  • Array<Annotation>

save-data-to-csv

Saves current data to a CSV file. The file will be saved in the processed XML file's folder, overwriting any existing file.

Input

  • Object
  • Array

save-data-to-json

Saves current data to a JSON file. The file will be saved in the processed XML file's folder, overwriting any existing file.

Input

  • Object
  • Array

scrape

Loads a URL (or a file) and extracts data

  • url (string) URL to fetch for scraping
  • file (file) a file path to load content from instead of a URL
  • jsoup (js) a function to extract data, called with a Jsoup document as argument (to scrape HTML)
  • js (js) a function to extract data, called with the content as a string argument (to scrape JSON, text files...)

Input

Output

selection-to-data

Transforms input annotations to an array of objects

  • export-value (bool = true)
  • export-value-as (string = value)
  • export-tier (bool = true)
  • export-tier-as (string = tier)
  • export-start (bool = true)
  • export-start-as (string = start)
  • export-stop (bool = true)
  • export-stop-as (string = stop)

Input

  • Array<Annotation>

Output

  • Array<Object>

set-global-variable

Sets a global variable which becomes available in HTML blocks as {{varname}}. It can then be used in scripts with variables.varname, and in attributes via the ${} syntax, like ${0.05*variables.counter}. If the "value" attribute is not defined, the saved value will be the current selection/data.

  • value (js) a function called with the current selection/data, which should return the value of the variable

set-tag

Adds or removes an annotation's tag

Input

  • Array<Annotation>

Output

  • Array<Annotation>

set-tier

Changes the tier of each input annotation. By default, annotations are cloned so the originals are not affected.

  • clone (bool = true)

Input

  • Array<Annotation>

Output

  • Array<Annotation>

sort

Sorts the input array by a given field

  • natural (bool = false) Whether to sort based only on the digits contained in the field, instead of string comparison
  • func (js) A custom comparison function like (a,b)=>(a>b?-1:1)

Input

  • Array

Output

  • Array

sort-by-file

Sorts annotations first by their file, and then by the provided field or compare function. When providing a field, its first character must indicate the sorting order with +/- (ascending/descending).

  • order (string = +) Sorting order of the AF groups (+/-)
  • field (string) A field to sort on, like "+start"

Input

  • Array<Annotation>

Output

  • Array<Annotation>

sort-by-group

Sorts annotations by their group, and then inside each group by their relative start time (using the eventual file time-offset). If no field is provided, sorting will be done around //TODO

  • order (string = +) Sorting order of the groups (+ ascending, - descending)
  • natural (bool = false) Natural integer sorting instead of string comparison
  • field (string) A field to sort on, like "+start"

Input

  • Array<Annotation>

Output

  • Array<Annotation>

sum-by

Calculates the sum of the annotations' property values. (not working yet)

Input

  • Object<String,Array<Annotation>>

swap-nested-objects

Transforms an object of structure {A:{a:1,b:2}} into structure {a:{A:1},b:{A:2}} (see the sketch below)

Input

  • Object<Object>
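
A sketch of the transformation:

```js
// {A: {a: 1, b: 2}} becomes {a: {A: 1}, b: {A: 2}}.
function swapNestedObjects(outer) {
  const swapped = {};
  for (const [outerKey, inner] of Object.entries(outer))
    for (const [innerKey, value] of Object.entries(inner))
      (swapped[innerKey] ??= {})[outerKey] = value;
  return swapped;
}
```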

type-token

Computes the type-token of input annotations (a minimal type-token ratio sketch follows the Output list).

  • group-by-attribute (string) An optional attribute name to be used for grouping type-tokens together
  • strip-punctuation (bool = true) Whether to remove all punctuation before type-token processing
  • case-sensitive (bool = false) Whether to consider capital letters in word comparison
  • replace-regex (string) A regular expression to replace text before type-token processing
  • replace-func (string) A js function to replace text before type-token processing
  • split-func (string) A js function to use instead of the basic space splitting, for extracting words from strings

Input

  • Array<Annotation>
  • Object<Array<Annotation>>
  • Object<Array<String>>
  • Object<String>

Output

  • Object
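
A minimal type-token ratio sketch mirroring the defaults above (punctuation stripped, case-insensitive, whitespace splitting); the operation's actual output object may contain more detail:

```js
function typeTokenRatio(text) {
  const tokens = text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, "") // strip-punctuation default
    .split(/\s+/)
    .filter(Boolean);
  const types = new Set(tokens); // distinct words
  return {
    types: types.size,
    tokens: tokens.length,
    ratio: tokens.length ? types.size / tokens.length : 0,
  };
}
```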

Processors

Use processors to analyze or convert raw data such as audio or video, and to manipulate the corpus.

audio-anonymizer

Audio Anonymizer modifies media files by applying audio filters on each input annotation segment. Available modes are:
- silence: replaces each segment with complete silence (default)
- noise: replaces each segment with a configurable noise
- beep: replaces each segment with a configurable beep
- voice: replaces each segment with a synthesized voice
- file: replaces each segment with a custom audio file

  • mode (string = silence) The mode of anonymization, an audio transformation that will be applied to each annotation segment (silence, noise, beep, voice, file)
  • mute (bool = true) Whether to silence the segment anyway before mixing in the noise/beep/voice/file
  • beep-frequency (int = 800) The frequency to use for the sine beep sound
  • noise-amplitude (number = 1) Amplitude of the generated noise (0 to 1)
  • noise-color (string = white) Noise color (white, pink, brown, blue, violet, velvet)
  • noise-seed (int) Seed value for the noise PRNG
  • noise-weight (number = 1) Mixing weight of the noise sound
  • voice-text (string) Text to synthesize and use for anonymization. If not provided, the annotation's value will be used.
  • voice-amplitude (int = 100) How loud the voice will be
  • voice-pitch (int = 50) The voice pitch
  • voice-speed (int = 175) The speed at which to talk (words per minute)
  • voice-wordgap (int = 0) Additional gap between words, in 10 ms units
  • file (string) Path to an audio file to use for anonymization, when mode is set to "file"
  • SET annotation-value="X" file="X.mp3" Use a custom audio file for annotations with a specific value
  • SET annotation-file="X" output-file="Y.mp4" Map a custom output filename for a given annotation file

Input

  • Array<Annotation>
🗀 GPL-3.0 License: Speech Synthesizer (github.com/kripken/speak.js)
🗀 GNU GPL License: Original eSpeak library ported by speak.js

demucs-separation

Demucs can separate voice and instruments from an audio track

  • source (string = vocals) The source type to separate from the rest (vocals, drums)
⭳ MIT License: Hybrid Spectrogram and Waveform Source Separation (github.com/facebookresearch/demucs)

export-corpus-media

Exports the current corpus media files to a folder. The folder will be created in AVAA's temp directory.

  • folder (string = exported-corpus-media) Name of the folder to create and populate with the corpus media files
  • overwrite (bool = false) Whether to overwrite existing files in the folder; otherwise another folder will be created
  • copy-temp (bool = false) Whether to copy the files even if they come from the temp folder; otherwise temporary files are just moved to the destination folder. If you use processors after export-corpus-media, you should activate this attribute so AVAA still finds the temporary files.

export-corpus-standalone 🧪

Exports the current corpus media files together with a copy of the ORIGINAL corpus files, edited to reference the exported media files. This produces a standalone corpus folder which can be easily shared because it does not contain absolute paths anymore.

  • folder (string = exported-corpus) Name of the folder to create and populate with the corpus files. If the name contains a slash (/) it will be considered an absolute folder path; otherwise the folder is created in AVAA's temp directory.
  • overwrite (bool = false) Whether to overwrite existing files in the folder; otherwise another folder will be created
  • copy-temp (bool = false) Whether to copy the files even if they come from the temp folder; otherwise temporary files are just moved to the destination folder. If you use processors after export-corpus-standalone, you should activate this attribute so AVAA still finds the temporary files.

export-to-eaf 🧪

Exports the current corpus to EAF format.

  • copy-media (bool = false) Whether to copy the associated media files next to the EAF file, to use as a relative path (good for sharing). Otherwise the absolute path of the media will be used (good for testing)
  • export-empty-tiers (bool = false) Whether to also export the tiers that don't have any annotations

export-to-srt 🧪

Exports a selection of annotations to SRT format

  • copy-media (bool = false) Whether to copy the associated media files next to the SRT file

Input

  • Array<Annotation>

ffmpeg-cut

Cuts a segment from each corpus media file. This processor also accepts an array of annotations to cut multiple segments. In this case, the corpus will be reduced to the relevant annotation files and each media file will be replaced by its cuts, or by one file merged from all cuts when the "concat" attribute is set to true.

  • start (string) Starting point in seconds to start cutting from
  • duration (string) Duration of the cut segment
  • concat (bool = false) Whether to concat all segments in case multiple are cut

Input

  • Array<Annotation>

ffmpeg-denoise 🧪

This processor calls ffmpeg's denoise features:
- FFT: denoises audio with FFT
- NLM: reduces broadband noise using a Non-Local Means algorithm
- RNN: reduces noise from speech using a Recurrent Neural Networks model
Learn more about the RNN models at https://github.com/GregorR/rnnoise-models

  • method (string) The denoise method to use (FFT, RNN, NLM)
  • rnn-model (string = beguiling-drafter) (beguiling-drafter, conjoined-burgers, leavened-quisling, marathon-prescription,...)
  • rnn-mix (number = 1) How much to mix filtered samples into the final output. Allowed range is from -1 to 1. Default value is 1. Negative values are special: they set how much to keep of the filtered noise in the final filter output
  • rnn-threads (int = 1) Number of threads (1 for stereo)
  • nlm-strength (number = 0.00001) Set denoising strength. Allowed range is from 0.00001 to 10000
  • nlm-patch (number = 0.002) Set patch radius duration. Allowed range is from 0.001 to 0.1 seconds
  • nlm-research (number = 0.006) Set research radius duration. Allowed range is from 2 to 300 milliseconds (0.002 to 0.3 seconds)
  • nlm-smooth (number = 11) Set smooth factor. Allowed range is from 1 to 1000
  • nlm-output (select) Set the output mode (output denoised, input unchanged, noise only)
🗀 No License: Noise Removal Neural Network Models (github.com/GregorR/rnnoise-models)

ffmpeg-filter-audio

This processor calls ffmpeg with a user-defined audio filter. Learn more about what filters can do at https://www.ffmpeg.org/ffmpeg-filters.html

  • filter-audio (string) The filter expression
  • sample-rate (string) If specified, the audio output will also be resampled

ffmpeg-filter-complex

This processor calls ffmpeg with a user-defined filter-complex. Learn more about what filters can do at https://www.ffmpeg.org/ffmpeg-filters.html

  • filter-complex (string) The filter expression

ffmpeg-frei0r 🧪

Applies a frei0r filter on each corpus media file.

  • filter (string) The frei0r filter to use (3dflippo, addition, addition_alpha, aech0r, alpha0ps_alpha0ps,...)
  • param-1 (string) A parameter to pass to the filter
  • param-2 (string) A parameter to pass to the filter
  • param-3 (string) A parameter to pass to the filter
🗀 GPL-2.0 License: Video filters by github.com/dyne/frei0r
🗀 CC BY-NC-ND 4.0 License: Frei0r DLL pack for Windows by Gyan Doshi (www.gyan.dev/ffmpeg)

hardsub

Hardcodes annotations as subtitles on top of the video. This processor will automatically use the values of the annotations that generated the clips, whenever they are available. It is possible to use different annotations from other tiers in range, by adding "source-tier" parameters.

  • include-tier-names (bool = false) Whether to include the tier names in the subtitles
  • include-tier-separator (string = : ) A separator to add between the tier name and the subtitle text
  • extend-duration-before (int = 0) Number of milliseconds to display the subtitle before its original start time, so it is shown earlier
  • extend-duration-after (int = 0) Number of milliseconds added after the original end time, so it stays visible longer
  • style-color (color = #FFFFFF) Subtitles color
  • style-opacity (string = 100%) Subtitles opacity, in %
  • style-outline-color (color = #000000) Subtitles text outline color
  • style-outline-opacity (string = 100%) Opacity of the text outline, in %
  • style-outline-width (string) Width of the text outline, in pixels
  • style-size (int) Subtitles size
  • style-bold (bool = false) Subtitles weight
  • style-font (string) Subtitles font name
  • SET source-tier="" name="" Add a tier from which to take subtitle text, and optionally customize its name

media-converter 🧪

Use Media Converter to convert video and audio files into other formats

  • video-resolution (string) Resolution of the converted video (like 1280x720)
  • audio-volume (number = 1) Audio volume (0 to 1)

Input

  • Array<Annotation>

r-script 🧪

This processor executes an R program and integrates the resulting data into the final HTML page. The resulting R output can be graphic files (jpg, png, gif, svg) or tabular text data. The arguments provided to R are, in order:
- the temp directory path to work with and create result files in
- the path to a JSON file containing the selection (annotations) or data provided to the processor
R scripts must follow a specific input/output syntax to be compatible with AVAA (see "Calling R" in the scripting guide).

  • file (file) .R file to run
  • source (js) Plain R source code to run

reduce-corpus

Reduces the corpus to specific files. Useful for working on a subset of the corpus without modifying the corpus itself. This processor also accepts a selection of annotations, in which case only the corpus files of these annotations will be kept.

  • group (regexp) Which group of corpus files to keep (regular expression)
  • file (regexp) Which files from the corpus to keep (regular expression)
  • tag (regexp) Files from the corpus to keep that have a tag satisfying this regular expression
  • filter (js) A custom filter function called for each file, which should return true to keep the file, like (f)=>f.filename.includes('.eaf')

Input

  • Array<Annotation>

reduce-corpus-media

Filters media files from the corpus, keeping only files that match a specific criterion. The "exclude" attribute can be used to exclude these files from the corpus instead. Useful for working on a subset of the corpus media files without modifying the corpus itself.

  • exclude (bool = false) Whether to exclude the filtered files instead of keeping them
  • file (regexp) Which media files from the corpus to keep (regular expression)
  • filter (js) A custom filter function called for each file, which should return true to keep the file, like (mf)=>mf.extension.includes('mp4')

remove-sequences-from-corpus

Takes a selection of annotations and modifies the corpus, using the input annotations as sequences: each sequence is removed from the corpus file, along with its associated media segment and all the annotations included in that sequence.

Input

  • Array<Annotation>

rename-tiers

Renames tiers in all or specific corpus files.

  • file (regexp) Restrict the file(s) on which to rename tiers
  • from (string) The tier name to be replaced
  • to (string) The replacement name to use
  • map (js) A JSON object mapping the names to replace (keys) to the new names (values), like {"OLD NAME":"NEW NAME"}

reset-corpus

Resets the pipeline corpus to its original state. Useful when working with loops.

sequences-to-corpus

Takes a selection of annotations and recreates the corpus, using the input annotations as sequences: each sequence is transformed into one corpus file with its associated media file and all the annotations/tiers included in that sequence.

  • name-template (js = `sequence-${i}-${a.af.id}`) The template function used to build the name of each corpus file, provided with the annotation and its index, like (a,i) => `${i} - ${a.value}`

Input

  • Array<Annotation>

speaker-diarization-pyannote

Speaker diarization is the process of marking segments of voice with their speaker. This processor takes a selection of annotations, and adds new annotations to the corpus, associated with their speaker tier.

  • hf-token (string) The HuggingFace access token, required to download the pyannote models
  • speakers (int = 0) Number of speakers in the audio

Input

  • Array<Annotation>

Output

  • Array<Annotation>
⭳ MIT License: Speaker diarization (github.com/pyannote/pyannote-audio)

        speech-to-text-faster-whisper

        A speech to text processor using SYSTRAN Faster Whisper to transcribe and automatically create annotations

        • language (string) The language to transcribe, if not specified autodetect will be attempted
        • model (select = tiny) The trained model to use for transcription(tiny, small, medium, large, distil-large-v3)
        • temperature (number = 0) Temperature, adjust to fix hallucinations
        • device (select = auto) The processing device to use(auto, cpu, cuda)
        • precision (select = int8) (auto, int8, fp16, fp32)
        • beam-size (int = 5) The decoding beam size
        • batched (bool = false) Whether to use batch processing (faster)
        • word-timestamps (bool = false) Whether to output word-level timestamps
        • output-tier (string = stt-faster-whisper) Tier id for the extracted annotations
        • vad-threshold (number = 0.5) Speech threshold. Silero VAD outputs a speech probability for each audio chunk; probabilities above this value are considered speech. It is better to tune this parameter for each dataset separately, but the "lazy" default of 0.5 works well for most datasets.
        • vad-min-speech-duration (int = 250) Final speech chunks shorter than this (in milliseconds) are discarded
        • vad-max-speech-duration (number) Maximum duration of speech chunks, in seconds. Longer chunks are split at the timestamp of the last silence lasting more than 100 ms (if any), to prevent aggressive cutting; otherwise, they are split aggressively just before the maximum duration
        • vad-min-silence-duration (int = 2000) At the end of each speech chunk, wait this long (in milliseconds) before separating it
        • vad-speech-pad-ms (int = 400) Final speech chunks are padded by this many milliseconds on each side
        • verbose (bool = false) Whether to log the transcriptions as soon as they are detected

        Output

        • Array<Annotation>
        ⭳ MIT License Reimplementation of OpenAI's Whisper model using CTranslate2 github.com/SYSTRAN/faster-whisper
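
        A minimal sketch transcribing French audio into a dedicated tier (assumed PROCESS invocation syntax; attributes as documented above):

        <!-- hypothetical PROCESS tag; stt-fr is an arbitrary tier id -->
        <PROCESS name="speech-to-text-faster-whisper" language="fr" model="medium" output-tier="stt-fr" />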

        speech-to-text-whisper

        A speech-to-text processor using OpenAI Whisper to transcribe and automatically create annotations.

        • language (string) The language to transcribe; if not specified, autodetection will be attempted
        • model (string = small) The trained model to use for transcription (tiny, small, medium, large-v3)
        • temperature (number = 0) Temperature, adjust to fix hallucinations
        • output-tier (string = stt-whisper) Tier id for the extracted annotations
        • verbose (bool = false) Whether to log the transcriptions as soon as they are detected

        Output

        • Array<Annotation>
        ⭳ MIT License Robust Speech Recognition via Large-Scale Weak Supervision github.com/openai/whisper

        speech-to-text-whisper-at 🧪

        A variation of OpenAI Whisper designed to extract audio events from the 527-class AudioSet; the Whisper-AT processor outputs general audio events as annotations.

        • language (string) The language to transcribe; this also affects the names of the audio events
        • model (string = tiny) The trained model to use for transcription (tiny, small, medium, large-v3)

        Output

        • Array<Annotation>
        ⭳ BSD-2 License Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers github.com/YuanGongND/whisper-at

        vad-silero

        Silero's Voice Activity Detector processor creates annotations for each segment of input audio containing voice.

        • output-tier (string = vad-silero) Tier id for the generated annotations
        • threshold (number = 0.5) Use a higher threshold for noisy audio
        • sampling-rate (number = 16000) The audio sampling rate in Hz (8000, 16000, 32000, 48000)
        • min-silence-duration (int = 500) Minimum duration of silence (in milliseconds) required to end a voice segment
        • min-speech-duration (int = 1000) Minimum duration (in milliseconds) of activity to consider a voice segment

        Output

        • Array<Annotation>
        ⭳ MIT License Pre-trained enterprise-grade Voice Activity Detector github.com/snakers4/silero-vad
        🗀 MIT License Silero JIT and ONNX files
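
        A minimal sketch tuned for noisy recordings (assumed PROCESS invocation syntax):

        <!-- hypothetical PROCESS tag; a higher threshold reduces false positives in noise -->
        <PROCESS name="vad-silero" threshold="0.7" min-speech-duration="500" />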

        video-anonymizer 🧪

        Anonymize videos with these special effects:

        • deface: automatically detects and blurs faces
        • cartoon: cartoonizes the video
        • cartoon-blur: cartoonizes and blurs the video

        • mode (select) The type of anonymization to apply on the video (deface, cartoon, cartoon-blur, retro-glow)
        • deface-mode (select = blur) Anonymization filter mode for face regions (blur, solid, mosaic)
        • deface-threshold (number = 0.2) Detection threshold (tune this to trade off between false positive and false negative rate)
        • deface-mask-scale (number = 1.3) Scale factor for face masks, to make sure that masks cover the complete face
        • deface-mosaicsize (int = 20) Width of the mosaic squares when deface-mode is mosaic
        • deface-boxes (bool = false) Use boxes instead of ellipse masks
        • deface-draw-scores (bool = false) Draw detection scores onto outputs, useful to find the best threshold
        • deface-downscale (string) Downscale resolution for the network inference (WxH)
        ⭳ MIT License Video anonymization by face detection github.com/ORB-HD/deface
        🗀 GPL-2.0 License Video filters by github.com/dyne/frei0r
        🗀 CC BY-NC-ND 4.0 License Frei0r DLL pack for Windows by Gyan Doshi www.gyan.dev/ffmpeg
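
        A minimal sketch blurring all detected faces (assumed PROCESS invocation syntax; the deface-* attributes are documented above):

        <!-- hypothetical PROCESS tag -->
        <PROCESS name="video-anonymizer" mode="deface" deface-mode="blur" deface-mask-scale="1.5" />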

        Processor Pipelines

        AVAA Toolkit features an advanced pipeline system that eases the automation of complex tasks.

        A pipeline is created for each section of the document, and initially contains a virtual copy of the corpus and its associated media files.
        The corpus and its media files are then modified sequentially by each processor inside the pipeline.

        Pipeline Input Modes

        The pipeline can be fed different initial media files by defining the processor-pipeline-input setting.

        1. corpus: the default mode when the processor is placed at the beginning of a section; feeds the pipeline with the corpus media files
        2. section-assets: the default mode when the processor is placed after a view which exported clips; feeds the pipeline with all the clips/snapshots exported by the section up to this processor
        3. all-assets: must be selected manually; feeds the pipeline with all the clips/snapshots exported by the document up to this processor

        The corpus mode is useful to process corpus files directly (audio anonymization, format conversion...), while the all-assets mode could be used, for instance, to apply effects only on the exported media of a document intended for sharing with peers.
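
        As a sketch, assuming settings are declared with a SETTING tag inside the Local Settings block (an assumed syntax, kept consistent throughout this document):

        <!-- hypothetical SETTING tag; processor-pipeline-input is the documented setting -->
        <SETTING name="processor-pipeline-input" value="all-assets" />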

        Pipeline Chain

        Processors inside a pipeline (that is, for now, a section of the document) are executed one after another, each processor working on the results of the previous one.

        Complex chains of processors can be built to automate heavy tasks, alleviating the burden of manually running each step and verifying its consistency, as sketched below.
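
        A sketch of such a chain (assumed PROCESS invocation syntax): the corpus is first reduced to its wav files, those are transcribed, and the resulting annotations are fed to the diarization processor, matching the Input/Output types documented above.

        <!-- hypothetical PROCESS tags; hf_... is a placeholder token -->
        <PROCESS name="reduce-corpus-media" file="\.wav$" />
        <PROCESS name="speech-to-text-faster-whisper" model="small" />
        <PROCESS name="speaker-diarization-pyannote" hf-token="hf_..." />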

        Pipeline and Views

        Views placed after a processor (in the same section) will inherit its modified media files when exporting clips and snapshots.
        This can be helpful to extract annotations from cuts of raw media files (avoiding the processing of long corpus media files when testing samples), or to preprocess a media file before it is exported into clips by later views.

        Processors generating annotations will make these annotations immediately available in the main corpus (and not only for the current pipeline), hence for all subsequent views and processors in the document.

        Settings

        Settings can be modified at any time via the Local Settings block.

        Styling

        It is possible to change the style via CSS. The generated HTML makes it easy to target specific elements or to apply styling rules to the whole page. Each view has its own element structure, and a simple "Inspect Element" in the browser will reveal the selectors to use.

        Embedded CSS

        Styles can be defined directly in the XML file, by using a STYLE tag.

        These styles will only apply to this specific HTML document.

        <STYLE>
        .view-timeline td { border-color:red; }
        .view-timeline tr.tier-header { text-align:right; }
        </STYLE>

        CSS File

        Styles can be defined in a separate CSS file, that must be placed in the include folder.

        All the generated HTML documents will load this file and have these styles in common.

        e.g. my-styles.css:
        h2 { color:green; }
        section { border-left: 2px solid gray; }

        Styling Views

        Views generate simple HTML code and try to follow common guidelines so that applying styles is straightforward.

        Annotations' text labels always have the annotation class, so for instance to change the color of all annotations:

        .view .annotation { color:red; }

        Making PDF

        AVAA Toolkit can also generate PDF, though interactive features like videos or dynamic charts won't work in this format, for obvious reasons.

        Chrome (or Chromium) must be installed on the system (alternatively on Windows AVAA Toolkit will try to use Edge).

        The Chrome/Edge executable should be detected automatically; if that fails, provide its path in avaa-config.xml

        If Chrome/Edge is not available, it is recommended to install Chrome Headless Shell and then provide its path in avaa-config.xml

        Command Line

        AVAA Toolkit is made for the command line and can integrate seamlessly in any tool chain.

        Usage: [options] XML files or folders to process

        Options:
          --lang                      Language of the generated document, if translations are available
          --watch                     Watch for xml changes and regenerate documents. Default: false
          --combine                   Combine documents into one final html file. Default: false
          --pdf                       Also convert HTML to PDF. Default: false
          --zip-all                   Zip all generated documents together. Default: false
          --zip-each                  Zip each generated document separately. Default: false
          --deployer-user             Deployer user name
          --deployer-pass             Deployer password
          --deploy                    Upload zip to deployer. Default: false
          --deployer-url              Specify a custom deployer URL to upload the zip to
          --debug                     Debug mode. Default: false
          --verbose                   Display more information when converting. Default: true
          --path                      Path of application for includes. Defaults to the working directory
          --path-temp                 Path for temporary files. Defaults to ./temp/
          --test                      Run an XML document as a test suite
          --gendoc                    Generate all documentation and exit. Default: false
          --dev                       Reload scripts before building. Default: true
          --cache-af                  Cache annotations file in memory for faster exec. Default: true
          --server-allowed-origin     A custom origin URL allowed to connect to the server
          --server                    Websocket server for editor and interactive sessions. Default: false
          --server-port               Websocket server port. Default: 42042
          --server-ssl                Use SSL certificate (for Server Mode). Default: false
          --mongo-host                Address of the mongodb server. Default: 127.0.0.1
          --mongo-port                Port of the mongodb server. Default: 27017
          --mongo-db                  Name of database to work with. Default: avaa
          --download-remote-corpus    Whether to automatically download a referenced remote corpus. Default: false
          --conf                      Custom config file to load. Default: avaa-config.xml
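
        For example, assuming the toolkit is run via its JAR (my-document.xml is a placeholder name), to regenerate a document on every change and also produce a PDF:

        # placeholder document path; --watch and --pdf as documented above
        java -jar avaa-toolkit.jar --watch --pdf projects/my-document.xml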

        Troubleshooting

        Installation and first run

        Error: A JNI error has occurred, please check your installation and try again
        Exception in thread "main" java.lang.UnsupportedClassVersionError:
            org/avaatoolkit/Main has been compiled by a more recent version of the Java Runtime (class file version 55.0),
                this version of the Java Runtime only recognizes class file versions up to X

        Solution: Your Java runtime is outdated; follow the steps in the Custom Java Runtime section below.


        java.net.BindException: Couldn't bind to any port in the range `42042:42042`.
            at org.glassfish.grizzly.AbstractBindingHandler.bind(AbstractBindingHandler.java)
            at org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java)
            at org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java)
            at org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java)
            at org.avaatoolkit.server.Daemon.start(Daemon.java)
            at org.avaatoolkit.Main.main(Main.java)

        Solution: The toolkit is already running with the --server argument; close it before starting a new instance.
        Solution: Your firewall has a strict policy regarding localhost port bindings; add a rule to allow localhost:42042.

        Custom Java Runtime

        On some operating systems, the installed Java runtime might not be up-to-date, preventing AVAA Toolkit from executing properly.
        To run AVAA Toolkit, at least Java 11 is required. To install a valid runtime, use the trusted adoptium.net binaries.

        Alternatively, using the OpenJDK archives:

        • Go to OpenJDK and download the archive for your system
        • Extract the archive into AVAA Toolkit installation's folder
        • Rename the extracted jdk-22.x.x folder to jdk
        • The directory path should be avaa-toolkit / jdk / bin /
        • The launcher should now use the provided runtime in the jdk folder automatically

        Advanced Video Processing

        Some processors require a full FFmpeg version to work.

        FFmpeg Clipping Caveats

        Short clips or sync issues

        When generating really short clips (under 1 sec), it is possible that the clips will consist of only one frozen image.

        This is because, by default, FFmpeg is instructed to copy the video stream (vcopy), which saves considerable processing time at the expense of less accurate clipping.

        When perfect accuracy is required for clips, it is recommended to force FFmpeg re-encoding, for instance by defining the setting video-codec = h264
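
        Using the same assumed SETTING syntax as in the pipeline section:

        <!-- hypothetical SETTING tag; video-codec is the documented setting -->
        <SETTING name="video-codec" value="h264" />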

        Licenses

        About licensed material

        This documentation includes attributions to licensed material such as libraries and software modules.
        These notices are written explicitly in each relevant component and, for convenience, listed again below.

        Components License Notices

        Some modules are not included in AVAA Toolkit but rather installed on demand whenever a component requires it.
        Other modules are included or integrated in AVAA Toolkit to provide a better overall user experience.
        This behavior is indicated by a little icon preceding the license, as well as a tooltip describing its inclusion method.

        Redistributed Material

        Additional libraries are packaged with the produced HTML document, and therefore redistributed by the end user.

        jQuery

        jQuery simplifies DOM manipulation; some components use it to initialize content in the browser.

        🗀 MIT License jQuery jquery.com/license

        D3 - Data Visualization Library

        D3 has unparalleled flexibility in building custom and dynamic visualizations.
        Charts generated by AVAA Toolkit are actually rendered right in the browser with D3.

        🗀 ISC License D3 d3js.org

        GSAP - GreenSock Animation Platform

        GSAP is incredible and we deemed its inclusion valuable for providing a robust interactivity and animation framework for future AVAA Toolkit components.

        🗀 Custom GreenSock License GSAP gsap.com/licensing

        Tipped - Tooltip solution based on jQuery

        Tipped features easy-to-use and customizable tooltips.
        AVAA Toolkit views sometimes use these tooltips, for instance to show snapshots or videos in a small popup when an annotation is clicked or hovered.

        🗀 CC BY 4.0 License Tipped github.com/staaky/tipped

        FileSaver.js - Save files generated in the browser

        FileSaver.js provides a simple interface to save (as a "download") files created directly in the browser.
        We believe AVAA Toolkit components can benefit from the presence of the FileSaver library.

        🗀 MIT License FileSaver.js github.com/eligrey/FileSaver.js

        AVAA Toolkit Core Libraries

        AVAA Toolkit itself is built with Java, and makes use of various libraries (via Maven) which are compiled into the final JAR executable distributed to the toolkit users.

        NewPipe Extractor

        A library for extracting things from streaming sites; AVAA Toolkit includes it to provide an easy API for downloading PeerTube videos.

        🗀 GPL-3.0 License NewPipe Extractor github.com/TeamNewPipe/NewPipeExtractor

        Bramp FFmpeg

        An FFmpeg CLI wrapper for Java, used to execute FFmpeg and read progress feedback.

        🗀 BSD-2-Clause License Bramp FFmpeg github.com/bramp/ffmpeg-cli-wrapper

        Mozilla Rhino

        Rhino is the JavaScript engine used to execute all components' scripts.

        🗀 Mozilla Public License 2.0 Rhino github.com/mozilla/rhino

        JCommander

        An annotation-based library for parsing command-line arguments.

        🗀 Apache-2.0 License JCommander github.com/cbeust/jcommander

        Grizzly

        This library is used to spawn server sockets and brings WebSocket sessions (that is how the Editor interacts with AVAA Toolkit).

        🗀 Eclipse Public License 2.0 Grizzly github.com/eclipse-ee4j/grizzly

        Jsoup

        Jsoup simplifies HTML/XML parsing via a CSS selectors syntax.

        🗀 MIT License Jsoup jsoup.org

        Jspecify

        An artifact of fully-specified annotations to power static-analysis checks, beginning with nullness analysis.

        🗀 Apache-2.0 License Jspecify jspecify.org

        Jchardet

        Jchardet is a Java port of Mozilla's automatic charset detection algorithm.

        🗀 Mozilla Public License 1.1 Jchardet jchardet.sourceforge.net

        OSHI

        A JNA-based (native) Operating System and Hardware Information library, to get processes details and CPU usage.

        🗀 MIT License OSHI github.com/oshi/oshi

        Apache Commons

        Apache Commons is a set of commonly needed features implemented as reusable Java components.

        🗀 Apache-2.0 License Apache Commons commons.apache.org

        SLF4J

        A simple facade abstraction for various logging frameworks.

        🗀 MIT License SLF4J slf4j.org

        Logback

        A reliable, generic, fast and flexible logging framework.

        🗀 LGPL 2.1 License Logback logback.qos.ch

        Lombok

        Automate Java boilerplate code via annotations.

        🗀 LGPL 2.1 License Lombok projectlombok.org

        OkHttp

        OkHttp is an efficient HTTP client.

        🗀 Apache-2.0 License OkHttp square.github.io/okhttp

        MongoDB Driver

        The MongoDB Synchronous Driver provides an easy API for interacting with a MongoDB Server.

        🗀 Apache-2.0 License MongoDB Driver mongodb.com/docs/drivers/java/sync/current

        What about AVAA Toolkit's License?

        We are currently working on that.