Processors

audio-anonymizer

Audio Anonymizer modifies corpus media files by applying audio filters to each input annotation segment. Available modes are:

  • silence: replaces each segment with complete silence (default)
  • noise: replaces each segment with a configurable noise
  • beep: replaces each segment with a configurable beep
  • voice: replaces each segment with a synthesized voice
  • file: replaces each segment with a custom audio file

(⚠ for now, works only on corpus files, not exported clips)

  • mode string = silence The mode of anonymization, an audio transformation that will be applied to each annotation segment. (silence, noise, beep, voice, file)

  • mute bool = true Whether to silence the segment before mixing in the noise/beep/voice/file

  • beep-frequency int = 800 The frequency to use for the sine beep sound

  • noise-amplitude number = 1 Amplitude of the generated noise (0, 1)

  • noise-color string = white Noise color (white, pink, brown, blue, violet, velvet)

  • noise-seed int Seed value for noise PRNG

  • noise-weight number = 1 Mixing weight of the noise sound

  • voice-text string Text to synthesize and use for anonymization. If not provided, the annotation's value will be used.

  • voice-amplitude int = 100 How loud the voice will be.

  • voice-pitch int = 50 The voice pitch

  • voice-speed int = 175 The speed at which to talk (words per minute)

  • voice-wordgap int = 0 Additional gap between words in 10 ms units

  • file string Path to an audio file to use for anonymization, when mode is set to "file"

๐Ÿท annotation-value="X" file="X.mp3" Use a custom audio file for annotations with a specific value
๐Ÿท annotation-file="X" output-file="Y.mp4" Map custom output filename for a given annotation file

Input: Array<Annotation>

🗀 GPL-3.0 License · Speech Synthesizer 🔗
🗀 GNU GPL License · Original eSpeak library ported by speak.js

create-tiers 🧪

Create tiers in all or specific corpus files.

  • missing-across-files bool = false Automatically create missing tiers across corpus files

  • file regexp Restrict file(s) on which to create tiers


demucs-separation

Demucs can separate voice and instruments from an audio track.

  • source string = vocals The source type to separate from the rest (vocals, drums)
⭳ MIT License · Hybrid Spectrogram and Waveform Source Separation 🔗

divide-tiers

Move annotations with specific values to specific tiers.

  • tier-match regexp Restrict tier(s) to divide

  • file-match regexp Restrict file(s) on which to divide tiers

  • move-to-tier js A function mapping an annotation to a specific tier, like (annotation)=>annotation.value.includes('eye')?'eyes':undefined. When undefined is returned, the annotation won't change tier.

  • create-tier js An optional function called for each tier that should return a string which will be used to create equivalent empty tiers, like (tier)=>tier.id.replace('eye','nose')
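
To make the two callbacks concrete, here is a minimal sketch in plain JavaScript. The annotation and tier shapes used ({ value }, { id }) follow the examples above; anything beyond that is an assumption about AVAA's objects.

```javascript
// Route gaze-related annotations to an "eyes" tier; returning undefined
// leaves the annotation on its current tier (per the note above).
const moveToTier = (annotation) =>
  annotation.value.includes('eye') ? 'eyes' : undefined;

// Derive the name of an equivalent empty tier from an existing tier id.
const createTier = (tier) => tier.id.replace('eye', 'nose');

console.log(moveToTier({ value: 'left eye blink' })); // 'eyes'
console.log(moveToTier({ value: 'smile' }));          // undefined
console.log(createTier({ id: 'eye-gaze' }));          // 'nose-gaze'
```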


export-corpus-media

Exports current corpus media files to a folder. The folder will be created in AVAA's temp directory.

  • folder string = exported-corpus-media Name of the folder to create and populate with the corpus media files

  • overwrite bool = false Whether to overwrite existing files in the folder, otherwise another folder will be created

  • copy-temp bool = false Whether to copy the files even if they come from the temp folder, otherwise temporary files are just moved to destination folder. If you use processors after export-corpus-media, you should activate this attribute so AVAA still finds the temporary files.


export-corpus-standalone 🧪

Exports current corpus media files together with a copy of the ORIGINAL corpus files, edited to reference the exported media files. This produces a standalone corpus folder which can be easily shared because it no longer contains absolute paths.

  • folder string = exported-corpus Name of the folder to create and populate with the corpus files. If the name contains a slash (/) it will be considered as an absolute folder path, otherwise the folder is created in AVAA's temp directory.

  • overwrite bool = false Whether to overwrite existing files in the folder, otherwise another folder will be created

  • copy-temp bool = false Whether to copy the files even if they come from the temp folder, otherwise temporary files are just moved to destination folder. If you use processors after export-corpus-standalone, you should activate this attribute so AVAA still finds the temporary files.


export-to-eaf 🧪

Exports current corpus to EAF format.

  • copy-media bool = false Whether to copy the associated media files next to EAF file, to use as relative path (good for sharing). Otherwise absolute path of media will be used (good for testing)

  • export-empty-tiers bool = false Whether to also export the tiers that don't have any annotation


export-to-srt 🧪

Exports a selection of annotations to SRT format

  • copy-media bool = false Whether to copy the associated media files next to SRT file

Input: Array<Annotation>


export-to-tei 🧪

Exports a selection of annotations (or an array of objects) to TEI format.

(⚠ Synchronization markers not implemented yet)

  • mode select = TEI corpus Method used to export the annotations (TEI corpus, TEI divisions, TEI files)

  • structure select = speaker Structure used to represent annotations (speaker, utterance)

  • author string The author to add to TEI file description

  • corpus-name string = tei-corpus Name of the exported corpus

  • corpus-subtitle string Subtitle for exported corpus and separate files

  • extension string = .tei.xml The extension of exported TEI files

  • folder string An optional specific folder name to save the TEI files

  • role-map js A JSON object mapping the objects' fields to their TEI roles (data, label...)

Input: Array<Annotation>
Input: Array<Object>
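
As a hypothetical illustration of the "role-map" attribute: keys are fields of the input objects, values are the TEI roles they should play. The field names here are invented for the example; the document only names "data" and "label" as roles.

```javascript
// Hypothetical role-map: map each object field to a TEI role.
// "text" and "speaker" are invented field names for illustration.
const roleMap = { text: 'data', speaker: 'label' };

console.log(roleMap.text);    // 'data'
console.log(roleMap.speaker); // 'label'
```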


ffmpeg-cut

Cuts a segment from each corpus media file.

This processor also accepts an array of annotations to cut multiple segments. In this case, the corpus is reduced to the relevant annotation files and each media file is replaced by its cuts, or by a single file concatenating all cuts when the "concat" attribute is set to true.

  • start string Starting point of the cut, in seconds

  • duration string Duration of cut segment

  • concat bool = false Whether to concatenate all segments when multiple are cut

Input: Array<Annotation>


ffmpeg-denoise 🧪

This processor calls FFmpeg's denoise feature.

  • FFT: Denoises audio with FFT.
  • NLM: Reduces broadband noise using a Non-Local Means algorithm.
  • RNN: Reduces noise from speech using Recurrent Neural Networks model.

Learn more about the RNN models

  • method string The denoise method to use (FFT, RNN, NLM)

  • rnn-model string = beguiling-drafter The RNN model to use (beguiling-drafter, conjoined-burgers, leavened-quisling, marathon-prescription,...)

  • rnn-mix number = 1 How much to mix filtered samples into final output. Allowed range is from -1 to 1. Default value is 1. Negative values are special, they set how much to keep filtered noise in the final filter output

  • rnn-threads int = 1 Number of threads (1 for stereo)

  • nlm-strength number = 0.00001 Set denoising strength (0.00001, 10000)

  • nlm-patch number = 0.002 Set patch radius duration, in seconds (0.001, 0.1)

  • nlm-research number = 0.006 Set research radius duration, from 2 to 300 milliseconds (0.002, 0.3)

  • nlm-smooth number = 11 Set smooth factor (1, 1000)

  • nlm-output select = output denoised Set the output mode (output denoised, input unchanged, noise only)

🗀 No License · Noise Removal Neural Network Models 🔗

ffmpeg-filter-audio

This processor calls FFmpeg with a user-defined audio filter.

Learn more about audio filters on the FFmpeg site.

  • filter-audio string The filter expression

  • sample-rate string If specified, audio output will also be resampled


ffmpeg-filter-complex

This processor calls FFmpeg with a user-defined filter-complex.

Learn more about complex filters on the FFmpeg site.

  • filter-complex string The filter expression

ffmpeg-frei0r 🧪

Applies a frei0r filter on each corpus media file.

  • filter string The frei0r filter to use (3dflippo, addition, addition_alpha, aech0r, alpha0ps_alpha0ps,...)

  • param-1 string A parameter to pass to the filter

  • param-2 string A parameter to pass to the filter

  • param-3 string A parameter to pass to the filter

🗀 GPL-2.0 License · Video filters by Dyne.org 🔗
🗀 CC BY-NC-ND 4.0 License · Frei0r DLL pack for Windows by Gyan Doshi 🔗

hardsub

Hardcodes annotations as subtitles on top of video. This processor will automatically use the values of the annotations that generated the clips, whenever they are available. It is possible to use different annotations from other tiers in range, by adding "source-tier" parameters.

  • include-tier-names bool = false Whether to include the tier names in the subtitles

  • include-tier-separator string = : A separator to add between tier name and subtitle text

  • extend-duration-before int = 0 Number of milliseconds to display subtitle before its original start time, so it is shown earlier

  • extend-duration-after int = 0 Number of milliseconds added after the original end time, so it stays visible longer

  • style-color color = #FFFFFF Subtitles color

  • style-opacity string = 100% Subtitles opacity, in %

  • style-outline-color color = #000000 Subtitles text outline color

  • style-outline-opacity string = 100% Opacity of text outline, in %

  • style-outline-width string Width of the text outline, in pixels

  • style-size int Subtitles size

  • style-bold bool = false Whether to render subtitles in bold

  • style-font string Subtitles font name

๐Ÿท source-tier="" name="" Add a tier from which to take subtitles text, and optionally customize its name


load-image-annotations

Upgrades each input annotation to a special image annotation, using the value of the annotation as the corresponding image filename, which should exist in the same folder.

  • folder string A folder name located next to the annotation file, to look into for image files (accepts wildcard)

Input: Array<Annotation>


media-converter 🧪

Use Media Converter to convert video and audio files into other formats

  • video-resolution string Resolution of converted video (like 1280x720)

  • audio-volume number = 1 Audio volume (0, 1)

  • format string Format of output media (wav, flac, aac, mp3, mp4, avi)

  • audio-codec string Audio codec (pcm_s16le)

  • audio-mono bool = false Whether to convert audio to mono

  • audio-rate int Custom audio sample rate

Input: Array<Annotation>


merge-annotation-files

Merge tiers from similar annotation files (which share the same media file).

  • file-comparator js A function called with 2 files as arguments, which should return true if the files are to be merged together

  • file-priority js A function called with an array of files as first argument, and a tier name as second argument, which should return the file that has priority in case of colliding tiers

  • merged-name js A function called with an array of files as argument, which should return a name for the created corpus file holding the merged tiers

  • tiers-order string A comma-separated list of tier names which will define the creation order of the tiers in the merged file

  • exclude-tiers regexp A regular expression to specify which tiers should be excluded from merging

Input: Array<Annotation>
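
As a sketch, the three callbacks might look like this in plain JavaScript. The file shape used here ({ filename }) is an assumption; AVAA's actual file objects may expose more fields.

```javascript
// Merge files whose names share the same session prefix,
// e.g. "S01.eaf" and "S01-extra.eaf".
const fileComparator = (a, b) =>
  a.filename.slice(0, 3) === b.filename.slice(0, 3);

// On colliding tiers, give priority to the file with the shortest
// name (assumed here to be the "main" file of the session).
const filePriority = (files, tierName) =>
  files.reduce((best, f) =>
    f.filename.length < best.filename.length ? f : best);

// Name the merged corpus file after all of its sources.
const mergedName = (files) =>
  files.map((f) => f.filename.replace(/\.eaf$/, '')).join('+') + '.eaf';

console.log(mergedName([{ filename: 'S01.eaf' }, { filename: 'S01-extra.eaf' }]));
// 'S01+S01-extra.eaf'
```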


r-script 🧪

This processor executes an R program and integrates the resulting data into the final HTML page. The resulting R output can be graphic files (jpg, png, gif, svg) or tabular text data. Arguments provided to R are, in order:

  • the temp directory path to work with and create result files in
  • the path to a JSON file consisting of the selection (annotations) or data provided to the processor

R scripts must follow a specific input/output syntax to be compatible with AVAA (see "Calling R" in the scripting guide).

  • file file .R file to run

  • source js Plain R source code to run


reduce-corpus

Reduces the corpus to specific files. Useful to work on a subset of the corpus without modifying the corpus itself. This processor also accepts a selection of annotations, in which case only corpus files of these annotations will be kept.

  • group regexp Which group of corpus files to keep (regular expression)

  • file regexp Which file from the corpus to keep (regular expression)

  • tag regexp Files from the corpus to keep that have a tag satisfying this regular expression

  • filter js A custom filter function called for each file that should return true to keep the file, like (f)=>f.filename.includes('.eaf')

Input: Array<Annotation>
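
The "filter" callback can be extended slightly; the file shape ({ filename }) follows the example above, and the "pilot" naming convention is invented for this sketch.

```javascript
// Keep only ELAN (.eaf) files whose name marks them as pilot sessions.
// The "pilot" prefix is a hypothetical naming convention.
const keepPilotEaf = (f) =>
  f.filename.includes('.eaf') && f.filename.startsWith('pilot');

console.log(keepPilotEaf({ filename: 'pilot-01.eaf' }));   // true
console.log(keepPilotEaf({ filename: 'session-01.eaf' })); // false
```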


reduce-corpus-media

Filters media files from the corpus, keeping only files that match specific criteria. The "exclude" attribute can be used to instead exclude these files from the corpus. Useful for working on a subset of the corpus media files without modifying the corpus itself.

  • exclude bool = false Whether to exclude the filtered files instead of keeping them

  • file regexp Which media file from the corpus to keep (regular expression)

  • filter js A custom filter function called for each file that should return true to keep the file, like (mf)=>mf.extension.includes('mp4')
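
A slightly richer "filter" sketch, building on the example above. The { extension } field comes from that example; the list of video extensions is an assumption for illustration.

```javascript
// Keep only video media files, judged by extension.
const isVideo = (mf) => ['mp4', 'mov', 'avi'].includes(mf.extension);

console.log(isVideo({ extension: 'mp4' })); // true
console.log(isVideo({ extension: 'wav' })); // false
```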


remove-sequences-from-corpus

Takes a selection of annotations and modifies the corpus, using the input annotations as sequences; each sequence is removed from the corpus file, along with its associated media segment and all the annotations it contains.

Input: Array<Annotation>


rename-tiers

Rename tiers in all or specific corpus files.

  • file regexp Restrict file(s) on which to rename tiers

  • from string A single tier name to be replaced

  • to string The replacement name to use with the "from" attribute

  • map js A JSON object mapping the names to replace (keys) to the new names (values), like {"OLD NAME":"NEW NAME"}

  • template js = ${name} A template function to build the name of each tier, called with the original tier name and object, like (name,tier) => ${name.toUpperCase()}. When this attribute is specified, all tiers (optionally filtered via the "file" attribute) will be affected.
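
The "map" and "template" attributes can be sketched as follows; the tier names are invented for the example.

```javascript
// 1. A literal map from old tier names (keys) to new names (values).
const renameMap = { 'OLD NAME': 'NEW NAME', 'spk A': 'Speaker-A' };

// 2. A template function, called with the original name (and tier object),
//    e.g. upper-casing every tier name.
const template = (name, tier) => `${name.toUpperCase()}`;

console.log(renameMap['OLD NAME']); // 'NEW NAME'
console.log(template('spk A'));     // 'SPK A'
```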


reset-corpus

Resets the pipeline corpus to its original state.

Useful when working with loops.


sequences-to-corpus

Takes a selection of annotations and recreates the corpus, using the input annotations as sequences, each sequence being transformed into one corpus file with its associated media file and all the annotations/tiers included in that sequence.

  • name-template js = sequence-${i}-${a.af.id} The template function to build the name of each corpus file, provided with the annotation and its index, like (a,i) => ${i} - ${a.value}

Input: Array<Annotation>
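
Judging from the default value above, each annotation appears to expose its annotation file via a.af.id; treating that shape as an assumption, a custom "name-template" might look like:

```javascript
// Build a corpus-file name from the annotation's index and value.
const nameTemplate = (a, i) => `${i} - ${a.value}`;

// The default shown above, assuming a.af.id is the annotation file's id.
const defaultTemplate = (a, i) => `sequence-${i}-${a.af.id}`;

console.log(nameTemplate({ value: 'greeting' }, 3));    // '3 - greeting'
console.log(defaultTemplate({ af: { id: 'S01' } }, 3)); // 'sequence-3-S01'
```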


speaker-diarization-pyannote

Speaker diarization is the process of marking segments of voice with their speaker. This processor takes a selection of annotations, and adds to the corpus new annotations associated with their speaker tier.

  • hf-token string The HuggingFace access token, required to download pyannote models

  • speakers int = 0 Number of speakers in the audio

Input: Array<Annotation>
Output: Array<Annotation>

โญณ MIT LicenseSpeaker diarization ๐Ÿ”—

speech-to-text-faster-whisper

A speech-to-text processor using SYSTRAN Faster Whisper to transcribe and automatically create annotations

  • language string The language to transcribe, if not specified autodetect will be attempted

  • model select = tiny The trained model to use for transcription (tiny, small, medium, large, distil-large-v3)

  • temperature number = 0 Sampling temperature; adjust to reduce hallucinations

  • device select = auto The processing device to use (auto, cpu, cuda)

  • precision select = int8 The computation precision (auto, int8, fp16, fp32)

  • beam-size int = 5 The decoding beam size

  • batched bool = false Whether to use batch processing (faster)

  • word-timestamps bool = false Whether to output word-level timestamps

  • output-tier string = stt-faster-whisper Tier id for the extracted annotations

  • vad-threshold number = 0.5 Speech threshold. Silero VAD outputs speech probabilities for each audio chunk, probabilities ABOVE this value are considered as SPEECH. It is better to tune this parameter for each dataset separately, but "lazy" 0.5 is pretty good for most datasets.

  • vad-min-speech-duration int = 250 Final speech chunks shorter than this are thrown out (in milliseconds)

  • vad-max-speech-duration number Maximum duration of speech chunks, in seconds. Chunks longer than this will be split at the timestamp of the last silence lasting more than 100 ms (if any), to prevent aggressive cutting; otherwise, they will be split aggressively just before the maximum duration.

  • vad-min-silence-duration int = 2000 At the end of each speech chunk, wait this long (in milliseconds) before separating it

  • vad-speech-pad-ms int = 400 Final speech chunks are padded by this many milliseconds on each side

  • verbose bool = false Whether to log the transcriptions as soon as they are detected

Output: Array<Annotation>

โญณ MIT LicenseReimplementation of OpenAI's Whisper model using CTranslate2 ๐Ÿ”—

speech-to-text-whisper

A speech-to-text processor using OpenAI Whisper to transcribe and automatically create annotations

  • language string The language to transcribe, if not specified autodetect will be attempted

  • model string = small The trained model to use for transcription (tiny, small, medium, large-v3)

  • temperature number = 0 Sampling temperature; adjust to reduce hallucinations

  • output-tier string = stt-whisper Tier id for the extracted annotations

  • verbose bool = false Whether to log the transcriptions as soon as they are detected

Output: Array<Annotation>

โญณ MIT LicenseRobust Speech Recognition via Large-Scale Weak Supervision ๐Ÿ”—

speech-to-text-whisper-at 🧪

A variation of OpenAI Whisper designed to extract audio events from the 527-class AudioSet; the Whisper-AT processor outputs general audio events as annotations.

  • language string The language to transcribe will also affect the names of the audio events

  • model string = tiny The trained model to use for transcription (tiny, small, medium, large-v3)

Output: Array<Annotation>

โญณ BSD-2 LicenseNoise-Robust ASR are Also Strong Audio Event Taggers ๐Ÿ”—

vad-silero

Silero's Voice Activity Detector processor creates annotations for each segment of input audio containing voice.

  • output-tier string = vad-silero Tier id for the generated annotations

  • threshold number = 0.5 Speech probability threshold; use a higher threshold for noisy audio

  • sampling-rate number = 16000 The audio sampling rate, in Hz (8000, 16000, 32000, 48000)

  • min-silence-duration int = 500 At the end of each speech segment, wait this many milliseconds before separating it

  • min-speech-duration int = 1000 Minimum duration (in milliseconds) of activity to consider a voice segment

Output: Array<Annotation>

โญณ MIT LicensePre-trained enterprise-grade Voice Activity Detector ๐Ÿ”—
๐Ÿ—€ MIT LicenseSilero JIT and ONNX files

video-anonymizer-cartoon

Anonymize videos with a cartoon effect and optional blurring.

  • effect select = cartoon The type of anonymization effect to apply on the video (cartoon, cartoon-blur)

  • cartoon-diffspace number = 0.9995 The difference space parameter (between 0 and 1)

  • cartoon-triplevel number = 0.004 The trip level parameter (between 0 and 1)

  • blur-intensity number = 0.4 Intensity of blurring the cartoonized video

🗀 GPL-2.0 License · Video filters by Dyne.org 🔗
🗀 CC BY-NC-ND 4.0 License · Frei0r DLL pack for Windows by Gyan Doshi 🔗

video-anonymizer-deface

Detect and blur faces with ORB-HD deface.

  • mode select = blur Anonymization filter mode for face regions (blur, solid, mosaic)

  • threshold number = 0.2 Detection threshold (tune this to trade off between false positive and false negative rate)

  • mask-scale number = 1.3 Scale factor for face masks, to make sure that masks cover the complete face

  • mosaicsize int = 20 Width of the mosaic squares when mode is mosaic

  • boxes bool = false Use boxes instead of ellipse masks

  • draw-scores bool = false Draw detection scores onto outputs, useful to find the best threshold

  • downscale string Downscale resolution for the network inference (WxH)

โญณ MIT LicenseVideo anonymization by face detection ๐Ÿ”—