Processors

audio-anonymizer

Audio Anonymizer modifies corpus media files by applying audio filters to each input annotation segment. Available modes are:

  • silence: replaces each segment with complete silence (default)
  • noise: replaces each segment with a configurable noise
  • beep: replaces each segment with a configurable beep
  • voice: replaces each segment with a synthesized voice
  • file: replaces each segment with a custom audio file

(⚠ for now, works only on corpus files, not exported clips)

  • mode string = silence The mode of anonymization, an audio transformation that will be applied to each annotation segment. (silence, noise, beep, voice, file)

  • mute bool = true Whether to silence the segment before mixing in the noise/beep/voice/file

  • beep-frequency int = 800 The frequency to use for the sine beep sound

  • noise-amplitude number = 1 Amplitude of the generated noise (0, 1)

  • noise-color string = white Noise color (white, pink, brown, blue, violet, velvet)

  • noise-seed int Seed value for noise PRNG

  • noise-weight number = 1 Mixing weight of the noise sound

  • voice-text string Text to synthesize and use for anonymization. If not provided, the annotation's value will be used.

  • voice-amplitude int = 100 How loud the voice will be.

  • voice-pitch int = 50 The voice pitch

  • voice-speed int = 175 The speed at which to talk (words per minute)

  • voice-wordgap int = 0 Additional gap between words in 10 ms units

  • file string Path to an audio file to use for anonymization, when mode is set to "file"

๐Ÿท annotation-value="X" file="X.mp3" Use a custom audio file for annotations with a specific value
๐Ÿท annotation-file="X" output-file="Y.mp4" Map custom output filename for a given annotation file

Input: Array<Annotation>

🗀 GPL-3.0 License · Speech Synthesizer 🔗
🗀 GNU GPL License · Original eSpeak library ported by speak.js

create-tiers 🧪

Create tiers in all or specific corpus files.

  • missing-across-files bool = false Automatically create missing tiers across corpus files

  • file regexp Restrict file(s) on which to create tiers


demucs-separation

Demucs can separate voice and instruments from an audio track.

  • source string = vocals The source type to separate from the rest (vocals, drums)
⭳ MIT License · Hybrid Spectrogram and Waveform Source Separation 🔗

divide-tiers

Move annotations with specific values to specific tiers.

  • tier-match regexp Restrict tier(s) to divide

  • file-match regexp Restrict file(s) on which to divide tiers

  • move-to-tier js A function mapping an annotation to a specific tier, like (annotation)=>annotation.value.includes('eye')?'eyes':undefined. When undefined is returned, the annotation won't change tier.

  • create-tier js An optional function called for each tier that should return a string which will be used to create equivalent empty tiers, like (tier)=>tier.id.replace('eye','nose')
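
To make the two callbacks concrete, here is a minimal sketch in plain JavaScript. The annotation and tier shapes used ({ value }, { id }) follow the examples above; anything beyond that is an assumption about AVAA's objects.

```javascript
// Route gaze-related annotations to an "eyes" tier; returning undefined
// leaves the annotation on its current tier (per the note above).
const moveToTier = (annotation) =>
  annotation.value.includes('eye') ? 'eyes' : undefined;

// Derive the name of an equivalent empty tier from an existing tier id.
const createTier = (tier) => tier.id.replace('eye', 'nose');

console.log(moveToTier({ value: 'left eye blink' })); // 'eyes'
console.log(moveToTier({ value: 'smile' }));          // undefined
console.log(createTier({ id: 'eye-gaze' }));          // 'nose-gaze'
```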


export-corpus-media

Exports current corpus media files to a folder. The folder will be created in AVAA's temp directory.

  • folder string = exported-corpus-media Name of the folder to create and populate with the corpus media files

  • overwrite bool = false Whether to overwrite existing files in the folder, otherwise another folder will be created

  • copy-temp bool = false Whether to copy the files even if they come from the temp folder, otherwise temporary files are just moved to destination folder. If you use processors after export-corpus-media, you should activate this attribute so AVAA still finds the temporary files.


export-corpus-standalone 🧪

Exports current corpus media files together with a copy of the ORIGINAL corpus files, edited to reference the exported media files. This produces a standalone corpus folder which can be easily shared because it no longer contains absolute paths.

  • folder string = exported-corpus Name of the folder to create and populate with the corpus files. If the name contains a slash (/) it will be considered as an absolute folder path, otherwise the folder is created in AVAA's temp directory.

  • overwrite bool = false Whether to overwrite existing files in the folder, otherwise another folder will be created

  • copy-temp bool = false Whether to copy the files even if they come from the temp folder, otherwise temporary files are just moved to destination folder. If you use processors after export-corpus-standalone, you should activate this attribute so AVAA still finds the temporary files.


export-to-eaf 🧪

Exports current corpus to EAF format.

  • copy-media bool = false Whether to copy the associated media files next to EAF file, to use as relative path (good for sharing). Otherwise absolute path of media will be used (good for testing)

  • export-empty-tiers bool = false Whether to also export the tiers that don't have any annotation


export-to-srt 🧪

Exports a selection of annotations to SRT format

  • copy-media bool = false Whether to copy the associated media files next to SRT file

Input: Array<Annotation>


export-to-tei 🧪

Exports a selection of annotations (or an array of objects) to TEI format.

(⚠ Synchronization markers not implemented yet)

  • mode select = TEI corpus Method used to export the annotations (TEI corpus, TEI divisions, TEI files)

  • structure select = speaker Structure used to represent annotations (speaker, utterance)

  • author string The author to add to TEI file description

  • corpus-name string = tei-corpus Name of the exported corpus

  • corpus-subtitle string Subtitle for exported corpus and separate files

  • extension string = .tei.xml The extension of exported TEI files

  • folder string An optional specific folder name to save the TEI files

  • role-map js A JSON object mapping the objects' fields to their TEI roles (data, label...)

Input: Array<Annotation>
Input: Array<Object>
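
As a hypothetical illustration of the "role-map" attribute: keys are fields of the input objects, values are the TEI roles they should play. The field names here are invented for the example; the document only names "data" and "label" as roles.

```javascript
// Hypothetical role-map: map each object field to a TEI role.
// "text" and "speaker" are invented field names for illustration.
const roleMap = { text: 'data', speaker: 'label' };

console.log(roleMap.text);    // 'data'
console.log(roleMap.speaker); // 'label'
```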


ffmpeg-cut

Cuts a segment from each corpus media file.

This processor also accepts an array of annotations to cut multiple segments. In this case, the corpus is reduced to the relevant annotation files and each media file is replaced by its cuts, or by a single file concatenating all cuts when the "concat" attribute is set to true.

  • start string Starting point of the cut, in seconds

  • duration string Duration of cut segment

  • concat bool = false Whether to concatenate all segments when multiple are cut

Input: Array<Annotation>


ffmpeg-denoise 🧪

This processor calls FFmpeg's denoise feature.

  • FFT: Denoises audio with FFT.
  • NLM: Reduces broadband noise using a Non-Local Means algorithm.
  • RNN: Reduces noise from speech using Recurrent Neural Networks model.

Learn more about the RNN models

  • method string The denoise method to use (FFT, RNN, NLM)

  • rnn-model string = beguiling-drafter The RNN model to use (beguiling-drafter, conjoined-burgers, leavened-quisling, marathon-prescription,...)

  • rnn-mix number = 1 How much to mix filtered samples into final output. Allowed range is from -1 to 1. Default value is 1. Negative values are special, they set how much to keep filtered noise in the final filter output

  • rnn-threads int = 1 Number of threads (1 for stereo)

  • nlm-strength number = 0.00001 Set denoising strength (0.00001, 10000)

  • nlm-patch number = 0.002 Set patch radius duration, in seconds (0.001, 0.1)

  • nlm-research number = 0.006 Set research radius duration, from 2 to 300 milliseconds (0.002, 0.3)

  • nlm-smooth number = 11 Set smooth factor (1, 1000)

  • nlm-output select = output denoised Set the output mode (output denoised, input unchanged, noise only)

🗀 No License · Noise Removal Neural Network Models 🔗

ffmpeg-filter-audio

This processor calls FFmpeg with a user-defined audio filter.

Learn more about audio filters on the FFmpeg site.

  • filter-audio string The filter expression

  • sample-rate string If specified, audio output will also be resampled


ffmpeg-filter-complex

This processor calls FFmpeg with a user-defined filter-complex.

Learn more about complex filters on the FFmpeg site.

  • filter-complex string The filter expression

ffmpeg-frei0r 🧪

Applies a frei0r filter on each corpus media file.

  • filter string The frei0r filter to use (3dflippo, addition, addition_alpha, aech0r, alpha0ps_alpha0ps,...)

  • param-1 string A parameter to pass to the filter

  • param-2 string A parameter to pass to the filter

  • param-3 string A parameter to pass to the filter

🗀 GPL-2.0 License · Video filters by Dyne.org 🔗
🗀 CC BY-NC-ND 4.0 License · Frei0r DLL pack for Windows by Gyan Doshi 🔗

hardsub

Hardcodes annotations as subtitles on top of video. This processor will automatically use the values of the annotations that generated the clips, whenever they are available. It is possible to use different annotations from other tiers in range, by adding "source-tier" parameters.

  • include-tier-names bool = false Whether to include the tier names in the subtitles

  • include-tier-separator string = : A separator to add between tier name and subtitle text

  • extend-duration-before int = 0 Number of milliseconds to display subtitle before its original start time, so it is shown earlier

  • extend-duration-after int = 0 Number of milliseconds added after the original end time, so it stays visible longer

  • style-color color = #FFFFFF Subtitles color

  • style-opacity string = 100% Subtitles opacity, in %

  • style-outline-color color = #000000 Subtitles text outline color

  • style-outline-opacity string = 100% Opacity of text outline, in %

  • style-outline-width string Width of the text outline, in pixels

  • style-size int Subtitles size

  • style-bold bool = false Whether to render subtitles in bold

  • style-font string Subtitles font name

๐Ÿท source-tier="" name="" Add a tier from which to take subtitles text, and optionally customize its name


load-image-annotations

Upgrades each input annotation to a special image annotation, using the value of the annotation as the corresponding image filename, which should exist in the same folder.

  • folder string A folder name located next to the annotation file, to look into for image files (accepts wildcard)

Input: Array<Annotation>


media-converter 🧪

Use Media Converter to convert video and audio files into other formats

  • video-resolution string Resolution of converted video (like 1280x720)

  • audio-volume number = 1 Audio volume (0, 1)

  • format string Format of output media (wav, flac, aac, mp3, mp4, avi)

  • audio-codec string Audio codec (pcm_s16le)

  • audio-mono bool = false Whether to convert audio to mono

  • audio-rate int Custom audio sample rate

Input: Array<Annotation>


merge-annotation-files

Merge tiers from similar annotation files (which share the same media file).

  • file-comparator js A function called with 2 files as arguments, which should return true if the files are to be merged together

  • file-priority js A function called with an array of files as first argument, and a tier name as second argument, which should return the file that has priority in case of colliding tiers

  • merged-name js A function called with an array of files as argument, which should return a name for the created corpus file holding the merged tiers

  • tiers-order string A comma-separated list of tier names which will define the creation order of the tiers in the merged file

  • exclude-tiers regexp A regular expression to specify which tiers should be excluded from merging

Input: Array<Annotation>
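
As a sketch, the three callbacks might look like this in plain JavaScript. The file shape used here ({ filename }) is an assumption; AVAA's actual file objects may expose more fields.

```javascript
// Merge files whose names share the same session prefix,
// e.g. "S01.eaf" and "S01-extra.eaf".
const fileComparator = (a, b) =>
  a.filename.slice(0, 3) === b.filename.slice(0, 3);

// On colliding tiers, give priority to the file with the shortest
// name (assumed here to be the "main" file of the session).
const filePriority = (files, tierName) =>
  files.reduce((best, f) =>
    f.filename.length < best.filename.length ? f : best);

// Name the merged corpus file after all of its sources.
const mergedName = (files) =>
  files.map((f) => f.filename.replace(/\.eaf$/, '')).join('+') + '.eaf';

console.log(mergedName([{ filename: 'S01.eaf' }, { filename: 'S01-extra.eaf' }]));
// 'S01+S01-extra.eaf'
```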


r-script 🧪

This processor executes an R program and integrates the resulting data into the final HTML page. The resulting R output can be graphic files (jpg, png, gif, svg) or tabular text data. Arguments provided to R are, in order:

  • the temp directory path to work with and create result files in
  • the path to a JSON file consisting of the selection (annotations) or data provided to the processor

R scripts must follow a specific input/output syntax to be compatible with AVAA (see "Calling R" in the scripting guide).

  • file file .R file to run

  • source js Plain R source code to run


reduce-corpus

Reduces the corpus to specific files. Useful to work on a subset of the corpus without modifying the corpus itself. This processor also accepts a selection of annotations, in which case only corpus files of these annotations will be kept.

  • group regexp Which group of corpus files to keep (regular expression)

  • file regexp Which file from the corpus to keep (regular expression)

  • tag regexp Files from the corpus to keep that have a tag satisfying this regular expression

  • filter js A custom filter function called for each file that should return true to keep the file, like (f)=>f.filename.includes('.eaf')

Input: Array<Annotation>
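
The "filter" callback can be extended slightly; the file shape ({ filename }) follows the example above, and the "pilot" naming convention is invented for this sketch.

```javascript
// Keep only ELAN (.eaf) files whose name marks them as pilot sessions.
// The "pilot" prefix is a hypothetical naming convention.
const keepPilotEaf = (f) =>
  f.filename.includes('.eaf') && f.filename.startsWith('pilot');

console.log(keepPilotEaf({ filename: 'pilot-01.eaf' }));   // true
console.log(keepPilotEaf({ filename: 'session-01.eaf' })); // false
```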


reduce-corpus-media

Filters media files from the corpus, keeping only files that match specific criteria. The "exclude" attribute can be used to instead exclude these files from the corpus. Useful for working on a subset of the corpus media files without modifying the corpus itself.

  • exclude bool = false Whether to exclude the filtered files instead of keeping them

  • file regexp Which media file from the corpus to keep (regular expression)

  • filter js A custom filter function called for each file that should return true to keep the file, like (mf)=>mf.extension.includes('mp4')
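
A slightly richer "filter" sketch, building on the example above. The { extension } field comes from that example; the list of video extensions is an assumption for illustration.

```javascript
// Keep only video media files, judged by extension.
const isVideo = (mf) => ['mp4', 'mov', 'avi'].includes(mf.extension);

console.log(isVideo({ extension: 'mp4' })); // true
console.log(isVideo({ extension: 'wav' })); // false
```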


remove-sequences-from-corpus

Takes a selection of annotations and modifies the corpus, using the input annotations as sequences; each sequence is removed from the corpus file, along with its associated media segment and all the annotations it contains.

Input: Array<Annotation>


rename-tiers

Rename tiers in all or specific corpus files.

  • file regexp Restrict file(s) on which to rename tiers

  • from string A single tier name to be replaced

  • to string The replacement name to use with the "from" attribute

  • map js A JSON object mapping the names to replace (keys) to the new names (values), like {"OLD NAME":"NEW NAME"}

  • template js = ${name} A template function to build the name of each tier, called with the original tier name and object, like (name,tier) => ${name.toUpperCase()}. When this attribute is specified, all tiers (optionally filtered via the "file" attribute) will be affected.
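
The "map" and "template" attributes can be sketched as follows; the tier names are invented for the example.

```javascript
// 1. A literal map from old tier names (keys) to new names (values).
const renameMap = { 'OLD NAME': 'NEW NAME', 'spk A': 'Speaker-A' };

// 2. A template function, called with the original name (and tier object),
//    e.g. upper-casing every tier name.
const template = (name, tier) => `${name.toUpperCase()}`;

console.log(renameMap['OLD NAME']); // 'NEW NAME'
console.log(template('spk A'));     // 'SPK A'
```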


reset-corpus

Resets the pipeline corpus to its original state.

Useful when working with loops.


sequences-to-corpus

Takes a selection of annotations and recreates the corpus, using the input annotations as sequences, each sequence being transformed into one corpus file with its associated media file and all the annotations/tiers included in that sequence.

  • name-template js = sequence-${i}-${a.af.id} The template function to build the name of each corpus file, provided with the annotation and its index, like (a,i) => ${i} - ${a.value}

Input: Array<Annotation>
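
Judging from the default value above, each annotation appears to expose its annotation file via a.af.id; treating that shape as an assumption, a custom "name-template" might look like:

```javascript
// Build a corpus-file name from the annotation's index and value.
const nameTemplate = (a, i) => `${i} - ${a.value}`;

// The default shown above, assuming a.af.id is the annotation file's id.
const defaultTemplate = (a, i) => `sequence-${i}-${a.af.id}`;

console.log(nameTemplate({ value: 'greeting' }, 3));    // '3 - greeting'
console.log(defaultTemplate({ af: { id: 'S01' } }, 3)); // 'sequence-3-S01'
```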


speaker-diarization-pyannote

Speaker diarization is the process of marking segments of voice with their speaker. This processor takes a selection of annotations, and adds to the corpus new annotations associated with their speaker tier.

  • hf-token string The HuggingFace access token, required to download pyannote models

  • speakers int = 0 Number of speakers in the audio

Input: Array<Annotation>
Output: Array<Annotation>

โญณ MIT LicenseSpeaker diarization ๐Ÿ”—

speech-to-text-faster-whisper

A speech-to-text processor using SYSTRAN Faster Whisper to transcribe and automatically create annotations

  • language string The language to transcribe, if not specified autodetect will be attempted

  • model select = tiny The trained model to use for transcription (tiny, small, medium, large, distil-large-v3)

  • temperature number = 0 Sampling temperature; adjust to reduce hallucinations

  • device select = auto The processing device to use (auto, cpu, cuda)

  • precision select = int8 The computation precision (auto, int8, fp16, fp32)

  • beam-size int = 5 The decoding beam size

  • batched bool = false Whether to use batch processing (faster)

  • word-timestamps bool = false Whether to output word-level timestamps

  • output-tier string = stt-faster-whisper Tier id for the extracted annotations

  • vad-threshold number = 0.5 Speech threshold. Silero VAD outputs speech probabilities for each audio chunk, probabilities ABOVE this value are considered as SPEECH. It is better to tune this parameter for each dataset separately, but "lazy" 0.5 is pretty good for most datasets.

  • vad-min-speech-duration int = 250 Final speech chunks shorter than this are thrown out (in milliseconds)

  • vad-max-speech-duration number Maximum duration of speech chunks, in seconds. Chunks longer than this will be split at the timestamp of the last silence lasting more than 100 ms (if any), to prevent aggressive cutting; otherwise, they will be split aggressively just before the maximum duration.

  • vad-min-silence-duration int = 2000 At the end of each speech chunk, wait this long (in milliseconds) before separating it

  • vad-speech-pad-ms int = 400 Final speech chunks are padded by this many milliseconds on each side

  • verbose bool = false Whether to log the transcriptions as soon as they are detected

Output: Array<Annotation>

โญณ MIT LicenseReimplementation of OpenAI's Whisper model using CTranslate2 ๐Ÿ”—

speech-to-text-whisper

A speech-to-text processor using OpenAI Whisper to transcribe and automatically create annotations

  • language string The language to transcribe, if not specified autodetect will be attempted

  • model string = small The trained model to use for transcription (tiny, small, medium, large-v3)

  • temperature number = 0 Sampling temperature; adjust to reduce hallucinations

  • output-tier string = stt-whisper Tier id for the extracted annotations

  • verbose bool = false Whether to log the transcriptions as soon as they are detected

Output: Array<Annotation>

โญณ MIT LicenseRobust Speech Recognition via Large-Scale Weak Supervision ๐Ÿ”—

speech-to-text-whisper-at 🧪

A variation of OpenAI Whisper designed to extract audio events from the 527-class AudioSet; the Whisper-AT processor outputs general audio events as annotations.

  • language string The language to transcribe will also affect the names of the audio events

  • model string = tiny The trained model to use for transcription (tiny, small, medium, large-v3)

Output: Array<Annotation>

โญณ BSD-2 LicenseNoise-Robust ASR are Also Strong Audio Event Taggers ๐Ÿ”—

vad-silero

Silero's Voice Activity Detector processor creates annotations for each segment of input audio containing voice.

  • output-tier string = vad-silero Tier id for the generated annotations

  • threshold number = 0.5 Speech probability threshold; use a higher threshold for noisy audio

  • sampling-rate number = 16000 The audio sampling rate, in Hz (8000, 16000, 32000, 48000)

  • min-silence-duration int = 500 At the end of each speech segment, wait this many milliseconds before separating it

  • min-speech-duration int = 1000 Minimum duration (in milliseconds) of activity to consider a voice segment

Output: Array<Annotation>

โญณ MIT LicensePre-trained enterprise-grade Voice Activity Detector ๐Ÿ”—
๐Ÿ—€ MIT LicenseSilero JIT and ONNX files

video-anonymizer-cartoon

Anonymize videos with a cartoon effect and optional blurring.

  • effect select = cartoon The type of anonymization effect to apply on the video (cartoon, cartoon-blur)

  • cartoon-diffspace number = 0.9995 The difference space parameter (between 0 and 1)

  • cartoon-triplevel number = 0.004 The trip level parameter (between 0 and 1)

  • blur-intensity number = 0.4 Intensity of blurring the cartoonized video

🗀 GPL-2.0 License · Video filters by Dyne.org 🔗
🗀 CC BY-NC-ND 4.0 License · Frei0r DLL pack for Windows by Gyan Doshi 🔗

video-anonymizer-deface

Detect and blur faces with ORB-HD deface.

  • mode select = blur Anonymization filter mode for face regions (blur, solid, mosaic)

  • threshold number = 0.2 Detection threshold (tune this to trade off between false positive and false negative rate)

  • mask-scale number = 1.3 Scale factor for face masks, to make sure that masks cover the complete face

  • mosaicsize int = 20 Width of the mosaic squares when mode is mosaic

  • boxes bool = false Use boxes instead of ellipse masks

  • draw-scores bool = false Draw detection scores onto outputs, useful to find the best threshold

  • downscale string Downscale resolution for the network inference (WxH)

โญณ MIT LicenseVideo anonymization by face detection ๐Ÿ”—