Processors
audio-anonymizer
Audio Anonymizer modifies corpus media files by applying audio filters to each input annotation segment. Available modes are:
- silence: replaces each segment with complete silence (default)
- noise: replaces each segment with a configurable noise
- beep: replaces each segment with a configurable beep
- voice: replaces each segment with a synthesized voice
- file: replaces each segment with a custom audio file
(⚠ for now this works only on corpus files, not on exported clips)
- mode (string) = silence: The mode of anonymization, an audio transformation applied to each annotation segment (silence, noise, beep, voice, file)
- mute (bool) = true: Whether to silence the segment before mixing in the noise/beep/voice/file
- beep-frequency (int) = 800: The frequency of the sine beep sound
- noise-amplitude (number) = 1: Amplitude of the generated noise (0, 1)
- noise-color (string) = white: Noise color (white, pink, brown, blue, violet, velvet)
- noise-seed (int): Seed value for the noise PRNG
- noise-weight (number) = 1: Mixing weight of the noise sound
- voice-text (string): Text to synthesize and use for anonymization. If not provided, the annotation's value is used.
- voice-amplitude (int) = 100: How loud the voice will be
- voice-pitch (int) = 50: The voice pitch
- voice-speed (int) = 175: The speaking speed (words per minute)
- voice-wordgap (int) = 0: Additional gap between words, in 10 ms units
- file (string): Path to an audio file to use for anonymization, when mode is set to "file"
🏷 annotation-value="X" file="X.mp3": Use a custom audio file for annotations with a specific value
🏷 annotation-file="X" output-file="Y.mp4": Map a custom output filename for a given annotation file
Input: Array<Annotation>
create-tiers 🧪
Create tiers in all or specific corpus files.
- missing-across-files (bool) = false: Automatically create missing tiers across corpus files
- file (regexp): Restrict the file(s) in which to create tiers
demucs-separation
Demucs can separate voice and instruments from an audio track.
- source (string) = vocals: The source type to separate from the rest (vocals, drums)
divide-tiers
Move annotations with specific values to specific tiers.
- tier-match (regexp): Restrict the tier(s) to divide
- file-match (regexp): Restrict the file(s) in which to divide tiers
- move-to-tier (js): A function to map an annotation to a specific tier, like (annotation)=>annotation.value.includes('eye')?'eyes':undefined. When undefined is returned, the annotation won't change tier.
- create-tier (js): An optional function called for each tier that should return a string, used to create equivalent empty tiers, like (tier)=>tier.id.replace('eye','nose')
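A move-to-tier function could be sketched as follows, assuming annotations expose a value property as in the example above; the target tier ids 'eyes' and 'nose' are invented for illustration:

```javascript
// Hypothetical move-to-tier function for divide-tiers: receives an
// annotation and returns the id of the tier it should move to, or
// undefined to leave the annotation on its current tier.
const moveToTier = (annotation) => {
  const value = annotation.value.toLowerCase();
  if (value.includes('eye')) return 'eyes';
  if (value.includes('nose')) return 'nose';
  return undefined; // annotation keeps its current tier
};
```
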
export-corpus-media
Exports current corpus media files to a folder. The folder will be created in AVAA's temp directory.
- folder (string) = exported-corpus-media: Name of the folder to create and populate with the corpus media files
- overwrite (bool) = false: Whether to overwrite existing files in the folder; otherwise another folder will be created
- copy-temp (bool) = false: Whether to copy the files even if they come from the temp folder; otherwise temporary files are just moved to the destination folder. If you use processors after export-corpus-media, you should activate this attribute so AVAA still finds the temporary files.
export-corpus-standalone 🧪
Exports current corpus media files together with a copy of the ORIGINAL corpus files, edited to reference the exported media files. This produces a standalone corpus folder which can be easily shared because it does not contain absolute paths anymore.
- folder (string) = exported-corpus: Name of the folder to create and populate with the corpus files. If the name contains a slash (/) it is treated as an absolute folder path; otherwise the folder is created in AVAA's temp directory.
- overwrite (bool) = false: Whether to overwrite existing files in the folder; otherwise another folder will be created
- copy-temp (bool) = false: Whether to copy the files even if they come from the temp folder; otherwise temporary files are just moved to the destination folder. If you use processors after export-corpus-standalone, you should activate this attribute so AVAA still finds the temporary files.
export-to-eaf 🧪
Exports current corpus to EAF format.
- copy-media (bool) = false: Whether to copy the associated media files next to the EAF file, to reference them by relative path (good for sharing). Otherwise the absolute media paths are used (good for testing).
- export-empty-tiers (bool) = false: Whether to also export the tiers that don't have any annotation
export-to-srt 🧪
Exports a selection of annotations to SRT format.
- copy-media (bool) = false: Whether to copy the associated media files next to the SRT file
Input: Array<Annotation>
export-to-tei 🧪
Exports a selection of annotations (or an array of objects) to TEI format.
(⚠ Synchronization markers not implemented yet)
- mode (select) = TEI corpus: Method used to export the annotations (TEI corpus, TEI divisions, TEI files)
- structure (select) = speaker: Structure used to represent annotations (speaker, utterance)
- author (string): The author to add to the TEI file description
- corpus-name (string) = tei-corpus: Name of the exported corpus
- corpus-subtitle (string): Subtitle for the exported corpus and separate files
- extension (string) = .tei.xml: The extension of exported TEI files
- folder (string): An optional specific folder name in which to save the TEI files
- role-map (js): A JSON object mapping the objects' fields to their TEI role (data, label...)
Input: Array<Annotation>
Input: Array<Object>
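A role-map value could be sketched as below; only the roles data and label come from the description above, while the field names word and gloss are invented for illustration:

```javascript
// Hypothetical role-map for export-to-tei: keys are fields of the
// input objects, values are the TEI role each field should play.
const roleMap = {
  word: 'data',   // main content of each object (assumed field name)
  gloss: 'label', // an annotation label (assumed field name)
};
```
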
ffmpeg-cut
Cuts a segment from each corpus media file.
This processor also accepts an array of annotations to cut multiple segments. In this case, the corpus is reduced to the relevant annotation files and each media file is replaced by its cuts, or by a single file merged from all cuts when the "concat" attribute is set to true.
- start (string): Starting point, in seconds, from which to cut
- duration (string): Duration of the cut segment
- concat (bool) = false: Whether to concatenate all segments in case multiple are cut
Input: Array<Annotation>
ffmpeg-denoise 🧪
This processor calls FFmpeg's denoise feature.
- FFT: Denoises audio with FFT.
- NLM: Reduces broadband noise using a Non-Local Means algorithm.
- RNN: Reduces noise from speech using Recurrent Neural Networks model.
Learn more about the RNN models
- method (string): The denoise method to use (FFT, RNN, NLM)
- rnn-model (string) = beguiling-drafter: The RNN model to use (beguiling-drafter, conjoined-burgers, leavened-quisling, marathon-prescription, ...)
- rnn-mix (number) = 1: How much to mix filtered samples into the final output. Allowed range is -1 to 1; default is 1. Negative values are special: they set how much filtered noise to keep in the final filter output.
- rnn-threads (int) = 1: Number of threads (1 for stereo)
- nlm-strength (number) = 0.00001: Denoising strength. Allowed range is 0.00001 to 10000 (0.00001, 10000)
- nlm-patch (number) = 0.002: Patch radius duration. Allowed range is 0.001 to 0.1 (0.001, 0.1)
- nlm-research (number) = 0.006: Research radius duration. Allowed range is 2 to 300 milliseconds (0.002, 0.3)
- nlm-smooth (number) = 11: Smooth factor. Allowed range is 1 to 1000 (1, 1000)
- nlm-output (select) = output denoised: The output mode (output denoised, input unchanged, noise only)
ffmpeg-filter-audio
This processor calls FFmpeg with a user-defined audio filter.
Learn more about audio filters on the FFmpeg site.
- filter-audio (string): The filter expression
- sample-rate (string): If specified, the audio output will also be resampled
ffmpeg-filter-complex
This processor calls FFmpeg with a user-defined filter-complex.
Learn more about complex filters on the FFmpeg site.
- filter-complex (string): The filter expression
ffmpeg-frei0r 🧪
Applies a frei0r filter on each corpus media file.
- filter (string): The frei0r filter to use (3dflippo, addition, addition_alpha, aech0r, alpha0ps_alpha0ps, ...)
- param-1 (string): A parameter to pass to the filter
- param-2 (string): A parameter to pass to the filter
- param-3 (string): A parameter to pass to the filter
hardsub
Hardcodes annotations as subtitles on top of the video. This processor automatically uses the values of the annotations that generated the clips, whenever they are available. It is possible to use different annotations from other tiers in range by adding "source-tier" parameters.
- include-tier-names (bool) = false: Whether to include the tier names in the subtitles
- include-tier-separator (string) = ":": A separator to add between the tier name and the subtitle text
- extend-duration-before (int) = 0: Number of milliseconds to display the subtitle before its original start time, so it is shown earlier
- extend-duration-after (int) = 0: Number of milliseconds added after the original end time, so it stays visible longer
- style-color (color) = #FFFFFF: Subtitles color
- style-opacity (string) = 100%: Subtitles opacity, in %
- style-outline-color (color) = #000000: Subtitles text outline color
- style-outline-opacity (string) = 100%: Opacity of the text outline, in %
- style-outline-width (string): Width of the text outline, in pixels
- style-size (int): Subtitles size
- style-bold (bool) = false: Whether subtitles are bold
- style-font (string): Subtitles font name
🏷 source-tier="" name="": Add a tier from which to take the subtitles text, and optionally customize its name
load-image-annotations
Upgrades each input annotation to a special image annotation, using the value of the annotation as the corresponding image filename, which should exist in the same folder.
- folder (string): A folder name located next to the annotation file, to look in for image files (accepts wildcards)
Input: Array<Annotation>
media-converter 🧪
Use Media Converter to convert video and audio files into other formats.
- video-resolution (string): Resolution of the converted video (like 1280x720)
- audio-volume (number) = 1: Audio volume (0, 1)
- format (string): Format of the output media (wav, flac, aac, mp3, mp4, avi)
- audio-codec (string): Audio codec (pcm_s16le)
- audio-mono (bool) = false: Whether to convert audio to mono
- audio-rate (int): Custom audio sample rate
Input: Array<Annotation>
merge-annotation-files
Merge tiers from similar annotation files (which reference the same media).
- file-comparator (js): A function called with 2 files as arguments, which should return true if the files are to be merged together
- file-priority (js): A function called with an array of files as first argument and a tier name as second argument, which should return the file that has priority in case of colliding tiers
- merged-name (js): A function called with an array of files as argument, which should return a name for the created corpus file holding the merged tiers
- tiers-order (string): A comma-separated list of tier names defining the creation order of the tiers in the merged file
- exclude-tiers (regexp): A regular expression specifying which tiers should be excluded from merging
Input: Array<Annotation>
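The three js attributes above can be sketched as follows; the file properties used here (mediaFile, name) are assumptions for illustration, not documented fields:

```javascript
// Hypothetical callbacks for merge-annotation-files.
// file-comparator: merge files that reference the same media.
const fileComparator = (a, b) => a.mediaFile === b.mediaFile;

// file-priority: on colliding tiers, prefer a file whose name marks
// it as checked, otherwise fall back to the first file.
const filePriority = (files, tierName) =>
  files.find((f) => f.name.includes('checked')) || files[0];

// merged-name: name the merged corpus file after its sources.
const mergedName = (files) => files.map((f) => f.name).join('+');
```
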
r-script 🧪
This processor executes an R program and integrates the resulting data into the final HTML page. The R output can be graphic files (jpg, png, gif, svg) or tabular text data. Arguments provided to R are, in order:
- the temp directory path to work in and create result files
- the path to a JSON file containing the selection (annotations) or data provided to the processor
R scripts must follow a specific input/output syntax to be compatible with AVAA (see "Calling R" in the scripting guide).
- file (file): .R file to run
- source (js): Plain R source code to run
reduce-corpus
Reduces the corpus to specific files. Useful to work on a subset of the corpus without modifying the corpus itself. This processor also accepts a selection of annotations, in which case only corpus files of these annotations will be kept.
- group (regexp): Which group of corpus files to keep (regular expression)
- file (regexp): Which file from the corpus to keep (regular expression)
- tag (regexp): Keep files from the corpus that have a tag matching this regular expression
- filter (js): A custom filter function called for each file, which should return true to keep the file, like (f)=>f.filename.includes('.eaf')
Input: Array<Annotation>
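Building on the filter example above, a sketch that keeps only ELAN (.eaf) files from one recording session; the 'session1' marker is an invented example:

```javascript
// Hypothetical reduce-corpus filter: keep .eaf files whose name
// contains a session marker, assuming files expose "filename".
const keepFile = (f) =>
  f.filename.includes('.eaf') && f.filename.includes('session1');
```
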
reduce-corpus-media
Filters media files from the corpus, keeping only files that match specific criteria. The "exclude" attribute can be used to instead exclude these files from the corpus. Useful to work on a subset of the corpus media files without modifying the corpus itself.
- exclude (bool) = false: Whether to exclude the filtered files instead of keeping them
- file (regexp): Which media file from the corpus to keep (regular expression)
- filter (js): A custom filter function called for each file, which should return true to keep the file, like (mf)=>mf.extension.includes('mp4')
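Extending the filter example above, a sketch that keeps only video files; the list of video extensions is an assumption:

```javascript
// Hypothetical reduce-corpus-media filter: keep only video files,
// assuming media files expose an "extension" property as shown above.
const keepVideo = (mf) =>
  ['mp4', 'avi', 'mov'].some((ext) => mf.extension.includes(ext));
```
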
remove-sequences-from-corpus
Takes a selection of annotations and modifies the corpus, using the input annotations as sequences. Each sequence is removed from the corpus file, together with its associated media segment and all the annotations included in that sequence.
Input: Array<Annotation>
rename-tiers
Rename tiers in all or specific corpus files.
- file (regexp): Restrict the file(s) in which to rename tiers
- from (string): A single tier name to be replaced
- to (string): The replacement name to use with the "from" attribute
- map (js): A JSON object mapping the names to replace (keys) to the new names (values), like {"OLD NAME":"NEW NAME"}
- template (js) = ${name}: A template function to build the name of each tier, called with the original tier name and tier object, like (name,tier)=>`${name.toUpperCase()}`. When this attribute is specified, all tiers (optionally filtered via the "file" attribute) are affected.
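A template function could be sketched as below, following the signature above; the normalization scheme (uppercase, underscores) is an invented example:

```javascript
// Hypothetical rename-tiers template: normalize every tier name to
// uppercase with underscores instead of spaces.
const template = (name, tier) => name.toUpperCase().replace(/ /g, '_');
```
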
reset-corpus
Resets the pipeline corpus to its original state.
Useful when working with loops.
sequences-to-corpus
Takes a selection of annotations and recreates the corpus, using the input annotations as sequences. Each sequence is transformed into one corpus file, with its associated media file and all the annotations/tiers included in that sequence.
- name-template (js) = sequence-${i}-${a.af.id}: The template function to build the name of each corpus file, provided with the annotation and its index, like (a,i)=>`${i} - ${a.value}`
Input: Array<Annotation>
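A custom name-template could be sketched as follows, mirroring the default shown above; truncating the annotation value to keep names short is an invented choice:

```javascript
// Hypothetical name-template for sequences-to-corpus: combine the
// sequence index with a shortened annotation value.
const nameTemplate = (a, i) => `sequence-${i}-${a.value.slice(0, 10)}`;
```
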
speaker-diarization-pyannote
Speaker diarization is the process of marking segments of voice with their speaker. This processor takes a selection of annotations, and adds to the corpus new annotations associated with their speaker tier.
- hf-token (string): The HuggingFace access token, required to download pyannote models
- speakers (int) = 0: Number of speakers in the audio
Input: Array<Annotation>
Output: Array<Annotation>
speech-to-text-faster-whisper
A speech-to-text processor using SYSTRAN Faster Whisper to transcribe and automatically create annotations.
- language (string): The language to transcribe; if not specified, autodetection will be attempted
- model (select) = tiny: The trained model to use for transcription (tiny, small, medium, large, distil-large-v3)
- temperature (number) = 0: Temperature, adjust to fix hallucinations
- device (select) = auto: The processing device to use (auto, cpu, cuda)
- precision (select) = int8: The computation precision (auto, int8, fp16, fp32)
- beam-size (int) = 5: The decoding beam size
- batched (bool) = false: Whether to use batch processing (faster)
- word-timestamps (bool) = false: Whether to output word-level timestamps
- output-tier (string) = stt-faster-whisper: Tier id for the extracted annotations
- vad-threshold (number) = 0.5: Speech threshold. Silero VAD outputs speech probabilities for each audio chunk; probabilities ABOVE this value are considered SPEECH. It is better to tune this parameter for each dataset separately, but a "lazy" 0.5 is pretty good for most datasets.
- vad-min-speech-duration (int) = 250: Final speech chunks shorter than this are thrown out (in milliseconds)
- vad-max-speech-duration (number): Maximum duration of speech chunks, in seconds. Chunks longer than this will be split at the timestamp of the last silence lasting more than 100 ms (if any), to prevent aggressive cutting. Otherwise, they will be split aggressively just before vad-max-speech-duration.
- vad-min-silence-duration (int) = 2000: At the end of each speech chunk, wait this long before separating it (in milliseconds)
- vad-speech-pad-ms (int) = 400: Final speech chunks are padded by this many milliseconds on each side
- verbose (bool) = false: Whether to log the transcriptions as soon as they are detected
Output: Array<Annotation>
speech-to-text-whisper
A speech-to-text processor using OpenAI Whisper to transcribe and automatically create annotations.
- language (string): The language to transcribe; if not specified, autodetection will be attempted
- model (string) = small: The trained model to use for transcription (tiny, small, medium, large-v3)
- temperature (number) = 0: Temperature, adjust to fix hallucinations
- output-tier (string) = stt-whisper: Tier id for the extracted annotations
- verbose (bool) = false: Whether to log the transcriptions as soon as they are detected
Output: Array<Annotation>
speech-to-text-whisper-at 🧪
A variation of OpenAI Whisper designed to extract audio events from the 527-class AudioSet; the Whisper-AT processor outputs general audio events as annotations.
- language (string): The language to transcribe; also affects the names of the audio events
- model (string) = tiny: The trained model to use for transcription (tiny, small, medium, large-v3)
Output: Array<Annotation>
vad-silero
Silero's Voice Activity Detector processor creates annotations for each segment of input audio containing voice.
- output-tier (string) = vad-silero: Tier id for the generated annotations
- threshold (number) = 0.5: Use a higher threshold for noisy audio
- sampling-rate (number) = 16000: The audio sampling rate (8000, 16000, 32000, 48000)
- min-silence-duration (int) = 500: Minimum silence duration, in milliseconds
- min-speech-duration (int) = 1000: Minimum duration (in milliseconds) of activity to consider a voice segment
Output: Array<Annotation>
video-anonymizer-cartoon
Anonymize videos with a cartoon effect and optional blurring.
- effect (select) = cartoon: The type of anonymization effect to apply to the video (cartoon, cartoon-blur)
- cartoon-diffspace (number) = 0.9995: The difference space parameter (between 0 and 1)
- cartoon-triplevel (number) = 0.004: The trip level parameter (between 0 and 1)
- blur-intensity (number) = 0.4: Intensity of blurring of the cartoonized video
video-anonymizer-deface
Detect and blur faces with ORB-HD deface.
- mode (select) = blur: Anonymization filter mode for face regions (blur, solid, mosaic)
- threshold (number) = 0.2: Detection threshold (tune this to trade off between false positive and false negative rates)
- mask-scale (number) = 1.3: Scale factor for face masks, to make sure that masks cover the complete face
- mosaicsize (int) = 20: Width of the mosaic squares when mode is set to mosaic
- boxes (bool) = false: Use boxes instead of ellipse masks
- draw-scores (bool) = false: Draw detection scores onto outputs, useful to find the best threshold
- downscale (string): Downscale resolution for the network inference (WxH)