Operations

2 🗸

calculate-duration-by

Calculates the sum, average, minimum or maximum of annotations' durations (in milliseconds), grouped by a property (value/tier/group/participant).

If the property is not specified, the input object is used directly as the groups.

If the property is the asterisk (wildcard), input annotations will be grouped in a single asterisk group.

(arg) string ~(group, tier, participant, value, start, duration, file, extra.*)
mode select = sum (sum, average, min, max, all)
factor int = 1 Convert milliseconds to other units
truncate int = -1 Truncate factored value to X decimal places

Input: Array<Annotation>
Input: Object<Array<Annotation>>
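
The grouping and aggregation described above can be sketched in plain JavaScript. This is only an illustration, not the tool's implementation: the helper names `durationBy` and `truncateTo` are hypothetical, annotations are assumed to be plain objects with a `duration` in milliseconds, and `factor` is assumed to divide the result (so that 1000 yields seconds).

```javascript
// Hypothetical sketch, not the actual operation.
const stats = {
  sum: (d) => d.reduce((acc, x) => acc + x, 0),
  average: (d) => d.reduce((acc, x) => acc + x, 0) / d.length,
  min: (d) => Math.min(...d),
  max: (d) => Math.max(...d),
};

// Assumption: a negative number of places means "do not truncate".
function truncateTo(value, places) {
  if (places < 0) return value;
  const scale = 10 ** places;
  return Math.trunc(value * scale) / scale;
}

function durationBy(annotations, property, { mode = "sum", factor = 1, truncate = -1 } = {}) {
  // Collect durations per group; "*" puts everything in a single group.
  const groups = {};
  for (const a of annotations) {
    const key = property === "*" ? "*" : String(a[property]);
    if (!groups[key]) groups[key] = [];
    groups[key].push(a.duration);
  }
  // Aggregate each group, then convert units (assumed division) and truncate.
  const result = {};
  for (const [key, durations] of Object.entries(groups)) {
    result[key] = truncateTo(stats[mode](durations) / factor, truncate);
  }
  return result;
}
```

For instance, calling `durationBy(annotations, "participant", { mode: "average", factor: 1000, truncate: 1 })` would return the average duration per participant in seconds, truncated to one decimal place, under the assumptions stated above.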

1 🗸

calculate-duration-of-pause

Calculates the sum, average or another statistic of the pause durations between consecutive annotations (in milliseconds).

mode string = sum (sum, average, median, min, max, all)
factor int = 1 Convert milliseconds to other units
truncate int = -1 Truncate factored value to X decimal places
min-threshold int = 0 Min pause in milliseconds between 2 consecutive annotations (pauses shorter than this will not be counted)
max-threshold int = 500 Max pause in milliseconds between 2 consecutive annotations (pauses longer than this will not be counted)

Input: Array<Annotation>
Input: Object<Array<Annotation>>
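
As a rough sketch of how the two thresholds interact, the snippet below collects the pauses between consecutive annotations and keeps those inside the threshold window. The function name `pauseDurations` is hypothetical, and annotations are assumed to be plain objects with `start` and `stop` in milliseconds; the real operation may differ.

```javascript
// Hypothetical sketch, not the actual operation.
// Assumes `start` and `stop` are expressed in milliseconds.
function pauseDurations(annotations, { minThreshold = 0, maxThreshold = 500 } = {}) {
  const sorted = [...annotations].sort((a, b) => a.start - b.start);
  const pauses = [];
  for (let i = 1; i < sorted.length; i++) {
    // A pause is the gap between the end of one annotation and the start of the next.
    const pause = sorted[i].start - sorted[i - 1].stop;
    // Pauses shorter than min or longer than max are not counted.
    if (pause >= minThreshold && pause <= maxThreshold) pauses.push(pause);
  }
  return pauses;
}
```

The selected mode (sum, average, median...) would then be applied to the returned list. Note that overlapping annotations produce a negative gap, which the default `minThreshold` of 0 excludes in this sketch.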

calculate-percent-by

Calculates the percentage of occurrences of each value of an annotation property.

(arg) string ~(group, tier, participant, value, start, duration, file, extra.*)
truncate int = 0 Truncate factored value to X decimal places

Input: Object<Array<Annotation>>

9 🕮

clone

Clones each annotation in the selection, so they can be modified without affecting the originals.

Cloned annotations are not added to the corpus, they exist temporarily as a selection.

Input: Array<Annotation>
Output: Array<Annotation>

combine-overlapping-annotations

Combines all overlapping annotations into one annotation.

Input: Array<Annotation>
Output: Array<Annotation>
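
Combining overlapping annotations is essentially an interval merge. The sketch below shows that idea only; `combineOverlapping` is a hypothetical name, annotations are assumed to be plain objects with `start`, `stop` and `value`, values are joined with a space (an assumption), and annotations that merely touch are not treated as overlapping.

```javascript
// Hypothetical sketch of an interval merge, not the actual operation.
function combineOverlapping(annotations) {
  const sorted = [...annotations].sort((a, b) => a.start - b.start);
  const merged = [];
  for (const a of sorted) {
    const last = merged[merged.length - 1];
    if (last && a.start < last.stop) {
      // Overlap: extend the previous annotation and append the value.
      last.stop = Math.max(last.stop, a.stop);
      last.value = `${last.value} ${a.value}`;
    } else {
      // No overlap: start a new combined annotation (copied, so inputs stay untouched).
      merged.push({ ...a });
    }
  }
  return merged;
}
```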

combine-same-tier-consecutive-annotations

Combines all consecutive annotations of the same tier into one annotation.

value-separator string = , The separator to use when combining annotation values that lack punctuation
sentence-separator string A separator to insert between sentences

Input: Array<Annotation>
Output: Array<Annotation>

control-vocabulary

Checks each annotation value against a specified controlled vocabulary. If the value does not match a controlled vocabulary item, a warning is issued. The operation can also fix common mistakes, in which case the original input annotations are modified (use the "clone" operation beforehand to keep the originals unmodified).

vocabulary string The Controlled Vocabulary ID to use for checking the value of the annotations
fix-whitespaces bool = false Whether to correct values whose mistake is surrounding whitespace
fix-repeats bool = false Whether to correct values whose mistake is a repeated value
return select = input The operation by default returns the original (possibly corrected) input, but can also return the list of valid (or invalid) annotations (input, valid, invalid, corrected)

Input: Array<Annotation>
Output: Array<Annotation>

convert-to-slices

Takes each tier and slices its annotations to segments of a specific duration, aligning all the segments between tiers. The initial alignment starts with the earliest annotation found.

slice-duration number = 0 The duration of each slice annotation, in seconds
slice-count int = 0 Alternative to slice-duration, the total amount of slices to create
overlap-threshold number = 0 todo

Input: Array<Annotation>
Output: Array<Annotation>

14 🗸

count

Counts the elements in each array element of the input array/object, replacing the array element itself with an integer value

depth int = 0 How deep to go in the input tree before counting
remove-uncountable bool = false Whether to remove the properties that are not countable
remove-uncountable-elements bool = false Whether to remove the uncountable elements from array, resulting in a shorter array
count-strings bool = false Whether to consider strings as arrays for counting

Input: Array<Array>
Input: Object<Array>
Output: Array<Integer>
Output: Object<Integer>

1 🗸

count-by

Groups array elements by a specific property and counts them.

(arg) string The property to group elements by (group, tier, participant, value, start, duration, file, extra.*)

Input: Array<Annotation>
Input: Object<Array<Annotation>>
Output: Object<Integer>
Output: Object<Array<Integer>>

count-object-keys

Counts keys across all input objects.

Input: Array<Object>
Output: Object

detect-sequences

Detects sequences by grouping annotations that are close to each other.

range number = 30 Minimum number of seconds between two annotations to consider as a new sequence
to-tier string = sequences The created tier which will contain the sequences' annotations
output choices = array The output type, which can be an Array of sequences' annotations, or an Object mapping a sequence to its list of annotations (array, object)

Input: Array<Annotation>
Output: Array<Annotation>
Output: Object<Array<Annotation>>
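
The role of the range parameter can be illustrated with a small sketch: an annotation joins the current sequence when the gap since that sequence's end is below the range, and opens a new sequence otherwise. The name `detectSequences`, the `members` property and the millisecond `start`/`stop` fields are assumptions made for this example.

```javascript
// Hypothetical sketch, not the actual operation.
// Assumes `start` and `stop` are in milliseconds and `range` is in seconds.
function detectSequences(annotations, { range = 30, toTier = "sequences" } = {}) {
  const sorted = [...annotations].sort((a, b) => a.start - b.start);
  const sequences = [];
  for (const a of sorted) {
    const current = sequences[sequences.length - 1];
    if (current && a.start - current.stop < range * 1000) {
      // Close enough to the running sequence: extend it.
      current.stop = Math.max(current.stop, a.stop);
      current.members.push(a);
    } else {
      // Gap of at least `range` seconds: start a new sequence annotation.
      sequences.push({ tier: toTier, start: a.start, stop: a.stop, members: [a] });
    }
  }
  return sequences;
}
```

In this sketch, the array output corresponds to the sequence annotations themselves, while an object output could be built from each sequence's `members` list.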

8 🗸2 🕮

each

Iterates over the array(s) values, and executes the provided function.

(arg) js The function called on each array element like (o) => log.info(o.value)

Input: Array
Input: Object<Array>
Output: Array
Output: Object<Array>

4 🗸1 🕮

each-file

Iterates over each (corpus) file, and executes the provided function.

Example to log each corpus file id:

af => log.info(af.id)

Example to build an array of corpus files paths:

af => af.file.absolutePath

(arg) js The function called on each file, which can return a value to build the output array

Input: None
Output: Array

2 🕮

extend-duration

Takes each input annotation and extends its duration by changing its start and/or stop times.

This will by default clone all input annotations so the originals stay unaffected.

(arg) number = 0 Seconds to add before and after
before number = arg Seconds to add before start
after number = arg Seconds to add after stop
clone bool = true Whether to clone the annotation before changing its duration

Input: Array<Annotation>
Output: Array<Annotation>
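
The way `before` and `after` fall back to the main argument can be sketched as follows. This is an illustration under assumptions: `extendDuration` is a hypothetical name, annotations are plain objects with `start`, `stop` and `duration` in milliseconds, and no clamping at zero is applied.

```javascript
// Hypothetical sketch, not the actual operation.
// `before` and `after` default to `arg`, mirroring the parameter list above.
function extendDuration(annotations, { arg = 0, before = arg, after = arg, clone = true } = {}) {
  return annotations.map((a) => {
    // Work on a copy by default so the originals stay unaffected.
    const target = clone ? { ...a } : a;
    target.start -= before * 1000; // seconds to milliseconds (assumed units)
    target.stop += after * 1000;
    target.duration = target.stop - target.start;
    return target;
  });
}
```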

9 🗸3 🕮

filter

Filters the input array(s) keeping only elements passing the filter expression.

Example to keep only annotations with a duration greater than 5 seconds:

a => a.duration > 5000

Example to keep only objects with a defined foo property:

o => o.foo

(arg) js The function used to filter each element

Input: Array
Input: Object<Array>
Output: Array
Output: Object<Array>

6 🗸

flatten

Transforms nested objects into a flat array of objects.

Takes all keys of the input object and maps them to an array of objects, each object built upon the provided structure reflecting the key name and its associated value.

Syntax for one nested level:

{name, value}

Syntax for two nested levels:

{group, name, value}

Input: Object
Output: Array<Object>
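
To make the two syntaxes concrete, here is a minimal recursive sketch. The function name `flatten` and the way the structure is passed (as an array of property names such as `["group", "name", "value"]`) are assumptions for this example, not the operation's actual interface.

```javascript
// Hypothetical sketch, not the actual operation.
// `names` lists the output property names, the last one receiving the leaf value.
function flatten(input, names) {
  if (names.length === 1) return [{ [names[0]]: input }];
  const [keyName, ...rest] = names;
  const rows = [];
  for (const [key, value] of Object.entries(input)) {
    // Each key becomes a property, then the nested value is flattened in turn.
    for (const row of flatten(value, rest)) {
      rows.push({ [keyName]: key, ...row });
    }
  }
  return rows;
}

console.log(JSON.stringify(flatten({ a: 1, b: 2 }, ["name", "value"])));
// [{"name":"a","value":1},{"name":"b","value":2}]
```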

4 🗸

group-by

Groups the array elements by the value of a specific element's property.

The result is an object whose keys are the property values, mapped to arrays of elements.

(arg) string The property to group elements by (group, tier, participant, value, start, duration, file, extra.*)

Input: Array
Input: Object<Array>
Output: Object<Array>
Output: Object<Object<Array>>
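
The two input/output pairs listed above can be illustrated with a short sketch: an array yields an object of arrays, and an object of arrays yields an object of such objects. The name `groupBy` is hypothetical, and elements are assumed to be plain objects whose property holds a simple value.

```javascript
// Hypothetical sketch, not the actual operation.
function groupBy(input, property) {
  if (!Array.isArray(input)) {
    // Object<Array> input: group each array separately, giving Object<Object<Array>>.
    return Object.fromEntries(
      Object.entries(input).map(([key, elements]) => [key, groupBy(elements, property)])
    );
  }
  const groups = {};
  for (const element of input) {
    const key = String(element[property]);
    if (!groups[key]) groups[key] = [];
    groups[key].push(element);
  }
  return groups;
}
```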

group-by-file-tags

Groups the annotations by specific file tags (comma separated or JSON array of strings). Corpus files can have multiple tags, therefore an annotation could appear in multiple groups. The result is an object whose keys are the chosen tags, mapped to arrays of annotations.

Input: Array<Annotation>
Output: Object<Array>

group-by-ref-value

improve-transcript 🧪

Improves transcribed content with various heuristic techniques.

hallucination-char-factor int = 1 todo
hallucination-time-diff int = 1000 Maximum pause time between 2 hallucinated annotations (in milliseconds)
merge-comma-end bool = true Merge 2 annotations if the first one ends with a comma
merge-lowercase-start bool = true Merge 2 annotations if the second one starts with a lower-case character

Input: Array<Annotation>
Output: Array<Annotation>

load-annotations-from-forms

Loads JSON files resulting from forms, as virtual annotations. The JSON files must be in the folder of the processed XML file.

dimension string The dimension to extract from the form
exclude-from-intercoding bool = false Whether to tag the extracted annotations so they are excluded from intercoding calculations
extract-original-annotations bool = false Whether to extract only the original annotations from the form, so the results are safe from corpus changes. When this attribute is specified, no dimension will be extracted: run the operation again to extract a dimension.
match-to-original-annotations bool = false Try to correct loaded annotations so they all have a match with the original ones

Input: None
Output: Array<Annotation>

6 🗸1 🕮

load-data-from-json

Loads a JSON file or sets data directly from embedded JSON. The JSON file must be in the folder of the processed XML file.

json js JSON data
file string File name

Input: None
Output: Object
Output: Array

load-data-from-script

Runs a JS function whose result will be set as the current selection/data. Global variables can also be used in scripts via variables.varname.

script js a function called with current selection/data, which should return data

load-data-from-variables

Use variables to build an object

object-delimiter string A delimiter used to cut a variable name into an object structure

Output: Object
Output: Array

load-data-from-xls

Loads an XLS or CSV file.

file file file name

Input: None
Output: Object
Output: Array

2 🕮

map

Creates a new array populated with the results of calling the provided function on each element of the input array(s). Special pseudo-object syntax can be used to facilitate direct mapping of object properties:

annotation => ({value: annotation.value, tier: annotation.tier.id})

{value: .value, tier: .tier.id}

{value, tier: .tier.id}

(arg) js ~

Input: Array
Input: Object<Array>

mongo-clear

Clears a MongoDB Collection (drops the collection)

mongo-create-ref

Creates references between documents across collections

source-fields string Name of the document fields to transform into a reference (comma separated)
target-field The field that will hold the reference(s) (an array if multiple source fields are specified)
reference-collection string The collection containing the (possibly created) references
reference-field string The reference field to compare values
trim bool = false Whether to trim the values before comparing them

Output: Array<Object>

mongo-find

Loads objects from a MongoDB Collection

query js The mongo query, a plain JS object like {property:"value"}
projection js The query projection, a plain array like ['prop1','prop2'] listing the properties to return
limit int = 1000 The maximum number of documents to return

Output: Array<Object>

mongo-insert

Inserts input objects into a MongoDB Collection

Input: Array<Object>
Input: Object

mongo-remove

Removes objects from a MongoDB Collection

Input: Array<Object>
Input: Object

1 🗸2 🕮

randomize

Randomizes the input selection with a PRNG allowing reproducible results based on an initial seed.

limit int = 0 Maximum number of elements for the output array
limit-per-file int = 0 Maximum number of annotations to take from one particular annotation file (if the input array contains annotations)
seed int = 1 The initial seed for reproducible randomness
prng string = LCG The algorithm for random number generation

Input: Array
Output: Array
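
Reproducible randomization generally means driving a shuffle with a seeded generator. The sketch below pairs a linear congruential generator (LCG) with a Fisher-Yates shuffle; the constants are the widely used Numerical Recipes ones and are an assumption here, as the tool's own LCG parameters are not documented in this section. The names `makeLcg` and `randomize` are hypothetical.

```javascript
// Hypothetical sketch, not the actual operation.
// LCG: state = (a * state + c) mod 2^32, with assumed constants.
function makeLcg(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // scale to [0, 1)
  };
}

function randomize(input, { seed = 1, limit = 0 } = {}) {
  const random = makeLcg(seed);
  const output = [...input];
  // Fisher-Yates shuffle driven by the seeded generator.
  for (let i = output.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [output[i], output[j]] = [output[j], output[i]];
  }
  // A limit of 0 keeps every element.
  return limit > 0 ? output.slice(0, limit) : output;
}
```

Calling `randomize` twice with the same seed on the same input returns the same order, which is the property the seed parameter is meant to provide.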

3 🗸

reduce

Executes a "reducer" function on each element of the input array(s), in order, passing in the return value from the calculation on the preceding element. The final result of running the reducer across all elements of an array is a single value.

(accumulator, currentValue) => accumulator + currentValue,  initialValue

Input: Array
Input: Object<Array>
Output: Integer
Output: Object<Integer>
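
As an illustration of the two input shapes, the sketch below applies the same reducer either to a single array or to each array of an object. The name `reduceSelection` is hypothetical; the reducer and initial value follow the example shown above.

```javascript
// Hypothetical sketch, not the actual operation.
function reduceSelection(input, reducer, initialValue) {
  if (Array.isArray(input)) return input.reduce(reducer, initialValue);
  // Object<Array> input: reduce each array independently.
  return Object.fromEntries(
    Object.entries(input).map(([key, elements]) => [key, elements.reduce(reducer, initialValue)])
  );
}

const add = (accumulator, currentValue) => accumulator + currentValue;
console.log(reduceSelection([1, 2, 3], add, 0)); // 6
console.log(JSON.stringify(reduceSelection({ a: [1, 2], b: [3] }, add, 0))); // {"a":3,"b":3}
```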

replace-with-annotations-in-sequence

Considers each input annotation as a sequence, and selects all annotations (of the same file) that are included in the sequences.

overlap bool = true Whether to include overlapping annotations (not fully contained in the sequence)
range number = 0 Number of seconds to add before and after the sequence, for considering annotations in sequence
range-before number = 0 Additional seconds to add before
range-after number = 0 Additional seconds to add after
distinct bool = false Whether to remove duplicate annotations from the resulting list
combine bool = false Whether to combine in one final annotation, all the annotations found in the sequence
limit int = 0 Max number of annotations to return (0 = all annotations found in sequence, 1 = first annotation found)
reverse bool = false Whether to reverse the list of annotations found in sequence before applying the limit. This can be used to return the last annotation found in sequence (via limit=1).
separator string = | A separator to insert when combining values of multiple annotations
default-to-null bool = false Whether to add a null element in the output array if no corresponding annotation is found for a given input annotation (default to false, or true if the operation runs in an EXTRA block)

Input: Array<Annotation>
Output: Array<Annotation>
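
The effect of the overlap and range parameters on the selection can be sketched as a simple interval test. This is an assumption-laden illustration: `annotationsInSequence` is a hypothetical name, `start` and `stop` are assumed to be in milliseconds, and only the overlap and range parameters are modeled.

```javascript
// Hypothetical sketch, not the actual operation.
// Assumes `start` and `stop` are in milliseconds and `range` is in seconds.
function annotationsInSequence(sequence, candidates, { overlap = true, range = 0 } = {}) {
  const from = sequence.start - range * 1000;
  const to = sequence.stop + range * 1000;
  return candidates.filter((a) =>
    overlap
      ? a.start < to && a.stop > from // any intersection with the widened sequence
      : a.start >= from && a.stop <= to // fully contained only
  );
}
```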

1 🕮

replace-with-annotations-in-sequence-from-tier

Considers each input annotation as a sequence, and selects those from another tier (of the same file) that are included in the sequences.

(arg) tier ~
overlap bool = true Whether to include overlapping annotations (not fully contained in the sequence)
range number = 0 Number of seconds to add before and after the sequence, for considering annotations in sequence
range-before number = 0 Additional seconds to add before
range-after number = 0 Additional seconds to add after
distinct bool = false Whether to remove duplicate annotations from the resulting list
combine bool = false Whether to combine in one final annotation, all the annotations found in the sequence
limit int = 0 Max number of annotations to return (0 = all annotations found in sequence, 1 = first annotation found)
reverse bool = false Whether to reverse the list of annotations found in sequence before applying the limit. This can be used to return the last annotation found in sequence (via limit=1).
separator string = | A separator to insert when combining values of multiple annotations
default-to-null bool = false Whether to add a null element in the output array if no corresponding annotation is found for a given input annotation (default to false, or true if the operation runs in an EXTRA block)

Input: Array<Annotation>
Output: Array<Annotation>

replace-with-next-timecode-annotations-from-tier

Replaces each input annotation with one from another tier (of the same annotation file), the first found whose start time is after the input annotation's start time.

range int = 0 Maximum time range to find next annotation (0 = no maximum)

Input: Array<Annotation>
Output: Array<Annotation>

replace-with-previous-timecode-annotations-from-tier

Replaces each input annotation with one from another tier, the first found whose start time is before the input annotation's start time.

range int = 0 Maximum time range to find previous annotation (0 = no maximum)

Input: Array<Annotation>
Output: Array<Annotation>

replace-with-same-timecode-annotations-from-tier

Replaces each input annotation with one from another tier that has the same start time.

(arg) tier The tier from which to find annotations with same timecode
multiple bool = false Whether to select multiple annotations if more than one is in range
range number = 0 Acceptable range in seconds to consider 2 timecodes as equivalent
default-to-null bool Whether to add a null element in the output array if no corresponding annotation is found for a given input annotation (default to false, or true if the operation runs in an EXTRA block)

Input: Array<Annotation>
Output: Array<Annotation>

sanitize-strip-xml-tags

Removes XML tags from annotations' values.

Input: Array<Annotation>
Output: Array<Annotation>

1 🕮

save-data-to-csv

Saves current data to a CSV file. The file will be saved in the processed XML file's folder, overwriting any existing file.

Input: Object
Input: Array
Output: Input

save-data-to-json

Saves current data to a JSON file. The file will be saved in the processed XML file's folder, overwriting any existing file.

Input: Object
Input: Array
Output: Input

3 🗸1 🕮

save-variable

Sets a global variable which becomes available in HTML blocks as {{varname}}. It can then be used in scripts with variables.varname, and in attributes via the ${} syntax, like ${0.05*variables.counter}.

If the value attribute is not defined, the saved value will be the current selection/data.

value js a function called with current selection/data, which should return the value of the variable

Input: Any
Output: Input

4 🕮

scrape

Loads a URL (or a file) and extracts data.

url string URL to fetch for scraping
file file a file path to load content instead of a URL
cache bool = false Whether to cache in memory the downloaded URL
jsoup js a function to extract data called with a Jsoup document as argument (to scrape HTML)
js js a function to extract data called with the content as a string argument (to scrape JSON, text files...)

Input: None
Output: Any

selection-to-data

Transforms input annotations to an array of objects

export-value bool = true
export-value-as string = value
export-tier bool = true
export-tier-as string = tier
export-start bool = true
export-start-as string = start
export-stop bool = true
export-stop-as string = stop

Input: Array<Annotation>
Output: Array<Object>

set-tag

Adds or removes a tag on each input annotation.

Example to add the "relevant" tag: "set-tag = +relevant"

(arg) string The name of the tag, prefixed with + to add, or - to remove the tag

Input: Array<Annotation>
Output: Array<Annotation>

5 🕮

set-tier

Changes the tier of each input annotation.

By default, annotations are cloned so the originals stay unaffected.

Examples of template function:

(annotation, tier) => `Speaker ${tier.id}`

(a, tier) => tier.id.toUpperCase()

(arg) string Name of the tier to associate the annotations to
clone bool = true
template js A function called for each annotation, that must return its new tier name

Input: Array<Annotation>
Output: Array<Annotation>

2 🗸3 🕮

sort

Sorts input array by a given field

natural bool = false Whether to sort based only on the digits contained in the field, and not string comparison
func js A custom comparison function like (a,b)=>(a>b?-1:1)

Input: Array
Output: Array

sort-by-file

Sorts annotations first by their file, and then by the provided field or compare function. When providing a field, the first character must indicate the sorting order with +/- (ascending/descending).

order string (+) Sorting order of the AF groups (ascending/descending) (+, -)
field string A field to sort on, like "+start"

Input: Array<Annotation>
Output: Array<Annotation>
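
The "+start" / "-start" field convention can be illustrated with a two-level comparison: files first, then the signed field. The function name `sortByFile` and the plain-object annotations with a string `file` property are assumptions made for this sketch.

```javascript
// Hypothetical sketch, not the actual operation.
function sortByFile(annotations, field = "+start", order = "+") {
  const fileDirection = order === "-" ? -1 : 1;
  // The first character of the field gives the direction, the rest is the property name.
  const fieldDirection = field[0] === "-" ? -1 : 1;
  const name = field.slice(1);
  const compare = (x, y) => (x < y ? -1 : x > y ? 1 : 0);
  // Sort a copy: by file first, then by the requested field inside each file.
  return [...annotations].sort(
    (a, b) => fileDirection * compare(a.file, b.file) || fieldDirection * compare(a[name], b[name])
  );
}
```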

sort-by-group

Sorts annotations by their group, and then inside each group by their relative start time (using the file time offset, if any). If no field is provided, sorting will be done around //TODO

order string (+) Sorting order of groups (ascending/descending) (+, -)
natural bool = false Natural integer sorting instead of string
field string A field to sort on, like "+start"

Input: Array<Annotation>
Output: Array<Annotation>

swap-nested-objects

Transforms an object of structure {A:{a:1,b:2}} into structure {a:{A:1},b:{A:2}}

Input: Object<Object>
Output: Object<Object>
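
The transformation above amounts to swapping the outer and inner keys. A minimal sketch, with the hypothetical name `swapNestedObjects`:

```javascript
// Hypothetical sketch, not the actual operation.
function swapNestedObjects(input) {
  const output = {};
  for (const [outerKey, inner] of Object.entries(input)) {
    for (const [innerKey, value] of Object.entries(inner)) {
      // The inner key becomes the outer key, and vice versa.
      if (!output[innerKey]) output[innerKey] = {};
      output[innerKey][outerKey] = value;
    }
  }
  return output;
}

console.log(JSON.stringify(swapNestedObjects({ A: { a: 1, b: 2 } })));
// {"a":{"A":1},"b":{"A":2}}
```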

5 🗸

type-token

Computes the type-token of input annotations.

group-by-attribute string An optional attribute name to be used for grouping type-tokens together
strip-punctuation bool = true Whether to remove all punctuation before type-token processing
case-sensitive bool = false Whether to consider capital letters in words comparison
replace-regex regexp A regular expression to replace text before type-token processing
replace-func js A js function to replace text before type-token processing
split-func js A js function to use instead of the basic space splitting, for extracting words from strings

Input: Array<Annotation>
Input: Object<Array<Annotation>>
Input: Object<Array<String>>
Input: Object<String>
Output: Object
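
The section does not spell out the output structure, so the following is only a guess at the underlying idea: tokens are all words, types are distinct words, and a type-token ratio is types divided by tokens. The function name `typeToken`, the returned property names, and the use of whitespace splitting are assumptions; only the strip-punctuation and case-sensitive options are modeled.

```javascript
// Hypothetical sketch, not the actual operation.
// `values` is assumed to be an array of annotation value strings.
function typeToken(values, { stripPunctuation = true, caseSensitive = false } = {}) {
  const tokens = [];
  for (let text of values) {
    // Keep only letters, digits and whitespace when stripping punctuation.
    if (stripPunctuation) text = text.replace(/[^\p{L}\p{N}\s]/gu, "");
    if (!caseSensitive) text = text.toLowerCase();
    tokens.push(...text.split(/\s+/).filter(Boolean));
  }
  const types = new Set(tokens);
  return {
    types: types.size,
    tokens: tokens.length,
    ratio: tokens.length ? types.size / tokens.length : 0,
  };
}
```

With the values "The cat, the dog." and "A cat!", this sketch counts 6 tokens and 4 types when case is ignored.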