Operations

calculate-duration-by

Calculates sum/average of annotations' duration (in milliseconds) by grouping on a property (value/tier/group/participant).

If the property is not specified, the input object is used directly as the groups.

If the property is the asterisk (wildcard), input annotations will be grouped in a single asterisk group.

  • (arg) string ~(group, tier, participant, value, start, duration, file, extra.*)

  • mode select = sum (sum, average, min, max, all)

  • factor int = 1 Convert milliseconds to other units

  • truncate int = -1 Truncate factored value to X decimal places

Input: Array<Annotation>
Input: Object<Array<Annotation>>
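
The grouping and aggregation described above can be sketched in plain JavaScript. This is an illustration only, not the tool's implementation: the function name durationBy, the annotation shape ({tier, duration}) and the choice to divide by factor (so that factor = 1000 yields seconds) are all assumptions.

```javascript
// Hypothetical annotation shape: { tier: string, duration: number in ms }.
function durationBy(annotations, property, { mode = "sum", factor = 1, truncate = -1 } = {}) {
  // Group durations by the property value ("*" puts everything in one group).
  const groups = {};
  for (const a of annotations) {
    const key = property === "*" ? "*" : a[property];
    if (!groups[key]) groups[key] = [];
    groups[key].push(a.duration);
  }
  const result = {};
  for (const [key, durations] of Object.entries(groups)) {
    const sum = durations.reduce((acc, d) => acc + d, 0);
    let value = mode === "average" ? sum / durations.length : sum;
    value = value / factor; // assumption: factor is a divisor
    if (truncate >= 0) {
      const p = 10 ** truncate;
      value = Math.trunc(value * p) / p; // keep X decimal places
    }
    result[key] = value;
  }
  return result;
}
```

Under these assumptions, calling durationBy(annotations, "tier", {factor: 1000}) would return the total duration per tier in seconds.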


calculate-duration-of-pause

Calculates sum/average of annotations' pause duration (in milliseconds).

  • mode string = sum (sum, average, median, min, max, all)

  • factor int = 1 Convert milliseconds to other units

  • truncate int = -1 Truncate factored value to X decimal places

  • min-threshold int = 0 Min pause in milliseconds between 2 consecutive annotations (pauses shorter than this will not be counted)

  • max-threshold int = 500 Max pause in milliseconds between 2 consecutive annotations (pauses longer than this will not be counted)

Input: Array<Annotation>
Input: Object<Array<Annotation>>
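
As a rough sketch of how the two thresholds might interact, the snippet below collects the pauses between consecutive annotations. It assumes that a pause is the gap between one annotation's stop and the next annotation's start, that start and stop are in milliseconds, and that both thresholds are inclusive; the function name pauses is hypothetical.

```javascript
// Hypothetical annotation shape: { start: ms, stop: ms }.
function pauses(annotations, { minThreshold = 0, maxThreshold = 500 } = {}) {
  const sorted = [...annotations].sort((a, b) => a.start - b.start);
  const result = [];
  for (let i = 1; i < sorted.length; i++) {
    // Assumption: pause = gap between previous stop and current start.
    const pause = sorted[i].start - sorted[i - 1].stop;
    if (pause >= minThreshold && pause <= maxThreshold) result.push(pause);
  }
  return result;
}

// "sum" mode over the retained pauses.
const sumOfPauses = (annotations, options) =>
  pauses(annotations, options).reduce((acc, p) => acc + p, 0);
```

With the default thresholds, a 1000 ms gap would be ignored while a 200 ms gap would be counted.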


calculate-percent-by

Calculates the percentage of occurrences of each value of an annotation property.

  • (arg) string ~(group, tier, participant, value, start, duration, file, extra.*)

  • truncate int = 0 Truncate the percentage to X decimal places

Input: Object<Array<Annotation>>


clone

Clones each annotation in the selection, so they can be modified without affecting the originals.

Input: Array<Annotation>
Output: Array<Annotation>


combine-overlapping-annotations

Combines all overlapping annotations into a single annotation.

Input: Array<Annotation>
Output: Array<Annotation>
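
A minimal sketch of the usual interval-merging approach is shown below. It is not the tool's code: the function name, the {start, stop, value} shape, collecting the merged values in an array, and treating annotations that merely touch as non-overlapping are all assumptions.

```javascript
// Hypothetical annotation shape: { start: ms, stop: ms, value: string }.
function combineOverlapping(annotations) {
  const sorted = [...annotations].sort((a, b) => a.start - b.start);
  const merged = [];
  for (const a of sorted) {
    const last = merged[merged.length - 1];
    if (last && a.start < last.stop) {
      // Overlaps the current merged annotation: extend it.
      last.stop = Math.max(last.stop, a.stop);
      last.values.push(a.value);
    } else {
      merged.push({ start: a.start, stop: a.stop, values: [a.value] });
    }
  }
  return merged;
}
```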


combine-same-tier-consecutive-annotations

Combines all consecutive annotations of the same tier into a single annotation.

  • value-separator string = , The separator to use when combining annotations' values missing punctuation

  • sentence-separator string A separator to insert between sentences

Input: Array<Annotation>
Output: Array<Annotation>


control-vocabulary

Checks each annotation value against a specified controlled vocabulary. If the value does not match a controlled vocabulary item, a warning is issued. The operation can also fix common mistakes, in which case the original input annotations are modified (run the "clone" operation beforehand to avoid modifying the original annotations).

  • vocabulary string The Controlled Vocabulary ID to use for checking the value of the annotations

  • fix-whitespaces bool = false Whether to correct values whose only mistake is surrounding whitespace

  • fix-repeats bool = false Whether to correct when mistake is a repeated value

  • return select = input By default, the operation returns the original (possibly corrected) input, but it can also return the list of valid (or invalid) annotations (input, valid, invalid, corrected)

Input: Array<Annotation>
Output: Array<Annotation>


convert-to-slices

Takes each tier and slices its annotations to segments of a specific duration, aligning all the segments between tiers. The initial alignment starts with the earliest annotation found.

  • slice-duration number = 0 The duration of each slice annotation, in seconds

  • slice-count int = 0 Alternative to slice-duration, the total amount of slices to create

  • overlap-threshold number = 0 todo

Input: Array<Annotation>
Output: Array<Annotation>


count

Counts the elements in each array element of the input array/object, replacing the array element itself with an integer value

  • depth int = 0 How deep to go in the input tree before counting

  • remove-uncountable bool = false Whether to remove the properties that are not countable

  • remove-uncountable-elements bool = false Whether to remove the uncountable elements from array, resulting in a shorter array

  • count-strings bool = false Whether to consider strings as arrays for counting

Input: Array<Array>
Input: Object<Array>
Output: Array<Integer>
Output: Object<Integer>
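
For the default depth of 0, the behaviour can be pictured as below. This is a sketch under assumptions (the function name is hypothetical, and uncountable elements are left untouched, which matches remove-uncountable = false only by assumption).

```javascript
// Replaces each array element (or object property) by its length.
function count(input) {
  const size = (v) => (Array.isArray(v) ? v.length : v); // non-arrays kept as-is
  if (Array.isArray(input)) return input.map(size);
  return Object.fromEntries(Object.entries(input).map(([k, v]) => [k, size(v)]));
}
```

For instance, an input of {a: [1, 2], b: []} would become {a: 2, b: 0}.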


count-by

Groups array elements by a specific property and counts them.

  • (arg) string The property to group elements by (group, tier, participant, value, start, duration, file, extra.*)

Input: Array<Annotation>
Input: Object<Array<Annotation>>
Output: Object<Integer>
Output: Object<Array<Integer>>


count-object-keys

Counts keys across all input objects.

Input: Array<Object>
Output: Object
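
One plausible reading, sketched here as an assumption rather than as the documented behaviour, is that the output maps each key to the number of input objects that contain it. The function name is hypothetical.

```javascript
// Counts how many input objects define each key (assumed semantics).
function countObjectKeys(objects) {
  const counts = {};
  for (const o of objects) {
    for (const k of Object.keys(o)) counts[k] = (counts[k] || 0) + 1;
  }
  return counts;
}
```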


detect-sequences

Detects sequences by grouping annotations that are close to each other.

  • range number = 30 Minimum number of seconds between two annotations to consider as a new sequence

  • to-tier string = sequences The created tier which will contain the sequence annotations

  • output choices = array The output type, which can be an Array of sequences' annotations, or an Object mapping a sequence to its list of annotations (array, object)

Input: Array<Annotation>
Output: Array<Annotation>
Output: Object<Array<Annotation>>
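
The grouping idea can be sketched as follows. The snippet assumes start and stop are in milliseconds, measures the gap from the latest stop seen so far to the next start, and returns plain arrays of annotations instead of creating sequence annotations on a tier; the function name is hypothetical.

```javascript
// Hypothetical annotation shape: { start: ms, stop: ms }.
function detectSequences(annotations, rangeSeconds = 30) {
  const sorted = [...annotations].sort((a, b) => a.start - b.start);
  const sequences = [];
  let lastStop = -Infinity;
  for (const a of sorted) {
    // A gap of at least `range` seconds opens a new sequence.
    if (a.start - lastStop >= rangeSeconds * 1000) sequences.push([]);
    sequences[sequences.length - 1].push(a);
    lastStop = Math.max(lastStop, a.stop);
  }
  return sequences;
}
```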


each

Iterates over the array(s) values, and executes the provided function.

  • (arg) js The function called on each array element like (o) => log.info(o.value)

Input: Array
Input: Object<Array>
Output: Array
Output: Object<Array>


each-file

Iterates over each (corpus) file, and executes the provided function.

Example to log each corpus file id:

af => log.info(af.id)

Example to build an array of corpus files paths:

af => af.file.getAbsolutePath()

  • (arg) js The function called on each file, which can return a value to build the output array

Input: None
Output: Array
Output: Undefined


extend-duration

Takes each input annotation and extends its duration by changing its start and/or stop times.

This will by default clone all input annotations so the originals stay unaffected.

  • (arg) number = 0 Seconds to add before and after

  • before number = arg Seconds to add before start

  • after number = arg Seconds to add after stop

  • clone bool = true Whether to clone the annotation before changing its duration

Input: Array<Annotation>
Output: Array<Annotation>
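
The way the before and after attributes fall back to the main argument can be sketched like this. The function name and the millisecond start/stop fields are assumptions, and cloning is approximated with a shallow copy.

```javascript
// Hypothetical annotation shape: { start: ms, stop: ms, ... }.
function extendDuration(annotations, arg = 0, { before = arg, after = arg } = {}) {
  // Shallow copies stand in for cloning, so the inputs stay untouched.
  return annotations.map((a) => ({
    ...a,
    start: a.start - before * 1000,
    stop: a.stop + after * 1000,
  }));
}
```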


filter

Filters the input array(s) keeping only elements passing the filter expression.

Example to keep only annotations with a duration greater than 5 seconds:

a => a.duration > 5000

Example to keep only objects with a defined foo property:

o => o.foo

  • (arg) js The function used to filter each element

Input: Array
Input: Object<Array>
Output: Array
Output: Object<Array>


flatten

Transforms nested objects into a flat array of objects.

Takes all keys of the input object and maps them to an array of objects, each object built upon the provided structure reflecting the key name and its associated value.

Syntax for one nested level:

{name,value}

Syntax for two nested levels:

{group,name,value}

Input: Object
Output: Array<Object>
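
To make the two syntaxes concrete, here is a sketch of what they could produce. The helper names flattenOne and flattenTwo are hypothetical, and the property names are passed as an array purely for illustration.

```javascript
// One nested level, structure {name,value}.
function flattenOne(obj, [nameKey, valueKey]) {
  return Object.entries(obj).map(([k, v]) => ({ [nameKey]: k, [valueKey]: v }));
}

// Two nested levels, structure {group,name,value}.
function flattenTwo(obj, [groupKey, nameKey, valueKey]) {
  return Object.entries(obj).flatMap(([g, inner]) =>
    Object.entries(inner).map(([k, v]) => ({ [groupKey]: g, [nameKey]: k, [valueKey]: v }))
  );
}
```

Under this reading, {a: 1, b: 2} with {name,value} would give [{name: "a", value: 1}, {name: "b", value: 2}].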


group-by

Groups the array elements by the value of a specific element's property.

The result is an object whose keys are the property values, mapped to arrays of elements.

  • (arg) string The property to group elements by (group, tier, participant, value, start, duration, file, extra.*)

Input: Array
Input: Object<Array>
Output: Object<Array>
Output: Object<Object<Array>>
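
The shape of the result can be illustrated with a small plain-JavaScript equivalent. The function name is hypothetical and only top-level properties are handled here.

```javascript
// Groups elements by the value of one of their properties.
function groupBy(elements, property) {
  const groups = {};
  for (const e of elements) {
    const key = e[property];
    if (!groups[key]) groups[key] = [];
    groups[key].push(e);
  }
  return groups;
}
```

Grouping by "tier" would yield one key per tier, each mapped to the array of elements from that tier.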


group-by-file-tags

Groups the annotations by specific file tags (comma separated or JSON array of strings). Corpus files can have multiple tags, therefore an annotation could appear in multiple groups. The result is an object whose keys are the chosen tags, mapped to arrays of annotations.

Input: Array<Annotation>
Output: Object<Array>


group-by-ref-value


improve-transcript 🧪

Improves transcribed content with various heuristic techniques.

  • hallucination-char-factor int = 1 todo

  • hallucination-time-diff int = 1000 Maximum pause time between 2 hallucinated annotations (in milliseconds)

  • merge-comma-end bool = true Merge 2 annotations if the first one ends with a comma

  • merge-lowercase-start bool = true Merge 2 annotations if the second one starts with a lower-case character

Input: Array<Annotation>
Output: Array<Annotation>


load-annotations-from-forms

Loads JSON files resulting from forms, as virtual annotations. The JSON files must be in the folder of the processed XML file.

  • dimension string The dimension to extract from the form

  • exclude-from-intercoding bool = false Whether to tag the extracted annotations so they are excluded from intercoding calculations

  • extract-original-annotations bool = false Whether to extract only the original annotations from the form, so the results are safe from corpus changes. When this attribute is specified, no dimension will be extracted: use again the operation to extract a dimension.

  • match-to-original-annotations bool = false Try to correct loaded annotations so they all have a match with the original ones

Input: None
Output: Array<Annotation>


load-data-from-json

Loads a JSON file or set data directly from embedded JSON. The JSON file must be in the folder of the processed XML file.

  • json js json data

  • file string file name

Input: None
Output: Object
Output: Array


load-data-from-script

Runs a JS function whose result will be set as the current selection/data. Global variables can also be used in scripts via variables.varname

  • script js a function called with current selection/data, which should return data

load-data-from-variables

Uses variables to build an object.

  • object-delimiter string A delimiter used to cut a variable name into an object structure

Output: Object
Output: Array


load-data-from-xls

Loads an XLS or CSV file.

  • file file file name

Input: None
Output: Object
Output: Array


map

Creates a new array populated with the results of calling the provided function on each element of the input array(s). Special pseudo-object syntax can be used to facilitate direct mapping of object properties:

annotation => ({value: annotation.value, tier: annotation.tier.id})
{value: .value, tier: .tier.id}
{value, tier: .tier.id}

  • (arg) js ~

Input: Array
Input: Object<Array>


mongo-clear

Clears a MongoDB Collection (drops the collection)


mongo-create-ref

Creates references between documents across collections

  • source-fields string Name of the document fields to transform into a reference (comma separated)

  • target-field The field that will hold the reference(s) (an array if multiple source fields are specified)

  • reference-collection string The collection containing the (possibly newly created) references

  • reference-field string The reference field to compare values

  • trim bool = false Whether to trim the values before comparing them

Output: Array<Object>


mongo-find

Loads objects from a MongoDB Collection

  • query js The mongo query, a plain JS object like {property:"value"}

  • projection js The query projection, a plain array like ['prop1','prop2'] listing the properties to return

  • limit int = 1000 The maximum number of documents to return

Output: Array<Object>


mongo-insert

Inserts input objects into a MongoDB Collection

Input: Array<Object>
Input: Object


mongo-remove

Removes objects from a MongoDB Collection

Input: Array<Object>
Input: Object


randomize

Randomizes the input selection with a PRNG allowing reproducible results based on an initial seed.

  • limit int = 0 Maximum number of elements for the output array

  • limit-per-file 0 Maximum number of annotations to take from one particular annotation file (if the input array contains annotations)

  • seed int = 1 The initial seed for reproducible randomness

  • prng string = LCG The algorithm for random number generation

Input: Array
Output: Array
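
Reproducible randomization from a seed can be sketched with a linear congruential generator (LCG) driving a Fisher-Yates shuffle. The constants below are the widely used Numerical Recipes ones and are an assumption, as are the function names; the tool's actual generator may differ.

```javascript
// Small LCG returning floats in [0, 1); constants are an assumption.
function makeLcg(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(1664525, state) + 1013904223) >>> 0;
    return state / 2 ** 32;
  };
}

// Fisher-Yates shuffle on a copy, optionally truncated to `limit` elements.
function seededShuffle(items, seed = 1, limit = 0) {
  const random = makeLcg(seed);
  const out = [...items];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return limit > 0 ? out.slice(0, limit) : out;
}
```

Calling seededShuffle twice with the same seed returns the same order, which is the property the seed attribute is meant to provide.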


reduce

Executes a "reducer" function on each element of the input array(s), in order, passing in the return value from the calculation on the preceding element. The final result of running the reducer across all elements of an array is a single value.

(accumulator, currentValue) => accumulator + currentValue, initialValue

Input: Array
Input: Object<Array>
Output: Integer
Output: Object<Integer>
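
The two accepted input shapes can be illustrated with the standard Array.prototype.reduce. The wrapper name reduceEach is hypothetical and only shows how an Object<Array> input might be reduced key by key.

```javascript
// Reduces a single array, or each array of an object of arrays.
function reduceEach(input, reducer, initialValue) {
  if (Array.isArray(input)) return input.reduce(reducer, initialValue);
  return Object.fromEntries(
    Object.entries(input).map(([key, values]) => [key, values.reduce(reducer, initialValue)])
  );
}

const add = (accumulator, currentValue) => accumulator + currentValue;
```

For example, reduceEach([1200, 800, 500], add, 0) evaluates to 2500.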


replace-with-annotations-in-sequence

Considers each input annotation as a sequence, and selects all annotations (of the same file) that are included in the sequences.

  • overlap bool = true Whether to include overlapping annotations (not fully contained in the sequence)

  • range number = 0 Number of seconds to add before and after the sequence, for considering annotations in sequence

  • range-before number = 0 Additional seconds to add before

  • range-after number = 0 Additional seconds to add after

  • distinct bool = false Whether to remove duplicate annotations from the resulting list

  • combine bool = false Whether to combine all the annotations found in the sequence into one final annotation

  • limit int = 0 Max number of annotations to return (0 = all annotations found in sequence, 1 = first annotation found)

  • reverse bool = false Whether to reverse the list of annotations found in sequence before applying the limit. This can be used to return the last annotation found in sequence (via limit=1).

  • separator string = | A separator to insert when combining values of multiple annotations

  • default-to-null bool = false Whether to add a null element in the output array if no corresponding annotation is found for a given input annotation (default to false, or true if the operation runs in an EXTRA block)

Input: Array<Annotation>
Output: Array<Annotation>
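
The difference between overlap = true and overlap = false, and the effect of range, can be sketched as below. The function name, the millisecond start/stop fields and the inclusive bounds for full containment are assumptions.

```javascript
// Hypothetical shapes: sequence and candidates are { start: ms, stop: ms }.
function annotationsInSequence(sequence, candidates, { overlap = true, range = 0 } = {}) {
  // Widen the sequence by `range` seconds on both sides.
  const from = sequence.start - range * 1000;
  const to = sequence.stop + range * 1000;
  return candidates.filter((a) =>
    overlap
      ? a.start < to && a.stop > from // any intersection
      : a.start >= from && a.stop <= to // fully contained
  );
}
```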


replace-with-annotations-in-sequence-from-tier

Considers each input annotation as a sequence, and selects those from another tier (of the same file) that are included in the sequences.

  • (arg) tier ~

  • overlap bool = true Whether to include overlapping annotations (not fully contained in the sequence)

  • range number = 0 Number of seconds to add before and after the sequence, for considering annotations in sequence

  • range-before number = 0 Additional seconds to add before

  • range-after number = 0 Additional seconds to add after

  • distinct bool = false Whether to remove duplicate annotations from the resulting list

  • combine bool = false Whether to combine all the annotations found in the sequence into one final annotation

  • limit int = 0 Max number of annotations to return (0 = all annotations found in sequence, 1 = first annotation found)

  • reverse bool = false Whether to reverse the list of annotations found in sequence before applying the limit. This can be used to return the last annotation found in sequence (via limit=1).

  • separator string = | A separator to insert when combining values of multiple annotations

  • default-to-null bool = false Whether to add a null element in the output array if no corresponding annotation is found for a given input annotation (default to false, or true if the operation runs in an EXTRA block)

Input: Array<Annotation>
Output: Array<Annotation>


replace-with-next-timecode-annotations-from-tier

Replaces each input annotation with one from another tier (of the same annotations' file), the first found whose start time is after input annotation's start time.

  • range int = 0 Maximum time range to find next annotation (0 = no maximum)

Input: Array<Annotation>
Output: Array<Annotation>


replace-with-previous-timecode-annotations-from-tier

Replaces each input annotation with one from another tier, the first found whose start time is before input annotation's start time.

  • range int = 0 Maximum time range to find previous annotation (0 = no maximum)

Input: Array<Annotation>
Output: Array<Annotation>


replace-with-same-timecode-annotations-from-tier

Replaces each input annotation with one from another tier that has the same start time.

  • (arg) tier The tier from which to find annotations with same timecode

  • multiple bool = false Whether to select multiple annotations if more than one is in range

  • range number = 0 Acceptable range in seconds to consider 2 timecodes as equivalent

  • default-to-null bool Whether to add a null element in the output array if no corresponding annotation is found for a given input annotation (default to false, or true if the operation runs in an EXTRA block)

Input: Array<Annotation>
Output: Array<Annotation>


sanitize-strip-xml-tags

Removes XML tags from annotations' values.

Input: Array<Annotation>
Output: Array<Annotation>


save-data-to-csv

Saves current data to a CSV file. The file will be saved in the processed XML file's folder, overwriting any existing file.

Input: Object
Input: Array
Output: Input


save-data-to-json

Saves current data to a JSON file. The file will be saved in the processed XML file's folder, overwriting any existing file.

Input: Object
Input: Array
Output: Input


save-variable

Sets a global variable which becomes available in HTML blocks as {{varname}}. It can then be used in scripts with variables.varname, and in attributes via the ${} syntax, like ${0.05*variables.counter}.

If the value attribute is not defined, the saved value will be the current selection/data.

  • value js a function called with current selection/data, which should return the value of the variable

Input: Any
Output: Input


scrape

Loads a URL (or a file) and extracts data.

  • url string URL to fetch for scraping

  • file file a file path to load content from instead of a URL

  • cache bool = false Whether to cache in memory the downloaded URL

  • jsoup js a function to extract data called with a Jsoup document as argument (to scrape HTML)

  • js js a function to extract data called with the content as a string argument (to scrape JSON, text files...)

Input: None
Output: Any


selection-to-data

Transforms input annotations to an array of objects

  • export-value bool = true

  • export-value-as string = value

  • export-tier bool = true

  • export-tier-as string = tier

  • export-start bool = true

  • export-start-as string = start

  • export-stop bool = true

  • export-stop-as string = stop

Input: Array<Annotation>
Output: Array<Object>


set-tag

Adds or removes annotation's tag.

Example to add the "relevant" tag: "set-tag = +relevant"

  • (arg) string The name of the tag, prefixed with + to add, or - to remove the tag

Input: Array<Annotation>
Output: Array<Annotation>


set-tier

Changes the tier of each input annotation.

By default, annotations are cloned so the original ones stay unaffected.

Examples of template function:

(annotation, tier) => `Speaker ${tier.id}`
(a, tier) => tier.id.toUpperCase()

  • (arg) string Name of the tier to associate the annotations to

  • clone bool = true

  • template js A function called for each annotation, that must return its new tier name

Input: Array<Annotation>
Output: Array<Annotation>


sort

Sorts input array by a given field

  • natural bool = false Whether to sort based only on the digits contained in the field, and not string comparison

  • func js A custom comparison function like (a,b)=>(a>b?-1:1)

Input: Array
Output: Array
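
The natural attribute can be pictured as comparing only the digits found in the field. The sketch below is one way to do that in plain JavaScript; the helper names are hypothetical, and values without digits are treated as 0 by assumption.

```javascript
// Extracts the digits of a value as a number ("file10" -> 10).
const digitsOf = (value) => Number(String(value).replace(/\D/g, ""));

// Sorts a copy of the items on the digits contained in `field`.
function naturalSort(items, field) {
  return [...items].sort((a, b) => digitsOf(a[field]) - digitsOf(b[field]));
}
```

With ids file10, file2 and file1, this yields file1, file2, file10, whereas a plain string comparison would place file10 before file2.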


sort-by-file

Sorts annotations first by their file, and then by the provided field or compare function. When providing a field, the first character must indicate the sorting order with +/- (ascending/descending).

  • order string (+) Sorting order of the AF groups (ascending/descending) (+, -)

  • field string A field to sort on, like "+start"

Input: Array<Annotation>
Output: Array<Annotation>


sort-by-group

Sorts annotations by their group, and then inside each group by their relative start time (using the file time-offset, if any). If no field is provided, sorting will be done around //TODO

  • order string (+) Sorting order of groups (ascending/descending) (+, -)

  • natural bool = false Natural integer sorting instead of string

  • field string A field to sort on, like "+start"

Input: Array<Annotation>
Output: Array<Annotation>


swap-nested-objects

Transforms an object of structure {A:{a:1,b:2}} into structure {a:{A:1},b:{A:2}}

Input: Object<Object>
Output: Object<Object>
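
The transformation above amounts to swapping the outer and inner keys, which can be sketched in plain JavaScript as follows (the function name is hypothetical).

```javascript
// Turns {outer: {inner: value}} into {inner: {outer: value}}.
function swapNested(obj) {
  const out = {};
  for (const [outer, inner] of Object.entries(obj)) {
    for (const [key, value] of Object.entries(inner)) {
      if (!out[key]) out[key] = {};
      out[key][outer] = value;
    }
  }
  return out;
}
```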


type-token

Computes the type-token of input annotations.

  • group-by-attribute string An optional attribute name to be used for grouping type-tokens together

  • strip-punctuation bool = true Whether to remove all punctuation before type-token processing

  • case-sensitive bool = false Whether to consider capital letters in words comparison

  • replace-regex string A regular expression to replace text before type-token processing

  • replace-func string A js function to replace text before type-token processing

  • split-func string A js function to use instead of the basic space splitting, for extracting words from strings

Input: Array<Annotation>
Input: Object<Array<Annotation>>
Input: Object<Array<String>>
Input: Object<String>
Output: Object
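
Assuming "type-token" refers to the usual type-token ratio (number of distinct words divided by total number of words), the effect of strip-punctuation and case-sensitive could look like the sketch below. The function name, the plain whitespace splitting and the returned object shape are all assumptions, not the tool's documented output.

```javascript
// texts: array of strings (for example, annotation values).
function typeToken(texts, { stripPunctuation = true, caseSensitive = false } = {}) {
  const words = texts.flatMap((t) => {
    // Keep only letters, digits and whitespace when stripping punctuation.
    let s = stripPunctuation ? t.replace(/[^\p{L}\p{N}\s]/gu, "") : t;
    if (!caseSensitive) s = s.toLowerCase();
    return s.split(/\s+/).filter(Boolean);
  });
  const types = new Set(words).size;
  return { types, tokens: words.length, ratio: words.length ? types / words.length : 0 };
}
```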