Operations
calculate-duration-by
Calculates the sum, average, min or max of annotations' durations (in milliseconds), grouped by a property (value, tier, group, participant...).
If no property is specified, the keys of the input object are used directly as groups.
If the property is an asterisk (*, wildcard), all input annotations are placed in a single group named *.
- (arg) string ~ (group, tier, participant, value, start, duration, file, extra.*)
- mode select = sum (sum, average, min, max, all)
- factor int = 1: Convert milliseconds to other units
- truncate int = -1: Truncate the factored value to X decimal places
Input: Array<Annotation>
Input: Object<Array<Annotation>>
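The grouping and aggregation described above can be pictured with a small JavaScript sketch. It is not the operation's implementation. It assumes annotations are plain objects with a numeric `duration` in milliseconds and string properties such as `tier`. It also leaves out `factor` and `truncate`. The function name `durationBy` is hypothetical.

```javascript
// Minimal sketch (assumption: annotations are plain objects with a numeric
// `duration` in milliseconds); `factor` and `truncate` are omitted.
function durationBy(annotations, property, mode = "sum") {
  const groups = {};
  for (const a of annotations) {
    // The "*" wildcard places every annotation in a single "*" group
    const key = property === "*" ? "*" : String(a[property]);
    (groups[key] = groups[key] || []).push(a.duration);
  }
  const result = {};
  for (const [key, durations] of Object.entries(groups)) {
    const sum = durations.reduce((acc, d) => acc + d, 0);
    const stats = {
      sum,
      average: sum / durations.length,
      min: Math.min(...durations),
      max: Math.max(...durations),
    };
    // "all" returns every statistic for the group
    result[key] = mode === "all" ? stats : stats[mode];
  }
  return result;
}

const annotations = [
  { tier: "A", duration: 100 },
  { tier: "A", duration: 300 },
  { tier: "B", duration: 50 },
];
console.log(durationBy(annotations, "tier"));            // { A: 400, B: 50 }
console.log(durationBy(annotations, "tier", "average")); // { A: 200, B: 50 }
console.log(durationBy(annotations, "*"));               // { '*': 450 }
```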
calculate-duration-of-pause
Calculates the sum, average, median, min or max of the pauses between consecutive annotations (in milliseconds).
- mode string = sum (sum, average, median, min, max, all)
- factor int = 1: Convert milliseconds to other units
- truncate int = -1: Truncate the factored value to X decimal places
- min-threshold int = 0: Minimum pause in milliseconds between 2 consecutive annotations (shorter pauses are not counted)
- max-threshold int = 500: Maximum pause in milliseconds between 2 consecutive annotations (longer pauses are not counted)
Input: Array<Annotation>
Input: Object<Array<Annotation>>
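The sketch below shows how pauses and the two thresholds interact. It is not the tool's code. It assumes `start`/`stop` are in milliseconds and that "consecutive" means consecutive after sorting by start time, regardless of tier. A pause is counted only when it falls between `min-threshold` and `max-threshold` inclusive. The function names `pauses` and `median` are hypothetical.

```javascript
// Minimal sketch (assumptions: annotations have numeric `start`/`stop` in
// milliseconds, and "consecutive" means consecutive after sorting by start).
function pauses(annotations, minThreshold = 0, maxThreshold = 500) {
  const sorted = [...annotations].sort((a, b) => a.start - b.start);
  const result = [];
  for (let i = 1; i < sorted.length; i++) {
    const pause = sorted[i].start - sorted[i - 1].stop;
    // Pauses outside [minThreshold, maxThreshold] are not counted
    if (pause >= minThreshold && pause <= maxThreshold) result.push(pause);
  }
  return result;
}

// Median of a non-empty list of numbers
function median(values) {
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

const turns = [
  { start: 0, stop: 1000 },
  { start: 1200, stop: 2000 }, // 200 ms pause
  { start: 2100, stop: 3000 }, // 100 ms pause
  { start: 5000, stop: 6000 }, // 2000 ms pause, above max-threshold
];
const p = pauses(turns);
console.log(p);                                // [ 200, 100 ]
console.log(p.reduce((acc, x) => acc + x, 0)); // 300
console.log(median(p));                        // 150
```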
calculate-percent-by
Calculates the percentage of occurrences of each value of an annotation property.
- (arg) string ~ (group, tier, participant, value, start, duration, file, extra.*)
- truncate int = 0: Truncate the factored value to X decimal places
Input: Object<Array<Annotation>>
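The sketch below illustrates the percentage calculation. It assumes, without confirmation from the source, that percentages are computed separately within each array of the input object. It also assumes that `truncate` drops extra decimals rather than rounding. The function name `percentBy` is hypothetical.

```javascript
// Minimal sketch (assumptions: input is an object of annotation arrays, and
// percentages are computed within each array; truncation drops decimals
// rather than rounding).
function percentBy(groups, property, decimals = 0) {
  const scale = 10 ** decimals;
  const result = {};
  for (const [group, annotations] of Object.entries(groups)) {
    const counts = {};
    for (const a of annotations) {
      const key = String(a[property]);
      counts[key] = (counts[key] || 0) + 1;
    }
    result[group] = {};
    for (const [key, n] of Object.entries(counts)) {
      const percent = (n / annotations.length) * 100;
      result[group][key] = Math.trunc(percent * scale) / scale;
    }
  }
  return result;
}

const byTier = { A: [{ value: "yes" }, { value: "yes" }, { value: "no" }] };
console.log(percentBy(byTier, "value"));    // { A: { yes: 66, no: 33 } }
console.log(percentBy(byTier, "value", 1)); // { A: { yes: 66.6, no: 33.3 } }
```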
clone
Clones each annotation in the selection, so they can be modified without affecting the originals.
Input: Array<Annotation>
Output: Array<Annotation>
combine-overlapping-annotations
Combines all overlapping annotations into a single annotation.
Input: Array<Annotation>
Output: Array<Annotation>
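Combining overlapping annotations is essentially an interval merge. The sketch below is not the operation's code. It assumes numeric `start`/`stop` fields and a string `value`, and it joins merged values with a space. It copies its inputs so the originals stay untouched. Annotations that merely touch (one ends exactly where the next starts) are treated here as non-overlapping. The function name `combineOverlapping` is hypothetical.

```javascript
// Minimal sketch (assumptions: annotations have numeric `start`/`stop` and a
// string `value`; combined values are joined with a space). Inputs are copied,
// so the original annotations are left untouched.
function combineOverlapping(annotations, separator = " ") {
  const sorted = [...annotations].sort((a, b) => a.start - b.start);
  const combined = [];
  for (const a of sorted) {
    const last = combined[combined.length - 1];
    if (last && a.start < last.stop) {
      // Overlap: extend the current annotation and append the value
      last.stop = Math.max(last.stop, a.stop);
      last.value += separator + a.value;
    } else {
      combined.push({ ...a });
    }
  }
  return combined;
}

const input = [
  { start: 0, stop: 100, value: "a" },
  { start: 50, stop: 200, value: "b" },
  { start: 300, stop: 400, value: "c" },
];
console.log(JSON.stringify(combineOverlapping(input)));
// [{"start":0,"stop":200,"value":"a b"},{"start":300,"stop":400,"value":"c"}]
```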
combine-same-tier-consecutive-annotations
Combines all consecutive annotations of the same tier into a single annotation.
- value-separator string = , (comma): The separator inserted when combining annotation values that lack punctuation
- sentence-separator string: A separator to insert between sentences
Input: Array<Annotation>
Output: Array<Annotation>
control-vocabulary
Checks each annotation value against a specified controlled vocabulary and issues a warning when the value does not match any vocabulary item. The operation can also fix common mistakes, in which case the original input annotations are modified (use the "clone" operation beforehand to keep the originals unchanged).
- vocabulary string: The Controlled Vocabulary ID used to check annotation values
- fix-whitespaces bool = false: Whether to fix values whose mistake is surrounding whitespace
- fix-repeats bool = false: Whether to fix values whose mistake is a repeated value
- return select = input (input, valid, invalid, corrected): By default the operation returns the original (possibly corrected) input, but it can also return only the valid, invalid, or corrected annotations
Input: Array<Annotation>
Output: Array<Annotation>
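The sketch below models the vocabulary check and the in-place whitespace fix. It assumes the vocabulary is available as a plain array of strings. It does not show `fix-repeats`, because the source does not specify what counts as a repeated value. Treating corrected annotations as valid is also an assumption. The function name `controlVocabulary` is hypothetical.

```javascript
// Minimal sketch (assumptions: the vocabulary is a plain array of strings and
// only the whitespace fix is shown; how repeats are fixed is not specified).
function controlVocabulary(annotations, vocabulary, fixWhitespaces = false) {
  const items = new Set(vocabulary);
  const valid = [], invalid = [], corrected = [];
  for (const a of annotations) {
    if (items.has(a.value)) {
      valid.push(a);
    } else if (fixWhitespaces && items.has(a.value.trim())) {
      a.value = a.value.trim(); // modifies the input annotation in place
      corrected.push(a);
      valid.push(a); // assumption: corrected values count as valid
    } else {
      console.warn(`"${a.value}" is not in the controlled vocabulary`);
      invalid.push(a);
    }
  }
  // One list per possible value of the `return` parameter
  return { input: annotations, valid, invalid, corrected };
}

const checked = controlVocabulary(
  [{ value: "yes" }, { value: " no " }, { value: "maybe" }],
  ["yes", "no"],
  true
);
console.log(checked.corrected.map((a) => a.value)); // [ 'no' ]
console.log(checked.invalid.map((a) => a.value));   // [ 'maybe' ]
```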
convert-to-slices
Slices the annotations of each tier into segments of a given duration, aligning the segments across tiers. Alignment starts at the earliest annotation found.
- slice-duration number = 0: The duration of each slice annotation, in seconds
- slice-count int = 0: Alternative to slice-duration, the total number of slices to create
- overlap-threshold number = 0: (not yet documented)
Input: Array<Annotation>
Output: Array<Annotation>
count
Counts the elements of each array contained in the input array or object, replacing each array with its element count (an integer).
- depth int = 0: How deep to go in the input tree before counting
- remove-uncountable bool = false: Whether to remove the properties that are not countable
- remove-uncountable-elements bool = false: Whether to remove uncountable elements from arrays, resulting in shorter arrays
- count-strings bool = false: Whether to treat strings as arrays when counting
Input: Array<Array>
Input: Object<Array>
Output: Array<Integer>
Output: Object<Integer>
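The sketch below shows how `depth` could control where counting happens. It is a sketch of the described behaviour, not the tool's code. It assumes uncountable values are kept unchanged, as with `remove-uncountable = false`. It does not cover `count-strings`. The function name `count` is hypothetical.

```javascript
// Minimal sketch of count with `depth` (assumption: uncountable values are
// kept as-is, matching remove-uncountable = false).
function count(input, depth = 0) {
  // Scalars cannot be descended into: return them unchanged
  if (input === null || typeof input !== "object") return input;
  const step = (v) =>
    depth > 0 ? count(v, depth - 1) : Array.isArray(v) ? v.length : v;
  if (Array.isArray(input)) return input.map(step);
  return Object.fromEntries(Object.entries(input).map(([k, v]) => [k, step(v)]));
}

console.log(count({ a: [1, 2], b: [3] }));          // { a: 2, b: 1 }
console.log(count([[1], [2, 3]]));                  // [ 1, 2 ]
console.log(count({ x: { a: [1, 2], b: [] } }, 1)); // { x: { a: 2, b: 0 } }
```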
count-by
Groups array elements by a specific property and counts them.
(arg) string: The property to group elements by (group, tier, participant, value, start, duration, file, extra.*)
Input: Array<Annotation>
Input: Object<Array<Annotation>>
Output: Object<Integer>
Output: Object<Array<Integer>>
count-object-keys
Counts keys across all input objects.
Input: Array<Object>
Output: Object
detect-sequences
Detects sequences by grouping annotations that are close to each other.
- range number = 30: Minimum gap, in seconds, between two annotations for the second one to start a new sequence
- to-tier string = sequences: The tier to create, which will contain the sequence annotations
- output choices = array (array, object): The output type, either an Array of sequence annotations or an Object mapping each sequence to its list of annotations
Input: Array<Annotation>
Output: Array<Annotation>
Output: Object<Array<Annotation>>
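Sequence detection can be modelled as a single pass over the annotations sorted by start time. This sketch assumes `start`/`stop` in milliseconds. It also assumes the gap is measured from the end of the current sequence to the start of the next annotation, which the source does not confirm. Each returned sequence keeps its member annotations, which roughly corresponds to the `object` output. The function name `detectSequences` is hypothetical.

```javascript
// Minimal sketch (assumptions: `start`/`stop` in milliseconds; the gap is
// measured from the end of the current sequence to the next annotation).
function detectSequences(annotations, range = 30, toTier = "sequences") {
  const sorted = [...annotations].sort((a, b) => a.start - b.start);
  const sequences = [];
  let current = null;
  for (const a of sorted) {
    if (current && a.start - current.stop < range * 1000) {
      // Close enough: the annotation joins the current sequence
      current.stop = Math.max(current.stop, a.stop);
      current.annotations.push(a);
    } else {
      current = { tier: toTier, start: a.start, stop: a.stop, annotations: [a] };
      sequences.push(current);
    }
  }
  return sequences;
}

const seqs = detectSequences([
  { start: 0, stop: 1000 },
  { start: 5000, stop: 6000 },   // 4 s gap: same sequence
  { start: 60000, stop: 61000 }, // 54 s gap: new sequence
]);
console.log(seqs.map((s) => [s.start, s.stop, s.annotations.length]));
// [ [ 0, 6000, 2 ], [ 60000, 61000, 1 ] ]
```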
each
Iterates over the values of the input array(s) and executes the provided function on each of them.
(arg) js: The function called on each array element, like (o) => log.info(o.value)
Input: Array
Input: Object<Array>
Output: Array
Output: Object<Array>
each-file
Iterates over each (corpus) file, and executes the provided function.
Example to log each corpus file id:
af => log.info(af.id)
Example to build an array of corpus files paths:
af => af.file.getAbsolutePath()
(arg) js: The function called on each file, which can return a value to build the output array
Input: None
Output: Array
Output: Undefined
extend-duration
Extends the duration of each input annotation by changing its start and/or stop time.
By default, all input annotations are cloned first so the originals stay unaffected.
- (arg) number = 0: Seconds to add before and after
- before number = arg: Seconds to add before the start
- after number = arg: Seconds to add after the stop
- clone bool = true: Whether to clone the annotation before changing its duration
Input: Array<Annotation>
Output: Array<Annotation>
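The sketch below shows how `before` and `after` default to the argument, and how `clone` protects the originals. It assumes annotations are plain objects whose `start`/`stop` are in milliseconds, while the parameters are in seconds as documented. The shallow copy stands in for the tool's cloning. The function name `extendDuration` is hypothetical.

```javascript
// Minimal sketch (assumption: `start`/`stop` are in milliseconds while the
// parameters are in seconds, as documented above).
function extendDuration(annotations, { arg = 0, before = arg, after = arg, clone = true } = {}) {
  return annotations.map((a) => {
    const target = clone ? { ...a } : a; // shallow copy keeps the original intact
    target.start -= before * 1000;
    target.stop += after * 1000;
    return target;
  });
}

const original = [{ start: 5000, stop: 6000 }];
console.log(extendDuration(original, { arg: 1 }));    // [ { start: 4000, stop: 7000 } ]
console.log(extendDuration(original, { before: 2 })); // [ { start: 3000, stop: 6000 } ]
console.log(original);                                // [ { start: 5000, stop: 6000 } ]
```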
filter
Filters the input array(s) keeping only elements passing the filter expression.
Example to keep only annotations with a duration greater than 5 seconds:
a => a.duration > 5000
Example to keep only objects with a defined foo property:
o => o.foo
(arg) js: The function used to filter each element
Input: Array
Input: Object<Array>
Output: Array
Output: Object<Array>
flatten
Transforms nested objects into a flat array of objects.
Maps each key of the input object to an object of the output array, built from the provided structure and holding the key name and its associated value.
Syntax for one nested level:
{name,value}
Syntax for two nested levels:
{group,name,value}
Input: Object
Output: Array<Object>
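One way to read the `{name,value}` and `{group,name,value}` syntaxes is to walk the object one level per extra key and then zip the collected key path with the structure. The sketch below follows that reading. Passing the structure as an array such as `["group", "name", "value"]` is a hypothetical signature, not the operation's syntax.

```javascript
// Minimal sketch: `keys` lists the structure, e.g. ["name", "value"] for one
// nested level or ["group", "name", "value"] for two (hypothetical signature).
function flatten(input, keys) {
  const out = [];
  const walk = (node, path) => {
    for (const [key, value] of Object.entries(node)) {
      const next = [...path, key];
      if (next.length < keys.length - 1) {
        walk(value, next); // go one level deeper
      } else {
        // Zip the collected key path plus the leaf value with the structure
        out.push(Object.fromEntries([...next, value].map((x, i) => [keys[i], x])));
      }
    }
  };
  walk(input, []);
  return out;
}

console.log(JSON.stringify(flatten({ a: 1, b: 2 }, ["name", "value"])));
// [{"name":"a","value":1},{"name":"b","value":2}]
console.log(JSON.stringify(flatten({ A: { x: 1 } }, ["group", "name", "value"])));
// [{"group":"A","name":"x","value":1}]
```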
group-by
Groups the array elements by the value of a specific element's property.
The result is an object whose keys are the property values, mapped to arrays of elements.
(arg) string: The property to group elements by (group, tier, participant, value, start, duration, file, extra.*)
Input: Array
Input: Object<Array>
Output: Object<Array>
Output: Object<Object<Array>>
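Grouping by a property reduces to building an object of arrays keyed by the property value. The sketch below resolves properties such as `extra.pos` as dotted paths on plain objects. That is an assumption about how `extra.*` is handled. The helper names `getPath` and `groupBy` are hypothetical.

```javascript
// Minimal sketch (assumption: "extra.*" properties are resolved as dotted
// paths such as "extra.pos" on plain objects).
function getPath(obj, path) {
  return path.split(".").reduce((o, k) => (o == null ? undefined : o[k]), obj);
}

function groupBy(elements, property) {
  const groups = {};
  for (const e of elements) {
    const key = String(getPath(e, property)); // object keys are strings
    (groups[key] = groups[key] || []).push(e);
  }
  return groups;
}

const words = [
  { tier: "A", extra: { pos: "NOUN" } },
  { tier: "B", extra: { pos: "VERB" } },
  { tier: "A", extra: { pos: "NOUN" } },
];
console.log(Object.keys(groupBy(words, "tier")));     // [ 'A', 'B' ]
console.log(groupBy(words, "extra.pos").NOUN.length); // 2
```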
group-by-file-tags
Groups the annotations by specific file tags (comma separated or JSON array of strings). Corpus files can have multiple tags, therefore an annotation could appear in multiple groups. The result is an object whose keys are the chosen tags, mapped to arrays of annotations.
Input: Array<Annotation>
Output: Object<Array>
group-by-ref-value
improve-transcript 🧪
Improves transcribed content with various heuristic techniques.
- hallucination-char-factor int = 1: (not yet documented)
- hallucination-time-diff int = 1000: Maximum pause between 2 hallucinated annotations (in milliseconds)
- merge-comma-end bool = true: Merge 2 annotations if the first one ends with a comma
- merge-lowercase-start bool = true: Merge 2 annotations if the second one starts with a lower-case character
Input: Array<Annotation>
Output: Array<Annotation>
load-annotations-from-forms
Loads JSON files produced by forms as virtual annotations. The JSON files must be located in the same folder as the processed XML file.
- dimension string: The dimension to extract from the form
- exclude-from-intercoding bool = false: Whether to tag the extracted annotations so they are excluded from intercoding calculations
- extract-original-annotations bool = false: Whether to extract only the original annotations from the form, so the results are safe from corpus changes. When this attribute is specified, no dimension is extracted: run the operation again to extract a dimension.
- match-to-original-annotations bool = false: Try to correct loaded annotations so they all match the original ones
Input: None
Output: Array<Annotation>
load-data-from-json
Loads a JSON file, or sets data directly from embedded JSON. The JSON file must be located in the same folder as the processed XML file.
- json js: Embedded JSON data
- file string: File name
Input: None
Output: Object
Output: Array
load-data-from-script
Runs a JS function whose result is set as the current selection/data. Global variables can also be used in scripts via variables.varname.
script js: A function called with the current selection/data, which should return data
load-data-from-variables
Builds an object from variables.
object-delimiter string: A delimiter used to split a variable name into an object structure
Output: Object
Output: Array
load-data-from-xls
Loads an XLS or CSV file.
file file: File name
Input: None
Output: Object
Output: Array
map
Creates a new array populated with the results of calling the provided function on each element of the input array(s). Special pseudo-object syntax can be used to facilitate direct mapping of object properties:
annotation => ({value: annotation.value, tier: annotation.tier.id})
{value: .value, tier: .tier.id}
{value, tier: .tier.id}
(arg) js ~
Input: Array
Input: Object<Array>
mongo-clear
Clears a MongoDB Collection (drops the collection)
mongo-create-ref
Creates references between documents across collections
- source-fields string: Names of the document fields to transform into references (comma separated)
- target-field: The field that will hold the reference(s) (an array if multiple source fields are specified)
- reference-collection string: The collection containing the references (which may be created if missing)
- reference-field string: The reference field used to compare values
- trim bool = false: Whether to trim the values before comparing them
Output: Array<Object>
mongo-find
Loads objects from a MongoDB Collection
- query js: The Mongo query, a plain JS object like {property:"value"}
- projection js: The query projection, a plain array like ['prop1','prop2'] listing the properties to return
- limit int = 1000: The maximum number of documents to return
Output: Array<Object>
mongo-insert
Inserts input objects into a MongoDB Collection
Input: Array<Object>
Input: Object
mongo-remove
Removes objects from a MongoDB Collection
Input: Array<Object>
Input: Object
randomize
Randomizes the input selection with a PRNG allowing reproducible results based on an initial seed.
- limit int = 0: Maximum number of elements in the output array
- limit-per-file = 0: Maximum number of annotations to take from any single annotation file (if the input array contains annotations)
- seed int = 1: The initial seed for reproducible randomness
- prng string = LCG: The algorithm for random number generation
Input: Array
Output: Array
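Reproducible randomization is commonly built from a seeded PRNG feeding a Fisher-Yates shuffle. The sketch below uses a 32-bit LCG with the Numerical Recipes constants. Those constants are an assumption, because the operation's own LCG parameters are not documented. Its output order therefore will not match the tool's. It only illustrates that the same seed always gives the same order. `limit-per-file` is not shown. The names `lcg` and `randomize` are hypothetical.

```javascript
// Minimal sketch: a 32-bit LCG (Numerical Recipes constants, an assumption;
// the operation's own parameters are not documented) driving a Fisher-Yates
// shuffle, so the same seed always yields the same order.
function lcg(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(1664525, state) + 1013904223) >>> 0;
    return state / 4294967296; // float in [0, 1)
  };
}

function randomize(input, { seed = 1, limit = 0 } = {}) {
  const rand = lcg(seed);
  const out = [...input]; // leave the input array untouched
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return limit > 0 ? out.slice(0, limit) : out;
}

const items = [1, 2, 3, 4, 5, 6, 7, 8];
console.log(randomize(items, { seed: 42 }));
console.log(randomize(items, { seed: 42, limit: 3 })); // first 3 of the same order
```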
reduce
Executes a "reducer" function on each element of the input array(s), in order, passing in the return value from the calculation on the preceding element. The final result of running the reducer across all elements of an array is a single value.
(accumulator, currentValue) => accumulator + currentValue, initialValue
Input: Array
Input: Object<Array>
Output: Integer
Output: Object<Integer>
replace-with-annotations-in-sequence
Considers each input annotation as a sequence, and selects all annotations (of the same file) that are included in the sequences.
- overlap bool = true: Whether to include overlapping annotations (not fully contained in the sequence)
- range number = 0: Number of seconds to add before and after the sequence when looking for annotations in the sequence
- range-before number = 0: Additional seconds to add before
- range-after number = 0: Additional seconds to add after
- distinct bool = false: Whether to remove duplicate annotations from the resulting list
- combine bool = false: Whether to combine all the annotations found in the sequence into one final annotation
- limit int = 0: Maximum number of annotations to return (0 = all annotations found in the sequence, 1 = first annotation found)
- reverse bool = false: Whether to reverse the list of annotations found in the sequence before applying the limit. This can be used to return the last annotation found in the sequence (via limit=1).
- separator string = |: A separator to insert when combining values of multiple annotations
- default-to-null bool = false: Whether to add a null element to the output array if no corresponding annotation is found for a given input annotation (defaults to false, or true if the operation runs in an EXTRA block)
Input: Array<Annotation>
Output: Array<Annotation>
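For a single sequence, the selection logic can be sketched as a time-window filter. The sketch assumes `start`/`stop` in milliseconds and a `file` field identifying the annotation file. It omits `range-before`, `range-after`, `distinct`, `combine`, `separator` and `default-to-null`. It shows how `overlap`, `range`, `reverse` and `limit` interact, including the documented trick of `reverse` with `limit=1` to get the last annotation. The function name `annotationsInSequence` is hypothetical.

```javascript
// Minimal sketch for one sequence (assumptions: `start`/`stop` in
// milliseconds, `file` identifies the annotation file; distinct, combine and
// default-to-null are omitted).
function annotationsInSequence(sequence, candidates,
    { overlap = true, range = 0, limit = 0, reverse = false } = {}) {
  const from = sequence.start - range * 1000;
  const to = sequence.stop + range * 1000;
  const found = candidates
    .filter((a) => a.file === sequence.file) // same annotation file only
    .filter((a) => overlap
      ? a.start < to && a.stop > from      // any overlap with the window
      : a.start >= from && a.stop <= to)   // fully contained in the window
    .sort((a, b) => a.start - b.start);
  if (reverse) found.reverse();
  return limit > 0 ? found.slice(0, limit) : found;
}

const seq = { file: "f1", start: 1000, stop: 2000 };
const candidates = [
  { file: "f1", value: "a", start: 500, stop: 1200 },
  { file: "f1", value: "b", start: 1200, stop: 1800 },
  { file: "f1", value: "c", start: 1900, stop: 2500 },
  { file: "f2", value: "x", start: 1200, stop: 1300 },
];
const names = (list) => list.map((a) => a.value).join("");
console.log(names(annotationsInSequence(seq, candidates)));                              // abc
console.log(names(annotationsInSequence(seq, candidates, { overlap: false })));          // b
console.log(names(annotationsInSequence(seq, candidates, { reverse: true, limit: 1 }))); // c
```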
replace-with-annotations-in-sequence-from-tier
Considers each input annotation as a sequence, and selects those from another tier (of the same file) that are included in the sequences.
- (arg) tier ~
- overlap bool = true: Whether to include overlapping annotations (not fully contained in the sequence)
- range number = 0: Number of seconds to add before and after the sequence when looking for annotations in the sequence
- range-before number = 0: Additional seconds to add before
- range-after number = 0: Additional seconds to add after
- distinct bool = false: Whether to remove duplicate annotations from the resulting list
- combine bool = false: Whether to combine all the annotations found in the sequence into one final annotation
- limit int = 0: Maximum number of annotations to return (0 = all annotations found in the sequence, 1 = first annotation found)
- reverse bool = false: Whether to reverse the list of annotations found in the sequence before applying the limit. This can be used to return the last annotation found in the sequence (via limit=1).
- separator string = |: A separator to insert when combining values of multiple annotations
- default-to-null bool = false: Whether to add a null element to the output array if no corresponding annotation is found for a given input annotation (defaults to false, or true if the operation runs in an EXTRA block)
Input: Array<Annotation>
Output: Array<Annotation>
replace-with-next-timecode-annotations-from-tier
Replaces each input annotation with the first annotation from another tier (in the same annotation file) whose start time is after the input annotation's start time.
range int = 0: Maximum time range to find the next annotation (0 = no maximum)
Input: Array<Annotation>
Output: Array<Annotation>
replace-with-previous-timecode-annotations-from-tier
Replaces each input annotation with the first annotation found in another tier whose start time is before the input annotation's start time.
range int = 0: Maximum time range to find the previous annotation (0 = no maximum)
Input: Array<Annotation>
Output: Array<Annotation>
replace-with-same-timecode-annotations-from-tier
Replaces each input annotation with one from another tier that has the same start time.
- (arg) tier: The tier from which to find annotations with the same timecode
- multiple bool = false: Whether to select multiple annotations if more than one is in range
- range number = 0: Acceptable range in seconds to consider 2 timecodes as equivalent
- default-to-null bool: Whether to add a null element to the output array if no corresponding annotation is found for a given input annotation (defaults to false, or true if the operation runs in an EXTRA block)
Input: Array<Annotation>
Output: Array<Annotation>
sanitize-strip-xml-tags
Removes XML tags from annotations' values.
Input: Array<Annotation>
Output: Array<Annotation>
save-data-to-csv
Saves current data to a CSV file. The file will be saved in the processed XML file's folder, overwriting any existing file.
Input: Object
Input: Array
Output: Input
save-data-to-json
Saves current data to a JSON file. The file will be saved in the processed XML file's folder, overwriting any existing file.
Input: Object
Input: Array
Output: Input
save-variable
Sets a global variable which becomes available in HTML blocks as {{varname}}.
It can then be used in scripts with variables.varname,
and in attributes via the ${} syntax, like ${0.05*variables.counter}.
If the value attribute is not defined, the current selection/data is saved as the value.
value js: A function called with the current selection/data, which should return the value of the variable
Input: Any
Output: Input
scrape
Loads a URL (or a file) and extracts data.
- url string: URL to fetch for scraping
- file file: A file path to load content from instead of a URL
- cache bool = false: Whether to cache the downloaded URL content in memory
- jsoup js: A function to extract data, called with a Jsoup document as argument (to scrape HTML)
- js js: A function to extract data, called with the content as a string argument (to scrape JSON, text files...)
Input: None
Output: Any
selection-to-data
Transforms input annotations to an array of objects
- export-value bool = true
- export-value-as string = value
- export-tier bool = true
- export-tier-as string = tier
- export-start bool = true
- export-start-as string = start
- export-stop bool = true
- export-stop-as string = stop
Input: Array<Annotation>
Output: Array<Object>
set-tag
Adds or removes a tag on each annotation.
Example to add the "relevant" tag: "set-tag = +relevant"
(arg) string: The name of the tag, prefixed with + to add it, or - to remove it
Input: Array<Annotation>
Output: Array<Annotation>
set-tier
Changes the tier of each input annotation.
By default, annotations are cloned to keep the originals unaffected.
Examples of template function:
(annotation, tier) => `Speaker ${tier.id}`
(a, tier) => tier.id.toUpperCase()
- (arg) string: Name of the tier to associate the annotations with
- clone bool = true
- template js: A function called for each annotation, which must return its new tier name
Input: Array<Annotation>
Output: Array<Annotation>
sort
Sorts the input array by a given field.
- natural bool = false: Whether to sort based only on the digits contained in the field, instead of string comparison
- func js: A custom comparison function like (a,b)=>(a>b?-1:1)
Input: Array
Output: Array
sort-by-file
Sorts annotations first by their file, and then by the provided field or compare function. When providing a field, its first character must indicate the sorting order: + (ascending) or - (descending).
- order string (+): Sorting order of the AF groups, + (ascending) or - (descending)
- field string: A field to sort on, like "+start"
Input: Array<Annotation>
Output: Array<Annotation>
sort-by-group
Sorts annotations by their group, and then within each group by their relative start time (taking into account any file time offset). If no field is provided, sorting will be done around //TODO
- order string (+) (+, -): Sorting order of groups (ascending/descending)
- natural bool = false: Natural integer sorting instead of string sorting
- field string: A field to sort on, like "+start"
Input: Array<Annotation>
Output: Array<Annotation>
swap-nested-objects
Transforms an object of structure {A:{a:1,b:2}} into structure {a:{A:1},b:{A:2}}
Input: Object<Object>
Output: Object<Object>
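The swap can be written as a double loop that writes each inner value under its inner key, then under its outer key. The sketch below reproduces the documented example. The function name `swapNested` is hypothetical.

```javascript
// Minimal sketch: {A:{a:1,b:2}} becomes {a:{A:1},b:{A:2}}; inner values are
// regrouped under their inner key, then indexed by their outer key.
function swapNested(input) {
  const out = {};
  for (const [outer, inner] of Object.entries(input)) {
    for (const [key, value] of Object.entries(inner)) {
      (out[key] = out[key] || {})[outer] = value;
    }
  }
  return out;
}

console.log(JSON.stringify(swapNested({ A: { a: 1, b: 2 } })));
// {"a":{"A":1},"b":{"A":2}}
console.log(JSON.stringify(swapNested({ A: { a: 1 }, B: { a: 2 } })));
// {"a":{"A":1,"B":2}}
```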
type-token
Computes the type-token of input annotations.
- group-by-attribute string: An optional attribute name used to group type-tokens together
- strip-punctuation bool = true: Whether to remove all punctuation before type-token processing
- case-sensitive bool = false: Whether word comparison is case-sensitive
- replace-regex string: A regular expression to replace text before type-token processing
- replace-func string: A JS function to replace text before type-token processing
- split-func string: A JS function to use instead of basic space splitting for extracting words from strings
Input: Array<Annotation>
Input: Object<Array<Annotation>>
Input: Object<Array<String>>
Input: Object<String>
Output: Object
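The sketch below shows the usual type-token computation on a list of annotation values. Tokens are all words, and types are the distinct words. It is an assumption that the operation reports these counts and their ratio. The shape `{ tokens, types, ratio }` is hypothetical, as is the function name `typeToken`. Only `strip-punctuation` and `case-sensitive` are modelled, with punctuation matched via the Unicode `\p{P}` class.

```javascript
// Minimal sketch (assumptions: input is a list of annotation values, and the
// result shape { tokens, types, ratio } is hypothetical).
function typeToken(values, { stripPunctuation = true, caseSensitive = false } = {}) {
  const tokens = [];
  for (let text of values) {
    if (stripPunctuation) text = text.replace(/\p{P}/gu, ""); // Unicode punctuation
    if (!caseSensitive) text = text.toLowerCase();
    tokens.push(...text.split(/\s+/).filter(Boolean)); // basic space splitting
  }
  const types = new Set(tokens); // distinct words
  return {
    tokens: tokens.length,
    types: types.size,
    ratio: tokens.length ? types.size / tokens.length : 0,
  };
}

console.log(typeToken(["The cat.", "the dog, the cat!"]));
// { tokens: 6, types: 3, ratio: 0.5 }
console.log(typeToken(["The cat.", "the dog, the cat!"], { caseSensitive: true }).types); // 4
```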