Subcorpus Café

Researchers often work on huge corpora and want to share only small parts with peers for feedback. The process is quite tedious, and sometimes impossible, depending on the annotation software. Here we explore a simple workflow:
- Use annotations from the corpus as sequences
- Replace the corpus with one built from the sequences
- Anonymize the video files
- Export to EAF format, ready to be edited in ELAN

Exporting first café sequence as EAF

We use the sequences-to-corpus processor to transform our OUICHEF corpus into a café-only corpus
  • We want each created corpus file to be named based on its original corpus file
We must provide the processor with a selection of annotations: our café sequences
We limit the selection to the T1 group, where we have the first café sequence
To export all the café sequences, we would simply remove the group criterion
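The selection and naming logic described above can be sketched in plain Python. This is only an illustration under stated assumptions: the `Annotation` record, the `select_sequences` and `subcorpus_name` helpers, the `_seq` suffix and the sample file names are all hypothetical, and none of them is the processor's actual interface.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Annotation:
    # Hypothetical record: the real tool's data model is not shown in this document.
    file: str      # original corpus file the annotation comes from
    group: str     # group of that file, e.g. "T1"
    value: str     # annotation label, e.g. "café"
    start_ms: int
    end_ms: int


def select_sequences(annotations, value, group=None):
    """Keep annotations with the given label; restrict to one group if provided."""
    return [
        a for a in annotations
        if a.value == value and (group is None or a.group == group)
    ]


def subcorpus_name(original_file, index):
    """Name a created corpus file after its original corpus file (assumed scheme)."""
    return f"{Path(original_file).stem}_seq{index}.eaf"


# Made-up sample data, for illustration only.
annotations = [
    Annotation("T1_session.eaf", "T1", "café", 1000, 5000),
    Annotation("T1_session.eaf", "T1", "dessert", 6000, 9000),
    Annotation("T2_session.eaf", "T2", "café", 2000, 4000),
]

first_cafe = select_sequences(annotations, "café", group="T1")
all_cafes = select_sequences(annotations, "café")  # no group criterion
```

Dropping the `group` argument is the equivalent of removing the group criterion: the same label filter then matches café sequences in every group.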
We intend to share this subcorpus so we want the faces to be removed
  • The type of anonymization to apply to the video
And maybe we want to add a cartoon-style anonymization!
  • The type of anonymization to apply to the video
Now let's export the whole thing to EAF
  • We want the video files to be put next to the EAF files
That's it: you can open these files in ELAN right away
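For readers curious about what an EAF file with a neighbouring video roughly looks like, here is a minimal sketch built with Python's standard `xml.etree.ElementTree`. It is not the exporter used in this workflow. The `build_eaf` helper, the `default-lt` type name, the sample file names and the choice of attribute values are assumptions, and the result should be checked against the official EAF schema before relying on it.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def build_eaf(media_filename, tier_id, annotations, mime_type="video/mp4"):
    """Return a minimal EAF tree; annotations are (start_ms, end_ms, value) tuples."""
    doc = ET.Element("ANNOTATION_DOCUMENT", {
        "AUTHOR": "",
        "DATE": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "FORMAT": "3.0",
        "VERSION": "3.0",
    })
    header = ET.SubElement(doc, "HEADER", {"TIME_UNITS": "milliseconds"})
    # Assumption: the video sits next to the EAF file, so a relative URL suffices.
    ET.SubElement(header, "MEDIA_DESCRIPTOR", {
        "MEDIA_URL": media_filename,
        "RELATIVE_MEDIA_URL": "./" + media_filename,
        "MIME_TYPE": mime_type,
    })

    time_order = ET.SubElement(doc, "TIME_ORDER")
    tier = ET.SubElement(doc, "TIER", {
        "TIER_ID": tier_id,
        "LINGUISTIC_TYPE_REF": "default-lt",
    })
    for i, (start_ms, end_ms, value) in enumerate(annotations, start=1):
        # Each annotation points at two time slots: its start and its end.
        ts1, ts2 = f"ts{2 * i - 1}", f"ts{2 * i}"
        ET.SubElement(time_order, "TIME_SLOT",
                      {"TIME_SLOT_ID": ts1, "TIME_VALUE": str(start_ms)})
        ET.SubElement(time_order, "TIME_SLOT",
                      {"TIME_SLOT_ID": ts2, "TIME_VALUE": str(end_ms)})
        ann = ET.SubElement(tier, "ANNOTATION")
        aligned = ET.SubElement(ann, "ALIGNABLE_ANNOTATION", {
            "ANNOTATION_ID": f"a{i}",
            "TIME_SLOT_REF1": ts1,
            "TIME_SLOT_REF2": ts2,
        })
        ET.SubElement(aligned, "ANNOTATION_VALUE").text = value

    # The linguistic type referenced by the tier is declared after the tiers.
    ET.SubElement(doc, "LINGUISTIC_TYPE", {
        "LINGUISTIC_TYPE_ID": "default-lt",
        "TIME_ALIGNABLE": "true",
        "GRAPHIC_REFERENCES": "false",
    })
    return doc


# Made-up sample values, for illustration only.
doc = build_eaf("T1_cafe.mp4", "sequence", [(0, 4000, "café"), (4000, 6000, "sucre")])
xml_bytes = ET.tostring(doc, encoding="utf-8")
```

To save the result, `ET.ElementTree(doc).write("T1_cafe.eaf", encoding="UTF-8", xml_declaration=True)` would write it to disk. Placing `T1_cafe.mp4` in the same folder is what makes the relative media URL resolve.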