Multi-CAST: Guidelines for contributors

Multi-CAST becomes more useful to everyone as its language sample becomes more typologically representative. We therefore encourage scholars to contribute additional data sets to the collection. These can be incorporated into Multi-CAST as stand-alone resources, citable with your name as the author/annotator.

If you wish to contribute data, here are some points to consider:

  • Open access corpus data. Your data should be free of copyright and other restrictions on availability or usage. Multi-CAST is committed to open science, and hence makes all of its data freely available under a Creative Commons licence (CC BY-NC-SA 4.0 International). All data sets are citable online resources, with your name as author/annotator.
  • Monologues. Texts should be (predominantly) monologic. Multi-person discourse raises additional issues of annotation and analysis, which we have chosen not to tackle in this collection.
  • Media-linked time-aligned annotations. Ideally, texts are transcribed, morphologically glossed, and translated into English, and are accompanied by a sound file in uncompressed WAV format. Annotations should be time-aligned with the audio recording.
  • Minimum size of 1000 clauses. Every corpus in Multi-CAST contains at least 1000 clause units.
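
As a rough self-check against the minimum-size requirement, the sketch below counts the non-empty annotations on one tier of an EAF document using only Python's standard library. It assumes that each clause unit is stored as one annotation on a single tier. The tier name `clause`, the function name, and the sample document are hypothetical and serve only as illustration; the tier layout of the Multi-CAST ELAN template may differ.

```python
import xml.etree.ElementTree as ET

MIN_CLAUSE_UNITS = 1000  # minimum corpus size stated in the guidelines


def count_tier_annotations(eaf_xml: str, tier_id: str) -> int:
    """Count the non-empty annotations on the tier with the given TIER_ID."""
    root = ET.fromstring(eaf_xml)
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") == tier_id:
            # Each annotation stores its text in an ANNOTATION_VALUE element.
            return sum(
                1
                for value in tier.iter("ANNOTATION_VALUE")
                if (value.text or "").strip()
            )
    raise KeyError(f"no tier with TIER_ID {tier_id!r}")


# Tiny hypothetical EAF fragment; real files also carry a header and time slots.
SAMPLE_EAF = """
<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="clause">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>first clause</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>second clause</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a3" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE></ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="translation">
    <ANNOTATION>
      <REF_ANNOTATION ANNOTATION_ID="a4" ANNOTATION_REF="a1">
        <ANNOTATION_VALUE>a translation</ANNOTATION_VALUE>
      </REF_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""

if __name__ == "__main__":
    n = count_tier_annotations(SAMPLE_EAF, "clause")
    status = "meets" if n >= MIN_CLAUSE_UNITS else "is below"
    print(f"{n} clause units: {status} the minimum of {MIN_CLAUSE_UNITS}")
```

To run the check on a file rather than a string, read the file's contents and pass them in. A count obtained this way is only an approximation, since what counts as a clause unit depends on the GRAID annotation.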

If you have a data set that meets the above conditions and you are interested in contributing it, please contact Geoffrey Haig or Stefan Schnell to coordinate the next steps. Technically speaking, this involves two tasks: transferring your data into the EAF file format of the annotation software ELAN, for which we will provide you with a Multi-CAST ELAN template, and annotating your texts with GRAID. The latter involves some quite tricky analytical decisions, and we strongly recommend that potential contributors liaise with us before undertaking this task. The actual labour required will vary from language to language, but we will certainly assist you and will be able to give you a realistic assessment of what may be necessary.