Corpus building workshop

The Centre of Excellence for the Dynamics of Language held a workshop in Melbourne last week to “develop accessible corpora from lesser-described languages for new ways of empirical research on diverse languages.” Many language documentation projects collect a range of texts (both audio and written) which can be collated into a corpus for different types of research. Having these resources available in appropriate formats makes it easier to do research, both within a particular language and across different languages.

After a plenary by Ulrike Mosel from the University of Kiel, a number of researchers spoke about their own corpora, including some of the challenges and opportunities presented by their collections. Michael Christie spoke of the growing collection of texts by Yolŋu researchers on topics such as housing, gambling and education, created through the Yolŋu Aboriginal Consultants Initiative, the Yolŋu Studies program (both at CDU), and many other research projects. He commented on the depth of knowledge available in Yolŋu Matha texts, and the opportunities for Yolŋu themselves to explore and expand these.

It was gratifying to hear a number of presenters refer to the Living Archive as a source of material for various language corpora. Greg Dickson also described a small experiment in which he created a corpus of Kriol texts from LAAL and used a corpus linguistics tool called AntConc to do some simple analysis.

Having been mentioned several times on the first day, the Living Archive was officially presented on the second day of the workshop. One attendee described it as “magnificent”, and there were some suggestions about how to handle materials for which we haven’t yet got signed permission. We are already sharing LAAL resources with some other researchers who are developing a corpus of Warlpiri materials, and are happy to share ideas and resources with others.

This was a good opportunity to connect with others collecting materials in languages for which there may not be much material available, and to discuss the best ways to keep the materials safe, best practice for metadata, and the most appropriate kinds of access conditions. We look forward to the next steps so we can ‘shepherd’ these materials into useful resources for research and reuse.