Category Archives: Technical

Language codes in ISO 639-3

UPDATE – February 2016 – the ISO has accepted our requests to create a new code for Dhuwaya language [dwy], and to rename the language with the code [dhg] as Dhaŋu-Djaŋu. The process of ‘removing’ Dhuwaya from its previous classification as Dhuwal means that Dhuwal gets a new 3-letter code [dwu]. See here for more details (or contact us).

Language identification can be problematic – there are often different names for languages, as well as different spellings. The name a group uses for its own language may be different to what other groups use to identify that language, or different terms may be used for the same language.

An international standard exists for the consistent identification of language names – ISO 639-3, which aims to define three-letter identifiers for all known human languages. In the Living Archive we’ve been careful to stick to the ISO 639-3 codes, in accordance with best practice as recommended by OLAC, while also listing the Austlang codes used by AIATSIS. For the most part, the codes work well for the language materials in our collection, despite some spelling variations and some ‘lumping and splitting’ discrepancies.

When it comes to Yolŋu Matha languages however, things don’t work so well. This is due to the complex nature of language identification and classification according to the Yolŋu worldview (for example see Christie, 1993), which doesn’t map well to the very Western ISO 639-3 system. Languages can be identified at the clan level (eg Maḏarrpa or Dätiwuy), or grouped into categories based on the word for ‘this’ (eg Dhuwal, Dhaŋu). Currently the ISO system combines both levels of classification, giving codes to 6 of the 9 ‘this’ level languages, plus 3 of the major clan languages (Gumatj, Djambarrpuyŋu, Gupapuyŋu) (see here for more information from Ethnologue (Lewis et al, 2015)).

Because of the flat structure of the ISO 639-3 system, the hierarchy of language relationships can’t be seen. There’s not even a code for Yolŋu, despite it being the most common way of referring to this group of languages. This causes problems for the Living Archive collection, because we have materials in languages which don’t have a code, and sometimes different translations of a book end up looking like they’re in the same language. There’s also no easy way to view all Yolŋu language materials together. Our interim solution lists the languages without ISO codes, but this is not ideal.
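To make the contrast concrete, here is a minimal Python sketch of what a hierarchical view could look like. It is our own illustration, not part of ISO 639-3 or the LAAL app: a tree with Yolŋu at the top, groupings below it, and clan languages below those, where only some nodes carry a code. The codes come from this post plus Djambarrpuyŋu [djr]; the placement of clan languages is illustrative only and should be checked with Yolŋu and linguistic sources. Keeping the parent links is what lets us gather every Yolŋu item together, including items whose language has no code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LanguageNode:
    """A node in a language hierarchy; iso_code is None where ISO 639-3 has no code."""
    name: str
    iso_code: Optional[str] = None
    children: list = field(default_factory=list)

    def descendants(self):
        """Yield this node and every node below it, depth-first."""
        yield self
        for child in self.children:
            yield from child.descendants()


# Illustrative hierarchy only: groupings should be checked with Yolŋu and linguistic sources.
yolngu = LanguageNode("Yolŋu", children=[        # no ISO 639-3 code at this level
    LanguageNode("Dhuwal", "dwu", children=[
        LanguageNode("Djambarrpuyŋu", "djr"),
        LanguageNode("Dätiwuy"),                # clan language without its own code
    ]),
    LanguageNode("Dhuwaya", "dwy"),
    LanguageNode("Dhaŋu-Djaŋu", "dhg", children=[
        LanguageNode("Galpu"),
        LanguageNode("Warramiri"),
    ]),
])

# The flat ISO view keeps only the coded names and loses the parent links.
flat_codes = {n.name: n.iso_code for n in yolngu.descendants() if n.iso_code}


def items_in_group(group, items):
    """Return the catalogue items whose language sits anywhere under `group`."""
    names = {node.name for node in group.descendants()}
    return [item for item in items if item["language"] in names]


# Hypothetical catalogue records.
items = [
    {"title": "Book A", "language": "Djambarrpuyŋu"},
    {"title": "Book B", "language": "Dätiwuy"},
    {"title": "Book C", "language": "Warlpiri"},
]
print([i["title"] for i in items_in_group(yolngu, items)])  # ['Book A', 'Book B']
```

Book B is found even though Dätiwuy has no code of its own, which a lookup on `flat_codes` alone could not do.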

[Images: Dhangu whale books; LAAL languages with/without ISO codes]

The system does allow people to request changes to the existing code set; we tried this in 2012 without success. We’re trying again this year, and our proposals are currently online for public discussion. Here’s a short summary of what we’re proposing and why (with links to the submissions):

  • Dhuwaya – create
    • Dhuwaya is a ‘koine’ language used around the Yirrkala area, documented by Rob Amery (1993), which sits outside the existing classification. The bilingual program at Yirrkala School shifted from Gumatj to Dhuwaya some years ago, and materials being produced at the school are all in Dhuwaya language. Currently Dhuwaya is listed on Ethnologue as a “dialect” of Dhuwal, but many Yolŋu object to this classification.
  • Dhuwal – split
    • this is required to remove Dhuwaya from the existing classification of Dhuwal
  • Dhuwal – create
    • again, if Dhuwaya is removed from Dhuwal, then a new code has to be created, as the existing Dhuwal code would no longer be valid
  • Dhangu – update
    • Yolŋu people distinguish between Dhaŋu languages (such as Galpu, Wangurri, Rirratjiŋu) and Djaŋu languages (such as Warramiri), but they are largely mutually intelligible. While there is an argument for splitting them using Yolŋu categories, such an argument is unlikely to pass the ISO’s mostly linguistic criteria for splitting. Instead, we are proposing to rename this to Dhangu-Djangu to ensure that both groups are included within the same code.
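If changes like these are accepted, existing catalogue records have to be updated to match. A minimal Python sketch of that step, with hypothetical field names (`iso_code`, `language_name`, `candidate_codes`): a rename only changes the display name, but a split means each record under the retired code must be checked by a person, since the old code alone can’t say whether an item is in Dhuwal or Dhuwaya. We assume here that the retired Dhuwal code is [duj]; check this against the ISO 639-3 retirement tables.

```python
# Assumption: [duj] is the Dhuwal code retired by the split; check the ISO 639-3 tables.
RENAMED = {"dhg": "Dhaŋu-Djaŋu"}   # code kept, reference name changed
SPLIT = {"duj": ["dwu", "dwy"]}    # retired code replaced by Dhuwal [dwu] and Dhuwaya [dwy]


def update_record(record):
    """Return (updated_record, needs_review) for one catalogue record."""
    updated = dict(record)
    code = record["iso_code"]
    if code in RENAMED:
        updated["language_name"] = RENAMED[code]
        return updated, False
    if code in SPLIT:
        # A split can't be resolved mechanically: someone who knows the item
        # has to decide which of the new codes applies.
        updated["candidate_codes"] = SPLIT[code]
        return updated, True
    return updated, False


rec, review = update_record({"title": "Book A", "iso_code": "duj", "language_name": "Dhuwal"})
print(review, rec["candidate_codes"])  # True ['dwu', 'dwy']
```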
[Image: LAAL Yolngu ISO change requests, 2015]

These proposals are online now and open for public comment. We’d love to get input from linguists and others interested in this issue, especially Yolŋu people. The site states that “You are invited to submit a comment on any change request, either in support of or in opposition to its proposed changes (as a whole or for individual changes).” You can send your comments to iso639-3@sil.org and include the change request number in the subject line.

We presented some of the issues facing the Living Archive project relating to this at a conference last year (Bow, 2014), and we’re planning to write more about it for a journal article.

REFERENCES

Amery, Rob (1993). An Australian koine: Dhuwaya, a variety of Yolŋu Matha spoken at Yirrkala in North East Arnhemland. International Journal of the Sociology of Language. Volume 99, Issue 1, Pages 45–64

Bow, C. (2014). Shoehorning Yolŋu language names into the ISO 639‐3 standard. Paper presented at Australian Linguistics Society conference. University of Newcastle, 10-12 December 2014. Abstract available at www.als.asn.au/sites/default/files/ALSNewcastle2014Abstracts.pdf  p.33

Christie, Michael J. (1993). Yolngu linguistics. Ngoonjook, No. 8, Jun 1993: 58-77.

Lewis, M. Paul, Gary F. Simons, and Charles D. Fennig (eds.) (2015). Ethnologue: Languages of the World, Eighteenth edition. Dallas, Texas: SIL International. Online version: http://www.ethnologue.com

Technical support needed

CALL FOR EXPRESSIONS OF INTEREST

FOR DEVELOPERS TO INTEGRATE AN API WITH the LIVING ARCHIVE OF ABORIGINAL LANGUAGES WEB APPLICATION

BACKGROUND

The Living Archive of Aboriginal Languages is a digital collection of endangered literature in Australian Indigenous languages from around the Northern Territory. Around 3500 items have been digitised and stored in CDU Library’s eSpace, a Fedora repository managed by Fez software. Only records for which we have signed permission forms are visible on the site; around 50% of the materials are not currently available for public viewing, and these we refer to as hidden items. A public discovery layer (the LAAL app) was built over the collection in PHP (Slim) by Stephen McPhillips. Solr handles the indexing, and JavaScript libraries (jQuery, LeafletJS) enable geospatial visualisations via a map interface at http://laal.cdu.edu.au/. The current LAAL app allows users to search, browse, read and download materials on various devices and platforms without logging in. To allow elders, authors and others in remote communities to log in and perform specific tasks in the eSpace repository, a REST API was developed by Catalyst IT.

THE TASKS

A. Preparation

Integrate API functionality throughout the LAAL app to allow authorised users to perform functions in eSpace such as initiating sessions, accessing all resources, and uploading new ones. (A rich domain model is envisaged, as we wish to avoid user interactions that are cumbersome and unmaintainable.)

B. Develop an interface to the API that handles two basic functions:

(1) logging in/ logging off by authorised users, and

(2) revealing hidden items (see above) in all LAAL app views, and indicating (perhaps by greying them out) that these are hidden. Once a working prototype is available, we would like to subject it to user testing for two weeks.
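The display rule in (2) is simple enough to sketch. The Python below is a hypothetical illustration only (the LAAL app itself is PHP/Slim, and the names `Item`, `public` and `visible_items` are ours): anonymous users see public items; a logged-in, authorised user additionally sees hidden items, tagged so that the view can grey them out. Keeping the rule in one function makes it easier to apply consistently across every view (search, browse, map).

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    public: bool  # True if signed permissions allow public display


def visible_items(items, logged_in):
    """Return (item, css_class) pairs for a results view.

    Anonymous users see public items only; authorised users also see
    hidden items, tagged 'hidden' so the template can grey them out.
    """
    rows = []
    for item in items:
        if item.public:
            rows.append((item, "public"))
        elif logged_in:
            rows.append((item, "hidden"))
    return rows


catalogue = [Item("Book A", public=True), Item("Book B", public=False)]
print([(i.title, css) for i, css in visible_items(catalogue, logged_in=False)])
# [('Book A', 'public')]
print([(i.title, css) for i, css in visible_items(catalogue, logged_in=True)])
# [('Book A', 'public'), ('Book B', 'hidden')]
```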

C. Allow additional (moderated) user actions

(3) Create some forms to allow editing and file transfer. Their purpose would be to allow additional user actions:

  • Upload permission forms and attachments to a temporary holding area
    • All items need to be accompanied by signed permission from the creators. This form needs to be available to download from the site, and to upload when signed.
  • Add new materials
    • Users will create new records and add relevant metadata by filling in information on a form before uploading files.
    • Users who have related items (sound files, e-books) will attach these to existing records with appropriate metadata.
  • Edit existing materials
    • Users may identify errors and propose corrections by editing metadata.
    • Users may also make corrections to text files.

Note concerning moderator actions:

  • All of the additional actions listed above need to be moderated by Living Archive project staff.
  • The simplest way to achieve that would be email notification with the actions then completed manually in eSpace by LAAL project staff.
  • The interface code should be written to enable automated actions via the Fez API at a later date.
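One way to meet both the email-first approach and the later automation goal is to put moderation behind a small interface. The Python sketch below is hypothetical (class names, field names and the staff address are ours, and no real Fez API calls are shown): `EmailModerator` only builds notification messages with the standard `email` package; actually sending them, and any future `FezApiModerator`, are left out.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from email.message import EmailMessage


@dataclass
class PendingAction:
    kind: str       # e.g. "upload_permission", "new_record", "edit_metadata"
    record_id: str  # hypothetical identifier for the eSpace record
    user: str
    details: str


class Moderator(ABC):
    """Interface for handling user actions that need moderation."""

    @abstractmethod
    def submit(self, action: PendingAction) -> None: ...


class EmailModerator(Moderator):
    """First stage: notify staff, who then complete the action manually in eSpace."""

    def __init__(self, staff_address: str):
        self.staff_address = staff_address
        self.outbox = []  # built messages; sending (e.g. via smtplib) is not shown

    def submit(self, action: PendingAction) -> None:
        msg = EmailMessage()
        msg["To"] = self.staff_address
        msg["Subject"] = f"[LAAL moderation] {action.kind} on {action.record_id}"
        msg.set_content(f"User: {action.user}\n\n{action.details}")
        self.outbox.append(msg)


# A later FezApiModerator(Moderator) could apply approved actions automatically.
mod = EmailModerator("laal-staff@example.org")
mod.submit(PendingAction("edit_metadata", "record-123", "reader1", "Title misspelt"))
print(mod.outbox[0]["Subject"])  # [LAAL moderation] edit_metadata on record-123
```

Because the rest of the interface only calls `submit`, swapping the email stage for an automated one would not require changes elsewhere.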

Additional notes:

  • Access to source code and documentation about the Fez API and LAAL app can be made available on request.
  • Sufficient documentation is required to ensure that the system can be maintained by project technical staff.
  • Development of the interface will be an iterative process of refining the design based on user requirements.
  • The interface must comply with all legal requirements for accessibility.
  • All development must use existing technology in the LAAL app where possible.
  • CDU will retain all copyright on the product. The application may be released under an open source licence as part of the Fez project http://sourceforge.net/projects/fez/.

Preferred timeline

  • Stage one: applications submitted with a mock-up of the interface, due at the end of June 2015
  • Stage two: the successful applicant will be given access to source code and asked to develop a prototype, due mid-August
  • Stage three: trial implementation leading to refinements to the product in response to feedback from users, due at the end of September
  • Stage four: final product to be delivered by the end of October 2015

How to get involved

PDF Version here

Uploading items

Once books are scanned, there are several different steps to go through before they appear on the Living Archive website.

All the metadata (information about the book, like title, the names of people involved in creating the book, date and place of publication, etc) is stored in a spreadsheet. Once a group of books is scanned, the metadata is double-checked on the spreadsheet, as this is the information that will appear on the website. Some categories have a ‘controlled vocabulary’, which means that the information has to match exactly – so things like place names have to be spelt correctly. This can be a challenge for places or languages with different names, so we have created standard names such as Gunbalanya (Oenpelli) or Arrarnta, Western. Also, all creators are assigned a code so that people whose names may appear differently in different books can have all their items grouped together.
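The two checks described here, exact matches against a controlled vocabulary and grouping by creator code, can be sketched in a few lines of Python. This is a hypothetical illustration, not the project’s actual workflow: the column names (`title`, `place`, `creator_codes`), the code format `C001`, the semicolon separator and the alias list are all our assumptions.

```python
import csv
import io
from collections import defaultdict

# Hypothetical controlled vocabulary: standard name -> accepted aliases.
PLACES = {"Gunbalanya (Oenpelli)": {"Gunbalanya", "Oenpelli"}}
ALIASES = {alias: std for std, aliases in PLACES.items() for alias in aliases}


def check_rows(rows):
    """Map known aliases to the standard name; report values that match nothing."""
    problems = []
    for row in rows:
        place = row["place"]
        if place in PLACES:
            continue
        if place in ALIASES:
            row["place"] = ALIASES[place]
        else:
            problems.append((row["title"], place))
    return problems


def group_by_creator(rows):
    """Group titles by creator code, so spelling variants of a name don't split them."""
    groups = defaultdict(list)
    for row in rows:
        for code in row["creator_codes"].split(";"):
            groups[code.strip()].append(row["title"])
    return dict(groups)


sheet = """title,place,creator_codes
Book A,Oenpelli,C001
Book B,Gunbalanya (Oenpelli),C001;C002
Book C,Oenpeli,C002
"""
rows = list(csv.DictReader(io.StringIO(sheet)))
problems = check_rows(rows)
groups = group_by_creator(rows)
print(problems)  # [('Book C', 'Oenpeli')]
print(groups)    # {'C001': ['Book A', 'Book B'], 'C002': ['Book B', 'Book C']}
```

The misspelling in Book C is reported rather than silently "corrected", since a person should decide what was meant.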

Once we’re happy that all the information is correct, and all the files and folders are consistently named, the technical team at CDU Library do a batch upload. This takes all the information from the spreadsheet and the files from the server and uploads them all to the eSpace digital repository. These then have to be checked, because we usually find some issue – sometimes even supposedly minor things like spaces or commas can make a difference (this time it was the shape of the apostrophe in Galiwin’ku!)
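Problems like the Galiwin’ku apostrophe happen because several Unicode characters look almost identical. A minimal Python sketch of a pre-upload check: it reports which apostrophe-like characters a value contains and normalises them to a single form. The choice of target character (here U+2019) is an assumption; the important thing is to pick one and apply it everywhere.

```python
import unicodedata

# Characters that are easily confused in place and language names.
APOSTROPHE_LIKE = {"\u0027", "\u2018", "\u2019", "\u02bc"}


def apostrophe_report(value):
    """List the apostrophe-like characters in a string, with their Unicode names."""
    return [(ch, unicodedata.name(ch)) for ch in value if ch in APOSTROPHE_LIKE]


def normalise_apostrophes(value, target="\u2019"):
    """Replace every apostrophe-like character with one chosen form (assumed U+2019)."""
    return "".join(target if ch in APOSTROPHE_LIKE else ch for ch in value)


typed = "Galiwin'ku"           # straight apostrophe, U+0027
pasted = "Galiwin\u2019ku"     # right single quotation mark, U+2019
print(typed == pasted)                                                # False
print(normalise_apostrophes(typed) == normalise_apostrophes(pasted))  # True
print(apostrophe_report(typed))                                       # [("'", 'APOSTROPHE')]
```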

Once they’re uploaded, we have to check which items can then be made public, which means they have had permission granted by all the relevant people. These items go into the ‘open’ collection which appears on the Living Archive website, and all the others stay in a ‘closed’ collection, awaiting permissions. Once they’re made public, the books appear immediately on the website, but sometimes the counting of items takes a day to catch up (these things reset overnight).
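The open/closed decision follows a simple rule: an item goes public only when every relevant creator has given permission. A minimal Python sketch, assuming hypothetical creator codes and a set of codes with signed forms; an item with no known creators stays closed, because there is nobody to ask.

```python
def collection_for(creators, permissions):
    """Return 'open' if every listed creator has given permission, else 'closed'.

    `creators` is the list of creator codes for one item; `permissions` is the
    set of creator codes with signed permission forms.
    """
    if creators and all(code in permissions for code in creators):
        return "open"
    return "closed"


signed = {"C001", "C002"}
print(collection_for(["C001", "C002"], signed))  # open
print(collection_for(["C001", "C003"], signed))  # closed: C003 has not signed
print(collection_for([], signed))                # closed: creators unknown
```

The explicit `creators and` guard matters: Python’s `all()` returns `True` for an empty list, which would otherwise make items with no creator information public by default.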

We also need to link any ‘related items’, such as translations in other languages or different versions, sometimes audio or multimedia files. Some books have lots of these, such as the Little Frog book which is in 6 different languages! Then all the files and folders are merged into the appropriate collections, on both eSpace and the local server.
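When related items are recorded as pairs (this book is a translation of that one), the full set of versions can be recovered with a standard union-find. The Python below is a sketch under that assumption; the item IDs are hypothetical, and we are not describing how eSpace itself stores the links.

```python
def group_related(links):
    """Group item IDs into clusters of related items (translations, versions, audio).

    `links` is a list of (item_id, related_id) pairs; union-find joins them so
    that every item in a cluster can be shown with all the others.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    clusters = {}
    for x in parent:
        clusters.setdefault(find(x), set()).add(x)
    return sorted(sorted(c) for c in clusters.values())


links = [("frog-1", "frog-2"), ("frog-2", "frog-3"), ("whale-1", "whale-2")]
print(group_related(links))
# [['frog-1', 'frog-2', 'frog-3'], ['whale-1', 'whale-2']]
```

Only the pairs need to be entered; frog-1 and frog-3 end up together even though nobody linked them directly.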

The whole process can take quite a while, and it rarely goes completely smoothly. We usually do a few collections at once, so hundreds of items may be uploaded at different points throughout the year. Despite all the quality control processes, we still manage to find errors – if you find any, feel free to let us know so we can fix them!

Welcome to 2015

The Living Archive project is about to enter its fourth year – not bad for something that was originally funded for 1 year! Careful spending and a second successful funding application has allowed us to continue the project over a longer period, and do so much more than we could have done in a single year.

Some statistics:

At the end of 2014 we had 3115 items uploaded to CDU Library’s eSpace server, of which 1353 are publicly available through the LAAL website. The public items represent only 43% of the total, and one of our goals for this year is to increase that percentage. The main reason for this is the difficulty of identifying and tracking down the creators of each item to get their permission to make the items public. We’ve already collected over 600 signatures, yet there are still nearly 1000 additional names to find. Some items don’t have any information about the creators, so they can’t be made public without someone giving permission, but who do we ask?

  • Number of languages represented: 30
  • Number of communities represented: 27
  • Language with the most items publicly available: Pintupi-Luritja (189 items)
  • Language with the most items uploaded: Warlpiri (517 items)
  • Language with the highest proportion of uploaded items made public: Maung (80%)

More items are being uploaded regularly – some have already gone up this week, and more will follow soon. The process takes a while to ensure that all the information is correctly recorded and uploaded, and we still find errors even after careful checking! If you find an error, feel free to let us know.

Comparison figures for the end of each year so far:

Year   Uploaded   Public
2012        436       89
2013       1453      645
2014       3115     1353
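The share of items made public each year is a simple ratio of the two columns above. This short Python calculation recomputes it from the table’s figures. It shows the share rising from about 20% in 2012 to about 44% in 2013, then holding at about 43% in 2014 (the 43% quoted above), even though the number of public items more than doubled that year.

```python
# Year-end totals from the table: year -> (uploaded, public).
totals = {2012: (436, 89), 2013: (1453, 645), 2014: (3115, 1353)}

shares = {year: public / uploaded for year, (uploaded, public) in totals.items()}
for year, share in shares.items():
    print(f"{year}: {share:.1%} public")
# 2012: 20.4% public
# 2013: 44.4% public
# 2014: 43.4% public
```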

Looking back at our plans for 2014, it’s nice to see that we achieved most of them. Some of our achievements include:

  • revamping the website to include a more intuitive map page and a separate project site where we can post other information
  • developing a social media presence on Facebook and Twitter, which gives us an efficient means of sharing information quickly as well as a record of some of our activities
  • having public ‘launches’ in both Canberra and Darwin which attracted a very positive response
  • adding materials identified through the ‘Search and Rescue’ strand, particularly through the workshop held at Batchelor Institute in July. There are already 134 items from 3 languages, with more coming soon
  • having our archive added to OLAC and ANDS and accessible through Trove

The to-do list for 2015 looks enormous already, but we’re looking forward to developing the Archive further, with input from our users.

Here are a few things we have planned for this year:

  • our LAAL Reader app to allow mobile users to download whole collections for use offline
  • an API which will allow users to log in to the site and make changes. This will include visiting some communities to test the functionality
  • working out how to make information about items that are not yet public available to users
  • engaging with the academic community to encourage researchers to use the archive
  • working more closely with remote schools and others to find creative ways to use the materials in the classroom
  • updating our language map with feedback from community members and other experts
  • more writing – we had two academic papers published last year, with two more due out soon and two more currently under review. We have ideas for several more; the challenge is finding the time to write them!
  • more permissions
  • streamlining processes for adding audio files and e-book files to the archive
  • continuing to work on updating codes for Yolŋu language names in ISO 639-3 to better reflect language naming practices in current use
  • conducting a sustainability audit to work out how best to maintain the archive once our funding expires at the end of 2015

Thanks for helping us make this a ‘living archive’ by engaging with us as we develop it, and by using the materials in the archive.

Link

A few e-books have been made with the agreement and help of language and story owners. These will be uploaded alongside the original stories for others to enjoy. One of these will be an e-book about making dillybags (bathi) from strips of pandanus. This story was written, illustrated, translated and read aloud by Elizabeth Milmilany Dhurrkay (Räkay).

Räkay has now been working with Dr Brian Devlin to produce a version which can be read using a web browser, mobile phone, tablet or computer. (To see an early draft please go to bathi.netii.net). This has been very much a team effort. Cathy Bow segmented the sound files. Brian prepared an HTML5 template to integrate the audio, text and images.
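Once a sound file has been segmented, a reader needs to know which page (or text segment) matches the current playback time. A minimal Python sketch of that lookup, assuming hypothetical segment start times in seconds; it is not taken from Brian’s template. In a browser, the same lookup would run in JavaScript whenever the audio element fires its `timeupdate` event.

```python
from bisect import bisect_right

# Hypothetical start time (seconds) of each page's narration after segmenting.
PAGE_STARTS = [0.0, 12.5, 27.0, 41.2]


def page_at(time_s):
    """Return the 0-based page index whose audio segment contains time_s."""
    if time_s < PAGE_STARTS[0]:
        return 0
    return bisect_right(PAGE_STARTS, time_s) - 1


print(page_at(0.0), page_at(13.0), page_at(27.0), page_at(100.0))  # 0 1 2 3
```

Binary search keeps the lookup cheap however many segments a book has, which matters on older phones and tablets.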

A colour coding scheme was developed at Milingimbi to indicate the reading levels of printed books produced for use in the bilingual program. One interesting question is whether the ebooks that go into the LAAL archive should also use the same scheme.

Welcome to 2014

2014 is shaping up to be a huge year in the Living Archive project.

We’ll be aiming to complete many of the activities of Stage I of the project including

  • Materials
    • finish processing digitised files
    • finish checking and uploading to eSpace
    • prioritising additional records for digitisation and uploading
    • arrange export of metadata to OLAC
    • add audio and ebooks
  • Website
    • redo home page layout
    • create content site for updates, news, publicity, etc
    • make more records public as permissions are obtained
    • fix ‘Warning’ page
    • create ‘good faith’ notice
    • insert notes about rights holders
    • insert caveat about photos of deceased people
    • tidy up some inconsistencies in metadata
  • Publications
    • three papers currently under review
      • ACSA, LDC, Australian Archivist
    • four papers in preparation
      • launch booklet, Paradisec, user evaluation, technical paper
  • Other activities
    • launch the archive
    • negotiate licensing with NT Department of Ed
    • discuss licensing with other sources/publishers
    • adjust ISO codes if our changes are accepted

At the end of 2013 we had:

  • 1453 items uploaded to eSpace
  • 616 items publicly available on the website
  • 529 items ready to upload
  • approximately 1500 items still in preparation
  • several sources still untapped (other libraries, private collections)

Stage II will be exciting and we’re looking forward to working with our new partners

  • Batchelor Institute
  • Northern Territory Library
  • Catholic Education Office

and our existing partners

  • NT Department of Education
  • Australian National University

We’ll be meeting together early in February to plan how to get things moving.

Strap yourselves in!