Australian National Database of Spoken Language (ANDOSL)

What are the current data holdings of ANDOSL?


Current holdings are divided into those from native speakers of Australian English (born and fully educated in Australia) and those from non-native speakers of Australian English (first-generation migrants whose native language is not English).
  1. Native Speakers of Australian English


    In a single session, each speaker contributed 200 phonetically rich (Australianised SCRIBE) sentences, 19 hVd words, the isolated digits, two MAP-task sessions, and related word lists.

    Availability: the 200-sentence data is available in CDROM format on 9 disks and comprises the sampled data (20,000 samples/s) with NIST Sphere headers, together with comprehensive Data Description Files (DDFs) suited for DBMS access. Phonemic-level annotation files will be made available electronically over the Internet to registered purchasers. These files are kept in electronic rather than CDROM format because our annotation data are subject to an ongoing programme of manual and automated quality assessment.

    Each CDROM holds two six-speaker groups.

    An additional CDROM contains the hVd and digit word lists for these speakers.
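    The waveform files carry NIST Sphere headers. As a minimal sketch, assuming the usual SPHERE layout (an ASCII header whose first line is `NIST_1A`, whose second line gives the header length in bytes, followed by one `name -type value` field per line and terminated by `end_head`), such a header could be read in Python as below. The function names `read_sphere_header` and `duration_seconds` are hypothetical, and the presence of the `sample_rate` and `sample_count` fields in these particular files is an assumption to check against the DDFs.

```python
def read_sphere_header(data: bytes) -> dict:
    """Parse the ASCII header at the start of a NIST SPHERE file (sketch).

    Assumes the usual SPHERE layout: line 1 is "NIST_1A", line 2 is the
    header length in bytes, then one "name -type value" field per line,
    terminated by "end_head".
    """
    parts = data.split(b"\n", 2)
    if len(parts) < 3 or parts[0].strip() != b"NIST_1A":
        raise ValueError("not a NIST SPHERE header")
    header_size = int(parts[1].strip())

    fields = {}
    # Only the first header_size bytes are header; the samples follow.
    for raw in data[:header_size].split(b"\n")[2:]:
        line = raw.decode("ascii", errors="replace").strip()
        if line == "end_head":
            break
        if not line:
            continue  # skip blank lines
        name, ftype, *rest = line.split(None, 2)
        value = rest[0] if rest else ""
        if ftype == "-i":  # integer field
            fields[name] = int(value)
        elif ftype == "-r":  # real-valued field
            fields[name] = float(value)
        else:  # "-sN": a string of N characters
            fields[name] = value
    return fields


def duration_seconds(fields: dict) -> float:
    """Duration implied by the (assumed) sample_count and sample_rate fields."""
    return fields["sample_count"] / fields["sample_rate"]


if __name__ == "__main__":
    # A synthetic header padded to 1024 bytes, for illustration only.
    header = (b"NIST_1A\n   1024\n"
              b"sample_rate -i 20000\n"
              b"sample_count -i 40000\n"
              b"end_head\n").ljust(1024, b" ")
    fields = read_sphere_header(header)
    print(fields["sample_rate"], duration_seconds(fields))  # prints: 20000 2.0
```

    At 20,000 samples/s, a file whose header reports 40,000 samples holds two seconds of speech; reading only the header avoids loading the samples themselves.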
     

  2. Non-native speakers of Australian English

    The 200 phonetically rich sentences were also recorded for nine reference speakers drawn from the most populous migrant language groups: Italian, Greek, Cantonese, Serbian/Croatian, Vietnamese (southern), Arabic (Lebanese), German, Polish, and Spanish (South American). This material is not yet available in CDROM format.

    Two migrant language groups have been collected in greater detail, but using a revised set of 50 phonetically rich sentences.

    In a single session, each speaker contributed 50 phonetically rich (Australianised SCRIBE) sentences, 19 hVd words, the isolated digits, two MAP-task sessions, and related word lists.

    Availability: the 50-sentence data is available in CDROM format on 4 disks and comprises the sampled data (20,000 samples/s) with NIST Sphere headers, together with comprehensive Data Description Files (DDFs) suited for DBMS access. Phonemic-level annotation files will be made available electronically over the Internet. These files are kept in electronic rather than CDROM format because our annotation data are subject to an ongoing programme of manual and automated quality assessment.

    An additional CDROM contains the hVd and digit word lists for these speakers.
     

  3. MAP task (speaker pairs)

    For the MAP task, each speaker, native-born or migrant, was paired with another speaker, most often from the same phonological grouping. Each speaker in the pair took a turn at "leading" and at "following", and the two MAP task sessions thus generated were kept together as a unit and classified according to the "gender relationship" and the "familiarity relationship" between the two speakers. Speaker-pair categories were kept together as far as possible when assigning MAP task data to CDROMs. Details of the file-naming conventions used, and of some anomalies, are also available.

    Availability: the MAP task data is available in CDROM format on a total of 15 disks and comprises the sampled data (20,000 samples/s) with NIST Sphere headers, together with comprehensive Data Description Files (DDFs) suited for DBMS access. Some transcription files can be made available electronically over the Internet.




webmaster@andosl.anu.edu.au
Last modified: 24 March 1999.