Orthography profile: non meaningful contrasts ? #12

Open
XachaB opened this issue Nov 26, 2020 · 18 comments

Comments

@XachaB

XachaB commented Nov 26, 2020

Hi,

I have the impression that there may be spurious contrasts in the orthography profile, in particular voicing contrasts in occlusives (p/b, k/g, t/d). I also suspect that the ɹ/r contrast may not be meaningful.

I suggest that we figure out precisely which contrasts are due to variation in descriptive practice and which are genuine contrasts attributable to sound change etc., and then neutralize the meaningless ones.
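As a minimal sketch of what neutralizing such a contrast could look like, assuming forms are already split into space-separated segments: the mapping `VOICING_MERGES` and the helper `neutralize` are hypothetical names used for illustration, not part of the dataset's code. Whether a given merge is justified for a given language is exactly the open question here.

```python
# Hypothetical mapping (an assumption for illustration): collapse voiced
# stops into their voiceless counterparts.
VOICING_MERGES = {"b": "p", "d": "t", "g": "k"}


def neutralize(segments, merges=VOICING_MERGES):
    """Return a new segment list with every mapped sound replaced.

    Only exact matches are replaced, so complex segments such as "dʒ"
    or "bː" are left untouched unless they are added to the mapping.
    """
    return [merges.get(segment, segment) for segment in segments]


# Illustrative segmented form, not taken from forms.csv.
print(neutralize("g a m b a".split()))
```

Keeping the mapping as plain data makes it easy to review and to restrict to the languages where the contrast is judged to be purely notational.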

As to how to normalize, we have three other datasets with languages from these families, and should make sure we are using the same notations:

| Lexibank dataset | Sounds found |
| --- | --- |
| bowernpny | + _ a aː ã b bː c cʷ cː d dʒ dʱ dː d̪ e eː f g gʷ gː h i iː j k kʷ kː l lʷ lː l̪ m mː n nʲ nː n̪ n̪ː o oː p pː q qː r rː s t tʃ tʲ tː t̪ t̪ʷ t̪ː u uː ũ v w x yː z æ ð ø ŋ œ ɐ ɑː ɔ ɖ ə ɛ ɛː ɜ ɣ ɤ ɨ ɪ ɭ ɲ ɳ ɹ ɽ ɾ ʀ ʈ ʊ ʒ ʔ ʔʲ ˀb ˀd ˀdʒ ˀk ˀm ˀn ˀr ˀt ˀt̪ ˀw ˀɭ β θ |
| johanssonsoundsymbolic | + a aː c i j k l l̪ m n n̪ p r rː t t̪ u w ŋ ɭ ɲ ɳ ɽ ʎ |
| joophonosemantic | a aː i j k l l̻ m n n̻ p r t t̻ u uː w ŋ ɭ ɳ ɽ ʈ |
| wold | + _ a g i iː j k l m n p r t u w y ŋ ɲ |

As you can see, the other ones use p-k-t, not b-g-d, and have a single /r/ sound. https://github.com/lexibank/wold does have a k/g contrast (if it is also meaningless, we should change it there).

@erichround, @chirila, could you chime in on whether these contrasts should be kept here? Are there other contrasts in the list above that should be neutralized? @tresoldi, it looks like the orthography profile came from you; do you remember whether there were specific motivations for these contrasts?

For a closer look, the list of sounds with counts can be found in the TRANSCRIPTION file: https://github.com/lexibank/bowernpny/blob/master/TRANSCRIPTION.md

Non-meaningful contrasts cause problems for downstream analyses of the data, especially the sound correspondence study.

@XachaB
Author

XachaB commented Nov 26, 2020

In case it helps assess the (potential) problem, here is a (long) detailed list of which of the sounds p/k/t/b/d/g/ɹ/r etc. are present in each language:

dataset Glottocode Language_ID sounds_contrasts
bowernpny pall1243 Pallanganmiddang b d g k p r t
bowernpny ande1247 kalk1246 maln1239 mart1256 pitj1243 walm1241 wanm1242 warl1255 west2441 Antekerrepenhe Kalkatungu Malngin MartuWangka Pitjantjatjara SouthernWalmajarri Warlmanpa Warnman WesternArrarnta k p r t ɹ
bowernpny kanj1260 umpi1239 Kaanju Umpila k p r t
bowernpny alya1239 djuw1238 guga1239 kurr1243 mang1383 ngaa1240 nort2753 nort2754 nyan1301 pint1250 pint1251 warl1254 Alyawarr Jiwarliny Kukatj Kurrama Ngaanyatjarra Ngardily NorthernMangarla NorthernNyangumarta Nyangumarta PintupiLuritja Wangkatja Warlpiri g k p r t ɹ
bowernpny wang1289 Wangkayutyuru d k p r t ɹ
bowernpny wikm1247 WikMungkan d g k p r t ɹ
bowernpny yulp1238 Yulparija b k p r t ɹ
bowernpny kart1247 waru1265 Kartujarra Warumungu b g k p r t ɹ
bowernpny guri1247 paka1251 yany1243 yarl1238 Gurindji Pakanh Yanyuwa Yarluyandi b d k p r t ɹ
bowernpny kara1476 KarajarriNW b d g ɹ
bowernpny kung1258 Gunggari b d g r t ɹ
bowernpny duun1241 gudj1237 wulg1239 Duungidjawu Gudjal Wulguru b d g k r ɹ
bowernpny dhur1239 yaga1256 Dhurga Durubul b d g k r t ɹ
bowernpny kuku1280 ngad1257 KuguNganhcara Ngadjuri b d g k p t ɹ
bowernpny warr1255 Wargamay b d g k p r ɹ ɽ
bowernpny baty1234 bili1250 kumb1268 waru1264 Batyala Bilinarra Gumbaynggir Warungu b d g k p r ɹ
bowernpny gurd1238 Kurtjar b d g k p r t ɹ ʀ
bowernpny kala1380 muru1266 yind1248 Garlali Muruwari Yindjilandji b d g k p r t ɹ ɾ
bowernpny dyir1250 Dyirbal b d g k p r t ɹ ɽ
bowernpny adny1235 angu1242 arab1267 awab1243 badi1246 badj1244 band1358 bang1339 bayu1240 bidy1243 biri1256 birr1241 bula1255 bung1264 cola1237 darl1243 dayi1244 dhal1245 dhan1270 dhar1247 dhar1248 dhud1236 dier1241 djam1256 djap1238 djin1253 djiw1241 flin1247 gami1243 gang1268 gath1234 gugu1255 guma1253 gund1249 guny1241 gupa1247 gure1255 guwa1242 guya1249 hawk1239 jaru1254 kala1377 kala1379 kara1476 kari1304 karr1236 kaur1267 kera1256 kuka1246 kuku1273 kula1275 kuun1236 leni1238 lowe1402 madh1244 malg1242 maly1234 marg1253 maya1280 mayi1234 mayi1235 mayi1236 mbab1239 minj1242 mith1236 narr1259 naru1238 ngad1258 ngam1284 ngar1235 ngar1287 ngar1296 ngar1297 ngaw1240 ngun1277 nhan1238 nhir1234 nort2760 nyam1271 nyun1247 pany1241 pirr1240 pitt1247 rirr1238 rita1239 sydn1236 thur1254 wadi1249 wadi1260 waga1260 waja1257 waju1234 wang1290 wang1291 ward1248 wari1262 warl1256 wath1238 wira1262 wira1265 woiw1238 wong1246 yabu1234 yaga1262 yala1262 yand1253 yann1237 yawa1258 yidi1250 yind1247 yira1239 yort1237 yuga1244 yuwa1242 Adnyamathanha Arabana Awabakal Badimaya Badjiri Bandjalang Biri Birrpayi Bularnu Bunganditj Colac Darkinyung Dhangu Dharawal Dharuk Dharumbal Dhayyi Dhudhuroa Diyari Djambarrpuyngu Djapu Djinang FlindersIsland Gamilaraay Gangulu GoorengGooreng Gumatj Gunditjmara Gundungurra Gunya Gupapuyngu GuuguYimidhirr Guwa Guyani Iyora Jaru Jiwarli Kamilaroi Karajarri Kariyarra Karuwali Katthang Kaurna Keramin Kukatja KukuYalanji Kungkari Kurnu Linngithigh Mabuiag Malgana Malyangapa Margany MathiMathi MayiKulan MayiKutuna MayiThakurti MayiYapi Mbabaram Mbakwithi Minjungbal Mirniny Mithaka Narrungga Ngadjumaya Ngaiawang Ngamini Ngarigu Ngarinyman Ngarla Ngarluma Ngarrindjeri Ngawun Ngiyambaa Ngunawal Nhanta Nhirrpi Nyamal Nyungar Paakantyi Panyjima Parnkala Payungu Piangil Pirriya PittaPitta Rirratjingu Ritharrngu Thalanyji Tharrgari Wajarri Wakaya Wangkangurru Wangkumara Wardandi Warluwarra Warriyangga Wathawurrung Wathiwathi Watjuk Wiradjuri Wirangu Woiwurrung 
YabulaYabula Yagara Yalarnnga Yandruwandha Yannhangu Yawarrawarrka Yidiny Yindjibarndi Yiningay Yirandali YortaYorta Yugambeh Yuwaalaraay b d g k p r t ɹ
bowernpny dyaa1242 Djabugay b d g p r t ɹ
bowernpny kuuk1238 KuukuYau k p t ɹ
johanssonsoundsymbolic joophonosemantic ngar1287 pitj1243 Ngarluma Pitjantjatjara k p r t ɽ
wold guri1247 Gurindji g k p r t

Quite a few seem to have g/k; maybe a clue that it is sometimes contrastive?
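For reference, one way a grouping like the table above could be computed is sketched below. This is not necessarily how the table was produced: the function name `contrast_groups`, the `TARGETS` set, and the toy input are assumptions made for illustration, and the input is assumed to be (language, space-separated segments) pairs such as could be read from a forms table.

```python
from collections import defaultdict

# Sounds of interest (assumed for illustration): stops and rhotics.
TARGETS = {"p", "b", "t", "d", "k", "g", "r", "ɹ", "ɽ", "ɾ", "ʀ"}


def contrast_groups(forms, targets=TARGETS):
    """Group languages by the subset of target sounds attested in their forms.

    forms: iterable of (language, segments) pairs, where segments is a
    string of space-separated sounds.
    Returns a dict mapping a sorted, space-joined sound set to the sorted
    list of languages that show exactly that set.
    """
    attested = defaultdict(set)
    for language, segments in forms:
        attested[language].update(s for s in segments.split() if s in targets)

    groups = defaultdict(list)
    for language, sounds in attested.items():
        groups[" ".join(sorted(sounds))].append(language)
    return {key: sorted(languages) for key, languages in groups.items()}


# Toy rows invented for illustration only (not real dataset entries).
toy_forms = [
    ("LangA", "k a m p a"),
    ("LangB", "g a m b a"),
    ("LangB", "k a r a"),
    ("LangC", "p a k a"),
]
for sounds, languages in contrast_groups(toy_forms).items():
    print(sounds, "->", ", ".join(languages))
```

Adding token counts per sound would help separate a marginal notation variant from a sound that is used throughout a wordlist.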

@LinguList
Contributor

LinguList commented Nov 26, 2020 via email

@tresoldi
Contributor

Some differences might be explained by different concept sets, others by the sample: wold has only Gurindji, while bowernpny has almost 200 varieties. I remember that my main sources, along with the references listed in Glottolog that I could track down online, were the inventories provided in Phoible (which, for Pama-Nyungan languages, are in almost all cases given by @erichround).

Now, there are undoubtedly errors or at least questionable transcriptions. I am by no means an expert in Pama-Nyungan, and this is an old profile (from when we had a single one per dataset). Nonetheless, I suppose it is in part also related to the solutions different authors employ for rendering in the transcription not only contrasts in stop series, but also surface differences. First, it might not necessarily be what we'd "expect" in terms of modal voice vs. fully open glottis. After all, it is true that in most languages of Australia (with the exception of some Northern ones) we don't expect a voicing distinction and, even more, the strong phonological similarities are one of the main reasons for treating it as a family. There are cases where the consonants are only semivoiced, but the IPA graphemes for voiced consonants are used (this is the case of Hercus in her grammar of Wirangu, here).

Second, and more important, they are not necessarily phonological in terms of a global contrastive correspondence. If you look at the original files in the history, where I also included the (automatic) alignments that I used for studying the dataset, you can see patterns like k-g-k everywhere. I remember there are even a handful of synonyms that just express k/g or t/d as allophones. But take the example from Alpher (2004) cited by Miceli (2005) (here, and see the example of Wirangu discussed above):

| Language | Form | Meaning |
| --- | --- | --- |
| PPN | *kampa- | ‘cook in earth oven’ |
| Uradhi | aβa- | ‘cover with sand’ |
| Wik-Mungknh | ka:mp- | ‘cook in earth oven’ |
| Djabugay | gampa(:) | ‘cook in earth oven’ |
| Wirangu | gamba- | ‘cook, eat’ |
| Kaytetye | ampe- | ‘burn’ |
| Manjiljarra | kampa | ‘cook, burn’ |
| Warlpiri | kampa- | ‘be burning – of fire; burn it – of fire’ |
| Walmajarri | kampa | ‘cook it’ |
| Nyangumarta | kampa- | ‘cook it’ (tr), ‘burn’ (intr) |
| Martuthunira | kampa | ‘be burning, be cooking’ |
| Jiwarli | kampa- | ‘cook, burn’ |
| Yingkarta | kampa-ñi | ‘be burning, be cooking’ |

Djabugay and Wirangu have word-initial g- for k-, and Wirangu has -b- for -p-. There are many things going on: in some cases it looks like an orthographic preference (the "descriptive practice" you mention); in others it is articulatory information that is not contrastive (i.e., it is phonetic, not phonological); in some cases it looks like free variation, while in others it is positional; and so on.

While bowernpny, the Lexibank dataset, would surely benefit from reviewed individual profiles, we would still face these "issues". I am not sure how you should treat them for the study on correspondences.

@LinguList
Contributor

Please check against my latest commit: I have already cleaned up the data further, since there were many invalid segments.

@LinguList
Contributor

I had this in another branch and have now merged it into the master branch.

@XachaB
Author

XachaB commented Nov 26, 2020

Do you mean commit 3e48d89? It is called "update to get rid of bad forms", but it does not change forms.csv, so I am not sure I understand what the changes are.

@erichround

erichround commented Nov 26, 2020 via email

@LinguList
Contributor

LinguList commented Nov 26, 2020 via email

@XachaB
Author

XachaB commented Nov 26, 2020

Here is the updated table after checking out the current master version:

dataset Glottocode Language_ID sounds_contrasts
bowernpny pall1243 Pallanganmiddang b d g k p r t
bowernpny adny1235 angu1242 arab1267 awab1243 badi1246 badj1244 band1358 bang1339 bayu1240 bidy1243 biri1256 birr1241 bula1255 bung1264 cola1237 darl1243 dayi1244 dhal1245 dhan1270 dhar1247 dhar1248 dhud1236 dier1241 djam1256 djap1238 djin1253 djiw1241 flin1247 gami1243 gang1268 gath1234 gugu1255 guma1253 gund1249 guny1241 gupa1247 gure1255 guwa1242 guya1249 hawk1239 jaru1254 kala1377 kala1379 kara1476 kari1304 karr1236 kaur1267 kera1256 kuka1246 kuku1273 kula1275 kuun1236 leni1238 lowe1402 madh1244 malg1242 maly1234 marg1253 maya1280 mayi1234 mayi1235 mayi1236 mbab1239 minj1242 mith1236 narr1259 naru1238 ngad1258 ngam1284 ngar1235 ngar1287 ngar1296 ngar1297 ngaw1240 ngun1277 nhan1238 nhir1234 nort2760 nyam1271 nyun1247 pany1241 pirr1240 pitt1247 rirr1238 rita1239 sydn1236 thur1254 wadi1249 wadi1260 waga1260 waja1257 waju1234 wang1290 wang1291 ward1248 wari1262 warl1256 wath1238 wira1262 wira1265 woiw1238 wong1246 yabu1234 yaga1262 yala1262 yand1253 yann1237 yawa1258 yidi1250 yind1247 yira1239 yort1237 yuga1244 yuwa1242 Adnyamathanha Arabana Awabakal Badimaya Badjiri Bandjalang Biri Birrpayi Bularnu Bunganditj Colac Darkinyung Dhangu Dharawal Dharuk Dharumbal Dhayyi Dhudhuroa Diyari Djambarrpuyngu Djapu Djinang FlindersIsland Gamilaraay Gangulu GoorengGooreng Gumatj Gunditjmara Gundungurra Gunya Gupapuyngu GuuguYimidhirr Guwa Guyani Iyora Jaru Jiwarli Kamilaroi Karajarri Kariyarra Karuwali Katthang Kaurna Keramin Kukatja KukuYalanji Kungkari Kurnu Linngithigh Mabuiag Malgana Malyangapa Margany MathiMathi MayiKulan MayiKutuna MayiThakurti MayiYapi Mbabaram Mbakwithi Minjungbal Mirniny Mithaka Narrungga Ngadjumaya Ngaiawang Ngamini Ngarigu Ngarinyman Ngarla Ngarluma Ngarrindjeri Ngawun Ngiyambaa Ngunawal Nhanta Nhirrpi Nyamal Nyungar Paakantyi Panyjima Parnkala Payungu Piangil Pirriya PittaPitta Rirratjingu Ritharrngu Thalanyji Tharrgari Wajarri Wakaya Wangkangurru Wangkumara Wardandi Warluwarra Warriyangga Wathawurrung Wathiwathi Watjuk Wiradjuri Wirangu Woiwurrung 
YabulaYabula Yagara Yalarnnga Yandruwandha Yannhangu Yawarrawarrka Yidiny Yindjibarndi Yiningay Yirandali YortaYorta Yugambeh Yuwaalaraay b d g k p r t ɹ
bowernpny dyir1250 Dyirbal b d g k p r t ɹ ɽ
bowernpny kala1380 muru1266 yind1248 Garlali Muruwari Yindjilandji b d g k p r t ɹ ɾ
bowernpny gurd1238 Kurtjar b d g k p r t ɹ ʀ
bowernpny baty1234 bili1250 kumb1268 waru1264 Batyala Bilinarra Gumbaynggir Warungu b d g k p r ɹ
bowernpny warr1255 Wargamay b d g k p r ɹ ɽ
bowernpny kuku1280 ngad1257 KuguNganhcara Ngadjuri b d g k p t ɹ
bowernpny dhur1239 yaga1256 Dhurga Durubul b d g k r t ɹ
bowernpny duun1241 gudj1237 wulg1239 Duungidjawu Gudjal Wulguru b d g k r ɹ
bowernpny dyaa1242 Djabugay b d g p r t ɹ
bowernpny kung1258 Gunggari b d g r t ɹ
bowernpny kara1476 KarajarriNW b d g ɹ
bowernpny guri1247 paka1251 yany1243 yarl1238 Gurindji Pakanh Yanyuwa Yarluyandi b d k p r t ɹ
bowernpny kart1247 waru1265 Kartujarra Warumungu b g k p r t ɹ
bowernpny yulp1238 Yulparija b k p r t ɹ
bowernpny wikm1247 WikMungkan d g k p r t ɹ
bowernpny wang1289 Wangkayutyuru d k p r t ɹ
wold guri1247 Gurindji g k p r t
bowernpny alya1239 djuw1238 guga1239 kurr1243 mang1383 ngaa1240 nort2753 nort2754 nyan1301 pint1250 pint1251 warl1254 Alyawarr Jiwarliny Kukatj Kurrama Ngaanyatjarra Ngardily NorthernMangarla NorthernNyangumarta Nyangumarta PintupiLuritja Wangkatja Warlpiri g k p r t ɹ
bowernpny kanj1260 umpi1239 Kaanju Umpila k p r t
bowernpny ande1247 kalk1246 maln1239 mart1256 pitj1243 walm1241 wanm1242 warl1255 west2441 Antekerrepenhe Kalkatungu Malngin MartuWangka Pitjantjatjara SouthernWalmajarri Warlmanpa Warnman WesternArrarnta k p r t ɹ
johanssonsoundsymbolic joophonosemantic ngar1287 pitj1243 Ngarluma Pitjantjatjara k p r t ɽ
bowernpny kuuk1238 KuukuYau k p t ɹ

@XachaB
Author

XachaB commented Nov 26, 2020

For example, for Pallanganmiddang, Phoible gives the sounds "a e i j k l l̪ m n n̪ o p r t t̪ u w ŋ ȴ ȵ ȶ ɭ ɳ ɻ ʈ", but here we have (among others) "b d g k p r t". The Phoible inventory (this one is from @erichround) does not distinguish k/g, p/b, or t/d.

Is there a way for us to automatically compare, language by language, the set of sounds given in Phoible with the set of sounds used in a dataset? Doing that systematically might catch a lot of small variations in notation.
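To make the idea concrete, here is a minimal sketch of such a comparison for a single language, using the Pallanganmiddang sounds quoted above. The function names `to_set` and `compare_inventories` are hypothetical, the inventories are typed in by hand rather than loaded from Phoible or the dataset, and the dataset list is only the partial one given above, so `only_in_reference` is not informative in this example.

```python
import unicodedata


def to_set(sounds):
    """Split a space-separated sound list and normalize each sound to NFD,
    so that visually identical symbols with combining marks compare equal."""
    return {unicodedata.normalize("NFD", sound) for sound in sounds.split()}


def compare_inventories(reference, dataset):
    """Compare two space-separated inventories and report the differences."""
    ref, data = to_set(reference), to_set(dataset)
    return {
        "only_in_dataset": sorted(data - ref),
        "only_in_reference": sorted(ref - data),
        "shared": sorted(ref & data),
    }


# Inventories copied from the comment above; the dataset one is partial.
phoible_pall1243 = "a e i j k l l̪ m n n̪ o p r t t̪ u w ŋ ȴ ȵ ȶ ɭ ɳ ɻ ʈ"
dataset_pall1243 = "b d g k p r t"

result = compare_inventories(phoible_pall1243, dataset_pall1243)
print("only in dataset:", " ".join(result["only_in_dataset"]))
print("shared:", " ".join(result["shared"]))
```

Running the same comparison over every Glottocode shared by the two sources would give a per-language list of symbols to review.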

@LinguList
Contributor

LinguList commented Nov 26, 2020 via email

@XachaB
Author

XachaB commented Nov 26, 2020

Amazing ! That's excellent. Looking forward to it !

@LinguList
Contributor

Please see here for an example of how this can be done.

@chirila

chirila commented Nov 27, 2020 via email

@LinguList
Contributor

You were sent an invitation now

@chirila

chirila commented Nov 27, 2020 via email

@chirila

chirila commented Nov 27, 2020 via email

@XachaB
Author

XachaB commented Dec 1, 2020

Thanks! How could we go about getting to a better representation for this dataset?
I understand the first step is to have separate profiles for each language, but beyond that, I am not sure how to decide on a specific transcription. Should we follow Phoible, and if not, how can we have more comparable representations?
In the current state, I think the transcription here follows neither Phoible nor a more comparable inventory set.


5 participants