Orthography profile: non meaningful contrasts ? #12
Comments
In case it helps assess the (potential) problem, here is a (long) detailed list of which sets of sounds among p/k/t/b/d/g/ɹ/r etc. are present for each language:
Quite a few seem to have g/k, which may be a clue that it is sometimes contrastive? |
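A per-language tabulation like the one described above can be sketched as follows. This is only a minimal illustration, assuming forms are available as (language, space-separated segments) pairs as in the Segments column of a CLDF forms.csv; the function names `inventories` and `voicing_pairs` and the toy rows are hypothetical and not taken from the dataset.

```python
from collections import defaultdict

# Voiced/voiceless stop pairs under discussion in this issue.
STOP_PAIRS = [("p", "b"), ("t", "d"), ("k", "g")]


def inventories(rows):
    """Collect the set of segments attested for each language.

    `rows` is an iterable of (language_id, segments) pairs, where
    `segments` is a space-separated string of sounds.
    """
    inv = defaultdict(set)
    for language, segments in rows:
        inv[language].update(segments.split())
    return dict(inv)


def voicing_pairs(inv):
    """Return, per language, the stop pairs for which both members occur."""
    return {
        language: [pair for pair in STOP_PAIRS if set(pair) <= sounds]
        for language, sounds in inv.items()
    }


# Made-up toy rows, for illustration only.
rows = [
    ("LangA", "k a p a"),
    ("LangA", "g u d a"),
    ("LangB", "p a t a"),
    ("LangB", "k u r a"),
]
print(voicing_pairs(inventories(rows)))
```

A language that shows both members of a pair is only a candidate for a spurious contrast; whether the two symbols actually contrast still has to be checked against the sources.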
I suggest also running a direct comparison of this dataset against the
inventories as they are provided in
cldf-clts/clts#76
As we are now integrating all the data, we could then directly compare basic
similarities, etc.
|
Some differences might be explained by different concept sets, others by the sample:

Now, there are undoubtedly errors, or at least questionable transcriptions. I am no expert in Pama-Nyungan by any means, and this is an old profile (from when we had a single one per dataset). Nonetheless, I suppose it is in part also related to the solutions different authors employ for rendering in the transcription not only contrasts in stop series, but also surface differences.

First, it might not necessarily be what we'd "expect" in terms of modal voice vs. fully open glottis. After all, in most languages of Australia (with the exception of some Northern ones) we don't expect a voicing distinction and, what is more, the strong phonological similarities are one of the main reasons for treating it as a family. There are cases where the consonants are only semivoiced, but the IPA graphemes for voiced consonants are used (this is the case of Hercus in her grammar of Wirangu, here).

Second, and more important, they are not necessarily phonological in terms of a global contrastive correspondence. If you look at the original files in the history, where I was also including the (automatic) alignments I used for studying the dataset, you can see patterns like k-g-k everywhere. I remember there are even a handful of synonyms which were just expressing k/g or t/d as allophones. But take the example from Alpher (2004) cited by Miceli (2005) (here, and see the example of Wirangu discussed above):
Djabugay and Wirangu have a word-initial g- for k-, and Wirangu has a -b- for -p-. There are many things going on: in some cases it looks like an orthographic preference (the "descriptive practice" you mention), in others it is articulatory information that is not contrastive (i.e., it is phonetic rather than phonological), in some cases it looks like free variation while in others it is positional, and so on. While |
Please check against my latest commit: I have already cleaned up the data further, since there were many invalid segments. |
I had this in another branch and have now merged it into the master branch. |
Do you mean commit 3e48d89? It is called "update to get rid of bad forms", but it does not change forms.csv, so I am not sure I understand what the changes are. |
Hi all,
It’s late here in Aus so I’ll be brief.
It would be good to see an updated list similar to what Sacha sent, but with Mattis’s committed fixes. The list Sacha sent contains a great many languages with both voiced and voiceless symbols where there’s no phonemic contrast (cf. the Phoible inventories you mentioned, Tiago). Little of that variation will be due to careful phonetic transcription in the originals; rather, it’ll mostly reflect idiosyncratic orthographic decisions by linguists that aren’t consistent across languages.
In almost every Australian language there are at least two rhotics, so you can trust that the ɹ/r contrast is mainly correct.
A quick question which will help us with the paper on sound correspondences: is Lexibank in general (beyond Australia) also a mix of allophones in some languages and phonemes in others? Or would it be overwhelmingly phonemic?
Best,
Erich
|
Check the latest version of the lexibank script and orthography profile.
|
Here is the updated table after checking out the current master version:
|
For example, for Pallanganmiddang, Phoible gives the sounds: "a e i j k l l̪ m n n̪ o p r t t̪ u w ŋ ȴ ȵ ȶ ɭ ɳ ɻ ʈ", but here we have (among others): "b d g k p r t". Phoible (this inventory is from @erichround) does not distinguish k/g, p/b, t/d. Is there a way for us to compare automatically, language by language, the set of sounds given in Phoible and the set of sounds used in a dataset? We might catch a lot of small variations in notation by doing that systematically. |
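The language-by-language comparison asked about here comes down to set differences between two inventories. The sketch below is plain Python, not the pylexicore approach mentioned later in the thread; the function name `compare_inventories` is hypothetical, and the two strings are copied from the comment above (the dataset list is only the partial "among others" sample, so the `only_reference` result is not meaningful for it).

```python
def compare_inventories(reference, dataset):
    """Compare two space-separated sound inventories as sets."""
    ref, data = set(reference.split()), set(dataset.split())
    return {
        "shared": sorted(ref & data),
        "only_reference": sorted(ref - data),
        "only_dataset": sorted(data - ref),
    }


# Pallanganmiddang inventory as quoted from Phoible in the comment above.
phoible = "a e i j k l l̪ m n n̪ o p r t t̪ u w ŋ ȴ ȵ ȶ ɭ ɳ ɻ ʈ"
# Partial list of sounds quoted for this dataset ("among others").
dataset_partial = "b d g k p r t"

result = compare_inventories(phoible, dataset_partial)
print(result["only_dataset"])  # sounds used here but absent from Phoible
print(result["shared"])
```

Sounds that show up only on the dataset side (here the voiced stops) are the ones worth checking first for notational rather than phonemic differences.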
My PR to lexicore essentially enumerates all the features. If you wait
until I have finished the work on pylexicore, you can do this very easily.
|
Amazing! That's excellent. Looking forward to it! |
Please see here (https://github.com/lexibank/pylexicore/blob/main/examples/pamanyungan/report.md) for an example of how this can be done. |
It's possible that there are some but you can't look at this through the
dataset as a whole, since these are points on which the languages vary. For
example, some languages have only two rhotic phonemes (IPA ɹ and r,
practical orthography r and rr); others have three, either ɹ, r and ɾ (or
ɽ) or sometimes written as something else. Likewise with voicing, where
some languages only have a marginal contrast.
I'm happy to look at Mattis' examples but I don't have access to lexibank.
If you want to give me access I'm happy to take a look through.
Claire
On Thu, Nov 26, 2020 at 8:15 AM Johann-Mattis List wrote:
> Please see here
> <https://github.com/lexibank/pylexicore/blob/main/examples/pamanyungan/report.md>
> for an example of how this can be done.
--
Claire Bowern
Professor
Editor: *Diachronica*
Department of Linguistics, Yale University
she/her or they/them
|
You were sent an invitation now. |
Pallanganmiddang is probably not the best source for an example as it's
reconstituted from old sources. Blake and Reid (1999) write both voiced and
voiceless segments, since both occur both initially and medially, but it's
not too clear if they actually contrast. For vowel length, the spellings in
the old sources suggest both long and short vowels but the material is not
systematic enough to be sure.
On Thu, Nov 26, 2020 at 7:18 AM Sacha wrote:
> For example, for Pallanganmiddang, Phoible gives the sounds: "a e i j k l
> l̪ m n n̪ o p r t t̪ u w ŋ ȴ ȵ ȶ ɭ ɳ ɻ ʈ", but here we have (among others):
> "b d g k p r t". Phoible (this one is from @erichround
> <https://github.com/erichround> ), does not distinguish k/g p/b t/d. Is
> vowel length distinctive in Pallanganmiddang ?
> Is there a way for us to compare automatically, language by language, the
> set of sounds given in phoible and the set of sounds used in a dataset ? We
> might catch a lot of small variations in notation by doing that
> systematically.
|
thanks, got it.
It looks like there are a fair number of just different glyph choices here.
e.g. ɽ or ɻ vs ɹ; ʈ vs ƫ, ţ, or ȶ (though these of course mean different
things). For Adnyamathanha v is equivalent to β (alternative
representations of the same segment). Phoible in general goes for maximal
specificity of representation whereas both @erichround and I have gone more
for comparability.
On Fri, Nov 27, 2020 at 11:59 AM Johann-Mattis List wrote:
> You were sent an invitation now
|
Thanks! How could we go about getting to a better representation for this dataset? |
Hi,
I have the impression that there might be spurious contrasts in the orthography profile, in particular voicing contrasts in occlusives (p/b, g/k, t/d). I also suspect that the ɹ/r contrast may not be meaningful.
I suggest that we figure out precisely which contrasts are due to variation in descriptive practice and which are true contrasts attributable to sound change etc., and neutralize the meaningless contrasts.
As to how to normalize, we have three other datasets with languages from these families, and should make sure we are using the same notations:
As you can see, the other ones use p-k-t, not b-g-d, and have a single /r/ sound. https://github.com/lexibank/wold does have a k/g contrast (if it is also meaningless, we should change it there).
@erichround, @chirila, could you chime in on whether these contrasts should be kept here? Are there other contrasts that should be neutralized in the list above? @tresoldi, it looks like the orthography profile was from you; do you remember if there were specific motivations for these contrasts?
For a closer look, the list of sounds with counts can be found in the TRANSCRIPTION file: https://github.com/lexibank/bowernpny/blob/master/TRANSCRIPTION.md
Having non meaningful contrasts causes issues with downstream analyses of the data, especially in the sound correspondence study.
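To make the proposed neutralization concrete, here is a minimal sketch of what merging the voicing contrast could look like on already-segmented forms. The mapping `NEUTRALIZE` and the function `neutralize` are hypothetical; which contrasts belong in the mapping is exactly what this issue asks about, and the thread suggests it may differ by language. The ɹ/r pair is left out because the replies indicate that contrast is mostly real.

```python
# Hypothetical mapping for illustration: voiced stops are rewritten as their
# voiceless counterparts, matching the p-k-t notation of the other datasets.
NEUTRALIZE = {"b": "p", "d": "t", "g": "k"}


def neutralize(segments, mapping=NEUTRALIZE):
    """Rewrite a space-separated segment string using `mapping`.

    Only segments that match a key exactly are changed; anything else,
    including segments carrying diacritics, is passed through untouched.
    """
    return " ".join(mapping.get(seg, seg) for seg in segments.split())


print(neutralize("g u d a"))
print(neutralize("b a r a"))
```

In practice such a mapping would more likely live in the orthography profile itself, possibly per language, rather than being applied as a post-processing step.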