Release prep

lexibank · Jul 22, 2021 · a8c039e
1 parent bbe57b2

Showing 13 changed files with 343 additions and 78 deletions.
diff --git a/.github/workflows/cldf-validation.yml b/.github/workflows/cldf-validation.yml
@@ -0,0 +1,29 @@
+name: CLDF-validation
+
+on:
+  push:
+    branches: [ master ]
+  pull_request:
+    branches: [ master ]
+
+jobs:
+  build:
+
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: [3.6]
+
+    steps:
+    - uses: actions/checkout@v2
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v2
+      with:
+        python-version: ${{ matrix.python-version }}
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install pytest-cldf
+    - name: Test with pytest
+      run: |
+        pytest --cldf-metadata=cldf/cldf-metadata.json test.py
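The workflow installs pytest-cldf and runs `pytest --cldf-metadata=cldf/cldf-metadata.json test.py`. `test.py` itself is not shown in this diff. One thing CLDF validation checks is referential integrity: every `Language_ID` in `forms.csv` should match an `ID` in `languages.csv`. The stdlib sketch below illustrates only that check, using made-up miniature tables. The `Unknown` language and the helper names are hypothetical, and this is not how pytest-cldf or pycldf implement validation.

```python
import csv
import io

# Hypothetical miniature tables; the real files live in cldf/ and are far larger.
LANGUAGES_CSV = """ID,Name,Glottocode
Lisu,Lisu,lisu1250
Muya,Muya [Minyak],muya1239
"""

FORMS_CSV = """ID,Language_ID,Parameter_ID,Form
1,Lisu,1,a
2,Muya,1,b
3,Unknown,1,c
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def dangling_references(rows, column, targets):
    """Return the IDs of rows whose `column` value is not among `targets`."""
    return [row["ID"] for row in rows if row[column] not in targets]


language_ids = {row["ID"] for row in read_rows(LANGUAGES_CSV)}
forms = read_rows(FORMS_CSV)
print(dangling_references(forms, "Language_ID", language_ids))  # ['3']
```

The same helper would be called once per reference column. For example, it could check `Parameter_ID` against the IDs in `parameters.csv`.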
diff --git a/.travis.yml b/.travis.yml
diff --git a/README.md b/README.md
@@ -1,8 +1,16 @@
 # CLDF dataset derived from Sūn's "Tibeto-Burman Phonology and Lexicon" from 1991
 
-Cite the source dataset as
+[![CLDF validation](https://github.com/lexibank/suntb/workflows/CLDF-validation/badge.svg)](https://github.com/lexibank/suntb/actions?query=workflow%3ACLDF-validation)
+
+## How to cite
+
+If you use these data please cite
+- the original source
+  > Sūn, Hóngkāi 孙宏开 (1991): Zangmianyu yuyin he cihui 藏缅语音和词汇 [Tibeto-Burman phonology and lexicon]. Beijing: Chinese Social Sciences Press.
+- the derived dataset using the DOI of the [particular released version](../../releases/) you were using
+
+## Description
 
-> Sūn, Hóngkāi 孙宏开 (1991): Zangmianyu yuyin he cihui 藏缅语音和词汇 [Tibeto-Burman phonology and lexicon]. Beijing: Chinese Social Sciences Press.
 
 This dataset is licensed under a CC-BY-4.0 license
 
@@ -14,7 +22,7 @@ Conceptlists in Concepticon:
 ## Statistics
 
 
-[![Build Status](https://travis-ci.org/lexibank/suntb.svg?branch=master)](https://travis-ci.org/lexibank/suntb)
+[![CLDF validation](https://github.com/lexibank/suntb/workflows/CLDF-validation/badge.svg)](https://github.com/lexibank/suntb/actions?query=workflow%3ACLDF-validation)
 ![Glottolog: 96%](https://img.shields.io/badge/Glottolog-96%25-green.svg "Glottolog: 96%")
 ![Concepticon: 93%](https://img.shields.io/badge/Concepticon-93%25-green.svg "Concepticon: 93%")
 ![Source: 100%](https://img.shields.io/badge/Source-100%25-brightgreen.svg "Source: 100%")
@@ -41,3 +49,10 @@ STEDT | https://stedt.berkeley.edu | digitization | Editor
 Sūn, Hóngkāi | | original data collection editor | Author
 
 
+
+
+## CLDF Datasets
+
+The following CLDF datasets are available in [cldf](cldf):
+
+- CLDF [Wordlist](https://github.com/cldf/cldf/tree/master/modules/Wordlist) at [cldf/cldf-metadata.json](cldf/cldf-metadata.json)
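The README's Glottolog and Concepticon badges report percentages (96% and 93%). Suppose these give the share of languages with a Glottocode and the share of concepts with a Concepticon ID. In that case, a figure of this kind could be recomputed from the CSV files, as sketched below. The sample rows are hypothetical, including the language `Foo`, which has no Glottocode. The generator's exact rule, such as how it rounds, may differ.

```python
import csv
import io

# Hypothetical sample; in the real dataset this would be cldf/languages.csv.
LANGUAGES_CSV = """ID,Name,Glottocode
Lisu,Lisu,lisu1250
Muya,Muya [Minyak],muya1239
Foo,Foo,
BlackLahu,Lahu (Black),naaa1244
"""


def coverage(rows, column):
    """Percentage of rows with a non-empty value in `column`, rounded to an int."""
    if not rows:
        return 0
    filled = sum(1 for row in rows if row[column].strip())
    return round(100 * filled / len(rows))


rows = list(csv.DictReader(io.StringIO(LANGUAGES_CSV)))
print(f"Glottolog: {coverage(rows, 'Glottocode')}%")  # Glottolog: 75%
```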
diff --git a/cldf/README.md b/cldf/README.md
@@ -0,0 +1,99 @@
+<a name="ds-cldfmetadatajson"> </a>
+
+# Wordlist CLDF dataset derived from Sūn's "Tibeto-Burman Phonology and Lexicon" from 1991
+
+**CLDF Metadata**: [cldf-metadata.json](./cldf-metadata.json)
+
+**Sources**: [sources.bib](./sources.bib)
+
+property | value
+ --- | ---
+[dc:bibliographicCitation](http://purl.org/dc/terms/bibliographicCitation) | Sūn, Hóngkāi 孙宏开 (1991): Zangmianyu yuyin he cihui 藏缅语音和词汇 [Tibeto-Burman phonology and lexicon]. Beijing: Chinese Social Sciences Press.
+[dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF Wordlist](http://cldf.clld.org/v1.0/terms.rdf#Wordlist)
+[dc:format](http://purl.org/dc/terms/format) | <ol><li>http://concepticon.clld.org/contributions/Sun-1991-1004</li></ol>
+[dc:identifier](http://purl.org/dc/terms/identifier) | https://stedt.berkeley.edu/~stedt-cgi/rootcanal.pl/source/ZMYYC
+[dc:license](http://purl.org/dc/terms/license) | https://creativecommons.org/licenses/by/4.0/
+[dcat:accessURL](http://www.w3.org/ns/dcat#accessURL) | https://github.com/lexibank/suntb
+[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="https://github.com/lexibank/suntb/tree/bbe57b2">lexibank/suntb v3.0-10-gbbe57b2</a></li><li><a href="https://github.com/glottolog/glottolog/tree/v4.4">Glottolog v4.4</a></li><li><a href="https://github.com/concepticon/concepticon-data/tree/v2.5.0">Concepticon v2.5.0</a></li><li><a href="https://github.com/cldf-clts/clts/tree/v2.1.0">CLTS v2.1.0</a></li></ol>
+[prov:wasGeneratedBy](http://www.w3.org/ns/prov#wasGeneratedBy) | <ol><li><strong>lingpy-rcParams</strong>: <a href="./lingpy-rcParams.json">lingpy-rcParams.json</a></li><li><strong>python</strong>: 3.8.10</li><li><strong>python-packages</strong>: <a href="./requirements.txt">requirements.txt</a></li></ol>
+[rdf:ID](http://www.w3.org/1999/02/22-rdf-syntax-ns#ID) | suntb
+[rdf:type](http://www.w3.org/1999/02/22-rdf-syntax-ns#type) | http://www.w3.org/ns/dcat#Distribution
+
+
+## <a name="table-formscsv"></a>Table [forms.csv](./forms.csv)
+
+
+Raw lexical data item as it can be pulled out of the original datasets.
+
+This is the basis for creating rows in CLDF representations of the data by
+- splitting the lexical item into forms
+- cleaning the forms
+- potentially tokenizing the form
+
+
+property | value
+ --- | ---
+[dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF FormTable](http://cldf.clld.org/v1.0/terms.rdf#FormTable)
+[dc:extent](http://purl.org/dc/terms/extent) | 50434
+
+
+### Columns
+
+Name/Property | Datatype | Description
+ --- | --- | --- 
+[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string` | Primary key
+[Local_ID](http://purl.org/dc/terms/identifier) | `string` | 
+[Language_ID](http://cldf.clld.org/v1.0/terms.rdf#languageReference) | `string` | References [languages.csv::ID](#table-languagescsv)
+[Parameter_ID](http://cldf.clld.org/v1.0/terms.rdf#parameterReference) | `string` | References [parameters.csv::ID](#table-parameterscsv)
+[Value](http://cldf.clld.org/v1.0/terms.rdf#value) | `string` | 
+[Form](http://cldf.clld.org/v1.0/terms.rdf#form) | `string` | 
+[Segments](http://cldf.clld.org/v1.0/terms.rdf#segments) | list of `string` (separated by ` `) | 
+[Comment](http://cldf.clld.org/v1.0/terms.rdf#comment) | `string` | 
+[Source](http://cldf.clld.org/v1.0/terms.rdf#source) | list of `string` (separated by `;`) | References [sources.bib::BibTeX-key](./sources.bib)
+`Cognacy` | `string` | 
+`Loan` | `boolean` | 
+`Graphemes` | `string` | 
+`Profile` | `string` | 
+
+## <a name="table-languagescsv"></a>Table [languages.csv](./languages.csv)
+
+property | value
+ --- | ---
+[dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF LanguageTable](http://cldf.clld.org/v1.0/terms.rdf#LanguageTable)
+[dc:extent](http://purl.org/dc/terms/extent) | 51
+
+
+### Columns
+
+Name/Property | Datatype | Description
+ --- | --- | --- 
+[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string` | Primary key
+[Name](http://cldf.clld.org/v1.0/terms.rdf#name) | `string` | 
+[Glottocode](http://cldf.clld.org/v1.0/terms.rdf#glottocode) | `string` | 
+`Glottolog_Name` | `string` | 
+[ISO639P3code](http://cldf.clld.org/v1.0/terms.rdf#iso639P3code) | `string` | 
+[Macroarea](http://cldf.clld.org/v1.0/terms.rdf#macroarea) | `string` | 
+[Latitude](http://cldf.clld.org/v1.0/terms.rdf#latitude) | `decimal` | 
+[Longitude](http://cldf.clld.org/v1.0/terms.rdf#longitude) | `decimal` | 
+`Family` | `string` | 
+`SubGroup` | `string` | 
+
+## <a name="table-parameterscsv"></a>Table [parameters.csv](./parameters.csv)
+
+property | value
+ --- | ---
+[dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF ParameterTable](http://cldf.clld.org/v1.0/terms.rdf#ParameterTable)
+[dc:extent](http://purl.org/dc/terms/extent) | 1004
+
+
+### Columns
+
+Name/Property | Datatype | Description
+ --- | --- | --- 
+[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string` | Primary key
+[Name](http://cldf.clld.org/v1.0/terms.rdf#name) | `string` | 
+[Concepticon_ID](http://cldf.clld.org/v1.0/terms.rdf#concepticonReference) | `string` | 
+`Concepticon_Gloss` | `string` | 
+`Chinese_Gloss` | `string` | 
+`Number` | `string` | 
+
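The `forms.csv` description lists three steps: split the raw lexical item into forms, clean them, and potentially tokenize them. A minimal stdlib sketch of that pipeline follows. The separators, the bracket rule and the grapheme inventory are hypothetical stand-ins. The `Profile` and `Graphemes` columns suggest that an orthography profile is used, but that profile is not reproduced here. The greedy longest-match tokenizer is also a simplification. Segments are joined with spaces to match the separator documented for the `Segments` column.

```python
import re

# Hypothetical settings, for illustration only; the dataset's actual
# separators, cleaning rules and orthography profile are not shown here.
SEPARATORS = re.compile(r"[,;/]")
BRACKETED = re.compile(r"\([^)]*\)|\[[^\]]*\]")
GRAPHEMES = {"ts", "t", "s", "a", "ŋ", "i"}


def split_value(value):
    """Split a raw Value into candidate forms, dropping empty pieces."""
    return [piece.strip() for piece in SEPARATORS.split(value) if piece.strip()]


def clean_form(form):
    """Remove bracketed comments and collapse surplus whitespace."""
    return " ".join(BRACKETED.sub("", form).split())


def tokenize(form, graphemes=GRAPHEMES):
    """Greedy longest-match segmentation against a grapheme inventory."""
    longest = max(len(g) for g in graphemes)
    segments, i = [], 0
    while i < len(form):
        for size in range(min(longest, len(form) - i), 0, -1):
            chunk = form[i:i + size]
            if chunk in graphemes:
                segments.append(chunk)
                i += size
                break
        else:
            # Unknown character: keep it as its own segment (hypothetical fallback).
            segments.append(form[i])
            i += 1
    return segments


def process(value):
    """Raw Value -> list of (form, segments) pairs, skipping forms cleaned to nothing."""
    forms = [clean_form(piece) for piece in split_value(value)]
    return [(form, tokenize(form)) for form in forms if form]


for form, segments in process("tsa (dial.), ŋi"):
    print(form, "->", " ".join(segments))
# tsa -> ts a
# ŋi -> ŋ i
```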
diff --git a/cldf/cldf-metadata.json b/cldf/cldf-metadata.json
@@ -15,34 +15,38 @@
     "dcat:accessURL": "https://github.com/lexibank/suntb",
     "prov:wasDerivedFrom": [
         {
-            "rdf:type": "prov:Entity",
-            "dc:title": "Repository",
             "rdf:about": "https://github.com/lexibank/suntb",
-            "dc:created": "v3.0-9-g6fbd900"
+            "rdf:type": "prov:Entity",
+            "dc:created": "v3.0-10-gbbe57b2",
+            "dc:title": "Repository"
         },
         {
-            "rdf:type": "prov:Entity",
-            "dc:title": "Glottolog",
             "rdf:about": "https://github.com/glottolog/glottolog",
-            "dc:created": "v4.3"
+            "rdf:type": "prov:Entity",
+            "dc:created": "v4.4",
+            "dc:title": "Glottolog"
         },
         {
-            "rdf:type": "prov:Entity",
-            "dc:title": "Concepticon",
             "rdf:about": "https://github.com/concepticon/concepticon-data",
-            "dc:created": "v2.4.0"
+            "rdf:type": "prov:Entity",
+            "dc:created": "v2.5.0",
+            "dc:title": "Concepticon"
         },
         {
+            "rdf:about": "https://github.com/cldf-clts/clts",
             "rdf:type": "prov:Entity",
-            "dc:title": "CLTS",
-            "rdf:about": "https://github.com/cldf-clts/clts/",
-            "dc:created": "v2.0.0"
+            "dc:created": "v2.1.0",
+            "dc:title": "CLTS"
         }
     ],
     "prov:wasGeneratedBy": [
+        {
+            "dc:title": "lingpy-rcParams",
+            "dc:relation": "lingpy-rcParams.json"
+        },
         {
             "dc:title": "python",
-            "dc:description": "3.9.4"
+            "dc:description": "3.8.10"
         },
         {
             "dc:title": "python-packages",
@@ -57,6 +61,7 @@
     "tables": [
         {
             "dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#FormTable",
+            "dc:description": "\nRaw lexical data item as it can be pulled out of the original datasets.\n\nThis is the basis for creating rows in CLDF representations of the data by\n- splitting the lexical item into forms\n- cleaning the forms\n- potentially tokenizing the form\n",
             "dc:extent": 50434,
             "tableSchema": {
                 "columns": [
@@ -196,8 +201,8 @@
                     {
                         "datatype": {
                             "base": "decimal",
-                            "minimum": -90,
-                            "maximum": 90
+                            "minimum": "-90",
+                            "maximum": "90"
                         },
                         "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#latitude",
                         "required": false,
@@ -206,8 +211,8 @@
                     {
                         "datatype": {
                             "base": "decimal",
-                            "minimum": -180,
-                            "maximum": 180
+                            "minimum": "-180",
+                            "maximum": "180"
                         },
                         "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#longitude",
                         "required": false,

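The metadata hunk shows two kinds of change. First, the `prov:wasDerivedFrom` entries are re-ordered and carry new version strings; the CLTS entry also drops the trailing slash from `rdf:about`. Second, the latitude and longitude bounds switch from JSON numbers to strings. The plain-Python sketch below is not pycldf or csvw code. It shows that the old and new Glottolog entries, compared as dicts, differ only in `dc:created`. It also checks a coordinate against bounds in either serialization, assuming that a reader interprets both forms as decimal literals. The helper names are hypothetical.

```python
from decimal import Decimal

# Old and new Glottolog provenance entries, copied from the hunk above.
OLD = {"rdf:type": "prov:Entity", "dc:title": "Glottolog",
       "rdf:about": "https://github.com/glottolog/glottolog", "dc:created": "v4.3"}
NEW = {"rdf:about": "https://github.com/glottolog/glottolog",
       "rdf:type": "prov:Entity", "dc:created": "v4.4", "dc:title": "Glottolog"}


def changed_keys(old, new):
    """Keys whose values differ; key order plays no role in dict comparison."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))


def within_bounds(value, minimum, maximum):
    """Compare as Decimal, accepting bounds given as JSON numbers or strings."""
    return Decimal(str(minimum)) <= Decimal(value) <= Decimal(str(maximum))


print(changed_keys(OLD, NEW))               # ['dc:created']
print(within_bounds("27.5", -90, 90))       # True
print(within_bounds("27.5", "-90", "90"))   # True
print(within_bounds("-95", "-90", "90"))    # False
```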
diff --git a/cldf/languages.csv b/cldf/languages.csv
@@ -22,7 +22,7 @@ Kaman,Kaman [Miju],miju1243,Kman,mxj,Eurasia,,,Sino-Tibetan,
 BlackLahu,Lahu (Black),naaa1244,Na (Lahu),,Eurasia,,,Sino-Tibetan,
 Lisu,Lisu,lisu1250,Lisu,lis,Eurasia,,,Sino-Tibetan,
 Maru,Maru [Langsu],lawn1238,Lawng Hsu,,Eurasia,,,Sino-Tibetan,
-Muya,Muya [Minyak],muya1239,Muya,mvm,Eurasia,,,Sino-Tibetan,
+Muya,Muya [Minyak],muya1239,Muya,,Eurasia,,,Sino-Tibetan,
 Namuyi,Namuyi,namu1246,Namuyi,nmy,Eurasia,,,Sino-Tibetan,
 LijiangNaxi,Naxi (Lijiang),lich1241,Lichiang,,Eurasia,,,Sino-Tibetan,
 YongningNaxi,Naxi (Yongning),yong1270,Yongning Na,nru,Eurasia,,,Sino-Tibetan,
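The only change to `languages.csv` shown here empties Muya's ISO 639-3 code (`mvm`). The quick stdlib check below reads rows using the column order documented for `languages.csv` in `cldf/README.md` and lists the languages that lack a code. The four rows are copied from this hunk, with Muya in its changed form. `missing` is a hypothetical helper.

```python
import csv
import io

# Column order as documented for languages.csv in cldf/README.md.
COLUMNS = ["ID", "Name", "Glottocode", "Glottolog_Name", "ISO639P3code",
           "Macroarea", "Latitude", "Longitude", "Family", "SubGroup"]

# Rows copied from the languages.csv hunk (after the change).
ROWS = """Lisu,Lisu,lisu1250,Lisu,lis,Eurasia,,,Sino-Tibetan,
Maru,Maru [Langsu],lawn1238,Lawng Hsu,,Eurasia,,,Sino-Tibetan,
Muya,Muya [Minyak],muya1239,Muya,,Eurasia,,,Sino-Tibetan,
Namuyi,Namuyi,namu1246,Namuyi,nmy,Eurasia,,,Sino-Tibetan,
"""


def missing(rows, column):
    """IDs of languages whose `column` cell is empty."""
    return [row["ID"] for row in rows if not row[column]]


rows = list(csv.DictReader(io.StringIO(ROWS), fieldnames=COLUMNS))
print(missing(rows, "ISO639P3code"))  # ['Maru', 'Muya']
```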