Data FI sample #564

yoge1 · 2022-12-22T17:22:39Z

There are still some issues with the data.

Any idea why text content is not allowed in the seg element? For example:

Data (Data/ParlaMint-FI/ParlaMint-FI_2015-05-26-ps-7.xml lines 99-101):

<u ana="#chair" who="#JuhaSipilä" xml:id="ParlaMint-FI_2015-05-22-ps-7.u1">
   <seg xml:id="ParlaMint-FI_2015-05-22-ps-7.seg1">Ensimmäiseen käsittelyyn esitellään päiväjärjestyksen 2. asia. Käsittelyn pohjana on hallintovaliokunnan mietintö HaVM 1/2015 vp. Nyt päätetään lakiehdotuksen sisällöstä.</seg>
</u>

Validation error:

Data/ParlaMint-FI/ParlaMint-FI_2015-05-26-ps-7.xml:100:234: error: text not allowed here; expected element "gap", "incident", "kinesic", "note", "pb", "s" or "vocal"
Data/ParlaMint-FI/ParlaMint-FI_2015-05-26-ps-7.xml:100:240: error: element "seg" incomplete; expected element "gap", "incident", "kinesic", "note", "pb", "s" or "vocal"

I tried to run make add-common-content-FI, but it resulted in errors, at least because the country code FI is not available in the file Scripts/parlamint-add-common-content.xsl:

FATAL : BAD COUNTRY!

TomazErjavec · 2022-12-22T18:18:44Z

text not allowed here; expected element "gap", "incident", "kinesic", "note", "pb", "s" or "vocal"

This looks as if the schema for the .ana version of the corpus was used, as it expects s(entence) elements rather than text.

For the rest, I hope @matyaskopp will be able to help.

matyaskopp · 2022-12-22T18:59:55Z

Any idea why text content is not allowed in element seg, e.g.:

You are including the TEI version of the component files in the TEI.ana root file:
https://github.com/SemanticComputing/ParlaMint/blob/b8cde238c2a18b04e6cde5231c71e294a997d39b/Data/ParlaMint-FI/ParlaMint-FI.ana.xml#L4307-L4311
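As an illustration of the check involved, here is a small Python sketch (not part of the ParlaMint scripts; the function name non_ana_includes is hypothetical) that lists the xi:include targets in a TEI.ana root file that are not .ana.xml files. It assumes the usual ParlaMint naming, where annotated component files end in .ana.xml. Because ElementTree resolves namespaces, the check works whatever prefix the file binds to the XInclude namespace.

```python
import xml.etree.ElementTree as ET

# ElementTree reports namespaced tags in "{uri}local" form, so this matches
# xi:include regardless of the prefix used in the file.
XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def non_ana_includes(root_xml: str) -> list[str]:
    """Return the href of every xi:include that does not point to a .ana.xml file."""
    root = ET.fromstring(root_xml)
    hrefs = [el.get("href", "") for el in root.iter(XI_INCLUDE)]
    return [href for href in hrefs if not href.endswith(".ana.xml")]


if __name__ == "__main__":
    # Hypothetical root file: the second include points to the plain TEI file.
    sample = """<teiCorpus xmlns="http://www.tei-c.org/ns/1.0"
                           xmlns:xi="http://www.w3.org/2001/XInclude">
      <xi:include href="ParlaMint-FI_2015-05-22-ps-7.ana.xml"/>
      <xi:include href="ParlaMint-FI_2015-05-26-ps-7.xml"/>
    </teiCorpus>"""
    print(non_ana_includes(sample))
```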

I tried to run make add-common-content-FI but it resulted in errors, at least due to country-code FI not being available in the file Scripts/parlamint-add-common-content.xsl:

FATAL : BAD COUNTRY!

I will fix it and let you know. #565

yoge1 · 2023-03-03T16:38:47Z

Hi @matyaskopp! I have now fixed the "easy" error cases in the FI sample data. Any ideas on how to proceed from here would be appreciated.

matyaskopp · 2023-03-03T19:26:01Z

Error: /home/runner/work/ParlaMint/ParlaMint/ParlaMint/Data/ParlaMint-FI/ParlaMint-FI.xml:44:19: error: element "extent" incomplete; missing required element "measure"
https://github.com/clarin-eric/ParlaMint/actions/runs/4325365846/jobs/7551440647#step:4:27

You don't have extent/measure elements in the TEI version. In the add-common-content script, words (just the numbers) are counted in the TEI.ana version and copied into the TEI version, so you have to add:

<extent>
  <measure unit="words" quantity="0" xml:lang="fi">0 sanat</measure>
  <measure unit="words" quantity="0" xml:lang="en">0 words</measure>
</extent>

into every TEI file.

Speeches are not counted by the script, so you have to do that in your pipeline.
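Counting speeches in your own pipeline could look roughly like the following Python sketch. It assumes that every u (utterance) element in a TEI component file counts as one speech; please check that assumption against the ParlaMint documentation. The function names are hypothetical, and only the English measure is generated (a Finnish one would be analogous).

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def count_speeches(component_xml: str) -> int:
    """Count <u> elements in one TEI component file.

    Assumption: one <u> (utterance) counts as one speech.
    """
    root = ET.fromstring(component_xml)
    return sum(1 for _ in root.iter(f"{{{TEI_NS}}}u"))


def speeches_measure(n: int) -> str:
    """Render the English speeches <measure> element for a given count."""
    return f'<measure unit="speeches" quantity="{n}" xml:lang="en">{n} speeches</measure>'


if __name__ == "__main__":
    doc = f"""<TEI xmlns="{TEI_NS}"><text><body><div>
      <u xml:id="u1"><seg>...</seg></u>
      <u xml:id="u2"><seg>...</seg></u>
    </div></body></text></TEI>"""
    print(speeches_measure(count_speeches(doc)))
```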

Error: /home/runner/work/ParlaMint/ParlaMint/ParlaMint/Data/ParlaMint-FI/ParlaMint-FI.xml:51:48: error: character content of element "idno" invalid; must be a URI matching the regular expression "https?://.+"

https://github.com/clarin-eric/ParlaMint/actions/runs/4325365846/jobs/7551440647#step:4:28

In the sample, you can set

<idno subtype="handle" type="URI">http://hdl.handle.net/11356/XXXX</idno>

Tomaž will add the proper handle in the ParlaMint release.

Error: /home/runner/work/ParlaMint/ParlaMint/ParlaMint/Data/ParlaMint-FI/ParlaMint-FI.xml:238:35: error: character content of element "term" invalid; must be a string matching the regular expression "(\S)|(\S[\S ]*\S)"

https://github.com/clarin-eric/ParlaMint/actions/runs/4325365846/jobs/7551440647#step:4:29

An empty term is not allowed; you should provide the Finnish (fi) translation:

<catDesc xml:lang="fi">
 <term/>
</catDesc>
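To catch all such empty or space-padded values before pushing, the same pattern can be checked locally. This Python sketch is an assumption-based helper, not a ParlaMint script: it translates the schema pattern "(\S)|(\S[\S ]*\S)" into Python regex syntax (in XML Schema, \s covers only space, tab, newline and carriage return) and uses fullmatch because schema patterns must match the whole value. Which elements to check (term, forename and surname here) is your choice.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# "(\S)|(\S[\S ]*\S)" with XML Schema semantics: \S excludes space, tab,
# LF and CR, so [\S ] means "anything except tab, LF and CR".
NON_EMPTY = re.compile(r"[^ \t\n\r]|[^ \t\n\r][^\t\n\r]*[^ \t\n\r]")


def invalid_strings(doc_xml: str, names=("term", "forename", "surname")) -> list[tuple[str, str]]:
    """Return (element name, content) for elements whose content fails the pattern."""
    root = ET.fromstring(doc_xml)
    bad = []
    for name in names:
        for el in root.iter(f"{{{TEI_NS}}}{name}"):
            text = "".join(el.itertext())
            # Schema patterns are anchored, hence fullmatch rather than search.
            if not NON_EMPTY.fullmatch(text):
                bad.append((name, text))
    return bad


if __name__ == "__main__":
    doc = f"""<listPerson xmlns="{TEI_NS}">
      <person xml:id="SDP"><persName><surname>SDP</surname><forename/></persName></person>
    </listPerson>"""
    print(invalid_strings(doc))
```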

Error: /home/runner/work/ParlaMint/ParlaMint/ParlaMint/Data/ParlaMint-FI/ParlaMint-FI.xml:1379:33: error: character content of element "forename" invalid; must be a string matching the regular expression "(\S)|(\S[\S ]*\S)"
https://github.com/clarin-eric/ParlaMint/actions/runs/4325365846/jobs/7551440647#step:4:30

The forename shouldn't be empty:

               <person xml:id="SDP">
                  <persName>
                     <surname>SDP</surname>
                     <forename/>
                  </persName>
               </person>

BTW, this looks more like a political party than a person.

I can continue in the same way for the rest of the errors.
Is this what you expected from me, i.e. to explain what the error messages mean?

yoge1 · 2023-04-11T14:59:05Z

@matyaskopp, regarding the errors that we still have in the FI sample data:

Error: ERROR[01] affiliation collision: (2008-02-07 --- 2010-02-03) is inside (2007-03-21 --- 2023-04-11) affiliation member-#fi_parliament
-- I will ask my colleague who produced the affiliation data whether these can be fixed automatically or whether we need to fix them manually.
Error: /home/runner/work/ParlaMint/ParlaMint/ParlaMint/Data/ParlaMint-FI/ParlaMint-FI_2015-05-22-ps-7.ana.xml:9892:1185: error: text not allowed here; expected element "ns0:gap", "ns0:incident", "ns0:kinesic", "ns0:note", "ns0:pb", "ns0:s" or "ns0:vocal"
Error: /home/runner/work/ParlaMint/ParlaMint/ParlaMint/Data/ParlaMint-FI/ParlaMint-FI_2015-05-22-ps-7.ana.xml:9892:1193: error: element "ns0:seg" incomplete; expected element "ns0:gap", "ns0:incident", "ns0:kinesic", "ns0:note", "ns0:pb", "ns0:s" or "ns0:vocal"
...
Error: /home/runner/work/ParlaMint/ParlaMint/ParlaMint/Data/ParlaMint-FI/ParlaMint-FI_2015-05-22-ps-7.ana.xml:21046:81: error: text not allowed here; expected the element end-tag or element "ns0:gap", "ns0:incident", "ns0:kinesic", "ns0:note", "ns0:pb", "ns0:s" or "ns0:vocal"
--- As far as I understand, text content should be allowed in the "seg" element (https://clarin-eric.github.io/ParlaMint/#TEI.seg)?
ERROR ParlaMint-FI_2015-05-22-ps-7.ana: Strange pointer 'ParlaMint-FI_2015-05-22-ps-7.seg1.1'
--- What might be the issue here?
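For the affiliation collisions, a local pre-check might help decide whether an automatic fix is feasible. The following Python sketch flags pairs of affiliations with the same role and ref where one date interval lies inside the other, which is how the ERROR[01] message reads. The validator's exact rule may differ, and the Affiliation type and nested_affiliations function are hypothetical.

```python
from datetime import date
from itertools import combinations
from typing import NamedTuple


class Affiliation(NamedTuple):
    role: str  # e.g. "member"
    ref: str   # e.g. "#fi_parliament"
    start: date
    end: date


def nested_affiliations(affs: list[Affiliation]) -> list[tuple[Affiliation, Affiliation]]:
    """Return (inner, outer) pairs with equal role and ref where inner lies within outer."""
    pairs = []
    for a, b in combinations(affs, 2):
        if (a.role, a.ref) != (b.role, b.ref):
            continue
        for inner, outer in ((a, b), (b, a)):
            if outer.start <= inner.start and inner.end <= outer.end:
                pairs.append((inner, outer))
                break  # identical intervals are reported only once
    return pairs


if __name__ == "__main__":
    # The two intervals quoted in the ERROR[01] message.
    outer = Affiliation("member", "#fi_parliament", date(2007, 3, 21), date(2023, 4, 11))
    inner = Affiliation("member", "#fi_parliament", date(2008, 2, 7), date(2010, 2, 3))
    for i, o in nested_affiliations([outer, inner]):
        print(f"{i.start} --- {i.end} is inside {o.start} --- {o.end}")
```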

TomazErjavec · 2023-04-11T16:42:50Z

@yoge1, @matyaskopp is ill, so I will try to answer:

As far as I understand, text content should be allowed in "seg" element (https://clarin-eric.github.io/ParlaMint/#TEI.seg)?

The short answer is yes, it is allowed in the "plain text" version of the corpus, but not in the linguistically annotated version (the so-called .ana version), which is what you are validating (cf. also https://clarin-eric.github.io/ParlaMint/#sec-ana-markup). There, all text content of the transcription proper should be inside <w> or <pc> elements.

For example, you have
ParlaMint/Data/ParlaMint-FI/ParlaMint-FI_2015-05-22-ps-7.ana.xml:9892:1185: error: text not allowed here
and at that position you have
<ns0:seg xml:id="ParlaMint-FI_2015-05-22-ps-7.seg28">Maahanmuutto ja takinkääntö: Minusta on hyvä asia, jos mei...
i.e., for some reason, a segment that has not been linguistically analysed.
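A quick way to list every segment in an .ana file that still contains unanalysed text is the Python sketch below (a hypothetical helper, not part of ParlaMint). It reports seg and s elements that have non-whitespace text outside their child elements, which is roughly the condition behind the "text not allowed here" error.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
CONTAINERS = {f"{{{TEI_NS}}}seg", f"{{{TEI_NS}}}s"}


def loose_text_ids(ana_xml: str) -> list[str]:
    """Return xml:ids of <seg>/<s> elements holding text outside any child element."""
    root = ET.fromstring(ana_xml)
    bad = []
    for el in root.iter():
        if el.tag not in CONTAINERS:
            continue
        # el.text is the text before the first child; child.tail is the text after each child.
        loose = [el.text or ""] + [child.tail or "" for child in el]
        if any(piece.strip() for piece in loose):
            bad.append(el.get(XML_ID))
    return bad


if __name__ == "__main__":
    doc = f"""<u xmlns="{TEI_NS}">
      <seg xml:id="seg27"><s xml:id="seg27.1"><w>Keskustelu</w></s></seg>
      <seg xml:id="seg28">Maahanmuutto ja takinkääntö ...</seg>
    </u>"""
    print(loose_text_ids(doc))
```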

A slightly longer answer: you point to the definition of seg in the schema generated from the TEI ODD. We do try to keep it as compatible as possible with the official schemas that are used for validation, which can be found in the Schema directory. But this is not always possible, and the ODD schema allows some constructions that the schemas in Schema/ do not. The text content of <seg> in the .ana version is a case in point (we would need two ODD schemas to be able to enforce this).

ERROR ParlaMint-FI_2015-05-22-ps-7.ana: Strange pointer 'ParlaMint-FI_2015-05-22-ps-7.seg1.1'
--- What might be the issue here?

You have links like

<ns0:link ana="ud-syn:nummod"
                                  target="ParlaMint-FI_2015-05-22-ps-7.seg1.1.2 ParlaMint-FI_2015-05-22-ps-7.seg1.1.1"/>

but the contents of @target should be local pointers, i.e. each ID should have a # in front of it, like:

<ns0:link ana="ud-syn:nummod"
                                  target="#ParlaMint-FI_2015-05-22-ps-7.seg1.1.2 #ParlaMint-FI_2015-05-22-ps-7.seg1.1.1"/>
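The missing # can be added mechanically. The Python sketch below (hypothetical helper names) rewrites every link/@target so that each bare ID becomes a local pointer; tokens containing a colon are left alone, since an xml:id cannot contain one. It round-trips the file through ElementTree, which drops comments and the XML declaration, so treat it as a starting point rather than a finished tool.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Write TEI as the default namespace (avoids an auto-generated "ns0:" prefix).
ET.register_namespace("", TEI_NS)


def as_local_pointers(target: str) -> str:
    """Prefix each bare ID in a space-separated @target with '#'.

    Tokens containing ':' are kept as-is: an xml:id cannot contain a colon,
    so such tokens are URIs or prefixed pointers rather than bare IDs.
    """
    return " ".join(
        t if t.startswith("#") or ":" in t else "#" + t for t in target.split()
    )


def fix_link_targets(ana_xml: str) -> str:
    """Return the document with every <link> @target turned into local pointers."""
    root = ET.fromstring(ana_xml)
    for link in root.iter(f"{{{TEI_NS}}}link"):
        target = link.get("target")
        if target is not None:
            link.set("target", as_local_pointers(target))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(as_local_pointers(
        "ParlaMint-FI_2015-05-22-ps-7.seg1.1.2 ParlaMint-FI_2015-05-22-ps-7.seg1.1.1"
    ))
```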

yoge1 · 2023-04-12T12:04:50Z

ERROR ParlaMint-FI_2015-05-22-ps-7.ana: Strange pointer 'ParlaMint-FI_2015-05-22-ps-7.seg1.1'
--- What might be the issue here?

You have links like
<ns0:link ana="ud-syn:nummod"
                                  target="ParlaMint-FI_2015-05-22-ps-7.seg1.1.2 ParlaMint-FI_2015-05-22-ps-7.seg1.1.1"/>
but the contents of @target should be local pointer, i.e. they should have a # in front of the ID, like:
<ns0:link ana="ud-syn:nummod"
                                  target="#ParlaMint-FI_2015-05-22-ps-7.seg1.1.2 #ParlaMint-FI_2015-05-22-ps-7.seg1.1.1"/>

OK, I have a fix for these now, but I think the error message is actually about the xml:id of the s element, not about these pointers to w elements.

Here's the only occurrence of the string 'ParlaMint-FI_2015-05-22-ps-7.seg1.1' mentioned in the error message:

<ns0:s xml:id="ParlaMint-FI_2015-05-22-ps-7.seg1.1">

TomazErjavec · 2023-04-12T12:24:25Z

Here's the only occurrence of the string 'ParlaMint-FI_2015-05-22-ps-7.seg1.1' mentioned in the error message

Actually, no, there is also:

<ns0:link ana="ud-syn:root"
   target="ParlaMint-FI_2015-05-22-ps-7.seg1.1 ParlaMint-FI_2015-05-22-ps-7.seg1.1.3"/>

which is what the error message was probably referring to.
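Pointers that lack the # and pointers to a non-existent xml:id can both be caught by resolving every @target token against the IDs in the same file. This Python sketch (a hypothetical helper; it only looks at link elements and only within one file) reports the offending tokens, which would have surfaced the root link above.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def unresolved_targets(ana_xml: str) -> list[str]:
    """Return <link> @target tokens that are not '#'-pointers to an xml:id in this file."""
    root = ET.fromstring(ana_xml)
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None}
    bad = []
    for link in root.iter(f"{{{TEI_NS}}}link"):
        for token in link.get("target", "").split():
            if not (token.startswith("#") and token[1:] in ids):
                bad.append(token)
    return bad


if __name__ == "__main__":
    doc = f"""<s xmlns="{TEI_NS}" xml:id="seg1.1">
      <w xml:id="seg1.1.3">...</w>
      <linkGrp><link ana="ud-syn:root" target="seg1.1 #seg1.1.3"/></linkGrp>
    </s>"""
    print(unresolved_targets(doc))
```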

yoge1 · 2023-04-12T14:01:30Z

Thanks, you are right, of course! The "Strange pointer" issue is now fixed, but the fix led to new issues, which I'll look into next.

yoge1 · 2023-05-30T20:34:58Z

I made some fixes to the sample data, and local validation no longer reports any errors.
Any idea what's causing this in the GitHub ValidateCountries (FI) action?
ERROR: syntactic head #ParlaMint-FI_2015-05-22-ps-7.seg1.1.2 not found for id ParlaMint-FI_2015-05-22-ps-7.seg1.1.1

matyaskopp · 2023-05-31T06:35:33Z

Any idea what's causing this in the GitHub ValidateCountries (FI) action?
ERROR: syntactic head #ParlaMint-FI_2015-05-22-ps-7.seg1.1.2 not found for id ParlaMint-FI_2015-05-22-ps-7.seg1.1.1

It should be fixed now. You used a namespace prefix (ns0:) instead of the default namespace, and the script for conversion to CoNLL-U did not handle that.
But it would probably be better if you provided the data without the ns0: namespace prefix (I am not sure whether the other scripts handle it).
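The namespace difference is easy to reproduce. In a namespace-aware parser such as Python's ElementTree, ns0:w and an unprefixed w in the default TEI namespace become the same tag, so only tools that match on the literal prefix are affected. Incidentally, ns0 is the prefix ElementTree itself invents when it serialises a namespace that has no registered prefix; if the FI pipeline writes its XML with ElementTree (an assumption), registering the TEI namespace as the default before writing avoids it. The words function is a hypothetical illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

PREFIXED = f'<ns0:s xmlns:ns0="{TEI_NS}"><ns0:w>kello</ns0:w></ns0:s>'
DEFAULT = f'<s xmlns="{TEI_NS}"><w>kello</w></s>'


def words(xml_text: str) -> list[str]:
    """Collect <w> text by namespace URI, so the prefix (or its absence) does not matter."""
    root = ET.fromstring(xml_text)
    return [w.text for w in root.iter(f"{{{TEI_NS}}}w")]


if __name__ == "__main__":
    # Both spellings parse to the same "{http://www.tei-c.org/ns/1.0}w" tag.
    print(words(PREFIXED), words(DEFAULT))  # prints: ['kello'] ['kello']

    root = ET.fromstring(DEFAULT)
    print(ET.tostring(root, encoding="unicode"))  # serialised with an auto-generated ns0: prefix
    ET.register_namespace("", TEI_NS)
    print(ET.tostring(root, encoding="unicode"))  # serialised with xmlns="..." instead
```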

yoge1 · 2023-06-26T17:26:04Z

Added a gap sub-element to seg elements that our linguistic parser hasn't been able to process.

TEI and TEI.ana contain different text.

TEI: https://github.com/SemanticComputing/ParlaMint/blob/721d9137b00e121afe41bc10ac7a0bad8ded009f/Data/ParlaMint-FI/ParlaMint-FI_2015-05-22-ps-7.xml#L182-L187

<u ana="#regular"
    who="#MariaTolppanen"
    xml:id="ParlaMint-FI_2015-05-22-ps-7.u14">
   <seg xml:id="ParlaMint-FI_2015-05-22-ps-7.seg27">Arvoisa puhemies! Näin olen todellakin ymmärtänyt, että veroasioissa on päästy aika lailla yhteisymmärrykseen. Sehän jo aika alkumetreillä kerrottiin, että verokanta ei tässä maassa muutu. Sehän on ihan selkeätä suomea, siinä olemme kaiketi kaikki samaa mieltä. Se, mitä sisällä tapahtuu, on sitten eri asia. Se tapahtuu erilaisissa huoneissa, eri ovien takana. Siellä ovat neuvottelijat, siellä ovat eri puolueista ammattitaitoiset neuvottelijat ja hyvät neuvottelijat, jotka siellä tällä hetkellä ovat neuvottelemassa ja varmastikin pääsevät niistä jonkinlaiseen lopputulokseen, koska ei hallitusta maahan saada, jos ei lopputulosta tule. Näinhän se on. Mutta nyt vaikuttaa todellakin hyvältä, että tähän maahan saadaan hallitus ja saadaan hallitus ripeässä tahdissa, jotta eduskunta pääsee tekemään työtään.</seg>
   <seg xml:id="ParlaMint-FI_2015-05-22-ps-7.seg28">Maahanmuutto ja takinkääntö: Minusta on hyvä asia, jos meillä on takki kääntynyt, koska tämän käännöksen mukana nyt vihdoinkin selvitetään maahanmuuttokustannukset. Tähänhän saakka näitten kustannusten selvittäminen on ollut täysin mahdotonta, ja sen on estänyt meidän lainsäädäntömme. Me emme pysty selvittämään esimerkiksi harkinnanvaraisia sosiaalitukia millään tavalla, koska meidän lainsäädäntömme estää sen. Me emme pysty määrittelemään sitä, kenelle on maksettu, mitä on maksettu, milloin on maksettu ja miksi on maksettu. Nyt, jos olemme yhteisesti sopineet — niin kuin olemme — sen, että nämä kustannukset selvitetään, sen jälkeen jokainen voi ihan tykönänsä katsoa, mitkä ovat todelliset kustannukset ja mihinkä meidän rahkeemme riittävät, mihin tämän maan työtä tekevän kansan veromarkat riittävät, kuinka pitkälle voidaan mennä. Sehän on se kaiken a ja o. Tehdäänpä sitä tai tätä, niin täytyy tietää, kuinka paljon on resursseja käyttää siihen, koska se on se vanha totuus, että kakkua ei voi yhtä aikaa säästää ja syödä. Ei vain voi. Jompikumpi täytyy tehdä, mutta ensin täytyy päättää, kumpi tehdään.</seg>
</u>

TEI.ana, after removing the annotation:

<u ana="#regular"
    who="#MariaTolppanen"
    xml:id="ParlaMint-FI_2015-05-22-ps-7.u14">
   <seg xml:id="ParlaMint-FI_2015-05-22-ps-7.seg27">Keskustelu ja asian käsittely keskeytetään . Täysistunto keskeytetään , ja istuntoa jatketaan kello 15.</seg>
   <seg xml:id="ParlaMint-FI_2015-05-22-ps-7.u14.2">
      <gap reason="editorial">
         <desc xml:lang="en">Technical problem: content could not be processed by the linguistic parser</desc>
      </gap>
   </seg>
</u>

Original TEI.ana fragment:

<ns0:u ana="#regular"
    who="#MariaTolppanen"
    xml:id="ParlaMint-FI_2015-05-22-ps-7.u14">
   <ns0:seg xml:id="ParlaMint-FI_2015-05-22-ps-7.seg27">
      <ns0:s xml:id="ParlaMint-FI_2015-05-22-ps-7.seg27.1">
         <ns0:w xml:id="ParlaMint-FI_2015-05-22-ps-7.seg27.1.1"
    lemma="keskustelu"
    msd="UPosTag=NOUN|Case=Nom|Derivation=U|Number=Sing">Keskustelu</ns0:w>
         <ns0:w xml:id="ParlaMint-FI_2015-05-22-ps-7.seg27.1.2"
    lemma="ja"
    msd="UPosTag=CCONJ|">ja</ns0:w>
         <ns0:w xml:id="ParlaMint-FI_2015-05-22-ps-7.seg27.1.3"
    lemma="asia"
    msd="UPosTag=NOUN|Case=Gen|Number=Sing">asian</ns0:w>
         <ns0:w xml:id="ParlaMint-FI_2015-05-22-ps-7.seg27.1.4"
    lemma="käsittely"
    msd="UPosTag=NOUN|Case=Nom|Derivation=U|Number=Sing">käsittely</ns0:w>
         <ns0:w xml:id="ParlaMint-FI_2015-05-22-ps-7.seg27.1.5"
    lemma="keskeyttää"
    msd="UPosTag=VERB|Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Pass">keskeytetään</ns0:w>
         <ns0:pc xml:id="ParlaMint-FI_2015-05-22-ps-7.seg27.1.6" msd="UPosTag=PUNCT|">.</ns0:pc>
         <ns0:linkGrp targFunc="head argument" type="UD-SYN">
<ns0:link ana="ud-syn:obj"
          target="#ParlaMint-FI_2015-05-22-ps-7.seg27.1.5 #ParlaMint-FI_2015-05-22-ps-7.seg27.1.1"/>
<ns0:link ana="ud-syn:cc"
          target="#ParlaMint-FI_2015-05-22-ps-7.seg27.1.4 #ParlaMint-FI_2015-05-22-ps-7.seg27.1.2"/>
<ns0:link ana="ud-syn:nmod_gobj"
          target="#ParlaMint-FI_2015-05-22-ps-7.seg27.1.4 #ParlaMint-FI_2015-05-22-ps-7.seg27.1.3"/>
<ns0:link ana="ud-syn:conj"
          target="#ParlaMint-FI_2015-05-22-ps-7.seg27.1.1 #ParlaMint-FI_2015-05-22-ps-7.seg27.1.4"/>
<ns0:link ana="ud-syn:root"
          target="#ParlaMint-FI_2015-05-22-ps-7.seg27.1 #ParlaMint-FI_2015-05-22-ps-7.seg27.1.5"/>
<ns0:link ana="ud-syn:punct"
          target="#ParlaMint-FI_2015-05-22-ps-7.seg27.1.5 #ParlaMint-FI_2015-05-22-ps-7.seg27.1.6"/>
         </ns0:linkGrp>
      </ns0:s>
      <ns0:s xml:id="ParlaMint-FI_2015-05-22-ps-7.seg27.2">
         <ns0:w xml:id="ParlaMint-FI_2015-05-22-ps-7.seg27.2.7"
    lemma="täysistunto"
    msd="UPosTag=NOUN|Case=Nom|Number=Sing">Täysistunto</ns0:w>
         <ns0:w xml:id="ParlaMint-FI_2015-05-22-ps-7.seg27.2.8"
    lemma="keskeyttää"
    msd="UPosTag=VERB|Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Pass">keskeytetään</ns0:w>
         <ns0:pc xml:id="ParlaMint-FI_2015-05-22-ps-7.seg27.2.9" msd="UPosTag=PUNCT|">,</ns0:pc>
         <ns0:w xml:id="ParlaMint-FI_2015-05-22-ps-7.seg27.2.10"
    lemma="ja"
    msd="UPosTag=CCONJ|">ja</ns0:w>
         <ns0:w xml:id="ParlaMint-FI_2015-05-22-ps-7.seg27.2.11"
    lemma="istunto"
    msd="UPosTag=NOUN|Case=Par|Number=Sing">istuntoa</ns0:w>
         <ns0:w xml:id="ParlaMint-FI_2015-05-22-ps-7.seg27.2.12"
    lemma="jatkaa"
    msd="UPosTag=VERB|Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Pass">jatketaan</ns0:w>
         <ns0:w xml:id="ParlaMint-FI_2015-05-22-ps-7.seg27.2.13"
    lemma="kello"
    msd="UPosTag=NOUN|Case=Nom|Number=Sing">kello</ns0:w>
         <ns0:w xml:id="ParlaMint-FI_2015-05-22-ps-7.seg27.2.14"
    lemma="15."
    msd="UPosTag=NUM|">15.</ns0:w>
         <ns0:linkGrp targFunc="head argument" type="UD-SYN">
<ns0:link ana="ud-syn:obj"
          target="#ParlaMint-FI_2015-05-22-ps-7.seg27.2.8 #ParlaMint-FI_2015-05-22-ps-7.seg27.2.7"/>
<ns0:link ana="ud-syn:root"
          target="#ParlaMint-FI_2015-05-22-ps-7.seg27.2 #ParlaMint-FI_2015-05-22-ps-7.seg27.2.8"/>
<ns0:link ana="ud-syn:punct"
          target="#ParlaMint-FI_2015-05-22-ps-7.seg27.2.12 #ParlaMint-FI_2015-05-22-ps-7.seg27.2.9"/>
<ns0:link ana="ud-syn:cc"
          target="#ParlaMint-FI_2015-05-22-ps-7.seg27.2.12 #ParlaMint-FI_2015-05-22-ps-7.seg27.2.10"/>
<ns0:link ana="ud-syn:obj"
          target="#ParlaMint-FI_2015-05-22-ps-7.seg27.2.12 #ParlaMint-FI_2015-05-22-ps-7.seg27.2.11"/>
<ns0:link ana="ud-syn:conj"
          target="#ParlaMint-FI_2015-05-22-ps-7.seg27.2.8 #ParlaMint-FI_2015-05-22-ps-7.seg27.2.12"/>
<ns0:link ana="ud-syn:obl"
          target="#ParlaMint-FI_2015-05-22-ps-7.seg27.2.12 #ParlaMint-FI_2015-05-22-ps-7.seg27.2.13"/>
<ns0:link ana="ud-syn:nummod"
          target="#ParlaMint-FI_2015-05-22-ps-7.seg27.2.13 #ParlaMint-FI_2015-05-22-ps-7.seg27.2.14"/>
         </ns0:linkGrp>
      </ns0:s>
   </ns0:seg>
   <ns0:seg xml:id="ParlaMint-FI_2015-05-22-ps-7.u14.2">
      <ns0:gap reason="editorial">
         <ns0:desc xml:lang="en">Technical problem: content could not be processed by the linguistic parser</ns0:desc>
      </ns0:gap>
   </ns0:seg>
</ns0:u>
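To locate such cases across all files, one option is to compare the seg IDs of each TEI file with those of its TEI.ana counterpart. The Python sketch below (hypothetical helper names) reports IDs present in only one version. Matching IDs do not guarantee matching text, though: in the example above, seg27 exists in both versions with different content, so this complements a text comparison rather than replacing it.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def seg_ids(xml_text: str) -> list[str]:
    """Return the xml:ids of all <seg> elements in document order."""
    root = ET.fromstring(xml_text)
    return [seg.get(XML_ID) for seg in root.iter(f"{{{TEI_NS}}}seg")]


def seg_id_mismatches(tei_xml: str, ana_xml: str) -> tuple[list[str], list[str]]:
    """Return (IDs only in TEI, IDs only in TEI.ana), each in document order."""
    tei, ana = seg_ids(tei_xml), seg_ids(ana_xml)
    tei_set, ana_set = set(tei), set(ana)
    return [i for i in tei if i not in ana_set], [i for i in ana if i not in tei_set]


if __name__ == "__main__":
    p = "ParlaMint-FI_2015-05-22-ps-7"
    tei = f'<u xmlns="{TEI_NS}"><seg xml:id="{p}.seg27"/><seg xml:id="{p}.seg28"/></u>'
    ana = f'<u xmlns="{TEI_NS}"><seg xml:id="{p}.seg27"/><seg xml:id="{p}.u14.2"/></u>'
    print(seg_id_mismatches(tei, ana))
```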

It seems that this is caused by some interruptions being marked as regular speeches in the TEI files (rather than as vocal elements or as u elements with #interrupting in the ana attribute). I'm not aware of any programmatic fix for this, so I will probably need to fix such interruptions manually.

@matyaskopp Do you happen to have an easy process (e.g. a ready-made script / one-liner) for finding out such cases?

matyaskopp · 2023-06-26T17:58:04Z

@matyaskopp Do you happen to have an easy process (e.g. a ready-made script / one-liner) for finding out such cases?

I hacked the tei2text conversion so that it can be used for this:

make text.seg-FI
make text.seg.ana-FI

This produces the folder Data/ParlaMint-FI/text.seg/ with IDs, text and notes.
You can then compare the files:

meld Data/ParlaMint-FI/text.seg/{ParlaMint-FI_2015-05-22-ps-7.ana.txt,ParlaMint-FI_2015-05-22-ps-7.txt}

But I suggest starting with the join="right" attribute, to get rid of most of the differences (extra spaces before punctuation).
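The effect of join="right" can be sketched as follows: when running text is rebuilt from the tokens, no space is written after a token marked join="right". This Python function is a hypothetical illustration, not the tei2text conversion itself, and it assumes w/pc elements are not nested inside one another. It accounts for spacing such as "keskeytetään ." in the text extracted from the TEI.ana version above.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
TOKENS = {f"{{{TEI_NS}}}w", f"{{{TEI_NS}}}pc"}


def seg_text(seg: ET.Element) -> str:
    """Rebuild the running text of an analysed <seg> from its <w>/<pc> tokens.

    A space is inserted after every token except the last one and those
    marked join="right". Assumes tokens are not nested inside other tokens.
    """
    tokens = [el for el in seg.iter() if el.tag in TOKENS]
    parts = []
    for i, tok in enumerate(tokens):
        parts.append("".join(tok.itertext()))
        if i < len(tokens) - 1 and tok.get("join") != "right":
            parts.append(" ")
    return "".join(parts)


if __name__ == "__main__":
    seg = ET.fromstring(
        f'<seg xmlns="{TEI_NS}"><s>'
        '<w>Keskustelu</w><w>ja</w><w>asian</w><w>käsittely</w>'
        '<w join="right">keskeytetään</w><pc>.</pc>'
        "</s></seg>"
    )
    print(seg_text(seg))  # Keskustelu ja asian käsittely keskeytetään.
```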

…and ana.xml content (align the xml and ana.xml text contents by utilizing levenshtein distance; root cause: interruptions can be marked as utterances in xml, whereas in our linguistically processed data they are segments of interrupted speeches)

add ParlaMint-FI sample data

b8cde23

yoge1 added 9 commits December 23, 2022 13:18

Merge remote-tracking branch 'upstream/data' into data

ab0d834

FI sample data: include TEI.ana component files in TEI.ana root file

241e2ee

FI sample data: fix TEI.ana component file titles

58d4093

Merge remote-tracking branch 'upstream/data' into data

2accd07

Merge remote-tracking branch 'upstream/data' into data

5a8f425

FI sample data: insignificant changes in whitespaces

2fb0ef1

FI sample data: fixes to TEI.ana files

dfdf47e

FI sample: ParlaMint-taxonomy-UD-SYN: add missing categories

535c357

FI add common content

e7368eb

yoge1 added 10 commits April 5, 2023 18:25

Merge remote-tracking branch 'upstream/data' into data

647d3a4

FI sample data: fixes to TEI and TEI.ana files

5a65a6d

FI sample data: fix empty person forename and surname elements

2b48afb

Merge remote-tracking branch 'upstream/data' into data

476239b

FI sample data: add party.SMP and party.KP definitions

6174d3d

FI sample data: fix filenames to match the xml:id's

01df554

FI sample data: fix ana main title

b2f4721

FI sample data: remove attribute lemma from element pc

28e4910

FI sample data: remove empty w elements without text content

9d5b204

FI sample data: fix the existence from date of party.SIN

d3db1f4

FI sample data: fix link target pointers

1d6f93b

yoge1 added 3 commits May 30, 2023 22:57

FI sample data: don't replace empty lemma attributes and w element text contents with period (.) but instead remove such w elements

8ede03e

FI sample data: remove trailing space in relation element's active attribute

278d141

UPosTag=PUNCT| -> UPosTag=PUNCT

7d76729

matyaskopp added a commit that referenced this pull request May 31, 2023

fix issue with non-default namespace in input files (related to #564)

c10cbcf

matyaskopp added a commit that referenced this pull request May 31, 2023

fix getting component files when bit xi prefix is used (related to #564)

0c449a8

matyaskopp added a commit that referenced this pull request May 31, 2023

Merge pull request #673 from clarin-eric/devel

7b6f091

yoge1 added 2 commits May 31, 2023 11:22

Merge remote-tracking branch 'upstream/data' into data

3f76c96

FI sample data: use default namespace instead of prefix ns0 in ana files

84f5d72

matyaskopp linked an issue on May 31, 2023 that may be closed by this pull request: FI Feedback #637 (Open, 15 tasks)

yoge1 added 2 commits May 31, 2023 12:56

FI sample data: UPosTag: remove trailing |

f477b81

Merge remote-tracking branch 'upstream/data' into data

5edc385

yoge1 added 14 commits June 27, 2023 16:05

FI sample data: add missing join="right"

a084340

FI sample data: present interruptions as vocal elements instead of u elements (regular speeches)

832441a

FI sample data: corpus root file: add meeting elements and parliament org events for legislative periods instead of parliamentary sessions; component files: add meeting elements for term, session, meeting and sitting

7c539a7

FI sample data: fix to date to be later than from date

6682e64

FI sample data: add respStmt

344aac3

FI sample data: update conllu files

e7b3f50

FI sample data: fix party.KP information and MP Paavo Väyrynen's affiliation to party.PV

01c72f9

FI sample data: fix prev and next attribute's pointer format

987beba

Merge remote-tracking branch 'upstream/data' into data

5ddb9d5

FI sample data: retain the seg id of segments that are linguistically unprocessed

3b01486

FI sample data: fix meeting numbers

f2adefe

FI sample data: fix for seg id mismatch (inside a speech) between xml and ana.xml content (don't create a seg for interruption in ana.xml to follow xml's practice)

f613e61

FI sample data: remove extra corpus component file from the include as it's not part of the sample set

25cce24

matyaskopp merged commit a8f2ed5 into clarin-eric:data Sep 15, 2023
3 checks passed
