Elements not documented in the EGD #299

Open
michaelnmmeyer opened this issue May 7, 2024 · 8 comments

@michaelnmmeyer
Member

While cleaning up our schema, I found a few elements that are not documented in the EGD but that occur in a significant number of inscriptions. This mainly concerns texts from tfb-eiad-epigraphy and from tfc-campa-epigraphy. Here is the list:

altIdentifier
biblFull
desc
editionStmt
editor
facsimile
graphic
institution
settlement
term

All these elements but term appear in the teiHeader, and most of them are due to the addition of bibliographical data under sourceDesc.

I am not sure what to do with the data, but, in any case, I would prefer not to allow bibliographic entries to be encoded in TEI (with biblFull). Things would be simpler for me if we used <bibl><ptr ref="..."/></bibl> with a Zotero entry everywhere.
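For reference, a census like the one above can be sketched in a few lines of Python. This is only an illustration, not the script actually used on the schema: the function names are made up, and the set of documented elements passed to `undocumented` is a placeholder that would have to be derived from the EGD.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def count_elements(xml_texts):
    """Count, for each element name, how many documents use it at least once."""
    counts = Counter()
    for text in xml_texts:
        root = ET.fromstring(text)
        # A set, so that each document contributes at most 1 per element name.
        names = {local_name(el.tag) for el in root.iter() if isinstance(el.tag, str)}
        counts.update(names)
    return counts


def undocumented(counts, documented):
    """Keep only the elements missing from the (hypothetical) documented set."""
    return {name: n for name, n in counts.items() if name not in documented}
```

Calling `undocumented(count_elements(texts), documented)` on the contents of a repository returns a mapping from element name to the number of files in which it occurs.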

@danbalogh
Collaborator

I believe the contents of those two repositories are all files encoded for previous projects and not (yet) migrated to DHARMA norms. If Arlo confirms that, I think it would be best to just ignore them for the time being, and perhaps create a new list of undocumented elements occurring outside those repos.
Some comments:

  • <altIdentifier> may have been used in files ingested from earlier projects; its use has never been formalised. See "Preserving the identifier of the text in the earlier project" in EGD Appendix G.
  • <term> is permitted in our editions, as per EGD §7.2.2. If the <term> elements have <gloss> siblings, then I think this should be fine. If you think it is called for, we might want to enforce @xml:lang on <term>; I believe there aren't many occurrences, so updating them manually should not be too much of a load.
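
To gauge how much manual updating that would mean, something along these lines could report <term> elements that lack @xml:lang or a <gloss> sibling. It is a rough sketch: `check_terms` is a made-up name, and it only looks at attributes set directly on <term>, ignoring any xml:lang inherited from an ancestor.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def check_terms(xml_text):
    """Return (number of <term>, number without @xml:lang, number without a <gloss> sibling)."""
    root = ET.fromstring(xml_text)
    total = missing_lang = no_gloss = 0
    for parent in root.iter():
        children = list(parent)
        has_gloss = any(child.tag == TEI + "gloss" for child in children)
        for child in children:
            if child.tag != TEI + "term":
                continue
            total += 1
            if XML_LANG not in child.attrib:
                missing_lang += 1
            if not has_gloss:
                no_gloss += 1
    return total, missing_lang, no_gloss
```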

I don't know if any of the other elements are to be recognised as legitimate and in what contexts.

@michaelnmmeyer
Member Author

@danbalogh Thank you. My bad for <term>, I missed it somehow.

@arlogriffiths
Collaborator

Thanks.

I'd like to know where <term> occurs. It may be a practice from EGC that has crept into a few inscription xml files.

To my mind we never really discussed how to deal with legacy files obtained by conversion to our model from the Campa and EIAD corpora, after Axelle had handled the import. We should probably have that discussion now.

Unless you feel it is a bad idea, I'd be happy to delete from the teiHeaders in our converted INSCIC files all such elements that are inherited from the ancestor files. I can do this manually or perhaps @michaelnmmeyer could automate the process. The percentage of files inherited from the earlier Campa corpus that will eventually be part of tfc-campa-epigraphy will be less than 25%, I think, and we don't need to be slavish about whatever best practices may be for reuse of xml data.

In fact I had been meaning to discuss with @michaelnmmeyer the issue of EIAD files. These were imported by Axelle at a fairly early stage of the project from my private iksvaku-inscriptions repo. Since it is the latter which has the source code for http://hisoma.huma-num.fr/exist/apps/EIAD/index2.html, and my collaborator Vincent Tournier requested some updates after Axelle imported the xml source files to erc-dharma, a small number of asynchronisms have arisen, with better data in iksvaku-inscriptions than what we have in tfb-eiad-epigraphy. I estimate it's a handful of cases, and they can probably be tracked down easily via the record of commits on iksvaku-inscriptions. Would @michaelnmmeyer be willing to track down and make a list of the meaningful differences, if I gave him access to iksvaku-inscriptions, so we can next freeze that repo, implement the same changes on erc-dharma, and only use the latter versions of the EIAD files henceforward?

@michaelnmmeyer
Member Author

@arlogriffiths

<term> occurs exclusively in files from tfb-eiad-epigraphy (about 40 of them in total), e.g. DHARMA_INSEIAD00002.

If you think the extra data in CIC and EIAD files is unnecessary, I can delete them. For the EIAD files, I can produce a diff of the files Axelle processed and the latest revision.
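
As a rough idea of what the diff would look like, here is a minimal sketch using Python's difflib. The `imported/` and `latest/` labels and the function name are placeholders. In practice, comparing the relevant commits with git might be simpler.

```python
import difflib


def file_diff(old_text, new_text, name):
    """Return a unified diff between two revisions of one file, or '' if identical."""
    return "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=f"imported/{name}",  # hypothetical label: revision Axelle processed
            tofile=f"latest/{name}",      # hypothetical label: latest revision
        )
    )
```

Files for which the function returns an empty string can be skipped, which leaves only the handful of cases that need a manual look.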

@arlogriffiths
Collaborator

Thanks @michaelnmmeyer. I will take a look at those cases of <term>. I don't remember now why that element would have been used there. I am now giving you access to iksvaku-inscriptions. Thanks for generating that diff!

About the removal of extra metadata from CIC and EIAD files, I'd like to have @danbalogh's advice. Can you look at a few files, Dan? In tfc-campa-epigraphy, converted files included DHARMA_INSCIC00001.xml, DHARMA_INSCIC00001.xml and DHARMA_INSCIC00064.xml. Thanks!

@danbalogh
Collaborator

I would recommend against deleting any data that have already been encoded, unless we are very sure we don't need them. We still don't have a definitive setup for encoding roles and responsibilities in our DHARMA editions. We should perhaps try to sort that out in the EGD working group. At any rate, I think that until then the extra TEI header data in CIC and EIAD files should be either just ignored, or - if its presence bothers someone - commented out.

@arlogriffiths
Collaborator

Thanks Dan. In the CIC files, there is stuff like <facsimile> referring to specific image files that we are not using in DHARMA, so I have trouble imagining any future use.

@michaelnmmeyer : do you think commenting out by machine is an option? or would you only be able to automate the process in case we opt for deletion?
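
To make the question concrete, this is roughly what commenting out by machine could look like with Python's standard library. It is a sketch under assumptions, not a tested migration script: `comment_out` is a made-up name, and a round trip through ElementTree may not preserve everything outside the root element (XML declaration, processing instructions) or the original formatting, so a real run would need more care.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements without an 'ns0:' prefix.
ET.register_namespace("", TEI_NS)


def comment_out(xml_text, names):
    """Replace every TEI element whose local name is in `names` with an XML comment."""
    # Keep comments already present in the file (requires Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    targets = {f"{{{TEI_NS}}}{name}" for name in names}
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag not in targets:
                continue
            # Detach the tail so it is not swallowed into the comment.
            tail, child.tail = child.tail, None
            markup = ET.tostring(child, encoding="unicode")
            # '--' is not allowed inside an XML comment.
            while "--" in markup:
                markup = markup.replace("--", "- -")
            comment = ET.Comment(" " + markup + " ")
            comment.tail = tail
            parent[index] = comment
    return ET.tostring(root, encoding="unicode")
```

Deletion would be the same loop with `parent.remove(child)` in place of the replacement, so either option can be automated in principle.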

The main issue indeed touches on encoding roles and editorial responsibilities. I think indeed it is a high priority to bring that discussion to a conclusion and I'd be happy if you could take the lead. We have potentially thousands of files to be revised on this matter once the decisions have been taken, so we'd better get our act together.

@danbalogh
Collaborator

On roles and responsibilities, we need to sort out the missing details in my proposal in the Leftovers. See also #242.
