# htr-united.yml

- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: TranscriboQuest 2024 Medieval Literary
  url: https://doi.org/10.5281/zenodo.13757440
  authors:
  - name: Jessie
    surname: Dummer
  - name: Emmanuelle
    surname: Kuhry
  - name: Zdzislaw
    surname: Koczarski
  - name: Sylvain
    surname: Besson
  - name: Caroline
    surname: Chevalier-Royet
    orcid: 0000-0002-7574-6742
  - name: Caroline
    surname: Vandyck
    roles:
    - project-manager
  institutions: []
  description: >-
    This dataset was created in the context of TranscriboQuest 2024 (Medieval
    Literary Team), held in Lyon from 11/09/2024 to 13/09/2024. We chose to focus
    on damaged medieval scientific documents in several languages. The result is
    808 lines transcribed by experts in the field. The dataset contains the
    manuscript images and the ALTO XML files.
  language:
  - lat
  - dum
  - fro
  - gmh
  production-software: eScriptorium + Kraken
  automatically-aligned: false
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '800'
    notAfter: '1500'
  hands:
    count: 1-per-folder
    precision: exact
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  volume:
  - metric: lines
    count: 800
  transcription-guidelines: CATMuS Guidelines (https://catmus-guidelines.github.io)
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: ÖNB, Cod. 3891. Ground Truth
  url: 10.5281/zenodo.7467249
  authors:
  - name: Tuija
    surname: Ainonen
    roles:
    - transcriber
  - name: Suse
    surname: Andresen
    roles:
    - transcriber
  - name: Loïs
    surname: Bakker
    roles:
    - transcriber
  - name: Amy
    surname: Boylan
    roles:
    - transcriber
  - name: Silvia
    surname: Della Manna
    roles:
    - transcriber
  - name: Wiktor
    surname: Dziemski
    orcid: 0000-0001-8166-2249
  - name: C. E. M.
    surname: Henderson
    orcid: 0000-0002-5040-9926
    roles:
    - transcriber
  - name: Michele
    surname: Impagnatiello
    roles:
    - transcriber
  - name: Ana
    surname: Jenko Kovačič
    orcid: 0000-0001-7243-7082
    roles:
    - transcriber
  - name: Stevan
    surname: Komatović
    roles:
    - transcriber
  - name: Ruby Wai-Ying
    surname: Ku
    orcid: 0000-0003-2688-6287
    roles:
    - transcriber
  - name: Edward
    surname: Loss
    orcid: 0000-0002-9837-8321
    roles:
    - transcriber
  - name: Daniela
    surname: Mairhofer
    orcid: 0000-0002-3531-9658
    roles:
    - transcriber
    - project-manager
  - name: Erene
    surname: Morcos
    roles:
    - transcriber
  - name: Jan
    surname: Odstrčilík
    orcid: 0000-0001-9104-9827
    roles:
    - transcriber
  - name: Giuseppe
    surname: Paternicò
    orcid: 0000-0002-7124-8869
    roles:
    - transcriber
  - name: Marta
    surname: Riparante
    roles:
    - transcriber
  - name: Nathalie
    surname: Schimdt
    roles:
    - transcriber
  - name: Michal
    surname: Sołomieniuk
    roles:
    - transcriber
  - name: Tomasz
    surname: Walczak
    roles:
    - transcriber
  - name: Dmitry
    surname: Zharov
    roles:
    - transcriber
  institutions: []
  description: >-
    The Ground Truth was produced by the participants of the HTR Winter School
    2022 in the Late Latin Group (more information:
    https://www.oeaw.ac.at/imafo/veranstaltungen/detail/introduction-into-handwritten-text-recognition).


    The Ground Truth includes the following folios: 1-3r, 6-8, 11r and 27. It is
    still a work in progress, and more pages will be added soon. If you find any
    errors, please contact Jan Odstrčilík (jan.odstrcilik@oeaw.ac.at).


    The supervisors of the Late Latin Group: Jan Odstrčilík PhD, Austrian Academy
    of Sciences; Daniela Mairhofer PhD, Princeton University; Tobias Hodel PhD,
    University of Bern.
  project-name: HTR Winter School 2022, Vienna
  language:
  - lat
  production-software: Transkribus
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '1200'
    notAfter: '1299'
  hands:
    count: '1'
    precision: exact
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Page-XML
  volume:
  - metric: lines
    count: 952
  transcription-guidelines: |-
    Regular transcription with expansion of abbreviations.
    - Normalization of J to I
    - V to U in the vowel function, U to V in the consonant function
    - Long S to S
    - No correction of misspellings (tagged in the ground truth)
    - No standardization of lower-case and upper-case letters
    - No added punctuation
  automatically-aligned: false
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Données vérité de terrain HTR+ Annuaire des propriétaires et des propriétés
    de Paris et du département de la Seine (1898-1923)
  url: http://dx.doi.org/10.34847/nkl.acb724xs
  project-name: "Groupe annuaires et adresses - Consortium Huma-num Paris Time Machine"
  project-website: https://paris-timemachine.huma-num.fr/groupe-adresses-et-annuaires/
  authors:
  - name: Gabriela
    surname: Elgarrista
    roles:
    - transcriber
    - quality-control
  - name: Frédérique
    surname: Mélanie-Becquet
    roles:
    - project-manager
    - quality-control
  - name: Carmen
    surname: Brando
    roles:
    - project-manager
    - quality-control
  description: >-
    Annuaire des propriétaires et des propriétés de Paris et du département de la
    Seine. Link in the BnF catalogue: https://catalogue.bnf.fr/ark:/12148/cb32697229h.
    Credits: Bibliothèque nationale de France. Ground truth data resulting from the
    manual transcription and segmentation of a sample of 169 pages from the 1898
    and 1923 volumes of the directory. An HTR+ transcription model was trained on
    this sample with Transkribus and is publicly available on that platform. The
    model is suitable for automatically transcribing the 1903 and 1913 volumes, as
    well as any other two-column printed document in the Latin alphabet,
    particularly in French. The sample was selected alphabetically, since this is
    how information is organized in the document. The braces in the document were
    not segmented. 118 pages for training and 51 pages for validation.

    Context and funding: DAHN grant (Dispositif de soutien à l'archivistique et aux
    humanités numériques) from the MESRI. Teams: Consortium Paris Time Machine,
    TGIR Huma-Num, EHESS / CNRS / LATTICE / INRIA. Contact, if personal names need
    to be anonymized: carmen.brando@ehess.fr.
  language:
  - fra
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notBefore: '1898'
    notAfter: '1923'
  hands:
    count: less-than-11
    precision: estimated
  license:
  - name: CC-BY-SA 4.0
    url: https://creativecommons.org/licenses/by-sa/4.0/
  format: Alto-XML
  volume:
  - count: 169
    metric: pages
  - count: 19022
    metric: lines
  - count: 641401
    metric: characters
  transcription-guidelines: Diplomatic transcription. The braces were not segmented.
  production-software: Transkribus
  automatically-aligned: false
  _bibtex: "@misc{https://doi.org/10.34847/nkl.acb724xs,\n  doi = {10.34847/NKL.ACB724XS},\n\
    \  url = {https://nakala.fr/10.34847/nkl.acb724xs},\n  author = {Brando, Carmen\
    \ and Elgarrista, Gabriela and Mélanie-Becquet, Frédérique},\n  keywords = {Paris,\
    \ Historical source material, HTR, Transcripción, Apprentissage (intelligence\
    \ artificielle)},\n  language = {fr},\n  title = {Données vérité de terrain HTR+\
    \ Annuaire des propriétaires et des propriétés de Paris et du département de la\
    \ Seine (1898-1923)},\n  publisher = {NAKALA - https://nakala.fr (Huma-Num - CNRS)},\n\
    \  year = {2022},\n  copyright = {Creative Commons Attribution Non Commercial\
    \ Share Alike 4.0 International}\n}\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: >-
    University of Denver Jewish Consumptives Relief Society Medical Records
    Training and Validation Set
  url: http://dx.doi.org/10.5281/zenodo.4243023
  authors:
  - name: Kim
    surname: Pham
    orcid: 0000-0002-9115-4739
    roles:
    - project-manager
  institutions: []
  description: >-
    Training and validation set. Transcribed records available upon request.

    The transcribed corpus of records from the Jewish Consumptive Relief Society
    contains data that include individually identifiable health information, among
    other sensitive information regarding persons and people.


    All individuals for whom records are provided have been deceased for at least
    70 years, but were they still living today, these records would be recognized
    as being protected health information under the US Health Insurance
    Portability and Accountability Act of 1996 (HIPAA).


    While HIPAA and other privacy laws no longer apply to these individuals, in
    providing these data the University of Denver wishes to foster research
    practices that express the utmost respect for the human beings whose lives are
    represented, at least in some part, in these collections. In addition, we ask
    that researchers respect the lives of these individuals’ ancestors and their
    communities.


    To foster practices that honor patients, staff, nurses and physicians
    connected with the JCRS Sanatorium, as well as their families, ancestors and
    communities, we ask that researchers disclose their intended use of the
    collection for review by our Advisory Board. This Board is composed of
    ethicists, historians, librarians, attorneys, physicians, and members of the
    Jewish community.


    In addition, we ask that researchers agree to conduct their work under the
    following set of principles:


    1. I affirm the role of JCRS patients and staff as data creators and will
    avoid exploiting and/or dehumanizing them by treating them simply as data.

    2. My research will, when possible and appropriate, account for the contexts
    surrounding the JCRS subjects as data arise. My work will recognize that all
    data and datasets are shaped by decisions about how histories are recorded,
    remembered, and valued.

    3. If the nature of my work is such that I am sharing the life stories and/or
    narratives of individuals in these data, and I can do so with no potential
    harm to their reputation or that of their ancestors, I will honor them by
    naming them. If the nature of my work is such that I am exploring large-scale
    patterns in the dataset, and naming individuals serves no specific research
    purpose, I will anonymize and/or redact names within the data.

    4. If I am publishing the results of research conducted with these data, I
    will, if possible and appropriate, include a note of recognition and/or
    gratitude in my publication. We suggest a version of: “This work was made
    possible in part by the patients, staff, nurses, physicians, and community of
    the Jewish Consumptive Relief Society (JCRS). The people who lived, worked,
    and died at the JCRS sought to relieve human suffering. I am grateful to
    them.”
  project-name: >-
    Collections as Data - University of Denver Transcribing Handwritten Medical
    Records
  project-website: https://du-collections-as-data.netlify.app/
  language:
  - eng
  production-software: Transkribus
  script:
  - iso: Latn
  script-type: mainly-manuscript
  time:
    notBefore: '1900'
    notAfter: '1950'
  hands:
    count: unknown
    precision: estimated
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Page-XML
  volume:
  - metric: lines
    count: 36027
  - metric: characters
    count: 3494619
  - metric: files
    count: 2660
  - metric: regions
    count: 4254
  automatically-aligned: false
  _bibtex: "@misc{https://doi.org/10.5281/zenodo.4243023,\n  doi = {10.5281/ZENODO.4243023},\n\
    \  url = {https://zenodo.org/record/4243023},\n  author = {Pham, Kim},\n  title\
    \ = {University of Denver Collections as Data - HTR Train and Validation Set JCRS_2020_5_27},\n\
    \  publisher = {Zenodo},\n  year = {2020},\n  copyright = {Creative Commons Attribution\
    \ 4.0 International}\n}\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Ground truth data for printed Devanagari
  url: https://doi.org/10.11588/data/EGOKEI
  authors:
  - name: Nicole
    surname: Merkel-Hilf
    orcid: 0000-0002-0344-6169
    roles:
    - transcriber
    - project-manager
  - name: Daria
    surname: Peshcherova
    roles:
    - support
  institutions:
  - name: Heidelberg University Library
  description: >-
    Ground truth (GT) data (JPG and ALTO XML files) for an OCR model that
    recognizes printed text in Devanagari script.


    A model was trained on the GT data in Transkribus with the HTR+ engine. The
    training was performed on approx. 220 pages with approx. 27,000 words. The
    validation set was 10% of the training set.


    The training material consists of letterpress printings from the Naval
    Kishore Press (Lakhnau, North India) from the late 19th and early 20th
    centuries in the Hindi, Sanskrit, Braj Bhasha and Awadhi languages.


    Transcription was performed by Nicole Merkel-Hilf (CATS Library / Heidelberg
    University Library) with support by Daria Peshcherova (CATS Library /
    Heidelberg University Library).
  project-name: Naval Kishore Press - digital
  project-website: https://digi.ub.uni-heidelberg.de/en/sammlungen/suedasien/navalkishore.html
  language:
  - hin
  - san
  - bra
  - awa
  production-software: Transkribus
  script:
  - iso: Deva
  script-type: only-typed
  time:
    notBefore: '1880'
    notAfter: '1953'
  hands:
    count: less-than-11
    precision: exact
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  volume:
  - metric: lines
    count: 4333
  transcription-guidelines: Diplomatic transcription, no correction of misspellings
  automatically-aligned: false
  _bibtex: "@misc{https://doi.org/10.11588/data/egokei,\n  doi = {10.11588/DATA/EGOKEI},\n\
    \  url = {https://heidata.uni-heidelberg.de/citation?persistentId=doi:10.11588/data/EGOKEI},\n\
    \  author = {Merkel-Hilf, Nicole},\n  title = {Ground Truth data for printed Devanagari},\n\
    \  publisher = {heiDATA},\n  year = {2022}\n}\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Ground Truth data for printed Malayalam
  url: https://doi.org/10.11588/data/L2KRZO
  authors: []
  institutions:
  - name: Tübingen University Library
    roles:
    - project-manager
  description: >-
    Ground Truth (GT) data (JPG and ALTO XML files) which can be used to train OCR
    models that recognize printed text in Malayalam script. The training material
    is gathered from 19th- and 20th-century prints.


    Models were trained on the GT data in Transkribus with the HTR+ and PyLaia
    engines, yielding a CER on the validation set of 2.29% with HTR+ and 3.20% with
    PyLaia. The training was performed on 43 pages with approx. 9,000 words. The
    validation set consisted of 5 pages (ca. 1,000 words).


    Transcription was performed by Tübingen University Library; the Ground Truth
    data was created by Elena Mucciarelli (University of Groningen) with support
    and model training by Dorothee Huff (Tübingen University Library).
    (2022-11-02)
  project-name: DigitalSouthAsia
  project-website: http://idb.ub.uni-tuebingen.de/digitue/southasia
  language:
  - mal
  production-software: Transkribus
  script:
  - iso: Mlym
  script-type: only-typed
  time:
    notBefore: '1850'
    notAfter: '1996'
  hands:
    count: unknown
    precision: exact
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Page-XML
  volume:
  - metric: pages
    count: 43
  _bibtex: "@misc{https://doi.org/10.11588/data/l2krzo,\n  doi = {10.11588/DATA/L2KRZO},\n\
    \  url = {https://heidata.uni-heidelberg.de/citation?persistentId=doi:10.11588/data/L2KRZO},\n\
    \  author = {{Tübingen University Library}},\n  title = {Ground Truth data for\
    \ printed Malayalam},\n  publisher = {heiDATA},\n  year = {2023}\n}\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Incunabula Reichenau
  url: https://doi.org/10.5281/zenodo.11046061
  authors:
  - name: Annika
    surname: Stello
    orcid: 0000-0002-6305-4810
    roles:
    - project-manager
  - name: Gerit
    surname: Heim
    orcid: 0000-0002-5820-7771
    roles:
    - project-manager
  - name: Katharina
    surname: Ost
    orcid: 0000-0002-6234-9721
    roles:
    - transcriber
  institutions: []
  description: >-
    This data set contains the training data for the following three published
    Transkribus models:

    German Incunabula (Reichenau)

    Latin Incunabula (Reichenau)

    Latin/German Bilingual Incunabula (Reichenau)

    This data set represents an excerpt of a collection of incunabula and
    post-incunabula from the former Reichenau monastery, now held at the Badische
    Landesbibliothek in Karlsruhe (see https://digital.blb-karlsruhe.de/topic/view/7530707).
    As typically 1-20 pages were drawn from each print, it reflects a wide range
    of typefaces used by early printers from the German-speaking area and Northern
    Italy.

    The data was created as part of the project Digitalisierung und Volltexterkennung
    der ehemals Reichenauer Inkunabeln at the Badische Landesbibliothek, which was
    funded by the Stiftung Kulturgut Baden-Württemberg.
  project-name: Digitalisierung und Volltexterkennung der ehemals Reichenauer Inkunabeln
  language:
  - lat
  - deu
  production-software: Transkribus
  automatically-aligned: false
  script:
  - iso: Latn
  - iso: Latf
  script-type: only-typed
  time:
    notBefore: '1470'
    notAfter: '1510'
  hands:
    count: more-than-10
    precision: exact
  license:
  - name: CC-BY-SA 4.0
    url: https://creativecommons.org/licenses/by-sa/4.0/
  format: Page-XML
  volume:
  - metric: pages
    count: 2200
  transcription-guidelines: Abbreviations are represented through special characters;
    please see the project repository for full documentation.
  _bibtex: "@misc{https://doi.org/10.5281/zenodo.11046061,\n  doi = {10.5281/ZENODO.11046061},\n\
    \  url = {https://zenodo.org/doi/10.5281/zenodo.11046061},\n  author = {{Badische\
    \ Landesbibliothek} and Ost, Katharina and Stello, Annika and Heim, Gerrit},\n\
    \  language = {de},\n  title = {Training Data Incunabula Reichenau},\n  publisher\
    \ = {Zenodo},\n  year = {2024},\n  copyright = {Creative Commons Attribution Share\
    \ Alike 4.0 International}\n}\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: TranscriboQuest_Arabic_team
  url: https://doi.org/10.5281/zenodo.13757236
  authors:
  - name: Ephrem Aboud
    surname: Ishac
    orcid: 0000-0003-2943-6556
    roles:
    - transcriber
    - aligner
    - quality-control
  - name: Enki
    surname: Baptiste
    orcid: 0009-0004-3456-9796
    roles:
    - transcriber
    - aligner
    - quality-control
  institutions: []
  description: Dataset on an Arabic corpus of Christian-Islamic theology.
  project-name: TranscriboQuest 2024
  language:
  - ara
  production-software: eScriptorium + Kraken
  automatically-aligned: false
  script:
  - iso: Arab
  script-type: only-manuscript
  time:
    notBefore: '1200'
    notAfter: '1600'
  hands:
    count: 1-per-folder
    precision: estimated
  license:
  - name: CC-BY-SA 4.0
    url: https://creativecommons.org/licenses/by-sa/4.0/
  format: Alto-XML
  volume:
  - metric: lines
    count: 153
  transcription-guidelines: >-
    ▶ Data format: XML ALTO

    ▶ Number of transcribed lines: 153

    ▶ Author/creator/curator of the dataset: Enki Baptiste and Ephrem Aboud Ishac

    ▶ Segmentation tools, HTR engine and interface: OpenITI model
    (https://github.com/OpenITI/acdc_results/blob/main/models/gen2-print-n7m5-union-ft_best.mlmodel);
    eScriptorium; Kraken

    ▶ Language of the corpus, Date: Arabic, end of the 16th century

    ▶ Type, support of documents, script: paper; mashriqi naskh

    ▶ Transcription method: diplomatic transcription respecting the tanwin, the
    shadda and the diacritic marks.

    ▶ Theme, collection, object of the dataset: theology; Maktabat al-Sālimī,
    Bidiyya, Oman, ms. AS 250 4v-5f
    (https://elibrary.mara.gov.om/en/omani-library/imam-nour-al-din-al-salmi-s-library/book/?id=324#book/7);
    St Mark Monastery, Jerusalem, SMMJ 00264 2v-5r
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Jeu de données OCR - Incunables sévillans 1494-1500
  url: https://doi.org/10.5281/zenodo.3643393
  authors:
  - name: Matthias
    surname: Gille Levenson
    orcid: 0000-0001-9488-5986
    roles:
    - transcriber
    - aligner
    - project-manager
  institutions: []
  description: >-
    The data set corresponds to 60 pages printed in 1494 by Estanislao Polono and
    Meinardo Ungut in Seville. These pages are taken from the Regimiento de los
    Prínçipes (also known as 'Glosa castellana al Regimiento de prínçipes'), and
    the exemplar used is INC/901 of the Biblioteca Nacional de España. The type
    used for this incunabulum is 97G (Martín Abad and Moyano Andrés, Estanislao
    Polono, 2002, p. 61). This type was used between 1494 and 1500. For other
    incunabula produced in this period, see op. cit., pp. 112-121.
  language:
  - spa
  production-software: eScriptorium + Kraken
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notBefore: '1494'
    notAfter: '1500'
  hands:
    count: '1'
    precision: exact
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  sources:
  - reference: >-
      Matthias Gille Levenson. (2022). Jeu de données de segmentation et de reconnaissance
      optique de caractères - Kraken - Incunables sévillans 1494-1500 (Version v5)
      [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7006981
    link: ''
  volume:
  - metric: lines
    count: 4836
  transcription-guidelines: >-
    Diplomatic transcription, without normalization, expansion of abbreviations,
    or corrections.
  automatically-aligned: false
  _bibtex: "@misc{https://doi.org/10.5281/zenodo.3643393,\n  doi = {10.5281/ZENODO.3643393},\n\
    \  url = {https://zenodo.org/record/3643393},\n  author = {Levenson, Matthias\
    \ Gille},\n  keywords = {ocr, eScriptorium, kraken, incunabula, Gilles of Rome,\
    \ Estanislao Polono, Meinardo Ungut},\n  title = {Jeu de données de segmentation\
    \ et de reconnaissance optique de caractères - Kraken - Incunables sévillans 1494-1500},\n\
    \  publisher = {Zenodo},\n  year = {2022},\n  copyright = {Creative Commons Attribution\
    \ Non Commercial 4.0 International}\n}\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: 'Handwritten Text Recognition Ground Truth Set: StABS Ratsbücher O10, Urfehdenbuch
    X'
  url: https://doi.org/10.5281/zenodo.5153263
  authors:
  - name: Susanna
    surname: Burghartz
    roles:
    - project-manager
  - name: Sonia
    surname: Calvi
    roles:
    - project-manager
    - quality-control
  - name: Georg
    surname: Vogeler
    roles:
    - project-manager
  - name: Laila
    surname: Baur
    roles:
    - transcriber
  - name: Benedikt
    surname: Egli
    roles:
    - transcriber
  - name: Gabriela
    surname: Gehrig
    roles:
    - transcriber
  - name: Alexandra Isabelle
    surname: Heini
    roles:
    - transcriber
  - name: Rosanna
    surname: Rossi
    roles:
    - transcriber
  - name: Benjamin
    surname: Siegrist
    roles:
    - transcriber
  - name: Remo
    surname: Wasmer
    roles:
    - transcriber
  - name: Lynn
    surname: Zimmermann
    roles:
    - transcriber
  - name: David
    surname: Schoch
    roles:
    - aligner
  - name: Peter
    surname: Dängeli
    roles:
    - digitization
  - name: Tobias
    surname: Hodel
    roles:
    - project-manager
    - aligner
  description: Ground Truth for "Urfehdenbuch X der Stadt Basel (1563-1569)" at Staatsarchiv
    Basel-Stadt (StABS).
  project-website: https://hdl.handle.net/11471/1010.2.1
  language:
  - deu
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '1563'
    notAfter: '1569'
  hands:
    count: unknown
    precision: estimated
  license:
  - name: CC-BY-SA 4.0
    url: https://creativecommons.org/licenses/by-sa/4.0/
  format: Page-XML
  volume:
  - metric: lines
    count: 8000
  transcription-guidelines: 'See: http://gams.uni-graz.at/o:ufbas.1563'
  production-software: Transkribus
  automatically-aligned: false
  _bibtex: "@misc{https://doi.org/10.5281/zenodo.5153263,\n  doi = {10.5281/ZENODO.5153263},\n\
    \  url = {https://zenodo.org/record/5153263},\n  author = {Hodel, Tobias and Schoch,\
    \ David and Dängeli, Peter},\n  keywords = {Handwritten Text Recognition, Ground\
    \ Truth, Early Modern German Kurrent},\n  language = {de},\n  title = {Handwritten\
    \ Text Recognition Ground Truth Set: StABS Ratsbücher O10, Urfehdenbuch X},\n\
    \  publisher = {Zenodo},\n  year = {2021},\n  copyright = {Creative Commons Attribution\
    \ Non Commercial Share Alike 4.0 International}\n}\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Charters and Records of Königsfelden Abbey and Bailiwick (1308-1662)
  url: https://doi.org/10.5281/zenodo.5179361
  authors:
  - name: Tobias
    surname: Hodel
    roles:
    - transcriber
    - project-manager
    - support
  - name: Colette
    surname: Halter-Pernet
    roles:
    - transcriber
    - aligner
    - project-manager
    - quality-control
    - digitization
    - support
  - name: Simon
    surname: Teuscher
    roles:
    - project-manager
  description: The data set is the publication of the data of the scholarly edition
    "Urkunden und Akten des Klosters und der Hofmeisterei Königsfelden".
  project-website: https://www.koenigsfelden.uzh.ch/
  language:
  - lat
  - deu
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '1292'
    notAfter: '1570'
  hands:
    count: more-than-10
    precision: estimated
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Page-XML
  volume:
  - metric: lines
    count: 60000
  transcription-guidelines: 'See: https://www.koenigsfelden.uzh.ch/exist/apps/ssrq/intro.html#richtlinien'
  production-software: Transkribus
  automatically-aligned: false
  _bibtex: "@misc{https://doi.org/10.5281/zenodo.5179361,\n  doi = {10.5281/ZENODO.5179361},\n\
    \  url = {https://zenodo.org/record/5179361},\n  author = {Halter-Pernet, Colette\
    \ and Teuscher, Simon and Hodel, Tobias and Barwitzki, Lukas and Egloff, Salome\
    \ and Henggeler, Fabian and Nadig, Michael and Steinmann, Anina and Stettler,\
    \ Sabine and Prada Ziegler, Ismail},\n  keywords = {Scholarly Edition, Monastery,\
    \ Königsfelden Abbey, Poor Clares, Franciscan Friars, Hapsburg, Handwritten Text\
    \ Recognition},\n  title = {Charters and Records of Königsfelden Abbey and Bailiwick\
    \ (1308-1662)},\n  publisher = {Zenodo},\n  year = {2021},\n  copyright = {Creative\
    \ Commons Attribution 4.0 International}\n}\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: >-
    GT and HTR of VOC (Dutch East India Company), WIC (Dutch West India Company)
    and notarial deeds.
  url: https://doi.org/10.5281/zenodo.6414086
  authors:
  - name: Liesbeth
    surname: Keijser
    roles:
    - transcriber
    - project-manager
  - name: Vincent
    surname: Noppe
  institutions:
  - name: National Archive Netherlands / Nationaal Archief
    roles:
    - digitization
    - support
  description: >-
    6,000 ground-truth transcriptions of VOC and notarial deeds, and 3,000,000 HTR
    transcriptions of VOC, WIC and notarial deeds.

    The National Archives of the Netherlands and Noord-Hollands Archief conducted
    a project using the Transkribus HTR (Handwritten Text Recognition) platform.
    The aim was to semi-automatically transcribe 2 million pages of old Dutch
    texts.


    The transcribed archives are 17th- and 18th-century documents from the Dutch
    East India Company (VOC) and 19th-century notarial deeds from Noord-Hollands
    Archief and other archives in the provinces.


    In order to train the HTR software, a team produced transcriptions of
    approximately 6,000 scans. The scans were randomly selected from the dataset
    and contain hundreds of hands. With these transcriptions, a model was trained
    that recognizes more than 90% of the characters correctly. Transkribus then
    transcribed the 2 million scans automatically using the trained model.


    Later on, 1 million additional scans concerning the West India Company (WIC)
    were transcribed automatically without adding extra ground truth or training.
    These archives are from the 17th and 18th centuries.


    The datasets published in Zenodo contain the ground truth (scans in JPG,
    transcription in PAGE XML) and the HTR results (in PAGE XML and TXT). See the
    overview on the Zenodo page.


    A specification of which archives have been transcribed (both GT and HTR) can
    be found on the Zenodo page.


    For open data access to scans and inventories of the National Archives, see:
    https://www.nationaalarchief.nl/onderzoeken/open-data/archiefinventarissen-digitale-objecten-en-scans-van-archieven

    Disclaimer: due to a variety of languages used and the bad state of the
    documents the HTR results of "1.05.21, Dutch series Guyana" can be of poor
    quality.
  project-name: De ijsberg zichtbaar maken
  project-website: >-
    https://www.nationaalarchief.nl/beleven/nieuws/kijk-symposium-de-ijsberg-zichtbaar-maken-terug#:~:text=In%20het%20project%20De%20IJsberg,de%20website%20zoekintranscripties.nl%20ontwikkeld.
  language:
  - nld
  production-software: Transkribus
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '1600'
    notAfter: '1899'
  hands:
    count: more-than-10
    precision: estimated
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Page-XML
  volume:
  - metric: pages
    count: 6000
  - metric: lines
    count: 251889
  - metric: files
    count: 6350
  - metric: regions
    count: 10735
  - metric: characters
    count: 24432166
  automatically-aligned: false
  _bibtex: "@misc{https://doi.org/10.5281/zenodo.6414086,\n  doi = {10.5281/ZENODO.6414086},\n\
    \  url = {https://zenodo.org/doi/10.5281/zenodo.6414086},\n  author = {Liesbeth\
    \ Keijser, },\n  keywords = {Transciptions, Verenigde Oost-Indische Compagnie,\
    \ West-Indische Compagnie, Notarial deeds, Nationaal Archief, Noord-Hollands Archief,\
    \ Transkribus},\n  language = {odt},\n  title = {6000 ground truth of VOC and\
    \ notarial deeds 3.000.000 HTR of VOC, WIC and notarial deeds},\n  publisher =\
    \ {Zenodo},\n  year = {2020},\n  copyright = {Creative Commons Attribution 4.0\
    \ International}\n}\n"

- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Dataset for late medieval Castilian text recognition
  url: https://doi.org/10.5281/zenodo.7386489
  authors:
  - name: Matthias
    surname: Gille Levenson
    orcid: 0000-0001-9488-5986
    roles:
    - transcriber
    - quality-control
  institutions: []
  description: >-
    HTR/OCR open access gold corpus for Spanish late medieval sources, based on
    the allographetic transcription of more than 300 pages of several manuscripts
    of the Regimiento de los Prínçipes, as well as a first set of general
    transcription models trained with Kraken and out-of-domain test data. See
    https://doi.org/10.5281/zenodo.7387376 for a full description of the dataset.
  language:
  - spa
  production-software: eScriptorium + Kraken
  script:
  - iso: Latn
  script-type: mainly-manuscript
  time:
    notBefore: '1300'
    notAfter: '1500'
  hands:
    count: more-than-10
    precision: estimated
  license:
  - name: CC-BY-SA 4.0
    url: https://creativecommons.org/licenses/by-sa/4.0/
  format: Alto-XML
  volume:
  - metric: lines
    count: 28000
  transcription-guidelines: >-
    Allographetic transcription. See the article
    (https://doi.org/10.5281/zenodo.7387376) for full transcription guidelines.

    320 pages in-domain; 40 pages out-of-domain

  automatically-aligned: false
  _bibtex: "@misc{https://doi.org/10.5281/zenodo.7386489,\n  doi = {10.5281/ZENODO.7386489},\n\
    \  url = {https://zenodo.org/doi/10.5281/zenodo.7386489},\n  author = {Gille\
    \ Levenson, Matthias},\n  keywords = {OCR, HTR, dataset, allographetic, medieval\
    \ castilian},\n  language = {en},\n  title = {Towards a general open dataset and\
    \ model for late medieval Castilian text recognition (HTR/OCR). Datasets and scripts},\n\
    \  publisher = {Zenodo},\n  year = {2023},\n  copyright = {Creative Commons Attribution\
    \ Non Commercial Share Alike 4.0 International}\n}\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: 'Klosterneuburg, Stiftsbibl., Cod. 48 - Ground Truth: Initial Release'
  url: https://doi.org/10.5281/zenodo.7466927
  authors:
  - name: Michael
    surname: Berger
    orcid: 0000-0002-6627-5272
  - name: Henrike
    surname: Bolte
  - name: Veronika
    surname: Führer
    orcid: 0000-0003-3145-4083
  - name: Felix
    surname: Hausleitner
    orcid: 0000-0002-9788-8127
  - name: Sarah
    surname: Hutterer
  - name: Tim
    surname: Lüthi
    orcid: 0000-0003-1925-7175
  - name: Mihaela
    surname: Nancu
  - name: Erica
    surname: Passoni
  - name: Katalin
    surname: Pataki
    orcid: 0000-0003-0331-8295
  - name: Sophie
    surname: Schröcksnadel
  - name: Giovanni
    surname: Verri
    orcid: 0000-0002-1297-2152
  - name: Dennis
    surname: Wegener
    orcid: 0000-0002-9410-9191
  institutions: []
  description: >-
    This is ground truth for a vast 15th-century collection of sermons by
    Nikolaus von Dinkelsbühl (ca. 1360 to 17 March 1433), translated and
    reorganised by a German redactor, which has never been edited until now. The
    manuscript consists of 361 folios of parchment and paper. The sermons deal
    with various topics such as fasting and other religious practices. One of the
    leading intellectuals of his time, Nikolaus von Dinkelsbühl also contributed
    to the development of the University of Vienna. The manuscript was probably
    produced in the vicinity of Klosterneuburg in Austria and is still kept there
    today (shelfmark: Cod. 48).


    Data collection and ground truth creation:


    The edition at hand was produced by an international team of researchers from
    various fields in the context of the Vienna HTR Winter School 2022, using the
    Transkribus Expert Client.


    We uploaded the images of the manuscript to the Transkribus platform,
    applied the line recognition tool, and manually copied the transcribed text
    lines into the recognised line boxes. Various models were trained on the
    ground truth (20% of the entire codex) created by the team.


    Images of the Klosterneuburg, Augustiner-Chorherrenstift, Cod. 48 are
    available at: https://manuscripta.at/diglit/AT5000-48/0001
  project-name: HTR Winter School 2022, Vienna
  language:
  - gmh
  production-software: Transkribus
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '1440'
    notAfter: '1449'
  hands:
    count: '1'
    precision: exact
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  volume:
  - metric: pages
    count: 68
  - metric: lines
    count: 4605
  automatically-aligned: false
  _bibtex: "@misc{https://doi.org/10.5281/zenodo.7466927,\n  doi = {10.5281/ZENODO.7466927},\n\
    \  url = {https://zenodo.org/record/7466927},\n  author = {Berger, Michael and\
    \ Bolte, Henrike and Führer, Veronika and Hausleitner, Felix and Hutterer, Sarah\
    \ and Lüthi, Tim and Nancu, Mihaela and Passoni, Erica and Pataki, Katalin and\
    \ Schröcksnadel, Sophie and Verri, Giovanni and Wegener, Dennis and Hofert, Sandra},\n\
    \  keywords = {Digital Humanities, Handwritten Text Recognition, German, Nikolaus-von-Dinkelsbühl-Redaktor},\n\
    \  title = {Klosterneuburg, Stiftsbibl., Cod. 48 - Ground Truth: Initial Release},\n\
    \  publisher = {Zenodo},\n  year = {2022},\n  copyright = {Creative Commons Attribution\
    \ 4.0 International}\n}\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: 'GT4HistCommentLayout: Layout Ground Truth for Historical Commentaries'
  url: https://github.com/AjaxMultiCommentary/GT-commentaries-OLR
  authors:
  - name: Matteo
    surname: Romanello
    orcid: 0000-0002-7406-6286
    roles:
    - project-manager
  - name: Sven
    surname: Najem-Meyer
    orcid: 0000-0002-3661-4579
    roles:
    - transcriber
    - quality-control
  - name: Carla
    surname: Amaya
    roles:
    - transcriber
  description: 'This dataset contains layout annotations for ca. 370 pages sampled
    from 8 public-domain classical commentaries published in the 19th century in
    English, German and Latin. The commentaries concern Ancient Greek and Latin works
    of prose and poetry (caveat: Ancient Greek poetry is slightly over-represented).
    Pages were annotated according to a taxonomy mapped to the SegmOnto controlled vocabulary.'
  project-name: Ajax Multi-Commentary
  project-website: https://mromanello.github.io/ajax-multi-commentary/
  language:
  - eng
  - deu
  - lat
  - grc
  production-software: Kraken + VGG Image Annotator (VIA)
  script:
  - iso: Latn
  - iso: Grek
  script-type: only-typed
  time:
    notBefore: '1835'
    notAfter: '1903'
  hands:
    count: '1'
    precision: exact
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  volume:
  - metric: characters
    count: 0
  - metric: files
    count: 371
  - metric: lines
    count: 0
  - metric: regions
    count: 2386
  transcription-guidelines: SegmOnto guidelines (v. 0.9)
  citation-file-link: https://github.com/AjaxMultiCommentary/GT-commentaries-layout/blob/master/CITATION.cff
  characters:
    mode: NFD
    members: []
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Romanello, Matteo and Najem-Meyer, Sven and Amaya,\
    \ Carla},\ndoi = {10.5281/zenodo.7271729},\ntitle = {GT4HistCommentLayout: Layout\
    \ Ground Truth for Historical Commentaries}\n}\n"
  _apa: "Romanello M., Najem-Meyer S., Amaya C. GT4HistCommentLayout: Layout Ground Truth\
    \ for Historical Commentaries (version 1.0). DOI: 10.5281/zenodo.7271729\n"
- authors:
  - name: Alexandre
    orcid: 0009-0007-4781-3294
    roles:
    - aligner
    - quality-control
    surname: Matos
  - name: Rui
    orcid: 0000-0001-5767-1583
    roles:
    - transcriber
    surname: Neves
  - name: Gonçalo
    roles:
    - transcriber
    surname: Monteiro
  - name: Catarina
    roles:
    - transcriber
    surname: Coelho
  - name: Pedro
    orcid: 0009-0004-9005-6688
    roles:
    - aligner
    surname: Bastos
  automatically-aligned: false
  description: >-
    This dataset was designed for training machine learning models in the context
    of the [iForal project](https://iforal.hypotheses.org/), which focuses on
    transcribing medieval Portuguese texts, specifically forais (charters). It
    includes images of medieval manuscripts, along with corresponding line-level
    transcription labels, to facilitate the development of models capable of
    recognizing and transcribing historical handwriting.

    The dataset is suited to OCR/HTR and segmentation tasks in the domain of
    medieval document transcription. It is meant to support the development of
    automated transcription tools for medieval texts, making historical archives
    more accessible.
  format: Page-XML
  hands:
    count: unknown
    precision: exact
  institutions: []
  language:
  - lat
  - por
  license:
    name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: iForal
  project-website: https://iforal.hypotheses.org/
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notAfter: '1491'
    notBefore: '1217'
  title: iForal-Dataset
  url: https://github.com/Arch-W/iForal-Dataset
  volume:
  - count: 776873
    metric: characters
  - count: 180
    metric: files
  - count: 8009
    metric: lines
  - count: 183
    metric: regions
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Fabliaux
  url: https://github.com/CIHAM-HTR/Fabliaux
  authors:
  - name: Corinne
    surname: Pierreville
    orcid: 0009-0003-3074-3841
    roles:
    - project-manager
  - name: Ariane
    surname: Pinche
    orcid: 0000-0002-7843-5050
    roles:
    - transcriber
    - aligner
    - quality-control
  institutions: []
  description: HTR datasets from medieval manuscripts (13th-14th c.) containing "fabliaux",
    funded by Biblissima+
  project-website: https://projet.biblissima.fr/fr/appels-projets/projets-retenus/fabliaux
  language:
  - fro
  production-software: eScriptorium + Kraken
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '1200'
    notAfter: '1402'
  hands:
    count: 1-per-folder
    precision: exact
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  citation-file-link: https://github.com/CIHAM-HTR/Fabliaux/blob/master/CITATION.cff
  transcription-guidelines: The data follow the standards recommended by the CREMMALAB
    project, see Ariane Pinche. Transcription Guide for 10th to 15th Century Manuscripts.
    2022. ⟨hal-03697382⟩
  volume:
  - metric: characters
    count: 44963
  - metric: files
    count: 25
  - metric: lines
    count: 2070
  - metric: regions
    count: 94
  characters:
    mode: NFD
    members:
    - e
    - i
    - s
    - a
    - t
    - u
    - o
    - n
    - r
    - l
    - m
    - c
    - d
    - ̃
    - p
    - f
    - h
    - b
    - ⁊
    - g
    - .
    - q
    - z
    - ̾
    - Q
    - ꝑ
    - S
    - x
    - I
    - L
    - D
    - C
    - ͥ
    - E
    - A
    - ꝰ
    - T
    - k
    - ꝯ
    - M
    - N
    - O
    - P
    - U
    - ͣ
    - y
    - F
    - '9'
    - Ꝙ
    - B
    - G
    - J
    - '1'
    - /
    - ẜ
    - ł
    - ⟦
    - ⟧
    - ᷑
    - R
    - '7'
    - H
    - "'"
    - ͤ
    - w
    - ':'
    - '4'
    - '0'
    - '6'
    - '8'
    - '5'
    - K
    - 
    - ͦ
    - v
    - ͫ
    - V
    - ᷤ
    - ⁜
    - '3'
    - đ
    - X
    - ‸
    - ᷠ
    - '2'
    - ꝓ
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Pinche, Ariane and Pierreville, Corinne},\n\
    month = {4},\ntitle = {Fabliaux},\nurl = {https://github.com/CIHAM-HTR/Fabliaux/data},\n\
    year = {2023}\n}\n"
  _apa: "Pinche A., Pierreville C. (2023). Fabliaux URL: https://github.com/CIHAM-HTR/Fabliaux/data\n"
- authors:
  - name: Davide
    roles:
    - transcriber
    - aligner
    surname: Aruta
  - name: Martina
    roles:
    - transcriber
    - aligner
    surname: Lenzi
  - name: Armelle
    orcid: 0000-0001-7938-2686
    roles:
    - transcriber
    - aligner
    surname: Le Huërou
  - name: Marylène
    orcid: 0000-0002-9250-370X
    roles:
    - project-manager
    surname: Possamaï
  - name: Ariane
    orcid: 0000-0002-7843-5050
    roles:
    - quality-control
    surname: Pinche
  characters:
    members:
    - e
    - i
    - u
    - s
    - a
    - t
    - n
    - r
    - o
    - l
    - c
    - m
    - d
    - p
    - .
    - q
    - ̃
    - g
    - b
    - f
    - z
    - h
    - y
    - x
    - '-'
    - ͥ
    - ͣ
    - ⁊
    - E
    - ¶
    - ̾
    - ꝙ
    - C
    - ꝰ
    - ͦ
    - ꝑ
    - S
    - ꝓ
    - Q
    - H
    - ꝯ
    - I
    - M
    - ͭ
    - '2'
    - L
    - ͫ
    - D
    - ꝵ
    - T
    - ͨ
    - A
    - ł
    - ͬ
    - ͤ
    - ᷑
    - N
    - O
    - U
    - P
    - R
    - ħ
    - ':'
    - F
    - ꝭ
    - '7'
    - ᵈ
    - 
    - '3'
    - ⟦
    - ⟧
    - Y
    - ͧ
    - đ
    - G
    - '1'
    - '9'
    - B
    - ','
    - Ꝙ
    mode: NFD
  citation-file-link: https://github.com/CIHAM-HTR/Liber/blob/main/CITATION.cff
  description: HTR datasets from medieval manuscripts (14th-15th c.) containing Pierre Bersuire’s
    translation into Old French of the work of Titus Livius, together with Nicolas Trevet’s commentaries
  format: Alto-XML
  hands:
    count: '1'
    precision: estimated
  institutions: []
  language:
  - fro
  - lat
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-website: https://anr.fr/Projet-ANR-21-CE27-0008
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-manuscript
  sources:
  - link: https://github.com/CIHAM-HTR/Liber
    reference: Aruta, D., Lenzi, M., Le Huërou, A., Possamaï, M., & Pinche, A. (2023).
      Liber [Data set]. https://github.com/CIHAM-HTR/Liber/data
  time:
    notAfter: '1400'
    notBefore: '1300'
  title: Liber
  transcription-guidelines: 'Data follow the standards recommended by the CREMMA projects,
    see Ariane Pinche. Transcription Guide for 10th to 15th Century Manuscripts. 2022.
    hal-03697382 - and Thibault Clérice, Malamatenia Vlachou-Efstathiou, Alix Chagué.
    CREMMA Medii Aevi: Literary manuscript text recognition in Latin. Journal of Open
    Humanities Data, 2023, 9, pp.4. ⟨10.5334/johd.97⟩. ⟨hal-03828353v5⟩'
  url: https://github.com/CIHAM-HTR/Liber
  volume:
  - count: 134899
    metric: characters
  - count: 37
    metric: files
  - count: 3789
    metric: lines
  - count: 152
    metric: regions
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Aruta, Davide and Lenzi, Martina and\
    \ Le Huërou, Armelle and Possamaï, Marylène and Pinche, Ariane},\nmonth = {4},\n\
    title = {Liber},\nurl = {https://github.com/CIHAM-HTR/Liber/data},\nyear = {2023}\n\
    }\n"
  _apa: "Aruta D., Lenzi M., Le Huërou A., Possamaï M., Pinche A. (2023). Liber URL:\
    \ https://github.com/CIHAM-HTR/Liber/data\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: FoNDUE Spanish chapbooks 19th c. Dataset
  url: https://github.com/DesenrollandoElCordel/FoNDUE-Spanish-chapbooks-Dataset
  authors:
  - name: Constance
    surname: Carta
    roles:
    - transcriber
    - project-manager
  - name: Élina
    surname: Leblanc
    roles:
    - digitization
  - name: Pauline
    surname: Jacsont
    roles:
    - digitization
  - name: Belinda
    surname: Palacios
    roles:
    - transcriber
    - quality-control
  - name: Luana
    surname: Bermudez
    roles:
    - transcriber
    - quality-control
  description: Digital editions of the second part of the Genevan Spanish chapbooks
    collection (19th c.).
  project-name: Desenrollando El Cordel
  project-website: https://github.com/DesenrollandoElCordel
  language:
  - cat
  - spa
  - lat
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notBefore: '1770'
    notAfter: '1920'
  hands:
    count: more-than-10
    precision: exact
  license:
  - name: CC-BY-SA 4.0
    url: https://creativecommons.org/licenses/by-sa/4.0/
  format: Alto-XML
  sources:
  - reference: ''
    link: https://unige.swisscovery.slsp.ch/permalink/41SLSP_UGE/btt5ev/alma991008229029705502
  - reference: ''
    link: https://unige.swisscovery.slsp.ch/permalink/41SLSP_UGE/kjkm12/alma991002834309705502
  volume:
  - metric: characters
    count: 270718
  - metric: lines
    count: 12526
  - metric: pages
    count: 198
  citation-file-link: https://github.com/DesenrollandoElCordel/FoNDUE-Spanish-chapbooks-Dataset/blob/main/Grountruth/CITATION.cff
  transcription-guidelines: "The following transcription rules were adopted:\n- Preserve\
    \ accents;\n- Preserve case;\n- Preserve punctuation;\n- Preserve spacing;\n\
    - Preserve line breaks;\n- Preserve the spelling of words (do not correct errors,\
    \ if any);\n- Remove noise (stains that the OCR mistook for text)."
  production-software: eScriptorium + Kraken
  automatically-aligned: false
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: EHRI Dataset
  url: https://github.com/FloChiff/ehri-dataset
  project-name: European Holocaust Research Infrastructure
  project-website: https://www.ehri-project.eu/
  authors:
  - name: Floriane
    surname: Chiffoleau
    roles:
    - transcriber
  - name: Sarah
    surname: Beniere
    roles:
    - transcriber
  - name: Michal
    surname: Frankl
    roles:
    - transcriber
  - name: Wolfgang
    surname: Schellenbacher
    roles:
    - transcriber
  - name: Zoltán
    surname: Vági
    roles:
    - transcriber
  - name: Gábor
    surname: Kádár
    roles:
    - transcriber
  - name: Magdalena
    surname: Sedlická
    roles:
    - transcriber
  - name: Miriam
    surname: Schulz
    roles:
    - transcriber
  - name: Christine
    surname: Schmidt
    roles:
    - transcriber
  - name: Jessica
    surname: Green
    roles:
    - transcriber
  - name: Martina
    surname: Ravagnan
    roles:
    - transcriber
  - name: Daniela
    surname: Bartáková
    roles:
    - transcriber
  - name: Judith
    surname: Levin
    roles:
    - transcriber
  - name: Daphna
    surname: Sehayek
    roles:
    - transcriber
  - name: Michał
    surname: Czajka
    roles:
    - transcriber
  - name: Marta
    surname: Wojas
    roles:
    - transcriber
  - name: Dagmara
    surname: Chełstowska
    roles:
    - transcriber
  - name: Winfried
    surname: Garscha
    roles:
    - transcriber
  - name: Claudia
    surname: Kuretsidis-Haider
    roles:
    - transcriber
  description: Multilingual dataset drawn from various corpora of the EHRI project
  language:
  - eng
  - ces
  - deu
  - slk
  - hun
  - pol
  - dan
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notBefore: '1936'
    notAfter: '1958'
  hands:
    count: unknown
    precision: estimated
  license:
  - {name: CC-BY 4.0, url: https://creativecommons.org/licenses/by/4.0/}
  format: Alto-XML
  volume:
  - metric: files
    count: 252
  - metric: characters
    count: 540645
  - metric: lines
    count: 9203
  production-software: Unknown [Automatically filled]
  automatically-aligned: false
- authors:
  - name: Simon
    orcid: 0000-0001-9094-4475
    roles:
    - transcriber
    - project-manager
    - quality-control
    - support
    surname: Gabay
  - name: Jessica
    roles:
    - transcriber
    surname: Da Silva Fernandes
  - name: Myriam
    roles:
    - transcriber
    surname: Perregaux
  automatically-aligned: false
  characters:
    members:
    - e
    - t
    - o
    - n
    - a
    - i
    - r
    - s
    - h
    - d
    - l
    - c
    - u
    - m
    - f
    - g
    - p
    - ','
    - y
    - w
    - b
    - v
    - .
    - k
    - '1'
    - I
    - ¬
    - C
    - S
    - T
    - '-'
    - '9'
    - A
    - ;
    - '8'
    - M
    - x
    - '4'
    - '2'
    - /
    - '6'
    - N
    - G
    - R
    - D
    - q
    - '0'
    - '"'
    - H
    - E
    - '5'
    - z
    - P
    - W
    - U
    - '7'
    - (
    - j
    - )
    - '3'
    - B
    - "'"
    - ’
    - L
    - ':'
    - Y
    - O
    - V
    - Q
    - –
    - '?'
    - F
    - J
    - '!'
    - K
    - “
    - '['
    - ']'
    - X
    - Z
    - ́
    - ”
    - —
    mode: NFD
  citation-file-link: https://github.com/FoNDUE-HTR/FONDUE-EN-PRINT-20/blob/master/CITATION.cff
  description: Various prints (academic texts, archival documents, novels…)
  format: Alto-XML
  hands:
    count: unknown
    precision: exact
  institutions: []
  language:
  - eng
  license:
    name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: FoNDUE
  project-website: https://github.com/FoNDUE-HTR
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notAfter: '1999'
    notBefore: '1900'
  title: FONDUE-EN-PRINT-20
  transcription-guidelines: SegmOnto
  url: https://github.com/FoNDUE-HTR/FONDUE-EN-PRINT-20
  volume:
  - count: 82834
    metric: characters
  - count: 30
    metric: files
  - count: 1728
    metric: lines
  - count: 72
    metric: regions
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Gabay, Simon and Perregaux, Myriam\
    \ and Da Silva Fernandes, Jessica},\nmonth = {12},\ntitle = {FONDUE-EN-PRINT-20},\n\
    url = {https://github.com/FoNDUE-HTR/FONDUE-EN-PRINT-20},\nyear = {2023}\n}\n"
  _apa: "Gabay S., Perregaux M., Da Silva Fernandes J. (2023). FONDUE-EN-PRINT-20\
    \ (version 1.0). URL: https://github.com/FoNDUE-HTR/FONDUE-EN-PRINT-20\n"
- authors:
  - name: Simon
    orcid: 0000-0001-9094-4475
    roles:
    - transcriber
    - project-manager
    - quality-control
    - support
    surname: Gabay
  - name: Carmen
    orcid: 0009-0004-1508-9076
    roles:
    - transcriber
    surname: Carrasco Luján
  automatically-aligned: false
  characters:
    members:
    - e
    - a
    - o
    - s
    - n
    - r
    - i
    - l
    - d
    - u
    - t
    - c
    - m
    - p
    - .
    - ́
    - ','
    - b
    - g
    - y
    - q
    - h
    - v
    - ¬
    - f
    - j
    - z
    - A
    - E
    - ;
    - –
    - '!'
    - ̃
    - S
    - x
    - I
    - P
    - C
    - L
    - B
    - U
    - D
    - R
    - ':'
    - T
    - '?'
    - O
    - N
    - '0'
    - H
    - Y
    - ¿
    - M
    - V
    - ¡
    - '1'
    - J
    - '2'
    - —
    - '"'
    - G
    - F
    - k
    - '8'
    - '7'
    - '4'
    - '5'
    - '-'
    - Q
    - '6'
    - '3'
    - ̀
    - K
    - '9'
    - (
    - )
    - ̈
    - X
    - »
    - W
    - '['
    - ']'
    - Z
    - '&'
    - w
    - '*'
    - §
    -  
    - °
    - ǝ
    - «
    mode: NFD
  citation-file-link: https://github.com/FoNDUE-HTR/FONDUE-ES-PRINT-19/blob/master/CITATION.cff
  description: Novels written in Spanish
  format: Alto-XML
  hands:
    count: unknown
    precision: exact
  institutions: []
  language:
  - spa
  license:
    name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: FoNDUE
  project-website: https://github.com/FoNDUE-HTR
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notAfter: '1899'
    notBefore: '1800'
  title: FONDUE-ES-PRINT-19
  transcription-guidelines: SegmOnto
  url: https://github.com/FoNDUE-HTR/FONDUE-ES-PRINT-19
  volume:
  - count: 64038
    metric: characters
  - count: 48
    metric: files
  - count: 1668
    metric: lines
  - count: 129
    metric: regions
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Gabay, Simon and Carrasco Luján, Carmen},\n\
    month = {2},\ntitle = {FONDUE-ES-PRINT-19},\nurl = {https://github.com/FoNDUE-HTR/FONDUE-ES-PRINT-19},\n\
    year = {2024}\n}\n"
  _apa: "Gabay S., Carrasco Luján C. (2024). FONDUE-ES-PRINT-19 (version 1.0). URL:\
    \ https://github.com/FoNDUE-HTR/FONDUE-ES-PRINT-19\n"
- authors:
  - name: Peter
    roles:
    - transcriber
    surname: Nahon
  - name: Marco
    roles:
    - transcriber
    surname: Cicchini
  - name: Yvan
    roles:
    - transcriber
    surname: Jaureguy
  - name: Simon
    orcid: 0000-0001-9094-4475
    roles:
    - transcriber
    - project-manager
    - quality-control
    - support
    surname: Gabay
  - name: Loraine
    orcid: 0000-0002-9598-9151
    roles:
    - transcriber
    surname: Chappuis
  automatically-aligned: false
  characters:
    members:
    - e
    - a
    - s
    - r
    - t
    - n
    - u
    - i
    - o
    - l
    - d
    - c
    - m
    - p
    - v
    - ́
    - .
    - ','
    - q
    - h
    - f
    - g
    - b
    - "'"
    - y
    - L
    - M
    - C
    - S
    - x
    - j
    - E
    - '1'
    - z
    - ̀
    - I
    - ’
    - ̂
    - '2'
    - J
    - +
    - D
    - V
    - ¬
    - ʳ
    - ^
    - P
    - ':'
    - '4'
    - '3'
    - X
    - R
    - '7'
    - A
    - ̈
    - B
    - '6'
    - ;
    - '5'
    - T
    - G
    - '9'
    - ᵉ
    - '0'
    - '8'
    - N
    - —
    - ̧
    - O
    - F
    - '-'
    - ᵗ
    - '?'
    - ᵈ
    - Q
    - k
    - H
    - ⟦
    - ⟧
    - '['
    - ']'
    - œ
    - ˢ
    - ˡ
    - ᵇ
    - Z
    -  
    - W
    - α
    - w
    - U
    - ̃
    - (
    - )
    - ̓
    - ο
    - ν
    - '&'
    - K
    - ⁱ
    - μ
    - ω
    - τ
    - δ
    - ε
    - °
    - Y
    - ̄
    - ρ
    - φ
    - '{'
    - Ψ
    - ι
    - υ
    - π
    - λ
    - $
    - /
    mode: NFD
  citation-file-link: https://github.com/FoNDUE-HTR/FONDUE-FR-MSS-18/blob/master/CITATION.cff
  description: French manuscripts of the 18th century
  format: Alto-XML
  hands:
    count: unknown
    precision: exact
  institutions: []
  language:
  - fra
  license:
    name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: FoNDUE
  project-website: https://github.com/FoNDUE-HTR
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notAfter: '1799'
    notBefore: '1700'
  title: FONDUE-FR-MSS-18
  transcription-guidelines: SegmOnto
  url: https://github.com/FoNDUE-HTR/FONDUE-FR-MSS-18
  volume:
  - count: 232519
    metric: characters
  - count: 228
    metric: files
  - count: 6446
    metric: lines
  - count: 709
    metric: regions
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Gabay, Simon and Nahon, Peter and\
    \ Cicchini, Marco and Jaureguy, Yvan and Chappuis, Loraine},\nmonth = {11},\n\
    title = {FoNDUE-FR-MSS-18},\nurl = {https://github.com/FoNDUE-HTR/FONDUE-FR-MSS-18},\n\
    year = {2023}\n}\n"
  _apa: "Gabay S., Nahon P., Cicchini M., Jaureguy Y., Chappuis L. (2023). FoNDUE-FR-MSS-18\
    \ (version 1.0). URL: https://github.com/FoNDUE-HTR/FONDUE-FR-MSS-18\n"
- authors:
  - name: Simon
    orcid: 0000-0001-9094-4475
    roles:
    - transcriber
    - project-manager
    - quality-control
    - support
    surname: Gabay
  citation-file-link: https://github.com/FoNDUE-HTR/FONDUE-FR-PRINT-16/blob/master/CITATION.cff
  description: Transcriptions of French 16th-century prints
  format: Alto-XML
  hands:
    count: unknown
    precision: exact
  language:
  - fra
  license:
    name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: FoNDUE
  project-website: https://github.com/FoNDUE-HTR/
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notAfter: '1600'
    notBefore: '1500'
  title: FONDUE-FR-PRINT-16
  transcription-guidelines: SegmOnto
  url: https://github.com/FoNDUE-HTR/FONDUE-FR-PRINT-16
  volume:
  - count: 504656
    metric: characters
  - count: 930
    metric: files
  - count: 17817
    metric: lines
  - count: 2829
    metric: regions
  automatically-aligned: false
- authors:
  - name: Simon
    orcid: 0000-0001-9094-4475
    roles:
    - project-manager
    - quality-control
    - support
    surname: Gabay
  - name: Sophie
    orcid: 0009-0005-6841-0158
    roles:
    - transcriber
    surname: Dolto
  automatically-aligned: false
  characters:
    members:
    - e
    - a
    - s
    - i
    - t
    - r
    - n
    - u
    - l
    - o
    - d
    - c
    - p
    - m
    - ́
    - ','
    - .
    - v
    - ’
    - g
    - f
    - b
    - q
    - h
    - ̀
    - ̂
    - x
    - j
    - L
    - y
    - '-'
    - I
    - "'"
    - —
    - A
    - G
    - E
    - M
    - P
    - C
    - B
    - J
    - D
    - z
    - ̧
    - S
    - '!'
    - T
    - '?'
    - ¬
    - V
    - ;
    - U
    - O
    - R
    - Q
    - ':'
    - '1'
    - k
    - F
    - H
    - œ
    - '0'
    - (
    - )
    - “
    - '2'
    - N
    - '6'
    - '9'
    - '8'
    - '5'
    - ̈
    - '3'
    - w
    - W
    - '4'
    - Y
    - ”
    -  
    - '7'
    - Z
    - '*'
    - /
    - K
    - '"'
    - «
    - »
    mode: NFD
  citation-file-link: https://github.com/FoNDUE-HTR/FONDUE-FR-PRINT-20/blob/master/CITATION.cff
  description: French novels
  format: Alto-XML
  hands:
    count: unknown
    precision: exact
  institutions: []
  language:
  - fra
  license:
    name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: FoNDUE
  project-website: https://github.com/FoNDUE-HTR
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notAfter: '1999'
    notBefore: '1900'
  title: FONDUE-FR-PRINT-20
  transcription-guidelines: SegmOnto
  url: https://github.com/FoNDUE-HTR/FONDUE-FR-PRINT-20
  volume:
  - count: 81599
    metric: characters
  - count: 55
    metric: files
  - count: 1604
    metric: lines
  - count: 64
    metric: regions
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Gabay, Simon and Dolto, Sophie},\n\
    month = {2},\ntitle = {FONDUE-FR-PRINT-20},\nurl = {https://github.com/FoNDUE-HTR/FONDUE-FR-PRINT-20},\n\
    year = {2024}\n}\n"
  _apa: "Gabay S., Dolto S. (2024). FONDUE-FR-PRINT-20 (version 1.0). URL: https://github.com/FoNDUE-HTR/FONDUE-FR-PRINT-20\n"
- authors:
  - name: Simon
    orcid: 0000-0001-9094-4475
    roles:
    - transcriber
    - project-manager
    - quality-control
    - support
    surname: Gabay
  - name: Maddalena
    roles:
    - transcriber
    surname: Zaglio
  automatically-aligned: false
  characters:
    members:
    - e
    - a
    - i
    - o
    - r
    - n
    - t
    - l
    - s
    - c
    - d
    - u
    - p
    - m
    - v
    - ','
    - g
    - h
    - f
    - b
    - .
    - z
    - ̀
    - ¬
    - q
    - I
    - '-'
    - C
    - A
    - "'"
    - ’
    - M
    - P
    - E
    - '"'
    - S
    - ;
    - L
    - '='
    - T
    - R
    - O
    - D
    - V
    - G
    - ':'
    - N
    - '1'
    - '!'
    - B
    - )
    - —
    - '4'
    - (
    - F
    - '['
    - ']'
    - Q
    - '2'
    - '?'
    - '0'
    - '3'
    - '9'
    - '5'
    - U
    - °
    - ⬪
    - '6'
    - y
    - Z
    - k
    - ᗅ
    - K
    - x
    - §
    - H
    - '8'
    - X
    - '7'
    - W
    - –
    - ^
    - “
    - ᑕ
    - ᗞ
    - w
    mode: NFD
  citation-file-link: https://github.com/FoNDUE-HTR/FONDUE-IT-PRINT-20/blob/master/CITATION.cff
  description: Archives and novels
  format: Alto-XML
  hands:
    count: unknown
    precision: exact
  institutions: []
  language:
  - ita
  license:
    name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: FoNDUE
  project-website: https://github.com/FoNDUE-HTR
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notAfter: '1999'
    notBefore: '1900'
  title: FONDUE-IT-PRINT-20
  transcription-guidelines: SegmOnto
  url: https://github.com/FoNDUE-HTR/FONDUE-IT-PRINT-20
  volume:
  - count: 54628
    metric: characters
  - count: 28
    metric: files
  - count: 1150
    metric: lines
  - count: 67
    metric: regions
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Gabay, Simon and Zaglio, Maddalena},\n\
    month = {12},\ntitle = {FONDUE-IT-PRINT-20},\nurl = {https://github.com/FoNDUE-HTR/FONDUE-IT-PRINT-20},\n\
    year = {2023}\n}\n"
  _apa: "Gabay S., Zaglio M. (2023). FONDUE-IT-PRINT-20 (version 1.0). URL: https://github.com/FoNDUE-HTR/FONDUE-IT-PRINT-20\n"
- authors:
  - name: Simon
    orcid: 0000-0001-9094-4475
    roles:
    - transcriber
    - project-manager
    - quality-control
    - support
    surname: Gabay
  - name: Béatrice
    orcid: 0000-0003-1046-7002
    roles:
    - transcriber
    surname: Joyeux-Prunel
  - name: Martina
    orcid: 0000-0003-0131-192X
    roles:
    - transcriber
    surname: Rizzello
  - name: Valéry
    orcid: 0000-0001-5739-8839
    roles:
    - transcriber
    surname: Berlincourt
  - name: Elena Maria
    orcid: 0000-0002-8542-7091
    roles:
    - transcriber
    surname: Rizzi
  - affiliation: Ca' Foscari University
    name: Stefania
    orcid: 0000-0001-9553-1100
    roles:
    - transcriber
    surname: Tesser
  - name: Victoria
    roles:
    - transcriber
    surname: Bukvic
  - name: Jaime
    roles:
    - transcriber
    surname: Diaz
  - name: Guillaume
    roles:
    - transcriber
    surname: Aebi
  - name: Raoul
    roles:
    - transcriber
    surname: Bickel
  characters:
    members:
    - e
    - n
    - .
    - r
    - i
    - a
    - u
    - t
    - l
    - s
    - '0'
    - o
    - h
    - d
    - '1'
    - c
    - '2'
    - m
    - g
    - '5'
    - ̈
    - '3'
    - f
    - b
    - ','
    - M
    - B
    - '4'
    - S
    - A
    - '6'
    - F
    - G
    - '7'
    - '8'
    - v
    - p
    - )
    - (
    - L
    - '9'
    - z
    - P
    - k
    - R
    - V
    - D
    - K
    - y
    - W
    - E
    - H
    - C
    - –
    - ̀
    - w
    - J
    - T
    - Z
    - ́
    - '-'
    - N
    - I
    - —
    - q
    - O
    - U
    -  
    - ̂
    - ’
    - x
    - j
    - '"'
    - »
    - ¬
    - ;
    - œ
    - X
    - ̧
    - Q
    - "'"
    - ':'
    - ß
    - «
    - '?'
    - §
    - Y
    - æ
    - '['
    - ']'
    - /
    - †
    - '!'
    - „
    - “
    - …
    - '&'
    mode: NFD
  citation-file-link: https://raw.githubusercontent.com/FoNDUE-HTR/FONDUE-MLT-ART/main/CITATION.cff
  description: Swiss art exhibitions catalogues
  format: Alto-XML
  hands:
    count: '1'
    precision: exact
  institutions: []
  language:
  - deu
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: FoNDUE
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notAfter: '1961'
    notBefore: '1842'
  title: FONDUE-MLT-ART
  transcription-guidelines: No segmentation, only transcription.
  url: https://github.com/FoNDUE-HTR/FONDUE-MLT-ART
  volume:
  - count: 141786
    metric: characters
  - count: 215
    metric: files
  - count: 5664
    metric: lines
  - count: 60
    metric: pages
  - count: 215
    metric: regions
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Joyeux-Prunel, Béatrice and Gabay,\
    \ Simon and Rizzello, Martina and Berlincourt, Valéry and Rizzi, Elena Maria and\
    \ Tesser, Stefania and Bukvic, Victoria and Diaz, Jaime and Aebi, Guillaume and\
    \ Bickel, Raoul},\nmonth = {11},\ntitle = {FONDUE-MLT-ART},\nurl = {https://github.com/FoNDUE-HTR/FONDUE-MLT-ART},\n\
    year = {2023}\n}\n"
  _apa: "Joyeux-Prunel B., Gabay S., Rizzello M., Berlincourt V., Rizzi E.M., Tesser\
    \ S., Bukvic V., Diaz J., Aebi G., Bickel R. (2023). FONDUE-MLT-ART (version 1.0).\
    \ URL: https://github.com/FoNDUE-HTR/FONDUE-MLT-ART\n"
- authors:
  - name: Frédérine
    orcid: 0000-0002-3476-7248
    roles:
    - transcriber
    surname: Pradier
  - name: Simon
    orcid: 0000-0001-9094-4475
    roles:
    - transcriber
    - project-manager
    - quality-control
    - support
    surname: Gabay
  - name: Paul
    orcid: 0000-0003-2821-8821
    roles:
    - transcriber
    surname: Kervegan
  - name: Juliette
    orcid: 0000-0002-8971-6173
    roles:
    - transcriber
    surname: Janès
  - name: Esteban
    orcid: 0000-0002-8591-5394
    roles:
    - transcriber
    surname: Sánchez Oeconomo
  citation-file-link: https://github.com/FoNDUE-HTR/FONDUE-MLT-CAT/blob/main/CITATION.cff
  description: Ground truth for 19th- and 20th-century sale and exhibition catalogues,
    mainly, but not exclusively, printed in France.
  format: Alto-XML
  hands:
    count: unknown
    precision: exact
  institutions: []
  language:
  - por
  - fra
  - ita
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: FoNDUE
  project-website: https://github.com/FoNDUE-HTR
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notAfter: '1972'
    notBefore: '1818'
  title: FONDUE-MLT-CAT
  transcription-guidelines: 'Segmentation includes an extra zone: `CustomeZone: entry`'
  url: https://github.com/FoNDUE-HTR/FONDUE-MLT-CAT
  volume:
  - count: 1285120
    metric: characters
  - count: 1381
    metric: files
  - count: 43114
    metric: lines
  - count: 10713
    metric: regions
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Pradier, Frédérine and Gabay, Simon\
    \ and Janès, Juliette and Sánchez Oeconomo, Esteban and Kervegan, Paul},\nmonth\
    \ = {10},\ntitle = {FoNDUE - Datasets for historical catalogues},\nurl = {https://github.com/FoNDUE-HTR/FONDUE-MLT-CAT},\n\
    year = {2022}\n}\n"
  _apa: "Pradier F., Gabay S., Janès J., Sánchez Oeconomo E., Kervegan P. (2022).\
    \ FoNDUE - Datasets for historical catalogues (version 0.9). URL: https://github.com/FoNDUE-HTR/FONDUE-MLT-CAT\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: FoNDUE_Kunsthistorisches-UZH_Archivdatenbank
  url: https://github.com/FoNDUE-HTR/FoNDUE_Kunsthistorisches-UZH_Archivdatenbank
  authors:
  - name: Pauline
    surname: Jacsont
    orcid: 0000-0002-6296-3246
    roles:
    - project-manager
    - transcriber
    - aligner
    - quality-control
  - name: Simon
    surname: Gabay
    orcid: 0000-0001-9094-4475
    roles:
    - project-manager
    - quality-control
    - support
  - name: Tristan
    surname: Weddigen
    orcid: 0000-0002-4609-8950
    roles:
    - support
  institutions: []
  description: HTR data created from the Kunsthistorisches UZH corpus.
  project-name: FoNDUE
  project-website: https://www.unige.ch/lettres/humanites-numeriques/recherche/projets-de-la-chaire/fondue
  language:
  - deu
  - fra
  - ita
  production-software: eScriptorium + Kraken
  script:
  - iso: Latn
  script-type: evenly-mixed
  time:
    notBefore: '1900'
    notAfter: '1999'
  hands:
    count: more-than-10
    precision: estimated
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  volume:
  - metric: pages
    count: 1100
  citation-file-link: >-
    https://github.com/FoNDUE-HTR/FoNDUE_Kunsthistorisches-UZH_Archivdatenbank/blob/main/CITATION.cff
  transcription-guidelines: "The transcription is strictly diplomatic: no abbreviations\
    \ are resolved. Items that are crossed out or struck through will be transcribed\
    \ with a \"€\"."
  automatically-aligned: false
- authors:
  - name: Simon
    roles:
    - project-manager
    surname: Gabay
  - name: Ariane
    roles:
    - project-manager
    surname: Pinche
  - name: Noé
    roles:
    - transcriber
    surname: Leroy
  - name: Kelly
    roles:
    - support
    surname: Christensen
  characters:
    members:
    - e
    - i
    - s
    - t
    - u
    - n
    - a
    - r
    - o
    - l
    - d
    - c
    - m
    - p
    - q
    - f
    - g
    - .
    - ̃
    - h
    - b
    - z
    - y
    - I
    - x
    - ⁊
    - ','
    - R
    - E
    - C
    - ̾
    - Q
    - L
    - S
    - A
    - D
    - M
    - ͣ
    - ꝑ
    - ͥ
    - P
    - ꝯ
    - T
    - N
    - ¶
    - O
    - B
    - ͤ
    - U
    - '-'
    - '1'
    - ꝰ
    - ᷑
    - ̽
    - '2'
    - '3'
    - ẜ
    - F
    - ⟦
    - ⟧
    - '6'
    - ħ
    - ꝓ
    - '7'
    - '4'
    - ͨ
    - '9'
    - '8'
    - ;
    - G
    - '0'
    - ͦ
    - '5'
    - H
    - "'"
    - ̀
    - ł
    - đ
    - ́
    - ͫ
    - ‸
    - '&'
    - k
    - °
    - ẞ
    - ͬ
    - ᷤ
    - K
    - '['
    - ']'
    - ͯ
    - ̧
    - (
    - )
    - Y
    - Z
    - ':'
    - ͧ
    - ᷠ
    - X
    mode: NFD
  citation-file-link: https://github.com/Gallicorpora/HTR-MSS-15e-Siecle/CITATION.
  description: HTR training corpus of 15th-century French manuscripts.
  format: Alto-XML
  hands:
    count: 1-per-folder
    precision: estimated
  language:
  - frm
  - fra
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: Gallicorpora
  project-website: https://github.com/Gallicorpora
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notAfter: '1500'
    notBefore: '1400'
  title: Données HTR manuscrits du 15e siècle
  transcription-guidelines: 'Transcription follows the recommendations of the CREMMALAB
    project: https://cremmalab.hypotheses.org'
  url: https://github.com/Gallicorpora/HTR-MSS-15e-Siecle
  volume:
  - count: 169207
    metric: characters
  - count: 85
    metric: files
  - count: 5937
    metric: lines
  - count: 458
    metric: regions
  automatically-aligned: false
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Données imprimés du 16e siècle
  description: HTR training corpus of 16th-century printed books
  url: https://github.com/Gallicorpora/HTR-imprime-16e-siecle
  authors:
  - name: Simon
    surname: Gabay
    roles:
    - project-manager
  - name: Ariane
    roles:
    - project-manager
    surname: Pinche
  - name: Malamatenia
    surname: Vlachou-Efstathiou
    roles:
    - transcriber
  - name: Kelly
    surname: Christensen
    roles:
    - support
  format: Alto-XML
  hands:
    count: 1-per-folder
    precision: estimated
  language:
  - frm
  - fra
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  project-name: Gallicorpora
  project-website: https://github.com/Gallicorpora
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notAfter: '1599'
    notBefore: '1500'
  transcription-guidelines: Transcription follows the recommendations of the Gallicorpora
    project.
  volume:
  - metric: characters
    count: 186202
  - metric: files
    count: 180
  - metric: lines
    count: 4918
  - metric: regions
    count: 591
  citation-file-link: https://github.com/Gallicorpora/HTR-imprime-16e-siecle/CITATION.cff
  production-software: eScriptorium + Kraken
  characters:
    mode: NFD
    members:
    - e
    - u
    - r
    - a
    - n
    - i
    - t
    - o
    - l
    - s
    - ſ
    - d
    - c
    - m
    - p
    - ','
    - q
    - y
    - v
    - f
    - g
    - b
    - h
    - .
    - ’
    - '&'
    - E
    - x
    - "'"
    - z
    - ́
    - ̀
    - A
    - ¬
    - ̃
    - D
    - C
    - R
    - ':'
    - L
    - I
    - S
    - P
    - N
    - M
    - O
    - Q
    - T
    - V
    - G
    - H
    - B
    - F
    - '-'
    - ̧
    - j
    - '?'
    - (
    - ̈
    - )
    - »
    - '1'
    - œ
    - ¶
    - '!'
    - U
    - '2'
    - X
    - ;
    - '9'
    - Y
    - '4'
    - '3'
    - ß
    - '5'
    - '"'
    - '7'
    - J
    - '8'
    - æ
    - ꝰ
    - '6'
    - '0'
    - ̂
    - ʳ
    - ⁊
    - Z
    - «
    - '*'
    - ꝗ
    - ꝓ
    -  
    - ⁋
    - Ι
    - ꝑ
    - ']'
    - ͥ
    - ᵉ
    - Ε
    - '['
    - Τ
    - /
  automatically-aligned: false
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Imprimés 17e siècle
  description: HTR training corpus of 17th-century French printed books.
  url: https://github.com/Gallicorpora/HTR-imprime-17e-siecle
  authors:
  - name: Simon
    surname: Gabay
    roles:
    - project-manager
  - name: Ariane
    surname: Pinche
    roles:
    - project-manager
  - name: Eliott
    surname: Fabert
    roles:
    - transcriber
  - name: Malamatenia
    surname: Vlachou-Efstathiou
    roles:
    - transcriber
  - name: Kelly
    surname: Christensen
    roles:
    - support
  project-name: Gallicorpora
  project-website: https://github.com/Gallicorpora
  language:
  - frm
  - fra
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notBefore: '1600'
    notAfter: '1699'
  hands:
    count: 1-per-folder
    precision: estimated
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  volume:
  - metric: characters
    count: 255981
  - metric: files
    count: 327
  - metric: lines
    count: 8950
  - metric: regions
    count: 1185
  transcription-guidelines: Transcription follows the recommendations of the Gallicorpora
    project.
  citation-file-link: https://github.com/Gallicorpora/HTR-imprime-17e-siecle/CITATION.cff
  production-software: eScriptorium + Kraken
  characters:
    mode: NFD
    members:
    - e
    - u
    - r
    - a
    - n
    - i
    - t
    - o
    - l
    - s
    - ſ
    - d
    - c
    - m
    - p
    - ','
    - v
    - q
    - .
    - f
    - g
    - b
    - E
    - ’
    - h
    - y
    - ́
    - A
    - '&'
    - "'"
    - S
    - I
    - x
    - ¬
    - L
    - C
    - R
    - P
    - D
    - ̀
    - M
    - V
    - T
    - O
    - N
    - z
    - ':'
    - Q
    - j
    - '-'
    - F
    - G
    - ̃
    - B
    - ;
    - H
    - ̈
    - '1'
    - ̂
    - ̧
    - '2'
    - '?'
    - '3'
    - œ
    - '4'
    - '5'
    - Y
    - U
    - Z
    - '6'
    - '7'
    - '8'
    - '0'
    - X
    - J
    - '9'
    - (
    - æ
    - )
    - Æ
    - ι
    - α
    - '!'
    - ß
    - ο
    - ν
    - ε
    - ρ
    - ̓
    - υ
    - κ
    - '*'
    - σ
    - τ
    - ω
    - '['
    - ']'
    - ꝰ
    - K
    - Α
    - χ
    - ς
    - π
    - γ
    - ̨
    - μ
    - k
    - ͂
    - Ν
    - Β
    - λ
    - Σ
    - Κ
    - η
    - θ
    - W
    - Œ
    - δ
    - Τ
    - ͅ
    - »
    - ᵉ
    - ˡ
    - ͧ
    - Ζ
    - β
    - ̔
    - ̇
    - °
    - w
    - ẞ
    - Φ
    - Λ
    - Χ
    - φ
    - Ι
    - ʳ
    - ᵐ
  automatically-aligned: false
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Données imprimés du 18e siècle
  description: HTR training corpus of 18th-century printed books
  url: https://github.com/Gallicorpora/HTR-imprime-18e-siecle
  authors:
  - name: Simon
    roles:
    - project-manager
    surname: Gabay
  - name: Ariane
    roles:
    - project-manager
    surname: Pinche
  - name: Eliott
    roles:
    - transcriber
    surname: Fabert
  - name: Kelly
    roles:
    - support
    surname: Christensen
  project-name: Gallicorpora
  project-website: https://github.com/Gallicorpora
  language:
  - fra
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notBefore: '1700'
    notAfter: '1799'
  hands:
    count: 1-per-folder
    precision: estimated
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  volume:
  - metric: characters
    count: 255981
  - metric: files
    count: 327
  - metric: lines
    count: 8950
  - metric: regions
    count: 1185
  transcription-guidelines: Transcription follows the recommendations of the Gallicorpora
    project.
  citation-file-link: https://github.com/Gallicorpora/HTR-imprime-18e-siecle/CITATION.cff
  production-software: eScriptorium + Kraken
  characters:
    mode: NFD
    members:
    - e
    - u
    - r
    - a
    - n
    - i
    - t
    - o
    - l
    - s
    - ſ
    - d
    - c
    - m
    - p
    - ','
    - v
    - q
    - .
    - f
    - g
    - b
    - E
    - ’
    - h
    - y
    - ́
    - A
    - '&'
    - "'"
    - S
    - I
    - x
    - ¬
    - L
    - C
    - R
    - P
    - D
    - ̀
    - M
    - V
    - T
    - O
    - N
    - z
    - ':'
    - Q
    - j
    - '-'
    - F
    - G
    - ̃
    - B
    - ;
    - H
    - ̈
    - '1'
    - ̂
    - ̧
    - '2'
    - '?'
    - '3'
    - œ
    - '4'
    - '5'
    - Y
    - U
    - Z
    - '6'
    - '7'
    - '8'
    - '0'
    - X
    - J
    - '9'
    - (
    - æ
    - )
    - Æ
    - ι
    - α
    - '!'
    - ß
    - ο
    - ν
    - ε
    - ρ
    - ̓
    - υ
    - κ
    - '*'
    - σ
    - τ
    - ω
    - '['
    - ']'
    - ꝰ
    - K
    - Α
    - χ
    - ς
    - π
    - γ
    - ̨
    - μ
    - k
    - ͂
    - Ν
    - Β
    - λ
    - Σ
    - Κ
    - η
    - θ
    - W
    - Œ
    - δ
    - Τ
    - ͅ
    - »
    - ᵉ
    - ˡ
    - ͧ
    - Ζ
    - β
    - ̔
    - ̇
    - °
    - w
    - ẞ
    - Φ
    - Λ
    - Χ
    - φ
    - Ι
    - ʳ
    - ᵐ
  automatically-aligned: false
- authors:
  - name: Simon
    roles:
    - project-manager
    surname: Gabay
  - name: Ariane
    roles:
    - project-manager
    surname: Pinche
  - name: Noé
    roles:
    - transcriber
    surname: Leroy
  - name: Kelly
    roles:
    - support
    surname: Christensen
  characters:
    members:
    - e
    - s
    - u
    - t
    - a
    - i
    - r
    - o
    - n
    - l
    - d
    - c
    - m
    - p
    - ̃
    - f
    - q
    - g
    - y
    - h
    - b
    - .
    - z
    - ⁊
    - x
    - E
    - '-'
    - ','
    - ¶
    - L
    - ͥ
    - D
    - C
    - ;
    - ᷤ
    - I
    - ꝰ
    - Q
    - A
    - S
    - ꝑ
    - P
    - M
    - O
    - T
    - U
    - N
    - F
    - R
    - ꝓ
    - B
    - G
    - ꝯ
    - ̾
    - H
    - ᷑
    - ͬ
    - ̌
    - ':'
    - (
    - '['
    - ']'
    - v
    - J
    - Ꝙ
    - )
    - k
    - ꝙ
    - ͣ
    - V
    - '4'
    - ͦ
    - w
    - ͨ
    - ͤ
    - Ι
    - ̧
    - '1'
    - '9'
    - '7'
    - ̶
    - "'"
    - ́
    - '|'
    mode: NFD
  citation-file-link: https://github.com/Gallicorpora/HTR-incunable-15e-siecle/CITATION.cff
  description: HTR training corpus of 15th-century French incunabula.
  format: Alto-XML
  hands:
    count: 1-per-folder
    precision: estimated
  language:
  - frm
  - fra
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: Gallicorpora
  project-website: https://github.com/Gallicorpora
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notAfter: '1500'
    notBefore: '1400'
  title: Données HTR incunables du 15e siècle
  transcription-guidelines: 'Transcription follows the recommendations of the CREMMALAB
    project: https://cremmalab.hypotheses.org'
  url: https://github.com/Gallicorpora/HTR-incunable-15e-siecle
  volume:
  - count: 245094
    metric: characters
  - count: 149
    metric: files
  - count: 7608
    metric: lines
  - count: 535
    metric: regions
  automatically-aligned: false
- authors:
  - name: Emmanuelle
    roles:
    - project-manager
    surname: de Champs
  - name: Florence
    roles:
    - project-manager
    - quality-control
    - transcriber
    surname: Clavaud
  - name: Pauline
    roles:
    - project-manager
    - quality-control
    - transcriber
    surname: Charbonnier
  - name: Christine
    roles:
    - project-manager
    - quality-control
    - support
    surname: Nougaret
  - name: Alix
    roles:
    - aligner
    - project-manager
    - quality-control
    - support
    surname: Chagué
  - name: Thibault
    roles:
    - aligner
    - project-manager
    - quality-control
    - support
    surname: Clérice
  - name: Elsa
    roles:
    - aligner
    surname: Falcoz
  - name: Marie-Françoise
    roles:
    - project-manager
    surname: Limon-Bonnet
  - name: Elise
    roles:
    - project-manager
    surname: Wojszvzyk
  - name: Sylvie
    roles:
    - project-manager
    - quality-control
    - support
    surname: Dechavanne
  - roles:
    - transcriber
    surname: ALemoine
  - roles:
    - transcriber
    surname: ASJPeronneau
  - roles:
    - transcriber
    surname: Alcofrybas
  - roles:
    - transcriber
    surname: BeaLct
  - roles:
    - transcriber
    surname: CLbt
  - roles:
    - transcriber
    surname: Chloelsa
  - roles:
    - transcriber
    surname: DMichel
  - roles:
    - transcriber
    surname: Desauthieux
  - roles:
    - transcriber
    surname: EPerrin
  - roles:
    - transcriber
    surname: GBMireille
  - roles:
    - transcriber
    surname: GPINET
  - roles:
    - transcriber
    surname: Genea78
  - roles:
    - transcriber
    surname: JMGoux
  - roles:
    - transcriber
    surname: Jideuxhemme
  - roles:
    - transcriber
    surname: LBIsabelle
  - roles:
    - transcriber
    surname: Lamotte
  - roles:
    - transcriber
    surname: MFGarreau
  - roles:
    - transcriber
    surname: MIna
  - roles:
    - transcriber
    surname: Maniet
  - roles:
    - transcriber
    surname: MarionJo
  - roles:
    - transcriber
    surname: PGambette
  - roles:
    - transcriber
    surname: PPocard
  - roles:
    - transcriber
    surname: PROMBAUT
  - roles:
    - transcriber
    surname: PaulineTest
  - roles:
    - transcriber
    surname: SCayeux
  - roles:
    - transcriber
    surname: SL.
  - roles:
    - transcriber
    surname: SLespinasse
  - roles:
    - transcriber
    surname: Silver08
  - roles:
    - transcriber
    surname: TPellé
  - roles:
    - transcriber
    surname: Valérie
  - roles:
    - transcriber
    surname: alp
  - roles:
    - transcriber
    surname: jmorvan
  - roles:
    - transcriber
    surname: lelia
  - roles:
    - transcriber
    surname: majubama
  - roles:
    - transcriber
    surname: mickael.lefevr
  - roles:
    - transcriber
    surname: sgauthier
  - roles:
    - quality-control
    surname: EdChamps
  - name: Danièle
    roles:
    - support
    surname: Allezard
  - name: Françoise
    roles:
    - support
    surname: Auriau
  - name: Sophie
    roles:
    - support
    surname: Blanchard
  - name: Laure
    roles:
    - support
    surname: Cadars
  - name: Paul
    roles:
    - support
    surname: Cazin-Bernier
  - name: Rosine
    roles:
    - support
    surname: Cleyet-Michaud
  - name: Sophie
    roles:
    - support
    surname: Delinge
  - name: Christiane
    roles:
    - support
    surname: Demeulenaere-Douyère
  - name: Mathilde
    roles:
    - support
    surname: Deuve
  - name: Tristan
    roles:
    - support
    surname: Girard
  - name: Wilfried
    roles:
    - support
    surname: Gourdon
  - name: Emilie
    roles:
    - support
    surname: Laffitte-Louisou
  - name: Valérie
    roles:
    - support
    surname: Lemée
  - name: Jean-Claude
    roles:
    - support
    surname: Lescure
  - name: Mélisa
    roles:
    - support
    surname: Locatelli
  - name: Aurélie
    roles:
    - support
    surname: Massie
  - name: Thomas
    roles:
    - support
    surname: Olivier
  - name: Françoise
    roles:
    - support
    surname: Pinchard
  - name: Tiffanie
    roles:
    - support
    surname: Pitot
  - name: Anais
    roles:
    - support
    surname: Pontoparia
  - name: Michel
    roles:
    - support
    surname: Renard
  - name: Thierry
    roles:
    - support
    surname: Rihouey
  - name: Christian
    roles:
    - support
    surname: Rodriguez
  - name: Konstantinos
    roles:
    - support
    surname: Sifakis
  - name: Marie-Thérèse
    roles:
    - support
    surname: Solignat
  - name: Lucie
    roles:
    - support
    surname: Vieillon
  - roles:
    - support
    surname: SL
  characters:
    members:
    - e
    - a
    - i
    - n
    - t
    - s
    - r
    - u
    - o
    - l
    - m
    - d
    - c
    - p
    - ́
    - ̀
    - f
    - v
    - g
    - ','
    - q
    - b
    - .
    - ’
    - '1'
    - h
    - M
    - J
    - j
    - P
    - C
    - A
    - '-'
    - x
    - L
    - S
    - F
    - '9'
    - y
    - D
    - B
    - ̂
    - R
    - '2'
    - ^
    - '4'
    - z
    - '0'
    - E
    - V
    - G
    - '3'
    - '5'
    - T
    - )
    - (
    - H
    - '6'
    - N
    - '7'
    - '8'
    - I
    - ':'
    - O
    - ;
    - Q
    - ̧
    - °
    - U
    -  
    - /
    - W
    - '"'
    - ̈
    - '>'
    - <
    - '='
    - œ
    - w
    - '?'
    - _
    - X
    - '%'
    - k
    - '*'
    - ſ
    - '!'
    - Z
    - '&'
    - "'"
    - –
    - K
    - +
    mode: NFD
  citation-file-link: https://github.com/Dummy/depot-test/CITATION.cff
  description: Wills of French WWI soldiers (poilus) edited by the Archives nationales
    during the Testaments de Poilus project.
  format: Alto-XML
  hands:
    count: 1-per-file
    precision: estimated
  language:
  - fra
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: Testaments de Poilus
  project-website: https://edition-testaments-de-poilus.huma-num.fr/
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notAfter: '1918'
    notBefore: '1914'
  title: CREMMA-AN Testament De Poilus
  transcription-guidelines: 'The original transcriptions were produced on a crowdsourcing
    application (https://testaments-de-poilus.huma-num.fr/#!/) under the supervision
    of the Archives nationales de France. Only the allographic portions of the documents
    were transcribed. Any marginal elements added later by clerks or archivists are
    neither segmented nor transcribed. The segmentation follows the SegmOnto ontology.
    Abbreviations and misspellings were left uncorrected. Superscript text is preceded
    by ^.'
  url: https://github.com/HTR-United/CREMMA-AN-TestamentDePoilus
  volume:
  - count: 87726
    metric: characters
  - count: 226
    metric: files
  - count: 3330
    metric: lines
  - count: 553
    metric: regions
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Chagué, Alix and Clérice, Thibault\
    \ and Mazoue, Anaïs and Van Kote, Elsa},\ntitle = {CREMMA-AN-TestamentDePoilus},\n\
    url = {https://github.com/HTR-United/CREMMA-AN-TestamentDePoilus}\n}\n"
  _apa: "Chagué A., Clérice T., Mazoue A., Van Kote E. CREMMA-AN-TestamentDePoilus\
    \ URL: https://github.com/HTR-United/CREMMA-AN-TestamentDePoilus\n"
- authors:
  - name: Thibault
    orcid: 0000-0003-1852-9204
    roles:
    - project-manager
    - quality-control
    - support
    surname: Clérice
  - name: Alix
    orcid: 0000-0002-0136-4434
    roles:
    - project-manager
    - quality-control
    - support
    surname: Chagué
  - name: Anaïs
    roles:
    - transcriber
    surname: Mazoue
  automatically-aligned: false
  characters:
    members:
    - e
    - r
    - n
    - a
    - u
    - o
    - t
    - i
    - l
    - ſ
    - d
    - s
    - c
    - m
    - p
    - v
    - y
    - q
    - g
    - f
    - b
    - z
    - h
    - J
    - /
    - x
    - R
    - ^
    - L
    - I
    - .
    - E
    - ẜ
    - ⁊
    - M
    - '1'
    - ꝑ
    - A
    - ́
    - ̾
    - <
    - '>'
    - j
    - C
    - D
    - '3'
    - ꝙ
    - '9'
    - V
    - '7'
    - '6'
    - ’
    - P
    - '8'
    - Ꝑ
    - ̃
    - T
    - (
    - S
    - N
    - ;
    - Q
    - ̀
    - '5'
    - '0'
    - U
    mode: NFD
  citation-file-link: https://github.com/HTR-United/CREMMA-MSS-16/CITATION.cff
  description: Manuscripts of the 16th century
  format: Alto-XML
  hands:
    count: 1-per-folder
    precision: exact
  institutions: []
  language:
  - fra
  license:
    name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: CREMMA
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notAfter: '1599'
    notBefore: '1500'
  title: CREMMA MSS 16
  transcription-guidelines: Abbreviations are preserved.
  url: https://github.com/HTR-United/CREMMA-MSS-16
  volume:
  - count: 10911
    metric: characters
  - count: 9
    metric: files
  - count: 244
    metric: lines
  - count: 18
    metric: regions
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Mazoue, Anaïs and Clérice, Thibault\
    \ and Chagué, Alix},\nmonth = {3},\ntitle = {CREMMA-MSS-16},\nurl = {https://github.com/HTR-United/CREMMA-MSS-16},\n\
    year = {2024}\n}\n"
  _apa: "Mazoue A., Clérice T., Chagué A. (2024). CREMMA-MSS-16 URL: https://github.com/HTR-United/CREMMA-MSS-16\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: CREMMA Manuscrits du 17e
  url: https://github.com/HTR-United/CREMMA-MSS-17
  project-name: CREMMA
  authors:
  - name: Thibault
    surname: Clérice
    roles:
    - project-manager
    - quality-control
  - name: Alix
    surname: Chagué
    roles:
    - project-manager
    - quality-control
  - name: Margaux
    surname: Faure
    roles:
    - transcriber
  - name: Jade
    surname: Norindr
    roles:
    - transcriber
  - name: Anaïs
    surname: Mazoue
    roles:
    - transcriber
  - name: Baudouin
    surname: Davoury
    roles:
    - transcriber
  description: Various manuscripts of the 17th century
  language:
  - fra
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '1600'
    notAfter: '1699'
  hands:
    count: 1-per-folder
    precision: exact
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  volume:
  - metric: characters
    count: 81909
  - metric: files
    count: 111
  - metric: lines
    count: 2245
  - metric: regions
    count: 264
  transcription-guidelines: Abbreviations are preserved.
  production-software: eScriptorium + Kraken
  characters:
    mode: NFD
    members:
    - e
    - s
    - r
    - a
    - n
    - u
    - i
    - o
    - t
    - l
    - d
    - c
    - m
    - p
    - v
    - q
    - .
    - ','
    - y
    - "'"
    - f
    - b
    - g
    - ́
    - h
    - j
    - ̃
    - M
    - x
    - R
    - z
    - C
    - '1'
    - J
    - ^
    - ̀
    - P
    - L
    - S
    - V
    - '&'
    - A
    - E
    - '>'
    - I
    - <
    - '2'
    - X
    - '3'
    - T
    - '7'
    - D
    - '6'
    - ']'
    - B
    - '4'
    - '['
    - '0'
    - '?'
    - '-'
    - ̂
    - ̈
    - '9'
    - '5'
    - ;
    - G
    - N
    - '8'
    - ':'
    - F
    - ̧
    - )
    - (
    - Q
    - O
    - H
    - W
    - œ
    - ‸
    - ⁊
    - U
    - ̄
    - /
    - ꝗ
    - +
    - k
    - °
    -  
    - w
    - ם
    - Z
    - ς
    - '#'
    - æ
    - ꝙ
    - ͣ
    - ε
    - ϕ
  automatically-aligned: false
- authors:
  - name: Alix
    roles:
    - project-manager
    - quality-control
    surname: Chagué
  - name: Thibault
    roles:
    - project-manager
    - quality-control
    surname: Clérice
  - name: Jade
    roles:
    - transcriber
    surname: Norindr
  - name: Elsa
    roles:
    - transcriber
    - aligner
    surname: Van Kote
  - name: Margaux
    roles:
    - transcriber
    - aligner
    surname: Faure
  characters:
    members:
    - e
    - s
    - a
    - r
    - t
    - n
    - u
    - i
    - o
    - l
    - d
    - p
    - c
    - m
    - v
    - .
    - q
    - f
    - ́
    - "'"
    - ','
    - g
    - b
    - h
    - y
    - x
    - j
    - L
    - C
    - ̀
    - ^
    - '1'
    - M
    - S
    - ̂
    - z
    - E
    - R
    - ;
    - '2'
    - I
    - '6'
    - '0'
    - '>'
    - <
    - D
    - V
    - J
    - '4'
    - '3'
    - (
    - )
    - P
    - ̈
    - '5'
    - ̃
    - '-'
    - '7'
    - B
    - '8'
    - A
    - '['
    - ']'
    - '9'
    - N
    - F
    - G
    - T
    - '?'
    - X
    - ̧
    - /
    - ':'
    - O
    - H
    - ’
    - ¬
    - +
    -  
    - œ
    - U
    - '&'
    - «
    - Q
    - '='
    - K
    - '!'
    - k
    - W
    - Z
    - w
    - °
    - ⁊
    - ꝑ
    - ſ
    - ‸
    - '#'
    - ̶
    - _
    - Y
    - ̄
    - »
    - ͦ
    mode: NFD
  description: Manuscripts of the 18th century
  format: Alto-XML
  hands:
    count: 1-per-folder
    precision: exact
  language:
  - fra
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: CREMMA
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notAfter: '1799'
    notBefore: '1700'
  title: CREMMA Manuscrits du 18e
  transcription-guidelines: Abbreviations preserved.
  url: https://github.com/HTR-United/CREMMA-MSS-18
  volume:
  - count: 141690
    metric: characters
  - count: 125
    metric: files
  - count: 4019
    metric: lines
  - count: 329
    metric: regions
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Van Kote, Elsa and Faure, Margaux\
    \ and Norindr, Jade and Clérice, Thibault and Chagué, Alix},\nmonth = {3},\ntitle\
    \ = {CREMMA-MSS-18},\nurl = {https://github.com/HTR-United/CREMMA-MSS-18},\nyear\
    \ = {2024}\n}\n"
  _apa: "Van Kote E., Faure M., Norindr J., Clérice T., Chagué A. (2024). CREMMA-MSS-18\
    \ URL: https://github.com/HTR-United/CREMMA-MSS-18\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: CREMMA Manuscrits du 19e
  url: https://github.com/HTR-United/CREMMA-MSS-19
  project-name: CREMMA
  authors:
  - name: Clérice
    surname: Thibault
    roles:
    - project-manager
    - quality-control
  - name: Chagué
    surname: Alix
    roles:
    - project-manager
    - quality-control
  - name: Davoury
    surname: Baudouin
    roles:
    - transcriber
    - aligner
  - name: Doat
    surname: Soline
    roles:
    - transcriber
    - aligner
  - name: Faure
    surname: Margaux
    roles:
    - transcriber
    - aligner
  - name: Humeau
    surname: Maxime
    roles:
    - transcriber
    - aligner
  description: Manuscripts of the 19th century
  language:
  - fra
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '1800'
    notAfter: '1899'
  hands:
    count: 1-per-folder
    precision: exact
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  volume:
  - metric: characters
    count: 55581
  - metric: files
    count: 69
  - metric: lines
    count: 1807
  - metric: regions
    count: 167
  transcription-guidelines: Abbreviations preserved.
  production-software: eScriptorium + Kraken
  characters:
    mode: NFD
    members:
    - e
    - s
    - a
    - i
    - u
    - n
    - r
    - t
    - o
    - l
    - d
    - m
    - c
    - p
    - v
    - ','
    - ́
    - "'"
    - q
    - f
    - .
    - g
    - b
    - h
    - ̀
    - j
    - x
    - '-'
    - ̂
    - L
    - C
    - M
    - y
    - J
    - z
    - A
    - D
    - P
    - '"'
    - '>'
    - <
    - E
    - '!'
    - N
    - S
    - Q
    - '1'
    - ;
    - '?'
    - ':'
    - R
    - I
    - T
    - B
    - V
    - œ
    - '6'
    - O
    - (
    - _
    - )
    - '2'
    - '3'
    - H
    - '4'
    - ^
    - '9'
    - '8'
    - '7'
    - F
    - '0'
    - G
    - '5'
    - ̧
    - U
    - '&'
    - '['
    - ']'
    - °
    - ̈
    - k
    - $
    - w
    - X
    - W
    - Y
    - +
    - Z
  automatically-aligned: false
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: CREMMA Manuscrits du 20e
  url: https://github.com/HTR-United/CREMMA-MSS-20
  project-name: CREMMA
  authors:
  - name: Clérice
    surname: Thibault
    roles:
    - project-manager
    - quality-control
  - name: Chagué
    surname: Alix
    roles:
    - project-manager
    - quality-control
  description: Manuscripts of the 20th century
  language:
  - fra
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '1900'
    notAfter: '1999'
  hands:
    count: 1-per-folder
    precision: exact
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  volume:
  - metric: characters
    count: 5764
  - metric: files
    count: 13
  - metric: lines
    count: 224
  - metric: regions
    count: 25
  transcription-guidelines: Abbreviations preserved.
  production-software: eScriptorium + Kraken
  characters:
    mode: NFKD
    members:
    - e
    - a
    - s
    - n
    - t
    - r
    - i
    - u
    - l
    - o
    - d
    - c
    - m
    - p
    - ́
    - <
    - '>'
    - "'"
    - v
    - q
    - ','
    - .
    - ̀
    - b
    - g
    - h
    - j
    - f
    - F
    - J
    - '1'
    - '-'
    - ̂
    - M
    - A
    - E
    - x
    - T
    - y
    - C
    - D
    - ^
    - O
    - '8'
    - N
    - '7'
    - B
    - S
    - '0'
    - ̧
    - P
    - G
    - R
    - H
    - L
    - '9'
    - z
    - I
    - '2'
    - ':'
    - U
    - '&'
    - k
    - +
    - ;
    - $
    - V
    - œ
    - '['
    - '?'
    - ']'
    - '4'
    - '3'
    - (
    - )
    - '6'
  automatically-aligned: false
- authors:
  - name: Clérice
    orcid: 0000-0003-1852-9204
    roles:
    - transcriber
    - aligner
    - project-manager
    - quality-control
    surname: Thibault
  - name: Chagué
    orcid: 0000-0002-0136-4434
    roles:
    - project-manager
    surname: Alix
  - name: Vlachou Efstathiou
    orcid: 0000-0002-9397-356X
    roles:
    - transcriber
    - aligner
    surname: Malamatenia
  characters:
    members:
    - i
    - e
    - t
    - a
    - u
    - s
    - ̃
    - o
    - n
    - r
    - c
    - d
    - m
    - l
    - p
    - .
    - ̾
    - q
    - b
    - g
    - f
    - ⁊
    - 
    - ͣ
    - h
    - ꝰ
    - ꝑ
    - ͥ
    - x
    - ł
    - ᷑
    - ᷤ
    - ͦ
    - ꝙ
    - ꝯ
    - I
    - ':'
    - ͤ
    - ͭ
    - ꝵ
    - ꝓ
    - S
    - ͫ
    - ¶
    - ẜ
    - E
    - U
    - A
    - ͨ
    - C
    - ħ
    - N
    - Q
    - y
    - ꝗ
    - ᵈ
    - D
    - ̵
    - R
    - P
    - ͬ
    - ᷝ
    - M
    - T
    - ꝭ
    - /
    - ^
    - '2'
    - ͧ
    - '&'
    - z
    - ','
    - H
    - O
    - ¬
    - L
    - '1'
    - '3'
    - '4'
    - F
    - '='
    - G
    - ᷠ
    - ÷
    - ℥
    - '5'
    - B
    - '9'
    - Ø
    - ̇
    - Ꝙ
    - '6'
    - ̧
    - X
    - '8'
    - '0'
    - ᵇ
    - k
    - '7'
    - "'"
    - '*'
    - 
    - w
    - '-'
    - Y
    - ́
    - ̈
    - +
    - Z
    - đ
    -  
    - K
    - ⁋
    - ᵖ
    - 
    - ι
    mode: NFD
  description: Ground truth for medieval Latin manuscripts. Formerly `CREMMA-Medieval-LAT`.
  format: Alto-XML
  hands:
    count: 1-per-folder
    precision: exact
  institutions: []
  language:
  - lat
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: CREMMA
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notAfter: '1599'
    notBefore: '1100'
  title: CREMMA Medii Aevi
  transcription-guidelines: 'Not a graphetic/allographetic transcription but rather
    a graphemic one, which preserves the sequence of letters and reduces each form to
    its meaning in an alphabetical system. Abbreviations (e.g. pro, pre, Tironian et,
    "est", etc.) and abbreviation signs are preserved, while ligatures are reduced
    to their component letters. Spaces between letters reproduce the original (e.g.
    in the case of a semi-continuous script). Punctuation is simplified: all two-component
    punctuation marks (e.g. punctus elevatus) are reduced to ":". Rare characters
    such as the "instans" and signs for units of measure (e.g. ounces) have been preserved.'
  url: https://github.com/HTR-United/CREMMA-Medieval-LAT
  volume:
  - count: 263222
    metric: characters
  - count: 121
    metric: files
  - count: 7274
    metric: lines
  - count: 441
    metric: regions
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Clérice, Thibault and Chagué, Alix\
    \ and Vlachou-Efstathiou, Malamatenia},\ndoi = {10.5281/zenodo.7013436},\ntitle\
    \ = {CREMMA Medii Aevi},\nurl = {https://github.com/HTR-United/CREMMA-Medieval-LAT}\n\
    }\n"
  _apa: "Clérice T., Chagué A., Vlachou-Efstathiou M. CREMMA Medii Aevi DOI: 10.5281/zenodo.7013436\
    \ URL: https://github.com/HTR-United/CREMMA-Medieval-LAT\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: CREMMA Early Modern Books
  url: https://github.com/HTR-United/cremma-16-17-print
  project-name: CREMMA
  authors:
  - name: Clérice
    surname: Thibault
    roles:
    - transcriber
    - project-manager
  description: Collection of samples from early printed books, 16th to 17th century,
    in Latin and pre-orthographic French.
  language:
  - frm
  - lat
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notBefore: '1500'
    notAfter: '1779'
  hands:
    count: 1-per-folder
    precision: estimated
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  volume:
  - metric: characters
    count: 84726
  - metric: files
    count: 98
  - metric: lines
    count: 2603
  - metric: regions
    count: 451
  sources:
  - reference: Omnia Andreae Alciati v.c. emblemata cum commentariis
    link: http://pid.emory.edu/ark:/25593/b70rv
  - reference: "15.. Historia de duobus amantibus Eurialo et Lucretia"
    link: https://gallica.bnf.fr/ark:/12148/bpt6k533863
  - reference: "1520 Epigrammata clarissimi disertissimique viri Thomae Mori..."
    link: https://doi.org/10.3931/e-rara-74397
  - reference: "1550 La description de l'isle d'Utopie, oú est comprins"
    link: https://gallica.bnf.fr/ark:/12148/bpt6k6566444g
  - reference: "1779 Zoologia Danica, seu, Animalium Daniae et Norvegiae"
    link: https://archive.org/details/zoologiadanicase01mlle
  - reference: "L'Achileyde de Stace... traduction en vers, avec ..."
    link: https://gallica.bnf.fr/view3if/ga/ark:/12148/bpt6k3103841
  - reference: "1681 Vigiliæ Rhetorum, Et Somnia Poetarvm, Symbolicè"
    link: http://diglib.hab.de/drucke/qun-607-5/start.htm
  - reference: 'Aneau, Barthélemy: Picta Poesis - Lugduni : Pesnot, 1564'
    link: http://diglib.hab.de/drucke/231-5-poet/start.htm
  citation-file-link: https://raw.githubusercontent.com/HTR-United/cremma-16-17-print/main/CITATION.CFF
  transcription-guidelines: Abbreviations are kept, and long s is transcribed as long s (ſ).
  production-software: eScriptorium + Kraken
  characters:
    mode: NFD
    members:
    - e
    - i
    - u
    - a
    - t
    - r
    - n
    - o
    - l
    - ſ
    - m
    - s
    - c
    - d
    - p
    - ','
    - .
    - q
    - b
    - g
    - f
    - h
    - v
    - A
    - I
    - E
    - ¬
    - '&'
    - x
    - S
    - ́
    - ̃
    - y
    - ’
    - C
    - P
    - ̀
    - T
    - R
    - M
    - ':'
    - V
    - æ
    - L
    - N
    - O
    - D
    - 
    - z
    - Q
    - j
    - H
    - G
    - B
    - F
    - '2'
    - ̈
    - '-'
    - '1'
    - "'"
    - œ
    - ;
    - '?'
    - (
    - ̂
    - )
    - '7'
    - U
    - X
    - '3'
    - ο
    - ι
    - α
    - '5'
    - '6'
    - '4'
    - ε
    - ̧
    - ν
    - τ
    - '8'
    - ̓
    - π
    - '9'
    - '!'
    - J
    - '0'
    - ꝰ
    - ς
    - λ
    - υ
    - Y
    - §
    - ꝙ
    - Æ
    - σ
    - Α
    - ω
    - ']'
    - Z
    - /
    - ρ
    - k
    - Ο
    - Ν
    - η
    - ͂
    - μ
    - κ
    - '*'
    - K
    - Υ
    - δ
    - θ
    - ꝗ
    - ℟
    - Ε
    - Ρ
    - Ω
    - Π
    - Ι
    - Τ
    - φ
    - ł
    - ̊
    - Μ
    - Θ
    - Σ
    - Β
    - Λ
    - γ
    - '|'
    - ½
    - ̰
    -  
    - ̔
    - χ
    - ϛ
    - ß
    - ͅ
    - Γ
    - Δ
    - W
    - Χ
    - ξ
    - 
    - '#'
  automatically-aligned: false
- authors:
  - name: Pinche
    orcid: 0000-0002-7843-5050
    roles:
    - transcriber
    - aligner
    - project-manager
    - quality-control
    - support
    surname: Ariane
  - name: Camps
    roles:
    - transcriber
    surname: Jean-Baptiste
  - name: Mariotti
    roles:
    - transcriber
    surname: Viola
  - name: Nolibois
    roles:
    - transcriber
    surname: Alice
  - name: Carnaille
    roles:
    - transcriber
    surname: Camille
  - name: Deleville
    roles:
    - transcriber
    surname: Prunelle
  - name: Lecomte
    roles:
    - transcriber
    surname: Sophie
  - name: Meylan
    roles:
    - transcriber
    surname: Aminoel
  - name: Ventura
    roles:
    - transcriber
    surname: Simone
  - name: Dugaz
    roles:
    - transcriber
    surname: Lucien
  characters:
    members:
    - e
    - i
    - s
    - t
    - n
    - a
    - r
    - u
    - o
    - l
    - c
    - d
    - m
    - p
    - .
    - q
    - f
    - g
    - ̃
    - z
    - b
    - h
    - ⁊
    - y
    - ':'
    - E
    - x
    - Q
    - L
    - S
    - ꝑ
    - D
    - ̾
    - ͥ
    - C
    - ꝯ
    - ͣ
    - A
    - I
    - M
    - "'"
    - ꝰ
    - ́
    - T
    - P
    - O
    - k
    - N
    - '9'
    - U
    - ͬ
    - G
    - R
    - ᷑
    - F
    - 
    - ͤ
    - '&'
    - '1'
    - B
    - ꝓ
    - H
    - ͦ
    - ᷤ
    - '7'
    - '2'
    - Λ
    - ÷
    - ł
    - '6'
    - '0'
    - '3'
    - '8'
    - '4'
    - ̽
    - w
    - '-'
    - '5'
    - ','
    - ͭ
    - ¶
    - Y
    - ẜ
    -  
    - ⟦
    - ⟧
    - ͨ
    - ̈
    - X
    - ħ
    - K
    - δ
    - /
    - ŧ
    - j
    mode: NFD
  citation-file-link: https://github.com/HTR-United/cremma-medieval/blob/main/citation.cff
  description: Transcription corpora for training HTR models for medieval manuscripts
    from the 12th to the 15th century.
  format: Alto-XML
  hands:
    count: 1-per-folder
    precision: exact
  language:
  - fra
  - fro
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: CremmaLab
  project-website: https://cremmalab.hypotheses.org
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notAfter: '1499'
    notBefore: '1100'
  title: Cremma Medieval
  transcription-guidelines: "As the data come from different projects, transcriptions\
    \ have been standardized to strengthen HTR models. We chose a graphemic transcription\
    \ method, following D. Stutzmann's definitions (see bibliography), so that each sign\
    \ in the image corresponds to a sign in our text: all abbreviations are kept, and\
    \ u/v and i/j are not distinguished. Spaces are not represented homogeneously in\
    \ the dataset: some transcriptions reproduce the manuscript spacing while others\
    \ use lexical spaces. It must be stressed that spaces are the most important source\
    \ of error in medieval HTR models. Most of the transcriptions follow the layout\
    \ segmentation of the SegmOnto ontology (https://github.com/SegmOnto/examples),\
    \ separating the main column, margins, numbering, drop capitals, etc. All the recommendations\
    \ are described in the following document: Ariane Pinche, Guide de transcription\
    \ pour les manuscrits du Xe au XVe siècle, 2022, ⟨hal-03697382⟩, online: <https://hal.archives-ouvertes.fr/hal-03697382>."
  url: https://github.com/HTR-United/cremma-medieval
  volume:
  - count: 612134
    metric: characters
  - count: 279
    metric: files
  - count: 22913
    metric: lines
  - count: 1889
    metric: regions
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Pinche, Ariane},\ndoi = {10.5281/zenodo.5235185},\n\
    month = {6},\ntitle = {Cremma Medieval},\nurl = {https://github.com/HTR-United/cremma-medieval},\n\
    year = {2022}\n}\n"
  _apa: "Pinche A. (2022). Cremma Medieval (version Bicerin 1.1.0). DOI: 10.5281/zenodo.5235185\
    \ URL: https://github.com/HTR-United/cremma-medieval\n"
- authors:
  - name: Chagué
    orcid: 0000-0002-0136-4434
    roles:
    - project-manager
    - quality-control
    - digitization
    - support
    surname: Alix
  - name: Clérice
    orcid: 0000-0003-1852-9204
    roles:
    - project-manager
    - quality-control
    surname: Thibault
  - name: Van Kote
    roles:
    - aligner
    - transcriber
    surname: Elsa
  - name: Carrow
    roles:
    - aligner
    - transcriber
    - support
    surname: Jennifer
  - name: Wissam
    roles:
    - support
    surname: Antoum
  - name: Yann
    roles:
    - support
    surname: Audin
  - name: Anne
    roles:
    - support
    surname: Baillot
  - name: Marlène
    roles:
    - support
    surname: Baron
  - name: Alexandre
    roles:
    - support
    surname: Bartz
  - name: Rachel
    roles:
    - support
    surname: Bawden
  - name: Alice
    roles:
    - support
    surname: Beaudry-Lagarde
  - name: Rishika
    roles:
    - support
    surname: Bhagwatkar
  - name: Federico
    roles:
    - support
    surname: Boschetti
  - name: Camille
    roles:
    - support
    surname: Bourgeois
  - name: Alice
    roles:
    - support
    surname: Brenon
  - name: William
    roles:
    - support
    surname: Brubacher
  - name: Donovan
    roles:
    - support
    surname: Brunot
  - name: Roxanne
    roles:
    - support
    surname: Brusseau
  - name: Talitha
    roles:
    - support
    surname: Bueno Mottes
  - name: Zoé
    roles:
    - support
    surname: Cappe
  - name: Roman
    roles:
    - support
    surname: Castagné
  - name: Galo
    roles:
    - support
    surname: Castillo
  - name: Brigitte
    roles:
    - support
    surname: Chagué
  - name: Denis
    roles:
    - support
    surname: Chagué
  - name: Emeric
    roles:
    - support
    surname: Chagué
  - name: Léa
    roles:
    - support
    surname: Charette
  - name: Emmanuel
    roles:
    - support
    surname: Chateau
  - name: Jean-Baptiste
    roles:
    - support
    surname: Chaudron
  - name: Anna
    roles:
    - support
    surname: Chepaikina
  - name: Floriane
    roles:
    - support
    surname: Chiffoleau
  - name: Kelly
    roles:
    - support
    surname: Christensen
  - name: Federico
    roles:
    - support
    surname: Cuartas Aristizabal
  - name: Maria Laura
    roles:
    - support
    surname: Cucciniello
  - name: Aurore
    roles:
    - support
    surname: Cuéllar
  - name: Baudoin
    roles:
    - support
    surname: Davoury
  - name: Eric
    roles:
    - support
    surname: de la Clergerie
  - name: Roch
    roles:
    - support
    surname: Delanney
  - name: Camille
    roles:
    - support
    surname: Delattre
  - name: Béatrice
    roles:
    - support
    surname: Denis
  - name: Philippe
    roles:
    - support
    surname: Deschamps
  - name: Valentine
    roles:
    - support
    surname: Desmorat
  - name: Cindy
    roles:
    - support
    surname: Dionisio
  - name: Amélie
    roles:
    - support
    surname: Disant
  - name: Elsa
    roles:
    - support
    surname: Dufourg
  - name: Jean-Luc
    roles:
    - support
    surname: Falcone
  - name: Margaux
    roles:
    - support
    surname: Faure
  - name: Glenda
    roles:
    - support
    surname: Ferbeyre Rodriguez
  - name: Giulia
    roles:
    - support
    surname: Ferretti
  - name: Fabien
    roles:
    - support
    surname: Fizaine
  - name: Jeanne
    roles:
    - support
    surname: Flamant
  - name: Clémence
    roles:
    - support
    surname: Foisy-Marquis
  - name: Anna
    roles:
    - support
    surname: Fröhlich
  - name: Anne
    roles:
    - support
    surname: Garcia Fernancez
  - name: Vincent
    roles:
    - support
    surname: Giovannangeli
  - name: Gabrielle
    roles:
    - support
    surname: Grondin
  - name: Morgane
    roles:
    - support
    surname: Guichard
  - name: Jessica
    roles:
    - support
    surname: Guiraud
  - name: Anahi
    roles:
    - support
    surname: Haedo
  - name: Pauline
    roles:
    - support
    surname: Hennequart
  - name: Yanet
    roles:
    - support
    surname: Hernandez Pedroza
  - name: Lucence
    roles:
    - support
    surname: Ing
  - name: Pauline
    roles:
    - support
    surname: Jacsont
  - name: Juliette
    roles:
    - support
    surname: Janes
  - name: Corinne
    roles:
    - support
    surname: Jeanne
  - name: Arilys
    roles:
    - support
    surname: Jia
  - name: Vincent
    roles:
    - support
    surname: Jolivet
  - name: Katrina
    roles:
    - support
    surname: Kaustina
  - name: Ben
    roles:
    - support
    surname: Kiessling
  - name: Ozcar
    roles:
    - support
    surname: Koc
  - name: Lena
    roles:
    - support
    surname: Krause
  - name: Gabriel
    roles:
    - support
    surname: Labrie
  - name: Amélie
    roles:
    - support
    surname: Lapointe
  - name: David
    roles:
    - support
    surname: Lassner
  - name: Emmanuelle
    roles:
    - support
    surname: Lescouet
  - name: Danny
    roles:
    - support
    surname: Létourneau
  - name: Marie-Françoise
    roles:
    - support
    surname: Limon-Bonnet
  - name: Gabrielle
    roles:
    - support
    surname: Lodi
  - name: Victoria
    roles:
    - support
    surname: Lupascu
  - name: Elsa
    roles:
    - support
    surname: Marguin-Hamon
  - name: Orestis
    roles:
    - support
    surname: Marinamis
  - name: Gina
    roles:
    - support
    surname: Mars
  - name: Eugénie
    roles:
    - support
    surname: Matthey-Jonais
  - name: Dilson
    roles:
    - support
    surname: Mayunga
  - name: Margot
    roles:
    - support
    surname: Mellet
  - name: Matt
    roles:
    - support
    surname: Moskal
  - name: Shannon
    roles:
    - support
    surname: Moskal
  - name: Zoé
    roles:
    - support
    surname: Mozin
  - name: Lydia
    orcid: 0009-0009-7082-4711
    roles:
    - support
    surname: Nishimwe
  - name: Jade
    roles:
    - support
    surname: Norindr
  - name: Jules
    roles:
    - support
    surname: Nuguet
  - name: Sarah
    roles:
    - support
    surname: Orsini
  - name: Pedro
    roles:
    - support
    surname: Ortiz Suarez
  - name: Kenan
    roles:
    - support
    surname: Oudin
  - name: Gabrielle
    roles:
    - support
    surname: Pannetier-Leboeuf
  - name: Thierry
    roles:
    - support
    surname: Paquet
  - name: Thomas
    roles:
    - support
    surname: Parisot
  - name: Elodie
    roles:
    - support
    surname: Paupe
  - name: Gaël
    roles:
    - support
    surname: Poux
  - name: Montaine
    roles:
    - support
    surname: Prophête
  - name: Alix
    roles:
    - support
    surname: Raoux
  - name: Gaëtan
    roles:
    - support
    surname: Raoux
  - name: Elise
    roles:
    - support
    surname: Razafindrakoto
  - name: Camille
    roles:
    - support
    surname: Rey
  - name: Arij
    roles:
    - support
    surname: Riabi
  - name: Karen
    roles:
    - support
    surname: Ross
  - name: Manon
    roles:
    - support
    surname: Rouillé
  - name: Louise
    roles:
    - support
    surname: Ruby
  - name: Benoît
    roles:
    - support
    surname: Sagot
  - name: Hugo
    roles:
    - support
    surname: Scheithauer
  - name: Anne-Valérie
    roles:
    - support
    surname: Schweyer
  - name: Djamé
    roles:
    - support
    surname: Seddah
  - name: Paula
    roles:
    - support
    surname: Seidel
  - name: Peter
    roles:
    - support
    surname: Stokes
  - name: Yves
    roles:
    - support
    surname: Tadjo
  - name: Lionel
    roles:
    - support
    surname: Tadjou
  - name: Kristin
    roles:
    - support
    surname: Tanton
  - name: Marie
    roles:
    - support
    surname: Tariol
  - name: Rian
    roles:
    - support
    surname: Touchent
  - name: Anne-Kim
    roles:
    - support
    surname: Tremblay
  - name: Pierre
    roles:
    - support
    surname: Vauterin
  - name: Mathilde
    roles:
    - support
    surname: Verstraete
  - name: Magalie
    roles:
    - support
    surname: Vetter
  - name: Marcello
    roles:
    - support
    surname: Vitali Rosati
  - name: Malamatenia
    roles:
    - support
    surname: Vlachou-Efstathiou
  - name: Rosanne
    roles:
    - support
    surname: Wingert
  - name: Débora
    roles:
    - support
    surname: Yi
  - name: Antoine
    roles:
    - support
    surname: ''
  - name: Camille
    roles:
    - support
    surname: ''
  - name: Manon
    roles:
    - support
    surname: ''
  - name: Yohan
    roles:
    - support
    surname: ''
  characters:
    members:
    - e
    - a
    - n
    - i
    - s
    - t
    - r
    - l
    - o
    - u
    - d
    - c
    - m
    - p
    - ́
    - ','
    - g
    - h
    - v
    - f
    - .
    - b
    - ̀
    - "'"
    - q
    - '1'
    - L
    - y
    - '0'
    - C
    - '9'
    - E
    - S
    - '2'
    - '-'
    - A
    - (
    - )
    - I
    - x
    - k
    - M
    - P
    - R
    - j
    - B
    - '8'
    - T
    - N
    - D
    - ̂
    - '6'
    - '4'
    - O
    - G
    - '3'
    - '5'
    - '7'
    - F
    - H
    - U
    - w
    - V
    - '='
    - z
    - ̧
    - J
    - ':'
    - ̈
    - W
    - K
    - '>'
    - <
    - '"'
    - «
    - »
    - Y
    - X
    - '['
    - ']'
    - ^
    - /
    - ſ
    - ̄
    - ;
    - Q
    - Z
    - œ
    - ̌
    - '!'
    - ’
    - ø
    - ̃
    - '%'
    - '&'
    - –
    - ɛ
    - ̊
    - °
    - ß
    - ɹ
    - —
    - Æ
    - ²
    - ̆
    - ᑕ
    - '#'
    - ə
    - €
    - …
    - ł
    -  
    - ɑ
    - ɔ
    - ʁ
    mode: NFD
  description: "The CREMMA-WIKIPEDIA project aims at creating a collection of ground\
    \ truth to train HTR models on contemporary French handwriting.\n\nEach image\
    \ represents an excerpt from a randomly selected Wikipedia page, copied by hand\
    \ by volunteers. We then aligned the handwritten portion with the original text,\
    \ which is also present in the image."
  format: Alto-XML
  hands:
    count: 1-per-file
    precision: estimated
  institutions:
  - name: 6e-1 du Collège Martin-Luther-King de Charvieu-Chavagneux
    roles:
    - support
  language:
  - fra
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: CREMMA
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notAfter: '2023'
    notBefore: '2022'
  title: CREMMA WIKIPEDIA
  transcription-guidelines: "The transcription guidelines follow CREMMA's conventions\
    \ (https://gist.github.com/alix-tz/6f89444521bf1cab0522da520f7e4ff4). In short:\
    \ superscript text is preceded by a ^. Struck-through text is transcribed as\
    \ \"><\" when unreadable and as \">word<\" when readable. The texts to copy could\
    \ include phonetic transcriptions. Non-French letters and diacritics were transcribed\
    \ as well."
  url: https://github.com/HTR-United/cremma-wikipedia
  volume:
  - count: 99680
    metric: characters
  - count: 350
    metric: files
  - count: 1971
    metric: lines
  - count: 351
    metric: regions
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Chagué, Alix and Clérice, Thibault\
    \ and Van Kote, Elsa and Carrow, Jennifer and Antoum, Wissam and Audin, Yann and\
    \ Baillot, Anne and Baron, Marlène and Bartz, Alexandre and Bawden, Rachel and\
    \ Beaudry-Lagarde, Alice and Bhagwatkar, Rishika and Boschetti, Federico and Bourgeois,\
    \ Camille and Brenon, Alice and Brubacher, William and Brunot, Donovan and Brusseau,\
    \ Roxanne and Bueno Mottes, Talitha and Cappe, Zoé and Castagné, Roman and Castillo,\
    \ Galo and Chagué, Brigitte and Chagué, Denis and Chagué, Emeric and Charette,\
    \ Léa and Chateau, Emmanuel and Chaudron, Jean-Baptiste and Chepaikina, Anna and\
    \ Chiffoleau, Floriane and Christensen, Kelly and Cuartas Aristizabal, Federico\
    \ and Cucciniello, Maria Laura and Cuéllar, Aurore and Davoury, Baudoin and de\
    \ la Clergerie, Eric and Delanney, Roch and Delattre, Camille and Denis, Béatrice\
    \ and Deschamps, Philippe and Desmorat, Valentine and Dionisio, Cindy and Disant,\
    \ Amélie and Dufourg, Elsa and Falcone, Jean-Luc and Faure, Margaux and Ferbeyre\
    \ Rodriguez, Glenda and Ferretti, Giulia and Fizaine, Fabien and Flamant, Jeanne\
    \ and Foisy-Marquis, Clémence and Fröhlich, Anna and Garcia Fernancez, Anne and\
    \ Giovannangeli, Vincent and Grondin, Gabrielle and Guichard, Morgane and Guiraud,\
    \ Jessica and Haedo, Anahi and Hennequart, Pauline and Hernandez Pedroza, Yanet\
    \ and Ing, Lucence and Jacsont, Pauline and Janes, Juliette and Jeanne, Corinne\
    \ and Jia, Arilys and Jolivet, Vincent and Kaustina, Katrina and Kiessling, Ben\
    \ and Koc, Ozcar and Krause, Lena and Labrie, Gabriel and Lapointe, Amélie and\
    \ Lassner, David and Lescouet, Emmanuelle and Létourneau, Danny and Limon-Bonnet,\
    \ Marie-Françoise and Lodi, Gabrielle and Lupascu, Victoria and Marguin-Hamon,\
    \ Elsa and Marinamis, Orestis and Mars, Gina and Matthey-Jonais, Eugénie and Mayunga,\
    \ Dilson and Mellet, Margot and Moskal, Matt and Moskal, Shannon and Mozin, Zoé\
    \ and Nishimwe, Lydia and Norindr, Jade and Nuguet, Jules and Orsini, Sarah and\
    \ Ortiz Suarez, Pedro and Oudin, Kenan and Pannetier-Leboeuf, Gabrielle and Paquet,\
    \ Thierry and Parisot, Thomas and Paupe, Elodie and Poux, Gaël and Prophête, Montaine\
    \ and Raoux, Alix and Raoux, Gaëtan and Razafindrakoto, Elise and Rey, Camille\
    \ and Riabi, Arij and Ross, Karen and Rouillé, Manon and Ruby, Louise and Sagot,\
    \ Benoît and Scheithauer, Hugo and Schweyer, Anne-Valérie and Seddah, Djamé and\
    \ Seidel, Paula and Stokes, Peter and Tadjo, Yves and Tadjou, Lionel and Tanton,\
    \ Kristin and Tariol, Marie and Touchent, Rian and Tremblay, Anne-Kim and Vauterin,\
    \ Pierre and Verstraete, Mathilde and Vetter, Magalie and Vitali Rosati, Marcello\
    \ and Vlachou-Efstathiou, Malamatenia and Wingert, Rosanne and Yi, Débora and other\
    \ anonymous contributors},\ndoi = {10.5281/zenodo.7782065},\nmonth = {3},\ntitle\
    \ = {CREMMA WIKIPEDIA},\nurl = {https://github.com/HTR-United/cremma-wikipedia},\n\
    year = {2023}\n}\n"
  _apa: "Chagué A., Clérice T., Van Kote E., Carrow J., Antoum W., Audin Y., Baillot\
    \ A., Baron M., Bartz A., Bawden R., Beaudry-Lagarde A., Bhagwatkar R., Boschetti\
    \ F., Bourgeois C., Brenon A., Brubacher W., Brunot D., Brusseau R., Bueno Mottes\
    \ T., Cappe Z., Castagné R., Castillo G., Chagué B., Chagué D., Chagué E., Charette\
    \ L., Chateau E., Chaudron J., Chepaikina A., Chiffoleau F., Christensen K., Cuartas\
    \ Aristizabal F., Cucciniello M.L., Cuéllar A., Davoury B., de la Clergerie E.,\
    \ Delanney R., Delattre C., Denis B., Deschamps P., Desmorat V., Dionisio C.,\
    \ Disant A., Dufourg E., Falcone J., Faure M., Ferbeyre Rodriguez G., Ferretti\
    \ G., Fizaine F., Flamant J., Foisy-Marquis C., Fröhlich A., Garcia Fernancez\
    \ A., Giovannangeli V., Grondin G., Guichard M., Guiraud J., Haedo A., Hennequart\
    \ P., Hernandez Pedroza Y., Ing L., Jacsont P., Janes J., Jeanne C., Jia A., Jolivet\
    \ V., Kaustina K., Kiessling B., Koc O., Krause L., Labrie G., Lapointe A., Lassner\
    \ D., Lescouet E., Létourneau D., Limon-Bonnet M., Lodi G., Lupascu V., Marguin-Hamon\
    \ E., Marinamis O., Mars G., Matthey-Jonais E., Mayunga D., Mellet M., Moskal\
    \ M., Moskal S., Mozin Z., Nishimwe L., Norindr J., Nuguet J., Orsini S., Ortiz\
    \ Suarez P., Oudin K., Pannetier-Leboeuf G., Paquet T., Parisot T., Paupe E.,\
    \ Poux G., Prophête M., Raoux A., Raoux G., Razafindrakoto E., Rey C., Riabi A.,\
    \ Ross K., Rouillé M., Ruby L., Sagot B., Scheithauer H., Schweyer A., Seddah\
    \ D., Seidel P., Stokes P., Tadjo Y., Tadjou L., Tanton K., Tariol M., Touchent\
    \ R., Tremblay A., Vauterin P., Verstraete M., Vetter M., Vitali Rosati M., Vlachou-Estathiou\
    \ M., Wingert R., Yi D., other anonymous contributors (2023). CREMMA WIKIPEDIA\
    \ (version 1.0.3). DOI: 10.5281/zenodo.7782065 URL: https://github.com/HTR-United/cremma-wikipedia\n"
- authors:
  - name: Chiffoleau
    roles:
    - project-manager
    - aligner
    surname: Floriane
  characters:
    members:
    - e
    - s
    - a
    - n
    - r
    - i
    - t
    - u
    - o
    - l
    - d
    - c
    - m
    - p
    - ́
    - ','
    - v
    - .
    - f
    - q
    - g
    - ̀
    - '-'
    - E
    - b
    - ’
    - "'"
    - h
    - A
    - L
    - N
    - x
    - j
    - S
    - R
    - I
    - T
    - M
    - ̂
    - C
    - P
    - y
    - O
    - ;
    - '1'
    - £
    - U
    - D
    - B
    - F
    - J
    - G
    - '"'
    - '0'
    - z
    - V
    - '9'
    - '2'
    - ':'
    - X
    -  
    - €
    - H
    - '5'
    - '!'
    - '3'
    - '4'
    - ̧
    - °
    - W
    - Y
    - '6'
    - '8'
    - '?'
    - '7'
    - K
    - Q
    - /
    - (
    - )
    - k
    - œ
    - w
    - ̈
    - …
    - Z
    - –
    - '&'
    - '%'
    - '='
    - $
    - _
    mode: NFD
  description: OCR ground-truth dataset based on 20th-century French typewritten letters
  format: Alto-XML
  hands:
    count: less-than-11
    precision: exact
  language:
  - fra
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: DAHN
  project-website: https://digitalintellectuals.hypotheses.org/category/dahn
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notAfter: '1924'
    notBefore: '1914'
  title: DAHN Corpus
  url: https://github.com/HTR-United/dahncorpus
  volume:
  - count: 475849
    metric: characters
  - count: 547
    metric: files
  - count: 12539
    metric: lines
  - count: 527
    metric: pages
  - count: 547
    metric: regions
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Chiffoleau, Floriane},\ndoi = {10.5281/zenodo.5911868},\n\
    month = {3},\ntitle = {dahncorpus},\nurl = {https://github.com/HTR-United/dahncorpus},\n\
    year = {2021}\n}\n"
  _apa: "Chiffoleau F. (2021). dahncorpus (version 1.0.0). DOI: 10.5281/zenodo.5911868\
    \ URL: https://github.com/HTR-United/dahncorpus\n"
- authors:
  - name: Limon-Bonnet
    roles:
    - transcriber
    - aligner
    - quality-control
    surname: Françoise
  - name: Chagué
    roles:
    - support
    - project-manager
    - quality-control
    surname: Alix
  - name: Rostaing
    roles:
    - project-manager
    surname: Aurélia
  characters:
    members:
    - e
    - t
    - a
    - /
    - '0'
    - c
    - n
    - r
    - m
    - h
    - p
    - s
    - o
    - g
    - '5'
    - '7'
    - '1'
    - E
    - .
    - i
    - '-'
    - '3'
    - '9'
    - '2'
    - f
    - d
    - '8'
    - <
    - l
    - '{'
    - ':'
    - P
    - A
    - G
    - '}'
    - U
    - x
    - '>'
    - b
    - '4'
    - '6'
    mode: NFD
  citation-file-link: https://raw.githubusercontent.com/HTR-United/lectaurep-bronod/master/CITATION.cff
  description: "Ground truth for the registers of Maître Bronod, a notary in Paris\
    \ during the 18th century.\n"
  format: Page-XML
  hands:
    count: '1'
    precision: exact
  language:
  - fra
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: LECTAUREP
  project-website: https://lectaurep.hypotheses.org/
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-manuscript
  sources:
  - link: ''
    reference: Limon-Bonnet, M. (2021). Lectaurep-Bronod, ground truth for Maitre
      Bronod's documents (French XVIIIth century) (Version 1.0) [Computer software].
      https://doi.org/10.5072/zenodo.977735
  time:
    notAfter: '1745'
    notBefore: '1742'
  title: Notaires de Paris - Bronod
  transcription-guidelines: "Faithful transcription of the manuscripts: capitalization\
    \ and abbreviations are preserved. Superscript portions of text are preceded\
    \ by a `^` symbol. Long s, where present, receives no special treatment.\n"
  url: https://github.com/HTR-United/lectaurep-bronod
  volume:
  - count: 359094
    metric: characters
  - count: 100
    metric: files
  - count: 3702
    metric: lines
  - count: 200
    metric: pages
  - count: 296
    metric: regions
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Limon-Bonnet, Marie-Françoise and\
    \ Chagué, Alix and Rostaing, Aurélia},\ndoi = {10.5281/zenodo.10631355},\nmonth\
    \ = {2},\ntitle = {Lectaurep-Bronod, ground truth for Maitre Bronod's documents\
    \ (French XVIIIth century)},\nurl = {https://lectaurep.hypotheses.org/},\nyear\
    \ = {2024}\n}\n"
  _apa: "Limon-Bonnet M., Chagué A., Rostaing A. (2024). Lectaurep-Bronod, ground\
    \ truth for Maitre Bronod's documents (French XVIIIth century) DOI: 10.5281/zenodo.10631355\
    \ URL: https://lectaurep.hypotheses.org/\n"
- authors:
  - name: Denis
    roles:
    - transcriber
    - aligner
    surname: Nathalie
  - name: Rostaing
    roles:
    - project-manager
    - quality-control
    - support
    surname: Aurélia
  - name: Chagué
    roles:
    - project-manager
    - quality-control
    - support
    surname: Alix
  characters:
    members:
    - e
    - t
    - /
    - a
    - c
    - '0'
    - n
    - r
    - m
    - h
    - p
    - s
    - o
    - g
    - '1'
    - '7'
    - '2'
    - E
    - .
    - i
    - '-'
    - f
    - '9'
    - d
    - '8'
    - '5'
    - <
    - l
    - '{'
    - ':'
    - P
    - A
    - G
    - '}'
    - U
    - x
    - '>'
    - b
    - '4'
    - '6'
    - '3'
    mode: NFD
  citation-file-link: https://raw.githubusercontent.com/HTR-United/lectaurep-mariages-et-divorces/main/CITATION.cff
  description: "Ground truth for the Registres des Contrats de Mariages et des Séparations\
    \ et Divorces in Paris. The documents were written in French during the 19th century\
    \ and contain many names and addresses. The information is organized in tables\
    \ spread across two pages; the table headers and the preamble are printed.\n"
  format: Page-XML
  hands:
    count: more-than-10
    precision: estimated
  language:
  - fra
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: LECTAUREP
  project-website: https://lectaurep.hypotheses.org/
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: mainly-manuscript
  sources:
  - link: ''
    reference: 'Rostaing, A., Denis, N., & Chagué, A. (2021). Lectaurep-Mariages-et-Divorces:
      ground truth for the Enregistrements des Contrats de Mariages et des Séparations
      et Divorces in Paris (French 19th century)  (Version 1.0) [Computer software].
      https://doi.org/10.5072/zenodo.977697'
  time:
    notAfter: '1928'
    notBefore: '1829'
  title: Notaires de Paris - Mariages et Divorces
  transcription-guidelines: "The transcription respects what is written (abbreviations\
    \ are not expanded, capitalization follows 19th-century practice). Superscript\
    \ portions of text are marked with `^`, and many signatures are transcribed as\
    \ ¥. Lines containing printed text are assigned the type `printed` and signatures\
    \ the type `signature`, so both can be removed from the dataset if necessary.\n"
  url: https://github.com/HTR-United/lectaurep-mariages-et-divorces
  volume:
  - count: 1969488
    metric: characters
  - count: 104
    metric: files
  - count: 20304
    metric: lines
  - count: 105
    metric: pages
  - count: 324
    metric: regions
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Rostaing, Aurélia and Denis, Nathalie\
    \ and Chagué, Alix},\ndoi = {10.5281/zenodo.10632593},\nmonth = {2},\ntitle =\
    \ {Lectaurep-Mariages-et-Divorces: ground truth for the Enregistrements des Contrats\
    \ de Mariages et des Séparations et Divorces in Paris (French 19th century) },\n\
    url = {https://github.com/HTR-United/lectaurep-mariages-et-divorces},\nyear =\
    \ {2024}\n}\n"
  _apa: "Rostaing A., Denis N., Chagué A. (2024). Lectaurep-Mariages-et-Divorces:\
    \ ground truth for the Enregistrements des Contrats de Mariages et des Séparations\
    \ et Divorces in Paris (French 19th century)  (version 2.0). DOI: 10.5281/zenodo.10632593\
    \ URL: https://github.com/HTR-United/lectaurep-mariages-et-divorces\n"
- authors:
  - name: Durand
    roles:
    - transcriber
    - aligner
    surname: Marc
  - name: Rostaing
    roles:
    - transcriber
    - project-manager
    - quality-control
    surname: Aurélia
  - name: Chagué
    roles:
    - project-manager
    - quality-control
    - support
    surname: Alix
  characters:
    members:
    - e
    - r
    - a
    - i
    - n
    - t
    - o
    - u
    - s
    - d
    - l
    - c
    - p
    - '1'
    - m
    - S
    - ̀
    - ','
    - E
    - ́
    - '2'
    - P
    - .
    - M
    - '0'
    - A
    - C
    - '5'
    - '3'
    - h
    - T
    - v
    - g
    - D
    - '7'
    - )
    - (
    - R
    - N
    - f
    - I
    - b
    - L
    - '8'
    - '9'
    - ^
    - '4'
    - '6'
    - B
    - O
    - J
    - V
    - y
    - "'"
    - G
    - F
    - '-'
    - x
    - q
    - °
    - H
    - ̂
    - U
    - '"'
    - X
    - '&'
    - z
    - ;
    - ̧
    - ':'
    - j
    - +
    - Q
    - '|'
    - ̈
    - /
    - k
    - '='
    - '%'
    - W
    - K
    - Y
    - Z
    - w
    - '~'
    - ¥
    - ȼ
    - _
    - €
    - '`'
    - '['
    - ']'
    - œ
    - '?'
    - '*'
    - ̃
    - '>'
    - ½
    mode: NFD
  citation-file-link: https://github.com/HTR-United/lectaurep-repertoires/raw/main/CITATION.cff
  description: Ground truth for various Parisian registries of notary deeds written
    in French during the 19th and 20th centuries. The information is organized in
    pre-printed tables (with printed headers) and contains many names, addresses,
    numbers and abbreviations.
  format: Alto-XML
  hands:
    count: more-than-10
    precision: estimated
  language:
  - fra
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: LECTAUREP
  project-website: https://lectaurep.hypotheses.org/
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: mainly-manuscript
  time:
    notAfter: '1939'
    notBefore: '1830'
  title: Notaires de Paris - Répertoires
  url: https://github.com/HTR-United/lectaurep-repertoires
  volume:
  - count: 525786
    metric: characters
  - count: 218
    metric: files
  - count: 29410
    metric: lines
  - count: 1181
    metric: regions
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {LECTAUREP and Rostaing, Aurélia and\
    \ Durand, Marc and Chagué, Alix},\ndoi = {10.5072/zenodo.977691},\nmonth = {12},\n\
    title = {Notaires de Paris - Répertoires, ground truth for various Parisian registries\
    \ of notary deeds (French 19th and 20th centuries)},\nurl = {https://github.com/HTR-United/lectaurep-repertoires},\n\
    year = {2021}\n}\n"
  _apa: "LECTAUREP, Rostaing A., Durand M., Chagué A. (2021). Notaires de Paris -\
    \ Répertoires, ground truth for various Parisian registries of notary deeds (French\
    \ 19th and 20th centuries) (version 2.0.0). DOI: 10.5072/zenodo.977691 URL: https://github.com/HTR-United/lectaurep-repertoires\n"
- authors:
  - name: Chagué
    roles:
    - transcriber
    - project-manager
    surname: Alix
  characters:
    members:
    - e
    - a
    - s
    - n
    - t
    - r
    - i
    - u
    - o
    - l
    - d
    - c
    - m
    - p
    - ́
    - .
    - '~'
    - v
    - ','
    - "'"
    - '-'
    - f
    - g
    - h
    - q
    - b
    - ̀
    - _
    - E
    - L
    - A
    - I
    - C
    - x
    - S
    - M
    - j
    - T
    - ̂
    - R
    - N
    - '1'
    - O
    - P
    - y
    - '"'
    - U
    - J
    - D
    - '2'
    - ':'
    - )
    - (
    - B
    - '0'
    - '5'
    - '3'
    - '4'
    - z
    - '6'
    - F
    - H
    - Q
    - '!'
    - '9'
    - G
    - '7'
    - V
    - '8'
    - '?'
    - ⟦
    - ⟧
    - ̧
    - Y
    - ;
    - ’
    - °
    - k
    - X
    - ̈
    - +
    - '='
    - W
    - /
    - K
    - ^
    - w
    - Z
    - '%'
    - '*'
    mode: NFD
  citation-file-link: https://github.com/HTR-United/tapuscorpus/raw/main/citation.cff
  description: Ground truth based on a variety of French typewritten documents from
    the 20th century. Contains excerpts from plays, poems, letters and administrative reports.
  format: Page-XML
  hands:
    count: 1-per-folder
    precision: exact
  language:
  - fra
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: HTR-United
  project-website: https://htr-united.github.io/
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-typed
  sources:
  - link: ''
    reference: Chagué, A. (2021). Tapuscorpus (Version 1.0) [Computer software]. https://doi.org/10.5072/zenodo.977649
  time:
    notAfter: '1999'
    notBefore: '1900'
  title: Tapus Corpus
  transcription-guidelines: See README in repository.
  url: https://github.com/HTR-United/tapuscorpus
  volume:
  - count: 131511
    metric: characters
  - count: 151
    metric: files
  - count: 4376
    metric: lines
  - count: 150
    metric: pages
  - count: 375
    metric: regions
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Chagué, Alix},\ndoi = {10.5072/zenodo.977649},\n\
    month = {12},\ntitle = {Tapuscorpus},\nurl = {https://github.com/HTR-United/tapuscorpus},\n\
    year = {2021}\n}\n"
  _apa: "Chagué A. (2021). Tapuscorpus (version 1.0). DOI: 10.5072/zenodo.977649 URL:\
    \ https://github.com/HTR-United/tapuscorpus\n"
- authors:
  - name: Chagué
    roles:
    - transcriber
    - aligner
    - support
    surname: Alix
  - name: Riondet
    roles:
    - support
    surname: Charles
  - name: Le Fourner
    roles:
    - transcriber
    surname: Victoria
  - name: Bey
    roles:
    - transcriber
    surname: Laura
  - name: Vanneau
    roles:
    - transcriber
    surname: Laurie
  - name: Skilbeck-Gaborit
    roles:
    - transcriber
    surname: Eden
  - name: Meissel
    roles:
    - transcriber
    surname: Nina
  - name: Genero
    roles:
    - aligner
    surname: Jean-Damien
  - name: Champougny
    roles:
    - transcriber
    surname: Kevin
  - name: Albert
    roles:
    - project-manager
    surname: Anaïs
  - name: Martini
    roles:
    - project-manager
    surname: Manuela
  characters:
    members:
    - e
    - n
    - a
    - i
    - t
    - r
    - u
    - s
    - l
    - o
    - d
    - p
    - c
    - m
    - ́
    - v
    - f
    - q
    - x
    - ̀
    - ','
    - g
    - h
    - "'"
    - ;
    - j
    - C
    - b
    - P
    - D
    - ’
    - y
    - B
    - .
    - L
    - M
    - ̂
    - z
    - J
    - A
    - G
    - E
    - '-'
    -  
    - S
    - V
    - '?'
    - T
    - Q
    - F
    - '='
    - R
    - '4'
    - '2'
    - ̧
    - k
    - —
    - '7'
    - W
    - O
    - N
    - '1'
    - '3'
    - '8'
    - '0'
    - '9'
    - ':'
    - –
    - '5'
    - ̈
    - Y
    - K
    - H
    - œ
    - I
    - )
    - (
    - U
    - Z
    - _
    - '@'
    - '!'
    - ‘
    - »
    - '&'
    - '6'
    - ─
    - /
    mode: NFD
  description: Ground truth for 19th-century French pre-printed documents created
    by administrative services.
  format: Page-XML
  hands:
    count: less-than-11
    precision: estimated
  language:
  - fra
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: ANR TIME US
  project-website: https://timeus.hypotheses.org/
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: evenly-mixed
  time:
    notAfter: '1858'
    notBefore: '1858'
  title: TIMEUS Corpus
  url: https://github.com/HTR-United/timeuscorpus
  volume:
  - count: 401304
    metric: characters
  - count: 250
    metric: files
  - count: 7701
    metric: lines
  - count: 159
    metric: pages
  - count: 586
    metric: regions
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Chagué, Alix and Champougny, Kévin\
    \ and Meissel, Nina and Genero, Jean-Damien and Skilbeck-Gaborit, Eden and Vanneau,\
    \ Laurie and Bey, Laura and Le Fourner, Victoria and Albert, Anaïs and Riondet,\
    \ Charles and Martini, Manuela},\ndoi = {10.5281/zenodo.6230755},\ntitle = {Time\
    \ Us Corpus},\nurl = {https://github.com/HTR-United/timeuscorpus}\n}\n"
  _apa: "Chagué A., Champougny K., Meissel N., Genero J., Skilbeck-Gaborit E., Vanneau\
    \ L., Bey L., Le Fourner V., Albert A., Riondet C., Martini M. Time Us Corpus\
    \ DOI: 10.5281/zenodo.6230755 URL: https://github.com/HTR-United/timeuscorpus\n"
- authors:
  - name: Leroy
    orcid: 0000-0002-7843-5050
    roles:
    - transcriber
    surname: Noé
  - name: Pinche
    orcid: 0000-0001-7764-9690
    roles:
    - project-manager
    - quality-control
    surname: Ariane
  - name: Jean-Baptiste
    orcid: 0000-0001-7764-9690
    roles:
    - project-manager
    surname: Camps
  - name: Alix
    orcid: 0000-0002-0136-4434
    roles:
    - project-manager
    surname: Chagué
  - name: Thibault
    orcid: 0000-0003-1852-9204
    roles:
    - project-manager
    surname: Clérice
  automatically-aligned: false
  characters:
    members:
    - e
    - i
    - s
    - t
    - a
    - u
    - n
    - o
    - r
    - l
    - c
    - d
    - m
    - p
    - .
    - ̃
    - f
    - q
    - g
    - h
    - ⁊
    - b
    - z
    - ̾
    - y
    - x
    - E
    - S
    - I
    - Q
    - ͥ
    - C
    - ꝑ
    - ꝯ
    - L
    - D
    - ͣ
    - A
    - R
    - ꝰ
    - k
    - M
    - ','
    - T
    - P
    - N
    - ᷑
    - ':'
    - U
    - O
    - ͤ
    - K
    - ⟦
    - ⟧
    - ᷤ
    - F
    - G
    - B
    - ¶
    - ͦ
    - ^
    - w
    - H
    - 
    - ẜ
    - ł
    - ꝓ
    - '-'
    - ÷
    - '0'
    - '3'
    - '2'
    - '1'
    - ̵
    - '9'
    - ͭ
    - ͬ
    - '5'
    - '4'
    - ͫ
    - ꝙ
    - ꝭ
    - '7'
    - '6'
    - ħ
    - '&'
    - ⁜
    - ᷠ
    - /
    - ˣ
    - ͨ
    - Y
    - ꝵ
    - "'"
    - '8'
    - ͧ
    - W
    - đ
    - ᷝ
    - 
    - '*'
    - ́
    - X
    - ̧
    - ᵈ
    mode: NFD
  description: >-
    Ground truth of Old French and Middle French manuscripts. Manuscripts vary in
    theme, period, etc. Most manuscripts have at most 10 columns transcribed.
  format: Alto-XML
  hands:
    count: 1-per-folder
    precision: estimated
  institutions: []
  language:
  - fro
  license:
    name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: HTRomance
  project-website: https://htromance-project.github.io
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notAfter: '1499'
    notBefore: '1200'
  title: HTRomance, Medieval French corpus of ground-truth for Handwritten Text Recognition
    and Layout Segmentation
  transcription-guidelines: >2-

    The transcription guidelines are described in a paper available on
    [HAL](https://hal-enc.archives-ouvertes.fr/hal-03828353) and published at the
    Journal for Open Humanities Data. It provides specific details about the
    selection process, the transcription methods and choices, as well as details
    about output (mainly the [Generic CREMMA Model for Medieval Manuscripts (Latin
    and Old French)](https://zenodo.org/record/7234166#.Y7f69afMJhE) for
    [Kraken](https://kraken.re))
  url: https://github.com/HTRomance-Project/medieval-french
  volume:
  - count: 300070
    metric: characters
  - count: 138
    metric: files
  - count: 10385
    metric: lines
  - count: 789
    metric: regions
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Leroy, Noé and Pinche, Ariane and\
    \ Camps, Jean-Baptiste and Clérice, Thibault and Chagué, Alix},\ntitle = {HTRomance,\
    \ Medieval French corpus of ground-truth for Handwritten Text Recognition and\
    \ Layout Segmentation},\nurl = {https://github.com/HTRomance-Project/medieval-french}\n\
    }\n"
  _apa: "Leroy N., Pinche A., Camps J., Clérice T., Chagué A. HTRomance, Medieval\
    \ French corpus of ground-truth for Handwritten Text Recognition and Layout Segmentation\
    \ URL: https://github.com/HTRomance-Project/medieval-french\n"
- authors:
  - name: Rachele
    roles:
    - transcriber
    surname: Alba
  - name: Giorgia
    roles:
    - transcriber
    surname: Rubin
  - name: Federico
    orcid: 0000-0002-7810-7735
    roles:
    - project-manager
    - quality-control
    surname: Boschetti
  - name: Franz
    roles:
    - project-manager
    surname: Fischer
  - name: Alix
    orcid: 0000-0002-0136-4434
    roles:
    - project-manager
    surname: Chagué
  - name: Thibault
    orcid: 0000-0003-1852-9204
    roles:
    - project-manager
    surname: Clérice
  automatically-aligned: false
  characters:
    members:
    - e
    - a
    - o
    - i
    - l
    - n
    - r
    - t
    - u
    - s
    - c
    - d
    - m
    - p
    - g
    - h
    - f
    - .
    - ̃
    - q
    - b
    - ⁊
    - ','
    - ꝑ
    - E
    - C
    - z
    - x
    - ̾
    - A
    - I
    - ̧
    - D
    - L
    - M
    - ͤ
    - O
    - S
    - R
    - ͧ
    - y
    - ꝙ
    - ͬ
    - ł
    - F
    - N
    - U
    - T
    - Q
    - ͦ
    - P
    - B
    - ́
    - ͥ
    - '='
    - ':'
    - ꝯ
    - X
    - ẜ
    - G
    - ͣ
    - H
    - '2'
    - '9'
    - '1'
    - ¶
    - '4'
    - ꝓ
    - '3'
    - '5'
    - k
    - ͭ
    - '7'
    - '8'
    - /
    - "'"
    - ε
    - ɨ
    - đ
    - '6'
    - ι
    - ο
    - '0'
    - ̓
    - ν
    - ꝗ
    - ̈
    - μ
    - λ
    - ꝰ
    - α
    - ω
    - π
    - σ
    - ͫ
    - Y
    - '-'
    - θ
    - γ
    - η
    - Ο
    - υ
    - ρ
    - ̔
    - ͂
    - β
    - +
    - Z
    mode: NFD
  description: Transcription of samples of Medieval Italian manuscripts
  format: Alto-XML
  hands:
    count: 1-per-folder
    precision: estimated
  language:
  - ita
  - vec
  license:
    name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: HTRomance
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notAfter: '1499'
    notBefore: '1100'
  title: HTRomance, Medieval Italian corpus of ground-truth for Handwritten Text Recognition
    and Layout Segmentation
  url: https://github.com/HTRomance-Project/medieval-italian
  volume:
  - count: 84366
    metric: characters
  - count: 60
    metric: files
  - count: 3086
    metric: lines
  - count: 60
    metric: pages
  - count: 353
    metric: regions
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Alba, Rachele and Rubin, Giorgia and\
    \ Boschetti, Federico and Fischer, Franz and Clérice, Thibault and Chagué, Alix},\n\
    doi = {10.5281/zenodo.8256728},\ntitle = {HTRomance, Medieval Italian corpus of\
    \ ground-truth for Handwritten Text Recognition and Layout Segmentation},\nurl\
    \ = {https://github.com/HTRomance-Project/medieval-italian}\n}\n"
  _apa: "Alba R., Rubin G., Boschetti F., Fischer F., Clérice T., Chagué A. HTRomance,\
    \ Medieval Italian corpus of ground-truth for Handwritten Text Recognition and\
    \ Layout Segmentation DOI: 10.5281/zenodo.8256728 URL: https://github.com/HTRomance-Project/medieval-italian\n"
- authors:
  - name: Anthony
    orcid: 0000-0003-4715-5184
    roles:
    - transcriber
    surname: Glaise
  - name: Thibault
    orcid: 0000-0003-1852-9204
    roles:
    - project-manager
    - quality-control
    surname: Clérice
  - name: Alix
    orcid: 0000-0002-0136-4434
    roles:
    - project-manager
    surname: Chagué
  - name: Federico
    orcid: 0000-0002-7810-7735
    roles:
    - project-manager
    surname: Boschetti
  - name: Franz
    orcid: 0000-0002-2162-5531
    roles:
    - project-manager
    surname: Fischer
  automatically-aligned: false
  characters:
    members:
    - i
    - e
    - t
    - a
    - u
    - s
    - n
    - o
    - r
    - c
    - m
    - ̃
    - d
    - l
    - .
    - p
    - b
    - q
    - g
    - f
    - 
    - h
    - x
    - ̾
    - ꝰ
    - ꝑ
    - ':'
    - '&'
    - ͥ
    - ł
    - ̧
    - I
    - ⁊
    - ͣ
    - E
    - ꝵ
    - C
    - S
    - A
    - N
    - ᷑
    - D
    - Q
    - U
    - ͦ
    - ','
    - ꝓ
    - ꝙ
    - ¶
    - T
    - y
    - ꝯ
    - P
    - ꝗ
    - M
    - ħ
    - R
    - đ
    - H
    - O
    - /
    - '-'
    - F
    - L
    - ÷
    - ͬ
    - ⟦
    - ⟧
    - "'"
    - z
    - ᷤ
    - B
    - G
    - '4'
    - '1'
    - '3'
    - ^
    - ͫ
    - '2'
    - ͭ
    - ẜ
    - X
    - ͨ
    - ͤ
    - '0'
    - '6'
    - k
    - '7'
    - '9'
    - ƀ
    - ᷝ
    - '8'
    - ͧ
    - K
    - Ø
    - Ꝙ
    - '*'
    - '5'
    - ᵈ
    - ͯ
    - ℥
    - ¬
    - ᷠ
    - Y
    - ̵
    - Z
    - ꝭ
    - +
    - ι
    - Ι
    - Μ
    - ̈
    - Ε
    - υ
    - χ
    - α
    - ρ
    - σ
    - τ
    - κ
    - ο
    - ́
    - ν
    - Τ
    - ❧
    - ⁿ
    - ᶻ
    mode: NFD
  description: >-
    Ground truth of medieval Latin manuscripts. Manuscripts vary in theme,
    period, etc. Most manuscripts have at most 10 columns transcribed.
  format: Alto-XML
  hands:
    count: 1-per-folder
    precision: estimated
  institutions: []
  language:
  - lat
  license:
    name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: HTRomance
  project-website: https://htromance-project.github.io
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notAfter: '1499'
    notBefore: '1100'
  title: HTRomance, Medieval Latin corpus of ground-truth for Handwritten Text Recognition
    and Layout Segmentation
  transcription-guidelines: >2-

    The transcription guidelines are described in a paper available on
    [HAL](https://hal-enc.archives-ouvertes.fr/hal-03828353) and published at the
    Journal for Open Humanities Data. It provides specific details about the
    selection process, the transcription methods and choices, as well as details
    about output (mainly the [Generic CREMMA Model for Medieval Manuscripts (Latin
    and Old French)](https://zenodo.org/record/7234166#.Y7f69afMJhE) for
    [Kraken](https://kraken.re))
  url: https://github.com/HTRomance-Project/medieval-latin
  volume:
  - count: 299062
    metric: characters
  - count: 142
    metric: files
  - count: 8879
    metric: lines
  - count: 749
    metric: regions
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Glaise, Anthony and Clérice, Thibault\
    \ and Boschetti, Federico and Fischer, Franz and Chagué, Alix},\ntitle = {HTRomance,\
    \ Medieval Latin corpus of ground-truth for Handwritten Text Recognition and Layout\
    \ Segmentation},\nurl = {https://github.com/HTRomance-Project/medieval-latin}\n\
    }\n"
  _apa: "Glaise A., Clérice T., Boschetti F., Fischer F., Chagué A. HTRomance, Medieval\
    \ Latin corpus of ground-truth for Handwritten Text Recognition and Layout Segmentation\
    \ URL: https://github.com/HTRomance-Project/medieval-latin\n"
- authors:
  - name: Julie
    orcid: 0009-0004-0769-8875
    roles:
    - transcriber
    surname: Bordier
  - name: Matthias
    orcid: 0000-0001-9488-5986
    roles:
    - project-manager
    - quality-control
    surname: Gille Levenson
  - name: Olivier
    orcid: 0000-0001-7809-3890
    roles:
    - project-manager
    - quality-control
    surname: Brisville-Fertin
  - name: Alix
    orcid: 0000-0002-0136-4434
    roles:
    - project-manager
    surname: Chagué
  - name: Thibault
    orcid: 0000-0003-1852-9204
    roles:
    - project-manager
    surname: Clérice
  automatically-aligned: false
  characters:
    members:
    - e
    - a
    - o
    - s
    - n
    - r
    - l
    - i
    - d
    - u
    - t
    - c
    - ̃
    - m
    - p
    - q
    - g
    - f
    - b
    - .
    - y
    - /
    - h
    - ⁊
    - ̧
    - z
    - E
    - x
    - R
    - C
    - ¶
    - ꝑ
    - ','
    - ͣ
    - ͥ
    - ̾
    - D
    - M
    - ͦ
    - ':'
    - '-'
    - ᷤ
    - S
    - ẜ
    - ́
    - A
    - L
    - P
    - ꝯ
    - Q
    - ͬ
    - I
    - B
    - ⟦
    - ⟧
    - O
    - N
    - ̇
    - T
    - ꝓ
    - ̈
    - F
    - U
    - ͤ
    - '1'
    - G
    - X
    - '2'
    - '0'
    - '3'
    - H
    - Y
    - ᷎
    - ℥
    - '4'
    - '6'
    - '8'
    - Ꞧ
    - '7'
    - '5'
    - ͫ
    - '9'
    - ꝰ
    - ħ
    - ͭ
    - ꝫ
    - ᷑
    - ᷝ
    - ͧ
    - ꝟ
    - ⁿ
    - †
    - K
    - ꝵ
    - ꝙ
    - ᷠ
    - Z
    - ł
    - Ꝯ
    - 
    - ͪ
    - ͩ
    - k
    mode: NFD
  description: >-
    Ground truth of medieval manuscripts from Spain. Manuscripts vary in theme,
    period, etc. Most manuscripts have at most 10 columns transcribed.
  format: Alto-XML
  hands:
    count: 1-per-folder
    precision: estimated
  institutions: []
  language:
  - lat
  license:
    name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: HTRomance
  project-website: https://htromance-project.github.io
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notAfter: '1499'
    notBefore: '1100'
  title: HTRomance, Medieval Spain corpus of ground-truth for Handwritten Text Recognition
    and Layout Segmentation
  transcription-guidelines: >2-

    The transcription guidelines are described in a paper available on
    [HAL](https://hal-enc.archives-ouvertes.fr/hal-03828353) and published at the
    Journal for Open Humanities Data. It provides specific details about the
    selection process, the transcription methods and choices, as well as details
    about output (mainly the [Generic CREMMA Model for Medieval Manuscripts (Latin
    and Old French)](https://zenodo.org/record/7234166#.Y7f69afMJhE) for
    [Kraken](https://kraken.re))
  url: https://github.com/HTRomance-Project/middle-ages-in-spain
  volume:
  - count: 160876
    metric: characters
  - count: 86
    metric: files
  - count: 4437
    metric: lines
  - count: 395
    metric: regions
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Bordier, Julie and Gille Levenson,\
    \ Matthias and Brisville-Fertin, Olivier and Clérice, Thibault and Chagué, Alix},\n\
    title = {HTRomance, Medieval Spain corpus of ground-truth for Handwritten Text\
    \ Recognition and Layout Segmentation},\nurl = {https://github.com/HTRomance-Project/middle-ages-in-spain}\n\
    }\n"
  _apa: "Bordier J., Gille Levenson M., Brisville-Fertin O., Clérice T., Chagué A.\
    \ HTRomance, Medieval Spain corpus of ground-truth for Handwritten Text Recognition\
    \ and Layout Segmentation URL: https://github.com/HTRomance-Project/middle-ages-in-spain\n"
- authors:
  - name: Jade
    roles:
    - transcriber
    surname: Norindr
  - name: Anna
    roles:
    - transcriber
    surname: Mikhalchuk
  - name: Alix
    orcid: 0000-0002-0136-4434
    roles:
    - project-manager
    - quality-control
    - support
    surname: Chagué
  - name: Thibault
    orcid: 0000-0003-1852-9204
    roles:
    - project-manager
    surname: Clérice
  automatically-aligned: false
  characters:
    members:
    - e
    - s
    - r
    - a
    - n
    - t
    - u
    - i
    - o
    - l
    - d
    - c
    - p
    - m
    - v
    - q
    - .
    - "'"
    - ́
    - f
    - ','
    - b
    - g
    - y
    - h
    - j
    - M
    - L
    - C
    - I
    - x
    - '1'
    - z
    - E
    - V
    - ^
    - S
    - '-'
    - R
    - ̀
    - A
    - J
    - ̃
    - P
    - '3'
    - '2'
    - D
    - '5'
    - '4'
    - '7'
    - '6'
    - '0'
    - B
    - '8'
    - ̂
    - '9'
    - N
    - T
    - X
    - '>'
    - <
    - ̈
    - G
    - (
    - '='
    - ;
    - )
    - ⁊
    - '"'
    - U
    - F
    - ':'
    - O
    - H
    - '&'
    - ̧
    - /
    - '['
    - W
    - ']'
    - Q
    - k
    - '?'
    - ⎀
    - w
    - ¬
    - ̾
    - '*'
    - ̶
    - ꝑ
    - ͨ
    - Z
    - K
    - Y
    - ͫ
    - '!'
    - +
    mode: NFD
  description: >-
    Dataset for modern Romance languages created within the context of the
    HTRomance project, using manuscripts from the Gallica digital library.
  format: Alto-XML
  hands:
    count: 1-per-folder
    precision: estimated
  institutions: []
  language:
  - fra
  license:
    name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: HTRomance
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notAfter: '1800'
    notBefore: '1600'
  title: Corpus Modern Roman Languages
  transcription-guidelines: >-
    The transcription guidelines are described in a paper available on HAL and
    published in the Journal of Open Humanities Data. It provides specific
    details about the selection process, the transcription methods and choices, as
    well as details about the output (mainly the Generic CREMMA Model for Medieval
    Manuscripts (Latin and Old French) for Kraken).
  url: https://github.com/HTRomance-Project/modern-roman-languages
  volume:
  - count: 114094
    metric: characters
  - count: 168
    metric: files
  - count: 3386
    metric: lines
  - count: 441
    metric: regions
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Norindr, Jade and Mikhalchuk, Anna\
    \ and Clérice, Thibault and Chagué, Alix},\ntitle = {HTRomance, Modern language\
    \ corpus of ground-truth for Handwritten Text Recognition and Layout Segmentation},\n\
    url = {https://github.com/HTRomance-Project/modern-roman-languages}\n}\n"
  _apa: "Norindr J., Mikhalchuk A., Clérice T., Chagué A. HTRomance, Modern language\
    \ corpus of ground-truth for Handwritten Text Recognition and Layout Segmentation\
    \ URL: https://github.com/HTRomance-Project/modern-roman-languages\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: gt_structure_text
  url: https://github.com/OCR-D/gt_structure_text
  authors:
  - name: Matthias
    surname: Boenig
    orcid: 0000-0003-4615-4753
    roles:
    - transcriber
    - aligner
    - project-manager
    - quality-control
    - digitization
    - support
  institutions: []
  description: >-
    The OCR-D Ground Truth text and structure corpus was created between 2015
    and 2017. Since 2017, the corpus has been further curated and supplemented
    with metadata where appropriate. The corpus consists of PAGE-XML files
    containing annotations of the text and the structure. The data are based
    on transcription data stored in the German Text Archive (DTA)
    (https://www.deutschestextarchiv.de/).
  project-name: OCR-D
  project-website: https://ocr-d.de/
  language:
  - eng
  - fra
  - deu
  - heb
  - lat
  production-software: Aletheia
  automatically-aligned: false
  script:
  - iso: Latn
  - iso: Goth
  script-type: only-typed
  time:
    notAfter: '1900'
    notBefore: '1500'
  hands:
    count: less-than-11
    precision: exact
  license:
    name: CC-BY-SA 4.0
    url: https://creativecommons.org/licenses/by-sa/4.0/
  format: Page-XML
  volume:
  - count: 640976
    metric: characters
  - count: 217
    metric: files
  - count: 6608
    metric: lines
  - count: 1647
    metric: regions
  citation-file-link: https://raw.githubusercontent.com/OCR-D/gt_structure_text/main/CITATION.cff
  transcription-guidelines: OCR-D Ground Truth Guidelines https://ocr-d.de/en/gt-guidelines/trans/
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Boenig, Matthias},\nmonth = {7},\n\
    title = {gt_structure_text},\nurl = {https://github.com/OCR-D/gt_structure_text},\n\
    year = {2024}\n}\n"
  _apa: "Boenig M. (2024). gt_structure_text (version 68_v1.5.0). URL: https://github.com/OCR-D/gt_structure_text\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: De la généalogie des dieux
  url: https://github.com/PSL-Chartes-HTR-Students/HN2021-Boccace
  project-name: ENC - Bonnes pratiques du developpement collaboratif
  authors:
  - name: Malamatenia
    surname: Vlachou Efstathiou
    roles:
    - transcriber
    - project-manager
  - name: Noé
    surname: Leroy
    roles:
    - transcriber
    - project-manager
  - name: Marco
    surname: Maulu
    roles:
    - project-manager
    - quality-control
  description: "This repository hosts all the documents (transcriptions, bibliographical\
    \ references and introduction) used by the Boccace team for the validation of\
    \ the course \"Bonnes pratiques du developpement collaboratif : initiation à Git\"\
    \ (prof. Thibault Clérice), first semester of the Master Humanités Numériques\
    \ ENC-PSL 2021-2022. It is also part of the two-year project \"Per un’edizione\
    \ digitale della Genealogia deorum gentilium di Boccaccio\" (dir. F. Duval, M.\
    \ Maulu). Funded in 2021, this project aims to publish online, in XML format,\
    \ the unpublished Middle French translation entitled \"De la genealogie des dieux\".\n"
  language:
  - frm
  - lat
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notBefore: '1472'
    notAfter: '1498'
  hands:
    count: 1-per-folder
    precision: exact
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  volume:
  - metric: characters
    count: 109409
  - metric: files
    count: 47
  - metric: lines
    count: 3656
  - metric: pages
    count: 52
  - metric: regions
    count: 292
  sources:
  - reference: Laurent Premierfait, Boccace (1498), "De la genealogie des dieux",
      Paris, A. Vérard.
    link: 'https://gallica.bnf.fr/ark:/12148/bpt6k105063r?rk=21459;2'
  citation-file-link: https://raw.githubusercontent.com/PSL-Chartes-HTR-Students/HN2021-Boccace/main/CITATION.cff
  transcription-guidelines: "No expansion of abbreviations. Special characters are\
    \ used for the graphemic transcription, compatible with Unicode, MUFI and the\
    \ special character table of cremma-medieval. Spelling errors are not corrected,\
    \ BUT inverted letters (for Inc59) are transcribed as intended, such as the character\
    \ \"n\" printed as \"u\" in several cases. Spaces were added freely for word separation\
    \ according to dictionaries of Middle French and Latin (Latin forms verified with\
    \ Collatinus). For more documentation regarding the transcription norms and guidelines,\
    \ see the repository and the report file.\n"
  production-software: Unknown [Automatically filled]
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Vlachou Efstathiou, Malamatenia and\
    \ Leroy, Noé and Maulu, Marco},\ndoi = {10.5281/zenodo.6126613},\ntitle = {git-project-Boccace}\n\
    }\n"
  _apa: "Vlachou Efstathiou M., Leroy N., Maulu M. git-project-Boccace (version 1.0).\
    \ DOI: 10.5281/zenodo.6126613\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Chateau de Chavigny
  url: https://github.com/PSL-Chartes-HTR-Students/HN2021-ChateauChavigny
  project-name: ENC - Bonnes pratiques du developpement collaboratif
  authors:
  - name: Margot
    surname: Pascual
    roles:
    - transcriber
  - name: Louis-Fiacre
    surname: Franchet d'Espèrey
    roles:
    - transcriber
    - digitization
  - name: Simon
    surname: Gabay
    - quality-control
  description: "The document we are working on concerns the Château de Chavigny\
    \ in Lerné, Touraine. In the 16th century, the château belonged to the Leroy family\
    \ of lords. Before 1568, in the midst of the Wars of Religion, François Leroy,\
    \ of the royal and Catholic party, took part in the capture and ransom of the\
    \ Prince of Condé, of the Protestant party. In 1568, François Leroy, as captain\
    \ of 50 lances in the king's service, went on campaign with the king. The objective\
    \ is to transcribe five folios of a manuscript using eScriptorium, the aim being\
    \ to learn how to use git and GitHub to carry out our first collaborative project.\n"
  language:
  - frm
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '1568'
    notAfter: '1599'
  hands:
    count: '1'
    precision: exact
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  citation-file-link: https://raw.githubusercontent.com/PSL-Chartes-HTR-Students/HN2021-ChateauChavigny/main/CITATION.cff
  transcription-guidelines: "- Handling of abbreviations:\n    - If abbreviations\
    \ are expanded (not always), the expansion is given in square brackets.\n\
    \    - The original spelling and the abbreviations must be preserved.\n- Handling\
    \ of characters that cannot be transcribed: when a character seems uncertain,\
    \ we prefer to insert [?] to indicate that a character in a word has not been\
    \ transcribed. For several characters, use as many ? as there are unrecognized\
    \ characters, e.g. [???] for 3 characters.\n"
  volume:
  - metric: characters
    count: 9126
  - metric: files
    count: 6
  - metric: lines
    count: 253
  - metric: regions
    count: 22
  production-software: Unknown [Automatically filled]
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Pascual, Margot and Franchet d'Espèrey,\
    \ Louis-Fiacre and Gabay, Simon},\ndoi = {10.5281/zenodo.6126655},\nmonth = {2},\n\
    title = {Château de Chavigny},\nurl = {https://github.com/PSL-Chartes-HTR-Students/HN2021-ChateauChavigny},\n\
    year = {2022}\n}\n"
  _apa: "Pascual M., Franchet d'Espèrey L., Gabay S. (2022). Château de Chavigny (version\
    \ 1.0). DOI: 10.5281/zenodo.6126655 URL: https://github.com/PSL-Chartes-HTR-Students/HN2021-ChateauChavigny\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: 'Maxime Kovalewsky - Coutume contemporaine et loi ancienne: droit coutumier
    ossétien'
  url: https://github.com/PSL-Chartes-HTR-Students/HN2021-Kovalewsky-1893
  project-name: ENC - Bonnes pratiques du developpement collaboratif
  authors:
  - name: Zoé
    surname: L’Eveque
    roles:
    - transcriber
  - name: Ekaterina
    surname: Kate
    roles:
    - transcriber
  - name: Anahide
    surname: Kasparian
    roles:
    - transcriber
  description: "We chose to transcribe the second chapter of Maxime Kovalewsky's\
    \ work: Coutume contemporaine et loi ancienne : droit coutumier ossétien, éclairé\
    \ par l’histoire comparée. Paris, L. Larose, 1893.\n"
  language:
  - fra
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notBefore: '1893'
    notAfter: '1893'
  hands:
    count: '1'
    precision: exact
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  citation-file-link: https://raw.githubusercontent.com/PSL-Chartes-HTR-Students/HN2021-Kovalewsky-1893/main/CITATION.CFF
  volume:
  - metric: characters
    count: 45626
  - metric: files
    count: 28
  - metric: lines
    count: 983
  - metric: regions
    count: 72
  production-software: Unknown [Automatically filled]
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {L’Eveque, Zoé and Ekaterina, Kate\
    \ and Kasparian, Anahide},\ndoi = {10.5281/zenodo.6126633},\nmonth = {2},\ntitle\
    \ = {Projet Kovaleswky - 1893},\nurl = {https://github.com/PSL-Chartes-HTR-Students/HN2021-Kovalewsky-1893},\n\
    year = {2022}\n}\n"
  _apa: "L’Eveque Z., Ekaterina K., Kasparian A. (2022). Projet Kovaleswky - 1893\
    \ (version 1.0). DOI: 10.5281/zenodo.6126633 URL: https://github.com/PSL-Chartes-HTR-Students/HN2021-Kovalewsky-1893\n"
- authors:
  - name: Ingrid
    roles:
    - transcriber
    - aligner
    surname: Guimarães
  - name: Perrine
    roles:
    - transcriber
    - aligner
    surname: Maurel
  - name: Yagmur
    roles:
    - transcriber
    - aligner
    surname: Ozturk
  - name: Alix
    orcid: 0000-0002-0136-4434
    roles:
    - quality-control
    surname: Chagué
  - name: Thibault
    orcid: 0000-0003-1852-9204
    roles:
    - support
    surname: Clérice
  automatically-aligned: false
  characters:
    members:
    - e
    - t
    - r
    - o
    - a
    - n
    - i
    - s
    - h
    - l
    - .
    - d
    - f
    - c
    - u
    - y
    - m
    - ','
    - S
    - M
    - w
    - p
    - g
    - b
    - C
    - I
    - v
    - R
    - A
    - E
    - D
    - F
    - P
    - T
    - k
    - O
    - L
    - N
    - W
    - '1'
    - B
    - J
    - H
    - '2'
    - '-'
    - U
    - '0'
    - G
    - Y
    - '5'
    - '9'
    - ':'
    - "'"
    - q
    - x
    - V
    - '3'
    - K
    - '4'
    - ᗅ
    - '8'
    - '7'
    - (
    - )
    - j
    - ^
    - '"'
    - '&'
    - z
    - '6'
    - '?'
    - ⟦
    - ⟧
    - ᗞ
    - Q
    - ;
    - ᑕ
    - $
    - +
    - '*'
    - Z
    mode: NFD
  citation-file-link: >-
    https://raw.githubusercontent.com/PSL-Chartes-HTR-Students/HN2021-Memorials_Jane_Lathrop_Stanford/main/CITATION.CFF
  description: >-
    The source data were uploaded to the FromThePage website by the Stanford
    University Archives, which own them. They were then transcribed by anonymous
    volunteers, whose work served as a basis for correcting our own
    transcriptions. The selected source documents are letters by different
    authors about the funeral of Jane Lathrop Stanford. The selected letters
    were: 42, 43, 46, 49, 50, 54, 57 to 60, 69, 75, 76 [section 1, transcribed by
    Perrine MAUREL]; 80 to 93 [section 2, transcribed by Ingrid GUIMARÃES]; 241
    and 242 [section 3, transcribed by Yagmur OZTURK].
  format: Alto-XML
  hands:
    count: 1-per-file
    precision: estimated
  institutions: []
  language:
  - eng
  license:
    name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: ENC - Bonnes pratiques du developpement collaboratif
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: evenly-mixed
  time:
    notAfter: '1905'
    notBefore: '1905'
  title: Memorials for Jane Lathrop Stanford
  transcription-guidelines: >-
    Our transcription aimed to render the text ipsis litteris, without correcting
    it, thus preserving any errors intrinsic to the document. It should be noted,
    however, that in some cases the documents contained unclear passages that had
    not been taken into account by the original transcriptions, or that had been
    flagged as uncertain. In such cases, we chose to be more exhaustive than the
    original transcription where possible, and we sometimes made different
    transcription choices based on our own visual reading during the work. As a
    result of these choices, the transcription of a page was sometimes longer
    than first estimated.


    Addition: the transcription rules were adapted to be compatible with the
    CREMMA/CATMuS recommendations, namely: superscript portions of text are
    preceded by a "^", and struck-through or illegible words are enclosed between
    the signs "⟦" and "⟧". Zones are not drawn in the document, but the SegmOnto
    ontology was applied for line typing, following 5 possible types:
    DefaultLine:Handwritten, DefaultLine:Print, DefaultLine:Typewritten,
    DefaultLine:Signature and InterlinearLine:Handwritten. This makes it easy to
    distinguish handwritten or typewritten lines from the pre-printed headers of
    the letter paper.
  url: >-
    https://github.com/PSL-Chartes-HTR-Students/HN2021-Memorials_Jane_Lathrop_Stanford
  volume:
  - count: 18323
    metric: characters
  - count: 41
    metric: files
  - count: 774
    metric: lines
  - count: 50
    metric: regions
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Guimarães, Ingrid and Maurel, Perrine\
    \ and Ozturk, Yagmur and Chagué, Alix},\ndoi = {10.5281/zenodo.6126625},\nmonth\
    \ = {2},\ntitle = {Memorials for Jane Lathrop Stanford},\nyear = {2022}\n}\n"
  _apa: "Guimarães I., Maurel P., Ozturk Y., Chagué A. (2022). Memorials for Jane\
    \ Lathrop Stanford (version 1.0). DOI: 10.5281/zenodo.6126625\n"
- authors:
  - name: Vincent
    roles:
    - transcriber
    - project-manager
    surname: Sarbach-Pulicani
  - name: Violette
    surname: Saïag
  - name: Adrien
    roles:
    - transcriber
    surname: Escoda
  - name: Théophile
    roles:
    - transcriber
    - project-manager
    surname: Miaille
  - name: Simon
    orcid: 0000-0001-9094-4475
    roles:
    - transcriber
    - quality-control
    surname: Gabay
  characters:
    members:
    - e
    - a
    - i
    - u
    - n
    - r
    - t
    - s
    - o
    - l
    - c
    - d
    - p
    - m
    - ','
    - g
    - .
    - ̀
    - v
    - ’
    - f
    - h
    - b
    - C
    - ́
    - ¬
    - P
    - q
    - z
    - '?'
    - "'"
    - A
    - M
    - I
    - L
    - S
    - '1'
    - D
    - ̂
    - G
    - j
    - F
    - U
    - E
    - Q
    - '-'
    - x
    - '!'
    - B
    - ':'
    - V
    - '7'
    - '9'
    - R
    - N
    - ;
    - –
    - O
    - '8'
    - T
    - '2'
    - '0'
    - «
    - '3'
    - '6'
    - y
    - »
    - '5'
    - ̧
    - (
    - )
    - —
    - '4'
    - J
    - °
    - H
    - '*'
    - X
    - œ
    - '"'
    - ̈
    - K
    - ^
    - “
    - '='
    mode: NFD
  citation-file-link: https://raw.githubusercontent.com/PSL-Chartes-HTR-Students/HN2021-OCR-Poesie-Corse/main/CITATION.CFF
  description: "The first work, entitled *Pontenôvu*, was written by Petru Rocca\
    \ and published by the \"Stamparia di a Muvra\" in 1927. It is a collection of\
    \ poems in Corsican and French on various themes. *A Muvra* was a Corsican autonomist\
    \ newspaper of Maurrassian leaning that existed throughout the interwar period.\
    \ Although it presented itself as a cultural review, its political dimension\
    \ (embodied by the PCA, or Partitu corsu d'azione) made it a controversial movement.\
    \ This collection belongs to that context of political struggle and Corsican\
    \ cultural awakening.\nThe second work, entitled *A nostra Santa Fede - Catechismu\
    \ Corsu*, was written by Ageniu Grimaldi in 1926 under the pseudonym Saveriu\
    \ Malaspina. A close associate of Petru Rocca, he was one of the theorists of\
    \ interwar Corsican autonomism and a loyal muvriste. The work notably describes\
    \ how a true Corsican should behave with regard to his faith in God and his\
    \ island. Although it is not really a collection of poems, its writing style\
    \ is particularly interesting, as it is close to that of biblical writings.\n"
  format: Alto-XML
  hands:
    count: 1-per-folder
    precision: exact
  language:
  - cos
  - fra
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  project-name: ENC - Bonnes pratiques du developpement collaboratif
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notAfter: '1927'
    notBefore: '1926'
  title: OCR Corse
  transcription-guidelines: SegmOnto
  url: https://github.com/PSL-Chartes-HTR-Students/HN2021-OCR-Poesie-Corse
  volume:
  - count: 41205
    metric: characters
  - count: 47
    metric: files
  - count: 1681
    metric: lines
  - count: 126
    metric: regions
  production-software: Unknown [Automatically filled]
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Sarbach-Pulicani, Vincent and Miaille,\
    \ Théophile and Escoda, Adrien and Saïag, Violette and Gabay, Simon},\ndoi = {10.5281/zenodo.6126641},\n\
    month = {2},\ntitle = {OCR d'une poésie corse},\nurl = {https://github.com/PSL-Chartes-HTR-Students/HN2021-OCR-Poesie-Corse},\n\
    year = {2022}\n}\n"
  _apa: "Sarbach-Pulicani V., Miaille T., Escoda A., Saïag V., Gabay S. (2022). OCR\
    \ d'une poésie corse (version 1.0). DOI: 10.5281/zenodo.6126641 URL: https://github.com/PSL-Chartes-HTR-Students/HN2021-OCR-Poesie-Corse\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Argus des Brevets
  url: https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-ArgusDesBrevets
  project-name: ENC - Bonnes pratiques du developpement collaboratif
  authors:
  - name: Valentin
    surname: De Craene
    roles:
    - transcriber
  - name: Maxime
    surname: Humeau
    roles:
    - transcriber
  - name: Virgile
    surname: Reignier
    roles:
    - transcriber
  description: "The 1910 Argus des brevets is a contemporary printed publication,\
    \ organised into sections that group the patents filed in France first chronologically\
    \ and then thematically. This list and brief presentation of the patents is laid\
    \ out in two columns and uses standardised abbreviations. This contribution guide\
    \ therefore sets out all the transcription norms adopted during this transcription\
    \ project, carried out on the eScriptorium platform as part of the Git course of\
    \ the TNAH master's programme at the ENC.\n"
  language:
  - fra
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notBefore: '1910'
    notAfter: '1910'
  hands:
    count: '1'
    precision: exact
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  citation-file-link: https://raw.githubusercontent.com/PSL-Chartes-HTR-Students/TNAH-2021-ArgusDesBrevets/main/CITATION.cff
  transcription-guidelines: "First, we decided to base our transcription on the\
    \ recommendations published in *L’édition critique des textes contemporains,\
    \ XIXe-XXIe siècle* by Christine Nougaret, Elisabeth Parinet and Florence Clavaud.\
    \ Nevertheless, some adaptations were necessary in order to produce a dataset\
    \ from the transcription that is both close to the source document and usable\
    \ afterwards. Thus, regarding abbreviations, we decided to keep the original\
    \ spelling in the transcription. This choice was guided by two factors. On the\
    \ one hand, the wish to keep the spelling intact, so as to provide researchers\
    \ interested in this subject with a text that can easily be processed automatically,\
    \ for instance for a quantitative analysis of the types of companies (public\
    \ limited companies, family businesses, etc.) filing patents; this decision\
    \ was also motivated by how easily readers can resolve and understand the\
    \ abbreviations. On the other hand, it seems to us that this approach allows\
    \ a broader reuse of the data, for example in a machine learning process.\nWe\
    \ had to make certain choices relating to the transcription and editing of\
    \ the document. To do so, we referred to the *Lexique typographique en usage\
    \ à l’Imprimerie nationale*: - end-of-line hyphens splitting words were retained\
    \ (e.g. direc-tion). - page numbers at the top of the page were transcribed as\
    \ « _ N _ » where N is the page number. - badly printed or worn characters were\
    \ restored insofar as they are easily interpretable (but not guessable) by the\
    \ reader.\n"
  volume:
  - metric: characters
    count: 55156
  - metric: files
    count: 17
  - metric: lines
    count: 1962
  - metric: regions
    count: 86
  production-software: Unknown [Automatically filled]
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {De Craene, Valentin and Humeau, Maxime\
    \ and Reignier, Virgile},\ndoi = {10.5281/zenodo.6126366},\nmonth = {1},\ntitle\
    \ = {Projet Argus des Brevets},\nurl = {https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-ArgusDesBrevets},\n\
    year = {2022}\n}\n"
  _apa: "De Craene V., Humeau M., Reignier V. (2022). Projet Argus des Brevets (version\
    \ 1.0). DOI: 10.5281/zenodo.6126366 URL: https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-ArgusDesBrevets\n"
- authors:
  - name: Sébastien
    roles:
    - transcriber
    surname: Biay
  - name: Zoé
    roles:
    - transcriber
    surname: Cappe
  - name: Kristina
    roles:
    - transcriber
    surname: Konstantinova
  - name: Victor
    roles:
    - transcriber
    - aligner
    surname: Boby
  citation-file-link: https://raw.githubusercontent.com/PSL-Chartes-HTR-Students/TNAH-2021-DecameronFR/main/CITATION.cff
  description: "The project aims to build ground truth for training HTR models from\
    \ a French manuscript dated 1430-1455: manuscript 5070 of the Bibliothèque de\
    \ l'Arsenal (reproduced on Gallica). This manuscript contains the French translation\
    \ of Boccaccio's Decameron by Laurent de Premierfait. Our ground truth covers\
    \ the description of the plague in Florence found in the prologue of the work.\n"
  format: Alto-XML
  hands:
    count: '1'
    precision: exact
  language:
  - frm
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  project-name: ENC - Bonnes pratiques du developpement collaboratif
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notAfter: '1455'
    notBefore: '1430'
  title: DecameronFR
  transcription-guidelines: See https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-DecameronFR/blob/main/normesTranscription.md
  url: https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-DecameronFR
  volume:
  - count: 19821
    metric: characters
  - count: 9
    metric: files
  - count: 751
    metric: lines
  - count: 41
    metric: regions
  production-software: Unknown [Automatically filled]
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Biay, Sébastien and Boby, Victor and\
    \ Konstantinova, Kristina and Cappe, Zoé},\ndoi = {10.5281/zenodo.6126376},\n\
    title = {TNAH-2021-DecameronFR},\nurl = {https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-DecameronFR}\n\
    }\n"
  _apa: "Biay S., Boby V., Konstantinova K., Cappe Z. TNAH-2021-DecameronFR (version\
    \ 1.0). DOI: 10.5281/zenodo.6126376 URL: https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-DecameronFR\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Projet Exposition universelle de 1878
  url: https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-Expositions_Universelles
  project-name: ENC - Bonnes pratiques du developpement collaboratif
  authors:
  - name: Kelly
    surname: Christensen
    roles:
    - transcriber
  - name: Baudoin
    surname: Davoury
    roles:
    - transcriber
  - name: Anahi
    surname: Haedo
    roles:
    - transcriber
  - name: Paul
    surname: Kervegan
    roles:
    - transcriber
  - name: Esteban
    surname: Sanchez-Oeconomo
    roles:
    - transcriber
  description: "The International Congress of Ethnographic Sciences of 1878 was held\
    \ on the occasion of the 1878 Universal Exposition in Paris. Published in 1881\
    \ by the Imprimerie nationale, the proceedings of this congress have been made\
    \ available by the Conservatoire numérique des Arts et Métiers.\n"
  language:
  - fra
  script:
  - iso: Latn
  - iso: Grek
  - iso: Deva
  - iso: Arab
  script-type: only-typed
  time:
    notBefore: '1881'
    notAfter: '1881'
  hands:
    count: '1'
    precision: exact
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  citation-file-link: https://raw.githubusercontent.com/PSL-Chartes-HTR-Students/TNAH-2021-Expositions_Universelles/main/CITATION.cff
  transcription-guidelines: Diplomatic, but not allographetic.
  volume:
  - metric: characters
    count: 155022
  - metric: files
    count: 56
  - metric: lines
    count: 2620
  - metric: regions
    count: 158
  production-software: Unknown [Automatically filled]
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Christensen, Kelly and Davoury, Baudoin\
    \ and Haedo, Anahi and Kervegan, Paul and Sanchez-Oeconomo, Esteban},\ndoi = {10.5281/zenodo.6126447},\n\
    month = {1},\ntitle = {Projet Exposition Universelle de 1878},\nurl = {https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-Expositions_Universelles},\n\
    year = {2022}\n}\n"
  _apa: "Christensen K., Davoury B., Haedo A., Kervegan P., Sanchez-Oeconomo E. (2022).\
    \ Projet Exposition Universelle de 1878 (version 1.0). DOI: 10.5281/zenodo.6126447\
    \ URL: https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-Expositions_Universelles\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Projet Correspondance Berlioz
  url: https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-Projet-Correspondance-Berlioz
  project-name: ENC - Bonnes pratiques du developpement collaboratif
  authors:
  - name: Lien
    surname: Céard
    roles:
    - transcriber
  - name: Cécile
    surname: Sajdak
    roles:
    - transcriber
  - name: Fanny
    surname: Lebreton
    roles:
    - transcriber
  description: "We chose to work on Hector Berlioz's outgoing correspondence addressed\
    \ to his sister Anne-Marguerite \"Nanci\" Berlioz. The complete set of letters\
    \ to Nanci Berlioz was too large for our project, so, for the sake of coherence,\
    \ we selected them in chronological order (see the management table for the\
    \ exact list of transcribed letters).\n"
  language:
  - fra
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '1823'
    notAfter: '1844'
  hands:
    count: '1'
    precision: exact
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  citation-file-link: https://raw.githubusercontent.com/PSL-Chartes-HTR-Students/TNAH-2021-Projet-Correspondance-Berlioz/main/CITATION.cff
  transcription-guidelines: "**Spelling:** - No changes are made to the spelling,\
    \ even where there are mistakes. - Archaic spelling is left as is. - Missing\
    \ accents are not restored and incorrect accents are not corrected. The correct\
    \ form of an accent is restored when we consider that its shape varies because\
    \ of hasty writing. - Missing hyphens are not restored. - Words written together\
    \ are separated whenever the joining seems due to hasty writing.\n**Abbreviations:**\
    \ - Abbreviations are not expanded. - The livre tournois currency symbol is used\
    \ → **₶** (Unicode U+20B6).\n**Superscript words:** - Only the word is rendered,\
    \ without superscript.\n**Upper and lower case:** - Missing capitals are not\
    \ restored, even at the beginning of a sentence or of a proper noun.\n**Punctuation:**\
    \ - Missing punctuation is not restored and incorrect punctuation is not corrected.\
    \ - Em dashes (—, Unicode U+2014) are used on either side of a parenthetical\
    \ clause. - En dashes (–, Unicode U+2013) are used to mark a change of speaker\
    \ in dialogues and before the items of lists/enumerations.\n"
  volume:
  - metric: characters
    count: 13474
  - metric: files
    count: 16
  - metric: lines
    count: 367
  - metric: regions
    count: 64
  production-software: Unknown [Automatically filled]
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Ceard, Lien and Lebreton, Fanny and\
    \ Sajdak, Cécile},\ndoi = {10.5281/zenodo.6126475},\nmonth = {1},\ntitle = {Projet\
    \ Correspondance Berlioz},\nurl = {https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-Projet-Correspondance-Berlioz},\n\
    year = {2022}\n}\n"
  _apa: "Ceard L., Lebreton F., Sajdak C. (2022). Projet Correspondance Berlioz (version\
    \ 1.0). DOI: 10.5281/zenodo.6126475 URL: https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-Projet-Correspondance-Berlioz\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Projet Notre-Dame
  url: https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-Projet-Notre-Dame
  project-name: "ENC - Bonnes pratiques du developpement collaboratif\n"
  authors:
  - name: Soline
    surname: Doat
    roles:
    - transcriber
  - name: Ariane
    surname: Menu
    roles:
    - transcriber
  - name: Elsa
    surname: Falcoz
    roles:
    - transcriber
  - name: Margaux
    surname: Faure
    roles:
    - transcriber
  - name: Anaïs
    surname: Mazoué
    roles:
    - transcriber
  description: "The Projet Notre-Dame is a transcription of the 1860 daily logs\
    \ (https://mediatheque-patrimoine.culture.gouv.fr/sites/mediatheque/files/jnd_1860.pdf)\
    \ of the restoration work carried out from 1844 to 1865 on the cathedral of Notre-Dame\
    \ de Paris under the direction of Eugène Viollet-le-Duc and Jean-Baptiste Lassus.\
    \ The transcription was made in eScriptorium from the digitized work logs\
    \ (https://mediatheque-patrimoine.culture.gouv.fr/travaux-de-notre-dame-de-paris-1844-1865)\
    \ produced by the Médiathèque de l'architecture et du patrimoine.\n"
  language:
  - fra
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '1860'
    notAfter: '1860'
  hands:
    count: '1'
    precision: exact
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  citation-file-link: https://raw.githubusercontent.com/PSL-Chartes-HTR-Students/TNAH-2021-Projet-Notre-Dame/main/CITATION.cff
  transcription-guidelines: "- Upper and lower case are respected. - Ligatures are\
    \ respected (e.g. transcribe \"chœur\"). - A crossed-out word is marked as 难\
    \ (once per word), but only if it is wholly or half illegible; if it is legible,\
    \ it is transcribed within braces {}. - Transcription doubts are flagged as\
    \ follows:\n    - uncertain word: [incertain]\n    - word that could not be\
    \ transcribed: [??]\n"
  volume:
  - metric: characters
    count: 29286
  - metric: files
    count: 12
  - metric: lines
    count: 735
  - metric: regions
    count: 86
  production-software: Unknown [Automatically filled]
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Doat, Soline and Falcoz, Elsa and\
    \ Faure, Margaux and Mazoué, Anaïs and Menu, Ariane},\ndoi = {10.5281/zenodo.6126491},\n\
    month = {1},\ntitle = {Projet Notre-Dame},\nurl = {https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-Projet-Notre-Dame},\n\
    year = {2022}\n}\n"
  _apa: "Doat S., Falcoz E., Faure M., Mazoué A., Menu A. (2022). Projet Notre-Dame\
    \ (version 1.0). DOI: 10.5281/zenodo.6126491 URL: https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-Projet-Notre-Dame\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: FoNDUE-GasparoSardiToponomasia-Dataset
  url: https://github.com/PaulineJac/GasparoSardiToponomasia/tree/main/HTR
  authors:
  - name: Pauline
    surname: Jacsont
    roles:
    - transcriber
    - quality-control
    - digitization
  - name: Florian
    surname: Mittenhuber
  institutions: []
  description: >-
    Dataset produced for the project to edit Gasparo Sardi’s Toponomasia from
    codex 174 of the Burgerbibliothek Bern. Images are available on request by
    writing to: pauline.jacsont [ at ] unige.ch.
  project-name: FoNDUE
  language:
  - lat
  production-software: eScriptorium + Kraken
  script:
  - iso: Latn
  - iso: Grek
  script-type: only-manuscript
  time:
    notBefore: '1561'
    notAfter: '1570'
  hands:
    count: '1'
    precision: exact
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  sources:
  - reference: ''
    link: http://katalog.burgerbib.ch/detail.aspx?ID=340662
  volume:
  - metric: pages
    count: 49
  citation-file-link: >-
    https://github.com/PaulineJac/GasparoSardiToponomasia/blob/main/HTR/CITATION.cff
  transcription-guidelines: "The transcriptions follow the rules of the cremma-medieval\
    \ GitHub repository (https://github.com/HTR-United/cremma-medieval). The transcription\
    \ is strictly diplomatic and graphematic. No abbreviations are resolved, 'i' and\
    \ 'v' are not regularized into Ramist letters, and accents, punctuation, spaces\
    \ and line breaks are strictly preserved. Following the Leiden conventions, crossed-out\
    \ or erased elements are transcribed within double brackets ⟦⟧, and elements\
    \ that are illegible in the image are not restored but marked with angle brackets\
    \ ⟨ ⟩. Special characters are encoded according to MUFI."
  automatically-aligned: false
- authors:
  - name: Alain
    roles:
    - project-manager
    surname: Dubois
  - name: Thibault
    roles:
    - project-manager
    - quality-control
    surname: Clérice
  - name: Clémence
    roles:
    - transcriber
    surname: Rudaz
  - name: Darius
    roles:
    - transcriber
    surname: Schlaeppi
  - name: Delphine
    roles:
    - transcriber
    surname: Mamie
  - name: Marie-Caroline
    roles:
    - support
    surname: Schmied
  characters:
    members:
    - e
    - '1'
    - a
    - i
    - r
    - l
    - n
    - s
    - t
    - o
    - u
    - '8'
    - c
    - /
    - h
    - '"'
    - d
    - '2'
    - m
    - M
    - b
    - f
    - g
    - V
    - '3'
    - '6'
    - '4'
    - '5'
    - F
    - J
    - p
    - '7'
    - v
    - A
    - S
    - '0'
    - ̧
    - ̀
    - ́
    - z
    - y
    - C
    - B
    - '9'
    - D
    - L
    - .
    - W
    - P
    - G
    - E
    - T
    - ̶
    - R
    - H
    - N
    - O
    - ̈
    - x
    - I
    - K
    - k
    - w
    - °
    - q
    - '-'
    - j
    - ̂
    - '?'
    - Z
    - "'"
    - _
    - ^
    - ̵
    - X
    - U
    - (
    - )
    - '='
    - ','
    - Q
    - ':'
    - <
    - '>'
    - œ
    - '!'
    - '&'
    - '['
    - ']'
    - ᗅ
    - ¨
    - '*'
    - §
    - '}'
    - \
    - +
    - '#'
    mode: NFD
  citation-file-link: https://raw.githubusercontent.com/PonteIneptique/valais-recensement/main/CITATION.CFF
  description: Collection of census forms
  format: Alto-XML
  hands:
    count: 1-per-file
    precision: exact
  institutions:
  - name: Archives du Valais
    roles:
    - digitization
  language:
  - fra
  - deu
  license:
  - name: CC-BY-NC 4.0
    url: https://creativecommons.org/licenses/by-nc/4.0/
  production-software: eScriptorium + Kraken
  project-name: Valais Time Machine
  project-website: https://www.timemachinevs.ch/
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notAfter: '1890'
    notBefore: '1870'
  title: Recensement Valaisan (Valais Time Machine)
  transcription-guidelines: "- Superscripts are transcribed with a ^ before the string.\n\
    - Transcription is faithful: nothing is corrected.\n- Checkmarks in tables are\
    \ transcribed as `/`. Some checkmark-like characters may be transcribed as `1`\
    \ when they look the same as the 1 in the dates.\n- Printed parts of the form\
    \ are not transcribed.\n- Only `Col` and `Header` regions are used for table\
    \ segmentation. If a signature appears at the bottom, a `Signature` region is\
    \ also used."
  url: https://github.com/PonteIneptique/valais-recensement
  volume:
  - count: 282260
    metric: characters
  - count: 915
    metric: files
  - count: 59368
    metric: lines
  - count: 34083
    metric: regions
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Dubois, Alain and Clérice, Thibault\
    \ and Mamie, Delphine and Schlaeppi, Darius and Rudaz, Clémence and Schmied, Marie-Caroline},\n\
    title = {Tables du recensement du Valais},\nurl = {https://github.com/PonteIneptique/valais-recensement}\n\
    }\n"
  _apa: "Dubois A., Clérice T., Mamie D., Schlaeppi D., Rudaz C., Schmied M. Tables du\
    \ recensement du Valais URL: https://github.com/PonteIneptique/valais-recensement\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: HTR - Araucania manuscript XIX
  url: https://github.com/Proyecto-Ocupacion-Araucania-UChile/HTR_Araucania_XIX
  authors:
  - name: Maxime
    surname: Humeau
  - name: Alessandro
    surname: Chiaretti
  institutions:
  - name: Archivo Central Andrés Bello
  description: "Ground truth dataset for the recognition of 19th-century Spanish documents\
    \ (mainly handwritten). The documents relate to the Occupation of Araucania\
    \ (1850-1881) in Chile. They are held in the 'Colección manuscritos' of the\
    \ Archivo Central Andrés Bello, Universidad de Chile."
  language:
  - spa
  production-software: eScriptorium + Kraken
  script:
  - iso: Latn
  script-type: mainly-manuscript
  time:
    notBefore: '1859'
    notAfter: '1877'
  hands:
    count: more-than-10
    precision: estimated
  license:
  - name: CC-BY-SA 4.0
    url: https://creativecommons.org/licenses/by-sa/4.0/
  format: Alto-XML
  volume:
  - metric: characters
    count: 117155
  - metric: files
    count: 180
  - metric: lines
    count: 3932
  - metric: regions
    count: 981
  transcription-guidelines: "- xxx for erased or unreadable characters\n- ^+letters\
    \ for superscript letters\n- ⁋ for new paragraph\n"
  characters:
    mode: NFD
    members:
    - e
    - a
    - o
    - n
    - s
    - r
    - i
    - d
    - l
    - u
    - t
    - c
    - m
    - p
    - q
    - b
    - ́
    - g
    - .
    - h
    - ','
    - ⁋
    - v
    - '-'
    - f
    - y
    - S
    - C
    - '0'
    - ^
    - A
    - j
    - U
    - '1'
    - z
    - x
    - D
    - M
    - ̃
    - E
    - '2'
    - L
    - P
    - N
    - '8'
    - V
    - J
    - B
    - T
    - G
    - '6'
    - I
    - '5'
    - '3'
    - ':'
    - '9'
    - '4'
    - H
    - R
    - '7'
    - ;
    - O
    - “
    - º
    - ”
    - F
    - Q
    - Y
    - ̄
    - '*'
    - _
    - '='
    - $
    - (
    - '"'
    - )
    - ¿
    - /
    - ̀
    - '?'
    - ̈
    - ¡
    - '!'
    - '{'
    - '~'
    - '}'
    - '&'
    - W
    - Z
    - ‘
    - ’
    - K
    - '['
    - ']'
  automatically-aligned: false
- authors:
  - name: Sonia
    orcid: 0009-0009-7367-048X
    roles:
    - transcriber
    - project-manager
    - quality-control
    surname: Solfrini
  - name: Simon
    orcid: 0000-0001-9094-4475
    roles:
    - support
    surname: Gabay
  - name: Geneviève
    orcid: 0009-0006-5367-4262
    roles:
    - transcriber
    - project-manager
    - quality-control
    surname: Gross
  - name: Pierre-Olivier
    orcid: 0009-0009-2475-6017
    roles:
    - transcriber
    - quality-control
    surname: Beaulnes
  - name: Aurélia
    orcid: 0009-0009-9678-9811
    roles:
    - transcriber
    - quality-control
    surname: Marques Oliveira
  - name: Daniela
    orcid: 0000-0002-2601-668X
    roles:
    - project-manager
    surname: Solfaroli Camillocci
  characters:
    members:
    - e
    - s
    - u
    - a
    - i
    - n
    - t
    - r
    - o
    - l
    - c
    - d
    - p
    - m
    - .
    - ','
    - f
    - q
    - g
    - ̃
    - y
    - b
    - h
    - /
    - z
    - ⁊
    - ¬
    - ':'
    - C
    - D
    - x
    - E
    - I
    - P
    - L
    - S
    - '1'
    - A
    - M
    - Q
    - '2'
    - U
    - '?'
    - '3'
    - N
    - T
    - '4'
    - O
    - ͥ
    - B
    - R
    - ꝰ
    - H
    - '6'
    - '5'
    - ͬ
    - G
    - '8'
    - F
    - (
    - )
    - '0'
    - '9'
    - ¶
    - '7'
    - ◊
    - ꝓ
    -  
    - ꝑ
    - ᑕ
    - V
    - '-'
    - Y
    - ;
    - ᗞ
    - J
    - k
    - ̀
    - ꝯ
    - Z
    - v
    mode: NFD
  citation-file-link: https://github.com/SETAFDH/HTR-SETAF-Jean-Michel/blob/main/CITATION.cff
  description: >-
    OCR data for the SETAF project, 16th-century French prints in Gothic
    characters.
  format: Alto-XML
  hands:
    count: '1'
    precision: exact
  language:
  - fra
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: FoNDUE
  project-website: >-
    https://www.unige.ch/lettres/humanites-numeriques/recherche/projets-de-la-chaire/fondue
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notAfter: '1600'
    notBefore: '1500'
  title: HTR-SETAF-Jean-Michel
  transcription-guidelines: >-
    Our data follow the SegmOnto segmentation standards (https://segmonto.github.io).
    Our transcription guidelines follow a graphematic approach, without
    regularisation. We keep the original punctuation and abbreviations. A detailed
    presentation of our rules is available on HAL (https://hal.science/hal-04281804).
  url: https://github.com/SETAFDH/HTR-SETAF-Jean-Michel
  volume:
  - count: 286256
    metric: characters
  - count: 404
    metric: files
  - count: 11778
    metric: lines
  - count: 1365
    metric: regions
- authors:
  - name: Sonia
    orcid: 0009-0009-7367-048X
    roles:
    - transcriber
    - project-manager
    - quality-control
    surname: Solfrini
  - name: Simon
    orcid: 0000-0001-9094-4475
    roles:
    - support
    surname: Gabay
  - name: Geneviève
    orcid: 0009-0006-5367-4262
    roles:
    - transcriber
    - project-manager
    - quality-control
    surname: Gross
  - name: Pierre-Olivier
    orcid: 0009-0009-2475-6017
    roles:
    - transcriber
    - quality-control
    surname: Beaulnes
  - name: Aurélia
    orcid: 0009-0009-9678-9811
    roles:
    - transcriber
    - quality-control
    surname: Marques Oliveira
  - name: Daniela
    orcid: 0000-0002-2601-668X
    roles:
    - project-manager
    surname: Solfaroli Camillocci
  characters:
    members:
    - e
    - u
    - s
    - i
    - a
    - t
    - n
    - r
    - o
    - l
    - c
    - d
    - p
    - m
    - .
    - ſ
    - q
    - /
    - f
    - y
    - g
    - ̃
    - h
    - ','
    - z
    - b
    - ⁊
    - ¬
    - x
    - '&'
    - I
    - ':'
    - v
    - E
    - C
    - P
    - ’
    - '1'
    - D
    - L
    - S
    - ̀
    - '2'
    - ¶
    - ́
    - A
    - M
    - R
    - '3'
    - ꝰ
    - N
    - '?'
    - ͥ
    - "'"
    - T
    - Q
    - '4'
    - '6'
    - '7'
    - '0'
    - H
    - '8'
    - '5'
    - '9'
    - B
    - G
    - ͬ
    - O
    - F
    - U
    - ◊
    - )
    - (
    - V
    - œ
    - ꝑ
    - Z
    - ꝓ
    - J
    - '-'
    - ß
    - K
    - ꝫ
    - j
    - ł
    - ꝯ
    - ̧
    - k
    - ꝗ
    - ᗅ
    - ̂
    - ;
    - ð
    - ̈
    - X
    - ᑕ
    - '*'
    mode: NFD
  citation-file-link: https://github.com/SETAFDH/HTR-SETAF-LesFaictzJCH/blob/main/CITATION.cff
  description: >-
    OCR data for the SETAF project, 16th-century French prints in Gothic
    characters.
  format: Alto-XML
  hands:
    count: '1'
    precision: exact
  language:
  - fra
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: FoNDUE
  project-website: >-
    https://www.unige.ch/lettres/humanites-numeriques/recherche/projets-de-la-chaire/fondue
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notAfter: '1600'
    notBefore: '1500'
  title: HTR-SETAF-LesFaictzJCH
  transcription-guidelines: >-
    Our data follow the SegmOnto segmentation standards (https://segmonto.github.io).
    Our transcription guidelines follow a graphematic approach, without
    regularisation. We keep the original punctuation and abbreviations. A detailed
    presentation of our rules is available on HAL (https://hal.science/hal-04281804).
  url: https://github.com/SETAFDH/HTR-SETAF-LesFaictzJCH
  volume:
  - count: 311547
    metric: characters
  - count: 232
    metric: files
  - count: 6853
    metric: lines
  - count: 751
    metric: regions
- authors:
  - name: Sonia
    orcid: 0009-0009-7367-048X
    roles:
    - transcriber
    - project-manager
    - quality-control
    surname: Solfrini
  - name: Simon
    orcid: 0000-0001-9094-4475
    roles:
    - support
    surname: Gabay
  - name: Geneviève
    orcid: 0009-0006-5367-4262
    roles:
    - transcriber
    - project-manager
    - quality-control
    surname: Gross
  - name: Pierre-Olivier
    orcid: 0009-0009-2475-6017
    roles:
    - transcriber
    - quality-control
    surname: Beaulnes
  - name: Aurélia
    orcid: 0009-0009-9678-9811
    roles:
    - transcriber
    - quality-control
    surname: Marques Oliveira
  - name: Daniela
    orcid: 0000-0002-2601-668X
    roles:
    - project-manager
    surname: Solfaroli Camillocci
  characters:
    members:
    - e
    - s
    - u
    - i
    - a
    - t
    - r
    - n
    - o
    - l
    - c
    - d
    - p
    - m
    - ̃
    - .
    - /
    - q
    - f
    - y
    - g
    - h
    - b
    - ⁊
    - z
    - ¬
    - ':'
    - x
    - C
    - '1'
    - I
    - E
    - L
    - D
    - P
    - A
    - ¶
    - ͥ
    - '2'
    - M
    - S
    - '3'
    - '*'
    - ͬ
    - Q
    - '?'
    - '4'
    - N
    - T
    - ꝰ
    - '5'
    - '6'
    - R
    - U
    - '0'
    - '8'
    - ','
    - H
    - O
    - '7'
    - '9'
    - (
    - )
    - ꝓ
    - G
    - ꝑ
    - B
    - F
    - ̈
    - ꝯ
    - ł
    - ð
    - ◊
    - '-'
    -  
    - ꝝ
    - v
    - Z
    - k
    - "'"
    - K
    - Y
    - X
    - ̀
    - ꝫ
    - V
    - ́
    - J
    - ꝙ
    - ᵉ
    - w
    - ;
    - ꝗ
    - ̇
    - ̌
    mode: NFD
  citation-file-link: https://github.com/SETAFDH/HTR-SETAF-Pierre-de-Vingle/blob/main/CITATION.cff
  description: >-
    OCR data for the SETAF project, 16th-century French prints in Gothic
    characters.
  format: Alto-XML
  hands:
    count: '1'
    precision: exact
  language:
  - fra
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: FoNDUE
  project-website: >-
    https://www.unige.ch/lettres/humanites-numeriques/recherche/projets-de-la-chaire/fondue
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notAfter: '1600'
    notBefore: '1500'
  title: HTR-SETAF-Pierre-de-Vingle
  transcription-guidelines: >-
    Our data follow the SegmOnto segmentation standards (https://segmonto.github.io).
    Our transcription guidelines follow a graphematic approach, without
    regularisation. We keep the original punctuation and abbreviations. A detailed
    presentation of our rules is available on HAL (https://hal.science/hal-04281804).
  url: https://github.com/SETAFDH/HTR-SETAF-Pierre-de-Vingle
  volume:
  - count: 1718361
    metric: characters
  - count: 1835
    metric: files
  - count: 64388
    metric: lines
  - count: 8546
    metric: regions
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Moonshines
  url: https://github.com/alix-tz/moonshines
  authors:
  - name: Alix
    surname: Chagué
    orcid: 0000-0002-0136-4434
    roles:
    - transcriber
    - aligner
    - project-manager
    - digitization
  institutions: []
  description: This dataset is composed of pages of text written in 2023 by a single
    person, copying poems by Guillaume Apollinaire published in Alcools and passages
    from Guillaume Apollinaire's Wikipedia page.
  language:
  - fra
  production-software: eScriptorium + Kraken
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '2023'
    notAfter: '2023'
  hands:
    count: '1'
    precision: exact
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  volume:
  - metric: characters
    count: 27734
  - metric: files
    count: 45
  - metric: lines
    count: 1016
  - metric: regions
    count: 45
  citation-file-link: https://github.com/alix-tz/moonshines/blob/master/CITATION.cff
  transcription-guidelines: The transcription strictly follows what is written on
    the images, including accentuation or capitalization errors. The segmentation
    follows the SegmOnto ontology and mostly relies on MainZone and DefaultLine. Beware
    that this dataset barely contains any punctuation and that most lines begin with
    a capital letter.
  characters:
    mode: NFD
    members:
    - e
    - s
    - a
    - n
    - r
    - i
    - t
    - u
    - o
    - l
    - d
    - m
    - c
    - p
    - ́
    - "'"
    - v
    - g
    - b
    - h
    - ̀
    - f
    - L
    - q
    - E
    - '1'
    - A
    - C
    - x
    - y
    - ̂
    - S
    - '9'
    - P
    - M
    - j
    - T
    - D
    - '-'
    - N
    - J
    - R
    - '0'
    - z
    - O
    - I
    - '2'
    - '8'
    - V
    - F
    - G
    - U
    - '5'
    - B
    - Q
    - )
    - H
    - '3'
    - (
    - '7'
    - '6'
    - w
    - k
    - '4'
    - ̧
    - K
    - Z
    - ̈
    - Y
    - '{'
    - '}'
    - W
    - .
    - X
    - ','
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Chagué, Alix},\ndoi = {10.5281/zenodo.607720783},\n\
    month = {2},\ntitle = {moonshines},\nurl = {https://github.com/alix-tz/moonshines},\n\
    year = {2023}\n}\n"
  _apa: "Chagué A. (2023). moonshines (version 2.0.0). DOI: 10.5281/zenodo.607720783\
    \ URL: https://github.com/alix-tz/moonshines\n"
- authors:
  - name: Alix
    orcid: 0000-0002-0136-4434
    roles:
    - transcriber
    - aligner
    - quality-control
    surname: Chagué
  - name: Pascal
    roles:
    - project-manager
    surname: Dubourg Glatigny
  - name: Gilles
    roles:
    - transcriber
    surname: Pérez
  characters:
    members:
    - e
    - s
    - a
    - t
    - n
    - i
    - r
    - u
    - o
    - l
    - d
    - m
    - c
    - p
    - E
    - ','
    - ́
    - .
    - v
    - A
    - f
    - ’
    - I
    - S
    - N
    - g
    - q
    - R
    - T
    - O
    - ̀
    - '-'
    - b
    - L
    - h
    - U
    - C
    - j
    - '1'
    - D
    - M
    - P
    - '"'
    - x
    - '2'
    - ̂
    - V
    - y
    - H
    - '3'
    - J
    - '9'
    - '4'
    - B
    - G
    - (
    - F
    - '0'
    - )
    - K
    - '7'
    - '5'
    - ']'
    - '?'
    - '8'
    - ':'
    - '['
    - '6'
    - Q
    - ̧
    - z
    - k
    - Y
    - /
    - ;
    - Z
    - X
    - °
    - '#'
    - ^
    - '='
    - ⋎
    - →
    - ̈
    - '!'
    - '{'
    - w
    - W
    - +
    - ̆
    - '*'
    - '%'
    - '>'
    - <
    - '~'
    mode: NFD
  description: Ground Truth for the Digital Peraire project.
  format: Alto-XML
  hands:
    count: '1'
    precision: exact
  institutions:
  - name: Azentis
    roles:
    - digitization
  language:
  - fra
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-name: Digital Peraire
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notAfter: '1990'
    notBefore: '1928'
  title: Peraire Ground Truth
  transcription-guidelines: >-
    Crossed-out words are transcribed as "><". Text written above the line is not
    flagged. What is written is transcribed as is. Where there is uncertainty, the
    line is left empty. The segmentation of some documents is not suitable for
    training a segmentation model. The SegmOnto ontology was used. When added words
    are inserted with a '⋎', this grapheme is transcribed as ⋎.
  url: https://github.com/alix-tz/peraire-ground-truth
  volume:
  - count: 97505
    metric: characters
  - count: 67
    metric: files
  - count: 2307
    metric: lines
  - count: 151
    metric: regions
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Chagué, Alix and Pérez, Gilles},\n\
    doi = {10.5281/zenodo.7185907},\nmonth = {6},\ntitle = {Peraire Ground Truth},\n\
    url = {https://github.com/alix-tz/peraire-ground-truth},\nyear = {2023}\n}\n"
  _apa: "Chagué A., Pérez G. (2023). Peraire Ground Truth (version 2.0.0). DOI: 10.5281/zenodo.7185907\
    \ URL: https://github.com/alix-tz/peraire-ground-truth\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: RASAM
  url: https://github.com/calfa-co/rasam-dataset
  project-website: https://calfa.fr/blog/26
  authors:
  - name: Chahan
    surname: Vidal-Gorène
    roles:
    - project-manager
  - name: Noëmie
    surname: Lucas
    roles:
    - project-manager
    - quality-control
  - name: Clément
    surname: Salah
    roles:
    - transcriber
    - quality-control
  - name: Aliénor
    surname: Decours-Perez
    roles:
    - support
  - name: Boris
    surname: Dupin
    roles:
    - support
  description: "The dataset is made up of 300 images with their ground truth stored\
    \ in XML files (PAGE XML format). The images come from three manuscripts selected\
    \ from the collections of the BULAC library (Paris). It covers a representative\
    \ part of handwritten production in Arabic Maghrebi scripts and includes an annotation\
    \ of the layout (TextRegions, baselines and polygons) and the transcription\
    \ of the main text. This dataset is the result of a collaborative transcription.\
    \ All participants are credited in the official deposit. The project was supported\
    \ by the French Ministry of Higher Education, Research and Innovation, the Research\
    \ Consortium Middle-East and Muslim Worlds (GIS MOMM), Calfa and the BULAC library.\n"
  language:
  - ara
  script:
  - iso: Arab
  script-type: only-manuscript
  time:
    notBefore: '1700'
    notAfter: '1899'
  hands:
    count: less-than-11
    precision: exact
  license:
  - name: Apache-2.0 License
    url: https://www.apache.org/licenses/LICENSE-2.0
  format: Page-XML
  volume:
  - metric: pages
    count: 300
  - count: 7540
    metric: lines
  - count: 300
    metric: files
  - count: 676
    metric: regions
  - count: 403034
    metric: characters
  sources:
  - reference: Vidal-Gorène, C., Lucas, N., Salah, C., Decours-Perez, A., & Dupin,
      B. (2021, September). RASAM–A Dataset for the Recognition and Analysis of Scripts
      in Arabic Maghrebi. In International Conference on Document Analysis and Recognition
      (pp. 265-281). Springer, Cham
    link: https://link.springer.com/chapter/10.1007/978-3-030-86198-8_19
  transcription-guidelines: "The full transcription specifications are available on GitHub\
    \ and in the paper.\n"
  production-software: Calfa Vision
  automatically-aligned: false
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: TariMa
  url: https://github.com/calfa-co/tarima
  authors:
  - name: Antoine
    surname: Perrier
    orcid: 0000-0002-5035-4283
    roles:
    - project-manager
  institutions:
  - name: BULAC
    roles:
    - project-manager
  description: >-
    The dataset was compiled as part of the TariMa project (Tarih al-Maghrib.
    Writing History in the Maghreb in the modern and contemporary era), sponsored
    by the French agency Collex-Persee and supervised by Antoine Perrier (CNRS).
    It includes images of varying resolution and size (width from 982px to
    8049px), different layouts (double pages, multiple columns), and different
    states of conservation. It also mixes microfilms, scans and lithographs, and
    is broadly representative of Maghrebi Arabic production.
  project-website: https://www.collexpersee.eu/projet/tarima/
  language:
  - ara
  production-software: Calfa Vision
  script:
  - iso: Arab
    qualify: Maghrebi
  script-type: mainly-manuscript
  time:
    notBefore: '1500'
    notAfter: '1899'
  hands:
    count: more-than-10
    precision: estimated
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Page-XML
  sources:
  - reference: ''
    link: https://github.com/calfa-co/tarima
  volume:
  - metric: files
    count: 120
  - metric: lines
    count: 2673
  - metric: characters
    count: 146667
  transcription-guidelines: >-
    We follow the RASAM guidelines for the transcription of Arabic Maghrebi
    manuscripts.
  automatically-aligned: false
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: OCR17plus
  url: https://github.com/e-ditiones/OCR17plus
  project-name: E-ditiones
  project-website: https://e-ditiones.huma-num.fr/
  authors:
  - name: Simon
    surname: Gabay
    roles:
    - transcriber
    - project-manager
    - support
  - name: Claire
    surname: Jahan
    roles:
    - transcriber
    - aligner
  description: Classical-era (17th-century) French prints
  language:
  - frm
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notBefore: '1600'
    notAfter: '1700'
  hands:
    count: 1-per-folder
    precision: exact
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  volume:
  - count: 25628
    metric: lines
  - count: 965
    metric: files
  - count: 3923
    metric: regions
  - count: 686335
    metric: characters
  production-software: Transkribus
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Jahan, Claire and Gabay, Simon},\n\
    doi = {none},\nmonth = {7},\ntitle = {OCR17+ - Layout analysis and text recognition\
    \ for 17th c. French prints},\nurl = {https://github.com/e-ditiones/OCR17plus},\n\
    year = {2021}\n}\n"
  _apa: "Jahan C., Gabay S. (2021). OCR17+ - Layout analysis and text recognition\
    \ for 17th c. French prints (version 1.0). DOI: none URL: https://github.com/e-ditiones/OCR17plus\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: GenAuto TD Corpus
  url: https://github.com/jpmjpmjpm/genauto-td-htr.git
  project-name: GenAuto
  project-website: ''
  authors:
  - name: Jean-François
    surname: Boutet
    roles:
    - transcriber
    - aligner
  - name: Jean-Pierre
    surname: Merx
    roles:
    - transcriber
    - aligner
    - project-manager
  description: "150 transcribed images from the \"Tables Décennales\" (ten-year indexes)\
    \ of the French civil registry, from the municipalities of Sermaises and Romilly-sur-Seine.\n"
  language:
  - fra
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '1792'
    notAfter: '1902'
  hands:
    count: less-than-11
    precision: estimated
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  volume:
  - count: 300
    metric: pages
  - count: 150
    metric: images
  - count: 150
    metric: files
  - count: 186366
    metric: characters
  - count: 21557
    metric: lines
  - count: 608
    metric: regions
  production-software: eScriptorium + Kraken
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Boutet, Jean-François and Merx, Jean-Pierre},\n\
    doi = {10.5281/zenodo.5507403},\nmonth = {9},\ntitle = {GenAuto TD Corpus},\n\
    url = {https://github.com/jpmjpmjpm/genauto-td-htr.git},\nyear = {2021}\n}\n"
  _apa: "Boutet J., Merx J. (2021). GenAuto TD Corpus (version 1.0.0). DOI: 10.5281/zenodo.5507403\
    \ URL: https://github.com/jpmjpmjpm/genauto-td-htr.git\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Joseph Hooker HTR
  url: https://github.com/jschaefer738b/JosephHookerHTR.git
  authors:
  - name: John
    surname: Schaefer
    orcid: 0009-0006-5751-9323
    roles:
    - transcriber
    - project-manager
    - quality-control
    - support
  - name: Kiri
    surname: Ross-Jones
    roles:
    - support
  - name: Alexis
    surname: Litvine
    roles:
    - support
  institutions:
  - name: Royal Botanic Gardens, Kew
  - name: University of Cambridge
  description: >-
    XML transcriptions and JPEG images exported from Transkribus as ground truth
    for an eScriptorium-Kraken HTR model (CER 11-12%) trained on the correspondence
    of Joseph Dalton Hooker (1817-1911), primarily letters to William Turner
    Thiselton-Dyer (1843-1928) written in the late 19th and early 20th centuries.
    Many transcriptions in this dataset were generated by a small team of
    anonymous volunteers as part of
    the Joseph Hooker Correspondence Project based at Kew Gardens. All images in
    this dataset are reproduced with the kind permission of the Board of Trustees
    of the Royal Botanic Gardens Kew (© RBG, Kew). Contact archives@kew.org for
    more information.


    HTR Model: Schaefer, John, & Litvine, Alexis. (2023). Joseph Hooker HTR Model.
    Zenodo. https://doi.org/10.5281/zenodo.8038689
  project-name: Joseph Hooker Correspondence Project
  project-website: >-
    https://www.kew.org/science/our-science/projects/joseph-hooker-correspondence-project
  language:
  - eng
  production-software: Transkribus
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '1850'
    notAfter: '1911'
  hands:
    count: '1'
    precision: estimated
  license:
  - name: CC-BY-SA 4.0
    url: https://creativecommons.org/licenses/by-sa/4.0/
  format: Page-XML
  volume:
  - metric: lines
    count: 7100
  - metric: files
    count: 337
  - metric: pages
    count: 337
  transcription-guidelines: >-
    All horizontal lines in Hooker's hand were transcribed as originally written.
    Most typescript and vertical lines in the margins were not included.
  automatically-aligned: false
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: NuBIS-OCR
  url: https://github.com/ksefil/NuBIS-OCR
  authors:
  - name: Kutay
    surname: Sefil
    roles:
    - transcriber
  institutions: []
  description: >-
    Ground truth dataset for a selection of printed books from NuBIS, the digital
    library of the Bibliothèque Interuniversitaire de la Sorbonne.
  language:
  - fra
  - lat
  production-software: eScriptorium + Kraken
  automatically-aligned: false
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notBefore: '1602'
    notAfter: '1989'
  hands:
    count: unknown
    precision: exact
  license:
    name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  sources:
  - reference: ''
    link: https://nubis.bis-sorbonne.fr/
  volume:
  - metric: pages
    count: 57
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Eutyches
  url: https://github.com/malamatenia/Eutyches
  authors:
  - name: Malamatenia
    surname: Vlachou Efstathiou
    roles:
    - transcriber
    - aligner
    - project-manager
  institutions: []
  description: >-
    Ground truth for Caroline minuscule of the late 9th century, from the
    grammatical treatise "de uerbo" by Eutyches.
  project-name: Eutyches grammaticus glossed
  language:
  - lat
  - grc
  production-software: eScriptorium + Kraken
  script:
  - iso: Latn
    qualify: Caroline Minuscule
  script-type: only-manuscript
  time:
    notBefore: '850'
    notAfter: '900'
  hands:
    count: less-than-11
    precision: estimated
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  sources:
  - reference: Codices Vossiani Latini, Brill, VLO41
    link: >-
      https://primarysources.brillonline.com/browse/vossiani-latini/vlo-041-eutyches-grammaticalia-isidorus-alphabeta
  volume:
  - metric: pages
    count: 65
  citation-file-link: https://github.com/malamatenia/Eutyches/blob/main/CITATION.cff
  transcription-guidelines: >-
    Graphematic transcription, following the CREMMA-medieval guidelines.
    Spacing has been re-established in passages written in semicontinua; long s
    is transcribed as s; capitalization is faithful to the manuscript;
    abbreviations are preserved; punctuation is reduced to ";" and ".". The few
    Greek passages have also been preserved, as have some pen trials (essais de
    plume) when they form full words. Layout annotation follows the SegmOnto
    controlled vocabulary.
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Vlachou-Efstathiou, Malamatenia},\n\
    title = {Eutyches \"de uerbo\" glossed}\n}\n"
  _apa: "Vlachou-Efstathiou M. Eutyches \"de uerbo\" glossed\n"

- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Burchards Dekret Digital (BDD) Segmentation Data
  url: https://github.com/michaelscho/bdd-segmentation-data
  authors:
  - name: Michael
    surname: Schonhardt
    orcid: 0000-0002-2750-1900
    roles:
    - aligner
    - project-manager
    - quality-control
  - name: Leo
    surname: Felder
    orcid: 0009-0008-7230-4229
    roles:
    - support
  - name: Torben
    surname: Jordan
    orcid: 0009-0002-2143-0520
    roles:
    - support
  - name: Christopher
    surname: Oed
    orcid: 0009-0001-3910-1832
    roles:
    - support
  institutions: []
  description: >-
    This dataset comprises PageXML for training segmentation models in Transkribus
    and Kraken. It is designed to capture the specific layout of medieval canon
    law collections. Compiled from several 11th-century manuscripts of the
    Decretum Burchardi, it supports the ongoing edition project Burchards Dekret
    Digital. Annotations are tailored to project-specific needs but can be adapted
    for other use cases. The data was first prepared in Transkribus and then
    re-masked in eScriptorium for use with Kraken.
  project-name: Burchards Dekret Digital
  project-website: https://www.adwmainz.de/projekte/burchards-dekret-digital/informationen.html
  language:
  - lat
  production-software: eScriptorium + Kraken + Transkribus
  automatically-aligned: false
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '1000'
    notAfter: '1199'
  hands:
    count: unknown
    precision: exact
  license:
    name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Page-XML
  volume:
  - metric: pages
    count: 3000

- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Shakespeare-Scott translations
  url: https://github.com/millawell/ocr-data
  project-name: "Publishing an OCR ground truth data set for reuse in an unclear copyright setting"
  project-website: https://github.com/millawell/ocr-data
  authors:
  - name: David
    surname: Lassner
  - name: Julius
    surname: Coburger
  - name: Clemens
    surname: Neudecker
  - name: Anne
    surname: Baillot
  description: "Ground truth data for German and English prints of Shakespeare and Scott,\
    \ in the original and in various translations.\n"
  language:
  - eng
  - deu
  script:
  - iso: Latn
  - iso: Latf
  script-type: only-typed
  time:
    notBefore: '1815'
    notAfter: '1852'
  hands:
    count: unknown
    precision: exact
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  volume:
  - metric: lines
    count: 5354
  - metric: files
    count: 131
  - metric: regions
    count: 131
  - metric: characters
    count: 192264
  sources:
  - reference: ''
    link: https://zfdg.de/sb005_006
  citation-file-link: https://github.com/millawell/ocr-data/blob/master/citation.cff
  production-software: eScriptorium + Kraken
  automatically-aligned: false
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Paris Bible Project (PBP)
  url: https://github.com/parisbible/ground_truth
  authors:
  - name: Estelle
    surname: Guéville
    orcid: 0000-0003-2603-1051
    roles:
    - transcriber
    - aligner
    - project-manager
    - quality-control
  - name: David
    surname: Wrisley
    orcid: 0000-0002-0355-1487
    roles:
    - transcriber
    - aligner
    - project-manager
    - quality-control
  - name: Niccolò Acram
    surname: Cappelletto
    roles:
    - transcriber
    - aligner
    - quality-control
  institutions: []
  description: >-
    The Paris Bible Project aims to understand the production and diffusion of
    medieval Latin Bibles in Europe. The dataset includes ground truth from Paris
    Bibles produced in the 13th and 14th centuries. We also provide the latest
    version of our list of known Paris Bible manuscripts worldwide, with
    information about each of them.
  project-website: https://parisbible.github.io/
  language:
  - lat
  production-software: Transkribus
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '1200'
    notAfter: '1399'
  hands:
    count: more-than-10
    precision: estimated
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  volume:
  - metric: lines
    count: 1700
  - metric: files
    count: 19
  - metric: regions
    count: 40
  - metric: characters
    count: 55970
  characters:
    mode: NFKD
    members:
    - i
    - e
    - t
    - u
    - a
    - s
    - o
    - n
    - ̄
    - c
    - m
    - r
    - l
    - ꝺ
    - .
    - p
    - b
    - q
    - ⁊
    - g
    - f
    - ́
    - ꝛ
    - h
    - '-'
    - d
    - ꝫ
    - ;
    - x
    - ꝯ
    - ̾
    - ꝑ
    - ͥ
    - E
    - ̕
    - ꝝ
    - ̃
    - ꝓ
    - y
    - ̈
    - N
    - ̇
    - Q
    - ·
    - D
    - S
    - I
    - A
    - ͦ
    - C
    - T
    - ᔆ
    - ꝙ
    - H
    - F
    - P
    - ͣ
    - '2'
    - V
    - M
    - ':'
    - R
    - z
    - L
    - O
    - U
    - v
    - ℟
    - G
    - ͨ
    - ͧ
    - '&'
    - ẜ
    - ᷤ
    - ͤ
    - ʀ
    - B
    - X
    - Ꝙ
    - '?'
    - k
    - ᣳ
    - j
    - ͬ
  transcription-guidelines: 'See: https://parisbible.github.io/guidelines/'
  automatically-aligned: false
  _bibtex: "@misc{YourReferenceHere,\nauthor = {Guéville, Estelle and Wrisley, David\
    \ Joseph},\ndoi = {10.5281/zenodo.7653691},\nmonth = {10},\ntitle = {Ground Truth\
    \ Used in HTR for the Paris Bible Project},\nyear = {2021}\n}\n"
  _apa: "Guéville E., Wrisley D.J. (2021). Ground Truth Used in HTR for the Paris\
    \ Bible Project (version 1.0.0). DOI: 10.5281/zenodo.7653691\n"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Bullinger HTR Dataset
  url: https://github.com/pstroe/bullinger-htr
  authors:
  - name: Phillip Benjamin
    surname: Ströbel
    orcid: 0000-0003-2063-5495
    roles:
    - aligner
    - support
  - name: Tobias
    surname: Hodel
    orcid: 0000-0002-2071-6407
    roles:
    - aligner
    - project-manager
  - name: Christian
    surname: Sieber
    orcid: 0000-0002-9364-6921
    roles:
    - digitization
  - name: Patricia
    surname: Scheurer
    roles:
    - quality-control
    - support
  - name: David Selim
    surname: Schoch
    orcid: 0000-0002-9936-8459
    roles:
    - aligner
  - name: Anna
    surname: Janka
    roles:
    - aligner
  - name: Raphael
    surname: Schwitter
    roles:
    - aligner
  - name: Beat
    surname: Wolf
    roles:
    - aligner
  - name: Jonas
    surname: Widmer
    roles:
    - aligner
  - name: Peter
    surname: Rechsteiner
    roles:
    - quality-control
    - support
  - name: Raphael
    surname: Müller
    roles:
    - quality-control
    - digitization
    - support
  institutions: []
  description: >-
    This dataset contains 165,673 text-line images (.png) with their corresponding
    transcriptions (.txt), randomly split 80/10/10 into training, validation and
    test sets. The source is the extensive correspondence of the Swiss reformer
    Heinrich Bullinger (1504-1575) with more than 800 different correspondents,
    so it shows great variety in handwriting styles. It is also multilingual,
    comprising Latin and Early New High German letters (and some that mix both).
    The data is split by language (determined with langid) into separate folders:
    de for Early New High German and la for Latin.
  project-website: https://www.bullinger-digital.ch/
  language:
  - lat
  - deu
  production-software: Transkribus, own
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '1523'
    notAfter: '1575'
  hands:
    count: more-than-10
    precision: estimated
  license:
    name: CC-BY-SA 4.0
    url: https://creativecommons.org/licenses/by-sa/4.0/
  format: Image-Text-Pairs
  volume:
  - metric: lines
    count: 165673
  automatically-aligned: true
  transcription-guidelines: Automated transcript alignment with Transkribus
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Caroline Minuscule by Rescribe
  url: https://github.com/rescribe/carolineminuscule-groundtruth
  project-name: Rescribe
  project-website: https://rescribe.xyz/
  authors:
  - name: Nick
    surname: White
    roles:
    - transcriber
    - project-manager
  - name: Thibault
    surname: Clérice
    roles:
    - aligner
  - name: Antonia
    surname: Karaisl
    roles:
    - transcriber
    - project-manager
  description: "This ground truth repository is a work in progress; it currently covers\
    \ part of our complete Caroline Minuscule training pool of around 70 manuscripts\
    \ used for our OCRopus Caroline Minuscule model (see the ocropus-models repository).\n"
  language:
  - lat
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '800'
    notAfter: '1199'
  hands:
    count: 1-per-file
    precision: exact
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  volume:
  - metric: characters
    count: 17155
  - metric: files
    count: 17
  - metric: lines
    count: 457
  - metric: regions
    count: 46
  transcription-guidelines: "In general this meant deciding between diplomatic transcription\
    \ (i.e. sticking to what it says on the page) and gently modernized features (i.e.\
    \ reinterpreting medieval signs into modern equivalents) with a view to specific\
    \ categories. Read on for a summary of the rules and the respective rationale\
    \ behind them.\nSUMMARY\nPUNCTUATION\n\n    Modern: medieval punctuation is transcribed\
    \ with modern equivalents; punctus elevatus transcribed as semicolon\n\nCAPITALIZATION\n\
    \n    Diplomatic: Original capitalization retained\n\nABBREVIATIONS\n\n    Diplomatic\
    \ where possible: Retain abbreviations and render glyphs as opposed to expanded\
    \ versions where possible\n    \"*\" where the original character is not supported: OCRopus\
    \ (at the time of transcription) could not handle some of the medieval\
    \ glyphs, even where a Unicode version was present. Abbreviations not supported by OCRopus\
    \ are uniformly transcribed as \"*\" or, in the case of a combined character (such\
    \ as a consonant with a macron), as the base character followed by \"*\" (e.g.\
    \ \"t*\"). The list of characters accepted by OCRopus can be found in this repository\
    \ and can be downloaded and used as the codec in the OCRopus training process.\n\nSPACING\n\
    \n    Diplomatic: Preserve manuscript spacing, i.e. give diplomatic transcription\n\
    \nNUMBERS\n\n    Diplomatic: retain original version of both Roman and Arabic\
    \ numerals"
  characters:
    mode: NFD
    members:
    - i
    - e
    - t
    - u
    - a
    - s
    - n
    - o
    - r
    - m
    - c
    - d
    - l
    - p
    - .
    - b
    - q
    - g
    - '*'
    - h
    - ;
    - ̃
    - f
    - x
    - I
    - ̄
    - E
    - N
    - ̨
    - ':'
    - '&'
    - S
    - ꝑ
    - C
    - A
    - đ
    - D
    - U
    - T
    - ꝓ
    - Q
    - v
    - ','
    - O
    - R
    - P
    - L
    - M
    - æ
    - H
    - F
    - '?'
    - '1'
    - y
    - ꝝ
    - ꝙ
    - V
    - '4'
    - B
    - z
    - '5'
    - X
    - '6'
    - ꝛ
    - /
    - "'"
    - '0'
    - '2'
    - '9'
    - K
    - '-'
  production-software: Unknown [Automatically filled]
  automatically-aligned: false
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Éditer la correspondance de Constance de Salm (1767-1845)
  url: https://github.com/sbiay/CdS-edition/tree/main/htr/verite-terrain
  authors:
  - name: Sébastien
    surname: Biay
    roles:
    - transcriber
  institutions: []
  description: >-
    The correspondence of Constance de Salm (a French woman of letters) includes
    various specimens of early 19th-century handwriting. The dataset attests the
    hands of four different copyists.
  project-website: https://dhiha.hypotheses.org/2945
  language:
  - fra
  production-software: eScriptorium + Kraken
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '1800'
    notAfter: '1825'
  hands:
    count: less-than-11
    precision: estimated
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  sources:
  - reference: >-
      Salm, C. de (1767-1845). Correspondance. Société des Amis du Vieux Toulon
      et de sa Région, Fonds Salm. Archiv Schloss Dyck, fonds Constance de Salm.
    link: ''
  volume:
  - metric: lines
    count: 1754
  transcription-guidelines: >-
    Scribal practices preserved: abbreviations, errors and accentuation are kept
    as written. Allographs normalized (long s).
  automatically-aligned: false
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: The Sloane Lab HTR Model
  url: https://github.com/sloanelab-org/HTR-Model
  authors:
  - name: Marco
    surname: Humbel
    orcid: 0000-0003-1861-162X
    roles:
    - aligner
  - name: Andreas
    surname: Vlachidis
    roles:
    - project-manager
  - name: Julianne
    surname: Nyhan
    roles:
    - project-manager
  - name: The British Museum
    surname: ''
    roles:
    - digitization
  institutions:
  - name: AEL Data Service
    roles:
    - transcriber
  description: >
    This repository contains Handwritten Text Recognition training data (layout
    segmentation and transcriptions) for the Sloane Lab HTR model. The HTR model
    is trained on the handwriting of Hans Sloane (1660-1753).


    Funding:

    Enlightenment Architectures: Leverhulme Trust Project Grant 2016-21

    The Sloane Lab: Towards a National Collection – AHRC AH/W003457/1
  project-name: 'The Sloane Lab: Looking back to build future shared collections'
  project-website: https://sloanelab.org/
  language:
  - eng
  production-software: Transkribus
  automatically-aligned: false
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '1680'
    notAfter: '1750'
  hands:
    count: less-than-11
    precision: estimated
  license:
    name: CC BY-NC-SA 4.0
    url: https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en
  format: Alto-XML
  sources:
  - reference: >-
      Sloan, K., Ortolja-Baird, A., Nyhan, J., Pickering, V., & Fleming, M.
      (Eds.). (2019). Sir Hans Sloane’s Miscellanea which comprises his
      catalogues of Miscellanies, Antiquities, Seals, Pictures, Mathematical
      Instruments, Agate Handles and Agate Cups, Bottles, Spoons (Digital
      Edition).
    link: >-
      https://enlightenmentarchitectures.reconstructingsloane.org/cataloguemiscellanies/index.html
  volume:
  - metric: pages
    count: 196
  citation-file-link: https://github.com/sloanelab-org/HTR-Model/blob/main/Citation_SL_HTR_Model.cff
  transcription-guidelines: >-
    Transcription rules are provided alongside the dataset. They include the
    following:

    - Exclusion of overwritten text from training data

    - Exclusion of text not identified by the automated layout recognition

    - Exclusion of faded text

    - Inserted words are treated as separate text lines

    - Exclusion of textual features such as dotted lines

    - Separate baselines for text written apart
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: EpiSearch HTR
  url: https://github.com/vedph/episearch-htr
  authors:
  - name: Lorenzo
    surname: Calvelli
    orcid: 0000-0002-0920-9156
    roles:
    - project-manager
  - name: Tatiana
    surname: Tommasi
    orcid: 0009-0000-2815-0113
    roles:
    - transcriber
  - name: Federico
    surname: Boschetti
    orcid: 0000-0002-7810-7735
    roles:
    - support
  institutions: []
  description: Ground Truth for Astori’s letters (see the README.md file for details)
  project-name: EpiSearch
  project-website: https://github.com/vedph/episearch-htr
  language:
  - ita
  production-software: eScriptorium + Kraken
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '1705'
    notAfter: '1709'
  hands:
    count: '1'
    precision: exact
  license:
  - name: CC-BY-SA 4.0
    url: https://creativecommons.org/licenses/by-sa/4.0/
  format: Alto-XML
  volume:
  - metric: files
    count: 34
  automatically-aligned: false
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: HPGTR Dataset
  url: https://github.com/vivianpl/hpgtr
  authors:
  - name: Paraskevi
    surname: Platanou
    roles:
    - transcriber
    - project-manager
  - name: John
    surname: Pavlopoulos
    orcid: 0000-0001-9188-7425
    roles:
    - transcriber
    - project-manager
  - name: Georgios
    surname: Papaioannou
    orcid: 0000-0003-4774-0746
    roles:
    - transcriber
    - project-manager
  institutions: []
  description: >-
    The HPGTR dataset consists of images of handwritten paleographic Greek text
    derived from the Bodleian Libraries' Greek manuscript collection, specifically
    the Barocci collection, which dates from the 8th to the 17th centuries. The
    dataset is divided into two editions: HPGTR.N, which contains 77 unsegmented
    images categorized by century from the 10th to the 16th, and HPGTR.S, which
    features carefully segmented lines from selected images to facilitate machine
    learning tasks. The dataset captures a range of characteristics, including
    variations in writing style, page conditions, and manuscript production
    details.

    This dataset is part of the following work: Paraskevi Platanou, John
    Pavlopoulos, and Georgios Papaioannou. 2022. Handwritten Paleographic Greek
    Text Recognition: A Century-Based Approach. In Proceedings of the Thirteenth
    Language Resources and Evaluation Conference, pages 6585–6589, Marseille,
    France. European Language Resources Association.
  language:
  - grc
  transcription-guidelines: |
    - Abbreviations and ligatures were resolved.
    - Lowercase letters at the beginning of sentences were kept as such.
    - Polytonic orthography and diaeresis were kept.
  production-software: Unknown
  automatically-aligned: false
  characters:
    mode: NFD
  script:
  - iso: Grek
  script-type: only-manuscript
  time:
    notBefore: '901'
    notAfter: '1600'
  hands:
    count: less-than-11
    precision: exact
  license:
    name: CC-BY-NC-SA 3.0
    url: https://creativecommons.org/licenses/by-nc-sa/3.0/
  format: Page-XML
  volume:
  - metric: lines
    count: 1698
  - metric: files
    count: 70
  - metric: regions
    count: 178
  - metric: characters
    count: 64952
- authors:
  - name: Maxime
    orcid: 0009-0006-2076-1220
    roles:
    - transcriber
    - aligner
    - quality-control
    surname: Guénette
  - name: Mathilde
    orcid: 0000-0003-1642-8610
    roles:
    - transcriber
    - aligner
    - quality-control
    surname: Verstraete
  - name: Alix
    orcid: 0000-0002-0136-4434
    roles:
    - quality-control
    - support
    surname: Chagué
  - name: Marcello
    orcid: 0000-0001-6424-3229
    roles:
    - project-manager
    surname: Vitali-Rosati
  automatically-aligned: false
  characters:
    members:
    - α
    - ι
    - ́
    - ο
    - ε
    - ν
    - σ
    - τ
    - ̓
    - υ
    - ρ
    - ·
    - κ
    - λ
    - η
    - ̀
    - π
    - μ
    - δ
    - ω
    - ͂
    - θ
    - γ
    - ̔
    - χ
    - φ
    - ':'
    - β
    - ᾽
    - ⋇
    - ⁛
    - ξ
    - ̈
    - '~'
    - ζ
    - ψ
    - ※
    - ∻
    - ͳ
    mode: NFD
  description: >-
    Ground truth dataset for the Codex Palatinus Graecus 23 (Palatine Anthology),
    Byzantine script of the 10th century.
  format: Alto-XML
  hands:
    count: less-than-11
    precision: estimated
  institutions: []
  language:
  - grc
  license:
    name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  production-software: eScriptorium + Kraken
  project-website: https://anthologiagraeca.org/
  schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  script:
  - iso: Grek
    qualify: byzantine
  script-type: only-manuscript
  sources:
  - link: https://doi.org/10.11588/diglit.3449
    reference: >-
      Cod. Pal. graec. 23 (10e s. av., Constantinople).  Universitätsbibliothek
      Heidelberg, Germany.
  time:
    notAfter: '1000'
    notBefore: '900'
  title: Ground truth for the Palatine Anthology (HTR_CPgr23)
  transcription-guidelines: Abbreviations are not resolved, except when they are
    unambiguous. Full guidelines are available at https://gitlab.huma-num.fr/ecrinum/anthologia/htr_cpgr23
  url: https://gitlab.huma-num.fr/ecrinum/anthologia/htr_cpgr23
  volume:
  - count: 114273
    metric: characters
  - count: 70
    metric: files
  - count: 3374
    metric: lines
  - count: 50
    metric: pages
  - count: 574
    metric: regions
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: La Correspondance Jacques Doucet - René Jean
  url: https://gitlab.inha.fr/snr/LaCorrespondanceDoucetReneJean
  authors:
  - name: Pascale
    surname: Cugy
    roles:
    - transcriber
    - project-manager
    - quality-control
  - name: Caroline
    surname: Fieschi
    roles:
    - project-manager
    - quality-control
  - name: Alix
    surname: Peyrard
    roles:
    - transcriber
    - quality-control
  - name: Lucie
    surname: Prohin
    roles:
    - transcriber
    - quality-control
  - name: Marie-Anne
    surname: Sarda
    roles:
    - support
  institutions:
  - name: Institut National de l'histoire de l'art (INHA)
    roles:
    - transcriber
    - project-manager
    - quality-control
  - name: Bibliothèque nationale de France
    roles:
    - digitization
  description: >-
    Project undertaken as part of the programme "La Bibliothèque d’art et
    d’archéologie de Jacques Doucet : corpus, savoirs et réseaux" of the Institut
    national d’histoire de l’art, based on a corpus of letters and documents held
    in the Département des manuscrits of the Bibliothèque nationale de France
    under shelfmark NAF 13124, one of the main sources on the relationship between
    Doucet and René Jean, whom he hired as librarian on 2 June 1908.
  project-name: PENSE@INHA
  project-website: https://skylab.inha.fr/PENSE/LettresDeJacquesDoucetAReneJean1908-1929/
  language:
  - fra
  production-software: Transkribus
  script:
  - iso: Latn
  script-type: mainly-manuscript
  time:
    notBefore: '1908'
    notAfter: '1929'
  hands:
    count: less-than-11
    precision: exact
  license:
  - name: Etalab OL 2.0
    url: https://spdx.org/licenses/etalab-2.0.html
  format: Alto-XML
  volume:
  - metric: characters
    count: 83312
  - metric: lines
    count: 2987
  - metric: pages
    count: 200
  - metric: files
    count: 200
  automatically-aligned: false
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Les Papiers Barye
  url: https://gitlab.inha.fr/snr/LesPapiersBarye
  authors:
  - name: Victor
    surname: Claass
    roles:
    - transcriber
    - project-manager
    - quality-control
  - name: Justine
    surname: Gain
    roles:
    - transcriber
    - quality-control
  - name: Suzanne
    surname: Martin-Vigier
    roles:
    - transcriber
    - quality-control
  institutions:
  - name: Institut National de l'histoire de l'art (INHA)
    roles:
    - transcriber
    - aligner
    - project-manager
    - quality-control
    - digitization
  description: >-
    Set of documents relating to the sculptor Antoine-Louis Barye. Paris,
    Library of the Institut national d'histoire de l'art, Jacques Doucet
    collections, Archives 166. National Institute of Art History (INHA).
  project-name: PENSE@INHA
  project-website: https://skylab.inha.fr/PENSE/LesPapiersBarye/
  language:
  - fra
  production-software: Transkribus
  script:
  - iso: Latn
  script-type: mainly-manuscript
  time:
    notBefore: '1819'
    notAfter: '1914'
  hands:
    count: more-than-10
    precision: exact
  license:
  - name: Etalab OL 2.0
    url: https://spdx.org/licenses/etalab-2.0.html
  format: Alto-XML
  volume:
  - metric: characters
    count: 362629
  - metric: lines
    count: 17880
  - metric: pages
    count: 918
  - metric: files
    count: 918
  automatically-aligned: false
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Ground truth for Neue Zürcher Zeitung black letter period
  url: https://zenodo.org/record/3333627
  project-name: impresso
  project-website: https://impresso-project.ch/
  authors:
  - name: Phillip Benjamin
    surname: Ströbel
    roles:
    - transcriber
    - aligner
    - project-manager
    - quality-control
    - support
  - name: Simon
    surname: Clematide
    roles:
    - transcriber
    - quality-control
  - name: Camille
    surname: Watter
    roles:
    - transcriber
  - name: Isabell
    surname: Meraner
    roles:
    - transcriber
  description: "The Neue Zürcher Zeitung (NZZ) was printed in black letter from its\
    \ very first issue in 1780 until 1947. From this period, we randomly sampled one\
    \ front page per year, resulting in a total of 167 pages. We chose front pages\
    \ because they typically contain highly relevant material and because we wanted\
    \ to avoid sampling pages containing only advertisements or stock information.\
    \ During certain periods, the NZZ was published several times a day, and there\
    \ were also supplements. Due to incomplete metadata, the sample includes front\
    \ pages from supplements. We then manually corrected the pages so that they can\
    \ be used as ground truth to improve the OCR of black letter in historical newspapers.\n"
  language:
  - deu
  script:
  - iso: Latn
  script-type: only-typed
  time:
    notBefore: '1780'
    notAfter: '1946'
  hands:
    count: less-than-11
    precision: estimated
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  volume:
  - count: 43173
    metric: lines
  - count: 167
    metric: files
  - count: 6318
    metric: regions
  - count: 1768146
    metric: characters
  production-software: Transkribus
  automatically-aligned: false
  _bibtex: "@dataset{phillip_strobel_2019_3333627,\n  author       = {Phillip Ströbel\
    \ and\n                  Simon Clematide},\n  title        = {{Ground truth for\
    \ Neue Zürcher Zeitung black letter \n                   period}},\n  month  \
    \      = jul,\n  year         = 2019,\n  publisher    = {Zenodo},\n  version \
    \     = {v1.0},\n  doi          = {10.5281/zenodo.3333627},\n  url          =\
    \ {https://doi.org/10.5281/zenodo.3333627}\n}"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Gwalther Handwriting Ground Truth
  url: https://zenodo.org/record/4780947
  project-name: Bullinger digital
  project-website: https://www.bullinger-digital.ch/
  authors:
  - name: Phillip Benjamin
    surname: Ströbel
    roles:
    - aligner
    - quality-control
    - support
  - name: Peter
    surname: Stotz
    roles:
    - transcriber
  description: "This is ground truth for Rudolph Gwalther’s (1519-1586) handwriting,\
    \ taken from his book \"Lateinische Gedichte\", in which he collected writings\
    \ between 1540 and 1580. Data collection and ground truth creation: at the time\
    \ we collected the data, we found 150 images with corresponding transcriptions\
    \ by Peter Stotz on e-manuscripta (reference: Gwalther, Rudolf: Lateinische Gedichte.\
    \ Zürich, 1540-1580. Zentralbibliothek Zürich, Ms D 152, https://doi.org/10.7891/e-manuscripta-26750\
    \ / Public Domain Mark). We removed 8 images with too many corrections or vertical\
    \ text. Next, we uploaded the images to the Transkribus platform, applied the\
    \ line recognition tool, and manually copied the transcribed text lines into the\
    \ recognised line boxes. During this process, we made some corrections, mainly\
    \ to fix inconsistencies in punctuation and capitalisation.\n"
  language:
  - lat
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '1540'
    notAfter: '1580'
  hands:
    count: '1'
    precision: exact
  license:
    name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  volume:
  - count: 4040
    metric: lines
  - count: 142
    metric: files
  - count: 155
    metric: regions
  - count: 144301
    metric: characters
  production-software: Transkribus
  automatically-aligned: false
  _bibtex: "@dataset{peter_stotz_2021_4780947,\n  author       = {Peter Stotz and\n\
    \                  Phillip Ströbel},\n  title        = {{bullinger-digital/gwalther-handwriting-ground-\
    \ \n                   truth: Initial release}},\n  month        = may,\n  year\
    \         = 2021,\n  publisher    = {Zenodo},\n  version      = {v1.0},\n  doi\
    \          = {10.5281/zenodo.4780947},\n  url          = {https://doi.org/10.5281/zenodo.4780947}\n\
    }"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: BiblIA
  url: https://zenodo.org/record/5167263
  project-name: Scripta PSL
  project-website: https://escripta.hypotheses.org/
  authors:
  - name: Daniel
    surname: Stökl Ben Ezra
    roles:
    - transcriber
    - project-manager
  - name: Bronson
    surname: Brown-DeVost
  - name: Pawel
    surname: Jablonski
  - name: Benjamin
    surname: Kiessling
  - name: Elena
    surname: Lolli
  - name: Hayim
    surname: Lapin
  description: "This dataset for Handwritten Text Recognition includes layout segmentation\
    \ (regions, toplines and line polygons) and Unicode transcriptions in ALTO 4.2\
    \ XML for 202 images of medieval Hebrew manuscripts from the Bibliothèque nationale\
    \ de France (BnF, National Library of France) and the Biblioteca Apostolica Vaticana\
    \ (BAV, Vatican Library), corresponding to the article \"BiblIA - a General Model\
    \ for Medieval Hebrew Manuscripts and an Open Annotated Dataset\" by Daniel Stökl\
    \ Ben Ezra, Bronson Brown-DeVost, Pawel Jablonski, Benjamin Kiessling, Elena Lolli,\
    \ and Hayim Lapin, published at HIP@ICDAR 2021, held in Lausanne in September 2021.\n"
  language:
  - heb
  script:
  - iso: Hebr
  script-type: only-manuscript
  time:
    notBefore: '1000'
    notAfter: '1499'
  hands:
    count: more-than-10
    precision: exact
  license:
  - name: CC-BY-SA 4.0
    url: https://creativecommons.org/licenses/by-sa/4.0/
  format: Alto-XML
  volume:
  - metric: files
    count: 202
  - metric: pages
    count: 202
  - metric: lines
    count: 12461
  - metric: regions
    count: 509
  - metric: characters
    count: 278641
  transcription-guidelines: "See the guidelines detailed in Daniel Stökl Ben Ezra,\
    \ Bronson Brown-DeVost, Pawel Jablonski, Hayim Lapin, Benjamin Kiessling, and\
    \ Elena Lolli. 2021. BiblIA - a General Model for Medieval Hebrew Manuscripts\
    \ and an Open Annotated Dataset. In The 6th International Workshop on Historical\
    \ Document Imaging and Processing (HIP '21). Association for Computing Machinery,\
    \ New York, NY, USA, 61–66. DOI: https://doi.org/10.1145/3476887.3476896\n"
  production-software: eScriptorium + Kraken
  automatically-aligned: false
  _bibtex: "@dataset{stokl_ben_ezra_2021_5167263,\n  author       = {Stökl Ben Ezra,\
    \ Daniel and\n                  Brown-DeVost, Bronson and\n                  Jablonski,\
    \ Pawel and\n                  Kiessling, Benjamin and\n                  Lolli,\
    \ Elena and\n                  Lapin, Hayim},\n  title        = {BiblIA - an Open\
    \ Annotated Dataset},\n  month        = aug,\n  year         = 2021,\n  publisher\
    \    = {Zenodo},\n  version      = {1.0},\n  doi          = {10.5281/zenodo.5167263},\n\
    \  url          = {https://doi.org/10.5281/zenodo.5167263}\n}"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: The POPP datasets
  url: https://zenodo.org/record/6581158
  authors:
  - name: Thomas
    surname: Constum
    roles:
    - aligner
    - quality-control
    - support
  - name: Nicolas
    surname: Kempf
  - name: Pierrick
    surname: Tranouez
  - name: Thierry
    surname: Paquet
    roles:
    - project-manager
  - name: Sandra
    surname: Brée
    orcid: 0000-0002-2802-5563
    roles:
    - transcriber
    - project-manager
  - name: François
    surname: Merveille
    roles:
    - transcriber
  institutions: []
  description: >-
    The POPP datasets are a set of 3 datasets created within the POPP project
    (Project for the Oceration of the Paris Population Census) for the task of
    handwritten text recognition. These datasets were published in
    "Recognition and information extraction in historical handwritten tables:
    toward understanding early 20th century Paris census" at DAS 2022.


    The 3 datasets are called “Generic dataset”, “Belleville”, and “Chaussée
    d’Antin” and contain lines made from the extracted rows of census tables from
    1926. Each table in the Paris census contains 30 rows, so each page in these
    datasets corresponds to 30 lines.
  project-name: Project for the Oceration of the Paris Population Census
  project-website: https://popp.hypotheses.org
  language:
  - fra
  production-software: Pivan
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '1926'
    notAfter: '1926'
  hands:
    count: more-than-10
    precision: estimated
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  volume:
  - metric: lines
    count: 7050
  transcription-guidelines: >
    The text is transcribed as in the image (no correction of misspellings, no
    resolution of abbreviations).

    Since the lines are extracted from table rows, we defined 4 special characters
    to describe the structure of the text:
        ¤ : indicates an empty cell
        / : indicates the separation into columns
        ? : indicates that the content of the cell following this symbol is written above the regular baseline
        ! : indicates that the content of the cell following this symbol is written below the regular baseline
  automatically-aligned: false
  _bibtex: "@dataset{constum_2022_6581158,\n  author       = {CONSTUM, Thomas and\n\
    \                  KEMPF, Nicolas and\n                  PAQUET, Thierry and\n\
    \                  TRANOUEZ, Pierrick and\n                  CHATELAIN, Clément\
    \ and\n                  BREE, Sandra and\n                  MERVEILLE, François},\n\
    \  title        = {{POPP Datasets : Datasets for handwriting \n              \
    \     recognition from French population census}},\n  month        = may,\n  year\
    \         = 2022,\n  publisher    = {Zenodo},\n  version      = {v1.0},\n  doi\
    \          = {10.5281/zenodo.6581158},\n  url          = {https://doi.org/10.5281/zenodo.6581158}\n\
    }"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Wien ÖNB Cod. 2160 f. 164-184 Ground Truth from HTR Winter School 2022
  url: https://zenodo.org/record/7467027
  authors:
  - name: Tim
    surname: Geelhaar
    orcid: 0000-0002-7653-5859
    roles:
    - transcriber
    - project-manager
  - name: Sara
    surname: D'Amico
    orcid: 0000-0002-8937-2040
    roles:
    - transcriber
  - name: Lara
    surname: Hofmann
    orcid: 0000-0003-4698-3906
    roles:
    - transcriber
  - name: Alessandro
    surname: Gnasso
    orcid: 0000-0001-5964-2989
    roles:
    - transcriber
  - name: Justine
    surname: Audebrand
    roles:
    - transcriber
  - name: Jeremy
    surname: Stitts
    orcid: 0000-0001-6988-1836
    roles:
    - transcriber
  - name: Mary
    surname: Sweeney
    orcid: 0000-0001-7028-2072
    roles:
    - transcriber
  - name: Grace
    surname: Atwood
    orcid: 0000-0002-1546-6546
    roles:
    - transcriber
  institutions: []
  description: >-
    This is Ground Truth data created during the HTR Winter School 2022 for ÖNB
    Cod. 2160, which contains a version of the so-called Lex Dei.
  project-name: HTR Winter School 2022, Vienna
  language:
  - lat
  production-software: Transkribus
  script:
  - iso: Latn
    qualify: Carolingian Minuscule
  script-type: only-manuscript
  time:
    notBefore: '850'
    notAfter: '900'
  hands:
    count: '1'
    precision: exact
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Alto-XML
  sources:
  - reference: ''
    link: http://data.onb.ac.at/rec/AC13956457
  volume:
  - metric: pages
    count: 40
  transcription-guidelines: >-
    Abbreviations resolved, but no normalization and no correction of misspellings.
    Initials and interlinear script are not transcribed.
  automatically-aligned: false
  _bibtex: "@dataset{attwood_2022_7467027,\n  author       = {Attwood and\n      \
    \            Sweeney and\n                  Stitts and\n                  Audebrand\
    \ and\n                  D'Amico and\n                  Geelhaar and\n       \
    \           Hofmann and\n                  Gnasso},\n  title        = {{Wien ÖNB\
    \ Cod. 2160 f. 164-184 Ground Truth from \n                   HTR Winter School\
    \ 2022}},\n  month        = dec,\n  year         = 2022,\n  publisher    = {Zenodo},\n\
    \  doi          = {10.5281/zenodo.7467027},\n  url          = {https://doi.org/10.5281/zenodo.7467027}\n\
    }"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Padeřov-Bible-handwriting-ground-truth
  url: https://zenodo.org/record/7467034
  authors:
  - name: Anna
    surname: Michalcová
    orcid: 0000-0003-4760-6950
    roles:
    - transcriber
    - aligner
    - project-manager
    - quality-control
    - support
  - name: Jan
    surname: Odstrčilík
    orcid: 0000-0001-9104-9827
    roles:
    - project-manager
    - support
  - name: Laura
    surname: Maniaková
    roles:
    - transcriber
  - name: Eliška
    surname: Pěnkavová
    orcid: 0000-0002-5494-8847
  - name: Kamil
    surname: Bazelides
    orcid: 0000-0002-5199-8726
  - name: Jan
    surname: Hajič
    orcid: 0000-0002-9207-567X
  - name: Hana
    surname: Kreisingerová
    orcid: 0000-0002-2924-598X
  - name: Jitka
    surname: Filipová
    orcid: 0000-0002-3570-4038
  - name: Chi-hung
    surname: Liu
  - name: Martina
    surname: Dvořáková
  institutions:
  - name: Institute of the Czech Language
  - name: Masaryk Institute and Archives
  description: >-
    This is ground truth based on the Padeřov Bible (Vienna, Austrian National
    Library, shelfmark Cod. 1175, 1432–1435), a Bible of the third redaction of
    the Old Czech Bible translation. The transcription rules were based on the
    semi-diplomatic transcription rules set by PERO OCR and the Směrnice pro vydávání
    starších českých textů set by Jiří Daňhelka
    (https://vokabular.ujc.cas.cz/moduly/edicnipoznamka.aspx?id=DanhelkaSmernice).
    Abbreviations were tagged and expanded.
  project-name: HTR Winter School 2022, Vienna
  project-website: >-
    https://www.oeaw.ac.at/imafo/veranstaltungen/detail/introduction-into-handwritten-text-recognition-1
  language:
  - ces
  production-software: Transkribus
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '1432'
    notAfter: '1435'
  hands:
    count: '1'
    precision: exact
  license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Page-XML
  sources:
  - reference: ''
    link: >-
      https://search.onb.ac.at/primo-explore/fulldisplay?docid=ONB_alma21302405460003338&context=L&adaptor=Local%20Search%20Engine&vid=ONB&lang=de_DE&search_scope=ONB_gesamtbestand&tab=default_tab&query=addsrcrid,exact,AC13954505
  volume:
  - metric: pages
    count: 63
  transcription-guidelines: >-
    Transliteration. Differentiates long and short "s". Abbreviations tagged and
    expanded. No misspelling corrections.
  automatically-aligned: false
  _bibtex: "@dataset{michalcova_2022_7467034,\n  author       = {Michalcová, Anna\
    \ and\n                  Bazelides, Kamil and\n                  Hajič, Jan and\n\
    \                  Pěnkavová, Eliška and\n                  Maniaková, Laura and\n\
    \                  Kreisingerová, Hana and\n                  Filipová, Jitka\
    \ and\n                  Chi-hung Lu and\n                  Dvořáková, Martina},\n\
    \  title        = {{Padeřov-Bible-handwriting-ground-truth: Initial \n       \
    \            release}},\n  month        = dec,\n  year         = 2022,\n  publisher\
    \    = {Zenodo},\n  doi          = {10.5281/zenodo.7467034},\n  url          =\
    \ {https://doi.org/10.5281/zenodo.7467034}\n}"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Belfort
  url: https://zenodo.org/record/8041668
  authors:
  - name: Solène
    surname: Tarride
    orcid: 0000-0001-6174-9865
  - name: Tristan
    surname: Faine
  - name: Mélodie
    surname: Boillet
    orcid: 0000-0002-0618-7852
  - name: Harold
    surname: Mouchère
    orcid: 0000-0001-6220-7216
  - name: Christopher
    surname: Kermorvant
    orcid: 0000-0002-7508-4080
  institutions: []
  description: >
    This dataset includes minutes of the Belfort municipal council drawn up between
    1790 and 1946. Documents include deliberations, lists of councillors,
    convocations, and agendas. The dataset includes 24,105 text-line images that
    were automatically detected from pages.

    Up to four transcriptions are available for each line image: 

    * two from human annotators (in `Transcriptions/callico_1/` and
    `Transcriptions/callico_2/`)

    * two from automatic models (in `Transcriptions/dan/` and
    `Transcriptions/pylaia/`) 
  project-name: Handwritten Text Recognition from Crowdsourced Annotations
  project-website: https://arxiv.org/abs/2306.10878
  language:
  - fra
  production-software: Callico
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '1790'
    notAfter: '1946'
  hands:
    count: more-than-10
    precision: estimated
  license:
    name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Image-Text-Pairs
  sources:
  - reference: >-
      Solène Tarride, Tristan Faine, Mélodie Boillet, Harold Mouchère, &
      Christopher Kermorvant. (2023). The Belfort dataset: Handwritten Text
      Recognition from Crowdsourced Annotations [Data set]. 7th International
      Workshop on Historical Document Imaging and Processing (HIP'23), San
      José, California, USA. Zenodo. https://doi.org/10.5281/zenodo.8041668
    link: https://arxiv.org/abs/2306.10878
  volume:
  - metric: lines
    count: 24105
  _bibtex: "@dataset{solene_tarride_2023_8041668,\n  author       = {Solène Tarride\
    \ and\n                  Tristan Faine and\n                  Mélodie Boillet\
    \ and\n                  Harold Mouchère and\n                  Christopher Kermorvant},\n\
    \  title        = {{The Belfort dataset: Handwritten Text Recognition \n     \
    \              from Crowdsourced Annotations}},\n  month        = jun,\n  year\
    \         = 2023,\n  publisher    = {Zenodo},\n  doi          = {10.5281/zenodo.8041668},\n\
    \  url          = {https://doi.org/10.5281/zenodo.8041668}\n}"
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: EPARCHOS
  url: https://zenodo.org/records/4095301
  authors:
  - name: Aleksandros
    surname: Papazoglou
    roles:
    - transcriber
    - project-manager
  - name: Ioannis
    surname: Pratikakis
    orcid: 0000-0002-4124-3688
    roles:
    - transcriber
    - project-manager
  - name: Kleopatra
    surname: Markou
    roles:
    - transcriber
    - project-manager
  - name: Lazaros
    surname: Tsochatzidis
    orcid: 0000-0002-4634-7419
    roles:
    - transcriber
    - project-manager
  institutions: []
  description: >-
    The dataset originates from a Greek handwritten codex dating from around
    1500-1530. This subset of the codex British Museum Addit. 6791 was written
    by two hands, one of Antonius Eparchos and the other of Camillos Zanettus (ff.
    104r-174v), and contains texts by Hierocles (In Aureum carmen), Matthaeus
    Blastares (Collectio alphabetica) and, notably, Michael Psellos (De
    omnifaria doctrina). The script contains the most important abbreviations,
    logograms and conjunctions, which are found in virtually every Greek minuscule
    handwritten codex from the period of the transliteration of manuscripts and the
    prevalence of the minuscule script (9th century) to the post-Byzantine years.
    This dataset consists of 120 scanned handwritten text pages, containing 9285
    lines of text, 18809 words (6787 unique words). For each page, a PageXML is
    provided containing the following ground truth:

    1. Text region polygon coordinates

    2. Text line polygon coordinates with the corresponding transcription text

    3. Word polygon coordinates with the corresponding transcription text
  language:
  - grc
  transcription-guidelines: |
    - Abbreviations and ligatures were resolved
    - Lowercase letters at the beginning of sentences were kept as such
    - Polytonic spelling and diaeresis were kept
  production-software: Unknown
  automatically-aligned: false
  characters:
    mode: NFD
  script:
  - iso: Grek
  script-type: only-manuscript
  time:
    notBefore: '1500'
    notAfter: '1530'
  hands:
    count: less-than-11
    precision: exact
  license:
    name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Page-XML
  volume:
  - metric: lines
    count: 2272
  - metric: characters
    count: 116894
  - metric: files
    count: 120
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Stavronikita Monastery Collection No. 79
  url: https://zenodo.org/records/5578136
  authors:
  - name: Ioannis
    surname: Pratikakis
    orcid: 0000-0002-4124-3688
    roles:
    - transcriber
    - project-manager
  - name: Aleksandros
    surname: Papazoglou
    roles:
    - transcriber
    - project-manager
  - name: Symeon
    surname: Symeonidis
    orcid: 0000-0002-3259-614X
    roles:
    - transcriber
    - project-manager
  - name: Lazaros
    surname: Tsochatzidis
    orcid: 0000-0002-4634-7419
    roles:
    - transcriber
    - project-manager
  institutions: []
  description: >-
    This paper manuscript was written in the 16th century and measures
    220 x 165 mm. It is embellished with epititles and red initials.
    Tachygraphical symbols and abbreviations are also encountered in the
    manuscript. The dataset of ΧΦ79 consists of 803 lines of text containing 4389 words
    (2069 unique words) that are distributed over 40 scanned handwritten text pages.
    For each page, a PageXML is provided containing the following ground truth:

    1. Text region polygon coordinates

    2. Text line polygon coordinates with the corresponding transcription text

    3. Word polygon coordinates with the corresponding transcription text
  language:
  - grc
  transcription-guidelines: |
    - Abbreviations and ligatures were resolved
    - Lowercase letters at the beginning of sentences were kept as such
    - Polytonic spelling and diaeresis were kept
  production-software: Unknown
  automatically-aligned: false
  characters:
    mode: NFD
  script:
  - iso: Grek
  script-type: only-manuscript
  time:
    notBefore: '1501'
    notAfter: '1600'
  hands:
    count: less-than-11
    precision: exact
  license:
    name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Page-XML
  volume:
  - {count: 803, metric: lines}
  - {count: 40, metric: files}
  - {count: 40, metric: regions}
  - {count: 29112, metric: characters}
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Stavronikita Monastery Collection No. 114
  url: https://zenodo.org/records/5578251
  authors:
  - name: Ioannis
    surname: Pratikakis
    orcid: 0000-0002-4124-3688
    roles:
    - transcriber
    - project-manager
  - name: Aleksandros
    surname: Papazoglou
    roles:
    - transcriber
    - project-manager
  - name: Symeon
    surname: Symeonidis
    orcid: 0000-0002-3259-614X
    roles:
    - transcriber
    - project-manager
  - name: Lazaros
    surname: Tsochatzidis
    orcid: 0000-0002-4634-7419
    roles:
    - transcriber
    - project-manager
  institutions: []
  description: >-
    This paper manuscript was written at the end of the 15th century and
    measures 218 x 150 mm. On various pages, red initials and epititles
    enrich the manuscript’s decoration.

    The dataset of ΧΦ114 consists of 1051 lines of text containing 5467 words
    (2877 unique words) that are distributed over 44 scanned handwritten text pages.

    For each page, a PageXML is provided containing the following ground truth:

    1. Text region polygon coordinates

    2. Text line polygon coordinates with the corresponding transcription text

    3. Word polygon coordinates with the corresponding transcription text
  language:
  - grc
  transcription-guidelines: |
    - Abbreviations and ligatures were resolved
    - Lowercase letters at the beginning of sentences were kept as such
    - Polytonic spelling and diaeresis were kept
  production-software: Unknown
  automatically-aligned: false
  characters:
    mode: NFD
  script:
  - iso: Grek
  script-type: only-manuscript
  time:
    notBefore: '1401'
    notAfter: '1500'
  hands:
    count: less-than-11
    precision: exact
  license:
    name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Page-XML
  volume:
  - {count: 1006, metric: lines}
  - {count: 44, metric: files}
  - {count: 44, metric: regions}
  - {count: 36898, metric: characters}
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Stavronikita Monastery Collection No. 53
  url: https://zenodo.org/records/5595669
  authors:
  - name: Ioannis
    surname: Pratikakis
    orcid: 0000-0002-4124-3688
    roles:
    - transcriber
    - project-manager
  - name: Aleksandros
    surname: Papazoglou
    roles:
    - transcriber
    - project-manager
  - name: Symeon
    surname: Symeonidis
    orcid: 0000-0002-3259-614X
    roles:
    - transcriber
    - project-manager
  - name: Lazaros
    surname: Tsochatzidis
    orcid: 0000-0002-4634-7419
    roles:
    - transcriber
    - project-manager
  institutions: []
  description: >-
    This manuscript is one of the oldest of the Stavronikita Monastery on Mount
    Athos. It is a parchment four-Gospel manuscript written between 1301 and
    1350. It comprises 54 pages measuring approximately 250 x 185 mm. The script
    is an elegant minuscule, and the use of majuscule letters is rare.
    Tachygraphical symbols and abbreviations are also encountered in the
    manuscript. Furthermore, the manuscript is enriched with chrysography,
    elegant epititles and initials.

    The dataset of ΧΦ53 consists of 1038 lines of text containing 5592 words
    (2374 unique words) that are distributed over 54 scanned handwritten text pages.
  language:
  - grc
  transcription-guidelines: |
    - Abbreviations and ligatures were resolved
    - Lowercase letters at the beginning of sentences were kept as such
    - Polytonic spelling and diaeresis were kept
  production-software: Unknown
  automatically-aligned: false
  characters:
    mode: NFD
  script:
  - iso: Grek
  script-type: only-manuscript
  time:
    notBefore: '1301'
    notAfter: '1350'
  hands:
    count: less-than-11
    precision: exact
  license:
    name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Page-XML
  volume:
  - {count: 1038, metric: lines}
  - {count: 54, metric: files}
  - {count: 54, metric: regions}
  - {count: 37070, metric: characters}
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: Ground-Truthed Data Set of Zenon Papyri for Handwritten Text Recognition
  url: https://zenodo.org/records/6565706
  authors:
  - name: Isabelle
    surname: Marthot-Santaniello
    orcid: 0000-0003-0407-8748
    roles:
    - transcriber
    - project-manager
  - name: Tobias
    surname: Hodel
    orcid: 0000-0002-2071-6407
    roles:
    - transcriber
    - project-manager
  institutions: []
  description: >-
    Diplomatic transcription of papyri found in the Zenon archive [see
    en.wikipedia.org/wiki/Zenon_of_Kaunos]


    Manually prepared as PageXML with Transkribus within the D-Scribes project.
  project-name: D-Scribes
  project-website: https://d-scribes.philhist.unibas.ch/en/
  language:
  - grc
  production-software: Transkribus
  automatically-aligned: false
  characters:
    mode: NFD
  script:
  - iso: Grek
  script-type: only-manuscript
  time:
    notBefore: '-250'
    notAfter: '-230'
  hands:
    count: unknown
    precision: estimated
  license:
    name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Page-XML
  volume:
  - metric: lines
    count: 321
  - metric: characters
    count: 5850
  - metric: files
    count: 27
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: ANR e-NDP Ground Truth
  url: https://zenodo.org/records/7575693
  authors:
  - name: Julie
    surname: Claustre
    orcid: 0000-0001-8504-3920
    roles:
    - transcriber
    - project-manager
  - name: Darwin
    surname: Smith
    roles:
    - transcriber
    - project-manager
  - name: Sergio
    surname: Torres Aguilar
    orcid: 0000-0002-1801-3147
    roles:
    - aligner
    - quality-control
    - support
  - name: Isabelle
    surname: Bretthauer
    orcid: 0000-0002-1780-772X
    roles:
    - transcriber
  - name: Pierre
    surname: Brochard
    orcid: 0000-0003-1955-556X
    roles:
    - quality-control
  - name: Olivier
    surname: Canteaut
    orcid: 0000-0003-4586-1931
    roles:
    - transcriber
    - quality-control
  - name: Emilie
    surname: Cottereau
    orcid: 0000-0001-6880-2112
    roles:
    - transcriber
  - name: Fabrice
    surname: Delivré
    roles:
    - transcriber
  - name: Mathilde
    surname: Denglos
    roles:
    - transcriber
  - name: Vincent
    surname: Jolivet
    orcid: 0000-0003-0600-0362
    roles:
    - aligner
    - quality-control
    - support
  - name: Véronique
    surname: Julerot
    roles:
    - transcriber
  - name: Thierry
    surname: Kouamé
    orcid: 0000-0001-9728-2988
    roles:
    - transcriber
  - name: Elisabeth
    surname: Lusset
    orcid: 0000-0003-1572-1890
    roles:
    - transcriber
  - name: Anne
    surname: Massoni
    orcid: 0000-0002-1690-9804
    roles:
    - transcriber
  - name: Sebastien
    surname: Nadiras
    roles:
    - transcriber
  - name: Nicolas
    surname: Perreaux
    orcid: 0000-0002-0103-817X
    roles:
    - transcriber
  - name: Hugo
    surname: Regazzi
    orcid: 0000-0002-3059-2874
    roles:
    - transcriber
  - name: Mathilde
    surname: Treglia
    roles:
    - transcriber
  institutions: []
  description: >-
    This repository hosts HTR ground truth created within the context of the ANR
    e-NDP project.

    This dataset is based on 512 pages from the 26 registers of the chapter of
    Notre-Dame de Paris cathedral.

    The volumes containing the chapter conclusions were conceived to serve as memorial
    records, but above all as documents for regular use and consultation in the daily
    practice of administration and management.

    The registers were written in a cursive script (ca. late 13th to 16th century);
    their content is mainly in Latin, the rest in French. There are no fewer than
    18 hands in these pages.

    The transcriptions were manually completed in two rounds by a group of 12 contributors,
    including historians and paleographers, over the course of 2021-2022 using eScriptorium.
  project-name: ANR e-NDP
  project-website: https://endp.hypotheses.org/presentation
  language:
  - fra
  - lat
  production-software: eScriptorium + Kraken
  automatically-aligned: true
  script:
  - iso: Latn
    qualify: cursive
  script-type: only-manuscript
  time:
    notBefore: '1326'
    notAfter: '1504'
  hands:
    count: more-than-10
    precision: estimated
  license:
    name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
  format: Page-XML
  volume:
  - metric: pages
    count: 512
  - metric: lines
    count: 34231
  - metric: characters
    count: 3320407
  - metric: files
    count: 512
  - metric: regions
    count: 2448
  transcription-guidelines: |-
    - The abbreviations have been resolved, both those by suspension (facimꝰ --> facimus) and by contraction (dñi --> domini). Likewise, those using conventional signs (⁊ --> et ; ꝓ --> pro) have been resolved.
    - The named entities (names of persons, places and institutions) have been capitalized. The beginning of a block of text as well as the original capitals used by the notary are also capitalized.
    - The consonantal i and u characters have been transcribed as j and v in both French and Latin.
    - The punctuation marks used in the text (. and /) have been transcribed, but the punctuation has not been standardized to modern usage.
    - Corrections and words that appear cancelled in the manuscript have been transcribed enclosed between two $ signs, one at the beginning and one at the end.
    - More specific transcription rules can be found in the file `transcription_guidelines.pdf` in the Zenodo repository.
- schema: https://htr-united.github.io/schema/2023-06-27/schema.json
  title: ARletta
  url: https://zenodo.org/records/11191457
  authors:
  - name: Lith
    surname: Lefranc
  - name: Ilja
    surname: Van Damme
  - name: Thibault
    surname: Clérice
  - name: Mike
    surname: Kestemont
  institutions:
  - name: University of Antwerp
  - name: National Institute for Research in Digital Science and Technology, Paris
  description: Open-source handwritten text recognition models for historic Dutch
  project-name: Bias in History
  project-website: https://www.bias-in-history.eu/
  language:
  - nld
  - fra
  production-software: eScriptorium + Kraken
  automatically-aligned: false
  script:
  - iso: Latn
  script-type: only-manuscript
  time:
    notBefore: '1600'
    notAfter: '1940'
  hands:
    count: more-than-10
    precision: estimated
  license:
    name: CC-BY-SA 4.0
    url: https://creativecommons.org/licenses/by-sa/4.0/
  format: Page-XML
  volume:
  - metric: lines
    count: 431359
  - metric: regions
    count: 44536
  - metric: pages
    count: 10267
  - metric: characters
    count: 14253206
  transcription-guidelines: >-
    **Diplomatic transcription.** All of the text was transcribed verbatim, preserving
    all of its original features:

    - orthography: preserve original spelling

    - abbreviations: do not expand abbreviations

    - capitalization: retain original use of uppercase and lowercase letters

    - punctuation: transcribe punctuation marks exactly as they appear, even if they
    are unconventional by modern standards

    - special characters: include any special characters or symbols as they appear

    - formatting: maintain original formatting such as underlining or strikethrough

    - errors and corrections: include all errors and corrections found in the text

    - non-interpretative: avoid interpreting or modernizing the text

    - use the '@' symbol for characters you cannot read and tag them as 'unclear'
    at baseline level

    - tag marginal text as 'marginalia' and main body text as 'paragraph' at region
    level