    Repositories list

    • retriever

      Public
      A data processing tool designed to extract and process articles from text files exported by a system called "Retriever".
      Python
      Updated Nov 27, 2024
    • This repository is designed to handle the extraction, processing, and curation of text data from UNESCO’s Proceedings, 1945-2017. It provides a set of tools and scripts to facilitate the extraction of text from various formats, language detection, filtering, and metadata indexing.
      Python
      Updated Nov 27, 2024
    • Tools for extracting text from PDFs
      Python
      Updated Nov 26, 2024
    • Python
      Updated Sep 3, 2024
    • Scripts and code for collecting (scraping) data from the UNESCO website
      Python
      MIT License
      Updated Sep 3, 2024
    • International Ideas at UNESCO: Digital Approaches to Global Conceptual History
      HTML
      Updated Feb 28, 2024
    • Code related to collecting SSI (legal instruments) corpus data from the UNESCO website.
      Python
      Updated Dec 6, 2023
    • Text analytic tools
      Jupyter Notebook
      Updated Nov 9, 2023
    • Explore UNESCO Courier magazine (in your browser).
      Jupyter Notebook
      Updated Sep 19, 2023
    • Java
      Updated Sep 6, 2023
    • PM and notes related to the project
      Updated Aug 29, 2023
    • Extracted article corpus
      Updated Aug 22, 2023
    • Extracted and tagged Courier issues
      Updated Jun 30, 2023
    • Jupyter Hub & Lab for INIDUN project
      Makefile
      MIT License
      Updated Jun 29, 2023
    • Curation scripts for the COURIER corpus
      Python
      MIT License
      Updated Jan 19, 2023
    • INIDUN Jupyter notebooks
      Jupyter Notebook
      Updated Oct 2, 2021
    • pdfbox

      Public
      Mirror of Apache PDFBox
      Java
      Apache License 2.0
      Updated Jun 2, 2021
    • wiki

      Public
      Updated Jun 2, 2020