omar edited this page Dec 22, 2020 · 3 revisions

SDO-Check Wiki

The purpose of this Wiki is to document the goals and the theoretical basis of SDO-Check, and to introduce interested parties to the working methods of the tool.

Goal

The main purpose of SDO-Check is to offer a web GUI that lets users verify their Schema.org annotations and helps them understand the detected irregularities, so that they can improve the quality of their annotations.

The primary target audience for this tool is (web) developers who want to test their semantic annotations based on the Schema.org vocabulary. The tool focuses first on the format presumably used by the target audience, namely JSON-LD annotations in compacted form that use only Schema.org as vocabulary, e.g.:

{
    "@context": "http://schema.org/",
    "@type": "WebApplication",
    "description": "SDO-Check is a free-to-use web-tool that enables the fast and simple verification of schema.org annotations.",
    "name": "SDO-Check",
    "url": "https://sdocheck.semantify.it",
    "applicationCategory": "Web service",
    "featureList": "Verification of schema.org annotations"
}

That being said, any further formats and variations that can be implemented for SDO-Check are a welcome addition.
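
As a rough illustration of the format described above, the following sketch tests whether an annotation is a JSON-LD object whose `@context` is just the Schema.org namespace. The helper name `isSchemaOrgOnly` is hypothetical and not part of SDO-Check. The check is deliberately simplified: it accepts only a plain string context and ignores other valid JSON-LD ways of declaring the same vocabulary.

```javascript
// Sketch only: isSchemaOrgOnly is a hypothetical helper, not SDO-Check code.
// Accepts http or https, with or without a trailing slash.
const SCHEMA_ORG_CONTEXT = /^https?:\/\/schema\.org\/?$/;

function isSchemaOrgOnly(annotation) {
  // Only a single JSON object qualifies here; arrays and primitives do not.
  if (annotation === null || typeof annotation !== "object" || Array.isArray(annotation)) {
    return false;
  }
  const context = annotation["@context"];
  // Simplification: only a plain string context is treated as "Schema.org only".
  return typeof context === "string" && SCHEMA_ORG_CONTEXT.test(context);
}

const example = {
  "@context": "http://schema.org/",
  "@type": "WebApplication",
  "name": "SDO-Check"
};

console.log(isSchemaOrgOnly(example)); // true
console.log(isSchemaOrgOnly({ "@context": "http://example.org/" })); // false
```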

GUI

The GUI of this tool is roughly divided into three parts:

  1. URL input (blue)
  2. Code Snippet input (yellow)
  3. Tree visualization of verification results (green)

General GUI of SDO-Check

General working process

  1. User input
    1. URL of target web-page, or
    2. Code snippet (HTML or JSON-LD)
  2. Annotation extraction from input
  3. General verification of the annotations found
  4. Rendering of the verification results as a tree visualization
  5. Exploration of the rendered results by the user
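
The steps above can be sketched as one small orchestration function. This is a minimal sketch, not SDO-Check's actual wiring: `runCheck` and the four step functions (`fetchHtml`, `extract`, `verify`, `render`) are hypothetical names, and the steps are passed in by the caller so that each can be replaced independently.

```javascript
// Sketch only: runCheck and all step names are hypothetical placeholders.
async function runCheck(input, steps) {
  // Step 1: the user supplies either a URL or a code snippet.
  const source = input.url !== undefined
    ? await steps.fetchHtml(input.url)
    : input.snippet;
  // Step 2: extract the annotations from the input.
  const annotations = steps.extract(source);
  // Step 3: verify every annotation that was found.
  const reports = annotations.map((annotation) => steps.verify(annotation));
  // Step 4: render the results (a tree visualization in the real tool).
  steps.render(reports);
  // Step 5, exploration, happens in the UI; the reports are returned for inspection.
  return reports;
}

// Usage with trivial stand-in steps:
const stubSteps = {
  fetchHtml: async (url) => `<html><!-- fetched from ${url} --></html>`,
  extract: (source) => [{ source }],
  verify: (annotation) => ({ annotation, errors: [] }),
  render: (reports) => console.log(`${reports.length} annotation(s) verified`),
};

runCheck({ snippet: '{"@type": "Thing"}' }, stubSteps).catch(console.error);
```

Passing the steps in as an object keeps the flow independent of how HTML is fetched, which matters because the fetching step is the one most likely to be swapped out.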

Components

Front-end

The front-end contains the scripts that enable the functionality of SDO-Check. This includes code for the UI handling, the overall verification process, the rendering of the annotations as a tree visualization, and the advanced error explanation.

General Verifier

The General Verifier checks the compliance of semantic annotations based on the schema.org vocabulary.

Since the code was built for the backend, we browserify the source code in /verification/src/ into a single JS file (generalVerificationBundle.js) so that it is possible to run it in a browser.

You have to install the dev-dependencies if you want to edit and rebuild the general verification bundle (node script: buildGeneralVerificator). Note that we omit the required SDO-Adapter here because the frontend already loads this library.
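
To give a rough idea of what a compliance check of this kind involves, the sketch below flags properties that are not defined for the annotation's type. It is not the General Verifier's implementation: `verifyAnnotation` and `KNOWN_PROPERTIES` are hypothetical, and the vocabulary is a small hand-written excerpt, whereas a real verifier would consult the full Schema.org vocabulary.

```javascript
// Sketch only: a hand-written excerpt of the vocabulary for illustration.
// A real verifier would look properties up in the full Schema.org vocabulary.
const KNOWN_PROPERTIES = new Map([
  ["WebApplication", new Set([
    "name", "description", "url",
    "applicationCategory", "featureList", "browserRequirements",
  ])],
]);

// Hypothetical helper: returns a list of { path, message } error objects.
function verifyAnnotation(annotation) {
  const errors = [];
  const type = annotation["@type"];
  if (typeof type !== "string") {
    errors.push({ path: "@type", message: "Missing or non-string @type" });
    return errors;
  }
  const allowed = KNOWN_PROPERTIES.get(type);
  if (allowed === undefined) {
    errors.push({ path: "@type", message: `Unknown type: ${type}` });
    return errors;
  }
  for (const key of Object.keys(annotation)) {
    if (key.startsWith("@")) continue; // JSON-LD keywords are not vocabulary properties
    if (!allowed.has(key)) {
      errors.push({ path: key, message: `Property "${key}" is not defined for ${type}` });
    }
  }
  return errors;
}

const report = verifyAnnotation({
  "@context": "http://schema.org/",
  "@type": "WebApplication",
  "name": "SDO-Check",
  "nmae": "misspelled property",
});
console.log(report.map((e) => e.path)); // [ 'nmae' ]
```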

Further documentation about the general verification can be found here:

Extractor

The Extractor extracts semantic annotations from HTML code.

This module is built on the abandoned project https://www.npmjs.com/package/web-auto-extractor . Since it is an npm module, we browserify the source code in /extraction/src/ into a single JS file (extractorBundle.js) so that it is possible to run it in a browser.

You have to install the dev-dependencies if you want to edit and rebuild the extractor bundle (node script: buildExtractor).
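
The basic idea of extracting annotations from HTML can be sketched for the JSON-LD case alone. The helper `extractJsonLd` below is hypothetical and does not reproduce the Extractor or web-auto-extractor: it uses a regular expression instead of a proper HTML parser and looks only at `<script type="application/ld+json">` blocks.

```javascript
// Sketch only: extractJsonLd is a hypothetical helper, not the Extractor's code.
// A regular expression is a simplification; a real extractor would parse the HTML.
function extractJsonLd(html) {
  const pattern =
    /<script\b[^>]*type\s*=\s*["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const annotations = [];
  for (const match of html.matchAll(pattern)) {
    try {
      annotations.push(JSON.parse(match[1]));
    } catch (err) {
      // Blocks that are not valid JSON are skipped here;
      // a real tool would more likely report them to the user.
    }
  }
  return annotations;
}

const page = `
<html><head>
  <script>var unrelated = 1;</script>
  <script type="application/ld+json">
    { "@context": "http://schema.org/", "@type": "WebApplication", "name": "SDO-Check" }
  </script>
</head><body></body></html>`;

console.log(extractJsonLd(page).length); // 1
```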

Web-page scraping

Web-page scraping, including dynamically generated HTML content, is usually done by a crawler/scraper such as Puppeteer (https://www.npmjs.com/package/puppeteer).

Since that requires its own backend, which we wanted to avoid for this project, we decided to use the public API of semantify.it for this task. Of course, you can substitute your own HTML fetching module.

License

CC-BY-SA-3.0