
Commit

Merge pull request #79 from casework/release-0.8.0
Release 0.8.0
kchason authored Jun 12, 2023
2 parents ee10dc6 + b82a15e commit 42d0581
Showing 79 changed files with 24,863 additions and 5,663 deletions.
88 changes: 85 additions & 3 deletions README.md
@@ -1,6 +1,6 @@
# CASE Implementation: PROV-O

This repository maps [CASE](https://caseontology.org/) to [W3C PROV-O](https://www.w3.org/TR/prov-o/), and provides a provenance review mechanism. Note that contrary to other CASE implementations, this maps CASE out to another data model, instead of mapping another data model or tool into CASE.
This repository maps [CASE](https://caseontology.org/) to [W3C PROV-O](https://www.w3.org/TR/prov-o/) and [OWL-Time](https://www.w3.org/TR/owl-time/), and provides a provenance review mechanism. Note that contrary to other CASE implementations, this maps CASE out to another data model, instead of mapping another data model or tool into CASE.


## Disclaimer
@@ -107,7 +107,7 @@ One CASE practice that might look non-obvious in the PROV context is CASE's repr
Some of the tests include small galleries of figures that are tracked as documentation. Other figures can be generated by an interested user, but are not version-controlled at the moment.

See for example:
* [The CASE website narrative "Urgent Evidence"](tests/casework.github.io/examples/urgent_evidence/)
* [The CASE website narrative "Urgent Evidence"](tests/casework.github.io/examples/urgent_evidence/#readme)

The following notes describe visual-design decisions.

@@ -118,7 +118,7 @@ The `case_prov_dot` module adopts the design vocabulary used by Trung Dong Huynh

The version of `prov` that `case_prov_dot` draws its designs from is tracked as a Git submodule. This tracking is not for the purpose of importing code. The [`prov.dot` package](https://github.com/trungdong/prov/blob/2.0.0/src/prov/dot.py) is imported as a library for its styling dictionaries, though this CASE project implements its own Dot-formatted rendering to realize some extended design decisions, some of which are specific to CASE concepts.

[Conventions provided by the W3C](https://www.w3.org/2011/prov/wiki/Diagrams) were found after initial design of this section. Color selection has not been compared, but directional flow has been adopted. Notably, **time flows from up to down**, and when compared, **left to right**. *(Note, though, that left-to-right temporal flow is not yet implemented.)*
[Conventions provided by the W3C](https://www.w3.org/2011/prov/wiki/Diagrams) were found after the initial design of this section. Color selection has not been compared, but directional flow has been adopted. Notably, **time flows from top to bottom**, and "Arrows point 'back into the past.'"


### Departures from original visual-design vocabularies
@@ -189,6 +189,88 @@ To illustrate the difference in projective capability of the subject CASE instan

![Qualified vs. unqualified relationship illustration](figures/readme-attribution.svg)


### Temporal relations

The [W3C Time Ontology in OWL](https://www.w3.org/TR/owl-time/) offers a non-normative example, "[Alignment of PROV-O with OWL-Time](https://www.w3.org/TR/owl-time/#time-prov)." The example is accompanied by an encoded alignment ontology, available [here](https://github.com/w3c/sdw/blob/6baa33fa84ccd79a43975f9a335fe479f9cf4069/time/rdf/time-prov.ttl), which is also non-normative.

`case_prov_dot` adopts some of the suggested alignments and uses them to render the "Allen algebra" of temporal interval relations. The relations are illustrated in a figure in the OWL-Time documentation [here](https://www.w3.org/TR/owl-time/#fig-thirteen-elementary-possible-relations-between-time-periods-af-97). The relations, as rendered by `case_prov_dot`, are shown in this figure (click to view the figure; the "Raw" display will navigate to the figure as SVG with selectable text):

![Allen relations with instants visible](figures/readme-allen-relations-visible.svg)

The above figure uses the `case_prov_dot` flag `--display-time-links` to show how the endpoints of each `time:ProperInterval` (its beginning and ending `time:Instant`s) render each `prov:Activity` as a one-dimensional interval. The `-i` and `-j` node names follow the illustration excerpted in OWL-Time Figure 2. The same figure is also available in the default display mode, where time links are rendered invisibly, [here](figures/readme-allen-relations-invisible.svg).
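As a compact reference for the thirteen relations, here is a minimal Python sketch that classifies two proper intervals, given as numeric `(begin, end)` pairs, by the OWL-Time property that would relate the first interval to the second. `allen_relation` is a hypothetical helper written for this illustration; it is not part of `case_prov_dot`, which works from input graphs rather than numeric endpoints.

```python
def allen_relation(i, j):
    """Return the OWL-Time property relating proper interval i to proper interval j.

    Hypothetical helper: i and j are (begin, end) pairs of comparable numbers.
    """
    (ib, ie), (jb, je) = i, j
    if not (ib < ie and jb < je):
        raise ValueError("proper intervals need begin < end")
    # Disjoint intervals with a gap between them.
    if ie < jb:
        return "time:intervalBefore"
    if je < ib:
        return "time:intervalAfter"
    # One interval ends exactly where the other begins.
    if ie == jb:
        return "time:intervalMeets"
    if je == ib:
        return "time:intervalMetBy"
    # Shared endpoints.
    if ib == jb and ie == je:
        return "time:intervalEquals"
    if ib == jb:
        return "time:intervalStarts" if ie < je else "time:intervalStartedBy"
    if ie == je:
        return "time:intervalFinishes" if ib > jb else "time:intervalFinishedBy"
    # Strict containment in either direction.
    if jb < ib and ie < je:
        return "time:intervalDuring"
    if ib < jb and je < ie:
        return "time:intervalContains"
    # Remaining case: partial overlap.
    return "time:intervalOverlaps" if ib < jb else "time:intervalOverlappedBy"
```

For example, `allen_relation((0, 1), (1, 2))` returns `"time:intervalMeets"`.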

The above figures use `prov:Activity` coloring to illustrate `time:ProperInterval`s, following an alignment that includes `prov:Activity rdfs:subClassOf time:ProperInterval`. Here is how `prov:Activity`s and `time:ProperInterval`s render with `case_prov_dot --display-time-links`:

![Activity vs proper interval](figures/readme-activity-vs-proper-interval-visible.svg)

One effect of time sorting is that the beginnings of `prov:Activity`s and `time:ProperInterval`s are now always defined with a `time:Instant`. An interval bar denotes that the temporal thing begins at a linked `time:Instant`. If an end is known to exist for the `uco-action:Action`, `prov:Activity`, `time:ProperInterval`, or `prov:Entity`, an ending `time:Instant` will also be defined. These instants were found necessary for topologically ordering intervals and the `time:Instant`s contained within them, such as when a `prov:Activity` is known to contain a `prov:Generation` event (see [the temporal order and timestamp granularity example](#temporal-order-and-timestamp-granularity) below for illustration).

Ending instants are not defined by default, because their existence implies that the end of the temporal thing is known. Also, `prov:Entity`s are not automatically assigned a `prov:Generation` event, because some `prov:Entity`s are atemporal; take for example `prov:EmptyCollection`, the mathematical empty set. To make this explicit, here are the default expanded inferences and the time-bounded expanded inferences for activities, entities, and proper intervals.

| Default inferred boundary instants | Explicit boundary instants |
| --- | --- |
| ![Activity, Entity, and Proper Interval default instants](figures/readme-eapi-default-visible.svg) | ![Activity, Entity, and Proper Interval explicit boundary instants](figures/readme-eapi-bounded-visible.svg) |
| [Source](figures/readme-eapi-default.ttl) | [Source](figures/readme-eapi-bounded.ttl) |
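The boundary-instant rules described above can be summarized as a small decision function. This is a minimal Python sketch, not the `case_prov` implementation: `TemporalThing`, `infer_boundary_instants`, and the `-begin`/`-end` identifier suffixes are hypothetical, and the sketch applies the rules only to the class names given in the text, without modeling how `case_prov` relates `uco-action:Action` to `prov:Activity`. `time:hasBeginning` and `time:hasEnd` are the OWL-Time properties linking an interval to its boundary instants.

```python
from dataclasses import dataclass

# Kinds whose beginning instant is always inferred, per the text above.
BEGINNING_KINDS = {"prov:Activity", "time:ProperInterval"}
# Kinds that receive an ending instant only when an end is known to exist.
ENDING_KINDS = {"uco-action:Action", "prov:Activity", "time:ProperInterval", "prov:Entity"}


@dataclass
class TemporalThing:
    """Hypothetical stand-in for a graph node with a class and end knowledge."""
    iri: str
    kind: str
    end_known: bool = False


def infer_boundary_instants(thing):
    """Map OWL-Time boundary properties to hypothetical minted instant identifiers."""
    inferred = {}
    if thing.kind in BEGINNING_KINDS:
        inferred["time:hasBeginning"] = thing.iri + "-begin"
    # Ending instants are only inferred when an end is known; an ending instant
    # would otherwise assert knowledge the input data does not contain.
    if thing.end_known and thing.kind in ENDING_KINDS:
        inferred["time:hasEnd"] = thing.iri + "-end"
    return inferred
```

Note that a `prov:Entity` with no known end receives no inferred instants at all, mirroring the point that some entities are atemporal.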


#### Other temporal entity relators

Other predicates that relate `time:TemporalEntity`s (including `time:Instant`s and `time:ProperInterval`s) are also illustrated, including `time:inside`, `time:before`, and `time:after`. `case_prov_dot` renders them as shown in this figure (again, click to view the figure as SVG with selectable text):

![Relations between intervals and instants](figures/readme-time-instants-visible.svg)

If `--display-time-links` is not requested, [this figure](figures/readme-time-instants-invisible.svg) renders the same items, showing that position is preserved even when the temporal items are not drawn visibly.
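The placement of an instant relative to a proper interval can be sketched with numeric positions. In this minimal Python sketch, `instant_relation` is a hypothetical helper; it reports the interval's own endpoints separately, on the understanding that OWL-Time's `time:inside` is not intended to include the beginning and end of an interval.

```python
def instant_relation(t, begin, end):
    """Relate instant t to the proper interval spanning begin..end.

    Hypothetical helper. Returns an OWL-Time term name; for endpoints it
    returns the property by which the interval points at t.
    """
    if not begin < end:
        raise ValueError("a proper interval needs begin < end")
    if t < begin:
        return "time:before"        # t time:before the interval
    if t == begin:
        return "time:hasBeginning"  # the interval time:hasBeginning t
    if t < end:
        return "time:inside"        # the interval time:inside t
    if t == end:
        return "time:hasEnd"        # the interval time:hasEnd t
    return "time:after"             # t time:after the interval
```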

`uco-action:Action`s and `prov:Activity`s can be related using containing `time:ProperInterval`s. The figures below show two `prov:Activity`s with no timestamps and no direct link to one another, contained within two `time:ProperInterval`s that *do* link to one another. The left column shows the default display, the middle column shows the display with `--display-time-intervals`, and the right column shows the display with `--display-time-links`.

| Default display | Display with time intervals | Display with time links |
| --- | --- | --- |
| ![Activities related by containing intervals, time invisible](figures/readme-activities-related-by-intervals-invisible.svg) | ![Activities related by containing intervals, intervals visible](figures/readme-activities-related-by-intervals-with-intervals.svg) | ![Activities related by containing intervals, links visible](figures/readme-activities-related-by-intervals-visible.svg) |
| Default | `--display-time-intervals` | `--display-time-links` |
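The reasoning behind this example can be sketched in Python: if each activity lies wholly within its containing interval, and one interval is entirely before another, then the contained activities inherit that order. This is a minimal sketch under stated assumptions: `infer_activity_order` is a hypothetical helper, containment is given as a plain mapping, and the interval-to-interval link is assumed to be `time:intervalBefore`, supplied as a set of pairs.

```python
def infer_activity_order(contained_in, interval_before):
    """Derive activity ordering from the intervals that contain them.

    contained_in: dict mapping activity -> containing interval
    interval_before: set of (earlier_interval, later_interval) pairs
    Returns the set of (earlier_activity, later_activity) pairs implied when
    each activity lies wholly within its interval and the intervals are ordered.
    """
    order = set()
    for earlier, earlier_interval in contained_in.items():
        for later, later_interval in contained_in.items():
            # An activity ends no later than its interval, which ends before
            # the later interval begins, which is no later than the later activity.
            if (earlier_interval, later_interval) in interval_before:
                order.add((earlier, later))
    return order
```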


#### Timestamp-based ordering

If no ordering is asserted with properties such as `time:intervalBefore`, `case_prov_dot` will use timestamp information. [This example JSON](figures/readme-actions-ordered-by-timestamp.json) shows three `case-investigation:InvestigativeAction`s with no relationship linking them. Here is how they render, by default and with `--display-time-links`:

| Default display | Display with time links |
| --- | --- |
| ![Actions ordered only by timestamp, time invisible](figures/readme-actions-ordered-by-timestamp-invisible.svg) | ![Actions ordered only by timestamp, time visible](figures/readme-actions-ordered-by-timestamp-visible.svg) |

**Note**: Timestamp ordering is based on lexicographic sorting, and as a pragmatic programming matter, `case_prov` will only sort timestamps with a GMT timezone (i.e. ending with `Z` or `+00:00`). Timestamps in UCO and PROV use the `xsd:dateTime` datatype, which does not require that a timezone be GMT, or even present. OWL-Time has deprecated its property `time:inXSDDateTime` in favor of `time:inXSDDateTimeStamp`, which uses the timezone-requiring datatype `xsd:dateTimeStamp`. `case_prov` follows the practice set by `time:inXSDDateTimeStamp`, adding the more stringent requirement of GMT in order to handle sorting. If a UCO or PROV timestamp cannot be straightforwardly converted to an OWL-Time `xsd:dateTimeStamp` (i.e. by only swapping the datatype), that timestamp will be disregarded in sorting and omitted from inferred `time:Instant`s.
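A minimal Python sketch of the GMT-only policy described in the note follows. `parse_gmt_timestamp` and `order_by_timestamp` are hypothetical helpers, not the `case_prov` implementation, and Python 3.7 or later is assumed so that `strptime`'s `%z` accepts both `Z` and `+00:00`. One deliberate difference: the sketch compares parsed `datetime` values rather than raw strings, because a plain string sort places `2020-01-02T12:00:30.1234Z` before `2020-01-02T12:00:30Z` (ASCII `.` precedes `Z`).

```python
from datetime import datetime


def parse_gmt_timestamp(value):
    """Return an aware datetime for a GMT-suffixed lexical value, else None."""
    if not (value.endswith("Z") or value.endswith("+00:00")):
        return None  # Non-GMT or timezone-less values are disregarded.
    # Try with fractional seconds first, then without.
    for fmt in ("%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%dT%H:%M:%S%z"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None


def order_by_timestamp(stamped):
    """Sort node identifiers by timestamp, omitting nodes without usable GMT values.

    stamped: dict mapping node identifier -> timestamp string
    """
    usable = {}
    for node, value in stamped.items():
        parsed = parse_gmt_timestamp(value)
        if parsed is not None:
            usable[node] = parsed
    # Break timestamp ties by identifier so the output is deterministic.
    return sorted(usable, key=lambda node: (usable[node], node))
```

For example, a node stamped `2020-01-02T12:00:30-05:00` is left out of the result entirely, matching the note's rule that such timestamps are disregarded.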


#### Temporal order and timestamp granularity

In the context of PROV-O and OWL-Time, encoding `time:inside` links lets one relate `prov:InstantaneousEvent`s to `prov:Activity`s. This can be a significant aid when relating timestamps of differing precision. Suppose some moderately fast automated action is recorded as having started at `12:00:30Z` on some day and ended at `12:00:30Z`, and is known to have created two files in fairly quick succession, one of them a temporary file that was later deleted. The timeline from available records, including an application's logs and file system timestamps, shows this timestamp order:

* `2020-01-02T12:00:30Z`: Action begins.
* `2020-01-02T12:00:30.1234Z`: Temporary file created.
* `2020-01-02T12:00:30.3456Z`: Persistent file created from some contents of temporary file.
* `2020-01-02T12:00:30.5678Z`: Temporary file destroyed.
* `2020-01-02T12:00:30Z`: Action concludes.

[This JSON-LD illustration](figures/readme-two-files.json) encodes the above sequence using UCO and OWL-Time. These SVGs, rendered with `case_prov_dot`, show the same timeline:

| Default display | Display with time links |
| --- | --- |
| ![Differing granularities with time links invisible](figures/readme-two-files-invisible.svg) | ![Differing granularities with time links visible](figures/readme-two-files-visible.svg) |
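The need for explicit `time:inside` links shows up when the timeline above is ordered on timestamp values alone. In this minimal Python sketch (`timeline` and `naive_order` are hypothetical names; Python 3.7 or later is assumed for `strptime` parsing of `Z`), the second-precision "Action concludes" timestamp sorts before all three sub-second file events, contradicting the known containment.

```python
from datetime import datetime


def parse(value):
    """Parse the two lexical shapes used in the timeline (with/without fraction)."""
    fmt = "%Y-%m-%dT%H:%M:%S.%f%z" if "." in value else "%Y-%m-%dT%H:%M:%S%z"
    return datetime.strptime(value, fmt)


timeline = [
    ("Action begins", "2020-01-02T12:00:30Z"),
    ("Temporary file created", "2020-01-02T12:00:30.1234Z"),
    ("Persistent file created", "2020-01-02T12:00:30.3456Z"),
    ("Temporary file destroyed", "2020-01-02T12:00:30.5678Z"),
    ("Action concludes", "2020-01-02T12:00:30Z"),
]


def naive_order(events):
    """Order event labels by parsed timestamp alone (sorted() is stable for ties)."""
    return [label for label, value in sorted(events, key=lambda e: parse(e[1]))]
```

Here `naive_order(timeline)` places "Action concludes" second, before any file event. Recording `time:inside` between the action's interval and each file event's instant supplies the ordering that the timestamps, at their differing precisions, cannot.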


#### RDF export of temporal inferences

`case_prov_dot` expands the CASE, PROV-O, and OWL-Time data within its input graphs to create a temporal ordering, with a focus on rendering into the [Dot language](https://graphviz.org/doc/info/lang.html). Workflows using `case_prov_*` have Dot as one possible output, but if there are other desired consumers of OWL-Time data, `case_prov_rdf` will generate *and persist as RDF* the same expansions as done in `case_prov_dot`.

One noteworthy workflow difference is that `case_prov_dot` handles one relaxation of a UCO policy: UCO requires graph nodes to be identified with IRIs, without usage of blank nodes, while PROV-O and OWL-Time have no such restriction.

`case_prov_dot` will expand knowledge of blank nodes, but can make no guarantee about the stability of its generated content. In particular, randomized node identifiers may cause the Dot rendering pipeline to laterally shuffle graph data of equal vertical rank. (That is, vertical ordering is stable, but horizontal ordering may change with each re-run.)

`case_prov_rdf` will perform knowledge expansion on its input graph, but will only serialize inferences about IRI-identified nodes because blank nodes cannot have external annotations applied to them without re-serializing the entire input graph.
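The serialization rule for `case_prov_rdf` can be sketched as a filter over inferred triples. This is a minimal Python sketch, not the `case_prov_rdf` implementation: terms are assumed to be N-Triples-style strings with blank nodes prefixed `_:`, and `exportable_inferences` is a hypothetical helper. As an additional assumption, triples whose *object* is a blank node are also dropped, since blank node labels are local to a serialization and would not match the input graph's blank node.

```python
def is_blank_node(term):
    """Assumption: blank nodes are written with the N-Triples '_:' prefix."""
    return term.startswith("_:")


def exportable_inferences(inferred_triples):
    """Keep only inferred triples that mention no blank node.

    inferred_triples: iterable of (subject, predicate, object) strings
    """
    return [
        (s, p, o)
        for (s, p, o) in inferred_triples
        if not is_blank_node(s) and not is_blank_node(o)
    ]
```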

For examples of expanded data, see the ["two files" base JSON-LD](figures/readme-two-files.json) versus its [inferred graph](figures/readme-two-files-expanded.ttl), or the ["actions ordered by timestamp" base JSON-LD](figures/readme-actions-ordered-by-timestamp.json) versus its [inferred graph](figures/readme-actions-ordered-by-timestamp-expanded.ttl).


## Licensing

This repository is licensed under the Apache 2.0 License. See [LICENSE](LICENSE).
2 changes: 1 addition & 1 deletion README_PyPI.md
@@ -1,5 +1,5 @@
# CASE Implementation: PROV-O

This repository maps [CASE](https://caseontology.org/) to [W3C PROV-O](https://www.w3.org/TR/prov-o/). Note that contrary to other CASE implementations, this maps CASE out to another data model, instead of mapping another data model or tool into CASE.
This repository maps [CASE](https://caseontology.org/) to [W3C PROV-O](https://www.w3.org/TR/prov-o/) and [OWL-Time](https://www.w3.org/TR/owl-time/). Note that contrary to other CASE implementations, this maps CASE out to another data model, instead of mapping another data model or tool into CASE.

Full documentation is available at the [project homepage](https://github.com/casework/CASE-Implementation-PROV-O).