Documenting a resource
In the tripper.dataset sub-package, the documentation of a resource is represented internally as a JSON-LD document stored as a Python dict. However, the API tries to hide the complexities of JSON-LD behind simple interfaces. To support different use cases, the sub-package provides several interfaces for data documentation, including Python dicts, YAML files and tables. These are described below.
Documenting as a Python dict
The API supports two Python dict representations, one for documenting a single resource and one for documenting multiple resources.
Single-resource dict
Below is a simple example of how to document a SEM image dataset as a Python dict:
>>> dataset = {
...     "@id": "kb:image1",
...     "@type": "sem:SEMImage",
...     "creator": "Sigurd Wenner",
...     "description": "Back-scattered SEM image of cement, polished with 1 µm diamond compound.",
...     "distribution": {
...         "downloadURL": "https://github.com/EMMC-ASBL/tripper/raw/refs/heads/master/tests/input/77600-23-001_5kV_400x_m001.tif",
...         "mediaType": "image/tiff"
...     }
... }
The keywords are defined in the default JSON-LD context and documented under Predefined keywords.
This example uses two namespace prefixes that are not among the predefined prefixes. We therefore have to define them explicitly:
>>> prefixes = {
...     "sem": "https://w3id.com/emmo/domain/sem/0.1#",
...     "kb": "http://example.com/kb/"
... }
Side note
This dict is actually a JSON-LD document with an implicit context.
You can use as_jsonld() to create a valid JSON-LD document from it.
In addition to adding a @context field, this function also adds some implicit @type declarations.
>>> import json
>>> from tripper.dataset import as_jsonld
>>> d = as_jsonld(dataset, prefixes=prefixes)
>>> print(json.dumps(d, indent=2))
{
  "@context": "https://raw.githubusercontent.com/EMMC-ASBL/tripper/refs/heads/master/tripper/context/0.2/context.json",
  "@type": [
    "http://www.w3.org/ns/dcat#Dataset",
    "https://w3id.org/emmo#EMMO_194e367c_9783_4bf5_96d0_9ad597d48d9a",
    "https://w3id.com/emmo/domain/sem/0.1#SEMImage"
  ],
  "@id": "http://example.com/kb/image1",
  "creator": "Sigurd Wenner",
  "description": "Back-scattered SEM image of cement, polished with 1 \u00b5m diamond compound.",
  "distribution": {
    "@type": "http://www.w3.org/ns/dcat#Distribution",
    "downloadURL": "https://github.com/EMMC-ASBL/tripper/raw/refs/heads/master/tests/input/77600-23-001_5kV_400x_m001.tif",
    "mediaType": "image/tiff"
  }
}
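The expansion of kb:image1 to http://example.com/kb/image1 in the output above follows from the prefix mapping. The sketch below shows that substitution in plain Python. Note that expand_prefixed_name is a hypothetical helper written for illustration only. It is not part of tripper, which resolves names through the JSON-LD context and may handle more cases.

```python
# Illustrative sketch only: `expand_prefixed_name` is a hypothetical helper,
# not a tripper function.
prefixes = {
    "sem": "https://w3id.com/emmo/domain/sem/0.1#",
    "kb": "http://example.com/kb/",
}


def expand_prefixed_name(name, prefixes):
    """Return the full IRI for `name` if it starts with a known prefix."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    # Unknown prefix (e.g. "https") or no colon at all: leave unchanged.
    return name


print(expand_prefixed_name("kb:image1", prefixes))
print(expand_prefixed_name("sem:SEMImage", prefixes))
```

Values whose leading part is not a declared prefix, such as the download URL or the media type, pass through unchanged in this sketch.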
You can use save_dict() to save this documentation to a triplestore. Since the prefixes "sem" and "kb" are not included in the Predefined prefixes, they have to be provided explicitly.
>>> from tripper import Triplestore
>>> from tripper.dataset import save_dict
>>> ts = Triplestore(backend="rdflib")
>>> save_dict(ts, dataset, prefixes=prefixes) # doctest: +ELLIPSIS
AttrDict(...)
The returned AttrDict instance is an updated copy of dataset (cast to a dict subclass with attribute access).
It corresponds to a valid JSON-LD document and is the same as the one returned by as_jsonld().
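To illustrate what "a dict subclass with attribute access" means, here is a minimal sketch. AttrDictSketch is a hypothetical stand-in written for this example and should not be read as tripper's actual AttrDict implementation.

```python
# Illustrative sketch only: `AttrDictSketch` is a hypothetical class,
# not tripper's AttrDict.
class AttrDictSketch(dict):
    """Dict subclass whose keys can also be read and set as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc

    def __setattr__(self, name, value):
        # Store attributes as ordinary dict items.
        self[name] = value


doc = AttrDictSketch(
    {"@id": "http://example.com/kb/image1", "creator": "Sigurd Wenner"}
)
print(doc.creator)  # same value as doc["creator"]
print(doc["@id"])   # keys that are not valid identifiers need item access
```

Keys such as @id and @type are not valid Python identifiers, so they remain reachable only through the usual item syntax.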
You can use ts.serialize() to list the content of the triplestore (defaults to turtle):
>>> print(ts.serialize())
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix emmo: <https://w3id.org/emmo#> .
@prefix kb: <http://example.com/kb/> .
@prefix sem: <https://w3id.com/emmo/domain/sem/0.1#> .
<BLANKLINE>
kb:image1 a dcat:Dataset,
        sem:SEMImage,
        emmo:EMMO_194e367c_9783_4bf5_96d0_9ad597d48d9a ;
    dcterms:creator "Sigurd Wenner" ;
    dcterms:description "Back-scattered SEM image of cement, polished with 1 µm diamond compound." ;
    dcat:distribution [ a dcat:Distribution ;
            dcat:downloadURL "https://github.com/EMMC-ASBL/tripper/raw/refs/heads/master/tests/input/77600-23-001_5kV_400x_m001.tif" ;
            dcat:mediaType "image/tiff" ] .
<BLANKLINE>
<BLANKLINE>
Note that the image has implicitly been declared to be an individual of the classes dcat:Dataset and emmo:DataSet (the latter appears as emmo:EMMO_194e367c_9783_4bf5_96d0_9ad597d48d9a in the output above).
This is because the type argument of save_dict() defaults to "dataset".
Multi-resource dict
It is also possible to document multiple resources as a Python dict.
Note
Unlike the single-resource dict representation, which is valid (though possibly incomplete) JSON-LD, the multi-resource dict representation is not valid JSON-LD.
This dict representation accepts the following keywords:
- @context: Optional user-defined context to be appended to the documentation of all resources.
- prefixes: A dict mapping namespace prefixes to their corresponding URLs.
- datasets/distributions/accessServices/generators/parsers/resources: A list of valid single-resource dicts of the given resource type.
See semdata.yaml for an example of a YAML representation of a multi-resource dict documentation.
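For orientation, a multi-resource dict built from the keywords above might look like the sketch below. The layout follows the keyword list, and the values are reused from the single-resource example. The content is illustrative and is not taken from semdata.yaml.

```python
import json

# Illustrative multi-resource dict. The values are reused from the
# single-resource example and are not the content of semdata.yaml.
multi = {
    "prefixes": {
        "sem": "https://w3id.com/emmo/domain/sem/0.1#",
        "kb": "http://example.com/kb/",
    },
    "datasets": [
        {
            "@id": "kb:image1",
            "@type": "sem:SEMImage",
            "creator": "Sigurd Wenner",
            "distribution": {
                "downloadURL": "https://github.com/EMMC-ASBL/tripper/raw/refs/heads/master/tests/input/77600-23-001_5kV_400x_m001.tif",
                "mediaType": "image/tiff",
            },
        },
    ],
}

# Each entry under a resource-type keyword is a single-resource dict.
for resource in multi["datasets"]:
    print(resource["@id"])

print(json.dumps(multi, indent=2))
```

Because the prefixes travel inside the dict, each single-resource entry can use prefixed names such as kb:image1 without a separate prefixes argument.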
Documenting as a YAML file
The save_datadoc() function saves a YAML file in multi-resource format to a triplestore. For example, semdata.yaml can be saved to a triplestore with:
>>> from tripper.dataset import save_datadoc
>>> save_datadoc( # doctest: +ELLIPSIS
...     ts,
...     "https://raw.githubusercontent.com/EMMC-ASBL/tripper/refs/heads/master/tests/input/semdata.yaml"
... )
AttrDict(...)
Documenting as a table
The TableDoc class can be used to document multiple resources as rows in a table.
The table must have a header row with defined keywords (either predefined or provided with a custom context). Nested fields may be specified as dot-separated keywords. For example, the table
@id | distribution.downloadURL |
---|---|
:a | http://example.com/a.txt |
:b | http://example.com/b.txt |
corresponds to the following turtle representation:
:a dcat:distribution [
    a dcat:Distribution ;
    dcat:downloadURL "http://example.com/a.txt" ] .
:b dcat:distribution [
    a dcat:Distribution ;
    dcat:downloadURL "http://example.com/b.txt" ] .
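The mapping from dot-separated column names to nested fields can be sketched in plain Python as follows. The helper row_to_dict is hypothetical and only illustrates the idea. It is not how TableDoc is necessarily implemented.

```python
# Illustrative sketch only: `row_to_dict` is a hypothetical helper,
# not part of tripper.
def row_to_dict(header, row):
    """Build a nested dict from one table row with dot-separated columns."""
    result = {}
    for key, value in zip(header, row):
        *parents, leaf = key.split(".")
        target = result
        for parent in parents:
            # Descend into (or create) the nested dict for each parent key.
            target = target.setdefault(parent, {})
        target[leaf] = value
    return result


header = ["@id", "distribution.downloadURL"]
rows = [
    [":a", "http://example.com/a.txt"],
    [":b", "http://example.com/b.txt"],
]
docs = [row_to_dict(header, row) for row in rows]
print(docs[0])
# {'@id': ':a', 'distribution': {'downloadURL': 'http://example.com/a.txt'}}
```

Each resulting dict has the same shape as a single-resource dict, which is why one table row corresponds to one documented resource.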
The example below shows how to save all datasets listed in the CSV file semdata.csv to a triplestore.
>>> from tripper.dataset import TableDoc
>>> td = TableDoc.parse_csv(
...     "https://raw.githubusercontent.com/EMMC-ASBL/tripper/refs/heads/master/tests/input/semdata.csv",
...     prefixes={
...         "sem": "https://w3id.com/emmo/domain/sem/0.1#",
...         "semdata": "https://he-matchmaker.eu/data/sem/",
...         "sample": "https://he-matchmaker.eu/sample/",
...         "mat": "https://he-matchmaker.eu/material/",
...         "dm": "http://onto-ns.com/meta/characterisation/0.1/SEMImage#",
...         "parser": "http://sintef.no/dlite/parser#",
...         "gen": "http://sintef.no/dlite/generator#",
...     },
... )
>>> td.save(ts)