Customisations¶
User-defined prefixes¶
A namespace prefix is a mapping from a prefix to a namespace URL. For example
owl: http://www.w3.org/2002/07/owl#
Tripper already includes a default list of predefined prefixes. Additional prefixes can be provided in two ways.
With the prefixes argument¶
Several functions in the API (like store(), told() and TableDoc.parse_csv()) take a prefixes argument with which additional namespace prefixes can be provided.
This may be handy when used from the Python API.
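To illustrate what such a prefixes argument contains: a prefixes mapping is just a dict from prefix to namespace URL, and a prefixed name conceptually expands by concatenation. The helper below is a minimal sketch of that idea, not part of the Tripper API:

```python
# Illustration only: a prefixes mapping is a plain dict from prefix to
# namespace URL. Tripper performs this kind of expansion internally.
prefixes = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "myonto": "http://example.com/myonto#",
}

def expand(name: str, prefixes: dict) -> str:
    """Expand a prefixed name like 'owl:Class' to a full IRI."""
    prefix, _, local = name.partition(":")
    return prefixes[prefix] + local

print(expand("owl:Class", prefixes))
# http://www.w3.org/2002/07/owl#Class
```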
With custom context¶
Additional prefixes can also be provided via a custom JSON-LD context as a "prefix": "namespace URL" mapping.
See User-defined keywords for how this is done.
User-defined keywords¶
Tripper already includes a long list of predefined keywords, defined in the default JSON-LD context. How to define new concepts in a JSON-LD context is described in the JSON-LD 1.1 specification and can be tested in the JSON-LD Playground.
A new custom keyword can be added by providing a mapping in a custom JSON-LD context from the keyword to the IRI of the corresponding concept in an ontology.
Let's assume that you already have a domain ontology with base IRI http://example.com/myonto#, which defines the concepts for the keywords you want to use for the data documentation.
First, add the prefix for the base IRI of your domain ontology to your custom JSON-LD context:
"myonto": "http://example.com/myonto#",
How the keywords should be specified in the context depends on whether they correspond to a data property or an object property in the ontology and whether a given datatype is expected.
Simple literal¶
Simple literal keywords correspond to data properties with no specific datatype (just a plain string).
Assume you want to add the keyword batchNumber to relate documented samples to the number assigned to the batch they are taken from.
It corresponds to the data property http://example.com/myonto#batchNumber in your domain ontology.
By adding the following mapping to your custom JSON-LD context, batchNumber becomes available as a keyword for your data documentation:
"batchNumber": "myonto:batchNumber",
Literal with specific datatype¶
If batchNumber must always be an integer, you can specify this by replacing the above mapping with the following:
"batchNumber": {
"@id": "myonto:batchNumber",
"@type": "xsd:integer"
},
Here "@id" refers to the IRI that batchNumber is mapped to, and "@type" to its datatype. In this case we use xsd:integer, which is defined in the W3C XSD vocabulary.
Object property¶
Object properties are relations between two individuals in the knowledge base.
If you want to say more about the batches, you may want to store them as individuals in the knowledge base.
In that case, you may want to add a keyword fromBatch, which relates your sample to the batch it was taken from.
In your ontology, you may define fromBatch as an object property with IRI http://example.com/myonto#fromBatch:
"fromBatch": {
"@id": "myonto:fromBatch",
"@type": "@id"
},
Here the special value "@id" for the "@type" means that the value of fromBatch must be an IRI.
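Putting the pieces together, a complete custom context combining the prefix and both keyword styles from the examples above could look as follows (shown here as a Python dict; the same structure can equally be stored as a JSON file):

```python
import json

# Complete custom JSON-LD context combining the examples above:
# a prefix, a typed literal keyword and an object-property keyword.
custom_context = {
    "@context": {
        "myonto": "http://example.com/myonto#",
        "batchNumber": {
            "@id": "myonto:batchNumber",
            "@type": "xsd:integer",
        },
        "fromBatch": {
            "@id": "myonto:fromBatch",
            "@type": "@id",
        },
    }
}
print(json.dumps(custom_context, indent=2))
```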
Creating a context with keywords from an ontology¶
Creating a context with keywords manually can be tedious and is prone to human mistakes. It is therefore advisable to have only one source of truth, namely the ontology.
The context can be generated from a triplestore containing the ontology, using the Keywords class:
from tripper import Triplestore
from tripper.datadoc import get_keywords

ts = Triplestore('rdflib')
ts.parse(
    'https://raw.githubusercontent.com/EMMC-ASBL/tripper/refs/heads/master/tests/ontologies/family.ttl',
    format='turtle',
)

# Create a Keywords instance populated with the default keywords (ddoc:datadoc)
kw = get_keywords()

# Before loading the keywords from the ontology, all namespaces must have a
# prefix. The family namespace has no prefix by default, so it must be added:
kw.add_prefix('fam', 'http://onto-ns.com/ontologies/examples/family#')

# We can now load the ontology into the keywords, either skipping keywords
# that are already defined...
kw.load_rdf(ts, redefine='skip')
# ...or redefining them:
kw.load_rdf(ts, redefine='allow')
Note that there are a few considerations when generating a context from an ontology.
First of all, labels that coincide with predefined keywords must be handled with care.
By default, an error is raised if such a label is encountered (redefine='raise').
This choice has been made to ensure that redefining predefined keywords is a conscious decision.
To redefine an existing keyword, the redefine argument of the load_rdf() method must be set to 'allow'. A warning will be emitted for each keyword that is redefined.
To generate keywords from an ontology without redefining existing keywords, set redefine to 'skip', in which case existing keywords are left unchanged and a warning is emitted for each new keyword that is skipped in favour of the existing one.
Providing a custom context¶
A custom context with defined keywords can be provided for all the interfaces described in the section Documenting a resource.
Python dict¶
Both for the single-resource and multi-resource dicts, you can add a "@context" key to the dict whose value is
- a string containing a resolvable URL to the custom context,
- a dict with the custom context or
- a list of the aforementioned strings and dicts.
For example
{
"@context": [
# URL to a JSON file, typically a domain-specific context
"https://json-ld.org/contexts/person.jsonld",
# Local context
{
"fromBatch": {
"@id": "myonto:fromBatch",
"@type": "@id"
}
}
],
# Documentation of the resource using keywords defined in the context
...
}
Note that the default context is always included and doesn't need to be specified explicitly.
YAML file¶
Since the YAML representation is just a YAML serialisation of a multi-resource dict, a custom context can be provided by adding a "@context" key.
For example, the following YAML file provides a custom context that defines the myonto prefix as well as the batchNumber and fromBatch keywords.
An additional "kb" prefix (used for documented resources) is defined with the prefixes keyword.
---
# Custom context
"@context":
myonto: http://example.com/myonto#
batchNumber:
"@id": myonto:batchNumber
"@type": xsd:integer
fromBatch:
"@id": myonto:fromBatch
"@type": "@id"
# Additional prefixes
prefixes:
kb: http://example.com/kb#
resources:
# Samples
- "@id": kb:sampleA
"@type": chameo:Sample
fromBatch: kb:batch1
- "@id": kb:sampleB
"@type": chameo:Sample
fromBatch: kb:batch1
- "@id": kb:sampleC
"@type": chameo:Sample
fromBatch: kb:batch2
# Batches
- "@id": kb:batch1
"@type": myonto:Batch
batchNumber: 1
- "@id": kb:batch2
"@type": myonto:Batch
batchNumber: 2
You can save this data documentation to a triplestore with
>>> from tripper import Triplestore
>>> from tripper.datadoc import save_datadoc
>>>
>>> ts = Triplestore("rdflib")
>>> save_datadoc( # doctest: +ELLIPSIS
... ts,
... "https://raw.githubusercontent.com/EMMC-ASBL/tripper/refs/heads/master/tests/input/custom_context.yaml",
... )
{'@context': ...}
The content of the triplestore should now be
>>> print(ts.serialize())
@prefix chameo: <https://w3id.org/emmo/domain/characterisation-methodology/chameo#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix kb: <http://example.com/kb#> .
@prefix myonto: <http://example.com/myonto#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<BLANKLINE>
kb:sampleA a rdfs:Resource,
dcat:Resource,
chameo:Sample ;
myonto:fromBatch kb:batch1 .
<BLANKLINE>
kb:sampleB a rdfs:Resource,
dcat:Resource,
chameo:Sample ;
myonto:fromBatch kb:batch1 .
<BLANKLINE>
kb:sampleC a rdfs:Resource,
dcat:Resource,
chameo:Sample ;
myonto:fromBatch kb:batch2 .
<BLANKLINE>
kb:batch2 a myonto:Batch,
rdfs:Resource,
dcat:Resource ;
myonto:batchNumber 2 .
<BLANKLINE>
kb:batch1 a myonto:Batch,
rdfs:Resource,
dcat:Resource ;
myonto:batchNumber 1 .
<BLANKLINE>
<BLANKLINE>
Table¶
The __init__() method of the TableDoc class takes a context argument with which a user-defined context can be provided.
The value of the context argument is the same as for the @context key of a Python dict.
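As a sketch of the table layout itself (not Tripper's implementation): the header row holds keywords defined in the context, and each data row documents one resource. Conceptually, such a table maps to a list of resource dicts, which can be illustrated with the standard library alone:

```python
import csv
import io

# Minimal sketch (not the Tripper implementation) of how a table with
# keyword column headers maps to a list of resource dicts.
csv_text = """@id,@type,fromBatch
kb:sampleA,chameo:Sample,kb:batch1
kb:sampleB,chameo:Sample,kb:batch1
"""
reader = csv.DictReader(io.StringIO(csv_text))
resources = [dict(row) for row in reader]
print(resources[0])
# {'@id': 'kb:sampleA', '@type': 'chameo:Sample', 'fromBatch': 'kb:batch1'}
```

In Tripper, the same table would instead be handed to TableDoc.parse_csv() together with the context defining those keywords.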
Multi-table workflows¶
When documenting a knowledge base from multiple CSV (or YAML) tables that cross-reference each other's classes — for example, a classes.csv that defines domain-specific dataset types that are referenced as hasInput/hasOutput values in a computations.csv — you must explicitly enrich the shared context between parses.
Why this matters¶
Tripper determines whether an object-property value should become an owl:Restriction (instead of a plain triple) by checking whether the referenced IRI is a known class in the current context.
When two tables are parsed independently (even with the same context object), classes defined in the first table are not automatically visible to infer_restriction_types() when it processes the second table.
The consequence is silent: for scalar string values a "value" restriction is still inferred, but for list-valued object properties (e.g. a list of two hasInput classes) no restriction type is inferred at all, and the property ends up as a plain triple rather than an owl:Restriction node.
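The underlying lookup can be pictured with a toy sketch (this is an illustration of the idea, not Tripper's actual code): a referenced value only becomes a restriction if its IRI is already registered as a known class, which is exactly what enriching the shared context achieves.

```python
# Toy sketch (not Tripper's implementation) of the lookup that decides
# whether a value becomes an owl:Restriction: the referenced IRI must
# already be a known class in the current context.
known_classes = set()  # populated as class-defining tables are parsed

def infer_restriction(value):
    """Return 'owl:Restriction' if `value` is a known class, else None."""
    return "owl:Restriction" if value in known_classes else None

print(infer_restriction("myns:Dataset"))  # None -- class not yet registered
known_classes.add("myns:Dataset")         # what update_context() achieves
print(infer_restriction("myns:Dataset"))  # owl:Restriction
```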
Solution: call update_context() between parses¶
After parsing the table that defines your classes, call update_context() with its output before parsing tables that reference those classes:
from tripper.datadoc import get_context, TableDoc
from tripper.datadoc.dataset import update_context
from tripper import Triplestore
context = get_context("https://example.org/context/", theme=None)
prefixes = {"myns": "https://example.org/myns/"}
ts = Triplestore("rdflib")
# 1. Parse and store the table that defines classes
classes_doc = TableDoc.parse_csv("classes.csv", context=context, prefixes=prefixes)
classes_doc.save(ts)
# 2. Register the newly-parsed classes in the shared context
update_context(classes_doc.asdicts(), context)
# 3. Now parse and store tables that reference those classes; restrictions will be inferred correctly
resources_doc = TableDoc.parse_csv("resources.csv", context=context, prefixes=prefixes)
resources_doc.save(ts)
This pattern applies whenever:
- one table defines classes (rows with @type: owl:Class or subClassOf),
- another table documents resources whose object properties point to those classes, and
- you want those properties to be represented as owl:Restriction nodes in the output graph.
If you chain more than two tables, repeat the update_context() call after each parse.
Easier solution: create one dict before populating the triplestore¶
In practice, it is often easier to collect the resources from all tables into one big list of dicts and then call store() once. This way, all classes are visible to infer_restriction_types() when it processes the whole collection.
from tripper import Triplestore
from tripper.datadoc import get_context, TableDoc
from tripper.datadoc.dataset import store

context = get_context("https://example.org/context/", theme=None)
prefixes = {"myns": "https://example.org/myns/"}
ts = Triplestore("rdflib")

# 1. Parse the table that defines classes
classes_doc = TableDoc.parse_csv("classes.csv", context=context, prefixes=prefixes)
# 2. Parse the table with resources that reference those classes
resources_doc = TableDoc.parse_csv("resources.csv", context=context, prefixes=prefixes)
# 3. Combine all resources from all tables into one list of dicts
dicts = classes_doc.asdicts() + resources_doc.asdicts()
# 4. Store everything in the triplestore in a single call
store(ts, dicts, context=context)