RDF4J

Eclipse RDF4J is an open-source Java framework for working with RDF. Out of the box, it supports running RDF database servers and allows clients to interact with these servers through an HTTP protocol known as the RDF4J REST API. This API fully supports all SPARQL 1.1 W3C Recommendations and also provides additional operations for managing RDF4J concepts such as repositories and transactions. For more details about the RDF4J REST API and its capabilities, see the official documentation.

RDFLib provides two options to interface with the RDF4J REST API:

  1. Store integration - A full RDFLib Store implementation that lets users interact with RDF4J repositories seamlessly through RDFLib’s Graph and Dataset classes.
  2. RDF4J Client - A lower-level client that gives users more control over interactions with the RDF4J REST API. It covers all operations supported by the RDF4J REST API, including managing repositories and transactions.

To use RDFLib with RDF4J, first install RDFLib with the optional rdf4j extra.

pip install rdflib[rdf4j]

Note

The minimum RDF4J protocol version supported by RDFLib is 12. Connecting to a server that reports a protocol version lower than 12 raises an RDF4JUnsupportedProtocolError.
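
The check described in this note amounts to an integer comparison. The following is a minimal sketch of that logic, not RDFLib's actual implementation: the function name and the local exception class are hypothetical stand-ins, and it assumes the server reports its protocol version as a plain integer string.

```python
MIN_PROTOCOL_VERSION = 12  # minimum stated in the note above


class UnsupportedProtocolError(Exception):
    """Hypothetical stand-in for RDFLib's RDF4JUnsupportedProtocolError."""


def check_protocol(reported: str) -> int:
    """Parse a reported protocol version and reject anything below the minimum."""
    version = int(reported.strip())
    if version < MIN_PROTOCOL_VERSION:
        raise UnsupportedProtocolError(
            f"RDF4J protocol version {version} is lower than {MIN_PROTOCOL_VERSION}"
        )
    return version


print(check_protocol("12"))
```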

RDF4J Store

An RDF4J Store connects to a single RDF4J repository. If you need to work with multiple repositories, you can create multiple Store instances. The RDF4J Store exposes only the subset of repository operations that map directly to RDFLib’s Graph and Dataset interfaces. If you need to perform additional operations such as managing repositories, use the RDF4J Client instead.

Connecting to an existing repository

To get started, import the RDF4J Store class, create an instance of it, and pass it as the store parameter when creating a Graph or Dataset.

The following example connects to a local RDF4J server running on port 7200 and accesses an existing repository with the identifier my-repository.

If the server requires basic authentication, you can optionally pass a (username, password) tuple to the auth parameter.

from rdflib.plugins.stores.rdf4j import RDF4JStore

store = RDF4JStore(
    base_url="http://localhost:7200/",
    repository_id="my-repository",
    auth=("username", "password"),
)

Creating a new repository

If the repository does not exist (and the create parameter is set to False), an exception will be raised. To create the repository, set the create parameter to True. This will create the repository using RDF4J’s default configuration settings and then connect to it.

store = RDF4JStore(
    base_url="http://localhost:7200/",
    repository_id="my-repository",
    auth=("username", "password"),
    create=True,
)

To create a repository with a custom RDF4J configuration, pass the configuration as a string to the configuration parameter.

See the RDF4J documentation for more information on Repository and SAIL Configuration.

store = RDF4JStore(
    base_url="http://localhost:7200/",
    repository_id="my-repository",
    auth=("username", "password"),
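    # An RDF4J repository configuration string, such as the Turtle
    # example shown under "Create a repository".
    configuration=configuration,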
    create=True,
)

Using the store with RDFLib

Once the store is created, you can create a Graph or Dataset object and use it as you would with any other RDFLib store.

from rdflib import Dataset

ds = Dataset(store=store)

For more information on how to use RDFLib Graphs and Datasets, see the subsections under Usage.

Namespaces

When connecting to an RDF4J repository, RDFLib automatically registers a set of built-in namespace prefixes. To disable this behavior, assign a new RDFLib NamespaceManager instance with bind_namespaces set to the string "none".

from rdflib.namespace import NamespaceManager

ds = Dataset(store=store)
ds.namespace_manager = NamespaceManager(ds, "none")

See NamespaceManager for more namespace binding options.

RDF4J Client

This section covers examples of how to use the RDF4J Client. For the full reference documentation of the RDF4J Client, see rdflib.contrib.rdf4j.client.

Creating a client instance

The RDF4JClient class is the main entry point for interacting with the RDF4J REST API. To create an instance, pass the base URL of the RDF4J server to the constructor and optionally a username and password tuple for basic authentication.

The preferred way to create a client instance is to use Python’s context manager syntax (with statement). When using this syntax, the client will automatically close when the block is exited.

from rdflib.contrib.rdf4j.client import RDF4JClient

with RDF4JClient("http://localhost:7200/", auth=("admin", "admin")) as client:
    # Perform your operations here.
    ...

Alternatively, you can create a client instance and manually close it when you are done using it.

from rdflib.contrib.rdf4j.client import RDF4JClient

client = RDF4JClient("http://localhost:7200/", auth=("admin", "admin"))
try:
    # Perform your operations here.
    ...
finally:
    client.close()

HTTP client configuration

The RDF4J Client uses the httpx library for making HTTP requests. When creating an RDF4J client instance, any additional keyword arguments to RDF4JClient will be passed on to the underlying httpx.Client instance.

For example, setting additional headers (such as an Authorization header) for all requests can be done as follows:

token = "secret"
headers = {
    "Authorization": f"Bearer {token}",
}
with RDF4JClient("http://localhost:7200/", headers=headers) as client:
    # Perform your operations here.
    ...
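
Hard-coding a token is fine for a demonstration, but in practice it is usually read from the environment. The following sketch builds the headers separately so they can be inspected before a client is created. The environment variable name RDF4J_TOKEN and the helper name are assumptions made for illustration only.

```python
import os


def build_auth_headers(env=os.environ):
    """Return request headers for a bearer token, or no headers if none is set.

    RDF4J_TOKEN is a hypothetical variable name chosen for this example.
    """
    token = env.get("RDF4J_TOKEN")
    if not token:
        # No token configured: send no Authorization header at all.
        return {}
    return {"Authorization": f"Bearer {token}"}


print(build_auth_headers({"RDF4J_TOKEN": "secret"}))
```

The result can then be passed to the constructor, for example RDF4JClient("http://localhost:7200/", headers=build_auth_headers()).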

The httpx.Client instance is available on the RDF4J client’s http_client property.

client.http_client

The repository manager

The RDF4J Client provides a RepositoryManager class that allows you to manage RDF4J repositories. It does not represent a repository itself; instead, it is responsible for creating, deleting, listing, and retrieving Repository instances.

You can access the repository manager on the RDF4J client instance using the repositories property.

client.repositories

Create a repository

To create a new repository, call the create method on the repository manager. You must provide the repository identifier as well as the RDF4J repository configuration as an RDF string. The default configuration format expected is text/turtle.

See the RDF4J documentation for more information on Repository and SAIL Configuration.

configuration = """
    PREFIX config: <tag:rdf4j.org,2023:config/>

    []    a config:Repository ;
        config:rep.id "my-repository" ;
        config:rep.impl
            [
                config:rep.type "openrdf:SailRepository" ;
                config:sail.impl
                    [
                        config:native.tripleIndexers "spoc,posc" ;
                        config:sail.defaultQueryEvaluationMode "STANDARD" ;
                        config:sail.iterationCacheSyncThreshold "10000" ;
                        config:sail.type "openrdf:NativeStore" ;
                    ] ;
            ] ;
    .
"""
repo = client.repositories.create("my-repository", configuration)
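
The repository identifier appears both inside the Turtle configuration (config:rep.id) and in the call to create, so the two can drift apart. One way to keep them in sync is to render the configuration from a template. This is a minimal sketch using only the standard library; build_configuration is a hypothetical helper, and the configuration body is copied from the example above.

```python
from string import Template

# Same configuration as above, with the repository identifier as a placeholder.
CONFIG_TEMPLATE = Template("""
PREFIX config: <tag:rdf4j.org,2023:config/>

[] a config:Repository ;
    config:rep.id "$repository_id" ;
    config:rep.impl [
        config:rep.type "openrdf:SailRepository" ;
        config:sail.impl [
            config:native.tripleIndexers "spoc,posc" ;
            config:sail.defaultQueryEvaluationMode "STANDARD" ;
            config:sail.iterationCacheSyncThreshold "10000" ;
            config:sail.type "openrdf:NativeStore" ;
        ] ;
    ] ;
.
""")


def build_configuration(repository_id: str) -> str:
    """Render the Turtle configuration for the given repository identifier."""
    if '"' in repository_id or "\\" in repository_id:
        # Keep the sketch simple: refuse characters that would need Turtle escaping.
        raise ValueError("repository_id must not contain quotes or backslashes")
    return CONFIG_TEMPLATE.substitute(repository_id=repository_id)


configuration = build_configuration("my-repository")
```

The rendered string can be passed to client.repositories.create together with the same identifier.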

For the full reference documentation of the RepositoryManager class, see RepositoryManager.

Working with a repository

The Repository class provides attributes and methods for interacting with a single RDF4J repository. This includes:

  • Checking the health of the repository
  • Retrieving the number of statements in the repository
  • Querying the repository using both SPARQL and the Graph Store Protocol
  • Updating the repository using both SPARQL Update and the Graph Store Protocol
  • Performing queries and updates in a transaction context
  • Managing the namespace prefixes of the repository

Retrieve a repository instance

To get started, retrieve a repository instance by repository identifier.

repo = client.repositories.get("my-repository")

Note

Retrieving a repository instance automatically performs a health check to ensure that the repository is healthy. To run a health check manually, call the health method.

Repository size

To get the number of statements in a repository, call the size method.

repo.size()

Repository data

Upload some RDF data to the repository. Keep in mind that the upload method always appends the data to the repository.

data = """
    PREFIX ex: <https://example.com/>
    ex:Bob a ex:Person .
"""
repo.upload(data=data)

Note

Methods such as upload that accept a data argument can automatically handle different kinds of input. In this example the data argument is an RDF string, but it can also be a file path, a file-like object, bytes data, or an RDFLib Graph or Dataset.
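
The sketch below prepares the same Turtle document in several of those forms using only the standard library, so nothing here touches a server. Whether a file path should be given as a str or a pathlib.Path is not stated above, so that detail is an assumption to check against the reference documentation.

```python
import io
import tempfile
from pathlib import Path

turtle = """
PREFIX ex: <https://example.com/>
ex:Bob a ex:Person .
"""

as_string = turtle                   # RDF string
as_bytes = turtle.encode("utf-8")    # bytes data
as_file_like = io.BytesIO(as_bytes)  # file-like object

# File path: write the data to a temporary .ttl file.
tmp_dir = Path(tempfile.mkdtemp())
as_path = tmp_dir / "bob.ttl"
as_path.write_text(turtle, encoding="utf-8")
```

Each of these values is a candidate for the data argument, for example repo.upload(data=as_bytes).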

To overwrite the repository data, use the overwrite method.

Warning

This will delete all existing data in the repository and replace it with the provided data.

repo.overwrite(data=data)

There are also methods (get and delete) for querying and deleting data from the repository using a triple or quad pattern match. However, keep in mind that the patterns cannot include BNode terms.

The following example demonstrates the delete method, which deletes all statements with the subject https://example.com/Bob.

from rdflib import URIRef
repo.delete(subject=URIRef("https://example.com/Bob"))

The following example demonstrates the get method, which returns a new in-memory Dataset object containing all the statements with https://example.com/Bob as the subject.

ds = repo.get(subject=URIRef("https://example.com/Bob"))

Note

Methods that accept a graph_name parameter can restrict the operation to a specific graph. Use None to match all graphs, or DATASET_DEFAULT_GRAPH_ID to match only the default graph.

To retrieve a list of graph names in the repository, use the graph_names method. This returns a list of IdentifiedNode objects.

repo.graph_names()

Querying the repository with SPARQL

SPARQL queries can be executed against a repository using the query method.

The return object is a Result. You can use Result.type to determine the type of the result. For example, an ASK query will provide a boolean result in Result.askAnswer, a DESCRIBE or CONSTRUCT query will provide a Graph result in Result.graph, and a SELECT query will provide results in Result.bindings.

query = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"
result = repo.query(query)
if result.type == "ASK":
    print(result.askAnswer)
elif result.type in ("CONSTRUCT", "DESCRIBE"):
    print(result.graph)
else:
    # SELECT
    for row in result.bindings:
        print(row["s"])

Note

RDF4J supports optional query parameters that further restrict the context of the SPARQL query. To include these query parameters, pass in keyword arguments to the query method. See RDF4J REST API - Execute SPARQL query for the list of supported query parameters.

Updating the repository with SPARQL Update

To update the repository using SPARQL Update, use the update method.

query = "INSERT DATA { <https://example.com/Bob> <https://example.com/knows> <https://example.com/Alice> }"
repo.update(query)

Graph Store Manager

RDF4J repositories support the Graph Store Protocol to allow clients to upload and retrieve RDF data. To access the Graph Store Manager, use the graphs property.

repo.graphs

To add data to a specific graph, use the add method.

repo.graphs.add(graph_name="my-graph", data=data)

To clear a graph, use the clear method.

Warning

This will delete all existing data in the graph.

Note

RDF4J does not support empty named graphs. Once a graph is cleared, it will be deleted from the repository and will not appear in the list of graph_names.

repo.graphs.clear(graph_name="my-graph")

To retrieve data from a specific graph, use the get method. This returns a new in-memory Graph object containing the data in the graph.

graph = repo.graphs.get(graph_name="my-graph")

To overwrite the data in a specific graph, use the overwrite method.

Warning

This will delete all existing data in the graph and replace it with the provided data.

repo.graphs.overwrite(graph_name="my-graph", data=data)

Repository transactions

RDF4J supports transactions, which allow you to group multiple operations into a single atomic unit. To perform operations in a transaction, use the transaction method as a context manager.

The following example demonstrates how to perform operations in a transaction. The transaction will be automatically committed when the block is exited. If an error occurs during the transaction, the transaction will be rolled back and the error will be raised.

with repo.transaction() as txn:
    # Perform your operations here.
    ...

# Exiting the with block commits the transaction, or rolls it back if an error occurred.
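
The commit-or-roll-back behaviour described above follows the usual context-manager pattern. The following toy example illustrates that pattern with an in-memory list. It is not the RDF4J client's implementation, and ToyStore and its transaction method are hypothetical names.

```python
from contextlib import contextmanager


class ToyStore:
    """In-memory stand-in used only to illustrate transaction semantics."""

    def __init__(self):
        self.statements = []

    @contextmanager
    def transaction(self):
        pending = list(self.statements)  # work on a copy
        try:
            yield pending
        except Exception:
            # Roll back: discard the copy and re-raise the error.
            raise
        else:
            # Commit: make the changes visible only if the block succeeded.
            self.statements = pending


toy = ToyStore()

with toy.transaction() as txn:
    txn.append("ex:Bob a ex:Person")  # committed on exit

try:
    with toy.transaction() as txn:
        txn.append("ex:Alice a ex:Person")
        raise RuntimeError("something went wrong")  # triggers a rollback
except RuntimeError:
    pass

print(toy.statements)
```

Only the statement from the first block remains, because the second block raised before it could commit.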

Note

The methods available on the Transaction object vary slightly from the methods available on the Repository object. This is due to slight variations between the two endpoints (repositories and transactions) in the RDF4J REST API.

For the full reference documentation of the Transaction class, see Transaction.

Repository namespace prefixes

RDF4J supports managing namespace prefixes for a repository. To access the namespace manager, use the namespaces property on the Repository object.

repo.namespaces

To set a namespace prefix, use the set method.

repo.namespaces.set("schema", "https://schema.org/")

To retrieve a namespace by prefix, use the get method.

namespace = repo.namespaces.get("schema")

To remove a namespace prefix, use the remove method.

repo.namespaces.remove("schema")

To list all namespace prefixes, use the list method. This returns a list of NamespaceListingResult objects.

import dataclasses

namespace_prefixes = repo.namespaces.list()
for prefix, namespace in (dataclasses.astuple(item) for item in namespace_prefixes):
    print(prefix, namespace)
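
Since each entry unpacks into a prefix and a namespace, the listing converts naturally into a dictionary. The sketch below uses a hypothetical stand-in dataclass (NamespaceEntry) with assumed field names instead of the real NamespaceListingResult, so it runs without a server.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class NamespaceEntry:
    """Hypothetical stand-in for NamespaceListingResult (field names assumed)."""

    prefix: str
    namespace: str


listing = [
    NamespaceEntry("schema", "https://schema.org/"),
    NamespaceEntry("ex", "https://example.com/"),
]

# astuple() turns each dataclass instance into a (prefix, namespace) tuple,
# which dict() accepts as key/value pairs.
prefix_map = dict(dataclasses.astuple(entry) for entry in listing)
print(prefix_map["schema"])
```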

To clear all namespace prefixes, use the clear method.

Warning

This will remove all namespace prefixes from the repository.

repo.namespaces.clear()