The Resource Description Framework (RDF) is a World Wide Web Consortium (W3C) standard. Originally designed as a data model for metadata, it quickly became the de facto standard for describing and exchanging graph data, especially in semantic web applications.
It provides a way to describe resources and their relationships in a machine-readable format: the information is structured as triples, statements that consist of a subject, a predicate, and an object. These triples form a graph structure, where nodes are resources, and edges represent relationships between resources, like in the picture below:
The same graph can be represented textually using the following statements:
Andrea knows Mario
Andrea lives in Viterbo
Viterbo is in Italy
Italy is in Europe
Mario lives in Italy
Mario knows John
John lives in Germany
Germany is in Europe
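In RDF syntax such as Turtle, the same graph could be sketched as follows (the `ex:` prefix and the property names are illustrative, not taken from any specific vocabulary):

```turtle
@prefix ex: <http://example.org/> .

ex:Andrea  ex:knows   ex:Mario ;
           ex:livesIn ex:Viterbo .
ex:Viterbo ex:isIn    ex:Italy .
ex:Italy   ex:isIn    ex:Europe .
ex:Mario   ex:livesIn ex:Italy ;
           ex:knows   ex:John .
ex:John    ex:livesIn ex:Germany .
ex:Germany ex:isIn    ex:Europe .
```

Each line is a subject-predicate-object triple; the semicolon is Turtle shorthand for repeating the same subject.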
How To Store RDF Data? The RDF Store
An RDF Store, triple store, or RDF database is a specialized database designed for storing and querying RDF data. It typically supports SPARQL, a query language for querying RDF data.
SPARQL allows users to express complex queries to retrieve information from the RDF store based on patterns and conditions. The following SPARQL example retrieves the title of all books in the dataset whose price is less than 30.
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX ns: <http://example.org/ns#>
SELECT ?title ?price
WHERE {
  ?x ns:price ?price .
  FILTER (?price < 30)
  ?x dc:title ?title .
}
How To Provide RDF Data?
When I started diving into the topic of Linked Data Fragments, I was impressed by the following image on the LDF homepage.
The simple image quickly gives the reader an immediate idea of what we will discuss. The two extremes above are valid options if the goal is to publish and interact with RDF data. Let’s quickly explore them.
Data Dump
RDF portals like DBpedia or VIAF provide online access to their search services. However, being public services, such access is limited by usage quotas.
For example, you cannot build a system that quickly processes a considerable amount of data by executing a correspondingly massive number of API calls to those services. The reason is clear: the portal is meant to provide its services at medium-to-high quality to a large, potentially very vast audience; as a consequence, the marginal effort, in terms of resource usage, spent on a single API call cannot exceed a given rate.
If you are in that context, what can you do? Download the public dataset and manage it on your own in a local RDF Store, which brings us to the next extreme.
SPARQL Server
You have an RDF dataset, and you want to expose SPARQL capabilities. A first approach would be, “Okay, let’s use an RDF Store.”
So far, so good! However, the approach has some drawbacks, especially if the dataset is large. In that scenario, the “high server cost” item (the first of the three points listed under the SPARQL endpoint extreme in the image) can play a relevant role.
A scalable RDF storage solution has a cost, which is often significant when the dataset is large; from that perspective, the open-source landscape offers few options.
Even if you leave the “on-premises” option behind and subscribe to a cloud-managed solution, be prepared to add a relevant new item to your bill.
Linked Data Fragments
Linked Data Fragments aims to offer an alternative: a third scenario in the middle, between the two extremes above. The idea is to decompose the query and distribute its execution in small computation fragments.
Instead of relying on a centralized (i.e., server-side) SPARQL engine, which requires a high computation cost, the SPARQL query is decomposed by an intermediate layer into its constituent blocks, called “triple patterns.”
Let’s ignore any potential query optimization that the intermediate layer could apply (e.g., query rewriting, pattern reordering). Once the query has been decomposed, each pattern is sent to a Triple Pattern Server, a remote service in charge of “resolving” a single triple pattern.
The response to the triple pattern resolution is a fragment, that is, a partial response composed of:
- metadata about the result, the dataset, the service, and the system behind
- hypermedia controls for understanding and navigating the results
- triples matching the request pattern
The intermediate layer acts as a kind of query coordinator and federation engine: it decomposes the original query, requests the resolution of each resulting pattern, receives the responses from the pattern resolution servers, applies a merge logic, and returns the response to the caller.
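The merge logic mentioned above is essentially a join over the partial variable bindings returned for each triple pattern. Here is a minimal, self-contained sketch of that idea (not FragLink's actual implementation): two patterns from our earlier SPARQL example, `?x ns:price ?price` and `?x dc:title ?title`, each yield a set of solutions, and the coordinator keeps only the combinations that agree on the shared variable `?x`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of a coordinator's merge step: joining the partial
// solutions produced by two independently resolved triple patterns.
public class PatternMerge {

    // A solution maps a variable name to its bound value.
    static List<Map<String, String>> join(List<Map<String, String>> left,
                                          List<Map<String, String>> right) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            for (Map<String, String> r : right) {
                // Two solutions are compatible if every shared variable
                // is bound to the same value in both.
                boolean compatible = r.entrySet().stream()
                        .allMatch(e -> !l.containsKey(e.getKey())
                                    || l.get(e.getKey()).equals(e.getValue()));
                if (compatible) {
                    Map<String, String> merged = new HashMap<>(l);
                    merged.putAll(r);
                    out.add(merged);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Partial solutions for the pattern (?x ns:price ?price)
        List<Map<String, String>> prices = List.of(
                Map.of("x", ":book1", "price", "25"),
                Map.of("x", ":book2", "price", "42"));
        // Partial solutions for the pattern (?x dc:title ?title)
        List<Map<String, String>> titles = List.of(
                Map.of("x", ":book1", "title", "SPARQL Basics"),
                Map.of("x", ":book2", "title", "RDF in Depth"));

        // Only bindings agreeing on ?x survive the join.
        System.out.println(join(prices, titles));
    }
}
```

A real client would also apply the query's filters (e.g., `?price < 30`) on top of the joined solutions, and would typically reorder patterns so the most selective one is resolved first.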
FragLink
FragLink is a framework for building Linked Data Fragments servers. In other words, FragLink adds Linked Data Fragments capabilities to your server application.
That means it’s not a server itself. Instead, it comes as a Spring Boot autoconfigure module that you can easily plug into your application. Once FragLink is plugged in, everything related to the Linked Data Fragments Web API is enabled (i.e., HTTP endpoint, metadata, controls); of course, the concrete data binding is up to you.
Let’s see how it works.
Step 1: SpringBoot App Skeleton
This will be your Linked Data Fragments server. It is strongly recommended to use Spring Initializr to define the initial shape of the module (e.g., components, dependencies, frameworks, starters).
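For reference, the generated skeleton boils down to a single bootstrap class like the following (package and class names are illustrative, not prescribed by FragLink):

```java
package org.example.fragments;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

// Standard Spring Boot entry point; FragLink's autoconfiguration
// will be picked up automatically once its starter is on the classpath.
@SpringBootApplication
public class FragmentServerApplication {
    public static void main(String[] args) {
        SpringApplication.run(FragmentServerApplication.class, args);
    }
}
```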
Step 2: FragLink Dependencies
Once the project skeleton is created, open the pom.xml (if you’re using Gradle, there’s a corresponding configuration) and add the following section:

<repositories>
    <repository>
        <id>fraglink-package-registry</id>
        <url>https://gitlab.com/api/v4/projects/52914288/packages/maven</url>
    </repository>
</repositories>

The snippet above declares the coordinates of the Maven repository where the FragLink artifacts are hosted. Then, in the dependencies section:
<dependency>
    <groupId>com.spaziocodice.labs.rdf</groupId>
    <artifactId>fraglink-starter</artifactId>
    <version>1.1.1</version>
</dependency>

Step 3: Configuration

Assuming you have already set up the Spring Boot module (e.g., dependencies and so on), here’s the minimal configuration required by FragLink:
fraglink:
  base:
    url: https://fragments.yourproject.org   # example value
  page:
    maxStatements: 50   # maximum number of statements returned in a response page
  dataset:
    name: "The dataset / project name"
    description: "An optional description about the project"
Step 4: Start
Start your server; after a few seconds, you should see the following message:
... : FragLink v1.1.1 has been enabled on this server.
The server is running: great! Linked Data Fragments are exposed through the root (/) REST endpoint. The endpoint template is

http://fragments.yourproject.org/{?subject,predicate,object,graph,page}
However, being a triple/quad pattern resolver, it still doesn’t know how to fetch data. The default implementation is a simple NoOp, meaning no data is returned in the response, only metadata. Here’s an example of such a response:
@prefix hydra:   <http://www.w3.org/ns/hydra/core#> .
@prefix void:    <http://rdfs.org/ns/void#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

<https://fragments.yourproject.org/fragments#dataset>
    a void:Dataset, hydra:Collection ;
    hydra:search [
        hydra:mapping [
            hydra:property rdf:subject ;
            hydra:variable "subject"
        ], [
            hydra:property rdf:predicate ;
            hydra:variable "predicate"
        ], [
            hydra:property rdf:object ;
            hydra:variable "object"
        ] ;
        hydra:template "https://fragments.yourproject.org/fragments{?subject,predicate,object,page}"
    ] ;
    dcterms:description "An optional description of the dataset." ;
    dcterms:title "The Dataset project/name" .

<https://fragments.yourproject.org/fragments>
    a hydra:PartialCollectionView ;
    dcterms:description "Linked Data Fragment of Share-VDE dataset containing triples matching the pattern {?s ?p ?o ?q}"@en ;
    dcterms:source <https://fragments.yourproject.org/fragments#dataset> ;
    dcterms:title "Linked Data Fragment of The Share-VDE Project Dataset"@en ;
    hydra:totalItems "0"^^xsd:integer ;
    hydra:itemsPerPage "50"^^xsd:integer ;
    void:triples "0"^^xsd:integer .
Step 5: Linked Data Fragment Resolver
To create a binding tied to a specific data source, you must create an implementation of
com.spaziocodice.labs.fraglink.service.impl.LinkedDataFragmentResolver
The interface contains a single method, which takes a triple/quad pattern as input and returns the list of matching triples as output. The FragLink framework uses Apache Jena, a powerful, open-source set of tools for dealing with RDF data.
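To make the idea concrete, here is a hypothetical resolver sketch backed by an in-memory Jena model. The method name `resolve` and its signature are assumptions for illustration; the actual contract is defined by the LinkedDataFragmentResolver interface, so check its Javadoc before implementing.

```java
import java.util.List;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.RDFNode;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdf.model.Statement;

// Hypothetical sketch, not the real interface: it resolves a triple
// pattern against an in-memory Jena model.
public class InMemoryResolver /* implements LinkedDataFragmentResolver */ {

    private final Model model = ModelFactory.createDefaultModel();

    // Each pattern term may be null, which Jena interprets as a wildcard
    // ("match any value" for that position).
    public List<Statement> resolve(Resource subject, Property predicate, RDFNode object) {
        return model.listStatements(subject, predicate, object).toList();
    }
}
```

A production resolver would instead delegate to whatever backend holds the data (a search index, a relational database, another RDF store) and honor the `maxStatements` paging limit configured earlier.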
Implementing a resolver is straightforward; nevertheless, we will soon publish an example repository with a sample resolver. Stay tuned!
Links
Resource Description Framework
https://en.wikipedia.org/wiki/Resource_Description_Framework
Linked Data Fragments
https://linkeddatafragments.org/
FragLink
https://github.com/spaziocodice/FragLink
Apache Jena
https://jena.apache.org/
We would love to hear questions, doubts, and feedback about this blog post!
Feel free to contact us or leave a message in the comment box below.