GraphQL + OWL: An "Ontologized" GraphQL Interface

Introduction

This article describes a proposal for a GraphQL OWL-compliant interface, including queries and mutations. Instead of dealing with abstract OWL concepts, we reference a popular ontology, BIBFRAME, used in structuring bibliographic descriptions. 

We will go through a brief overview of GraphQL and BIBFRAME and then describe the proposal. Note that this article does not cover the software implementation, which will be the topic of another post.

What is GraphQL?

GraphQL is a query language and runtime for APIs, developed by Facebook in 2012 and open-sourced in 2015. It provides a more efficient, powerful, and flexible way to request and manipulate data from servers compared to traditional REST (Representational State Transfer) APIs. In a GraphQL exchange, clients can specify exactly what data they need, and the server responds with only the requested data, eliminating over-fetching or under-fetching of information. From this article’s perspective, the most relevant key features of GraphQL are:
  • Flexible Queries: Clients can specify the shape of the response they need.
  • Single Endpoint: Unlike REST APIs that often require multiple endpoints for different resources, GraphQL APIs typically have a single endpoint for everything: queries and mutations.
  • Strongly Typed Schema: GraphQL APIs are defined by a schema that explicitly defines the data types that can be exchanged.
  • Introspection: GraphQL APIs provide introspection capabilities, allowing clients to query the schema to understand the available types and operations.
Please review this article if you want to know our thoughts on the GraphQL vs REST debate.

What is BIBFRAME?

BIBFRAME (BIBliographic FRAMEwork) is an initiative led by the Library of Congress to modernize and replace the MARC (Machine-Readable Cataloging) standard, which has been the foundation of bibliographic description and cataloging in libraries for many decades.

The BIBFRAME design transforms how bibliographic information is structured and shared in the digital age, leveraging the principles of linked data and the Semantic Web.

At its core, BIBFRAME aims to provide a more flexible, extensible, and web-friendly framework for describing bibliographic resources, such as books, journals, audiovisual materials, and other types of content.

BIBFRAME shifts away from the MARC record structure, which was initially developed for card catalogs and has limitations in representing complex relationships and adapting to the modern information landscape.

From this article’s perspective, the most relevant aspects of BIBFRAME are:

  • Linked Data Approach: BIBFRAME embraces linked data principles, which emphasize connecting and interlinking data across the web.
  • Semantic Modeling: BIBFRAME employs a semantic model to describe bibliographic entities and their relationships.
  • Simplification: BIBFRAME aims to simplify the cataloging process by using more intuitive and consistent data structures.
  • Extensibility: BIBFRAME is extensible, allowing institutions, libraries, and projects to tailor the framework to their specific needs.

The ontology is available in OWL (Web Ontology Language) format.

"Did Brundle Absorb Fly?"

In the popular movie “The Fly”, Seth Brundle transforms into a human-insect hybrid creature due to a teleportation experiment gone wrong.

Although that’s not a happy-ending movie, I like the idea of something that transforms into something new by combining two different things, and that’s why that movie immediately came to my mind when I started thinking about combining GraphQL and BIBFRAME. 

The idea is somewhat simple:

  • Step 1 (design): define a GraphQL schema that can express queries and mutations according to the data structures defined in an arbitrary ontology (as said, we’ll be using BIBFRAME throughout this article)
  • Step 2 (schema implementation): a software component that reads an OWL file and automatically creates the schema according to the design defined in step 1.
  • Step 3 (interface implementation): a software component that implements the resolvers behind the queries and mutations defined in step 2. 

Because these are three separate, cohesive steps, each with its own kind of challenge, we will describe them in dedicated articles. Here we will focus on the first step’s challenges (and the related proposal).

Schema Design: Challenges

Namespaces

At the time of writing, all GraphQL types defined in a given schema instance share one global namespace. While that works well in many scenarios, there are cases like the idea we are discussing, where it represents a hard limit to overcome.

This is because, in the RDF world, namespaces are a central construct for separating concerns and avoiding name clashes between entities and properties belonging to different spaces. The fully qualified name of things described in an ontology comprises the namespace (prefixed or not) plus the “local” name.

For example, BIBFRAME defines an entity Work whose fully qualified name is http://id.loc.gov/ontologies/bibframe/Work or, in short, bf:Work (“bf” is called a namespace prefix; it is the short form of the namespace).

If we deal with an ontology that extends or uses BIBFRAME and provides a type called Work, it represents an entirely different entity from the BIBFRAME Work seen above. 
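
To make the idea concrete, here is a minimal sketch in Python. The PREFIXES table and the split_qname helper are hypothetical names introduced for illustration, and the “xyz” namespace is invented. The sketch splits a fully qualified name into its prefix and local name, which shows why two Work entities never clash in RDF:

```python
# Hypothetical prefix table: "bf" and "skos" are real namespaces,
# "xyz" is an invented one used only for illustration.
PREFIXES = {
    "bf": "http://id.loc.gov/ontologies/bibframe/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "xyz": "https://x.y.z/",
}

def split_qname(uri: str) -> tuple[str, str]:
    """Return (prefix, local_name) for a URI that belongs to a known namespace."""
    # Try the longest namespace first so that nested namespaces resolve correctly.
    for prefix, namespace in sorted(PREFIXES.items(), key=lambda p: -len(p[1])):
        if uri.startswith(namespace):
            return prefix, uri[len(namespace):]
    raise ValueError(f"unknown namespace for {uri}")

bf_work = split_qname("http://id.loc.gov/ontologies/bibframe/Work")
xyz_work = split_qname("https://x.y.z/Work")
print(bf_work, xyz_work)
```

The two results share the local name but differ in the prefix, so they identify two different entities.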

Note that the same applies to properties: the fully qualified name of a property comprises the namespace and its local name. That allows multiple properties to share the same local name. For example, bf:title and dcterms:title are two distinct properties, even though both have the local name title.

To complicate matters, it’s important to remember that an ontology is not an isolated set of data structures. Instead, it is usually a mix of things defined in the ontology itself and other things (entities, properties) from other ontologies. Here, for example, is the header of the BIBFRAME ontology, where we can see the namespace prefixes used throughout the definition: 

				
					<rdf:RDF 
        xml:base="http://id.loc.gov/ontologies/bibframe/" 
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" 
        xmlns:bf="http://id.loc.gov/ontologies/bibframe/" 
        xmlns:bflc="http://id.loc.gov/ontologies/bflc/" 
        xmlns:owl="http://www.w3.org/2002/07/owl#" 
        xmlns:skos="http://www.w3.org/2004/02/skos/core#" 
        xmlns:dcterms="http://purl.org/dc/terms/" 
        xmlns:foaf="http://xmlns.com/foaf/0.1/" 
        xmlns:cc="http://creativecommons.org/ns#">
...
				
			

Under this perspective, the implicit and single global namespace of GraphQL becomes the first obstacle to our idea. 
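
The prefixes declared in a header like the one above can be collected programmatically. The following Python sketch uses the standard library ElementTree parser on a shortened copy of the header (closed here so that it is well-formed). The read_prefixes helper is a hypothetical name, and a real implementation would probably rely on a dedicated RDF library:

```python
import io
import xml.etree.ElementTree as ET

# A shortened copy of the BIBFRAME header, closed so that it is well-formed XML.
HEADER = """<rdf:RDF
    xml:base="http://id.loc.gov/ontologies/bibframe/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bf="http://id.loc.gov/ontologies/bibframe/"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
</rdf:RDF>"""

def read_prefixes(xml_text: str) -> dict[str, str]:
    """Collect the prefix -> namespace URI pairs declared in the document."""
    prefixes = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    # "start-ns" events carry a (prefix, uri) tuple for each xmlns declaration.
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        prefixes[prefix] = uri
    return prefixes

prefixes = read_prefixes(HEADER)
print(prefixes)
```

Every prefix found this way is a candidate for a namespace in the GraphQL schema.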

GraphQL Naming

As said above, GraphQL does not have namespaces. To overcome the issue, there are approaches like the one suggested in the Apollo docs, and there is a well-detailed (and unfortunately old) proposal. The author of the proposal also opened an issue on GitHub, which generated a long (still open) debate.

Unfortunately, the approach suggested in the Apollo docs doesn’t fit our use case, and the proposal, although I like its design very much, is still a proposal, not part of the official GraphQL specification.

We need a schema design that fulfills the requirements above and, at the same time, fully complies with the GraphQL specification.

The GraphQL naming rules don’t allow URIs as entity or property names, because characters like : or / are not permitted. Without that limitation, we could solve the naming issue using something like this:

				
type http://id.loc.gov/ontologies/bibframe/Work {

    ...BIBFRAME Work fields follow...
}

type https://x.y.z/Work {
    
    ...XYZ Work fields follow...
}
				
			

or even, more concisely

				
					type bf:Work {

    ...BIBFRAME Work fields follow...
}

type xyz:Work {
    
    ...XYZ Work fields follow...
}
				
			

The examples above, especially the one that uses prefixes, suggest a very simple workaround: why don’t we use an underscore instead of the colon? I mean something like this: 

				
					type bf_Work {

    ...BIBFRAME Work fields follow...
}

type xyz_Work {
    
    ...XYZ Work fields follow...
}
				
			

The resulting design has the following drawbacks, in my opinion:

  • Namespaces are not entities; they are just prefixes.
  • Underscores can no longer be used safely in names. If we use an ontology that has a property called “has_permission”, we have to differentiate between the first underscore (the namespace separator) and the others that could occur in the name (e.g., “bf_has_permission”).
  • Property names change significantly, from a semantic perspective, compared with the source ontology. That makes requests (i.e., queries and mutations) less readable and therefore less understandable.
  • Namespaces are not containers; as a consequence, there will be a long flat list of entities and a long flat list of properties associated with entities.
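
The naming rule and the underscore drawback can be checked with a few lines of Python. This is only a sketch: the regular expression mirrors the Name pattern of the GraphQL specification, while is_valid_name, underscore_name, and the “bf_has” prefix are hypothetical names introduced here for illustration.

```python
import re

# GraphQL names may contain only letters, digits, and underscores,
# and must not start with a digit, so ":" and "/" are rejected.
NAME = re.compile(r"[_A-Za-z][_0-9A-Za-z]*")

def is_valid_name(name: str) -> bool:
    return NAME.fullmatch(name) is not None

def underscore_name(prefix: str, local: str) -> str:
    """The naive workaround: join prefix and local name with an underscore."""
    return f"{prefix}_{local}"

candidates = ["bf:Work", "http://id.loc.gov/ontologies/bibframe/Work", "bf_Work"]
validity = {name: is_valid_name(name) for name in candidates}

# Two different (prefix, local name) pairs collapse into the same GraphQL name.
# "bf_has" is a made-up prefix used only to show the ambiguity.
clash = underscore_name("bf", "has_permission") == underscore_name("bf_has", "permission")
print(validity, clash)
```

Only bf_Work passes the validation, and the last comparison shows that, once underscores are also allowed inside prefixes or local names, the original pair can no longer be recovered unambiguously from the GraphQL name.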

The Proposal

To illustrate the proposal, we will be using a minimalistic domain where 

  • There are a small number of classes
  • Classes have a small number of properties
  • Types belong to three different ontologies: Share-VDE, BIBFRAME, SKOS

In this way, we can catch and discuss many interesting cases, for example, those where ontologies are mixed even within a single class.  

Let’s see the example types in the following diagram:

In the diagram, 

  • blue types are from BIBFRAME
  • green types are from Share-VDE 
  • red types are from SKOS 

We introduced several things to make sure we consider aggregation, composition, and inheritance relationships:

  • svde:Work extends bf:Work: svde:Work IS a bf:Work. Among other things, this means the child inherits the parent’s properties (uri and bf:title).
  • uri is a property without a namespace: it doesn’t make sense to have bf:uri and svde:uri, because it always represents the entity’s identity.
  • svde:Work adds an svde:language property, which maps to an svde:Language type.
  • svde:Language has two literal properties, skos:label and skos:altLabel (depicted as classes in the diagram to highlight the fact that they belong to another namespace).

Top Level Queries: Only Namespaces

For each namespace used in the reference ontology/ontologies, we put a field in the top level Query type:  

				
					type Query {
    bf: bf
    svde: svde
    skos: skos
}
				
			

Note that the schema is a bit counter-intuitive because of the naming: each field name and its type have the same value (skos: skos). Not great, not terrible: at runtime, clients do not care much about types and their names. What they see are the skos, bf, and svde fields at the top of the hierarchy.
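
Since the top-level Query type depends only on the list of prefixes, generating it is straightforward. A minimal sketch in Python, where query_type_sdl is a hypothetical helper and not part of any library:

```python
def query_type_sdl(prefixes: list[str]) -> str:
    """Emit a Query type with one field per namespace prefix."""
    # Field name and type name are deliberately identical (e.g. "bf: bf").
    fields = "\n".join(f"    {prefix}: {prefix}" for prefix in prefixes)
    return f"type Query {{\n{fields}\n}}"

sdl = query_type_sdl(["bf", "svde", "skos"])
print(sdl)
```

For the three prefixes of our example, the output reproduces the Query type shown above.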

What does it mean in practice? A client that wants to start a “query” interaction with such a system should declare the namespace of the required entity/entities as the first thing:

				
query QueryOnBibFrameEntities (...variables...) {
    bf {
        ...query...
    }
}
				
			

"Namespaced" Queries

Each namespace provides two literal properties (uri and prefix) plus fields to retrieve the owned entities by their URI and using search parameters:  

				
					type bf {
    uri: ID!
    prefix: String!
    work(uri: String!): BfWork
    works(title: String!, offset:Int, rows: Int, sort: String): [BfWork] 
}

type svde {
    uri: ID!
    prefix: String!
    work(uri: String!): SvdeWork
    works(title: String!, offset:Int, rows: Int, sort: String): [SvdeWork]    
    language(uri: String!): SvdeLanguage
    languages(..., offset:Int, rows: Int, sort: String): [SvdeLanguage]
}

type skos {
    uri: ID!
    prefix: String!
}
				
			

Note:

  • The example assumes the only “searchable” property of a Work is the title.
  • works and languages should return a complex object with search metadata (pagination, sort), not just an array of results.
  • The skos namespace doesn’t have any types inside (in the example, only literal properties belong to it).
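
As an illustration of the second note, the “complex object” could look like the following Python sketch. The SearchResult class, its field names, and the paginate helper are all assumptions made for this example; the article does not prescribe a specific shape.

```python
from dataclasses import dataclass

@dataclass
class SearchResult:
    """Hypothetical envelope: the current page of items plus search metadata."""
    total: int    # number of matches before pagination
    offset: int   # index of the first returned item
    rows: int     # maximum page size requested
    sort: str     # sort criterion echoed back to the client
    items: list   # the current page of works (or languages)

def paginate(matches: list, offset: int = 0, rows: int = 10, sort: str = "title") -> SearchResult:
    # Sort by the requested property, then cut out the requested page.
    ordered = sorted(matches, key=lambda match: match[sort])
    return SearchResult(len(ordered), offset, rows, sort, ordered[offset:offset + rows])

works = [{"title": "C"}, {"title": "A"}, {"title": "B"}]
page = paginate(works, offset=1, rows=2)
print(page)
```

In the schema, works and languages would then return a type with this shape instead of a bare list.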

Types

What about the types mentioned in the previous snippet? SvdeWork, BfWork, SvdeLanguage? 

First point: GraphQL doesn’t have subclasses; there’s an “extend” keyword, but it is used for plugging additional features into existing types. Interface realization, including the realization of multiple interfaces, is possible instead.

Second point: any parent type (i.e., any class we need to subclass) should be declared as an interface. That’s the case of BfWork (which is subclassed by SvdeWork).

Third point: to group the properties of a class under the corresponding namespace, we need to define a specific type. So, for example, there will be a BfWorkProperties type and a SvdeWorkProperties type.

Fourth point: any leaf type is declared as it is; that’s the case of BfTitle.

Here’s a first draft (there’s still a missing point):

				
# bf:Title (a leaf type, declared as it is)
type BfTitle {
    mainTitle: String
    subtitle: String
    partNumber: String
    partName: String
}

# the properties of a bf:Work
interface BfWorkProperties {
    title: BfTitle
}


# BfWork must realize the BfWorkProperties interface
interface BfWork implements BfWorkProperties {
    uri: String
    bf: BfWorkProperties
    title: BfTitle
}

# skos properties of a svde:Language
type SkosLanguageProperties {
    label: String
    altLabel: String
}

# svde:Language exposes its skos properties under the skos field
type SvdeLanguage {
    uri: ID!
    skos: SkosLanguageProperties
}

# properties of a svde:Work
interface SvdeWorkProperties {
    language: SvdeLanguage
}

# svde:Work realizes bf:Work; GraphQL requires it to redeclare
# every field of the interfaces it implements (including title)
type SvdeWork implements BfWork & BfWorkProperties {
    uri: String
    bf: BfWorkProperties
    title: BfTitle
    svde: SvdeWorkProperties
}

				
			
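
The second and fourth points can be sketched in Python as follows. The CLASSES table is a hypothetical in-memory description of the example domain, the helper names are invented for illustration, and the type names follow the PascalCase convention used in the namespace types (BfWork, SvdeWork). The property interfaces of the third point are left out to keep the sketch short.

```python
# Hypothetical description of the example domain:
# (prefix, local name) -> parent class, or None when there is no parent.
CLASSES = {
    ("bf", "Work"): None,
    ("bf", "Title"): None,
    ("svde", "Work"): ("bf", "Work"),
    ("svde", "Language"): None,
}

def type_name(prefix: str, local: str) -> str:
    """bf + Work -> BfWork."""
    return prefix.capitalize() + local

def declaration_kind(cls: tuple[str, str]) -> str:
    """Parent classes become interfaces, leaf classes become object types."""
    parents = {parent for parent in CLASSES.values() if parent is not None}
    return "interface" if cls in parents else "type"

def declaration_header(cls: tuple[str, str]) -> str:
    header = f"{declaration_kind(cls)} {type_name(*cls)}"
    parent = CLASSES[cls]
    if parent is not None:
        header += f" implements {type_name(*parent)}"
    return header

headers = [declaration_header(cls) for cls in CLASSES]
print("\n".join(headers))
```

Only bf:Work is declared as an interface, because it is the only class with a subclass in the example.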

There is still a point that can be improved. If I want to ask for the title and the language of a svde:Work, the query looks like the following:

				
					query WorkDetailsQuery {
    svde {
        work(uri: "https://svde.org/works/1234") {
            svde {
                language {
                    skos {
                        label
                        altLabel
                    }
                }
            }
            
            bf {
                title {
                    mainTitle
                    subtitle
                }
            }
        }
    }
}
				
			

Although that works, we can avoid a bit of redundancy in the namespace declaration. The Work belongs to the svde namespace, and so does the language property.

If I am in a type that belongs to a namespace, there should be a way to avoid repeating the namespace for the properties already in that context. I mean something like this:

				
					query WorkDetailsQuery {
    svde {
        work(uri: "https://svde.org/works/1234") {
            language {
                skos {
                    label
                    altLabel
                }
            }

            bf {
                title {
                    mainTitle
                    subtitle
                }
            }
        }
    }
}
				
			

Note that the language field is no longer within the svde namespace. It seems a trivial difference, but if you imagine a context with many namespaces and many properties, the query shape will surely benefit.

That requires a slight change in our schema: SvdeWork should keep a SvdeWorkProperties member and, in addition, implement the SvdeWorkProperties interface itself:

				
type SvdeWork implements BfWork & BfWorkProperties & SvdeWorkProperties {
    uri: String
    bf: BfWorkProperties
    title: BfTitle
    svde: SvdeWorkProperties
    
    language: SvdeLanguage
}

				
			
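
The promotion rule can be summarized with a short Python sketch. The own_fields helper is a hypothetical name, and the input is an assumed grouping of a class’s properties by namespace prefix:

```python
def own_fields(class_prefix: str, properties: dict[str, list[str]]) -> list[str]:
    """List the fields a type contributes, given its properties grouped by prefix."""
    # The identity field plus one grouping field per namespace, as in the first draft.
    fields = ["uri"] + list(properties)
    # Properties owned by the class's own namespace are also exposed directly.
    fields += properties.get(class_prefix, [])
    return fields

svde_work = own_fields("svde", {"bf": ["title"], "svde": ["language"]})
print(svde_work)
```

The sketch covers only the fields contributed by the class itself. Fields declared by the implemented interfaces, such as title on the bf:Work interface, have to be added on top, because GraphQL requires an implementing type to redeclare every interface field.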

So far, we have been able to design a schema with a good compromise (at least to me) between readability and verbosity. As said, if GraphQL supported namespaces, things would have been easier, but unfortunately, it doesn’t. 

Next Steps

The three points at the beginning of the article have already defined the next steps. 

We have a design for the schema; the next goal is to create a software component that reads an ontology definition and creates the GraphQL data structures described in the proposal above. 

That is not an easy task: the component should take into account the following crucial challenges:

  • Vertical relationships like generalizations or realizations. Inheritance, as we said, has its own meaning in GraphQL, which differs significantly from what is defined in OWL.
  • Implicit relationships like “inverseOf” and “symmetric”, which could impact the data structures to build.
  • Namespaces. In the proposal above, we described a design for dealing with them, but implementing such a design is challenging.
  • Flexibility. Remember the requirement that the engine must not be tied to a specific ontology.
