Let’s Call a Spade a Spade: RDF and LPG — Cousins Who Should Learn to Live Together

An objective comparison of the RDF and LPG data models The post Let’s Call a Spade a Spade: RDF and LPG — Cousins Who Should Learn to Live Together appeared first on Towards Data Science.

Apr 8, 2025 - 00:51

In recent years, there has been a proliferation of articles, LinkedIn posts, and marketing materials presenting graph data models from different perspectives. This article refrains from discussing specific products and instead focuses solely on comparing the RDF (Resource Description Framework) and LPG (Labelled Property Graph) data models. To be clear, choosing between RDF and LPG is not a mutually exclusive decision: the two can be used together. The appropriate choice depends on the specific use case, and in some instances both models may be necessary; no single data model is universally applicable. In fact, polyglot persistence and multi-model databases (databases that support different data models within the database engine or on top of it) are gaining popularity as enterprises recognise the importance of storing data in diverse formats to maximise its value and prevent stagnation. For instance, storing time series financial data in a graph model is not the most efficient approach, as it would likely extract far less value than storing it in a time series matrix database, which enables rapid, multi-dimensional analytical queries.

The purpose of this discussion is to provide a comprehensive comparison of the RDF and LPG data models, highlighting their distinct purposes and overlapping usage. Many articles present biased evaluations that promote their authors' own tools, and such comparisons are often flawed because they compare apples to wheelbarrows rather than apples to apples. This subjectivity can leave readers perplexed and uncertain about the author’s intended message. In contrast, this article aims to provide an objective analysis of the strengths and weaknesses of both data models, rather than acting as promotional material for any tool.

Quick recap of the data models

Both RDF and LPG are descendants of the graph data model, although they have different structures and characteristics. A graph comprises vertices (nodes) and edges, each of which connects two vertices. Various graph types exist, including undirected graphs, directed graphs, multigraphs, hypergraphs and so on. The RDF and LPG data models both adopt the directed multigraph approach: every edge has a “from” and a “to” vertex, and the same pair of vertices can be joined by an arbitrary number of distinct edges.

The RDF data model represents data as a set of triples that mirror the subject-verb-object structure of natural language, with the three parts called subject, predicate, and object. Consider the simple example “Jeremy was born in Birkirkara”. This sentence can be represented as an RDF statement, or fact, in which Jeremy is the subject resource, born in is the predicate (relation), and Birkirkara is the object. The object can be either a URI (Uniform Resource Identifier) or a literal datatype value (such as an integer or a string). If the object is a URI, also known as a resource, it can lead to further facts, such as Birkirkara townIn Malta. This data model allows resources to be reused and interlinked within the same RDF-based graph, or in any other RDF graph, internal or external. Once a resource is defined and its URI is “minted”, the URI becomes instantly available and can be used in any context deemed necessary.
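To make the triple structure concrete, here is a minimal Python sketch (not an RDF library) that stores the statements above as (subject, predicate, object) tuples and follows the reused Birkirkara URI from one fact to the next. The http://example.org/ URIs are hypothetical placeholders, and plain strings stand in for both URIs and literals, a simplification that real RDF toolkits avoid.

```python
# Hypothetical namespace; real data would use its own minted URIs.
EX = "http://example.org/"

Jeremy = EX + "Jeremy"
Birkirkara = EX + "Birkirkara"
Malta = EX + "Malta"

# Each fact is a (subject, predicate, object) tuple.
triples = {
    (Jeremy, EX + "bornIn", Birkirkara),  # object is a resource (URI)
    (Jeremy, EX + "name", "Jeremy"),      # object is a literal value
    (Birkirkara, EX + "townIn", Malta),   # the same URI reused as a subject
}


def objects(subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


# Follow the link from Jeremy to his birthplace, then on to its country.
birthplaces = objects(Jeremy, EX + "bornIn")
countries = {c for b in birthplaces for c in objects(b, EX + "townIn")}
print(countries)  # {'http://example.org/Malta'}
```

Because Birkirkara is identified by the same URI in both facts, no join key has to be agreed in advance: the URI itself is the link.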

On the other hand, the LPG data model consists of a set of vertices, a set of edges, label assignment functions for vertices and edges, and key-value property assignment functions for vertices and edges. The previous example would be represented as follows:


CREATE (person:Person {name: "Jeremy"}),
       (city:City {name: "Birkirkara"}),
       (person)-[:BORN_IN]->(city)

Consequently, the primary distinction between RDF and LPG lies in how nodes are connected. In the RDF model, relationships are triples in which the predicate defines the connection. In the LPG data model, edges are first-class citizens with their own properties. Therefore, in the RDF data model, predicates are globally defined in a schema and reused across data graphs, whilst in the LPG data model, each edge is uniquely identified.
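The contrast between a globally shared predicate and individually identified edges can be sketched in plain Python. This is an illustrative model under stated assumptions, not how any particular store lays out its data; the URIs, the second person (Maria) and the recorded_by property are hypothetical.

```python
import itertools
from dataclasses import dataclass, field

# RDF side: one globally defined predicate URI, reused by every triple.
# All URIs here are hypothetical placeholders.
BORN_IN = "http://example.org/bornIn"
rdf_triples = [
    ("http://example.org/Jeremy", BORN_IN, "http://example.org/Birkirkara"),
    ("http://example.org/Maria", BORN_IN, "http://example.org/Birkirkara"),
]

# LPG side: each edge is an object in its own right, with an identity,
# a label and its own key-value property map.
_edge_ids = itertools.count(1)


@dataclass
class Edge:
    source: str
    label: str
    target: str
    properties: dict = field(default_factory=dict)
    edge_id: int = field(default_factory=lambda: next(_edge_ids))


lpg_edges = [
    Edge("Jeremy", "BORN_IN", "Birkirkara", {"recorded_by": "civil registry"}),
    Edge("Maria", "BORN_IN", "Birkirkara"),
]

# Both RDF statements share exactly the same predicate value...
print(len({p for _, p, _ in rdf_triples}))  # 1
# ...whereas every LPG edge carries its own identifier and properties.
print([(e.edge_id, e.properties) for e in lpg_edges])
```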

Schema vs Schema-less. Do semantics matter at all?

Semantics is a branch of linguistics and logic concerned with meaning (in this case, the meaning of data), enabling both humans and machines to interpret the context of the data and the relationships within that context.

Historically, the World Wide Web Consortium (W3C) established the Resource Description Framework (RDF) data model as a standardised framework for data exchange on the Web. RDF facilitates seamless data integration and the merging of diverse sources, while supporting schema evolution without requiring changes to data consumers. Schemas¹, or ontologies, serve as the foundation for data represented in RDF, and through these ontologies the semantic meaning of the data can be defined. This capability makes data integration one of the many applications for which the RDF data model is well suited. Through various W3C groups, standards were established for defining schemas and ontologies, primarily RDF Schema (RDFS), the Web Ontology Language (OWL), and, more recently, SHACL. RDFS provides the low-level constructs for defining ontologies, such as a Person class with the properties name, gender and knows, together with the expected type of node for each property. OWL provides constructs and mechanisms for formally defining ontologies through axioms and rules, enabling the inference of implicit data. Whilst OWL axioms are treated as part of the knowledge graph and used to infer additional facts, SHACL was introduced to validate the knowledge graph against constraints, known as shapes (think of a shape as answering “what should a Person consist of?”). Moreover, additional features of the SHACL specification allow rules and inference axioms to be defined using SHACL as well.
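As an illustration of the kind of inference RDFS constructs enable, the sketch below implements two RDFS entailment rules (rdfs2 for rdfs:domain, rdfs3 for rdfs:range) over a tiny hypothetical graph in which ex:knows has Person as its domain and range. Prefixed names stand in for full URIs; a real triple store would apply such rules internally rather than through code like this.

```python
# Two RDFS entailment rules:
#   rdfs2: (p rdfs:domain C) and (x p y)  =>  (x rdf:type C)
#   rdfs3: (p rdfs:range  C) and (x p y)  =>  (y rdf:type C)
# Prefixed names stand in for full URIs; "ex:" terms are hypothetical.
RDF_TYPE, DOMAIN, RANGE = "rdf:type", "rdfs:domain", "rdfs:range"

schema = {
    ("ex:knows", DOMAIN, "ex:Person"),
    ("ex:knows", RANGE, "ex:Person"),
}
data = {("ex:Giulia", "ex:knows", "ex:Matteo")}


def rdfs_domain_range(graph):
    """Return the rdf:type triples entailed by rdfs:domain and rdfs:range."""
    inferred = set()
    for prop, kind, cls in graph:
        if kind not in (DOMAIN, RANGE):
            continue  # only schema triples trigger these rules
        for s, p, o in graph:
            if p == prop:
                # domain types the subject, range types the object
                inferred.add((s if kind == DOMAIN else o, RDF_TYPE, cls))
    return inferred


print(sorted(rdfs_domain_range(schema | data)))
# [('ex:Giulia', 'rdf:type', 'ex:Person'), ('ex:Matteo', 'rdf:type', 'ex:Person')]
```

Note that neither type triple was asserted: both follow from the schema alone.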

In summary, schemas enforce correct instance data. This is necessary because RDF permits any value to appear in a fact, provided it adheres to the RDF specifications; validators, such as built-in SHACL engines or OWL constructs, are responsible for verifying the data’s integrity. Given that these validators are standardised, all triple stores (databases adhering to the RDF data model) are encouraged to implement them. However, this does not negate flexibility. The RDF data model is designed to accommodate the growth, extension, and evolution of data within the schema’s boundaries. So, while the RDF data model strongly encourages the use of schemas (or ontologies) as its foundation, experts discourage the creation of ivory tower ontologies. Building a good ontology does require upfront effort and collaboration with domain experts so that it accurately reflects the use case and the data that will be stored in the knowledge graph. Nonetheless, the RDF data model offers the flexibility to create and define RDF-based data independently of a pre-existing ontology, or to develop an ontology iteratively throughout a data project. Furthermore, schemas are designed for reuse, and the RDF data model facilitates this reusability. It is noteworthy that an RDF-based knowledge graph typically encompasses both instance data (such as “Giulia and Matteo are siblings”) and ontology/schema axioms (such as “Two people are siblings when they have a parent in common”).
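The validation idea behind SHACL (“what should a Person consist of?”) can be approximated in a few lines. The sketch below is hypothetical plain Python rather than a SHACL engine: the shape dictionary only loosely mirrors SHACL's minCount, maxCount and datatype constraints, and the graph and names are placeholders.

```python
# A hypothetical "PersonShape": which properties a Person must carry.
# Loosely mirrors SHACL min/max counts and datatypes; not a SHACL engine.
person_shape = {
    "ex:name":   {"min_count": 1, "max_count": 1, "datatype": str},
    "ex:gender": {"min_count": 0, "max_count": 1, "datatype": str},
}

graph = {
    ("ex:Giulia", "rdf:type", "ex:Person"),
    ("ex:Giulia", "ex:name", "Giulia"),
    ("ex:Matteo", "rdf:type", "ex:Person"),  # no ex:name: should be reported
}


def validate(graph, target_class, shape):
    """Return (focus node, property, message) for each violated constraint."""
    report = []
    # Focus nodes are all instances of the target class.
    focus_nodes = {s for s, p, o in graph if p == "rdf:type" and o == target_class}
    for node in sorted(focus_nodes):
        for prop, rule in shape.items():
            values = [o for s, p, o in graph if s == node and p == prop]
            if len(values) < rule["min_count"]:
                report.append((node, prop, "fewer values than min_count"))
            if len(values) > rule["max_count"]:
                report.append((node, prop, "more values than max_count"))
            for v in values:
                if not isinstance(v, rule["datatype"]):
                    report.append((node, prop, "unexpected datatype"))
    return report


print(validate(graph, "ex:Person", person_shape))
# [('ex:Matteo', 'ex:name', 'fewer values than min_count')]
```

The data itself is accepted as written; the shape only reports what does not conform, which matches the text's point that validation sits alongside, not inside, the flexible data model.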

Beyond providing a data structure, ontologies also impart semantic meaning to the data. For instance, when constructing a family tree, an ontology allows relationships such as aunt, uncle, cousin, niece, nephew, ancestor, and descendant to be defined once and derived automatically, without those facts being stored explicitly in the knowledge graph. Consider how this concept could be applied in various pharmaceutical scenarios, to mention just one vertical domain. Reasoning is a fundamental component that makes the RDF data model a semantically powerful model for designing knowledge graphs. Ontologies give a particular data point all the necessary context, including its neighbourhood and its meaning. For instance, given a literal node with the value 37, an RDF-based agent can understand that 37 represents the age of a person named Jeremy, who is the nephew of a person named Peter.

In contrast, the LPG data model offers a more agile and straightforward deployment of graph data. LPGs place less focus on schemas (they support only some constraints and “labels”/classes). Graph databases adhering to the LPG data model are known for how quickly data can be prepared for consumption, thanks to their schema-less nature, which makes them a suitable choice for data architects who prioritise rapid deployment. The LPG data model is particularly advantageous where data is not expected to grow or change significantly: a modification to a property, for instance, would require refactoring the graph to update nodes with the newly added or updated key-value property. While node and edge labels, and the functions that assign them, may give the impression of semantics, LPG does not inherently provide any; an LPG property function simply returns a map of values associated with a node or edge. Nonetheless, this design is valuable for use cases that must run fast graph algorithms, as the data is available directly on the nodes and edges and no further graph traversal is needed.

A key strength of the LPG data model, however, is the ease and flexibility with which granular attributes or properties can be attached to either vertices or edges. For instance, given two person nodes, “Alice” and “Bob”, connected by an edge labelled “marriedTo”, the LPG data model can easily and accurately state that Alice and Bob were married on February 29, 2024. The RDF data model can achieve this through various workarounds, such as reification, but the resulting queries are more complex than their LPG counterparts.
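The Alice and Bob example can be sketched both ways to show why the RDF workaround leads to more complex queries. The RDF side uses standard reification (rdf:Statement, rdf:subject, rdf:predicate, rdf:object); the statement node ex:stmt1, the ex:marriedOn property and the dictionary layout of the LPG edge are hypothetical choices for illustration.

```python
import datetime

MARRIED_ON = datetime.date(2024, 2, 29)

# LPG: the date is simply a property on the edge itself.
lpg_edge = {
    "from": "Alice", "label": "marriedTo", "to": "Bob",
    "properties": {"since": MARRIED_ON},
}

# RDF with standard reification: an extra node (hypothetical ex:stmt1)
# describes the triple, and the date is attached to that node instead.
rdf_triples = {
    ("ex:Alice", "ex:marriedTo", "ex:Bob"),
    ("ex:stmt1", "rdf:type", "rdf:Statement"),
    ("ex:stmt1", "rdf:subject", "ex:Alice"),
    ("ex:stmt1", "rdf:predicate", "ex:marriedTo"),
    ("ex:stmt1", "rdf:object", "ex:Bob"),
    ("ex:stmt1", "ex:marriedOn", MARRIED_ON),
}


def marriage_date_rdf(graph, a, b):
    """Locate the statement node describing (a marriedTo b), then read its date."""
    for stmt, p, o in graph:
        if (p == "rdf:subject" and o == a
                and (stmt, "rdf:predicate", "ex:marriedTo") in graph
                and (stmt, "rdf:object", b) in graph):
            return next((v for s, q, v in graph
                         if s == stmt and q == "ex:marriedOn"), None)
    return None


# One dictionary lookup on the LPG side versus a multi-step pattern match in RDF.
print(lpg_edge["properties"]["since"])                       # 2024-02-29
print(marriage_date_rdf(rdf_triples, "ex:Alice", "ex:Bob"))  # 2024-02-29
```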

Standards, Standardisation Bodies, Interoperability.

In the previous section we described how the W3C hosts standardisation groups for the RDF data model. For instance, a W3C working group is actively developing the RDF* (RDF-star) standard, which brings the complex relationship concept (attaching attributes to facts/triples) into the RDF data model. This standard is expected to be adopted and supported by all triple stores, tools and agents based on the RDF data model. However, standardisation can be a protracted process, and the resulting delays can leave triple store vendors at a disadvantage.
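The core RDF-star idea, a triple used directly as the subject of another triple, can be mimicked with nested Python tuples. This is only a conceptual sketch with hypothetical ex: names; it does not reproduce the syntax or exact semantics of the specification under development.

```python
# The base fact, as an ordinary (subject, predicate, object) triple.
base = ("ex:Alice", "ex:marriedTo", "ex:Bob")

graph = {
    base,
    # The whole triple is the subject here: an annotation on the fact itself,
    # with no separate statement node as in classic reification.
    (base, "ex:marriedOn", "2024-02-29"),
}

# Everything said about the base fact, keyed by predicate.
annotations = {p: o for s, p, o in graph if s == base}
print(annotations)  # {'ex:marriedOn': '2024-02-29'}
```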

Nonetheless, standards provide much-needed interoperability. Knowledge graphs built on the RDF data model can easily be ported between applications and triple stores, because standard serialisation formats are available and there is no vendor lock-in. Similarly, they can be queried with a single standard query language, SPARQL, which is used by the different vendors. Although the query language is the same, vendors adopt different query execution plans to improve performance and speed, just as any database engine (SQL or NoSQL) does.
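Part of that portability comes from standard serialisations. The sketch below writes triples in N-Triples, a W3C format in which each line holds a subject, a predicate and an object followed by a full stop. It is a simplified, hypothetical writer: it handles only IRIs and plain string literals, and escapes only backslashes and double quotes.

```python
def term(value, is_iri):
    """Render one RDF term: IRIs in angle brackets, literals in quotes."""
    if is_iri:
        return f"<{value}>"
    # Minimal escaping for brevity: only backslash and double quote.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialise (s, p, o, object_is_iri) tuples, one statement per line."""
    lines = []
    for s, p, o, o_is_iri in triples:
        lines.append(f"{term(s, True)} {term(p, True)} {term(o, o_is_iri)} .")
    return "\n".join(lines)


# Hypothetical example.org IRIs.
data = [
    ("http://example.org/Jeremy", "http://example.org/bornIn",
     "http://example.org/Birkirkara", True),
    ("http://example.org/Jeremy", "http://example.org/name", "Jeremy", False),
]
print(to_ntriples(data))
```

Any compliant triple store can load output in this line-based format, which is what makes moving a graph between vendors straightforward.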

Most LPG implementations, even open-source ones, use proprietary or custom languages for storing and querying data rather than adhering to a standard. This practice reduces the interoperability and portability of data between vendors. However, in recent months, ISO approved and published ISO/IEC 39075:2024, which standardises the Graph Query Language (GQL) based on Cypher. As the charter rightly points out, the graph data model has unique advantages over relational databases, such as being a natural fit for data with hierarchical, complex or arbitrary structures. Nevertheless, the proliferation of vendor-specific implementations has overlooked a crucial capability: a standardised approach to querying property graphs. It is therefore paramount that property graph vendors align their products with this standard.

Recently, OneGraph² was proposed as an interoperable metamodel intended to remove the need to choose between the RDF and LPG data models. Furthermore, extensions to openCypher have been proposed³ so that openCypher queries can also be run over RDF data. This vision aims to pave the way for combining RDF and LPG data in a single, integrated database, bringing together the benefits of both data models.

Other notable differences

There are notable differences, mostly in the query languages built to support each data model. However, we strongly argue against letting a set of query language features dictate which data model to use. Nonetheless, we discuss some of these differences here for a more complete overview.

The RDF data model offers a natural way of supporting globally unique identifiers in the form of URIs, which manifests in three distinct characteristics. First, within the RDF domain, the set of statements (s, p, o) that share the same subject URI describes a resource. Second, data stored in RDF graphs can conveniently be split into multiple named graphs, each encapsulating a distinct concern. For instance, with the RDF data model it is straightforward to store data or resources, metadata, and audit and provenance data in separate graphs, while still interlinking and querying seamlessly across them. Third, graphs can link to resources in graphs hosted on different servers, and querying these external resources is supported through SPARQL query federation. Thanks to its use of URIs, RDF embodies the original vision of Linked Data⁴, a vision that has since been adopted, to an extent, as a guiding principle in the FAIR principles⁵, Data Fabric, Data Mesh, and HATEOAS, amongst others. Consequently, the RDF data model is a versatile framework that integrates with these visions without any modifications.
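Named graphs can be pictured as quads, where each statement also records the graph it belongs to. The following sketch, with hypothetical graph and resource names, keeps data, provenance and audit facts in separate graphs and then joins across them on the shared resource URI, loosely analogous to a SPARQL query with GRAPH clauses.

```python
# Each statement is a quad: (graph, subject, predicate, object).
# Graph and resource names are hypothetical.
quads = {
    ("ex:dataGraph", "ex:Jeremy", "ex:bornIn", "ex:Birkirkara"),
    ("ex:provenanceGraph", "ex:Jeremy", "ex:source", "ex:CivilRegistry"),
    ("ex:auditGraph", "ex:Jeremy", "ex:lastModified", "2025-01-01"),
}


def match(graph=None, s=None, p=None, o=None):
    """Pattern match where None acts as a wildcard, like a SPARQL variable."""
    pattern = (graph, s, p, o)
    return [q for q in quads
            if all(want is None or want == got for want, got in zip(pattern, q))]


# Query one graph in isolation...
print(match(graph="ex:dataGraph"))
# ...or join across graphs on the shared resource URI.
birthplace = match(graph="ex:dataGraph", s="ex:Jeremy", p="ex:bornIn")[0][3]
source = match(graph="ex:provenanceGraph", s="ex:Jeremy", p="ex:source")[0][3]
print(birthplace, source)  # ex:Birkirkara ex:CivilRegistry
```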

LPGs, on the other hand, are better geared towards path traversal queries, graph analytics and variable-length path queries. Although these capabilities could be seen as features of a specific query language, they are pertinent considerations when modelling data in a graph, since they are also advantages over traditional relational databases. SPARQL, as a W3C recommendation, has limited support for path traversal⁶, and some vendor triple store implementations do support variable-length paths⁷, although not as part of the SPARQL 1.1 recommendation. At the time of writing, SPARQL 1.2 is not expected to incorporate this feature either.
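A variable-length path query asks for every path between a minimum and a maximum number of hops. The breadth-first sketch below shows the idea in plain Python over a hypothetical KNOWS graph; it forbids revisiting a node within a path, which is a simplification, since Cypher's own uniqueness rule applies to relationships rather than nodes.

```python
from collections import deque

# Hypothetical directed KNOWS edges, as an adjacency list.
edges = {
    "Alice": ["Bob"],
    "Bob": ["Carol", "Dave"],
    "Carol": ["Erin"],
    "Dave": [],
    "Erin": [],
}


def variable_length_paths(start, min_hops, max_hops):
    """Return every path from `start` whose length is within [min_hops, max_hops].

    Similar in spirit to a Cypher pattern such as (a)-[:KNOWS*2..3]->(b).
    """
    results = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        hops = len(path) - 1
        if min_hops <= hops <= max_hops:
            results.append(path)
        if hops == max_hops:
            continue  # do not expand beyond the upper bound
        for nxt in edges.get(path[-1], []):
            if nxt not in path:  # no node repeated within a path
                queue.append(path + [nxt])
    return results


print(variable_length_paths("Alice", 2, 3))
# [['Alice', 'Bob', 'Carol'], ['Alice', 'Bob', 'Dave'], ['Alice', 'Bob', 'Carol', 'Erin']]
```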

Data Graph Patterns

The following table describes various data graph patterns and how well each fits the two data models discussed in this article.

Pattern	RDF data model	LPG data model
Global Definition of relations/properties	Through schemas, properties are globally defined via semantic properties such as domains and ranges, algebraic properties such as inverse-of, reflexive and transitive, and informative annotations on property definitions.	The semantics of relations (edges) are not supported in property graphs.
Multiple Languages	String literals can carry a language tag, which is taken into account during processing.	Can be a custom field or relationship (e.g. label_en, label_mt), but receives no special treatment.
Taxonomy – Hierarchy	Automatic inference and reasoning, including over complex classes.	Can model hierarchies, but not hierarchies of classes of individuals; classification hierarchies require explicit traversal.
Individual Relationships	Requires workarounds such as reification, and more complex queries.	Can make direct assertions about them, with a natural representation and efficient querying.
Property Inheritance	Properties are inherited through defined class hierarchies; the RDF data model can also represent subproperties.	Must be handled in application logic.
N-ary Relations	Relationships in triples are generally binary, but N-ary relations can be modelled via blank nodes, additional resources, or reification.	Can often be translated into additional attributes on edges.
Property Constraints and Validation	Available through schema definitions: RDFS, OWL or SHACL.	Supports minimal constraints, such as value uniqueness, but generally requires validation through schema layers or application logic.
Context and Provenance	Can be captured in various ways, including a separate named graph linked to the main resources, or reification.	Can add properties to nodes and edges to capture context and provenance.
Inferencing	Automatic inference of inverse relationships, transitive patterns, complex property chains, disjointness and negation.	Requires explicit definition in application logic, or is not supported at all (disjointness and negation).
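To illustrate the Multiple Languages row, the sketch below models a literal as a value plus an optional language tag and selects a label by language. The graph is hypothetical, and the case-insensitive equality check is a simplification of SPARQL's langMatches function, which performs full language-range matching.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """An RDF-style literal: a lexical form plus an optional language tag."""
    value: str
    lang: Optional[str] = None


# Hypothetical graph: one label per language on the same resource.
graph = {
    ("ex:Person", "rdfs:label", Literal("Person", "en")),
    ("ex:Person", "rdfs:label", Literal("Persona", "it")),
    ("ex:Person", "ex:code", Literal("P-001")),  # plain literal, no tag
}


def label(subject, lang):
    """Pick the label whose language tag matches `lang` (case-insensitive)."""
    for s, p, o in graph:
        if (s == subject and p == "rdfs:label"
                and o.lang is not None and o.lang.lower() == lang.lower()):
            return o.value
    return None


print(label("ex:Person", "it"))  # Persona
print(label("ex:Person", "EN"))  # Person
```

Because the tag belongs to the literal itself, one property serves every language; the LPG alternative of label_en, label_mt and so on needs a new key per language.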

Semantics in Graphs — A Family Tree Example

A comprehensive exploration of applying the RDF data model and semantics within an LPG application can be found in various articles published on Medium, LinkedIn, and other blogs. As outlined in the previous section, the LPG data model is not specifically designed for reasoning. Reasoning involves applying logical rules to existing facts to deduce new knowledge; this matters because it uncovers hidden relationships that were never stated explicitly.

In this section we will demonstrate how axioms are defined for a simple yet practical example of a family tree. A family tree is an ideal candidate for any graph database due to its hierarchical structure and its flexibility in being defined within any data model. For this demonstration, we will model the Pewterschmidt family, which is a fictional family from the popular animated television series Family Guy.

All images, unless otherwise noted, are by the author.

In this case, we are just creating one relationship called ‘hasChild’. So, Carter has a child named Lois, and so on. The only other attribute we’re adding is the gender (Male/Female). For the RDF data model, we have created a simple OWL ontology:


The current schema enables us to represent the family tree in the RDF data model. With ontologies, we can now define additional properties whose data can be deduced from the initial data. We introduce the following properties:

Property	Comment	Axiom	Example
isAncestorOf	A transitive property, and the inverse of the isDescendentOf property. OWL engines infer transitive relationships automatically, without the need for rules.	hasChild(?x, ?y) -> isAncestorOf(?x, ?y)	Carter – isAncestorOf -> Lois – isAncestorOf -> Chris, hence Carter – isAncestorOf -> Chris
isDescendentOf	A transitive property, the inverse of isAncestorOf. OWL engines infer inverse properties automatically, without the need for rules.	(none)	Chris – isDescendentOf -> Peter
isBrotherOf	A subproperty of isSiblingOf and disjoint with isSisterOf, meaning that a person cannot be both the brother and the sister of another person at the same time; nor can anyone be their own brother.	hasChild(?x, ?y), hasChild(?x, ?z), hasGender(?y, Male), notEqual(?y, ?z) -> isBrotherOf(?y, ?z)	Chris – isBrotherOf -> Meg
isSisterOf	A subproperty of isSiblingOf and disjoint with isBrotherOf, meaning that a person cannot be both the brother and the sister of another person at the same time; nor can anyone be their own sister.	hasChild(?x, ?y), hasChild(?x, ?z), hasGender(?y, Female), notEqual(?y, ?z) -> isSisterOf(?y, ?z)	Meg – isSisterOf -> Chris
isSiblingOf	A super-property of isBrotherOf and isSisterOf. OWL engines infer super-properties automatically.	(none)	Chris – isSiblingOf -> Meg
isNephewOf	Relates a male child to the siblings of his parent (his aunts and uncles).	isSiblingOf(?x, ?y), hasChild(?x, ?z), hasGender(?z, Male), notEqual(?y, ?x) -> isNephewOf(?z, ?y)	Stewie – isNephewOf -> Carol
isNieceOf	Relates a female child to the siblings of her parent (her aunts and uncles).	isSiblingOf(?x, ?y), hasChild(?x, ?z), hasGender(?z, Female), notEqual(?y, ?x) -> isNieceOf(?z, ?y)	Meg – isNieceOf -> Carol
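The axioms in the table can be approximated with a small forward-chaining loop that applies every rule until no new fact appears. This is a hedged sketch, not the author's ontology or any triple store's reasoner: the family data is a hypothetical subset consistent with the examples above (the full dataset appears only in the author's diagram), and disjointness between isBrotherOf and isSisterOf is not checked.

```python
# Hypothetical subset of the family, consistent with the examples in the table.
HAS_CHILD = {("Carter", "Lois"), ("Carter", "Carol"),
             ("Peter", "Chris"), ("Peter", "Meg"), ("Peter", "Stewie"),
             ("Lois", "Chris"), ("Lois", "Meg"), ("Lois", "Stewie")}
GENDER = {"Carter": "Male", "Peter": "Male", "Lois": "Female", "Carol": "Female",
          "Chris": "Male", "Meg": "Female", "Stewie": "Male"}

SIBLING_BY_GENDER = {"Male": "isBrotherOf", "Female": "isSisterOf"}
CHILD_BY_GENDER = {"Male": "isNephewOf", "Female": "isNieceOf"}


def pairs(facts, name):
    """All (x, y) such that name(x, y) is currently in the fact set."""
    return {(x, y) for rel, x, y in facts if rel == name}


def infer(has_child, gender):
    """Apply the axioms repeatedly until no new fact can be derived (a fixpoint)."""
    facts = {("hasChild", x, y) for x, y in has_child}
    while True:
        new = set()
        children = pairs(facts, "hasChild")
        ancestors = pairs(facts, "isAncestorOf")
        # hasChild(?x, ?y) -> isAncestorOf(?x, ?y)
        new |= {("isAncestorOf", x, y) for x, y in children}
        # isAncestorOf is transitive
        new |= {("isAncestorOf", x, z)
                for x, y in ancestors for y2, z in ancestors if y == y2}
        # isDescendentOf is the inverse of isAncestorOf
        new |= {("isDescendentOf", y, x) for x, y in ancestors}
        # isBrotherOf / isSisterOf: same parent, different person, by ?y's gender
        for x, y in children:
            for x2, z in children:
                rel = SIBLING_BY_GENDER.get(gender.get(y))
                if x == x2 and y != z and rel:
                    new.add((rel, y, z))
        # both are subproperties of isSiblingOf
        new |= {("isSiblingOf", x, y) for rel, x, y in facts
                if rel in ("isBrotherOf", "isSisterOf")}
        # isNephewOf / isNieceOf: child of a sibling, by the child's gender
        for x, y in pairs(facts, "isSiblingOf"):
            for x2, z in children:
                rel = CHILD_BY_GENDER.get(gender.get(z))
                if x == x2 and y != x and rel:
                    new.add((rel, z, y))
        if new <= facts:  # nothing new: fixpoint reached
            return facts
        facts |= new


inferred = infer(HAS_CHILD, GENDER)
for fact in sorted(f for f in inferred if f[1] == "Chris"):
    print(fact)
```

Iterating to a fixpoint is what lets facts such as Carter isAncestorOf Chris, or Stewie isNephewOf Carol, emerge: each depends on facts derived in an earlier round.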

These axioms are imported into a triple store, whose engine applies them to the explicit facts in real time. Through these axioms, triple stores allow inferred (hidden) triples to be queried. Therefore, to retrieve the explicit information about Chris Griffin, the following query can be executed:

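PREFIX : <http://example.org/family#>   # hypothetical namespace
SELECT ?p ?o WHERE {
  :Chris ?p ?o EXPLICIT true   # EXPLICIT is a vendor-specific extension, not SPARQL 1.1
}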

To retrieve the inferred values for Chris instead, the SPARQL engine provides 10 inferred facts:

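PREFIX : <http://example.org/family#>   # hypothetical namespace
SELECT ?p ?o WHERE {
  :Chris ?p ?o EXPLICIT false   # EXPLICIT is a vendor-specific extension, not SPARQL 1.1
}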

This query will return all implicit facts for Chris Griffin. The image below shows the discovered facts. These are not explicitly stored in the triple store.

A property graph store could not produce these results on its own, as it applies no reasoning automatically.

The RDF data model empowers users to discover previously unknown facts, a capability the LPG data model lacks. LPG implementations can work around this limitation by developing complex stored procedures. However, unlike RDF reasoning, such stored procedures may differ across vendor implementations (where they are possible at all), making them non-portable and impractical.
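For comparison, the LPG-side workaround has to spell the rule out as traversal logic. In this hypothetical sketch, plain Python stands in for a stored procedure or application code: the nephew/niece rule is hard-coded, so every further relationship (cousin, great-aunt and so on) would need its own function, and the graph is a hypothetical subset of the family.

```python
# Hypothetical LPG-style data: node properties plus hasChild edges.
nodes = {
    "Carter": {"gender": "Male"}, "Lois": {"gender": "Female"},
    "Carol": {"gender": "Female"}, "Chris": {"gender": "Male"},
    "Meg": {"gender": "Female"}, "Stewie": {"gender": "Male"},
}
has_child = [("Carter", "Lois"), ("Carter", "Carol"),
             ("Lois", "Chris"), ("Lois", "Meg"), ("Lois", "Stewie")]


def parents_of(person):
    return {p for p, c in has_child if c == person}


def children_of(person):
    return {c for p, c in has_child if p == person}


def nephews_and_nieces(person):
    """Explicit traversal: person -> parents -> their other children -> children."""
    siblings = {s for p in parents_of(person) for s in children_of(p)} - {person}
    result = {"nephews": set(), "nieces": set()}
    for sibling in siblings:
        for child in children_of(sibling):
            key = "nephews" if nodes[child]["gender"] == "Male" else "nieces"
            result[key].add(child)
    return result


print(nephews_and_nieces("Carol"))
```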

Take-home message

In this article, the RDF and LPG data models have been presented objectively. On the one hand, the LPG data model offers rapid deployment of graph databases without the need to define an advanced schema (i.e. it is schema-less). On the other hand, the RDF data model requires a more time-consuming bootstrapping process for graph data, or a knowledge graph, because of its schema definition requirement. The decision to adopt one model over the other should therefore consider whether the additional effort is justified by the meaningful context it gives the data, which depends on the specific use case. For instance, in social networks where neighbourhood exploration is a primary requirement, the LPG data model may be more suitable. Conversely, for more advanced knowledge graphs that require reasoning or data integration across multiple sources, the RDF data model is the preferred choice.

It is crucial to avoid letting personal preferences for query languages dictate the choice of data model. Regrettably, many available articles serve primarily as marketing tools rather than educational resources, hindering adoption and creating confusion within the graph database community. Furthermore, in an era of abundant and accessible information, vendors would do better to refrain from spreading misinformation about competing data models. A common misconception promoted by property graph evangelists is that the RDF data model is overly complex and academic, which leads to its dismissal. This assertion rests on preference rather than evidence. RDF is a data model that is both machine and human readable and close to business language, especially through the definition of schemas and ontologies. Moreover, RDF is widely adopted: for instance, Google uses the RDF data model, via schema.org, as its standard for representing meta-information about web pages. Another assumption is that the RDF data model works only with a schema. This, too, is a misconception: data defined using the RDF data model can also be schema-less, although all semantics would then be lost and the data reduced to plain graph data. As discussed in this article, the OneGraph vision aims to bridge the two data models.

To conclude, technical feasibility alone should not drive the choice of graph data model. Reducing higher-level abstractions to primitive constructs often increases complexity and can impede solving specific use cases effectively. Decisions should be guided by use case requirements and performance considerations rather than merely by what is technically possible.

The author would like to thank Matteo Casu for his input and review. This article is dedicated to Norm Friend, whose untimely demise left a void in the Knowledge Graph community.

¹ Schemas and ontologies are used interchangeably in this article.
² Lassila, O. et al. The OneGraph Vision: Challenges of Breaking the Graph Model Lock-In. https://www.semantic-web-journal.net/system/files/swj3273.pdf.
³ Broekema, W. et al. openCypher Queries over Combined RDF and LPG Data in Amazon Neptune. https://ceur-ws.org/Vol-3828/paper44.pdf.
⁴ https://www.w3.org/DesignIssues/LinkedData.html
⁵ https://www.go-fair.org/fair-principles
