Interlink and Merge Graphs
Learn how to merge multiple knowledge graphs, deduplicate common entities, and handle conflicting information.
The full code for this example is available on GitHub.
A powerful feature of knowledge graphs is the ability to merge information from different sources. This example builds two separate knowledge graphs and then merges them into a single, unified graph using the perseus_client.interlink() function.
What You'll Learn
- How to build multiple knowledge graphs from different text files.
- How to merge (interlink) these graphs to deduplicate common entities.
- How to handle conflicting information during the merge process.
- How to save the final, unified graph to Neo4j.
Example Use Case: Combining Information about a Person
Imagine you have two documents with slightly different information about the same people. person1.txt describes Alice as a "software engineer," while person2.txt says she is a "dentist." The interlink() function can identify that the "Alice" in both documents is the same entity and merge them, while providing options for handling the conflicting job titles.
Prerequisites
Before running this example, ensure you have followed the installation guide to set up your environment and obtain the necessary API keys.
Code Walkthrough
This walkthrough covers building, merging, and saving multiple graphs.
Build the Initial Graphs
First, we call perseus_client.build_graph() with a list of file paths. This generates a separate KnowledgeGraph object for each file. We also apply a shared ontology and common metadata to both.
import perseus_client

# Build one KnowledgeGraph per input file, sharing an ontology and metadata.
knowledge_graphs = perseus_client.build_graph(
    file_paths=["assets/person1.txt", "assets/person2.txt"],
    ontology_path="assets/ontology.ttl",
    metadata={"source": "interlink_graphs_example"},
)
Interlink the Knowledge Graphs
This is the key step. We pass our list of KnowledgeGraph objects to the perseus_client.interlink() function. This function analyzes the graphs, identifies entities with the same rdfs:label (like "Alice"), and merges them into a single, deduplicated graph.
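The matching idea can be pictured in plain Python. The sketch below is a simplified model, not perseus_client's implementation. It makes several assumptions: each graph is a hypothetical list of dicts, the "label" key stands in for rdfs:label, group_by_label is an illustrative helper, and the "Bob" entity is made-up data. The helper sorts entities from every graph into groups by label. Any group with more than one member is a candidate for merging.

```python
from collections import defaultdict


def group_by_label(
    graphs: list[list[dict[str, str]]],
) -> dict[str, list[dict[str, str]]]:
    """Toy model of the matching step: bucket entities from all graphs by label.

    Each graph is a hypothetical list of entity dicts; the "label" key stands
    in for rdfs:label. This is not perseus_client's data model.
    """
    groups = defaultdict(list)
    for graph in graphs:
        for entity in graph:
            # Entities with an identical label string land in the same bucket.
            groups[entity["label"]].append(entity)
    return dict(groups)


graph_1 = [
    {"label": "Alice", "hasJobTitle": "software engineer"},
    {"label": "Bob"},  # illustrative extra entity, not from the example files
]
graph_2 = [{"label": "Alice", "hasJobTitle": "dentist"}]

groups = group_by_label([graph_1, graph_2])
# Labels that appear more than once are merge candidates.
duplicates = sorted(label for label, members in groups.items() if len(members) > 1)
print(duplicates)  # ['Alice']
```

In this toy version, matching is an exact string comparison on the label. This section does not say whether the library normalizes labels (for example, case or whitespace).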
# Merge entities that share an rdfs:label into a single, deduplicated graph.
merged_kg = perseus_client.interlink(
    kbs=knowledge_graphs,
    merge_properties_on_conflict=True,
)
Save the Fused Graph
The resulting merged_kg object can be saved just like any other knowledge graph. Here, we save it to local TTL and CQL files, and also upload it to our Neo4j instance.
# Write the merged graph to local files, then upload it to Neo4j.
merged_kg.save_ttl("./output/merged_graph.ttl")
merged_kg.save_cql("./output/merged_graph.cql")
merged_kg.save_to_neo4j(strip_prefixes=True)
Advanced: Handling Conflicting Properties
What happens when two graphs have conflicting information, like Alice's job title? The interlink function has parameters to control this behavior.
- merge_properties_on_conflict=True (default): The merged entity keeps the property values from both sources. In our example, the final "Alice" node would have two hasJobTitle properties: "software engineer" and "dentist".
- immutable_properties=["hasJobTitle"]: A list of property names whose values must match for two entities to be merged. With this setting, interlink sees that the hasJobTitle values for the two "Alice" entities differ and does not merge them. The final graph therefore contains two separate Alice nodes.
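The two policies above can be modeled in a few lines. This is a hypothetical sketch of the documented behavior, not the library's code. It assumes entities are plain dicts that map property names to single values, and merge_same_label is an illustrative helper. It models only the default merge_properties_on_conflict=True behavior and the immutable_properties check, because this page does not describe what happens when merge_properties_on_conflict=False.

```python
from collections.abc import Iterable


def merge_same_label(
    entities: list[dict[str, str]],
    immutable_properties: Iterable[str] = (),
) -> list[dict[str, list[str]]]:
    """Toy model of merging entities that already share an rdfs:label.

    Each entity is a hypothetical dict of property name -> value. The result
    is a list of nodes whose property values are lists of distinct values.
    """
    # Collect the distinct values of each property, in first-seen order.
    combined: dict[str, list[str]] = {}
    for entity in entities:
        for prop, value in entity.items():
            values = combined.setdefault(prop, [])
            if value not in values:
                values.append(value)

    # A disagreement on any immutable property blocks the merge, so every
    # entity survives as its own node.
    if any(len(combined.get(prop, [])) > 1 for prop in immutable_properties):
        return [{prop: [value] for prop, value in e.items()} for e in entities]

    # Otherwise collapse the group into one node that keeps every value,
    # like merge_properties_on_conflict=True.
    return [combined]


alice_1 = {"label": "Alice", "hasJobTitle": "software engineer"}
alice_2 = {"label": "Alice", "hasJobTitle": "dentist"}

print(merge_same_label([alice_1, alice_2]))
# [{'label': ['Alice'], 'hasJobTitle': ['software engineer', 'dentist']}]
print(len(merge_same_label([alice_1, alice_2], immutable_properties=["hasJobTitle"])))
# 2
```

The sketch treats immutable_properties as a veto on the whole merge rather than on a single property. That matches the description above, where the two Alice entities stay as separate nodes.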
Here is how you would use immutable_properties:
# The two "Alice" entities will NOT be merged because their hasJobTitle values conflict.
merged_kg = perseus_client.interlink(
    kbs=knowledge_graphs,
    merge_properties_on_conflict=True,
    immutable_properties=["hasJobTitle"],
)