A Coding Implementation for Creating, Annotating, and Visualizing Complex Biological Knowledge Graphs Using PyBEL

In this tutorial, we explore how to use the PyBEL ecosystem to construct and analyze biological knowledge graphs directly within Google Colab. We begin by installing the necessary packages, including PyBEL, NetworkX, Matplotlib, Seaborn, and Pandas. We then demonstrate how to define proteins, processes, and modifications using the PyBEL DSL. From there, we guide you through the creation of an Alzheimer’s disease-related pathway, showing how to encode causal relationships, protein–protein interactions, and phosphorylation events. Alongside graph construction, we introduce network analyses, including centrality measures, node classification, and subgraph extraction, as well as techniques for extracting citation and evidence data. By the end of the tutorial, you will have an annotated BEL graph ready for downstream visualization and enrichment analyses, which can serve as a foundation for interactive biological knowledge exploration.
!pip install pybel pybel-tools networkx matplotlib seaborn pandas -q
import pybel
import pybel.dsl as dsl
from pybel import BELGraph
from pybel.io import to_pickle, from_pickle
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from collections import Counter
import warnings
warnings.filterwarnings('ignore')
print("PyBEL Advanced Tutorial: Biological Expression Language Ecosystem")
print("=" * 65)
We begin by installing PyBEL and its companion libraries (pybel-tools, NetworkX, Matplotlib, Seaborn, and Pandas) directly in Colab so that everything needed for the analysis is available. We then import the core modules and suppress warnings to keep the notebook output focused on the results.
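Because the install command does not pin versions, it can help to record what was actually installed before running the rest of the notebook. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is our own, and the package names are simply those from the install command.

```python
from importlib import metadata

# Package names as published on PyPI, taken from the install command.
PACKAGES = ["pybel", "pybel-tools", "networkx", "matplotlib", "seaborn", "pandas"]

def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

versions = installed_versions(PACKAGES)
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")
```

Saving this output alongside the notebook makes it easier to reproduce results later if a library changes its behavior between releases.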
print("\n1. Building a Biological Knowledge Graph")
print("-" * 40)
graph = BELGraph(
    name="Alzheimer's Disease Pathway",
    version="1.0.0",
    description="Example pathway showing protein interactions in AD",
    authors="PyBEL Tutorial"
)
app = dsl.Protein(name="APP", namespace="HGNC")
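# Abeta is modeled as a Protein here for simplicity; CHEBI is a chemical namespace,
# so dsl.Abundance may be the more conventional choice for this entity.
abeta = dsl.Protein(name="Abeta", namespace="CHEBI")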
tau = dsl.Protein(name="MAPT", namespace="HGNC")
gsk3b = dsl.Protein(name="GSK3B", namespace="HGNC")
inflammation = dsl.BiologicalProcess(name="inflammatory response", namespace="GO")
apoptosis = dsl.BiologicalProcess(name="apoptotic process", namespace="GO")
graph.add_increases(app, abeta, citation="PMID:12345678", evidence="APP cleavage produces Abeta")
graph.add_increases(abeta, inflammation, citation="PMID:87654321", evidence="Abeta triggers neuroinflammation")
tau_phosphorylated = dsl.Protein(name="MAPT", namespace="HGNC",
                                 variants=[dsl.ProteinModification("Ph")])
graph.add_increases(gsk3b, tau_phosphorylated, citation="PMID:11111111", evidence="GSK3B phosphorylates tau")
graph.add_increases(tau_phosphorylated, apoptosis, citation="PMID:22222222", evidence="Hyperphosphorylated tau causes cell death")
graph.add_increases(inflammation, apoptosis, citation="PMID:33333333", evidence="Inflammation promotes apoptosis")
graph.add_association(abeta, tau, citation="PMID:44444444", evidence="Abeta and tau interact synergistically")
print(f"Created BEL graph with {graph.number_of_nodes()} nodes and {graph.number_of_edges()} edges")
We initialize a BELGraph with metadata for an Alzheimer’s disease pathway and define proteins and biological processes using the PyBEL DSL. By adding causal relationships, a phosphorylated variant of tau, and an association, we construct a small network that captures key molecular interactions, with every edge carrying a citation and an evidence string.
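To make the link between the DSL objects and BEL's textual notation more concrete, here is a rough sketch that renders the six statements above as BEL-style strings. It uses plain Python only; `bel_term` is a hypothetical helper of our own, not part of PyBEL, and PyBEL's own serialization may differ in details.

```python
def bel_term(func, namespace, name, pmod=None):
    """Render a BEL-style term, e.g. p(HGNC:APP) or bp(GO:"apoptotic process")."""
    label = f'"{name}"' if " " in name else name  # quote names that contain spaces
    body = f"{namespace}:{label}"
    if pmod is not None:
        body += f", pmod({pmod})"  # protein modification, e.g. phosphorylation
    return f"{func}({body})"

app = bel_term("p", "HGNC", "APP")
abeta = bel_term("p", "CHEBI", "Abeta")
tau = bel_term("p", "HGNC", "MAPT")
tau_ph = bel_term("p", "HGNC", "MAPT", pmod="Ph")
gsk3b = bel_term("p", "HGNC", "GSK3B")
inflammation = bel_term("bp", "GO", "inflammatory response")
apoptosis = bel_term("bp", "GO", "apoptotic process")

# (subject, relation, object) triples mirroring the edges added in the tutorial.
statements = [
    (app, "increases", abeta),
    (abeta, "increases", inflammation),
    (gsk3b, "increases", tau_ph),
    (tau_ph, "increases", apoptosis),
    (inflammation, "increases", apoptosis),
    (abeta, "association", tau),
]

bel_lines = [f"{subj} {rel} {obj}" for subj, rel, obj in statements]
for line in bel_lines:
    print(line)
```

Reading the statements in this form is a quick sanity check that each edge says what we intended before analyzing the graph.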
print("\n2. Advanced Network Analysis")
print("-" * 30)
degree_centrality = nx.degree_centrality(graph)
betweenness_centrality = nx.betweenness_centrality(graph)
closeness_centrality = nx.closeness_centrality(graph)
most_central = max(degree_centrality, key=degree_centrality.get)
print(f"Most connected node: {most_central}")
print(f"Degree centrality: {degree_centrality[most_central]:.3f}")
We compute degree, betweenness, and closeness centralities to quantify each node’s importance within the graph. The code prints only the node with the highest degree centrality, a candidate hub that may drive disease mechanisms; the betweenness and closeness scores are computed but not printed here.
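To show what degree centrality measures, the sketch below recomputes it by hand: each node's total degree (incoming plus outgoing edges) divided by the number of other nodes. The edge list is a simplified stand-in for the tutorial's graph, with short labels of our own choosing; it ignores any extra edges PyBEL may add automatically, so the numbers need not match the notebook output.

```python
from collections import Counter

# Simplified edge list mirroring the six statements added in the tutorial.
# Node labels are illustrative shorthand, not PyBEL node objects.
EDGES = [
    ("APP", "Abeta"),
    ("Abeta", "inflammation"),
    ("GSK3B", "MAPT(Ph)"),
    ("MAPT(Ph)", "apoptosis"),
    ("inflammation", "apoptosis"),
    ("Abeta", "MAPT"),
]

def degree_centrality(edges):
    """Return degree / (n - 1) per node, counting incoming and outgoing edges."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    n = len(degree)
    return {node: count / (n - 1) for node, count in degree.items()}

centrality = degree_centrality(EDGES)
ranked = sorted(centrality.items(), key=lambda item: item[1], reverse=True)
for node, score in ranked:
    print(f"{node}: {score:.3f}")
```

In this simplified version, Abeta ranks first because it takes part in three of the six statements.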
print("\n3. Biological Entity Classification")
print("-" * 35)
node_types = Counter()
for node in graph.nodes():
    node_types[node.function] += 1
print("Node distribution:")
for func, count in node_types.items():
    print(f" {func}: {count}")
We classify each node by its function, such as Protein or BiologicalProcess, and tally their counts. This breakdown helps us understand the composition of our network at a glance.
print("\n4. Pathway Analysis")
print("-" * 20)
proteins = [node for node in graph.nodes() if node.function == 'Protein']
processes = [node for node in graph.nodes() if node.function == 'BiologicalProcess']
print(f"Proteins in pathway: {len(proteins)}")
print(f"Biological processes: {len(processes)}")
edge_types = Counter()
for u, v, data in graph.edges(data=True):
    edge_types[data.get('relation')] += 1
print("\nRelationship types:")
for rel, count in edge_types.items():
    print(f" {rel}: {count}")
We separate all proteins and processes to measure the pathway’s scope and complexity. Counting the different relationship types further reveals which interactions, like increases or associations, dominate our model.
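To see which relationship type dominates, raw counts can be turned into shares. The sketch below does this for the six statements added explicitly in the tutorial; the list of relations is typed in by hand as an assumption, and the actual graph may contain additional edges that PyBEL stores on its own.

```python
from collections import Counter

# Relations of the six statements added explicitly in the tutorial (assumed input).
relations = ["increases"] * 5 + ["association"]

counts = Counter(relations)
total = sum(counts.values())

# Share of each relation type among all counted edges.
shares = {rel: count / total for rel, count in counts.items()}

dominant, dominant_count = counts.most_common(1)[0]
print(f"Dominant relation: {dominant} ({dominant_count} of {total} edges)")
for rel, share in shares.items():
    print(f" {rel}: {share:.1%}")
```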
print("\n5. Literature Evidence Analysis")
print("-" * 32)
citations = []
evidences = []
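for _, _, data in graph.edges(data=True):
    if 'citation' in data:
        citations.append(data['citation'])
    if 'evidence' in data:
        evidences.append(data['evidence'])
# Citation records may be dict-like and therefore unhashable,
# so we deduplicate on their string representation.
unique_citations = {str(citation) for citation in citations}
print(f"Total citations: {len(citations)}")
print(f"Unique citations: {len(unique_citations)}")
print(f"Evidence statements: {len(evidences)}")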
We extract citation identifiers and evidence strings from each edge to evaluate our graph’s grounding in published research. Summarizing total and unique citations allows us to assess the breadth of supporting literature.
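Citation records attached to edges can be dict-like, and plain dictionaries cannot be placed in a Python set directly. One way to count unique sources is to reduce each record to a hashable key first. The sketch below works on hypothetical edge data; the `namespace` and `identifier` keys and the `citation_key` helper are assumptions for illustration, not a guaranteed PyBEL schema.

```python
# Hypothetical edge data: each entry mimics the attribute dict of one edge.
edge_data = [
    {"citation": {"namespace": "pubmed", "identifier": "12345678"},
     "evidence": "APP cleavage produces Abeta"},
    {"citation": {"namespace": "pubmed", "identifier": "87654321"},
     "evidence": "Abeta triggers neuroinflammation"},
    {"citation": {"namespace": "pubmed", "identifier": "12345678"},
     "evidence": "Illustrative second statement citing the same source"},
    {"evidence": "Illustrative edge without a citation"},
]

def citation_key(citation):
    """Reduce a dict-like citation to a hashable (namespace, identifier) tuple."""
    return (citation.get("namespace"), citation.get("identifier"))

keys = [citation_key(data["citation"]) for data in edge_data if "citation" in data]
unique_keys = set(keys)

print(f"Total citations: {len(keys)}")
print(f"Unique citations: {len(unique_keys)}")
```

Keying on the identifier rather than on the whole record also means that two edges citing the same paper are counted once even if other citation fields differ.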
print("\n6. Subgraph Analysis")
print("-" * 22)
inflammation_nodes = [inflammation]
inflammation_neighbors = list(graph.predecessors(inflammation)) + list(graph.successors(inflammation))
inflammation_subgraph = graph.subgraph(inflammation_nodes + inflammation_neighbors)
print(f"Inflammation subgraph: {inflammation_subgraph.number_of_nodes()} nodes, {inflammation_subgraph.number_of_edges()} edges")
We isolate the inflammation subgraph by collecting its direct neighbors, yielding a focused view of inflammatory crosstalk. This targeted subnetwork highlights how inflammation interfaces with other disease processes.
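The neighborhood extraction can be sketched without any graph library: collect the predecessors and successors of the node of interest, then keep every edge whose endpoints both fall inside that set. The edge list and labels below are a simplified stand-in for the tutorial's graph, and `neighborhood` is a helper name of our own.

```python
# Simplified edge list mirroring the six statements added in the tutorial.
EDGES = [
    ("APP", "Abeta"),
    ("Abeta", "inflammation"),
    ("GSK3B", "MAPT(Ph)"),
    ("MAPT(Ph)", "apoptosis"),
    ("inflammation", "apoptosis"),
    ("Abeta", "MAPT"),
]

def neighborhood(edges, center):
    """Return the center's direct neighbors plus the edges induced among them."""
    predecessors = {source for source, target in edges if target == center}
    successors = {target for source, target in edges if source == center}
    nodes = predecessors | successors | {center}
    induced = [(s, t) for s, t in edges if s in nodes and t in nodes]
    return nodes, induced

nodes, induced = neighborhood(EDGES, "inflammation")
print(f"Nodes: {sorted(nodes)}")
print(f"Edges: {induced}")
```

Note that an induced subgraph keeps edges between two neighbors as well, not only edges that touch the center node, which is why larger neighborhoods can contain more edges than the center's degree suggests.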
print("\n7. Advanced Graph Querying")
print("-" * 28)
try:
    paths = list(nx.all_simple_paths(graph, app, apoptosis, cutoff=3))
    print(f"Paths from APP to apoptosis: {len(paths)}")
    if paths:
        # The paths are not guaranteed to be ordered by length, so take the minimum
        print(f"Shortest path length: {min(len(p) for p in paths) - 1}")
except (nx.NetworkXNoPath, nx.NodeNotFound):
    print("No paths found between APP and apoptosis")
apoptosis_inducers = list(graph.predecessors(apoptosis))
print(f"Factors that increase apoptosis: {len(apoptosis_inducers)}")
We enumerate simple paths of up to three edges between APP and apoptosis to explore mechanistic routes and identify key intermediates. Listing all predecessors of apoptosis also shows us which factors may trigger cell death.
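The idea behind simple-path enumeration with a cutoff can be sketched as a depth-first search that never revisits a node and stops extending a path once it reaches the maximum number of edges. This is an illustrative plain-Python version on a simplified edge list, not NetworkX's implementation; `simple_paths` is a name of our own.

```python
# Simplified edge list mirroring the six statements added in the tutorial.
EDGES = [
    ("APP", "Abeta"),
    ("Abeta", "inflammation"),
    ("GSK3B", "MAPT(Ph)"),
    ("MAPT(Ph)", "apoptosis"),
    ("inflammation", "apoptosis"),
    ("Abeta", "MAPT"),
]

def simple_paths(edges, source, target, cutoff):
    """Enumerate paths without repeated nodes that use at most `cutoff` edges."""
    successors = {}
    for s, t in edges:
        successors.setdefault(s, []).append(t)

    paths = []

    def extend(path):
        node = path[-1]
        if node == target:
            paths.append(path)
            return
        if len(path) - 1 >= cutoff:  # the path already uses `cutoff` edges
            return
        for nxt in successors.get(node, []):
            if nxt not in path:  # keep the path simple
                extend(path + [nxt])

    extend([source])
    return paths

paths = simple_paths(EDGES, "APP", "apoptosis", cutoff=3)
print(f"Paths found: {paths}")
if paths:
    # Take the minimum explicitly rather than relying on enumeration order.
    shortest = min(len(path) for path in paths) - 1
    print(f"Shortest path length: {shortest}")
```

In this simplified graph the only route runs through Abeta and inflammation, so lowering the cutoff to 2 yields no paths at all.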
print("\n8. Data Export and Visualization")
print("-" * 35)
adj_matrix = nx.adjacency_matrix(graph)
node_labels = [str(node) for node in graph.nodes()]
plt.figure(figsize=(12, 8))
plt.subplot(2, 2, 1)
pos = nx.spring_layout(graph, k=2, iterations=50)
nx.draw(graph, pos, with_labels=False, node_color='lightblue',
        node_size=1000, font_size=8, font_weight='bold')
plt.title("BEL Network Graph")
plt.subplot(2, 2, 2)
centralities = list(degree_centrality.values())
plt.hist(centralities, bins=10, alpha=0.7, color='green')
plt.title("Degree Centrality Distribution")
plt.xlabel("Centrality")
plt.ylabel("Frequency")
plt.subplot(2, 2, 3)
functions = list(node_types.keys())
counts = list(node_types.values())
plt.pie(counts, labels=functions, autopct='%1.1f%%', startangle=90)
plt.title("Node Type Distribution")
plt.subplot(2, 2, 4)
relations = list(edge_types.keys())
rel_counts = list(edge_types.values())
plt.bar(relations, rel_counts, color='orange', alpha=0.7)
plt.title("Relationship Types")
plt.xlabel("Relation")
plt.ylabel("Count")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
We prepare an adjacency matrix and node labels for downstream use and generate a four-panel figure showing the network structure, the degree centrality distribution, node-type proportions, and edge-type counts. These visualizations summarize the BEL graph at a glance and support a deeper biological interpretation.
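For readers who want to see what the adjacency matrix contains, here is a small plain-Python sketch that builds one from a simplified edge list. Rows stand for source nodes and columns for target nodes, and each cell counts the edges between the pair. The labels and edge list are illustrative stand-ins, not the PyBEL graph itself.

```python
# Simplified edge list mirroring the six statements added in the tutorial.
EDGES = [
    ("APP", "Abeta"),
    ("Abeta", "inflammation"),
    ("GSK3B", "MAPT(Ph)"),
    ("MAPT(Ph)", "apoptosis"),
    ("inflammation", "apoptosis"),
    ("Abeta", "MAPT"),
]

# Fix a node order so that row and column indices are reproducible.
nodes = sorted({node for edge in EDGES for node in edge})
index = {node: i for i, node in enumerate(nodes)}

# matrix[i][j] counts edges from nodes[i] to nodes[j].
matrix = [[0] * len(nodes) for _ in nodes]
for source, target in EDGES:
    matrix[index[source]][index[target]] += 1

for label, row in zip(nodes, matrix):
    print(f"{label:>14} {row}")
```

Keeping the label list next to the matrix matters because the matrix alone carries no node names; the same ordering has to be reused when exporting or plotting it.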
In this tutorial, we have demonstrated how PyBEL can be used to model complex biological systems. We showed how to construct a curated, fully inspectable graph of Alzheimer’s disease interactions, perform network-level analyses to identify hub nodes, and extract biologically meaningful subgraphs for focused study. We also covered how to collect citation and evidence data from the graph and prepared data structures for visualization. As a next step, we encourage you to extend this framework to your own pathways by integrating additional omics data, running enrichment tests, or coupling the graph with machine-learning workflows.
The post A Coding Implementation for Creating, Annotating, and Visualizing Complex Biological Knowledge Graphs Using PyBEL appeared first on MarkTechPost.