Example 2: SmCCNet + GNN Embeddings + Subject Representation
This tutorial illustrates how to:
Build: Create an adjacency matrix with SmCCNet.
Enhance Representation: Generate node embeddings using GNNEmbedding.
Integrate: Incorporate these embeddings into subject-level omics data using SubjectRepresentation.
Workflow:
Construct: A multi-omics network adjacency matrix using SmCCNet.
Generate: Node embeddings with a Graph Neural Network (GNN).
Integrate: These embeddings into subject-level omics data for enhanced representation.
Subject-level embeddings provide richer phenotypic and clinical context.
Step-by-Step Instructions:
Data Setup: Load omics data, phenotype data, and clinical data using DatasetLoader.
Network Construction (SmCCNet): Call auto_pysmccnet() to produce an adjacency matrix from multi-omics data.
Generate GNN Embeddings: Pass the adjacency matrix, omics data, and (optionally) clinical data to GNNEmbedding, then call .fit() and .embed() to generate node embeddings.
Subject Representation: Integrate these embeddings into omics data via SubjectRepresentation.
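For some intuition about what "generating node embeddings" from an adjacency matrix involves, the sketch below runs a single graph-convolution-style propagation step by hand on a toy three-node network. It is an illustration under simplifying assumptions (symmetric normalization with self-loops, no learned weights, no nonlinearity). The names propagate, toy_adjacency, and toy_features are hypothetical, and this is not a description of the architecture GNNEmbedding uses internally.

```python
import math


def propagate(adjacency, features):
    """One propagation step: H' = D^(-1/2) (A + I) D^(-1/2) H."""
    n = len(adjacency)
    # Add self-loops so each node keeps part of its own signal.
    a_hat = [
        [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    # Degree of each node in the self-looped graph.
    inv_sqrt_degree = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    dim = len(features[0])
    output = []
    for i in range(n):
        row = [0.0] * dim
        for j in range(n):
            # Normalized edge weight between node i and node j.
            weight = inv_sqrt_degree[i] * a_hat[i][j] * inv_sqrt_degree[j]
            for k in range(dim):
                row[k] += weight * features[j][k]
        output.append(row)
    return output


# Toy path graph: node 0 -- node 1 -- node 2 (illustrative values only).
toy_adjacency = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
toy_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_embeddings = propagate(toy_adjacency, toy_features)
```

After one step, each node's vector mixes its own features with those of its neighbors, which is why features that are strongly connected in the network tend to end up with similar embeddings.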
Below is a complete Python implementation:
import pandas as pd
from bioneuralnet.datasets import DatasetLoader
from bioneuralnet.network import auto_pysmccnet
from bioneuralnet.network_embedding import GNNEmbedding
from bioneuralnet.downstream_task import SubjectRepresentation
# 1) Data Setup
Example = DatasetLoader("example")
omics_genes = Example.data["X1"]
omics_proteins = Example.data["X2"]
phenotype = Example.data["Y"]
clinical = Example.data["clinical_data"]
# 2) Network Construction
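# Fixed seed for reproducible subsampling (the value itself is arbitrary)
SEED = 42
result = auto_pysmccnet(
    X=[omics_genes, omics_proteins],
    Y=phenotype,
    # DataType labels the omics layers in X, in the same order
    DataType=["genes", "mirna"],
    subSampNum=1000,
    seed=SEED,
    Kfold=3,
    BetweenShrinkage=5,
    CutHeight=1 - 0.1**10,
    summarization="NetSHy",
)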
global_network = result["AdjacencyMatrix"]
subnetworks = result["Subnetworks"]
# 3) Generate embeddings using GNNEmbedding
merged_omics = pd.concat([omics_genes, omics_proteins], axis=1)
gnn_embedding = GNNEmbedding(
    adjacency_matrix=global_network,
    omics_data=merged_omics,
    phenotype_data=phenotype,
    clinical_data=clinical,
    tune=True,
)
gnn_embedding.fit()
embeddings_output = gnn_embedding.embed(as_df=True)
print(f"GNN embeddings generated. Shape: {embeddings_output.shape}")
# 4) Enhance subject profiles with the GNN embeddings using SubjectRepresentation
graph_embedding = SubjectRepresentation(
    omics_data=merged_omics,
    embeddings=embeddings_output,
    phenotype_data=phenotype,
    tune=True,
)
enhanced_data = graph_embedding.run()
print(f"Enhanced omics data shape: {enhanced_data.shape}")
# Save enhanced omics data
enhanced_data.to_csv("enhanced_omics_data.csv")
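A cheap sanity check before generating embeddings is to confirm that the nodes of the network and the columns of the merged omics table refer to the same features. The sketch below assumes the adjacency matrix is labeled by feature name, which is an assumption about the data rather than something guaranteed by the pipeline. The helper check_alignment and the toy labels are hypothetical.

```python
def check_alignment(node_labels, feature_names):
    """Report features present in one collection but not the other."""
    nodes = set(node_labels)
    features = set(feature_names)
    return {
        "missing_from_omics": sorted(nodes - features),
        "missing_from_network": sorted(features - nodes),
    }


# Toy labels for illustration only.
report = check_alignment(
    node_labels=["gene_a", "gene_b", "mir_1"],
    feature_names=["gene_a", "gene_b", "mir_2"],
)
```

If the adjacency matrix is a labeled DataFrame, the same check could be run with global_network.index and merged_omics.columns as inputs; two empty lists would indicate that the network and the omics table cover the same features.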
Results:
Adjacency Matrix generated using SmCCNet.
Node Embeddings from GNN.
Enhanced Omics Data, integrating node embeddings for subject-level analysis.
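To make "integrating node embeddings into subject-level data" more concrete, here is a minimal sketch of one plausible scheme: collapse each feature's embedding to a single scalar and use it to reweight that feature's values for every subject. The mean is used here purely for simplicity. This is an assumption made for illustration, not necessarily what SubjectRepresentation does internally, and weight_features and the toy values are hypothetical.

```python
def weight_features(omics_rows, feature_names, node_embeddings):
    """Scale each omics feature by a scalar derived from its embedding."""
    # Collapse each embedding vector to one scalar (here, its mean).
    weights = {
        name: sum(vector) / len(vector)
        for name, vector in node_embeddings.items()
    }
    # Multiply every subject's value for a feature by that feature's weight.
    return [
        [value * weights[name] for name, value in zip(feature_names, row)]
        for row in omics_rows
    ]


# Two subjects, two features (illustrative values only).
feature_names = ["gene_a", "gene_b"]
omics_rows = [[2.0, 4.0], [1.0, 3.0]]
node_embeddings = {"gene_a": [0.5, 1.5], "gene_b": [0.0, 0.5]}

enhanced_rows = weight_features(omics_rows, feature_names, node_embeddings)
```

The output keeps the same subjects-by-features shape as the input, so it can feed into the same downstream models as the original omics table.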