GNN Embeddings¶

BioNeuralNet leverages Graph Neural Networks (GNNs) to generate biologically meaningful, low-dimensional embeddings from multi-omics network data. These embeddings integrate complex biological interactions and structural information, facilitating accurate downstream analyses, such as phenotype prediction and biomarker discovery.

Core Features¶

Biologically Informed Embeddings: Models like GCN, GAT, GraphSAGE, and GIN produce embeddings informed by network connectivity and biologically relevant supervised signals (e.g., phenotype correlations).
Flexible, Modular Integration: Outputs structured as pandas DataFrames, seamlessly compatible with common bioinformatics workflows.
Comprehensive Workflow: Handles data from initial network construction through embedding generation to disease prediction in a unified, end-to-end pipeline.

Supported GNN Architectures¶

Graph Convolutional Network (GCN): GCN aggregates node features based on local neighborhood structure using spectral-based convolution:

\[X^{(l+1)} \;=\; \mathrm{ReLU}\!\Bigl(\widehat{D}^{-\tfrac{1}{2}}\,\widehat{A}\,\widehat{D}^{-\tfrac{1}{2}}\ X^{(l)}\,W^{(l)}\Bigr)\]

where \(\widehat{A}\) adds self-loops to the adjacency matrix, ensuring that each node also considers its own features.

Graph Attention Network (GAT): GAT assigns learned attention scores to neighbors, enhancing model interpretability and accuracy:

\[h_{i}^{(l+1)} \;=\; \mathrm{ELU}\!\Bigl(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}^{(l)}\,W^{(l)}\,h_{j}^{(l)}\Bigr)\]

with \(\alpha_{ij}^{(l)}\) representing the attention coefficient for node \(j\)’s contribution to node \(i\).

GraphSAGE: GraphSAGE performs inductive learning by aggregating neighboring node features to generalize effectively to unseen data:

\[h_{i}^{(l+1)} \;=\; \sigma\!\Bigl(W^{(l)}\Bigl( h_{i}^{(l)} \,\|\, \mathrm{mean}_{j \,\in\, \mathcal{N}(i)}(h_{j}^{(l)}) \Bigr)\Bigr)\]

where the mean aggregator provides a simple yet effective way to summarize local neighborhood information.

Graph Isomorphism Network (GIN): GIN leverages sum-aggregation and an MLP to discriminate subtle structural variations between graphs:

\[h_i^{(l+1)} \;=\; \mathrm{MLP}^{(l)}\!\Bigl(\,\bigl(1 + \epsilon^{(l)}\bigr) h_{i}^{(l)} + \sum_{j \in \mathcal{N}(i)} h_{j}^{(l)}\Bigr)\]

where \(\epsilon^{(l)}\) is either learnable or fixed.

Task-Driven Embeddings for Phenotype Prediction¶

BioNeuralNet generates embeddings optimized for disease prediction through supervised and semi-supervised training:

Phenotype-Guided Labels: Nodes labeled by correlation with clinical or phenotype data.
Supervised Training Objective: Minimizes MSE between predicted node correlations and actual phenotype correlations, ensuring biologically relevant embeddings.
Subject-Level Integration: Embeddings enhance patient-level datasets, significantly improving classification performance via DPMON (Disease Prediction using Multi-Omics Networks).

Embedding Generation Workflow¶

Embeddings produced by BioNeuralNet capture both topological and biological insights from multi-omics networks:

Subject Representation Workflow — Workflow: Nodes labeled by phenotype correlation, embedded via GNNs, dimensionally reduced (PCA/Autoencoder), then integrated into subject-level data for enhanced predictive accuracy.¶

View full-size image: Subject Representation

Dimensionality Reduction¶

BioNeuralNet provides two main dimensionality reduction techniques post-GNN embedding:

PCA: Simple, linear, interpretable, suitable for datasets where linear assumptions hold.
Autoencoders: Nonlinear, flexible neural-network-based approach capturing complex biological patterns. Recommended with hyperparameter tuning (tune=True) for superior performance on highly dimensional or complex data.

How DPMON Utilizes GNN Embeddings¶

DPMON extends embedding applications to patient-level phenotype prediction:

Integrates node embeddings directly into patient-level features.
Uses a classification head (e.g., softmax with cross-entropy) trained to predict clinical outcomes.
Leverages both local molecular interaction information (node-level embeddings) and global omics data, yielding highly accurate phenotype predictions.

View full-size image: Disease Prediction (DPMON)

Return to BioNeuralNet: Graph Neural Networks for Multi-Omics Network Analysis