Quick Start Guide¶

This notebook provides an end-to-end walkthrough of the BioNeuralNet pipeline using a synthetic demo dataset, following the stages of the Data Decision Framework:

Network construction
Network quality assessment
Phenotype-driven subgraph detection
Downstream disease prediction via Graph Neural Networks

Load Demo Dataset¶

BioNeuralNet includes built-in demo datasets via DatasetLoader, allowing you to explore the full framework without preparing your own data.

Two omics layers: 500 genes and 100 miRNAs
358 pre-aligned samples with phenotype and clinical annotations

#!pip install bioneuralnet
#!pip install torch
#!pip install torch_geometric

import pandas as pd
from bioneuralnet.datasets import DatasetLoader

Example = DatasetLoader("example")
omics1 = Example.data["X1"]
omics2= Example.data["X2"]
phenotype = Example.data["Y"]
clinical = Example.data["clinical"]

# Inspect the first 3 rows and 5 colums.

display(omics1.iloc[:3,:5])
display(omics1.shape)

display(omics2.iloc[:3,:5])
display(omics2.shape)

display(clinical.iloc[:3,:5])
display(clinical.shape)

display(phenotype.iloc[:3,:5])
display(phenotype.shape)

	Gene_1	Gene_2	Gene_3	Gene_4	Gene_5
Samp_1	22.485701	40.353720	31.025745	20.847206	26.697293
Samp_2	37.058850	34.052233	33.487020	23.531461	26.754628
Samp_3	20.530767	31.669623	35.189567	20.952544	25.018826

(358, 500)

	Mir_1	Mir_2	Mir_3	Mir_4	Mir_5
Samp_1	15.223913	17.545826	15.784719	14.891983	10.348205
Samp_2	16.306965	16.672830	13.361529	14.488549	12.660905
Samp_3	16.545119	16.735005	14.617472	17.845267	13.822790

(358, 100)

	Age	Gender	BMI	Chronic_Bronchitis	Emphysema
PatientID
Samp_1	78	0	31.2	1	1
Samp_2	68	1	19.2	1	0
Samp_3	54	1	19.3	0	1

(358, 6)

	phenotype
Samp_1	235.067423
Samp_2	253.544991
Samp_3	234.204994

(358, 1)

Constructing a Multi-omics Network¶

BioNeuralNet supports a variety of graph construction techniques to model relationships between biological features. These graphs serve as the foundation for applying Graph Neural Networks in downstream tasks. Using the following functions, users can convert a multi-omics dataset (subjects in rows, features in columns) into a weighted adjacency matrix suitable for GNN-based analysis.

We recommend building 3-5 candidate networks and evaluating each with the NetworkAnalyzer class before selecting one for downstream modeling. See the Data Decision Framework for guidance on parameter selection and topology assessment.

Supported graph construction methods:

Similarity - cosine or Euclidean similarity with optional k-NN sparsification
Correlation - Pearson or Spearman correlation mapped to [0,1]
Threshold - soft-thresholding on absolute Pearson correlation (WGCNA-style)
Gaussian k-NN - Gaussian (RBF) kernel on Euclidean distances
Phenotype-driven (SmCCNet) - supervised multi-omics network inference via the native auto_pysmccnet integration

Each method is available through the bioneuralnet.network module. Additional details and examples can be found in the Network Construction documentation.

Example 1: SmCCNet 2.0¶

SmCCNet 2.0 is a phenotype-driven network inference method based on Sparse Multiple Canonical Correlation Analysis (SmCCA), constructing multi-omics networks specific to a variable of interest (e.g., a disease phenotype).

It is available via from bioneuralnet.network import auto_pysmccnet. R and the SmCCNet R package are required. For setup instructions see the Network Construction documentation.

Resources

CRAN: SmCCNet on CRAN
GitHub: KechrisLab/SmCCNet
Paper: Liu et al., BMC Bioinformatics (2024)

from bioneuralnet.network import auto_pysmccnet
from bioneuralnet.utils import set_seed

SEED = 123
set_seed(SEED)

result = auto_pysmccnet(
    X=[omics1, omics2],
    Y=phenotype,
    DataType=["genes", "mirna"],
    subSampNum=1000,
    seed=SEED,
    Kfold=3,
    BetweenShrinkage=5,
    CutHeight=1 - 0.1**10,
    summarization="NetSHy",
)

global_network = result["AdjacencyMatrix"]
subnetworks = result["Subnetworks"]

Setting global seed for reproducibility to: 123
CUDA available. Applying seed to all GPU operations
Seed setting complete

**********************************
* Welcome to Automated SmCCNet! *
* Device: cpu
**********************************


This project uses multiomics CCA


--------------------------------------------------
>> Determining scaling factor...
--------------------------------------------------

The scaling factor selection is: 

genes-mirna: 0.07357254609375365
genes-Phenotype: 1.0
mirna-Phenotype: 1.0

--------------------------------------------------
>> Determining best penalty via Cross-Validation...
--------------------------------------------------

>> Generating random K-Fold splits...
Processing fold 1/3...
Processing fold 2/3...
Processing fold 3/3...


The best penalty term for genes is: 0.1
The best penalty term for mirna is: 0.1
Testing Canonical Correlation: 1.696, Prediction Error: 0.037

Running multi-omics CCA with best penalty on complete dataset.

Robust Weights: 100%|██████████| 1000/1000 [00:25<00:00, 38.96it/s]

--------------------------------------------------
>> Now starting network clustering...
--------------------------------------------------

Clustering completed...

--------------------------------------------------
>> Now starting network pruning and summarization score extraction...
--------------------------------------------------


There are 6 network modules before pruning
 Now extracting subnetwork for network module 1
Network module 1 Result:The final network size is: 19 with maximum PC correlation w.r.t. phenotype to be: 0.679
 Now extracting subnetwork for network module 2
Network module 2 Result:The final network size is: 21 with maximum PC correlation w.r.t. phenotype to be: 0.090

************************************
* Execution Finished!            *
************************************

print(f"Global network shape: {global_network.shape}")
print(f"Number of SmCCnet clusters: {len(subnetworks)}")

print(subnetworks[1].keys())

Global network shape: (600, 600)
Number of SmCCnet clusters: 2
dict_keys(['module_id', 'nodes', 'node_indices', 'adjacency', 'correlation', 'pc_correlations', 'netshy', 'omics_correlation'])

Network Quality Assessment¶

Before proceeding to GNN embedding, we use the NetworkAnalyzer class to inspect the topology of both the global network and the strongest subnetwork (Module 1, PC correlation: 0.679). This corresponds to the Network Quality stage of the Data Decision Framework.

Density and isolated nodes confirm the network is suitable for stable GNN message passing.
Hub analysis identifies the most highly connected molecular features driving the subnetwork structure.
Cross-omics connectivity reveals whether network structure is driven by within-omics or between-omics interactions.

For a full parameter reference and topology interpretation guide, see the Data Decision Framework.

from bioneuralnet.network import NetworkAnalyzer

global_analyzer = NetworkAnalyzer(adjacency_matrix=global_network)
global_analyzer.edge_weight_analysis()
global_analyzer.basic_statistics(threshold=0.001)
global_analyzer.cross_omics_analysis(threshold=0.001)
_ = global_analyzer.find_strongest_edges(top_n=10)

module_analyzer = NetworkAnalyzer(adjacency_matrix=subnetworks[1]["adjacency"])
module_analyzer.edge_weight_analysis()
module_analyzer.basic_statistics(threshold=0.001)
module_analyzer.hub_analysis(threshold=0.001, top_n=10)
module_analyzer.cross_omics_analysis(threshold=0.001)
_ = module_analyzer.find_strongest_edges(top_n=5)

Initialized on CUDA
Nodes: 600
Omics types: ['genes', 'mirna']

============================================================
EDGE WEIGHT DISTRIBUTION
============================================================
Total edges (weight > 0): 1,395
Weight statistics:
 Mean: 0.004743
 Std: 0.046160
 Median: 0.000013
 Min: 0.000000
 Max: 1.000000

Percentiles:
  25th: 0.000001
  50th: 0.000013
  75th: 0.000122
  90th: 0.001690
  95th: 0.009074
  99th: 0.063821

Edges at different biological thresholds:
  > 0.001: 192 edges
  > 0.1: 10 edges
  > 0.3: 5 edges
  > 0.5: 3 edges
  > 0.7: 2 edges
  > 0.8: 2 edges
  > 0.9: 2 edges

============================================================
BASIC NETWORK STATISTICS (threshold > 0.001)
============================================================
Nodes: 600
Edges: 192
Density: 0.001068
Avg Degree: 0.64
Max Degree: 32
Min Degree: 0
Isolated Nodes: 564 (94.0%)

============================================================
CROSS-OMICS CONNECTIVITY (threshold > 0.001)
============================================================

Omics Pair           |      Edges | Max Possible |    Density
------------------------------------------------------------
genes (within)       |        136 |      124,750 |   0.001090
genes-mirna          |         56 |       50,000 |   0.001120
mirna (within)       |          0 |        4,950 |   0.000000

============================================================
TOP 10 STRONGEST EDGES
============================================================
Rank  | Feature 1                           | Feature 2                           |     Weight
-----------------------------------------------------------------------------------------------
1     | genes_Gene_7                        | mirna_Mir_2                         |   1.000000
2     | genes_Gene_6                        | mirna_Mir_2                         |   0.960915
3     | genes_Gene_1                        | mirna_Mir_2                         |   0.557853
4     | genes_Gene_6                        | genes_Gene_7                        |   0.484429
5     | genes_Gene_2                        | mirna_Mir_2                         |   0.461481
6     | genes_Gene_1                        | genes_Gene_7                        |   0.262026
7     | genes_Gene_1                        | genes_Gene_6                        |   0.259615
8     | genes_Gene_2                        | genes_Gene_7                        |   0.226277
9     | genes_Gene_2                        | genes_Gene_6                        |   0.217955
10    | genes_Gene_1                        | genes_Gene_2                        |   0.152222
Initialized on CUDA
Nodes: 21
Omics types: ['genes', 'mirna']

============================================================
EDGE WEIGHT DISTRIBUTION
============================================================
Total edges (weight > 0): 210
Weight statistics:
 Mean: 0.000405
 Std: 0.000948
 Median: 0.000040
 Min: 0.000001
 Max: 0.006987

Percentiles:
  25th: 0.000009
  50th: 0.000040
  75th: 0.000382
  90th: 0.001248
  95th: 0.001934
  99th: 0.005424

Edges at different biological thresholds:
  > 0.001: 26 edges
  > 0.1: 0 edges
  > 0.3: 0 edges
  > 0.5: 0 edges
  > 0.7: 0 edges
  > 0.8: 0 edges
  > 0.9: 0 edges

============================================================
BASIC NETWORK STATISTICS (threshold > 0.001)
============================================================
Nodes: 21
Edges: 26
Density: 0.123810
Avg Degree: 2.48
Max Degree: 9
Min Degree: 0
Isolated Nodes: 11 (52.4%)

============================================================
TOP 10 HUB NODES (threshold > 0.001)
============================================================
 1. mirna_Mir_53                             | mirna  | degree: 9
 2. genes_Gene_190                           | genes  | degree: 8
 3. genes_Gene_189                           | genes  | degree: 8
 4. genes_Gene_54                            | genes  | degree: 7
 5. genes_Gene_28                            | genes  | degree: 4
 6. genes_Gene_10                            | genes  | degree: 4
 7. genes_Gene_72                            | genes  | degree: 4
 8. genes_Gene_88                            | genes  | degree: 4
 9. genes_Gene_222                           | genes  | degree: 3
10. genes_Gene_46                            | genes  | degree: 1

============================================================
CROSS-OMICS CONNECTIVITY (threshold > 0.001)
============================================================

Omics Pair           |      Edges | Max Possible |    Density
------------------------------------------------------------
genes (within)       |         17 |          190 |   0.089474
genes-mirna          |          9 |           20 |   0.450000
mirna (within)       |          0 |            0 |   0.000000

============================================================
TOP 5 STRONGEST EDGES
============================================================
Rank  | Feature 1                           | Feature 2                           |     Weight
-----------------------------------------------------------------------------------------------
1     | genes_Gene_189                      | mirna_Mir_53                        |   0.006987
2     | genes_Gene_189                      | genes_Gene_190                      |   0.005841
3     | genes_Gene_190                      | mirna_Mir_53                        |   0.005524
4     | genes_Gene_54                       | mirna_Mir_53                        |   0.004413
5     | genes_Gene_54                       | genes_Gene_189                      |   0.003219

# smccnet prefixes the network: the following lines remove this prefix to allow dpmon and hybrid louvain to match on omics/nodes
# ONLY need to run this two lines once

global_network.index = global_network.index.str.split('_', n=1).str[1]
global_network.columns = global_network.columns.str.split('_', n=1).str[1]

Phenotype-Driven Subgraph Detection¶

The NetworkAnalyzer results reveal a sparse global network (density: 0.001, 192 edges) with strong cross-omics interactions concentrated around a small set of hub features. To systematically identify the biologically meaningful subgraphs driving phenotype association within this structure, we apply the HybridLouvain module.

HybridLouvain alternates between Correlated Louvain community detection and Correlated PageRank refinement to isolate subgraphs that simultaneously maximize phenotype correlation and internal connectivity. Detecting significant subgraphs within multi-omics networks can help identify complex biological relationships involved in disease progression (Abdel-Hafiz et al., 2022).

Key parameters and their trade-offs:

k_L: controls the balance between modularity and phenotype correlation in the Louvain objective. Lower values (e.g., 0.2) emphasize correlation; higher values (e.g., 0.8) favor well-connected communities but may reduce phenotype relevance.
teleport_prob: higher values restrict PageRank to nodes near the seed set, producing smaller and more focused subgraphs; lower values allow broader exploration of the network.
k_P: controls the balance between conductance and correlation in the PageRank sweep cut. Lower values prioritize phenotype signal over topological compactness.
max_iter: correlation generally peaks at iterations 3–5 before declining as the subgraph becomes over-restricted.

from bioneuralnet.clustering import HybridLouvain
omics_combined = pd.concat([omics1, omics2], axis=1)

hl = HybridLouvain(
    G=global_network,
    B=omics_combined,
    Y=phenotype,
    k_L=0.7,
    k_P=0.8,
    max_iter=10,
    min_nodes=5,
    seed=123,
)
subnetworks_hl = hl.run(as_dfs=True)

Converting adjacency DataFrame to NetworkX graph.
HybridLouvain: nodes=600, edges=1395, features=600, k_L=0.7, teleport=0.05, k_P=0.8, max_iter=10

--- Iteration 1: 600 nodes remaining ---
CorrelatedLouvain: n=600, k_L=0.7, matched_features=600/600
Level 1: 600 nodes
Level 2: 571 nodes
Correlated Louvain done: 571 communities, Q=-0.1574, Q*=-0.0554
Top community: |rho|=0.4729, size=2
CorrelatedPageRank: nodes=600, teleport=0.05 (nx_alpha=0.95), k_P=0.8
Personalization: 2 nodes, max_weight=0.0500, min_weight=0.0500
PageRank computation completed.
Sweep cut: size=42, cond=0.6221, corr=0.5752, composite=0.3826
  Iteration 0: 42 nodes, |rho|=0.5752

--- Iteration 2: 42 nodes remaining ---
CorrelatedLouvain: n=42, k_L=0.7, matched_features=42/42
Level 1: 42 nodes
Level 2: 20 nodes
Correlated Louvain done: 20 communities, Q=0.0001, Q*=0.0472
Top community: |rho|=0.5893, size=2
CorrelatedPageRank: nodes=42, teleport=0.05 (nx_alpha=0.95), k_P=0.8
Personalization: 2 nodes, max_weight=0.0500, min_weight=0.0500
PageRank computation completed.
Sweep cut: size=5, cond=0.8758, corr=-0.5133, composite=0.598
  Iteration 1: 5 nodes, |rho|=0.5133

--- Iteration 3: 5 nodes remaining ---
CorrelatedLouvain: n=5, k_L=0.7, matched_features=5/5
Level 1: 5 nodes
Level 2: 3 nodes
Correlated Louvain done: 3 communities, Q=-0.0007, Q*=0.1092
Top community: |rho|=0.5893, size=2
CorrelatedPageRank: nodes=5, teleport=0.05 (nx_alpha=0.95), k_P=0.8
Personalization: 2 nodes, max_weight=0.0500, min_weight=0.0500
PageRank did not converge; retrying with 200 iters.
PageRank computation completed.
Sweep cut found no valid cluster.
Sweep cut found no valid cluster.
PageRank produced < 2 nodes. Done.

Hybrid Louvain completed: 2 iterations, best |rho|=0.5752 at iteration 0
All iterations:
  iter 0: size=42, |rho|=0.5752, conductance=0.6221
  iter 1: size=5, |rho|=0.5133, conductance=0.8758
  Best subgraph: 42 nodes, |rho|=0.5752 (iter 0)

Subgraph Visualization¶

We visualize the detected subgraphs using plot_network, which renders the weighted adjacency structure and returns a ranked node mapping table. Key parameters:

weight_threshold: filters edges below this value; adjust based on the network’s weight distribution to balance visual clarity against information loss.
show_labels: displays numeric node indices, mapped to feature names in the returned DataFrame.
show_edge_weights: annotates edges with their raw adjacency weights.
layout: graph layout algorithm; "kamada" (default) works well for sparse biological networks, "spring" and "spectral" are available for denser structures.

Node size scales with degree, making hub features visually prominent. The summary box reports full cluster size before and after threshold filtering.

from bioneuralnet.metrics import plot_network

# Iteration 0: 42 nodes, |rho| = 0.575
iteration0 = subnetworks_hl[0]

mapping_0 = plot_network(
    iteration0,
    weight_threshold=0.01,
    show_labels=True,
    show_edge_weights=True
)
print(mapping_0.head(10))

# Iteration 1: 5 nodes, |rho| = 0.513
iteration1 = subnetworks_hl[1]

mapping_1 = plot_network(
    iteration1,
    weight_threshold=0.001,
    show_labels=True,
    show_edge_weights=True
)
print(mapping_1.head(10))

../_images/65af4b8b778cdc4f4ddeeddfc6c581f3a0b08c9a0acdabf29e1c3d71597c9396.png

           Omic  Degree
Index                  
       Mir_2       7
      Gene_7       4
     Gene_28       2
    Gene_485       2
    Gene_222       1
     Gene_46       1
      Mir_53       1
    Gene_214       1
    Gene_219       1

../_images/49e34cdc90f310fa27cad850fdf64ee03541046680cb342f24663dee6954fdb8.png

           Omic  Degree
Index                  
       Mir_2       4
      Gene_7       4
    Gene_485       2
    Gene_214       2
    Gene_219       2

Disease Classification with DPMON¶

DPMON enables end-to-end network-based phenotype prediction by integrating omics data, clinical variables, and network structure within a single pipeline. It combines the adjacency network from the previous step with multi-omics data and optional clinical covariates, supporting GCN, GAT, GraphSAGE, and GIN architectures.

In this example, we binarize the continuous phenotype using a median split, enabling binary classification.

from bioneuralnet.datasets import DatasetLoader
import pandas as pd

Example = DatasetLoader("example")
omics1 = Example.data["X1"]
omics2 = Example.data["X2"]
phenotype = Example.data["Y"]
clinical = Example.data["clinical"]

phenotype["phenotype"] = (phenotype["phenotype"] > phenotype["phenotype"].median()).astype(int)

print(phenotype)
print(phenotype["phenotype"].value_counts(sort=False))

          phenotype
Samp_1            0
Samp_2            1
Samp_3            0
Samp_4            1
Samp_5            1
...             ...
Samp_354          0
Samp_355          0
Samp_356          1
Samp_357          0
Samp_358          0

[358 rows x 1 columns]
phenotype
0    179
1    179
Name: count, dtype: int64

DPMON: End-to-End Disease Prediction¶

DPMON (Disease Prediction using Multi-Omics Networks) integrates omics data, clinical variables, and network structure into a single end-to-end supervised pipeline. While the module offers extensive configurability, BioNeuralNet is designed to eliminate manual guesswork. Setting tune=True activates automated hyperparameter optimization across an empirically validated search space covering GNN depth, hidden dimensions, learning rate, weight decay, and autoencoder encoding dimension.

Here we run DPMON on the global SmCCNet network (600 nodes), enabling the full embedding space to be projected for latent space visualization and omics modality separation analysis.

cv=True enables stratified K-fold cross-validation (n_folds=5)
tune=True searches the validated parameter space across tune_trials=20 configurations
.run() returns predictions, evaluation metrics, and learned embeddings

For a full parameter reference, see the DPMON API documentation.

from bioneuralnet.downstream_task import DPMON

dpmon = DPMON(
    adjacency_matrix=global_network,
    omics_list=[omics1, omics2],
    phenotype_data=phenotype,
    phenotype_col="phenotype",
    clinical_data=clinical,
    cv=True,
    tune=True,
    n_folds=5,
    repeat_num=1,
    output_dir="dpmon_output"
)

predictions, metrics, embeddings = dpmon.run()

Output directory set to: dpmon_output
Initialized DPMON with model: GAT
Setting global seed for reproducibility to: 1804
CUDA available. Applying seed to all GPU operations
Seed setting complete
Random seed set to: 1804
Running in Cross-Validation mode (cv=True) with 5 folds.
CV Setup: Standard 5-fold split.
Starting Fold 1/5
Building graph with 600 common features.
Node feature matrix shape: (600, 6) (mode=abs_pearson)
Inner CV: 5 folds  |  X shape: (286, 600)  |  Graph nodes: torch.Size([600, 6])
Best trial config: {'gnn_layer_num': 3, 'gnn_hidden_dim': 64, 'lr': 0.00040867069309739006, 'weight_decay': 0.002578131912470874, 'nn_hidden_dim1': 256, 'nn_hidden_dim2': 64, 'ae_encoding_dim': 4, 'ae_architecture': 'original', 'num_epochs': 256, 'gnn_dropout': 0.5, 'gnn_activation': 'relu', 'dim_reduction': 'ae', 'gat_heads': 1}
Best trial val_accuracy: 0.8495
Best trial val_loss: 0.3716
Best trial val_f1_macro: 0.8485
Best trial val_aupr: 0.9328
Fold 1 best config: {'gnn_layer_num': 3, 'gnn_hidden_dim': 64, 'lr': 0.00040867069309739006, 'weight_decay': 0.002578131912470874, 'nn_hidden_dim1': 256, 'nn_hidden_dim2': 64, 'ae_encoding_dim': 4, 'ae_architecture': 'original', 'num_epochs': 256, 'gnn_dropout': 0.5, 'gnn_activation': 'relu', 'dim_reduction': 'ae', 'gat_heads': 1}
Building graph features for Fold 1 using train split only
Building graph with 600 common features.
Node feature matrix shape: (600, 6) (mode=abs_pearson)
Evaluating model for Fold 1 on test set
Fold 1 results:
 Accuracy: 0.8472
 F1 Macro: 0.8465
 F1 Weighted: 0.8465
 Recall: 0.8472
 Precision: 0.8541
 AUC: 0.9344
 AUPR: 0.9381

Starting Fold 2/5
Building graph with 600 common features.
Node feature matrix shape: (600, 6) (mode=abs_pearson)
Inner CV: 5 folds  |  X shape: (286, 600)  |  Graph nodes: torch.Size([600, 6])
Best trial config: {'gnn_layer_num': 4, 'gnn_hidden_dim': 32, 'lr': 0.0003855831045713611, 'weight_decay': 0.000891365148135137, 'nn_hidden_dim1': 256, 'nn_hidden_dim2': 64, 'ae_encoding_dim': 4, 'ae_architecture': 'dynamic', 'num_epochs': 128, 'gnn_dropout': 0.5, 'gnn_activation': 'relu', 'dim_reduction': 'linear', 'gat_heads': 2}
Best trial val_accuracy: 0.8358
Best trial val_loss: 0.3946
Best trial val_f1_macro: 0.8351
Best trial val_aupr: 0.9134
Fold 2 best config: {'gnn_layer_num': 4, 'gnn_hidden_dim': 32, 'lr': 0.0003855831045713611, 'weight_decay': 0.000891365148135137, 'nn_hidden_dim1': 256, 'nn_hidden_dim2': 64, 'ae_encoding_dim': 4, 'ae_architecture': 'dynamic', 'num_epochs': 128, 'gnn_dropout': 0.5, 'gnn_activation': 'relu', 'dim_reduction': 'linear', 'gat_heads': 2}
Building graph features for Fold 2 using train split only
Building graph with 600 common features.
Node feature matrix shape: (600, 6) (mode=abs_pearson)
Evaluating model for Fold 2 on test set
Fold 2 results:
 Accuracy: 0.8889
 F1 Macro: 0.8888
 F1 Weighted: 0.8888
 Recall: 0.8889
 Precision: 0.8901
 AUC: 0.9630
 AUPR: 0.9572

Starting Fold 3/5
Building graph with 600 common features.
Node feature matrix shape: (600, 6) (mode=abs_pearson)
Inner CV: 5 folds  |  X shape: (286, 600)  |  Graph nodes: torch.Size([600, 6])
Best trial config: {'gnn_layer_num': 3, 'gnn_hidden_dim': 64, 'lr': 0.0002949084126925678, 'weight_decay': 0.0008811626734017767, 'nn_hidden_dim1': 256, 'nn_hidden_dim2': 64, 'ae_encoding_dim': 8, 'ae_architecture': 'dynamic', 'num_epochs': 256, 'gnn_dropout': 0.5, 'gnn_activation': 'elu', 'dim_reduction': 'ae', 'gat_heads': 2}
Best trial val_accuracy: 0.8428
Best trial val_loss: 0.3948
Best trial val_f1_macro: 0.8411
Best trial val_aupr: 0.9400
Fold 3 best config: {'gnn_layer_num': 3, 'gnn_hidden_dim': 64, 'lr': 0.0002949084126925678, 'weight_decay': 0.0008811626734017767, 'nn_hidden_dim1': 256, 'nn_hidden_dim2': 64, 'ae_encoding_dim': 8, 'ae_architecture': 'dynamic', 'num_epochs': 256, 'gnn_dropout': 0.5, 'gnn_activation': 'elu', 'dim_reduction': 'ae', 'gat_heads': 2}
Building graph features for Fold 3 using train split only
Building graph with 600 common features.
Node feature matrix shape: (600, 6) (mode=abs_pearson)
Evaluating model for Fold 3 on test set
Fold 3 results:
 Accuracy: 0.8194
 F1 Macro: 0.8166
 F1 Weighted: 0.8166
 Recall: 0.8194
 Precision: 0.8407
 AUC: 0.9144
 AUPR: 0.9165

Starting Fold 4/5
Building graph with 600 common features.
Node feature matrix shape: (600, 6) (mode=abs_pearson)
Inner CV: 5 folds  |  X shape: (287, 600)  |  Graph nodes: torch.Size([600, 6])
Best trial config: {'gnn_layer_num': 4, 'gnn_hidden_dim': 64, 'lr': 0.0007382745174615928, 'weight_decay': 0.0011048358857129027, 'nn_hidden_dim1': 128, 'nn_hidden_dim2': 64, 'ae_encoding_dim': 4, 'ae_architecture': 'dynamic', 'num_epochs': 128, 'gnn_dropout': 0.4, 'gnn_activation': 'relu', 'dim_reduction': 'ae', 'gat_heads': 2}
Best trial val_accuracy: 0.8010
Best trial val_loss: 0.4477
Best trial val_f1_macro: 0.7947
Best trial val_aupr: 0.8976
Fold 4 best config: {'gnn_layer_num': 4, 'gnn_hidden_dim': 64, 'lr': 0.0007382745174615928, 'weight_decay': 0.0011048358857129027, 'nn_hidden_dim1': 128, 'nn_hidden_dim2': 64, 'ae_encoding_dim': 4, 'ae_architecture': 'dynamic', 'num_epochs': 128, 'gnn_dropout': 0.4, 'gnn_activation': 'relu', 'dim_reduction': 'ae', 'gat_heads': 2}
Building graph features for Fold 4 using train split only
Building graph with 600 common features.
Node feature matrix shape: (600, 6) (mode=abs_pearson)
Evaluating model for Fold 4 on test set
Fold 4 results:
 Accuracy: 0.8732
 F1 Macro: 0.8728
 F1 Weighted: 0.8729
 Recall: 0.8726
 Precision: 0.8762
 AUC: 0.9460
 AUPR: 0.9462

Starting Fold 5/5
Building graph with 600 common features.
Node feature matrix shape: (600, 6) (mode=abs_pearson)
Inner CV: 5 folds  |  X shape: (287, 600)  |  Graph nodes: torch.Size([600, 6])
Best trial config: {'gnn_layer_num': 2, 'gnn_hidden_dim': 64, 'lr': 0.0006115393988331093, 'weight_decay': 4.123423193325894e-05, 'nn_hidden_dim1': 256, 'nn_hidden_dim2': 64, 'ae_encoding_dim': 8, 'ae_architecture': 'dynamic', 'num_epochs': 128, 'gnn_dropout': 0.5, 'gnn_activation': 'elu', 'dim_reduction': 'linear', 'gat_heads': 1}
Best trial val_accuracy: 0.8641
Best trial val_loss: 0.3375
Best trial val_f1_macro: 0.8638
Best trial val_aupr: 0.9447
Fold 5 best config: {'gnn_layer_num': 2, 'gnn_hidden_dim': 64, 'lr': 0.0006115393988331093, 'weight_decay': 4.123423193325894e-05, 'nn_hidden_dim1': 256, 'nn_hidden_dim2': 64, 'ae_encoding_dim': 8, 'ae_architecture': 'dynamic', 'num_epochs': 128, 'gnn_dropout': 0.5, 'gnn_activation': 'elu', 'dim_reduction': 'linear', 'gat_heads': 1}
Building graph features for Fold 5 using train split only
Building graph with 600 common features.
Node feature matrix shape: (600, 6) (mode=abs_pearson)
Evaluating model for Fold 5 on test set
Fold 5 results:
 Accuracy: 0.7606
 F1 Macro: 0.7606
 F1 Weighted: 0.7606
 Recall: 0.7607
 Precision: 0.7607
 AUC: 0.8619
 AUPR: 0.8544

Successfully saved global best CV model state to: dpmon_output/best_cv_model.pt
Successfully saved global best CV model state to: dpmon_output/best_cv_model_embds.pt
Successfully saved CV tuning parameters to: dpmon_output/cv_tuning_results.csv
Cross-Validation Results:

 Avg Accuracy: 0.8379 +/- 0.0506
 Avg F1 Macro: 0.8371 +/- 0.0508
 Avg F1 Weighted: 0.8371 +/- 0.0508
 Avg Recall: 0.8378 +/- 0.0505
 Avg Precision: 0.8444 +/- 0.0505
 Avg AUC: 0.9239 +/- 0.0389
 Avg AUPR: 0.9225 +/- 0.0409
DPMON run completed.

Latent Space Visualization¶

We project the feature embeddings learned by DPMON into a 2D latent space using t-SNE. This allows inspection of how different omics layers cluster within the learned representation and whether the GNN has captured meaningful separation between modalities. Node labels map each feature to its source omics layer.

import numpy as np
from bioneuralnet.metrics import plot_embeddings
feature_names = global_network.index.tolist()
embeddings_df = pd.DataFrame(embeddings.numpy(), index=feature_names)

node_labels = []
gene_feats = set(omics1.columns)
mirna_feats = set(omics2.columns)

for feat in feature_names:
    if feat in gene_feats:
        node_labels.append(1)
    elif feat in mirna_feats:
        node_labels.append(2)
    else:
        node_labels.append(0)

node_labels = np.array(node_labels)

plot_embeddings(
    embeddings=embeddings_df,
    node_labels=node_labels,
    legend_labels=["Genes", "miRNA"]
)

../_images/b868a6f31f669030e1e9dd2e1c571a8b9f9ebdfdc7bf57c892f544763cda86ad.png

Summary¶

This notebook demonstrated the complete BioNeuralNet workflow on a synthetic multi-omics dataset:

Network construction via SmCCNet
Topology assessment with NetworkAnalyzer
Phenotype-driven subgraph detection with HybridLouvain
End-to-end binary classification with DPMON (Avg AUC: 0.924)

For a full API reference, see the Documentation.