bioneuralnet.downstream_task.dpmon
DPMON: Optimized Network Embedding and Fusion for Disease Prediction.
This module implements an end-to-end Graph Neural Network (GNN) pipeline integrating network topology with subject-level omics data.
References
Hussein, S. et al. (2024), “Learning from Multi-Omics Networks to Enhance Disease Prediction: An Optimized Network Embedding and Fusion Approach” IEEE BIBM.
- Algorithm:
The pipeline consists of three distinct phases:
- Phase 1: Task-Aware Embedding Generation
Construct a multi-omics network.
Initialize node features using clinical correlation vectors.
Pass graph through a GNN (GAT/GCN/GIN).
- Phase 2: Dimensionality Reduction
Compress embeddings into scalar weights via AutoEncoder/MLP.
- Phase 3: Fusion and Prediction
Fuse embeddings with subject-level data via element-wise multiplication (Feature Reweighting).
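The Phase 3 fusion can be illustrated with a minimal NumPy sketch. It assumes Phase 2 has already produced one scalar weight per network node and that the omics columns are ordered like those nodes. The function name `reweight_features` and the toy values are illustrative and not part of the bioneuralnet API.

```python
import numpy as np


def reweight_features(omics, node_weights):
    """Scale each omics feature column by the scalar weight of its node.

    omics:        (n_samples, n_features) subject-level data
    node_weights: (n_features,) one scalar per network node (assumed given)
    """
    omics = np.asarray(omics, dtype=float)
    node_weights = np.asarray(node_weights, dtype=float)
    if omics.shape[1] != node_weights.shape[0]:
        raise ValueError("need exactly one weight per omics feature")
    # Broadcasting applies the same per-feature weight to every sample (row).
    return omics * node_weights


fused = reweight_features([[1, 2, 3], [4, 5, 6]], [0.5, 1.0, 2.0])
print(fused.tolist())  # [[0.5, 2.0, 6.0], [2.0, 5.0, 12.0]]
```

Features with larger weights contribute more strongly to the downstream classifier input, which is the reweighting effect the fusion step relies on.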
Notes
The embedding space is optimized dynamically through the training loss function.
The fusion acts as a Network-Guided Attention Mechanism, amplifying features that are topologically central.
Functions
- Compute average precision (AP) from prediction scores.
- Compute the F1 score, also known as balanced F-score or F-measure.
- Retrieves a global logger configured to write to 'bioneuralnet.log'.
- Binarize labels in a one-vs-all fashion.
- Compute the Matthews correlation coefficient (MCC).
- Calculate a point biserial correlation coefficient and its p-value.
- Compute the precision.
- Build node-level features and return a PyTorch Geometric graph.
- Compute the recall.
- Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.
- Run Ray Tune hyperparameter search with inner k-fold CV.
- Sets seeds for maximum reproducibility across Python, NumPy, and PyTorch.
- Split arrays or matrices into random train and test subsets.
Classes
- alias of
- Compresses high-dimensional node embeddings into a lower-dimensional latent space.
- Uses Tune's variant generation for resolving variables.
- Command-line reporter
- A reference to data persisted as a directory in local or remote storage.
- DPMON (Disease Prediction using Multi-Omics Networks) end-to-end pipeline for multi-omics disease prediction.
- MLP for final prediction - outputs raw logits.
- Graph Attention Network - uses edge_dim=1 to incorporate edge weights.
- Graph Convolutional Network
- Graph Isomorphism Network - uses GINEConv for edge-weight awareness.
- Core DPMON model combining GNN feature weighting and sample-level prediction.
- PurePath subclass that can make system calls.
- Repeated class-wise stratified K-Fold cross validator.
- GraphSAGE - aligned layer_num convention.
- Class-wise stratified K-Fold cross-validator.
- Early stop single trials when they reached a plateau.
Exceptions
- General error class raised by ray.tune.
- class bioneuralnet.downstream_task.dpmon.AutoEncoder(*args: Any, **kwargs: Any)
Bases: Module
Compresses high-dimensional node embeddings into a lower-dimensional latent space.
- Parameters:
input_dim – Input feature dimension (gnn_hidden_dim).
encoding_dim – Output latent dimension.
architecture – Either “original” or “dynamic”. “original”: input -> 8 -> 4 encoding_dim. “dynamic”: input -> input//2 -> encoding_dim.
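As a small illustration of the “dynamic” option, the layer widths can be derived from the two dimensions. This is a sketch only: the helper name `dynamic_layer_dims` is hypothetical and the real AutoEncoder may build its layers differently. The “original” layout is left out because its exact widths are not fully specified above.

```python
from typing import List


def dynamic_layer_dims(input_dim: int, encoding_dim: int) -> List[int]:
    """Layer widths for the "dynamic" layout: input -> input // 2 -> encoding_dim."""
    if input_dim < 2 or encoding_dim < 1:
        raise ValueError("input_dim must be >= 2 and encoding_dim >= 1")
    return [input_dim, input_dim // 2, encoding_dim]


# With the DPMON defaults gnn_hidden_dim=16 and ae_encoding_dim=8:
print(dynamic_layer_dims(16, 8))  # [16, 8, 8]
```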
- class bioneuralnet.downstream_task.dpmon.DPMON(adjacency_matrix: DataFrame, omics_list: List[DataFrame], phenotype_data: DataFrame, clinical_data: DataFrame | None = None, correlation_mode: str = 'abs_pearson', model: str = 'GAT', phenotype_col: str = 'phenotype', gnn_hidden_dim: int = 16, gnn_layer_num: int = 4, gnn_dropout: float = 0.1, gnn_activation: str = 'relu', dim_reduction: str = 'ae', ae_architecture: str = 'original', ae_encoding_dim: int = 8, nn_hidden_dim1: int = 16, nn_hidden_dim2: int = 8, num_epochs: int = 100, repeat_num: int = 1, n_folds: int = 5, lr: float = 0.1, weight_decay: float = 0.0001, gat_heads: int = 1, tune: bool = False, tune_trials: int = 20, gpu: bool = False, cv: bool = False, cuda: int = 0, seed: int = 1804, seed_trials: bool = False, output_dir: str | None = None)
Bases: object
DPMON (Disease Prediction using Multi-Omics Networks) end-to-end pipeline for multi-omics disease prediction.
Instead of node-level MSE regression, DPMON aggregates node embeddings with patient-level omics data and feeds them to a downstream classification head (e.g., a softmax layer with CrossEntropyLoss) for sample-level disease prediction. This end-to-end setup leverages both local (node-level) and global (patient-level) network information.
- adjacency_matrix
Adjacency matrix of the feature-level network; index/columns are feature names.
- Type:
pd.DataFrame
- omics_list
List of omics data matrices or a single merged omics DataFrame (samples x features).
- Type:
List[pd.DataFrame] | pd.DataFrame
- phenotype_data
Phenotype labels used for supervision.
- Type:
pd.DataFrame | pd.Series
- clinical_data
Optional clinical covariates (samples x clinical features); may be None.
- Type:
Optional[pd.DataFrame]
- gnn_hidden_dim
Hidden dimension size of GNN layers.
- Type:
int
- dim_reduction
Dimensionality reduction strategy for omics input (e.g., “ae” for autoencoder).
- Type:
str
- nn_hidden_dim1
Hidden dimension of the first fully connected layer in the downstream classifier.
- Type:
int
- nn_hidden_dim2
Hidden dimension of the second fully connected layer in the downstream classifier.
- Type:
int
- repeat_num
Number of repeated training runs (for repeated train/test splits or repeated CV).
- Type:
int
- seed_trials
If True, use a fixed seed for hyperparameter sampling to ensure reproducibility across trials.
- Type:
bool
- output_dir
Directory where logs, checkpoints, and results are written.
- Type:
Path
- run() -> Tuple[pd.DataFrame, object, torch.Tensor | None]
Execute the DPMON pipeline.
This method aligns the graph and omics features, optionally performs hyperparameter tuning, and then trains and evaluates the chosen GNN model using either K-fold cross-validation (cv=True) or repeated train/test splits (cv=False). It returns prediction outputs, a metrics/config object, and optionally the learned embeddings.
- Returns:
- A tuple (predictions_df, metrics, embeddings) where:
predictions_df (pd.DataFrame): If cv=False, per-sample predictions with actual vs. predicted labels; if cv=True, aggregated CV performance or fold-level results, depending on the backend.
metrics (object): Dictionary or configuration object containing evaluation metrics and, when tuning is enabled, information about the selected hyperparameters.
embeddings (torch.Tensor | None): Learned embedding tensor (e.g., node or sample embeddings) if produced by the training routine, otherwise None.
- Return type:
Tuple[pd.DataFrame, object, torch.Tensor | None]
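A hedged usage sketch for the class follows. It relies only on the constructor signature and the run() return tuple documented above. The toy data builder `build_demo_inputs`, the feature names, and the clinical columns are made up for illustration, and whether data this small trains meaningfully (or needs extra preprocessing) has not been verified.

```python
import numpy as np
import pandas as pd


def build_demo_inputs(n_samples=40, n_features=6, seed=0):
    """Create toy inputs shaped the way the DPMON signature describes them."""
    rng = np.random.default_rng(seed)
    features = [f"gene_{i}" for i in range(n_features)]
    samples = [f"s{i}" for i in range(n_samples)]

    # Omics matrix: samples x features.
    omics = pd.DataFrame(rng.normal(size=(n_samples, n_features)),
                         index=samples, columns=features)

    # Symmetric, non-negative adjacency with a zero diagonal,
    # indexed by the same feature names as the omics columns.
    w = rng.random((n_features, n_features))
    adj = (w + w.T) / 2
    np.fill_diagonal(adj, 0.0)
    adjacency = pd.DataFrame(adj, index=features, columns=features)

    # Binary labels in the default phenotype column name.
    phenotype = pd.DataFrame({"phenotype": rng.integers(0, 2, size=n_samples)},
                             index=samples)
    clinical = pd.DataFrame({"age": rng.normal(60, 10, n_samples),
                             "bmi": rng.normal(27, 4, n_samples)},
                            index=samples)
    return adjacency, omics, phenotype, clinical


def run_demo():
    # Imported lazily so the builder above works without bioneuralnet installed.
    from bioneuralnet.downstream_task.dpmon import DPMON

    adjacency, omics, phenotype, clinical = build_demo_inputs()
    dpmon = DPMON(
        adjacency_matrix=adjacency,
        omics_list=[omics],
        phenotype_data=phenotype,
        clinical_data=clinical,
        model="GAT",
        num_epochs=10,
        tune=False,   # skip the Ray Tune search
        cv=False,     # repeated train/test splits instead of K-fold CV
        seed=1804,
    )
    predictions_df, metrics, embeddings = dpmon.run()
    return predictions_df, metrics, embeddings
```

Calling `run_demo()` requires bioneuralnet and its dependencies. Per the return description above, `embeddings` may be None, so callers should check it before use.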
- class bioneuralnet.downstream_task.dpmon.DownstreamTaskNN(*args: Any, **kwargs: Any)
Bases: Module
MLP for final prediction - outputs raw logits.
- class bioneuralnet.downstream_task.dpmon.MLPProjection(*args: Any, **kwargs: Any)
Bases: Module
- class bioneuralnet.downstream_task.dpmon.NeuralNetwork(*args: Any, **kwargs: Any)
Bases: Module
Core DPMON model combining GNN feature weighting and sample-level prediction. When using GAT with heads > 1, the GNN output is hidden_dim * heads.
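The dimension note can be made concrete with a tiny helper. This is a sketch under the assumption that every non-GAT model emits embeddings of width hidden_dim; the function name `gnn_output_dim` is hypothetical and not part of the module.

```python
def gnn_output_dim(model: str, hidden_dim: int, heads: int = 1) -> int:
    """Width of the node embeddings leaving the GNN.

    For GAT the attention heads are treated as concatenated, so the width is
    hidden_dim * heads; other models are assumed to output hidden_dim.
    """
    if model.upper() == "GAT":
        return hidden_dim * heads
    return hidden_dim


print(gnn_output_dim("GAT", 16, heads=4))  # 64
print(gnn_output_dim("GCN", 16))           # 16
```

Whatever consumes the embeddings next (for example the dimensionality-reduction step) has to accept this width as its input dimension.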
- class bioneuralnet.downstream_task.dpmon.ScalarProjection(*args: Any, **kwargs: Any)
Bases: Module
- bioneuralnet.downstream_task.dpmon.prepare_node_features(adjacency_matrix: DataFrame, omics_datasets: List[DataFrame], clinical_data: DataFrame | None, phenotype_col: str, correlation_mode: str = 'abs_pearson') -> List[torch_geometric.data.Data]
Build node-level features and return a PyTorch Geometric graph.
- Parameters:
adjacency_matrix – Symmetric adjacency matrix (node names as index/columns).
omics_datasets – List of omics matrices (samples x features); only the first element is used.
clinical_data – Clinical covariates for correlation-based node features; may be None.
phenotype_col – Column name storing phenotype labels (dropped from features).
correlation_mode – How to compute node features from clinical correlations:
- “abs_pearson”: absolute Pearson correlation with no further transforms (the original DPMON behavior).
- “adaptive”: mixed correlation types, followed by a Fisher-Z transform and standardization.
- Returns:
Single-element list with a PyG Data object.
- Return type:
List[Data]
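The two correlation modes can be sketched in NumPy. This is an illustration, not the library's implementation. It assumes each node's feature vector holds the correlations between that omics feature and every clinical variable. Both function names are hypothetical, and the “adaptive” sketch covers only the Fisher-Z and standardization steps, not the choice among correlation types.

```python
import numpy as np


def abs_pearson_node_features(omics, clinical):
    """Return an (n_features, n_clinical) matrix of |Pearson r| values.

    Columns with zero variance would divide by zero and must be removed first.
    """
    omics = np.asarray(omics, dtype=float)
    clinical = np.asarray(clinical, dtype=float)
    # Convert every column to z-scores (population standard deviation).
    oz = (omics - omics.mean(axis=0)) / omics.std(axis=0)
    cz = (clinical - clinical.mean(axis=0)) / clinical.std(axis=0)
    # The mean product of z-scores is the Pearson correlation.
    r = oz.T @ cz / omics.shape[0]
    return np.abs(r)


def fisher_z_standardized(r, eps=1e-6):
    """Fisher-Z transform correlations, then standardize each column."""
    r = np.asarray(r, dtype=float)
    # Clip so that r = +/-1 does not map to infinity under arctanh.
    z = np.arctanh(np.clip(r, -1 + eps, 1 - eps))
    return (z - z.mean(axis=0)) / (z.std(axis=0) + eps)


omics = np.array([[1.0, 4.0], [2.0, 3.0], [3.0, 2.0], [4.0, 1.0]])
clinical = np.array([[2.0], [4.0], [6.0], [8.0]])
features = abs_pearson_node_features(omics, clinical)
```

In the toy data the first omics column rises with the clinical variable and the second falls with it, so both rows of `features` are close to 1 once the sign is discarded.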
- bioneuralnet.downstream_task.dpmon.run_hyperparameter_tuning(X_train, y_train, adjacency_matrix, clinical_data, dpmon_params) -> Dict[str, Any]
Run Ray Tune hyperparameter search with inner k-fold CV.
Each trial trains one model per inner fold, with all folds advancing epoch by epoch in lockstep, and reports the mean validation metrics after each epoch. The ASHA scheduler early-stops trials on this averaged signal, which is far more stable than the signal from a single split.
- Parameters:
X_train – Training features for this outer fold (pd.DataFrame).
y_train – Training labels for this outer fold (pd.Series).
adjacency_matrix – Feature-level adjacency matrix.
clinical_data – Clinical covariates for the training fold.
dpmon_params – Full DPMON parameter dictionary.
- Returns:
Dict with the best hyperparameter configuration.
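The epoch-synchronised reporting described above can be sketched in plain Python. All names here are hypothetical: each entry of `fold_steps` stands in for "train this fold's model for one epoch and return its validation metric", and `report` stands in for the call that hands the averaged metric to the tuner and learns whether the scheduler wants the trial stopped. No Ray Tune API is used or implied.

```python
from typing import Callable, Dict, List


def run_trial(fold_steps: List[Callable[[int], float]],
              num_epochs: int,
              report: Callable[[Dict[str, float]], bool]) -> int:
    """Advance every fold by one epoch, then report the fold-averaged metric.

    Returns the number of epochs actually run. `report` returns True when the
    trial should stop early.
    """
    for epoch in range(num_epochs):
        # One training epoch per inner fold, all at the same epoch index.
        fold_metrics = [step(epoch) for step in fold_steps]
        mean_metric = sum(fold_metrics) / len(fold_metrics)
        if report({"epoch": epoch, "val_metric_mean": mean_metric}):
            return epoch + 1
    return num_epochs


# Toy folds whose metric is a fixed offset plus the epoch index.
history = []
folds = [lambda epoch, base=base: base + epoch for base in (1.0, 2.0, 3.0)]
epochs_run = run_trial(folds, 3, lambda result: history.append(result) or False)
print(epochs_run)                      # 3
print(history[0]["val_metric_mean"])   # 2.0
```

Because the scheduler only ever sees the mean across folds, a single lucky or unlucky split cannot by itself keep a trial alive or terminate it.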
- bioneuralnet.downstream_task.dpmon.run_standard_training(dpmon_params, adjacency_matrix, combined_omics, clinical_data, seed, cv=False, output_dir=None)