bioneuralnet.downstream_task.dpmon

DPMON: Optimized Network Embedding and Fusion for Disease Prediction.

This module implements an end-to-end Graph Neural Network (GNN) pipeline integrating network topology with subject-level omics data.

References

Hussein, S. et al. (2024). “Learning from Multi-Omics Networks to Enhance Disease Prediction: An Optimized Network Embedding and Fusion Approach.” IEEE BIBM.

Algorithm:

The pipeline consists of three distinct phases:

Phase 1: Task-Aware Embedding Generation

Construct a multi-omics network.
Initialize node features using clinical correlation vectors (each node's correlations with the clinical variables).
Pass the graph through a GNN (GAT/GCN/SAGE/GIN).

Phase 2: Dimensionality Reduction

Compress each node embedding into a scalar weight via AutoEncoder/MLP.

Phase 3: Fusion and Prediction

Fuse the per-node scalar weights with subject-level omics data via element-wise multiplication (feature reweighting): each omics feature is scaled by the weight of its network node.
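The three phases can be sketched end to end with NumPy. This is a simplified illustration under stated assumptions, not the library's implementation: a single dense GCN-style propagation step stands in for the GAT/GCN/SAGE/GIN backbone, a tanh encoder plus linear projection stands in for the AutoEncoder and scalar projection, all weights are random and untrained, and the names `gcn_propagate` and `dpmon_forward_sketch` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def gcn_propagate(adj, h, w):
    """One GCN-style step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))   # degrees >= 1 for non-negative weights
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ w, 0.0)

def dpmon_forward_sketch(adj, node_feats, omics, hidden_dim=16, encoding_dim=8):
    """Phases 1-3 with random (untrained) weights; returns the reweighted omics matrix."""
    n_in = node_feats.shape[1]
    # Phase 1: node embeddings from the feature network and clinical-correlation features
    emb = gcn_propagate(adj, node_feats, rng.normal(scale=0.1, size=(n_in, hidden_dim)))
    # Phase 2: compress each node embedding to one scalar weight (hypothetical layers)
    code = np.tanh(emb @ rng.normal(scale=0.1, size=(hidden_dim, encoding_dim)))
    weights = (code @ rng.normal(scale=0.1, size=(encoding_dim, 1))).ravel()  # (n_nodes,)
    # Phase 3: scale omics column j (feature j) by the weight of node j
    return omics * weights[None, :]                 # (n_samples, n_nodes), fed to the classifier

# Toy example: 5 samples, 4 omics features (graph nodes), 3 clinical variables
adj = np.abs(rng.normal(size=(4, 4)))
adj = (adj + adj.T) / 2                             # symmetric, non-negative edge weights
np.fill_diagonal(adj, 0.0)
node_feats = rng.random((4, 3))                     # stand-in for |correlation| vectors
omics = rng.normal(size=(5, 4))
fused = dpmon_forward_sketch(adj, node_feats, omics)
print(fused.shape)  # (5, 4)
```

Because every sample's value for feature j is multiplied by the same scalar, the fusion changes how strongly each feature contributes to the classifier without mixing features together.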

Notes

The node embeddings are not precomputed: they are optimized jointly with the classifier, end to end, using the loss function:

\[L_{total} = L_{classification} + \lambda L_{regularization}\]
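A minimal worked example of the total loss, assuming the classification term is the mean cross-entropy over samples and the regularization term is the squared L2 norm of the model parameters; both choices, and the helper names, are illustrative assumptions rather than the library's code.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy from raw logits (log-softmax computed internally)."""
    shifted = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def total_loss(logits, labels, params, lam):
    """L_total = L_classification + lam * L_regularization (squared L2 assumed)."""
    l_cls = cross_entropy(logits, labels)
    l_reg = sum(float(np.sum(p ** 2)) for p in params)
    return l_cls + lam * l_reg

logits = np.zeros((4, 2))        # uninformative classifier: CE = log(2)
labels = np.array([0, 1, 1, 0])
params = [np.array([1.0, 2.0])]  # squared L2 norm = 5
print(round(total_loss(logits, labels, params, lam=0.01), 4))  # 0.7431
```

Here the classification term contributes log 2 ≈ 0.6931 and the penalty contributes 0.01 × 5 = 0.05; a larger λ pushes the optimizer toward smaller parameters at the cost of fit.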

The fusion acts as a Network-Guided Attention Mechanism, amplifying features that are topologically central.

Functions

`average_precision_score`(y_true, y_score, *)	Compute average precision (AP) from prediction scores.
`f1_score`(y_true, y_pred, *[, labels, ...])	Compute the F1 score, also known as balanced F-score or F-measure.
`get_logger`(name)	Retrieves a global logger configured to write to 'bioneuralnet.log'.
`label_binarize`(y, *, classes[, neg_label, ...])	Binarize labels in a one-vs-all fashion.
`matthews_corrcoef`(y_true, y_pred, *[, ...])	Compute the Matthews correlation coefficient (MCC).
`pointbiserialr`(x, y)	Calculate a point biserial correlation coefficient and its p-value.
`precision_score`(y_true, y_pred, *[, labels, ...])	Compute the precision.
`prepare_node_features`(adjacency_matrix, ...)	Build node-level features and return a PyTorch Geometric graph.
`recall_score`(y_true, y_pred, *[, labels, ...])	Compute the recall.
`roc_auc_score`(y_true, y_score, *[, average, ...])	Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.
`run_hyperparameter_tuning`(X_train, y_train, ...)	Run Ray Tune hyperparameter search with inner k-fold CV.
`run_standard_training`(dpmon_params, ...[, ...])
`set_seed`(seed_value)	Sets seeds for maximum reproducibility across Python, NumPy, and PyTorch.
`setup_device`(gpu, cuda)
`slice_omics_datasets`(omics_dataset, ...[, ...])
`train_model`(model, criterion, optimizer, ...)
`train_test_split`(*arrays[, test_size, ...])	Split arrays or matrices into random train and test subsets.

Classes

`ASHAScheduler`	alias of `AsyncHyperBandScheduler`
`AutoEncoder`(*args, **kwargs)	Compresses high-dimensional node embeddings into a lower-dimensional latent space.
`BasicVariantGenerator`([points_to_evaluate, ...])	Uses Tune's variant generation for resolving variables.
`CLIReporter`(*[, metric_columns, ...])	Command-line reporter
`Checkpoint`(path[, filesystem])	A reference to data persisted as a directory in local or remote storage.
`DPMON`(adjacency_matrix, omics_list, ...[, ...])	DPMON (Disease Prediction using Multi-Omics Networks) end-to-end pipeline for multi-omics disease prediction.
`DownstreamTaskNN`(*args, **kwargs)	MLP for final prediction - outputs raw logits.
`GAT`(*args, **kwargs)	Graph Attention Network - uses edge_dim=1 to incorporate edge weights.
`GCN`(*args, **kwargs)	Graph Convolutional Network
`GIN`(*args, **kwargs)	Graph Isomorphism Network - uses GINEConv for edge-weight awareness.
`MLPProjection`(*args, **kwargs)
`NeuralNetwork`(*args, **kwargs)	Core DPMON model combining GNN feature weighting and sample-level prediction.
`Path`(*args, **kwargs)	PurePath subclass that can make system calls.
`RepeatedStratifiedKFold`(*[, n_splits, ...])	Repeated class-wise stratified K-Fold cross validator.
`SAGE`(*args, **kwargs)	GraphSAGE - aligned layer_num convention.
`ScalarProjection`(*args, **kwargs)
`StratifiedKFold`([n_splits, shuffle, ...])	Class-wise stratified K-Fold cross-validator.
`TrialPlateauStopper`(metric[, std, ...])	Early stop single trials when they reached a plateau.

Exceptions

TuneError

General error class raised by ray.tune.

class bioneuralnet.downstream_task.dpmon.AutoEncoder(*args: Any, **kwargs: Any)

Bases: Module

Compresses high-dimensional node embeddings into a lower-dimensional latent space.

Parameters:

input_dim – Input feature dimension: the GNN output width (gnn_hidden_dim, or gnn_hidden_dim * heads for GAT with heads > 1).
encoding_dim – Output latent dimension.
architecture – “original” or “dynamic”. “original”: input -> 8 -> 4 -> encoding_dim. “dynamic”: input -> input//2 -> encoding_dim.
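The two architectures can be compared by their encoder layer widths. This sketch reads the “original” chain as input -> 8 -> 4 -> encoding_dim; the helper names `encoder_dims` and `encode`, the random weights, and the ReLU between layers are illustrative assumptions, not the library's code.

```python
import numpy as np

def encoder_dims(input_dim, encoding_dim, architecture="original"):
    """Encoder layer widths for the two documented AutoEncoder architectures."""
    if architecture == "original":
        return [input_dim, 8, 4, encoding_dim]
    if architecture == "dynamic":
        return [input_dim, input_dim // 2, encoding_dim]
    raise ValueError(f"unknown architecture: {architecture!r}")

def encode(x, dims, seed=0):
    """Forward pass through a random, untrained linear stack with ReLU between layers."""
    rng = np.random.default_rng(seed)
    h = x
    for i, (d_in, d_out) in enumerate(zip(dims[:-1], dims[1:])):
        h = h @ rng.normal(scale=d_in ** -0.5, size=(d_in, d_out))
        if i < len(dims) - 2:              # no activation on the final (latent) layer
            h = np.maximum(h, 0.0)
    return h

print(encoder_dims(16, 8))             # [16, 8, 4, 8]
print(encoder_dims(16, 8, "dynamic"))  # [16, 8, 8]
node_embeddings = np.ones((10, 16))    # 10 nodes, gnn_hidden_dim = 16
print(encode(node_embeddings, encoder_dims(16, 8)).shape)  # (10, 8)
```

With the DPMON defaults (gnn_hidden_dim=16, ae_encoding_dim=8), the “original” chain narrows to 4 and widens back to 8, whereas “dynamic” scales its hidden width with the input dimension.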

forward(x)

class bioneuralnet.downstream_task.dpmon.DPMON(adjacency_matrix: DataFrame, omics_list: List[DataFrame], phenotype_data: DataFrame, clinical_data: DataFrame | None = None, correlation_mode: str = 'abs_pearson', model: str = 'GAT', phenotype_col: str = 'phenotype', gnn_hidden_dim: int = 16, gnn_layer_num: int = 4, gnn_dropout: float = 0.1, gnn_activation: str = 'relu', dim_reduction: str = 'ae', ae_architecture: str = 'original', ae_encoding_dim: int = 8, nn_hidden_dim1: int = 16, nn_hidden_dim2: int = 8, num_epochs: int = 100, repeat_num: int = 1, n_folds: int = 5, lr: float = 0.1, weight_decay: float = 0.0001, gat_heads: int = 1, tune: bool = False, tune_trials: int = 20, gpu: bool = False, cv: bool = False, cuda: int = 0, seed: int = 1804, seed_trials: bool = False, output_dir: str | None = None)

Bases: object

DPMON (Disease Prediction using Multi-Omics Networks) end-to-end pipeline for multi-omics disease prediction.

Rather than training node embeddings with a node-level MSE regression objective, DPMON fuses the node embeddings with patient-level omics data and feeds the result to a downstream classification head (e.g., DownstreamTaskNN, an MLP that outputs raw logits, trained with CrossEntropyLoss, which applies the softmax internally) for sample-level disease prediction. This end-to-end setup leverages both local (node-level) network information and global (patient-level) omics information.

adjacency_matrix

Adjacency matrix of the feature-level network; index/columns are feature names.

Type: pd.DataFrame

omics_list

List of omics data matrices or a single merged omics DataFrame (samples x features).

Type: List[pd.DataFrame] | pd.DataFrame

phenotype_data

Phenotype labels used for supervision.

Type: pd.DataFrame | pd.Series

clinical_data

Optional clinical covariates (samples x clinical features); may be None.

Type: Optional[pd.DataFrame]

correlation_mode

How node features are computed from clinical correlations: “abs_pearson” (default) or “adaptive”; see prepare_node_features.

Type: str

phenotype_col

Column name in phenotype_data that stores the target labels.

Type: str

model

GNN backbone; one of {“GCN”, “GAT”, “SAGE”, “GIN”}.

Type: str

gnn_hidden_dim

Hidden dimension size of GNN layers.

Type: int

gnn_layer_num

Number of stacked GNN layers.

Type: int

gnn_dropout

Dropout rate applied within the GNN.

Type: float

gnn_activation

Non-linear activation used in GNN layers (e.g., “relu”).

Type: str

dim_reduction

Dimensionality reduction strategy applied to the GNN node embeddings before fusion (e.g., “ae” for autoencoder).

Type: str

ae_architecture

AutoEncoder architecture if dim_reduction=“ae”: “original” or “dynamic” (see AutoEncoder).

Type: str

ae_encoding_dim

Encoding dimension of the autoencoder bottleneck if dim_reduction=“ae”.

Type: int

nn_hidden_dim1

Hidden dimension of the first fully connected layer in the downstream classifier.

Type: int

nn_hidden_dim2

Hidden dimension of the second fully connected layer in the downstream classifier.

Type: int

num_epochs

Number of training epochs per run.

Type: int

repeat_num

Number of repeated training runs (for repeated train/test splits or repeated CV).

Type: int

n_folds

Number of folds to use when cv=True.

Type: int

lr

Learning rate for the optimizer.

Type: float

weight_decay

L2 weight decay (regularization) coefficient.

Type: float

gat_heads

Number of attention heads when model=“GAT”; with gat_heads > 1 the GNN output width is gnn_hidden_dim * gat_heads.

Type: int

tune

If True, perform hyperparameter tuning before final training.

Type: bool

tune_trials

Number of trials to perform if tune=True.

Type: int

gpu

If True, use GPU if available.

Type: bool

cv

If True, use K-fold cross-validation; otherwise use repeated train/test splits.

Type: bool
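The two evaluation modes can be sketched with scikit-learn splitters. The module imports `RepeatedStratifiedKFold` and `train_test_split`, but exactly how DPMON combines them with `n_folds`, `repeat_num`, and `seed` is not documented here; the helper `make_splits`, the stratification, and the 20% test fraction are assumptions.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold, train_test_split

def make_splits(y, cv=False, n_folds=5, repeat_num=1, test_size=0.2, seed=1804):
    """Yield (train_idx, test_idx) pairs for the two evaluation modes (hypothetical helper)."""
    y = np.asarray(y)
    placeholder_x = np.zeros(len(y))  # stratified splitters only need y
    if cv:
        # cv=True: n_folds stratified folds, repeated repeat_num times
        splitter = RepeatedStratifiedKFold(n_splits=n_folds, n_repeats=repeat_num,
                                           random_state=seed)
        yield from splitter.split(placeholder_x, y)
    else:
        # cv=False: repeat_num independent stratified train/test splits
        idx = np.arange(len(y))
        for r in range(repeat_num):
            train_idx, test_idx = train_test_split(
                idx, test_size=test_size, stratify=y, random_state=seed + r)
            yield train_idx, test_idx

y = np.array([0] * 10 + [1] * 10)
print(len(list(make_splits(y, cv=True, n_folds=5, repeat_num=2))))  # 10
print(len(list(make_splits(y, cv=False, repeat_num=3))))            # 3
```

K-fold evaluation scores every sample exactly once per repeat, while repeated random splits may reuse or skip samples across repeats; stratification keeps the class balance of each test set close to that of the full cohort.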

cuda

CUDA device index to use when gpu=True.

Type: int

seed

Random seed for reproducibility.

Type: int

seed_trials

If True, use a fixed seed for hyperparameter sampling to ensure reproducibility across trials.

Type: bool

output_dir

Directory where logs, checkpoints, and results are written.

Type: Path

run() → Tuple[pd.DataFrame, object, torch.Tensor | None]

Execute the DPMON pipeline.

This method aligns the graph and omics features, optionally performs hyperparameter tuning, and then trains and evaluates the chosen GNN model using either K-fold cross-validation (cv=True) or repeated train/test splits (cv=False). It returns prediction outputs, a metrics/config object, and optionally the learned embeddings.
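The alignment step can be illustrated with pandas. This is a sketch with a hypothetical helper (`align_features`), not the library's implementation: it keeps only the features that appear both as graph nodes and as omics columns, and orders the omics columns to match the node order, so that column j of the omics matrix and node j of the graph refer to the same feature.

```python
import pandas as pd

def align_features(adjacency: pd.DataFrame, omics: pd.DataFrame):
    """Restrict graph and omics to shared features, in the graph's node order."""
    shared = [f for f in adjacency.index if f in omics.columns]
    adj_aligned = adjacency.loc[shared, shared]
    omics_aligned = omics.loc[:, shared]
    return adj_aligned, omics_aligned

nodes = ["g1", "g2", "g3"]
adj = pd.DataFrame([[0, 1, 0], [1, 0, 1], [0, 1, 0]], index=nodes, columns=nodes, dtype=float)
omics = pd.DataFrame({"g3": [0.1, 0.2], "g1": [1.0, 2.0], "extra": [5.0, 6.0]},
                     index=["s1", "s2"])
a, o = align_features(adj, omics)
print(list(o.columns))  # ['g1', 'g3']
```

Features missing from either side are dropped rather than imputed, which keeps the element-wise reweighting well defined.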

Returns:

A tuple (predictions_df, metrics, embeddings) where:

predictions_df (pd.DataFrame): If cv=False, per-sample predictions with actual vs. predicted labels; if cv=True, aggregated CV performance or fold-level results, depending on the backend.
metrics (object): Dictionary or configuration object containing evaluation metrics and, when tuning is enabled, information about the selected hyperparameters.
embeddings (torch.Tensor | None): Learned embedding tensor (e.g., node or sample embeddings) if produced by the training routine, otherwise None.

Return type:

Tuple[pd.DataFrame, object, torch.Tensor | None]
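A minimal usage sketch. The constructor arguments and the three return values come from the DPMON signature and run() documentation above; the synthetic data, its sizes, and the helpers `make_toy_inputs` and `run_dpmon` are illustrative assumptions, and the call has not been checked against a particular library version.

```python
import numpy as np
import pandas as pd

def make_toy_inputs(n_samples=40, n_features=6, n_clinical=3, seed=0):
    """Synthetic, consistently indexed inputs (illustrative only)."""
    rng = np.random.default_rng(seed)
    features = [f"feat_{i}" for i in range(n_features)]
    samples = [f"s{i}" for i in range(n_samples)]
    w = np.abs(rng.normal(size=(n_features, n_features)))
    w = (w + w.T) / 2                                   # symmetric edge weights
    np.fill_diagonal(w, 0.0)
    adjacency = pd.DataFrame(w, index=features, columns=features)
    omics = pd.DataFrame(rng.normal(size=(n_samples, n_features)),
                         index=samples, columns=features)
    # Labels stored under the default phenotype_col name, "phenotype"
    phenotype = pd.DataFrame({"phenotype": rng.integers(0, 2, size=n_samples)}, index=samples)
    clinical = pd.DataFrame(rng.normal(size=(n_samples, n_clinical)), index=samples,
                            columns=[f"clin_{i}" for i in range(n_clinical)])
    return adjacency, omics, phenotype, clinical

def run_dpmon(adjacency, omics, phenotype, clinical):
    # Imported lazily so the data helper above runs without bioneuralnet installed.
    from bioneuralnet.downstream_task.dpmon import DPMON
    dpmon = DPMON(
        adjacency_matrix=adjacency,
        omics_list=[omics],
        phenotype_data=phenotype,
        clinical_data=clinical,
        model="GAT",
        num_epochs=50,
        cv=True,
        n_folds=5,
        seed=1804,
    )
    predictions_df, metrics, embeddings = dpmon.run()
    return predictions_df, metrics, embeddings

adjacency, omics, phenotype, clinical = make_toy_inputs()
# predictions_df, metrics, embeddings = run_dpmon(adjacency, omics, phenotype, clinical)
```

Keeping the adjacency index, adjacency columns, and omics columns on the same feature names, and the omics, phenotype, and clinical frames on the same sample index, avoids silent misalignment during fusion.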

class bioneuralnet.downstream_task.dpmon.DownstreamTaskNN(*args: Any, **kwargs: Any)

Bases: Module

MLP for final prediction - outputs raw logits.

forward(x)
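Because the head returns raw logits, class probabilities and predicted labels have to be derived downstream. A minimal NumPy sketch, assuming logits of shape (n_samples, n_classes); the helper name `logits_to_predictions` is hypothetical.

```python
import numpy as np

def logits_to_predictions(logits):
    """Softmax over classes, then argmax; logits has shape (n_samples, n_classes)."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # subtract row max to avoid overflow
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs, probs.argmax(axis=1)

logits = np.array([[2.0, -1.0], [0.5, 1.5], [-0.3, 0.1]])
probs, labels = logits_to_predictions(logits)
print(labels)                    # [0 1 1]
positive_scores = probs[:, 1]    # e.g. the y_score input to roc_auc_score in a binary task
```

Leaving the softmax out of the model keeps training numerically stable, since CrossEntropyLoss combines log-softmax and negative log-likelihood in one step.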

class bioneuralnet.downstream_task.dpmon.MLPProjection(*args: Any, **kwargs: Any)

Bases: Module

forward(x)

class bioneuralnet.downstream_task.dpmon.NeuralNetwork(*args: Any, **kwargs: Any)

Bases: Module

Core DPMON model combining GNN feature weighting and sample-level prediction. When using GAT with heads > 1, the GNN output width is hidden_dim * heads.
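The width rule follows from concatenating the per-head outputs, which is what PyTorch Geometric's GATConv does by default (concat=True). A NumPy sketch of the shape arithmetic; the helpers `gnn_output_dim` and `concat_heads` are hypothetical.

```python
import numpy as np

def gnn_output_dim(model: str, hidden_dim: int, heads: int = 1) -> int:
    """Width of each node embedding leaving the GNN backbone."""
    if model.upper() == "GAT":
        return hidden_dim * heads   # per-head outputs are concatenated
    return hidden_dim               # GCN / SAGE / GIN: no head dimension

def concat_heads(head_outputs):
    """Concatenate per-head node embeddings (each n_nodes x hidden_dim) along features."""
    return np.concatenate(head_outputs, axis=1)

per_head = [np.zeros((10, 16)) for _ in range(4)]   # 10 nodes, 4 heads, hidden_dim 16
print(concat_heads(per_head).shape)                 # (10, 64)
print(gnn_output_dim("GAT", 16, heads=4))           # 64
```

Whatever consumes the embeddings next (for example the AutoEncoder's input_dim) has to be sized to this width, not to hidden_dim alone.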

forward(omics_dataset, omics_network_tg, clinical_tensor=None)

class bioneuralnet.downstream_task.dpmon.ScalarProjection(*args: Any, **kwargs: Any)

Bases: Module

forward(x)

bioneuralnet.downstream_task.dpmon.prepare_node_features(adjacency_matrix: DataFrame, omics_datasets: List[DataFrame], clinical_data: DataFrame | None, phenotype_col: str, correlation_mode: str = 'abs_pearson') → List[torch_geometric.data.Data]

Build node-level features and return a PyTorch Geometric graph.

Parameters:

adjacency_matrix – Symmetric adjacency matrix (node names as index/columns).
omics_datasets – List of omics matrices (samples x features); only the first element is used.
clinical_data – Clinical covariates for correlation-based node features; may be None.
phenotype_col – Column name storing phenotype labels (dropped from features).
correlation_mode – How to compute node features from clinical correlations:
- “abs_pearson”: absolute Pearson correlation with no further transforms (the original DPMON behavior).
- “adaptive”: mixed correlation types, followed by a Fisher-Z transform and standardization.

Returns:

Single-element list with a PyG Data object.

Return type:

List[Data]
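The two correlation modes can be sketched in NumPy. Each node (omics feature) gets a vector of correlations with the clinical variables. The “adaptive” stand-in below uses Pearson only, because the documented “mixed correlation types” are not specified here, and standardizes per clinical variable (column), which is also an assumption; all helper names are hypothetical.

```python
import numpy as np

def pearson_matrix(omics, clinical):
    """Signed Pearson r between every omics feature and every clinical variable.

    omics: (n_samples, n_features); clinical: (n_samples, n_clinical).
    Returns (n_features, n_clinical); row i is the raw feature vector of node i.
    Constant columns (zero std) are assumed absent.
    """
    zx = (omics - omics.mean(axis=0)) / omics.std(axis=0)
    zc = (clinical - clinical.mean(axis=0)) / clinical.std(axis=0)
    return zx.T @ zc / omics.shape[0]

def abs_pearson_features(omics, clinical):
    """Sketch of "abs_pearson": absolute correlations, no further transforms."""
    return np.abs(pearson_matrix(omics, clinical))

def fisher_z_standardized_features(omics, clinical, eps=1e-6):
    """Pearson-only stand-in for "adaptive": Fisher-Z, then per-column z-scores."""
    z = np.arctanh(np.clip(pearson_matrix(omics, clinical), -1 + eps, 1 - eps))
    return (z - z.mean(axis=0)) / z.std(axis=0)

rng = np.random.default_rng(0)
clinical = rng.normal(size=(50, 2))
omics = np.column_stack([clinical[:, 0], -clinical[:, 1], rng.normal(size=50)])
feats = abs_pearson_features(omics, clinical)
print(feats.shape)                # (3, 2)
print(np.round(feats[0, 0], 3))   # 1.0
```

Taking the absolute value treats strong negative and strong positive associations alike, while the Fisher-Z transform spreads out correlations near ±1 so that differences between highly correlated features remain visible after standardization.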

bioneuralnet.downstream_task.dpmon.run_hyperparameter_tuning(X_train, y_train, adjacency_matrix, clinical_data, dpmon_params) → Dict[str, Any]

Run Ray Tune hyperparameter search with inner k-fold CV.

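Each trial trains one model per inner fold, advancing all fold models together one epoch at a time, and reports the validation metrics averaged across folds. ASHA (ASHAScheduler) makes its early-stopping decisions on this averaged signal, which is more stable than the metric from a single validation split.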

Parameters:

X_train – Training features for this outer fold (pd.DataFrame).
y_train – Training labels for this outer fold (pd.Series).
adjacency_matrix – Feature-level adjacency matrix.
clinical_data – Clinical covariates for the training fold.
dpmon_params – Full DPMON parameter dictionary.

Returns:

Dict with the best hyperparameter configuration.
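The epoch-synchronised loop can be sketched in plain Python. In a real Ray Tune trainable the report step would go through Ray's reporting API and the scheduler would stop the trial; here `report` is a plain callback whose True return value simulates that stop signal. `FoldTrainer`, `run_trial`, and `ToyFold` are hypothetical names.

```python
import statistics
from typing import Callable, List, Protocol

class FoldTrainer(Protocol):
    """Anything that can train for one epoch and return a validation score."""
    def train_one_epoch(self) -> None: ...
    def validate(self) -> float: ...

def run_trial(folds: List[FoldTrainer], num_epochs: int,
              report: Callable[[int, float], bool]) -> float:
    """Train all inner-fold models in lockstep and report the fold-averaged metric.

    `report(epoch, mean_metric)` stands in for the scheduler interface: returning
    True means "stop this trial early".
    """
    mean_metric = float("nan")
    for epoch in range(1, num_epochs + 1):
        for fold in folds:                 # every fold advances by exactly one epoch
            fold.train_one_epoch()
        mean_metric = statistics.fmean(fold.validate() for fold in folds)
        if report(epoch, mean_metric):     # decision uses the averaged signal only
            break
    return mean_metric

class ToyFold:
    """Fake fold whose validation score rises by a fixed step each epoch."""
    def __init__(self, step: float):
        self.step, self.score = step, 0.0
    def train_one_epoch(self) -> None:
        self.score = min(1.0, self.score + self.step)
    def validate(self) -> float:
        return self.score

history = []
def report(epoch: int, metric: float) -> bool:
    history.append((epoch, metric))
    return epoch >= 2                      # pretend the scheduler stops the trial after epoch 2

final = run_trial([ToyFold(0.1), ToyFold(0.3)], num_epochs=10, report=report)
print(len(history), round(final, 2))  # 2 0.4
```

Averaging before reporting means one unlucky fold cannot by itself trigger an early stop, which is the stability argument made above.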

bioneuralnet.downstream_task.dpmon.run_standard_training(dpmon_params, adjacency_matrix, combined_omics, clinical_data, seed, cv=False, output_dir=None)

bioneuralnet.downstream_task.dpmon.setup_device(gpu, cuda)

bioneuralnet.downstream_task.dpmon.slice_omics_datasets(omics_dataset: DataFrame, adjacency_matrix: DataFrame, phenotype_col: str = 'phenotype') → List[DataFrame]

bioneuralnet.downstream_task.dpmon.train_model(model, criterion, optimizer, train_features, train_labels, epoch_num)