bioneuralnet.downstream_task.dpmon

DPMON: Optimized Network Embedding and Fusion for Disease Prediction.

This module implements an end-to-end Graph Neural Network (GNN) pipeline integrating network topology with subject-level omics data.

References

Hussein, S. et al. (2024). “Learning from Multi-Omics Networks to Enhance Disease Prediction: An Optimized Network Embedding and Fusion Approach.” IEEE BIBM.

Algorithm:

The pipeline consists of three distinct phases:

Phase 1: Task-Aware Embedding Generation
  1. Construct a multi-omics network.

  2. Initialize node features using clinical correlation vectors.

  3. Pass graph through a GNN (GAT/GCN/GIN).

Phase 2: Dimensionality Reduction

Compress embeddings into scalar weights via AutoEncoder/MLP.

Phase 3: Fusion and Prediction

Fuse embeddings with subject-level data via element-wise multiplication (Feature Reweighting).
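
The fusion step in Phase 3 can be illustrated with a minimal pure-Python sketch. It assumes Phase 2 has already produced one scalar weight per omics feature; the helper name reweight_features and the toy values are hypothetical and are not part of the bioneuralnet API.

```python
from typing import List


def reweight_features(omics: List[List[float]], weights: List[float]) -> List[List[float]]:
    """Scale each feature column of a samples-x-features matrix by its weight."""
    if any(len(row) != len(weights) for row in omics):
        raise ValueError("each sample needs exactly one value per weighted feature")
    # Element-wise multiplication, broadcast over samples (rows).
    return [[value * w for value, w in zip(row, weights)] for row in omics]


# Two subjects, three omics features (toy values).
omics = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]
# Hypothetical per-feature scalar weights, standing in for the Phase 2 output.
weights = [0.5, 1.0, 2.0]

fused = reweight_features(omics, weights)
print(fused)  # [[0.5, 2.0, 6.0], [2.0, 5.0, 12.0]]
```

Features with a weight above 1 are amplified and those below 1 are damped, which is the sense in which the network guides the downstream classifier.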

Notes

The embedding space is optimized dynamically using the loss function:

\[L_{total} = L_{classification} + \lambda L_{regularization}\]

The fusion acts as a Network-Guided Attention Mechanism, amplifying features that are topologically central.
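
As a worked example of the loss above, the sketch below evaluates it for a single sample. The regularization term is not specified here, so the code assumes a squared L2 penalty on the parameters; the function names are illustrative only.

```python
import math
from typing import List


def cross_entropy(logits: List[float], target: int) -> float:
    """Cross-entropy of one sample computed from raw logits (log-sum-exp form)."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


def l2_penalty(params: List[float]) -> float:
    """Assumed regularizer: sum of squared parameters."""
    return sum(p * p for p in params)


def total_loss(logits: List[float], target: int, params: List[float], lam: float) -> float:
    """L_total = L_classification + lambda * L_regularization."""
    return cross_entropy(logits, target) + lam * l2_penalty(params)


# Equal logits over two classes give a classification loss of ln(2).
loss = total_loss(logits=[0.0, 0.0], target=0, params=[1.0, 2.0], lam=0.01)
print(round(loss, 4))  # 0.7431 = ln(2) + 0.01 * (1 + 4)
```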

Functions

average_precision_score(y_true, y_score, *)

Compute average precision (AP) from prediction scores.

f1_score(y_true, y_pred, *[, labels, ...])

Compute the F1 score, also known as balanced F-score or F-measure.

get_logger(name)

Retrieves a global logger configured to write to 'bioneuralnet.log'.

label_binarize(y, *, classes[, neg_label, ...])

Binarize labels in a one-vs-all fashion.

matthews_corrcoef(y_true, y_pred, *[, ...])

Compute the Matthews correlation coefficient (MCC).

pointbiserialr(x, y)

Calculate a point biserial correlation coefficient and its p-value.

precision_score(y_true, y_pred, *[, labels, ...])

Compute the precision.

prepare_node_features(adjacency_matrix, ...)

Build node-level features and return a PyTorch Geometric graph.

recall_score(y_true, y_pred, *[, labels, ...])

Compute the recall.

roc_auc_score(y_true, y_score, *[, average, ...])

Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.

run_hyperparameter_tuning(X_train, y_train, ...)

Run Ray Tune hyperparameter search with inner k-fold CV.

run_standard_training(dpmon_params, ...[, ...])

set_seed(seed_value)

Sets seeds for maximum reproducibility across Python, NumPy, and PyTorch.

setup_device(gpu, cuda)

slice_omics_datasets(omics_dataset, ...[, ...])

train_model(model, criterion, optimizer, ...)

train_test_split(*arrays[, test_size, ...])

Split arrays or matrices into random train and test subsets.

Classes

ASHAScheduler

alias of AsyncHyperBandScheduler

AutoEncoder(*args, **kwargs)

Compresses high-dimensional node embeddings into a lower-dimensional latent space.

BasicVariantGenerator([points_to_evaluate, ...])

Uses Tune's variant generation for resolving variables.

CLIReporter(*[, metric_columns, ...])

Command-line reporter

Checkpoint(path[, filesystem])

A reference to data persisted as a directory in local or remote storage.

DPMON(adjacency_matrix, omics_list, ...[, ...])

DPMON (Disease Prediction using Multi-Omics Networks) end-to-end pipeline for multi-omics disease prediction.

DownstreamTaskNN(*args, **kwargs)

MLP for final prediction - outputs raw logits.

GAT(*args, **kwargs)

Graph Attention Network - uses edge_dim=1 to incorporate edge weights.

GCN(*args, **kwargs)

Graph Convolutional Network

GIN(*args, **kwargs)

Graph Isomorphism Network - uses GINEConv for edge-weight awareness.

MLPProjection(*args, **kwargs)

NeuralNetwork(*args, **kwargs)

Core DPMON model combining GNN feature weighting and sample-level prediction.

Path(*args, **kwargs)

PurePath subclass that can make system calls.

RepeatedStratifiedKFold(*[, n_splits, ...])

Repeated class-wise stratified K-Fold cross validator.

SAGE(*args, **kwargs)

GraphSAGE - aligned layer_num convention.

ScalarProjection(*args, **kwargs)

StratifiedKFold([n_splits, shuffle, ...])

Class-wise stratified K-Fold cross-validator.

TrialPlateauStopper(metric[, std, ...])

Early stop single trials when they reached a plateau.

Exceptions

TuneError

General error class raised by ray.tune.

class bioneuralnet.downstream_task.dpmon.AutoEncoder(*args: Any, **kwargs: Any)

Bases: Module

Compresses high-dimensional node embeddings into a lower-dimensional latent space.

Parameters:
  • input_dim – Input feature dimension (gnn_hidden_dim).

  • encoding_dim – Output latent dimension.

  • architecture – Either “original” or “dynamic”. “original” uses fixed layer sizes (input -> 8 -> 4 -> encoding_dim); “dynamic” scales with the input (input -> input//2 -> encoding_dim).
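
The two architecture options can be compared by listing the layer widths each one implies. This is a sketch only: encoder_layer_sizes is a hypothetical helper, and it reads the “original” layout as input -> 8 -> 4 -> encoding_dim, which is an assumption about the description above rather than a statement about the implementation.

```python
from typing import List


def encoder_layer_sizes(input_dim: int, encoding_dim: int, architecture: str = "original") -> List[int]:
    """Return the sequence of layer widths for the assumed encoder layouts."""
    if architecture == "original":
        # Assumed fixed bottleneck widths: input -> 8 -> 4 -> encoding_dim.
        return [input_dim, 8, 4, encoding_dim]
    if architecture == "dynamic":
        # Hidden width scales with the input: input -> input // 2 -> encoding_dim.
        return [input_dim, input_dim // 2, encoding_dim]
    raise ValueError(f"unknown architecture: {architecture!r}")


print(encoder_layer_sizes(16, 2, "original"))  # [16, 8, 4, 2]
print(encoder_layer_sizes(64, 2, "dynamic"))   # [64, 32, 2]
```

The “dynamic” layout keeps the first hidden layer proportional to the GNN hidden dimension, which may matter when gnn_hidden_dim is much larger than 8.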

forward(x)

class bioneuralnet.downstream_task.dpmon.DPMON(adjacency_matrix: DataFrame, omics_list: List[DataFrame], phenotype_data: DataFrame, clinical_data: DataFrame | None = None, correlation_mode: str = 'abs_pearson', model: str = 'GAT', phenotype_col: str = 'phenotype', gnn_hidden_dim: int = 16, gnn_layer_num: int = 4, gnn_dropout: float = 0.1, gnn_activation: str = 'relu', dim_reduction: str = 'ae', ae_architecture: str = 'original', ae_encoding_dim: int = 8, nn_hidden_dim1: int = 16, nn_hidden_dim2: int = 8, num_epochs: int = 100, repeat_num: int = 1, n_folds: int = 5, lr: float = 0.1, weight_decay: float = 0.0001, gat_heads: int = 1, tune: bool = False, tune_trials: int = 20, gpu: bool = False, cv: bool = False, cuda: int = 0, seed: int = 1804, seed_trials: bool = False, output_dir: str | None = None)

Bases: object

DPMON (Disease Prediction using Multi-Omics Networks) end-to-end pipeline for multi-omics disease prediction.

Instead of node-level MSE regression, DPMON aggregates node embeddings with patient-level omics data and feeds them to a downstream classification head (e.g., a softmax layer with CrossEntropyLoss) for sample-level disease prediction. This end-to-end setup leverages both local (node-level) and global (patient-level) network information.

adjacency_matrix

Adjacency matrix of the feature-level network; index/columns are feature names.

Type:

pd.DataFrame

omics_list

List of omics data matrices or a single merged omics DataFrame (samples x features).

Type:

List[pd.DataFrame] | pd.DataFrame

phenotype_data

Phenotype labels used for supervision.

Type:

pd.DataFrame | pd.Series

clinical_data

Optional clinical covariates (samples x clinical features); may be None.

Type:

Optional[pd.DataFrame]

phenotype_col

Column name in phenotype_data that stores the target labels.

Type:

str

model

GNN backbone; one of {“GCN”, “GAT”, “SAGE”, “GIN”}.

Type:

str

gnn_hidden_dim

Hidden dimension size of GNN layers.

Type:

int

gnn_layer_num

Number of stacked GNN layers.

Type:

int

gnn_dropout

Dropout rate applied within the GNN.

Type:

float

gnn_activation

Non-linear activation used in GNN layers (e.g., “relu”).

Type:

str

dim_reduction

Dimensionality reduction strategy applied to the GNN node embeddings (e.g., “ae” for autoencoder).

Type:

str

ae_encoding_dim

Encoding dimension of the autoencoder bottleneck if dim_reduction=“ae”.

Type:

int

nn_hidden_dim1

Hidden dimension of the first fully connected layer in the downstream classifier.

Type:

int

nn_hidden_dim2

Hidden dimension of the second fully connected layer in the downstream classifier.

Type:

int

num_epochs

Number of training epochs per run.

Type:

int

repeat_num

Number of repeated training runs (for repeated train/test splits or repeated CV).

Type:

int

n_folds

Number of folds to use when cv=True.

Type:

int

lr

Learning rate for the optimizer.

Type:

float

weight_decay

L2 weight decay (regularization) coefficient.

Type:

float

tune

If True, perform hyperparameter tuning before final training.

Type:

bool

tune_trials

Number of trials to perform if tune=True.

Type:

int

gpu

If True, use GPU if available.

Type:

bool

cv

If True, use K-fold cross-validation; otherwise use repeated train/test splits.

Type:

bool

cuda

CUDA device index to use when gpu=True.

Type:

int

seed

Random seed for reproducibility.

Type:

int

seed_trials

If True, use a fixed seed for hyperparameter sampling to ensure reproducibility across trials.

Type:

bool

output_dir

Directory where logs, checkpoints, and results are written.

Type:

Path

run() -> Tuple[pd.DataFrame, object, torch.Tensor | None]

Execute the DPMON pipeline.

This method aligns the graph and omics features, optionally performs hyperparameter tuning, and then trains and evaluates the chosen GNN model using either K-fold cross-validation (cv=True) or repeated train/test splits (cv=False). It returns prediction outputs, a metrics/config object, and optionally the learned embeddings.

Returns:

A tuple (predictions_df, metrics, embeddings) where:

  • predictions_df (pd.DataFrame) – If cv=False, per-sample predictions with actual vs. predicted labels; if cv=True, aggregated CV performance or fold-level results, depending on the backend.

  • metrics (object) – Dictionary or configuration object containing evaluation metrics and, when tuning is enabled, information about the selected hyperparameters.

  • embeddings (torch.Tensor | None) – Learned embedding tensor (e.g., node or sample embeddings) if produced by the training routine, otherwise None.

Return type:

Tuple[pd.DataFrame, object, torch.Tensor | None]
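
A typical call might look like the sketch below. The keyword names come from the DPMON signature documented above; the wrapper run_dpmon and the chosen values are illustrative assumptions, and the inputs are expected to be pandas objects prepared by the caller.

```python
from typing import Any, Dict

# Keyword arguments taken from the documented DPMON signature (values are examples).
dpmon_kwargs: Dict[str, Any] = {
    "model": "GAT",
    "phenotype_col": "phenotype",
    "gnn_hidden_dim": 16,
    "gnn_layer_num": 4,
    "num_epochs": 100,
    "cv": True,       # K-fold cross-validation instead of repeated train/test splits
    "n_folds": 5,
    "tune": False,    # skip hyperparameter tuning
    "gpu": False,
    "seed": 1804,
}


def run_dpmon(adjacency_matrix, omics_list, phenotype_data, clinical_data=None):
    """Hypothetical wrapper: build a DPMON pipeline and unpack the result of run()."""
    # Imported lazily so this sketch can be loaded without bioneuralnet installed.
    from bioneuralnet.downstream_task.dpmon import DPMON

    pipeline = DPMON(
        adjacency_matrix=adjacency_matrix,
        omics_list=omics_list,
        phenotype_data=phenotype_data,
        clinical_data=clinical_data,
        **dpmon_kwargs,
    )
    predictions_df, metrics, embeddings = pipeline.run()
    return predictions_df, metrics, embeddings
```

Because embeddings may be None, callers should check it before using it.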

class bioneuralnet.downstream_task.dpmon.DownstreamTaskNN(*args: Any, **kwargs: Any)

Bases: Module

MLP for final prediction - outputs raw logits.

forward(x)

class bioneuralnet.downstream_task.dpmon.MLPProjection(*args: Any, **kwargs: Any)

Bases: Module

forward(x)

class bioneuralnet.downstream_task.dpmon.NeuralNetwork(*args: Any, **kwargs: Any)

Bases: Module

Core DPMON model combining GNN feature weighting and sample-level prediction. When using GAT with heads > 1, the GNN output dimension is hidden_dim * heads.

forward(omics_dataset, omics_network_tg, clinical_tensor=None)

class bioneuralnet.downstream_task.dpmon.ScalarProjection(*args: Any, **kwargs: Any)

Bases: Module

forward(x)

bioneuralnet.downstream_task.dpmon.prepare_node_features(adjacency_matrix: DataFrame, omics_datasets: List[DataFrame], clinical_data: DataFrame | None, phenotype_col: str, correlation_mode: str = 'abs_pearson') -> List[torch_geometric.data.Data]

Build node-level features and return a PyTorch Geometric graph.

Parameters:
  • adjacency_matrix – Symmetric adjacency matrix (node names as index/columns).

  • omics_datasets – List of omics matrices (samples x features); first element used.

  • clinical_data – Clinical covariates for correlation-based node features; may be None.

  • phenotype_col – Column name storing phenotype labels (dropped from features).

  • correlation_mode – How to compute node features from clinical correlations:

      - “abs_pearson”: absolute Pearson correlation with no transforms (the original DPMON behavior).

      - “adaptive”: mixed correlation types, plus a Fisher-Z transform and standardization.

Returns:

Single-element list with a PyG Data object.

Return type:

List[Data]
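
To make the “abs_pearson” mode concrete, the sketch below computes one node-feature vector per omics feature as its absolute Pearson correlation with each clinical variable. It is a pure-Python illustration under stated assumptions, not the library's implementation: the helper names are hypothetical, and treating a constant column as uncorrelated (0.0) is a choice made here.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant column carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def abs_pearson_node_features(
    omics_columns: List[List[float]], clinical_columns: List[List[float]]
) -> List[List[float]]:
    """One row per omics feature (node), one column per clinical variable."""
    return [[abs(pearson(f, c)) for c in clinical_columns] for f in omics_columns]


# Two omics features and one clinical variable measured on four subjects (toy data).
omics_columns = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
clinical_columns = [[2.0, 4.0, 6.0, 8.0]]

features = abs_pearson_node_features(omics_columns, clinical_columns)
```

Both toy features are perfectly linear in the clinical variable, one increasing and one decreasing, so taking the absolute value gives them the same feature value of 1.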

bioneuralnet.downstream_task.dpmon.run_hyperparameter_tuning(X_train, y_train, adjacency_matrix, clinical_data, dpmon_params) -> Dict[str, Any]

Run Ray Tune hyperparameter search with inner k-fold CV.

Each trial trains one model per inner fold, with the folds synchronised epoch by epoch, and reports the mean validation metrics. ASHA early-stops on this averaged signal, which is far more stable than the signal from a single split.

Parameters:
  • X_train – Training features for this outer fold (pd.DataFrame).

  • y_train – Training labels for this outer fold (pd.Series).

  • adjacency_matrix – Feature-level adjacency matrix.

  • clinical_data – Clinical covariates for the training fold.

  • dpmon_params – Full DPMON parameter dictionary.

Returns:

Dict with the best hyperparameter configuration.
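
The epoch-synchronised averaging described above can be sketched without Ray Tune. In this illustration each inner fold is represented by a toy function that stands in for "train one epoch and return the validation loss"; the names and numbers are hypothetical, and in the real routine the per-epoch mean is what the scheduler would see.

```python
from statistics import mean
from typing import Callable, List


def epoch_synchronised_means(
    fold_steps: List[Callable[[int], float]], num_epochs: int
) -> List[float]:
    """Advance every inner fold by one epoch, then record the mean validation metric."""
    reported = []
    for epoch in range(num_epochs):
        fold_metrics = [step(epoch) for step in fold_steps]  # one value per fold
        reported.append(mean(fold_metrics))  # the averaged signal for this epoch
    return reported


def make_fold(offset: float) -> Callable[[int], float]:
    """Toy fold: validation loss decays with the epoch, shifted by a per-fold offset."""
    return lambda epoch: offset + 1.0 / (epoch + 1)


folds = [make_fold(0.0), make_fold(0.2), make_fold(0.4)]
curve = epoch_synchronised_means(folds, num_epochs=3)
```

Averaging across folds before reporting smooths out the fold-to-fold offsets, so an early-stopping decision is less likely to hinge on one unlucky split.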

bioneuralnet.downstream_task.dpmon.run_standard_training(dpmon_params, adjacency_matrix, combined_omics, clinical_data, seed, cv=False, output_dir=None)

bioneuralnet.downstream_task.dpmon.setup_device(gpu, cuda)

bioneuralnet.downstream_task.dpmon.slice_omics_datasets(omics_dataset: DataFrame, adjacency_matrix: DataFrame, phenotype_col: str = 'phenotype') -> List[DataFrame]

bioneuralnet.downstream_task.dpmon.train_model(model, criterion, optimizer, train_features, train_labels, epoch_num)