bioneuralnet.downstream_task

Downstream task pipelines for BioNeuralNet.

This module implements high-level workflows for analyzing patient data using network-derived insights. It includes DPMON (Disease Prediction using Multi-Omics Networks), an end-to-end pipeline that leverages GNNs (GCN, GAT, SAGE, GIN) to learn feature importance weights for supervised phenotype prediction. Additionally, it provides SubjectRepresentation, a class for fusing learned network embeddings with raw omics data via dimensionality reduction (AutoEncoder or PCA) to generate enriched patient profiles.

Classes

`DPMON`(adjacency_matrix, omics_list, ...[, ...])	DPMON (Disease Prediction using Multi-Omics Networks) end-to-end pipeline for multi-omics disease prediction.
`SubjectRepresentation`(omics_data, embeddings)	SubjectRepresentation Class for Integrating Network Embeddings into Omics Data.

class bioneuralnet.downstream_task.DPMON(adjacency_matrix: DataFrame, omics_list: List[DataFrame], phenotype_data: DataFrame, clinical_data: DataFrame | None = None, correlation_mode: str = 'abs_pearson', model: str = 'GAT', phenotype_col: str = 'phenotype', gnn_hidden_dim: int = 16, gnn_layer_num: int = 4, gnn_dropout: float = 0.1, gnn_activation: str = 'relu', dim_reduction: str = 'ae', ae_architecture: str = 'original', ae_encoding_dim: int = 8, nn_hidden_dim1: int = 16, nn_hidden_dim2: int = 8, num_epochs: int = 100, repeat_num: int = 1, n_folds: int = 5, lr: float = 0.1, weight_decay: float = 0.0001, gat_heads: int = 1, tune: bool = False, tune_trials: int = 20, gpu: bool = False, cv: bool = False, cuda: int = 0, seed: int = 1804, seed_trials: bool = False, output_dir: str | None = None)

Bases: object

DPMON (Disease Prediction using Multi-Omics Networks) end-to-end pipeline for multi-omics disease prediction.

Instead of node-level MSE regression, DPMON aggregates node embeddings with patient-level omics data and feeds them to a downstream classification head (e.g., a softmax layer with CrossEntropyLoss) for sample-level disease prediction. This end-to-end setup leverages both local (node-level) and global (patient-level) network information.
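The last stage described above (a classification head with softmax and cross-entropy applied to one patient-level vector) can be sketched in plain Python. This is only an illustration of the idea, not DPMON's implementation: the helper names `linear_head`, `softmax`, and `cross_entropy`, the patient vector, and the head parameters are all made up for the example.

```python
import math


def linear_head(features, weights, biases):
    """Return one logit per class: dot(features, weights[c]) + biases[c]."""
    return [
        sum(f * w for f, w in zip(features, row)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_class):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[true_class])


# Hypothetical patient vector: omics values after fusion with network information.
patient = [0.8, 0.1, 0.5]
# Hypothetical head parameters for a two-class problem.
weights = [[1.0, -1.0, 0.5], [-1.0, 1.0, -0.5]]
biases = [0.0, 0.0]

logits = linear_head(patient, weights, biases)
probs = softmax(logits)
loss = cross_entropy(probs, true_class=0)
predicted = probs.index(max(probs))
```

During training, the loss would be backpropagated through both the head and the GNN, which is what makes the setup end-to-end.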

adjacency_matrix

Adjacency matrix of the feature-level network; index/columns are feature names.

Type: pd.DataFrame

omics_list

List of omics data matrices or a single merged omics DataFrame (samples x features).

Type: List[pd.DataFrame] | pd.DataFrame

phenotype_data

Phenotype labels used for supervision.

Type: pd.DataFrame | pd.Series

clinical_data

Optional clinical covariates (samples x clinical features); may be None.

Type: Optional[pd.DataFrame]

phenotype_col

Column name in phenotype_data that stores the target labels.

Type: str

model

GNN backbone; one of {"GCN", "GAT", "SAGE", "GIN"}.

Type: str

gnn_hidden_dim

Hidden dimension size of GNN layers.

Type: int

gnn_layer_num

Number of stacked GNN layers.

Type: int

gnn_dropout

Dropout rate applied within the GNN.

Type: float

gnn_activation

Non-linear activation used in GNN layers (e.g., "relu").

Type: str

dim_reduction

Dimensionality reduction strategy for omics input (e.g., "ae" for autoencoder).

Type: str

ae_encoding_dim

Encoding dimension of the autoencoder bottleneck if dim_reduction="ae".

Type: int

nn_hidden_dim1

Hidden dimension of the first fully connected layer in the downstream classifier.

Type: int

nn_hidden_dim2

Hidden dimension of the second fully connected layer in the downstream classifier.

Type: int

num_epochs

Number of training epochs per run.

Type: int

repeat_num

Number of repeated training runs (for repeated train/test splits or repeated CV).

Type: int

n_folds

Number of folds to use when cv=True.

Type: int

lr

Learning rate for the optimizer.

Type: float

weight_decay

L2 weight decay (regularization) coefficient.

Type: float

tune

If True, perform hyperparameter tuning before final training.

Type: bool

tune_trials

Number of trials to perform if tune=True.

Type: int

gpu

If True, use GPU if available.

Type: bool

cv

If True, use K-fold cross-validation; otherwise use repeated train/test splits.

Type: bool

cuda

CUDA device index to use when gpu=True.

Type: int

seed

Random seed for reproducibility.

Type: int

seed_trials

If True, use a fixed seed for hyperparameter sampling to ensure reproducibility across trials.

Type: bool

output_dir

Directory where logs, checkpoints, and results are written.

Type: Path

run() → Tuple[pd.DataFrame, object, torch.Tensor | None]

Execute the DPMON pipeline.

This method aligns the graph and omics features, optionally performs hyperparameter tuning, and then trains and evaluates the chosen GNN model using either K-fold cross-validation (cv=True) or repeated train/test splits (cv=False). It returns prediction outputs, a metrics/config object, and optionally the learned embeddings.
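The difference between the two evaluation modes can be illustrated with a small index-splitting sketch in plain Python. It is not how DPMON partitions samples internally; the function names and the `test_fraction` parameter are hypothetical, and only `n_folds`, `repeat_num`, and `seed` mirror the documented parameters.

```python
import random


def k_fold_splits(n_samples, n_folds, seed):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds roughly equal folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx


def repeated_holdout_splits(n_samples, repeat_num, seed, test_fraction=0.2):
    """Yield repeat_num independently shuffled train/test splits.

    test_fraction is a hypothetical parameter introduced for this sketch.
    """
    n_test = max(1, round(n_samples * test_fraction))
    for r in range(repeat_num):
        indices = list(range(n_samples))
        # A different seed per repeat gives a different shuffle each time.
        random.Random(seed + r).shuffle(indices)
        yield indices[n_test:], indices[:n_test]


cv_splits = list(k_fold_splits(n_samples=10, n_folds=5, seed=1804))
holdout_splits = list(repeated_holdout_splits(n_samples=10, repeat_num=3, seed=1804))
```

With K-fold, the test sets partition the samples, so each patient contributes to exactly one test score. With repeated hold-out splits, test sets can overlap across repeats.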

Returns:

A tuple (predictions_df, metrics, embeddings) where:

predictions_df (pd.DataFrame): If cv=False, per-sample predictions with actual vs. predicted labels; if cv=True, aggregated CV performance or fold-level results, depending on the backend.

metrics (object): Dictionary or configuration object containing evaluation metrics and, when tuning is enabled, information about the selected hyperparameters.

embeddings (torch.Tensor | None): Learned embedding tensor (e.g., node or sample embeddings) if produced by the training routine, otherwise None.

Return type:

Tuple[pd.DataFrame, object, torch.Tensor | None]
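A minimal usage sketch, assuming the constructor signature and the three-element return value documented above. The wrapper `run_dpmon` and the `DPMON_SETTINGS` dictionary are hypothetical names for this example. The keyword values are the documented defaults, except `cv=True`. The input DataFrames must be supplied by the caller.

```python
# Keyword values are the defaults from the DPMON signature,
# except cv=True, which selects K-fold cross-validation.
DPMON_SETTINGS = {
    "model": "GAT",
    "phenotype_col": "phenotype",
    "num_epochs": 100,
    "n_folds": 5,
    "cv": True,
    "tune": False,
    "gpu": False,
    "seed": 1804,
}


def run_dpmon(adjacency_matrix, omics_list, phenotype_data, clinical_data=None):
    """Train DPMON on caller-supplied DataFrames and return its three outputs."""
    # Imported lazily so the settings above can be inspected
    # without bioneuralnet installed.
    from bioneuralnet.downstream_task import DPMON

    dpmon = DPMON(
        adjacency_matrix=adjacency_matrix,
        omics_list=omics_list,
        phenotype_data=phenotype_data,
        clinical_data=clinical_data,
        **DPMON_SETTINGS,
    )
    # embeddings may be None if the training routine does not produce them.
    predictions_df, metrics, embeddings = dpmon.run()
    return predictions_df, metrics, embeddings
```

Callers should check `embeddings is None` before using the third element, since the documentation marks it as optional.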

class bioneuralnet.downstream_task.SubjectRepresentation(omics_data: DataFrame, embeddings: DataFrame, phenotype_data: DataFrame | None = None, phenotype_col: str = 'phenotype', reduce_method: str = 'AE', seed: int | None = None, tune: bool | None = False, output_dir: str | Path | None = None)

Bases: object

SubjectRepresentation Class for Integrating Network Embeddings into Omics Data.

This class integrates network-derived embeddings with raw omics data to create enriched subject-level profiles. It reduces the dimensionality of the embeddings (via an autoencoder or PCA) and then fuses the reduced embeddings with the original omics features.
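One way to picture the reduce-then-fuse idea is the plain-Python sketch below. It rests on two loudly stated assumptions that the text above does not confirm: each feature's embedding is collapsed to a single scalar weight (a plain mean stands in for the autoencoder or PCA), and fusion means scaling each omics column by that weight. The function names and toy values are illustrative only.

```python
def reduce_embeddings(embeddings):
    """Collapse each feature's embedding vector to one scalar weight.

    The mean is a stand-in for a learned reduction (autoencoder or PCA).
    """
    return {name: sum(vec) / len(vec) for name, vec in embeddings.items()}


def fuse(omics_rows, feature_names, weights):
    """Scale every omics column by its feature weight (assumed fusion rule)."""
    return [
        [value * weights[name] for name, value in zip(feature_names, row)]
        for row in omics_rows
    ]


# Toy data: two features, two subjects.
feature_names = ["gene_a", "gene_b"]
embeddings = {"gene_a": [0.5, 1.5], "gene_b": [0.0, 0.5]}  # indexed by feature name
omics_rows = [[2.0, 4.0], [1.0, 8.0]]  # rows are subjects, columns are features

weights = reduce_embeddings(embeddings)
enhanced = fuse(omics_rows, feature_names, weights)
```

The result keeps the same shape as the input omics table, which matches the "enhanced omics data" that run() is documented to return.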

omics_data

DataFrame of omics data, with features as columns.

Type: pd.DataFrame

embeddings

DataFrame with embeddings (indexed by feature names).

Type: pd.DataFrame

phenotype_data

Optional DataFrame with phenotype labels.

Type: Optional[pd.DataFrame]

phenotype_col

Name of the phenotype column.

Type: str

reduce_method

Method used for dimensionality reduction (e.g., "AE").

Type: str

seed

Random seed for reproducibility.

Type: Optional[int]

tune

Whether to run hyperparameter tuning.

Type: bool

output_dir

Directory where results are written.

Type: Path

run() → DataFrame

Executes the Subject Representation workflow.

If tuning is enabled, runs hyperparameter tuning and uses the best config to reduce embeddings. Otherwise, uses the default reduction method.

Returns: Enhanced omics data as a DataFrame.

Return type: pd.DataFrame
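A minimal usage sketch, assuming the constructor signature documented above. The wrapper `build_enhanced_omics` and the `SUBJECT_REP_SETTINGS` dictionary are hypothetical names. The seed value is chosen for illustration (the documented default is None). The input DataFrames must be supplied by the caller.

```python
# Values follow the documented defaults, except seed, which is set
# here for illustration so that repeated runs are comparable.
SUBJECT_REP_SETTINGS = {
    "phenotype_col": "phenotype",
    "reduce_method": "AE",
    "seed": 1804,
    "tune": False,
}


def build_enhanced_omics(omics_data, embeddings, phenotype_data=None):
    """Fuse feature embeddings into omics_data and return the enhanced DataFrame."""
    # Imported lazily so the settings above can be inspected
    # without bioneuralnet installed.
    from bioneuralnet.downstream_task import SubjectRepresentation

    representation = SubjectRepresentation(
        omics_data=omics_data,
        embeddings=embeddings,
        phenotype_data=phenotype_data,
        **SUBJECT_REP_SETTINGS,
    )
    return representation.run()
```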

Modules

`dpmon`	DPMON: Optimized Network Embedding and Fusion for Disease Prediction.
`subject_representation`