bioneuralnet.downstream_task
Downstream task pipelines for BioNeuralNet.
This module implements high-level workflows for analyzing patient data using network-derived insights. It includes DPMON (Disease Prediction using Multi-Omics Networks), an end-to-end pipeline that leverages GNNs (GCN, GAT, SAGE, GIN) to learn feature importance weights for supervised phenotype prediction. Additionally, it provides SubjectRepresentation, a class for fusing learned network embeddings with raw omics data via dimensionality reduction (AutoEncoder or PCA) to generate enriched patient profiles.
Classes
- DPMON: Disease Prediction using Multi-Omics Networks, an end-to-end pipeline for multi-omics disease prediction.
- SubjectRepresentation: class for integrating network embeddings into omics data.
- class bioneuralnet.downstream_task.DPMON(adjacency_matrix: DataFrame, omics_list: List[DataFrame], phenotype_data: DataFrame, clinical_data: DataFrame | None = None, correlation_mode: str = 'abs_pearson', model: str = 'GAT', phenotype_col: str = 'phenotype', gnn_hidden_dim: int = 16, gnn_layer_num: int = 4, gnn_dropout: float = 0.1, gnn_activation: str = 'relu', dim_reduction: str = 'ae', ae_architecture: str = 'original', ae_encoding_dim: int = 8, nn_hidden_dim1: int = 16, nn_hidden_dim2: int = 8, num_epochs: int = 100, repeat_num: int = 1, n_folds: int = 5, lr: float = 0.1, weight_decay: float = 0.0001, gat_heads: int = 1, tune: bool = False, tune_trials: int = 20, gpu: bool = False, cv: bool = False, cuda: int = 0, seed: int = 1804, seed_trials: bool = False, output_dir: str | None = None)
Bases: object
DPMON (Disease Prediction using Multi-Omics Networks) end-to-end pipeline for multi-omics disease prediction.
Instead of node-level MSE regression, DPMON aggregates node embeddings with patient-level omics data and feeds them to a downstream classification head (e.g., a softmax layer with CrossEntropyLoss) for sample-level disease prediction. This end-to-end setup leverages both local (node-level) and global (patient-level) network information.
- adjacency_matrix
Adjacency matrix of the feature-level network; index/columns are feature names.
- Type:
pd.DataFrame
- omics_list
List of omics data matrices or a single merged omics DataFrame (samples x features).
- Type:
List[pd.DataFrame] | pd.DataFrame
- phenotype_data
Phenotype labels used for supervision.
- Type:
pd.DataFrame | pd.Series
- clinical_data
Optional clinical covariates (samples x clinical features); may be None.
- Type:
Optional[pd.DataFrame]
- gnn_hidden_dim
Hidden dimension size of GNN layers.
- Type:
int
- dim_reduction
Dimensionality reduction strategy for omics input (e.g., “ae” for autoencoder).
- Type:
str
- nn_hidden_dim1
Hidden dimension of the first fully connected layer in the downstream classifier.
- Type:
int
- nn_hidden_dim2
Hidden dimension of the second fully connected layer in the downstream classifier.
- Type:
int
- repeat_num
Number of repeated training runs (for repeated train/test splits or repeated CV).
- Type:
int
- seed_trials
If True, use a fixed seed for hyperparameter sampling to ensure reproducibility across trials.
- Type:
bool
- output_dir
Directory where logs, checkpoints, and results are written.
- Type:
Path
- run() → Tuple[pd.DataFrame, object, torch.Tensor | None]
Execute the DPMON pipeline.
This method aligns the graph and omics features, optionally performs hyperparameter tuning, and then trains and evaluates the chosen GNN model using either K-fold cross-validation (cv=True) or repeated train/test splits (cv=False). It returns prediction outputs, a metrics/config object, and optionally the learned embeddings.
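The two evaluation modes named above differ mainly in how sample indices are partitioned. The following is a minimal sketch of that difference in plain Python; it is not BioNeuralNet's own splitting code. The helper names `kfold_indices` and `repeated_split_indices` and the 20% test fraction are assumptions made for illustration.

```python
import random
from typing import List, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def kfold_indices(n_samples: int, n_folds: int, seed: int) -> List[Split]:
    """K-fold CV: every sample lands in exactly one test fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every n_folds-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def repeated_split_indices(
    n_samples: int, repeat_num: int, seed: int, test_fraction: float = 0.2
) -> List[Split]:
    """Repeated hold-out: each repeat reshuffles, so test sets may overlap."""
    n_test = max(1, round(n_samples * test_fraction))
    splits = []
    for r in range(repeat_num):
        shuffled = list(range(n_samples))
        # Hypothetical seeding scheme: offset the base seed by the repeat index.
        random.Random(seed + r).shuffle(shuffled)
        splits.append((shuffled[n_test:], shuffled[:n_test]))
    return splits
```

With K-fold splitting every sample is tested exactly once, whereas repeated hold-out may test some samples several times and others never.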
- Returns:
- A tuple (predictions_df, metrics, embeddings) where:
  - predictions_df (pd.DataFrame): If cv=False, per-sample predictions with actual vs predicted labels; if cv=True, aggregated CV performance or fold-level results, depending on the backend.
  - metrics (object): Dictionary or configuration object containing evaluation metrics and, when tuning is enabled, information about the selected hyperparameters.
  - embeddings (torch.Tensor | None): Learned embedding tensor (e.g., node or sample embeddings) if produced by the training routine, otherwise None.
- Return type:
Tuple[pd.DataFrame, object, torch.Tensor | None]
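A minimal usage sketch, based only on the constructor signature and return tuple documented above. The synthetic data, the helper names `make_toy_inputs` and `run_dpmon_on_toy_data`, and the choice of integer class labels are assumptions for illustration; whether such a tiny random dataset trains meaningfully has not been verified.

```python
import numpy as np
import pandas as pd


def make_toy_inputs(n_samples: int = 30, n_features: int = 6, seed: int = 1804):
    """Build synthetic inputs with the documented shapes (random values only)."""
    rng = np.random.default_rng(seed)
    features = [f"feat_{i}" for i in range(n_features)]
    samples = [f"sample_{i}" for i in range(n_samples)]

    # Omics matrix: samples x features.
    omics = pd.DataFrame(
        rng.normal(size=(n_samples, n_features)), index=samples, columns=features
    )

    # Symmetric feature-level adjacency; index and columns are feature names.
    weights = rng.uniform(size=(n_features, n_features))
    weights = (weights + weights.T) / 2
    adjacency = pd.DataFrame(weights, index=features, columns=features)

    # One integer class label per sample, stored in the default 'phenotype' column.
    phenotype = pd.DataFrame(
        {"phenotype": rng.integers(0, 2, size=n_samples)}, index=samples
    )
    return adjacency, omics, phenotype


def run_dpmon_on_toy_data():
    """Construct DPMON with documented keyword arguments and call run()."""
    # Imported lazily so make_toy_inputs() works without BioNeuralNet installed.
    from bioneuralnet.downstream_task import DPMON

    adjacency, omics, phenotype = make_toy_inputs()
    dpmon = DPMON(
        adjacency_matrix=adjacency,
        omics_list=[omics],
        phenotype_data=phenotype,
        model="GAT",
        num_epochs=10,   # shortened from the default of 100 for a quick trial
        repeat_num=1,
        cv=False,        # repeated train/test splits rather than K-fold CV
        tune=False,
        gpu=False,
        seed=1804,
    )
    predictions_df, metrics, embeddings = dpmon.run()
    return predictions_df, metrics, embeddings
```

Calling `run_dpmon_on_toy_data()` requires BioNeuralNet and its dependencies to be installed; the third returned value may be None, as noted in the return description.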
- class bioneuralnet.downstream_task.SubjectRepresentation(omics_data: DataFrame, embeddings: DataFrame, phenotype_data: DataFrame | None = None, phenotype_col: str = 'phenotype', reduce_method: str = 'AE', seed: int | None = None, tune: bool | None = False, output_dir: str | Path | None = None)
Bases: object
SubjectRepresentation Class for Integrating Network Embeddings into Omics Data.
This class integrates network-derived embeddings with raw omics data to create enriched subject-level profiles. It reduces the dimensionality of the embeddings (via an autoencoder or PCA, selected with reduce_method) and then fuses the result with the original omics features.
- omics_data
DataFrame of omics data, with omics features as columns.
- Type:
pd.DataFrame
- embeddings
DataFrame with embeddings (indexed by feature names).
- Type:
pd.DataFrame
- phenotype_data
Optional DataFrame with phenotype labels.
- Type:
Optional[pd.DataFrame]
- output_dir
Directory where results are written.
- Type:
Path
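The fusion idea behind SubjectRepresentation can be illustrated with a small NumPy/pandas sketch. This is one plausible reading, not the library's implementation: it assumes each feature's embedding is reduced to a single scalar weight with PCA (first principal component) and that fusion means multiplying each omics column by its feature's weight. The function names `pca_feature_weights` and `fuse` are hypothetical.

```python
import numpy as np
import pandas as pd


def pca_feature_weights(embeddings: pd.DataFrame) -> pd.Series:
    """Reduce each feature's embedding vector to one scalar via the first PC."""
    values = embeddings.to_numpy(dtype=float)
    centered = values - values.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # The sign of a principal axis is arbitrary, so the weights are defined up to sign.
    scores = centered @ vt[0]
    return pd.Series(scores, index=embeddings.index, name="weight")


def fuse(omics_data: pd.DataFrame, embeddings: pd.DataFrame) -> pd.DataFrame:
    """Scale each omics column by its feature's weight (assumed fusion rule)."""
    # Keep only features present in both inputs, in omics column order.
    shared = [f for f in omics_data.columns if f in embeddings.index]
    weights = pca_feature_weights(embeddings.loc[shared])
    # The Series index is aligned against the DataFrame columns.
    return omics_data[shared].mul(weights, axis="columns")
```

Features missing from either input are dropped here for simplicity; a real pipeline would need an explicit policy for such mismatches.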
Modules
- DPMON: Optimized Network Embedding and Fusion for Disease Prediction.