bioneuralnet.network.pysmccnet.analysis
Functions
- PyTorch version of data_preprocess for omics dataset preparation.
- Form flat clusters from the hierarchical clustering defined by the given linkage matrix.
- PyTorch equivalent of get_abar performing matrix multiplication on GPU.
- PyTorch version of get_can_cor_multi calculating canonical correlation value on GPU.
- Extract omics modules via hierarchical clustering on the similarity matrix.
- Perform hierarchical/agglomerative clustering.
- Pearson correlation coefficient and p-value for testing non-correlation.
- Prune network modules to target size range and compute summarization scores.
- Computes element-wise multiplication and sum with vector recycling.
- Convert a vector-form distance vector to a square-form distance matrix, and vice-versa.
- NetSHy network summarization via hybrid approach leveraging topological properties.
Classes
- Principal component analysis (PCA).
- Standardize features by removing the mean and scaling to unit variance.
- defaultdict(default_factory=None, /, [...]) --> dict with default factory
- bioneuralnet.network.pysmccnet.analysis.data_preprocess(X: DataFrame | ndarray, covariates: DataFrame | ndarray | None = None, is_cv: bool = False, cv_quantile: float = 0.0, center: bool = True, scale: bool = True, device: torch.device | None = None, dtype: torch.dtype = torch.float64) → DataFrame
PyTorch version of data_preprocess for omics dataset preparation.
- Parameters:
X (pd.DataFrame | np.ndarray) – Input omics data matrix.
covariates (pd.DataFrame | np.ndarray | None) – Optional covariates to regress out.
is_cv (bool) – If True, filter features based on coefficient of variation.
cv_quantile (float) – Quantile threshold for CV filtering; required if is_cv is True.
center (bool) – If True, center columns to mean zero.
scale (bool) – If True, scale columns to unit variance.
device (torch.device | None) – Calculation device; defaults to GPU if available.
dtype (torch.dtype) – Data type for tensor computations.
- Returns:
Preprocessed data frame.
- Return type:
pd.DataFrame
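The filtering, centering, and scaling steps described by the parameters above can be sketched in plain NumPy. This is only an illustration, not the library's implementation: `preprocess_sketch` is a hypothetical helper, the rule "keep features whose CV exceeds the quantile" is an assumption, and covariate regression and the `device`/`dtype` handling are omitted.

```python
import numpy as np

def preprocess_sketch(X, is_cv=False, cv_quantile=0.0, center=True, scale=True):
    """Hypothetical NumPy stand-in for the filter/center/scale steps."""
    X = np.asarray(X, dtype=np.float64)
    if is_cv:
        # Coefficient of variation per feature (column): sd / |mean|.
        cv = X.std(axis=0, ddof=1) / np.abs(X.mean(axis=0))
        # Assumption: keep features whose CV is above the chosen quantile.
        keep = cv > np.quantile(cv, cv_quantile)
        X = X[:, keep]
    if center:
        X = X - X.mean(axis=0)           # column means become zero
    if scale:
        X = X / X.std(axis=0, ddof=1)    # column variances become one
    return X

rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(50, 8))
Z = preprocess_sketch(X_demo, is_cv=True, cv_quantile=0.25)
```

With `center=True` and `scale=True`, every retained column of `Z` has mean zero and unit sample variance.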
- bioneuralnet.network.pysmccnet.analysis.get_abar(ws: DataFrame | ndarray | torch.Tensor | List[float], feature_label: List[str] | None = None) → DataFrame
PyTorch equivalent of get_abar performing matrix multiplication on GPU.
- Parameters:
ws (pd.DataFrame | np.ndarray | torch.Tensor | List[float]) – Weight matrix or vector.
feature_label (List[str] | None) – List of feature names for the output DataFrame.
- Returns:
Adjacency matrix (A-bar) representing feature similarity.
- Return type:
pd.DataFrame
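As a rough illustration of how a weight matrix can become a feature-similarity matrix, the sketch below sums the outer products of the absolute weight vectors, zeroes the diagonal, and rescales by the largest entry. That recipe is an assumption modeled on the general SmCCNet approach; `abar_sketch` is a hypothetical name, and the actual function runs in PyTorch and returns a labeled DataFrame.

```python
import numpy as np

def abar_sketch(ws):
    """Hypothetical CPU sketch: similarity from absolute canonical weights."""
    W = np.abs(np.asarray(ws, dtype=np.float64))
    if W.ndim == 1:
        W = W[:, None]                   # treat a single vector as one column
    # W @ W.T equals the sum over columns of outer(|w|, |w|).
    A = W @ W.T
    np.fill_diagonal(A, 0.0)             # no self-similarity
    peak = A.max()
    return A / peak if peak > 0 else A   # rescale so the largest entry is 1

W_demo = np.array([[0.5, 0.0],
                   [0.3, 0.4],
                   [0.0, 0.8]])
A_demo = abar_sketch(W_demo)
```

Features that never receive nonzero weight in the same column (here, features 0 and 2) end up with similarity zero.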
- bioneuralnet.network.pysmccnet.analysis.get_can_cor_multi(X: List[torch.Tensor], cc_coef: ndarray | torch.Tensor | List[float], cc_weight: List[torch.Tensor | ndarray], Y: torch.Tensor | ndarray) → float
PyTorch version of get_can_cor_multi calculating canonical correlation value on GPU.
- Parameters:
X (List[torch.Tensor]) – List of data matrices.
cc_coef (np.ndarray | torch.Tensor | List[float]) – Correlation coefficients / weights.
cc_weight (List[torch.Tensor | np.ndarray]) – List of weight vectors for projection.
Y (torch.Tensor | np.ndarray) – Phenotype data vector.
- Returns:
Total canonical correlation (between-omics + omics-phenotype).
- Return type:
float
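One way to read "between-omics + omics-phenotype" is as a weighted sum of pairwise Pearson correlations between the projected omics views and the phenotype. The sketch below follows that reading. It is an assumption rather than the library's code: `total_cancor_sketch` is a hypothetical helper, and the pair ordering (all view pairs in `itertools.combinations` order, with the phenotype treated as the last view) is a guess about how `cc_coef` lines up with the pairs.

```python
import numpy as np
from itertools import combinations

def total_cancor_sketch(X, cc_coef, cc_weight, Y):
    """Hypothetical sketch: weighted sum of pairwise correlations."""
    # Project each omics matrix onto its weight vector.
    proj = [np.asarray(x) @ np.asarray(w).ravel() for x, w in zip(X, cc_weight)]
    # Assumption: the phenotype acts as the last "view".
    proj.append(np.asarray(Y).ravel())
    pairs = list(combinations(range(len(proj)), 2))
    if len(cc_coef) != len(pairs):
        raise ValueError("expected one coefficient per pair of views")
    return float(sum(c * np.corrcoef(proj[i], proj[j])[0, 1]
                     for c, (i, j) in zip(cc_coef, pairs)))

rng = np.random.default_rng(1)
y = rng.normal(size=20)
X1 = np.column_stack([y, rng.normal(size=20)])
X2 = np.column_stack([rng.normal(size=20), 2.0 * y])
weights = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
total = total_cancor_sketch([X1, X2], [1.0, 1.0, 1.0], weights, y)
```

In this toy case both projections are exact multiples of `y`, so each of the three pairwise correlations is 1 and the total equals the sum of the coefficients.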
- bioneuralnet.network.pysmccnet.analysis.get_omics_modules(Abar: DataFrame, cut_height: float = 0.9999999999) → List[List[int]]
Extract omics modules via hierarchical clustering on the similarity matrix.
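A minimal sketch of module extraction with SciPy is shown here, assuming the similarity matrix is converted to a distance as `1 - Abar`, clustered with complete linkage, and cut at `cut_height`. The linkage method, the distance transform, and the dropping of single-feature clusters are all assumptions, and `modules_sketch` is a hypothetical name.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def modules_sketch(abar, cut_height=0.9999999999):
    """Hypothetical sketch: cut a dendrogram built from 1 - similarity."""
    A = np.asarray(abar, dtype=np.float64)
    D = 1.0 - A                          # assumption: distance = 1 - similarity
    np.fill_diagonal(D, 0.0)             # a feature is at distance 0 from itself
    Z = linkage(squareform(D, checks=False), method="complete")
    # Features stay together only if they merge at or below cut_height.
    labels = fcluster(Z, t=cut_height, criterion="distance")
    modules = [np.flatnonzero(labels == k).tolist() for k in np.unique(labels)]
    return [m for m in modules if len(m) > 1]   # assumption: drop singletons

A_blocks = np.zeros((5, 5))
A_blocks[:3, :3] = 0.8                   # features 0-2 are mutually similar
A_blocks[3:, 3:] = 0.6                   # features 3-4 are mutually similar
mods = sorted(modules_sketch(A_blocks))
```

A cut height just below 1 keeps apart any groups whose similarity is exactly zero, which is why the two blocks above are not merged.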
- bioneuralnet.network.pysmccnet.analysis.prune_modules(Abar: DataFrame, X_combined: ndarray, Y: ndarray, modules: List[List[int]], feature_labels: List[str], min_size: int = 10, max_size: int = 100, summarization: str = 'NetSHy', saving_dir: str = '.') → List[dict]
Prune network modules to target size range and compute summarization scores.
For each module from hierarchical clustering, iteratively removes the lowest-degree node until the module fits within [min_size, max_size]. Then computes NetSHy summarization scores and per-feature phenotype correlations.
- Parameters:
Abar (pd.DataFrame) – Global adjacency matrix.
X_combined (np.ndarray) – Column-bound omics data of shape (n_samples, n_total_features).
Y (np.ndarray) – Phenotype vector of shape (n_samples,).
modules (List[List[int]]) – Feature index groups from get_omics_modules.
feature_labels (List[str]) – Feature names matching columns of X_combined.
min_size (int) – Minimum module size to retain.
max_size (int) – Maximum module size; larger modules are pruned down.
summarization (str) – Summarization method. Currently only ‘NetSHy’ is supported.
saving_dir (str) – Directory to save per-module pickle files.
- Returns:
One dict per valid module with keys: module_id, nodes, node_indices, adjacency, correlation, pc_correlations, netshy, omics_correlation.
- Return type:
List[dict]
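The pruning loop described above can be sketched as follows. This covers only the size-pruning step; `prune_sketch` is a hypothetical helper, and using the weighted degree (row sum within the module) as the removal criterion is an assumption about how "lowest-degree" is measured. Summarization, correlations, and pickle output are left out.

```python
import numpy as np

def prune_sketch(abar, nodes, min_size=10, max_size=100):
    """Hypothetical sketch: drop the weakest node until the module fits."""
    A = np.asarray(abar, dtype=np.float64)
    nodes = list(nodes)
    if len(nodes) < min_size:
        return None                      # too small to be a valid module
    while len(nodes) > max_size:
        sub = A[np.ix_(nodes, nodes)]    # adjacency restricted to the module
        degree = sub.sum(axis=1)         # assumption: weighted degree
        nodes.pop(int(np.argmin(degree)))
    return nodes

strength = np.array([1.0, 0.9, 0.8, 0.2, 0.1])
A_toy = np.outer(strength, strength)
np.fill_diagonal(A_toy, 0.0)
kept = prune_sketch(A_toy, range(5), min_size=2, max_size=3)
too_small = prune_sketch(A_toy, [0], min_size=2, max_size=3)
```

Degrees are recomputed after each removal, so a node's rank can change as its neighbours are dropped.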
- bioneuralnet.network.pysmccnet.analysis.summarize_netshy(X: DataFrame | ndarray, A: DataFrame | ndarray, npc: int = 1) → dict
NetSHy network summarization via hybrid approach leveraging topological properties.
Summarizes a subnetwork by projecting omics data through the graph Laplacian, then extracting principal components from the projected space.
Source: summarizeNetSHy (Vu et al., Bioinformatics 2023)
- Parameters:
X (pd.DataFrame | np.ndarray) – Data matrix of shape (n_samples, n_features).
A (pd.DataFrame | np.ndarray) – Adjacency matrix of shape (n_features, n_features).
npc (int) – Number of principal components for summarization.
- Returns:
Keys are ‘scores’ (n, npc), ‘importance’ (sdev, variance_pct, cumulative_pct), and ‘loadings’ (n_features, npc).
- Return type:
dict
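The idea of projecting the data through the graph Laplacian and then taking principal components can be sketched in NumPy. This is a simplification and not the published NetSHy procedure: it assumes the unnormalized Laplacian `L = D - A`, does PCA through an SVD, and returns `importance` as a dict, which is a guess at the layout. `netshy_sketch` is a hypothetical name.

```python
import numpy as np

def netshy_sketch(X, A, npc=1):
    """Hypothetical sketch: PCA on data projected through L = D - A."""
    X = np.asarray(X, dtype=np.float64)
    A = np.asarray(A, dtype=np.float64)
    L = np.diag(A.sum(axis=1)) - A       # unnormalized graph Laplacian
    P = X @ L                            # project samples through the graph
    P = P - P.mean(axis=0)               # center before PCA
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    var = s ** 2 / (P.shape[0] - 1)      # variance explained by each PC
    pct = var / var.sum()
    return {
        "scores": U[:, :npc] * s[:npc],  # (n_samples, npc)
        "importance": {"sdev": np.sqrt(var),
                       "variance_pct": pct,
                       "cumulative_pct": np.cumsum(pct)},
        "loadings": Vt[:npc].T,          # (n_features, npc)
    }

rng = np.random.default_rng(2)
X_mod = rng.normal(size=(30, 6))
A_mod = rng.uniform(size=(6, 6))
A_mod = (A_mod + A_mod.T) / 2.0          # symmetric toy adjacency
np.fill_diagonal(A_mod, 0.0)
summary = netshy_sketch(X_mod, A_mod, npc=2)
```

The first column of `summary["scores"]` gives one summary value per sample, which can then be correlated with a phenotype.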