bioneuralnet.network.pysmccnet.analysis

Functions

`data_preprocess`(X[, covariates, is_cv, ...])	PyTorch implementation of omics data preprocessing: optional CV filtering, covariate adjustment, centering, and scaling.
`fcluster`(Z, t[, criterion, depth, R, monocrit])	Form flat clusters from the hierarchical clustering defined by the given linkage matrix.
`get_abar`(ws[, feature_label])	Compute the A-bar feature-similarity matrix from a weight matrix or vector, using PyTorch matrix multiplication on the GPU.
`get_can_cor_multi`(X, cc_coef, cc_weight, Y)	Compute the total canonical correlation across multiple omics datasets and a phenotype, using PyTorch on the GPU.
`get_omics_modules`(Abar[, cut_height])	Extract omics modules via hierarchical clustering on the similarity matrix.
`linkage`(y[, method, metric, optimal_ordering])	Perform hierarchical/agglomerative clustering.
`pearsonr`(x, y, *[, alternative, method, axis])	Pearson correlation coefficient and p-value for testing non-correlation.
`prune_modules`(Abar, X_combined, Y, modules, ...)	Prune network modules to target size range and compute summarization scores.
`r_vec_mult_sum`(v1, v2)	Computes element-wise multiplication and sum with vector recycling.
`squareform`(X[, force, checks])	Convert a vector-form distance vector to a square-form distance matrix, and vice-versa.
`summarize_netshy`(X, A[, npc])	NetSHy network summarization via hybrid approach leveraging topological properties.

Classes

`PCA`([n_components, copy, whiten, ...])	Principal component analysis (PCA).
`StandardScaler`(*[, copy, with_mean, with_std])	Standardize features by removing the mean and scaling to unit variance.
`defaultdict`	defaultdict(default_factory=None, /, [...]) --> dict with default factory

bioneuralnet.network.pysmccnet.analysis.data_preprocess(X: DataFrame | ndarray, covariates: DataFrame | ndarray | None = None, is_cv: bool = False, cv_quantile: float = 0.0, center: bool = True, scale: bool = True, device: torch.device | None = None, dtype: torch.dtype = torch.float64) → DataFrame

PyTorch implementation of omics data preprocessing: optional coefficient-of-variation (CV) filtering, covariate adjustment, centering, and scaling.

Parameters:

X (pd.DataFrame | np.ndarray) – Input omics data matrix.
covariates (pd.DataFrame | np.ndarray | None) – Optional covariates to regress out.
is_cv (bool) – If True, filter features based on coefficient of variation.
cv_quantile (float) – Quantile threshold for CV filtering; required if is_cv is True.
center (bool) – If True, center columns to mean zero.
scale (bool) – If True, scale columns to unit variance.
device (torch.device | None) – Device used for computation; if None, the GPU is used when available.
dtype (torch.dtype) – Data type for tensor computations.

Returns:

Preprocessed data frame.

Return type:

pd.DataFrame
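
Example (illustrative sketch). The parameters above suggest a pipeline of CV filtering, centering, scaling, and covariate adjustment. The NumPy code below shows one way those steps could fit together. `preprocess_sketch` is a hypothetical helper, and the step order, the `sd / |mean|` definition of CV, the `>=` cutoff, and the least-squares residual adjustment are all assumptions rather than confirmed library behavior.

```python
import numpy as np

def preprocess_sketch(X, covariates=None, is_cv=False, cv_quantile=0.0,
                      center=True, scale=True):
    """Hypothetical stand-in for the documented steps (not the library code)."""
    X = np.asarray(X, dtype=np.float64)
    keep = np.ones(X.shape[1], dtype=bool)
    if is_cv:
        # Assumed CV definition: sample standard deviation over |mean|.
        cv = X.std(axis=0, ddof=1) / np.abs(X.mean(axis=0))
        # Assumed rule: keep features at or above the requested quantile.
        keep = cv >= np.quantile(cv, cv_quantile)
        X = X[:, keep]
    if center:
        X = X - X.mean(axis=0)
    if scale:
        X = X / X.std(axis=0, ddof=1)
    if covariates is not None:
        # Regress out covariates: keep the residuals of a least-squares
        # fit of every feature on [intercept, covariates].
        C = np.column_stack([np.ones(X.shape[0]),
                             np.asarray(covariates, dtype=np.float64)])
        beta, *_ = np.linalg.lstsq(C, X, rcond=None)
        X = X - C @ beta
    return X, keep
```

The sketch returns the boolean `keep` mask alongside the data so that the surviving feature columns can be traced back to the input.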

bioneuralnet.network.pysmccnet.analysis.get_abar(ws: DataFrame | ndarray | torch.Tensor | List[float], feature_label: List[str] | None = None) → DataFrame

Compute the A-bar feature-similarity matrix from a weight matrix or vector, using PyTorch matrix multiplication on the GPU.

Parameters:

ws (pd.DataFrame | np.ndarray | torch.Tensor | List[float]) – Weight matrix or vector.
feature_label (List[str] | None) – List of feature names for the output DataFrame.

Returns:

Adjacency matrix (A-bar) representing feature similarity.

Return type:

pd.DataFrame
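
Example (illustrative sketch). This entry does not spell out how A-bar is formed. A common construction for this kind of similarity matrix sums the outer products of the absolute weight vectors, zeroes the diagonal, and rescales so the largest entry is 1. The NumPy code below assumes that construction; `abar_sketch` is a hypothetical helper and may differ from what `get_abar` actually computes.

```python
import numpy as np

def abar_sketch(ws):
    """Assumed construction: sum of |w||w|^T, zero diagonal, max-normalized."""
    W = np.abs(np.asarray(ws, dtype=np.float64))
    if W.ndim == 1:
        W = W[:, None]            # a single weight vector becomes one column
    abar = W @ W.T                # equals the sum of per-column outer products
    np.fill_diagonal(abar, 0.0)   # no self-similarity
    peak = abar.max()
    return abar / peak if peak > 0 else abar
```

Rows of `ws` are features and columns are weight vectors, so the result is a symmetric (n_features, n_features) matrix.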

bioneuralnet.network.pysmccnet.analysis.get_can_cor_multi(X: List[torch.Tensor], cc_coef: ndarray | torch.Tensor | List[float], cc_weight: List[torch.Tensor | ndarray], Y: torch.Tensor | ndarray) → float

Compute the total canonical correlation across multiple omics datasets and a phenotype, using PyTorch on the GPU.

Parameters:

X (List[torch.Tensor]) – List of data matrices.
cc_coef (np.ndarray | torch.Tensor | List[float]) – Correlation coefficients / weights.
cc_weight (List[torch.Tensor | np.ndarray]) – List of weight vectors for projection.
Y (torch.Tensor | np.ndarray) – Phenotype data vector.

Returns:

Total canonical correlation (between-omics + omics-phenotype).

Return type:

float
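
Example (illustrative sketch). One reading of "between-omics + omics-phenotype" is a weighted sum of pairwise Pearson correlations among the projected omics views and the phenotype. The NumPy code below assumes that reading. `total_can_cor_sketch` is a hypothetical helper, and the pair ordering (all index pairs in lexicographic order, with the phenotype last) is an assumption about how `cc_coef` lines up with the pairs.

```python
from itertools import combinations

import numpy as np

def total_can_cor_sketch(X, cc_coef, cc_weight, Y):
    """Assumed form: sum over view pairs of coef * Pearson correlation."""
    # Project each omics block onto its weight vector.
    views = [np.asarray(x, dtype=np.float64) @ np.asarray(w, dtype=np.float64).ravel()
             for x, w in zip(X, cc_weight)]
    # The phenotype is treated as the last view.
    views.append(np.asarray(Y, dtype=np.float64).ravel())
    pairs = list(combinations(range(len(views)), 2))
    if len(cc_coef) != len(pairs):
        raise ValueError("expected one coefficient per pair of views")
    total = 0.0
    for coef, (i, j) in zip(cc_coef, pairs):
        total += float(coef) * np.corrcoef(views[i], views[j])[0, 1]
    return float(total)
```

With two omics blocks and one phenotype there are three pairs, so `cc_coef` would carry three values under this assumption.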

bioneuralnet.network.pysmccnet.analysis.get_omics_modules(Abar: DataFrame, cut_height: float = 0.9999999999) → List[List[int]]

Extract omics modules via hierarchical clustering on the similarity matrix.

Parameters:

Abar (pd.DataFrame) – Similarity/adjacency matrix for all features.
cut_height (float) – Height threshold for hierarchical tree cutting.

Returns:

List of modules; each inner list contains the 0-based feature indices belonging to one module.

Return type:

List[List[int]]
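
Example (illustrative sketch). Hierarchical clustering on a similarity matrix can be expressed with the SciPy functions listed in this module (`squareform`, `linkage`, `fcluster`). The code below is a minimal sketch under stated assumptions: distance is taken as `1 - similarity`, linkage is complete, and singleton clusters are dropped. `modules_sketch` is a hypothetical helper, and none of these choices are confirmed to match `get_omics_modules`.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def modules_sketch(abar, cut_height=0.9999999999):
    """Cut a complete-linkage tree built on 1 - similarity (assumed choices)."""
    sim = np.asarray(abar, dtype=np.float64)
    dist = 1.0 - sim
    np.fill_diagonal(dist, 0.0)                   # condensed form needs a zero diagonal
    condensed = squareform(dist, checks=False)    # square matrix -> condensed vector
    Z = linkage(condensed, method="complete")
    labels = fcluster(Z, t=cut_height, criterion="distance")
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(int(lab), []).append(idx)   # 0-based feature indices
    # Assumption: a module needs at least two features.
    return [members for members in groups.values() if len(members) > 1]
```

A cut height just below 1 keeps features with zero similarity (distance exactly 1) in separate modules, which may explain the default of 0.9999999999.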

bioneuralnet.network.pysmccnet.analysis.prune_modules(Abar: DataFrame, X_combined: ndarray, Y: ndarray, modules: List[List[int]], feature_labels: List[str], min_size: int = 10, max_size: int = 100, summarization: str = 'NetSHy', saving_dir: str = '.') → List[dict]

Prune network modules to target size range and compute summarization scores.

For each module from hierarchical clustering, iteratively removes the lowest-degree node until the module fits within [min_size, max_size]. Then computes NetSHy summarization scores and per-feature phenotype correlations.

Parameters:

Abar (pd.DataFrame) – Global adjacency matrix.
X_combined (np.ndarray) – Column-bound omics data of shape (n_samples, n_total_features).
Y (np.ndarray) – Phenotype vector of shape (n_samples,).
modules (List[List[int]]) – Feature index groups from get_omics_modules.
feature_labels (List[str]) – Feature names matching columns of X_combined.
min_size (int) – Minimum module size to retain.
max_size (int) – Maximum module size; larger modules are pruned down.
summarization (str) – Summarization method. Currently only ‘NetSHy’ is supported.
saving_dir (str) – Directory to save per-module pickle files.

Returns:

One dict per valid module with keys: module_id, nodes, node_indices, adjacency, correlation, pc_correlations, netshy, omics_correlation.

Return type:

List[dict]
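
Example (illustrative sketch). The pruning step described above, removing the lowest-degree node until the module fits the size range, can be sketched in NumPy as follows. `prune_sketch` is a hypothetical helper. Taking degree as the weighted row sum within the module and discarding modules smaller than `min_size` are assumptions, and the summarization, correlation, and pickle-saving steps are left out.

```python
import numpy as np

def prune_sketch(abar, module, min_size=10, max_size=100):
    """Drop the lowest-degree node until the module has at most max_size nodes."""
    A = np.asarray(abar, dtype=np.float64)
    nodes = list(module)
    while len(nodes) > max_size:
        sub = A[np.ix_(nodes, nodes)]     # adjacency restricted to the module
        degree = sub.sum(axis=1)          # assumed: weighted degree inside the module
        nodes.pop(int(np.argmin(degree)))
    # Assumed: modules below min_size are not retained.
    return nodes if len(nodes) >= min_size else None
```

Degrees are recomputed after every removal, so a node's rank can change as its neighbors are pruned.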

bioneuralnet.network.pysmccnet.analysis.summarize_netshy(X: DataFrame | ndarray, A: DataFrame | ndarray, npc: int = 1) → dict

NetSHy network summarization via hybrid approach leveraging topological properties.

Summarizes a subnetwork by projecting omics data through the graph Laplacian, then extracting principal components from the projected space.

Source: summarizeNetSHy (Vu et al., Bioinformatics 2023)

Parameters:

X (pd.DataFrame | np.ndarray) – Data matrix of shape (n_samples, n_features).
A (pd.DataFrame | np.ndarray) – Adjacency matrix of shape (n_features, n_features).
npc (int) – Number of principal components for summarization.

Returns:

Dictionary with keys ‘scores’ (n_samples, npc), ‘importance’ (sdev, variance_pct, cumulative_pct), and ‘loadings’ (n_features, npc).

Return type:

dict
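
Example (illustrative sketch). The description above, projecting the data through the graph Laplacian and then extracting principal components, can be sketched in NumPy as below. `netshy_sketch` is a hypothetical helper that covers only the Laplacian projection. Any additional terms of the hybrid approach are omitted, and the use of the unnormalized Laplacian `D - A`, the standardization of `X`, and the nested layout of `importance` are assumptions.

```python
import numpy as np

def netshy_sketch(X, A, npc=1):
    """Laplacian projection followed by PCA via SVD (simplified sketch)."""
    X = np.asarray(X, dtype=np.float64)
    A = np.array(A, dtype=np.float64)     # copy, so the caller's matrix is untouched
    np.fill_diagonal(A, 0.0)              # ignore self-loops
    X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    L = np.diag(A.sum(axis=1)) - A        # unnormalized graph Laplacian D - A
    P = X @ L                             # data projected through the Laplacian
    P = P - P.mean(axis=0)                # PCA operates on centered columns
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    sdev = s / np.sqrt(X.shape[0] - 1)    # standard deviation of each component
    var_pct = sdev**2 / np.sum(sdev**2)
    return {
        "scores": U[:, :npc] * s[:npc],   # (n_samples, npc)
        "importance": {
            "sdev": sdev[:npc],
            "variance_pct": var_pct[:npc],
            "cumulative_pct": np.cumsum(var_pct)[:npc],
        },
        "loadings": Vt[:npc].T,           # (n_features, npc)
    }
```

The first column of `scores` gives a one-dimensional summary of the subnetwork per sample, which can then be correlated with a phenotype.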