bioneuralnet.network.pysmccnet.analysis

Functions

data_preprocess(X[, covariates, is_cv, ...])

Prepare an omics dataset for analysis using PyTorch: optional CV-based feature filtering, covariate regression, centering, and scaling.

fcluster(Z, t[, criterion, depth, R, monocrit])

Form flat clusters from the hierarchical clustering defined by the given linkage matrix.

get_abar(ws[, feature_label])

Compute the A-bar adjacency matrix from a weight matrix or vector, using PyTorch matrix multiplication on GPU.

get_can_cor_multi(X, cc_coef, cc_weight, Y)

Compute the total canonical correlation value with PyTorch on GPU.

get_omics_modules(Abar[, cut_height])

Extract omics modules via hierarchical clustering on the similarity matrix.

linkage(y[, method, metric, optimal_ordering])

Perform hierarchical/agglomerative clustering.

pearsonr(x, y, *[, alternative, method, axis])

Pearson correlation coefficient and p-value for testing non-correlation.

prune_modules(Abar, X_combined, Y, modules, ...)

Prune network modules to target size range and compute summarization scores.

r_vec_mult_sum(v1, v2)

Computes element-wise multiplication and sum with vector recycling.

squareform(X[, force, checks])

Convert a vector-form distance vector to a square-form distance matrix, and vice-versa.

summarize_netshy(X, A[, npc])

NetSHy network summarization via hybrid approach leveraging topological properties.

Classes

PCA([n_components, copy, whiten, ...])

Principal component analysis (PCA).

StandardScaler(*[, copy, with_mean, with_std])

Standardize features by removing the mean and scaling to unit variance.

defaultdict

defaultdict(default_factory=None, /, [...]) --> dict with default factory

bioneuralnet.network.pysmccnet.analysis.data_preprocess(X: DataFrame | ndarray, covariates: DataFrame | ndarray | None = None, is_cv: bool = False, cv_quantile: float = 0.0, center: bool = True, scale: bool = True, device: torch.device | None = None, dtype: torch.dtype = torch.float64) -> DataFrame

Prepare an omics dataset for analysis using PyTorch: optional CV-based feature filtering, covariate regression, centering, and scaling.

Parameters:
  • X (pd.DataFrame | np.ndarray) – Input omics data matrix.

  • covariates (pd.DataFrame | np.ndarray | None) – Optional covariates to regress out.

  • is_cv (bool) – If True, filter features based on coefficient of variation.

  • cv_quantile (float) – Quantile threshold for CV filtering; used only when is_cv is True.

  • center (bool) – If True, center columns to mean zero.

  • scale (bool) – If True, scale columns to unit variance.

  • device (torch.device | None) – Device used for computation; if None, defaults to the GPU when one is available.

  • dtype (torch.dtype) – Data type for tensor computations.

Returns:

Preprocessed data frame.

Return type:

pd.DataFrame
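The filtering and standardization options described above can be pictured with a small NumPy/pandas sketch. This is not the library's implementation: `preprocess_sketch` is a hypothetical helper, it skips covariate regression and the `device`/`dtype` handling, and both the CV definition (standard deviation divided by absolute mean) and the strict "greater than the quantile" rule are assumptions.

```python
import numpy as np
import pandas as pd

def preprocess_sketch(X, is_cv=False, cv_quantile=0.0, center=True, scale=True):
    """Illustrative only: CV filtering, then column centering and scaling."""
    X = pd.DataFrame(X).astype(float)
    if is_cv:
        # Assumed CV definition: per-feature standard deviation / |mean|.
        cv = X.std(ddof=1) / X.mean().abs()
        # Assumed rule: keep features whose CV exceeds the chosen quantile.
        X = X.loc[:, cv > cv.quantile(cv_quantile)]
    if center:
        X = X - X.mean()          # column means become zero
    if scale:
        X = X / X.std(ddof=1)     # column variances become one
    return X

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(loc=5.0, size=(20, 6)),
                    columns=[f"f{i}" for i in range(6)])
out = preprocess_sketch(data, is_cv=True, cv_quantile=0.5)
```

With six features and `cv_quantile=0.5`, the sketch keeps the three features with the highest coefficient of variation and returns them standardized.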

bioneuralnet.network.pysmccnet.analysis.get_abar(ws: DataFrame | ndarray | torch.Tensor | List[float], feature_label: List[str] | None = None) -> DataFrame

Compute the A-bar adjacency matrix from a weight matrix or vector, using PyTorch matrix multiplication on GPU.

Parameters:
  • ws (pd.DataFrame | np.ndarray | torch.Tensor | List[float]) – Weight matrix or vector.

  • feature_label (List[str] | None) – List of feature names for the output DataFrame.

Returns:

Adjacency matrix (A-bar) representing feature similarity.

Return type:

pd.DataFrame
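As a rough illustration of how a weight matrix can be turned into a feature similarity matrix by matrix multiplication, the sketch below sums the outer products of the absolute weight columns, zeroes the diagonal, and rescales by the maximum entry. The helper name `abar_sketch` is hypothetical, and the zeroed diagonal and max-normalization are assumptions modeled on the usual SmCCNet convention, not confirmed behavior of `get_abar`. NumPy stands in for PyTorch here.

```python
import numpy as np
import pandas as pd

def abar_sketch(ws, feature_label=None):
    """Illustrative only: similarity from absolute weight vectors."""
    w = np.abs(np.asarray(ws, dtype=float))
    if w.ndim == 1:
        w = w[:, None]                 # treat a single vector as one column
    abar = w @ w.T                     # sum over columns of |w| |w|^T
    np.fill_diagonal(abar, 0.0)        # assumed: no self-similarity
    peak = abar.max()
    if peak > 0:
        abar = abar / peak             # assumed: rescale into [0, 1]
    return pd.DataFrame(abar, index=feature_label, columns=feature_label)

ws = [[1.0, 0.0], [-2.0, 1.0], [0.0, 3.0]]   # 3 features, 2 weight vectors
abar = abar_sketch(ws, feature_label=["g1", "g2", "g3"])
```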

bioneuralnet.network.pysmccnet.analysis.get_can_cor_multi(X: List[torch.Tensor], cc_coef: ndarray | torch.Tensor | List[float], cc_weight: List[torch.Tensor | ndarray], Y: torch.Tensor | ndarray) -> float

Compute the total canonical correlation value with PyTorch on GPU.

Parameters:
  • X (List[torch.Tensor]) – List of data matrices.

  • cc_coef (np.ndarray | torch.Tensor | List[float]) – Scaling coefficients (weights) applied to the pairwise correlations.

  • cc_weight (List[torch.Tensor | np.ndarray]) – List of weight vectors for projection.

  • Y (torch.Tensor | np.ndarray) – Phenotype data vector.

Returns:

Total canonical correlation (between-omics + omics-phenotype).

Return type:

float
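One way to read "between-omics + omics-phenotype" is as a weighted sum of Pearson correlations over all pairs of views, where each omics block is first projected onto its weight vector and the phenotype is treated as the last view. The sketch below follows that reading with NumPy. The helper `total_can_cor_sketch` is hypothetical, and the assumption that `cc_coef` holds one coefficient per pair in `itertools.combinations` order is mine, not something the documentation states.

```python
from itertools import combinations

import numpy as np

def total_can_cor_sketch(X, cc_coef, cc_weight, Y):
    """Illustrative only: weighted sum of pairwise correlations."""
    # Project each omics block onto its weight vector.
    views = [np.asarray(x, dtype=float) @ np.asarray(w, dtype=float).ravel()
             for x, w in zip(X, cc_weight)]
    # The phenotype is appended as the final view.
    views.append(np.asarray(Y, dtype=float).ravel())
    # Assumed ordering of cc_coef: (0,1), (0,2), ..., as produced by combinations.
    pairs = combinations(range(len(views)), 2)
    corrs = [np.corrcoef(views[i], views[j])[0, 1] for i, j in pairs]
    return float(np.dot(np.asarray(cc_coef, dtype=float), corrs))

rng = np.random.default_rng(1)
X1, X2 = rng.normal(size=(10, 3)), rng.normal(size=(10, 2))
w1, w2 = np.array([0.5, -1.0, 2.0]), np.array([1.0, 1.0])
Y = X1 @ w1                      # phenotype perfectly aligned with block 1
total = total_can_cor_sketch([X1, X2], [0.0, 1.0, 0.0], [w1, w2], Y)
```

Here only the block-1/phenotype pair carries a nonzero coefficient, so the total reduces to that single correlation.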

bioneuralnet.network.pysmccnet.analysis.get_omics_modules(Abar: DataFrame, cut_height: float = 0.9999999999) -> List[List[int]]

Extract omics modules via hierarchical clustering on the similarity matrix.

Parameters:
  • Abar (pd.DataFrame) – Similarity/adjacency matrix for all features.

  • cut_height (float) – Height threshold for hierarchical tree cutting.

Returns:

List of modules; each inner list contains the 0-based feature indices belonging to one module.

Return type:

List[List[int]]
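Clustering a similarity matrix and cutting the tree at a height can be sketched with the SciPy functions listed for this module. The conversion `1 - Abar` to a distance, the complete-linkage method, and the removal of single-feature clusters are all assumptions made for illustration, and `modules_sketch` is a hypothetical name.

```python
from collections import defaultdict

import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def modules_sketch(abar, cut_height=0.9999999999):
    """Illustrative only: hierarchical clustering on 1 - similarity."""
    dist = 1.0 - abar.to_numpy(dtype=float)     # assumed distance transform
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)  # square -> condensed vector
    Z = linkage(condensed, method="complete")   # assumed linkage method
    labels = fcluster(Z, t=cut_height, criterion="distance")
    groups = defaultdict(list)
    for idx, lab in enumerate(labels):
        groups[lab].append(idx)                 # 0-based feature indices
    # Assumed: single-feature clusters are not reported as modules.
    return [members for members in groups.values() if len(members) > 1]

sim = np.zeros((5, 5))
sim[:3, :3] = 0.9      # features 0-2 are mutually similar
sim[3:, 3:] = 0.8      # features 3-4 are mutually similar
np.fill_diagonal(sim, 1.0)
modules = modules_sketch(pd.DataFrame(sim))
```

Because the two blocks have zero similarity, their distance is exactly 1.0, which lies just above the default `cut_height`, so they stay separate.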

bioneuralnet.network.pysmccnet.analysis.prune_modules(Abar: DataFrame, X_combined: ndarray, Y: ndarray, modules: List[List[int]], feature_labels: List[str], min_size: int = 10, max_size: int = 100, summarization: str = 'NetSHy', saving_dir: str = '.') -> List[dict]

Prune network modules to target size range and compute summarization scores.

For each module from hierarchical clustering, iteratively removes the lowest-degree node until the module fits within [min_size, max_size]. Then computes NetSHy summarization scores and per-feature phenotype correlations.

Parameters:
  • Abar (pd.DataFrame) – Global adjacency matrix.

  • X_combined (np.ndarray) – Column-bound omics data of shape (n_samples, n_total_features).

  • Y (np.ndarray) – Phenotype vector of shape (n_samples,).

  • modules (List[List[int]]) – Feature index groups from get_omics_modules.

  • feature_labels (List[str]) – Feature names matching columns of X_combined.

  • min_size (int) – Minimum module size to retain.

  • max_size (int) – Maximum module size; larger modules are pruned down.

  • summarization (str) – Summarization method. Currently only 'NetSHy' is supported.

  • saving_dir (str) – Directory to save per-module pickle files.

Returns:

One dict per valid module with keys: module_id, nodes, node_indices, adjacency, correlation, pc_correlations, netshy, omics_correlation.

Return type:

List[dict]
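The size-pruning step alone (not the summarization, correlation, or pickle output) might look like the sketch below. It assumes "degree" means weighted degree, that is, the row sum of the module's adjacency submatrix, and that modules smaller than `min_size` are discarded. `prune_to_size_sketch` is a hypothetical helper.

```python
import numpy as np

def prune_to_size_sketch(adj, node_indices, min_size=10, max_size=100):
    """Illustrative only: drop lowest-degree nodes until the module fits."""
    nodes = list(node_indices)
    if len(nodes) < min_size:
        return None                              # assumed: too small, skip
    sub = np.asarray(adj, dtype=float)[np.ix_(nodes, nodes)]
    while len(nodes) > max_size:
        degree = sub.sum(axis=1)                 # assumed: weighted degree
        drop = int(np.argmin(degree))            # position of weakest node
        sub = np.delete(np.delete(sub, drop, axis=0), drop, axis=1)
        nodes.pop(drop)
    return nodes

adj = np.ones((5, 5))
np.fill_diagonal(adj, 0.0)
adj[4, :4] = adj[:4, 4] = 0.1                    # node 4 is weakly connected
kept = prune_to_size_sketch(adj, range(5), min_size=2, max_size=4)
too_small = prune_to_size_sketch(adj, range(5), min_size=6, max_size=10)
```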

bioneuralnet.network.pysmccnet.analysis.summarize_netshy(X: DataFrame | ndarray, A: DataFrame | ndarray, npc: int = 1) -> dict

NetSHy network summarization via hybrid approach leveraging topological properties.

Summarizes a subnetwork by projecting omics data through the graph Laplacian, then extracting principal components from the projected space.

Source: summarizeNetSHy (Vu et al., Bioinformatics 2023)

Parameters:
  • X (pd.DataFrame | np.ndarray) – Data matrix of shape (n_samples, n_features).

  • A (pd.DataFrame | np.ndarray) – Adjacency matrix of shape (n_features, n_features).

  • npc (int) – Number of principal components for summarization.

Returns:

Dictionary with keys 'scores' (shape (n_samples, npc)), 'importance' (sdev, variance_pct, cumulative_pct), and 'loadings' (shape (n_features, npc)).

Return type:

dict
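The Laplacian projection followed by PCA can be sketched with scikit-learn as below. This is a simplified reading of the description above: it uses the unnormalized Laplacian L = D - A, standardizes the columns first, and leaves out any further terms the actual implementation may combine with the projection. The helper name `netshy_sketch` and the flat `variance_pct` key are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def netshy_sketch(X, A, npc=1):
    """Illustrative only: project through the Laplacian, then run PCA."""
    X = StandardScaler().fit_transform(np.asarray(X, dtype=float))
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A          # assumed: unnormalized Laplacian
    projected = X @ L                       # shape (n_samples, n_features)
    pca = PCA(n_components=npc).fit(projected)
    return {
        "scores": pca.transform(projected),           # (n_samples, npc)
        "loadings": pca.components_.T,                # (n_features, npc)
        "variance_pct": pca.explained_variance_ratio_,
    }

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))
A = np.abs(rng.normal(size=(5, 5)))
A = (A + A.T) / 2.0                         # symmetric adjacency
np.fill_diagonal(A, 0.0)
summary = netshy_sketch(X, A, npc=2)
```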