bioneuralnet.network

Network Construction and Analysis.

This module provides tools for generating, searching, and analyzing multi-omics networks. It includes methods for building networks from raw tabular data using similarity, correlation, thresholding, and Gaussian KNN, as well as phenotype-driven strategies like PySmCCNet.

Functions

auto_pysmccnet(X, Y[, AdjustedCovar, ...])

Automated SmCCNet workflow with GPU acceleration.

correlation_network(X[, k, method, signed, ...])

Build a correlation-based graph from feature vectors with optional kNN sparsification.

gaussian_knn_network(X[, k, sigma, mutual, ...])

Build a Gaussian (RBF) kNN similarity graph from feature vectors.

network_search(omics_data, y_labels[, ...])

Search over graph-construction hyperparameters using a structural proxy.

similarity_network(X[, k, metric, mutual, ...])

Build a k-nearest neighbors similarity graph from feature vectors.

threshold_network(X[, b, k, mutual, ...])

Build a soft-thresholded kNN co-expression graph, similar to WGCNA-style networks.

Classes

NetworkAnalyzer(adjacency_matrix[, ...])

Performs GPU-accelerated network analysis.

class bioneuralnet.network.NetworkAnalyzer(adjacency_matrix: DataFrame, source_omics: list | None = None, device: str = 'cuda')[source]

Bases: object

Performs GPU-accelerated network analysis.

This class leverages PyTorch tensors to speed up graph statistics, clustering computations, and edge analysis for large-scale omics networks.

Parameters:
  • adjacency_matrix (pd.DataFrame) – The input weighted adjacency matrix representing network connections.

  • source_omics (list | None) – Optional list of the original DataFrames used to build the network; when given, it is used to assign an omics type to each node.

  • device (str) – The target computing device, defaulting to ‘cuda’ if available.

basic_statistics(threshold: float = 0.5) Dict[str, float | int | ndarray][source]

Computes fundamental graph metrics including density, degree statistics, and node isolation counts.

This provides a high-level overview of the network topology and connectivity at a specific threshold.

Parameters:

threshold (float) – The threshold used to binarize the network before analysis.

Returns:

A dictionary containing node count, edge count, density, average/max/min degree, and isolated node count.

Return type:

dict
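
As an illustration of the metrics listed above, the following is a minimal NumPy sketch rather than the class's PyTorch implementation. The function name basic_stats, the dictionary keys, and the strict ">" comparison are assumptions made for this example.

```python
import numpy as np

def basic_stats(adjacency, threshold=0.5):
    """Summarise a weighted adjacency matrix after hard thresholding."""
    binary = (adjacency > threshold).astype(np.int64)
    np.fill_diagonal(binary, 0)             # self-loops do not count as edges
    degrees = binary.sum(axis=1)
    n = binary.shape[0]
    num_edges = int(binary.sum()) // 2      # each undirected edge appears twice
    possible = n * (n - 1) // 2             # number of node pairs
    return {
        "num_nodes": n,
        "num_edges": num_edges,
        "density": num_edges / possible if possible else 0.0,
        "avg_degree": float(degrees.mean()),
        "max_degree": int(degrees.max()),
        "min_degree": int(degrees.min()),
        "isolated_nodes": int((degrees == 0).sum()),
    }

# Hypothetical 4-node network; node 3 has only weak connections.
weights = np.array([
    [0.0, 0.9, 0.2, 0.1],
    [0.9, 0.0, 0.7, 0.1],
    [0.2, 0.7, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
])
stats = basic_stats(weights, threshold=0.5)
print(stats)
```

At threshold 0.5 only two edges survive, so node 3 is counted as isolated and the density is 2 of 6 possible pairs.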

clustering_coefficient_gpu(threshold: float = 0.5, sample_size: int | None = None) Dict[str, float | ndarray][source]

Computes the local clustering coefficient for nodes using GPU-optimized matrix operations.

This measures the degree to which nodes tend to cluster together, using random sampling for efficiency on large graphs.

Parameters:
  • threshold (float) – The threshold used to define valid edges.

  • sample_size (Optional[int]) – The number of nodes to sample for calculation to save memory on massive graphs.

Returns:

Statistics including average, max, and min clustering coefficients, plus raw values and sample indices.

Return type:

dict
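
The matrix formulation can be sketched as follows, assuming a symmetric binary adjacency matrix with a zero diagonal. This is a NumPy illustration of the general technique, not the method's GPU code; local_clustering and sample_idx are hypothetical names. For a sampled node i, the number of closed walks of length 3 through i equals twice the number of triangles at i, and only the sampled rows need to be multiplied.

```python
import numpy as np

def local_clustering(binary, sample_idx=None):
    """Local clustering coefficient for all nodes or for a sampled subset."""
    a = binary.astype(np.float64)
    if sample_idx is None:
        sample_idx = np.arange(a.shape[0])
    rows = a[sample_idx]                            # (S, N) rows of sampled nodes
    closed_walks = ((rows @ a) * rows).sum(axis=1)  # 2 * triangles (a is symmetric)
    triangles = closed_walks / 2.0
    degrees = rows.sum(axis=1)
    pairs = degrees * (degrees - 1) / 2.0           # neighbour pairs per node
    # Nodes with fewer than two neighbours get a coefficient of 0.
    return np.divide(triangles, pairs, out=np.zeros_like(triangles), where=pairs > 0)

# Triangle 0-1-2 with a pendant node 3 attached to node 2.
binary = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])
cc_all = local_clustering(binary)
cc_sample = local_clustering(binary, sample_idx=np.array([2, 3]))
print(cc_all, cc_all.mean())
```

Restricting the product to the sampled rows keeps the cost at O(S·N²) instead of forming the full cube of the adjacency matrix.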

connected_components(threshold: float = 0.5) Dict[str, int | ndarray | List[int]][source]

Identifies the connected components (mutually disconnected subgraphs) of the network using breadth-first search logic.

This computation is performed on the CPU using scipy due to the sequential nature of traversal algorithms.

Parameters:

threshold (float) – The threshold used to define connectivity.

Returns:

Contains the count of components, label assignments for each node, and a size distribution list.

Return type:

dict
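
The traversal can be illustrated with a plain breadth-first search. This sketch uses a Python queue rather than the scipy routine mentioned above, and the function name and return layout are assumptions for the example.

```python
from collections import deque
import numpy as np

def connected_components_bfs(binary):
    """Label each node with the index of its connected component."""
    n = binary.shape[0]
    labels = np.full(n, -1, dtype=np.int64)   # -1 marks unvisited nodes
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in np.flatnonzero(binary[node]):
                if labels[neighbor] == -1:
                    labels[neighbor] = current
                    queue.append(int(neighbor))
        current += 1
    sizes = np.bincount(labels).tolist()      # nodes per component
    return current, labels, sizes

# Two components: the path 0-1-2 and the single edge 3-4.
binary = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
])
n_components, labels, sizes = connected_components_bfs(binary)
print(n_components, labels, sizes)
```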

cross_omics_analysis(threshold: float = 0.5) Dict[tuple, Dict][source]

Quantifies the connectivity density between different omics layers (e.g., RNA vs Protein).

This reveals whether the network structure is driven by within-omics correlations or cross-omics interactions.

Parameters:

threshold (float) – The threshold used to count valid edges between features.

Returns:

A nested dictionary mapping omics pairs to their edge counts and density statistics.

Return type:

dict
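
A minimal sketch of the block-wise counting, assuming a symmetric binary adjacency matrix with a zero diagonal and one omics label per node. The function name, the label values, and the dictionary keys are hypothetical and not taken from the library.

```python
from itertools import combinations_with_replacement
import numpy as np

def cross_omics_density(binary, omics_labels):
    """Edge count and density for every pair of omics layers."""
    labels = np.asarray(omics_labels)
    result = {}
    for a, b in combinations_with_replacement(sorted(set(omics_labels)), 2):
        idx_a = np.flatnonzero(labels == a)
        idx_b = np.flatnonzero(labels == b)
        block = binary[np.ix_(idx_a, idx_b)]
        if a == b:
            edges = int(block.sum()) // 2                # symmetric block counts edges twice
            possible = len(idx_a) * (len(idx_a) - 1) // 2
        else:
            edges = int(block.sum())
            possible = len(idx_a) * len(idx_b)
        density = edges / possible if possible else 0.0
        result[(a, b)] = {"edges": edges, "density": density}
    return result

# Nodes 0-1 are RNA features, nodes 2-3 are protein features.
binary = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
])
summary = cross_omics_density(binary, ["rna", "rna", "protein", "protein"])
print(summary)
```

Comparing the within-layer densities with the cross-layer density shows which kind of edge dominates the thresholded graph.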

degree_distribution(threshold: float = 0.5) DataFrame[source]

Calculates the frequency distribution of node degrees across the entire network.

This helps assess whether the degree distribution resembles a scale-free power law or that of a random graph.

Parameters:

threshold (float) – The threshold used to binarize the network.

Returns:

A DataFrame with columns for degree, count, and percentage of total nodes.

Return type:

pd.DataFrame
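
The three columns described above can be sketched with NumPy as follows. The example returns a list of dictionaries instead of a DataFrame to stay dependency-light; the function name is an assumption.

```python
import numpy as np

def degree_distribution(binary):
    """Count how many nodes have each degree in a binary adjacency matrix."""
    degrees = binary.sum(axis=1)
    values, counts = np.unique(degrees, return_counts=True)
    total = binary.shape[0]
    return [
        {"degree": int(d), "count": int(c), "percentage": 100.0 * int(c) / total}
        for d, c in zip(values, counts)
    ]

# Path 0-1-2 plus an isolated node 3.
binary = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
])
table = degree_distribution(binary)
print(table)
```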

edge_weight_analysis() ndarray | None[source]

Analyzes the statistical distribution of edge weights across the entire network.

This is useful for determining appropriate threshold values and understanding signal strength distribution.

Parameters:

None.

Returns:

An array of all non-zero edge weights, or None if no edges exist.

Return type:

Optional[np.ndarray]

find_strongest_edges(top_n: int = 50) DataFrame[source]

Retrieves the strongest edges in the network sorted by weight magnitude.

This isolates the most significant pairwise interactions between features.

Parameters:

top_n (int) – The number of top weighted edges to return.

Returns:

A DataFrame detailing the top interactions, including feature names and weights.

Return type:

pd.DataFrame
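
One way to rank edges by weight magnitude is sketched below, assuming a symmetric matrix so that each pair is read once from the upper triangle. The function name and the feature names are hypothetical, and the result is a list of tuples rather than a DataFrame.

```python
import numpy as np

def strongest_edges(adjacency, names, top_n=50):
    """Return the top_n edges as (feature_a, feature_b, weight) by |weight|."""
    rows, cols = np.triu_indices(adjacency.shape[0], k=1)  # each pair once, no diagonal
    weights = adjacency[rows, cols]
    keep = weights != 0                                    # ignore absent edges
    rows, cols, weights = rows[keep], cols[keep], weights[keep]
    order = np.argsort(-np.abs(weights), kind="stable")[:top_n]
    return [(names[rows[i]], names[cols[i]], float(weights[i])) for i in order]

weights = np.array([
    [0.0, 0.8, -0.9],
    [0.8, 0.0, 0.3],
    [-0.9, 0.3, 0.0],
])
top = strongest_edges(weights, ["g1", "g2", "g3"], top_n=2)
print(top)
```

Sorting on the absolute value means a strong negative weight outranks a weaker positive one.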

hub_analysis(threshold: float = 0.5, top_n: int = 10) DataFrame[source]

Identifies and ranks the most highly connected ‘hub’ nodes in the network.

This is critical for finding central regulatory features or bottlenecks in the omics network.

Parameters:
  • threshold (float) – The threshold used to define network edges.

  • top_n (int) – The number of top degree nodes to retrieve.

Returns:

A table of the top N nodes including their rank, feature name, omics type, and degree.

Return type:

pd.DataFrame
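
Ranking by degree can be sketched as follows. This is a NumPy illustration under the assumption that hubs are ordered by degree in the thresholded graph; the function name and node names are hypothetical, and the omics-type column is omitted.

```python
import numpy as np

def top_hubs(binary, names, top_n=10):
    """Return (rank, name, degree) for the top_n highest-degree nodes."""
    degrees = binary.sum(axis=1)
    order = np.argsort(-degrees, kind="stable")[:top_n]   # ties keep original order
    return [(rank, names[i], int(degrees[i])) for rank, i in enumerate(order, start=1)]

# Node 0 is connected to every other node.
binary = np.array([
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [1, 0, 1, 0],
])
hubs = top_hubs(binary, ["tf_a", "gene_b", "gene_c", "gene_d"], top_n=2)
print(hubs)
```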

threshold_network(threshold: float) torch.Tensor[source]

Generates a binary adjacency matrix by applying a hard threshold to the connection weights.

This converts continuous edge weights into a binary structure suitable for standard graph topology metrics.

Parameters:

threshold (float) – The cutoff value above which an edge is considered to exist.

Returns:

A binary tensor where 1 indicates an edge and 0 indicates no edge.

Return type:

torch.Tensor
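
Hard thresholding can be illustrated in a few lines. The sketch uses NumPy instead of torch and assumes a strict "greater than" comparison, which matches the wording "above which" but is not confirmed against the source code.

```python
import numpy as np

def hard_threshold(adjacency, threshold):
    """Return a 0/1 matrix with 1 wherever the weight exceeds the threshold."""
    return (adjacency > threshold).astype(np.int64)

# Hypothetical 3-node weighted network.
weights = np.array([
    [0.0, 0.8, 0.3],
    [0.8, 0.0, 0.6],
    [0.3, 0.6, 0.0],
])
binary = hard_threshold(weights, 0.5)
print(binary)
```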

bioneuralnet.network.auto_pysmccnet(X: List[DataFrame | ndarray], Y: DataFrame | ndarray, AdjustedCovar: DataFrame | None = None, preprocess: bool = False, Kfold: int = 5, subSampNum: int = 100, DataType: List[str] | None = None, BetweenShrinkage: float = 2.0, ScalingPen: List[float] = [0.1, 0.1], saving_dir: str = '/home/docs/checkouts/readthedocs.org/user_builds/bioneuralnet/checkouts/latest/docs/source', tuneLength: int = 5, tuneRangeCCA: List[float] = [0.1, 0.5], tuneRangePLS: List[float] = [0.5, 0.9], EvalMethod: str = 'accuracy', ncomp_pls: int = 3, seed: int = 123, CutHeight: float = 0.9999999999, min_size: int = 10, max_size: int = 100, summarization: str = 'NetSHy', precomputed_fold_data: dict | None = None, device: torch.device | None = 'cpu', dtype: torch.dtype = torch.float64, rename: bool = True) dict[source]

Automated SmCCNet workflow with GPU acceleration.

Runs the complete SmCCNet pipeline supporting both CCA (continuous phenotype) and PLS (binary phenotype) modes. The workflow includes optional preprocessing, cross-validation for penalty tuning, subsampling for stability selection, and final network construction.

Parameters:
  • X (List[pd.DataFrame | np.ndarray]) – Input data matrices (omics layers) for integration.

  • Y (pd.DataFrame | np.ndarray) – Phenotype vector; numeric for CCA or binary (0/1) for PLS.

  • AdjustedCovar (pd.DataFrame | None) – Optional covariates to regress out from X before analysis.

  • preprocess (bool) – If True, center and scale data; if False, use raw input.

  • Kfold (int) – Number of cross-validation folds for penalty parameter tuning.

  • subSampNum (int) – Number of subsampling iterations for stability selection.

  • DataType (List[str] | None) – Names for each omics layer in X; defaults to generic names if None.

  • BetweenShrinkage (float) – Shrinkage factor for between-omics scaling weights.

  • ScalingPen (List[float]) – Penalty terms used for determining scaling factors.

  • saving_dir (str) – Directory path for saving output results.

  • tuneLength (int) – Number of candidate penalty parameters to test per omics layer.

  • tuneRangeCCA (List[float]) – Min and max penalty values for CCA (continuous phenotype).

  • tuneRangePLS (List[float]) – Min and max penalty values for PLS (binary phenotype).

  • EvalMethod (str) – Metric for PLS evaluation; one of ‘accuracy’, ‘auc’, ‘precision’, ‘recall’, or ‘f1’.

  • ncomp_pls (int) – Number of latent components to use for PLS models.

  • CutHeight (float) – Height threshold for hierarchical tree cutting in module extraction.

  • min_size (int) – Minimum number of nodes to retain a network module.

  • max_size (int) – Maximum module size; larger modules are pruned down.

  • summarization (str) – Network summarization method. Currently only ‘NetSHy’ is supported.

  • seed (int) – Random seed for reproducibility.

  • precomputed_fold_data (dict | None) – Precomputed CV folds to bypass internal fold generation.

  • device (torch.device | None) – PyTorch device, defaulting to ‘cpu’; if None, automatically selects GPU if available.

  • dtype (torch.dtype) – PyTorch data type for computations.

  • rename (bool) – If True, prefix datatype to column names; if False, use original column names.

Returns:

Dictionary containing results for ‘CCA’ or ‘PLS’ including adjacency matrices, processed data, and CV results.

Return type:

dict
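
To give a sense of how tuneLength and the tuning range determine the size of the search, the sketch below builds a candidate grid. It assumes evenly spaced penalties between the range bounds and a full Cartesian product across omics layers; both are assumptions for illustration, and the actual grid construction in auto_pysmccnet may differ.

```python
from itertools import product
import numpy as np

tune_length = 5                  # candidates per omics layer (tuneLength)
tune_range_cca = [0.1, 0.5]      # min and max penalty (tuneRangeCCA)
n_layers = 2                     # hypothetical number of omics layers in X

# Assumption: candidates are evenly spaced between the two bounds.
per_layer = np.linspace(tune_range_cca[0], tune_range_cca[1], tune_length)

# Assumption: every combination of per-layer penalties is evaluated.
grid = list(product(per_layer, repeat=n_layers))
print(per_layer, len(grid))
```

Under these assumptions the number of configurations grows as tuneLength raised to the number of layers, and each one is evaluated across Kfold folds.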

bioneuralnet.network.correlation_network(X: DataFrame, k: int | None = 15, method: str = 'pearson', signed: bool = True, normalize: bool = True, mutual: bool = False, per_node: bool = True, threshold: float | None = None, self_loops: bool = False) DataFrame[source]

Build a correlation-based graph from feature vectors with optional kNN sparsification.

Pairwise correlations (Pearson or Spearman) are computed between features, mapped to similarity scores in [0, 1], and then optionally sparsified using per-node kNN or a global cutoff. Mutual pruning, self-loops, and row-normalization can be applied to obtain a final adjacency matrix.

Parameters:
  • X (pd.DataFrame) – Input data of shape (N, D) where N is the number of samples and D is the number of features.

  • k (int | None) – Number of neighbors for sparsification. When per_node is True, k is applied per node; otherwise it sets a global cutoff that keeps approximately k*N edges. If both k and threshold are None, a fully connected graph is returned, subject to self_loops.

  • method (str) – Correlation method; “pearson” for standard correlation or “spearman” for rank-based correlation.

  • signed (bool) – If True, use signed correlations mapped to [0, 1] via (C + 1)/2; if False, use absolute correlations in [0, 1].

  • normalize (bool) – If True, row-normalize the adjacency; if False, keep raw similarity weights.

  • mutual (bool) – If True, retain only edges that are present in both directions (i->j and j->i).

  • per_node (bool) – If True, apply kNN per node; if False, use a global cutoff determined by k or threshold.

  • threshold (float | None) – Similarity cutoff; when provided and per_node is False, overrides the k-based global cutoff.

  • self_loops (bool) – If True, add self-loop weights of 1 on the diagonal of the adjacency matrix.

Returns:

Adjacency matrix of shape (D, D) representing the feature-feature correlation graph.

Return type:

pd.DataFrame
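
The pipeline described above (correlate, map to [0, 1], keep top-k per node, row-normalize) can be sketched with NumPy. This is an independent illustration of the Pearson, per-node case only, not the library's code; the function name is hypothetical and a plain array stands in for the DataFrame.

```python
import numpy as np

def correlation_knn(X, k=2, signed=True, normalize=True):
    """Feature-feature correlation graph with per-node kNN sparsification."""
    corr = np.corrcoef(X, rowvar=False)                  # (D, D), features are columns
    sim = (corr + 1.0) / 2.0 if signed else np.abs(corr) # map to [0, 1]
    np.fill_diagonal(sim, 0.0)                           # no self-loops
    top = np.argsort(-sim, axis=1)[:, :k]                # k most similar per row
    rows = np.arange(sim.shape[0])[:, None]
    adjacency = np.zeros_like(sim)
    adjacency[rows, top] = sim[rows, top]
    if normalize:
        row_sums = adjacency.sum(axis=1, keepdims=True)
        adjacency = np.divide(adjacency, row_sums,
                              out=np.zeros_like(adjacency), where=row_sums > 0)
    return adjacency

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))       # 20 samples, 5 features (synthetic data)
A = correlation_knn(X, k=2)
print(A.round(3))
```

Because each row keeps its own top-k neighbors, the result is generally not symmetric unless mutual pruning is applied afterwards.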

bioneuralnet.network.gaussian_knn_network(X: DataFrame, k: int = 15, sigma: float | None = None, mutual: bool = False, self_loops: bool = True, normalize: bool = True) DataFrame[source]

Build a Gaussian (RBF) kNN similarity graph from feature vectors.

Pairwise Euclidean distances between features are converted to similarities using a Gaussian kernel with bandwidth sigma (or a median-distance heuristic). The graph is sparsified by keeping top-k neighbors per node, optionally restricted to mutual neighbors, with optional self-loops and row-normalization.

Parameters:
  • X (pd.DataFrame) – Input data of shape (N, D) where N is the number of samples and D is the number of features.

  • k (int) – Number of neighbors to keep per node in the kNN graph.

  • sigma (float | None) – Bandwidth parameter for the Gaussian kernel; if None, a median squared distance heuristic is used.

  • mutual (bool) – If True, retain only edges where i and j are mutual kNN neighbors.

  • self_loops (bool) – If True, add self-loop weights of 1 on the diagonal of the adjacency matrix.

  • normalize (bool) – If True, row-normalize the adjacency so each row sums to 1.

Returns:

Adjacency matrix of shape (D, D) representing the Gaussian-kernel feature similarity graph.

Return type:

pd.DataFrame
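
A rough NumPy sketch of the kernel step follows. It assumes the kernel form exp(-d² / (2σ²)) and, when sigma is None, sets σ² to the median of the off-diagonal squared distances; both choices are assumptions for illustration and may not match the library's exact heuristic. Self-loops and row-normalization are left out, and the function name is hypothetical.

```python
import numpy as np

def gaussian_knn(X, k=3, sigma=None):
    """Gaussian-kernel similarity between features, keeping top-k per node."""
    F = X.T                                              # (D, N): one row per feature
    sq_norms = (F ** 2).sum(axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (F @ F.T)
    sq_dist = np.maximum(sq_dist, 0.0)                   # clip rounding noise
    if sigma is None:
        off_diag = sq_dist[~np.eye(F.shape[0], dtype=bool)]
        sigma_sq = np.median(off_diag)                   # assumed median heuristic
    else:
        sigma_sq = sigma ** 2
    # Assumes sigma_sq > 0, i.e. the features are not all identical.
    sim = np.exp(-sq_dist / (2.0 * sigma_sq))
    np.fill_diagonal(sim, 0.0)
    top = np.argsort(-sim, axis=1)[:, :k]
    rows = np.arange(sim.shape[0])[:, None]
    adjacency = np.zeros_like(sim)
    adjacency[rows, top] = sim[rows, top]
    return adjacency

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 6))       # 30 samples, 6 features (synthetic data)
A = gaussian_knn(X, k=3)
print(A.round(3))
```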

bioneuralnet.network.network_search(omics_data, y_labels[, ...])

Search over graph-construction hyperparameters using a structural proxy.

Each candidate configuration builds a graph, scores it with a fast centrality-weighted Ridge classifier proxy, and blends that score with a topological quality term (average clustering coefficient) to favour well-connected, informative graphs.

Parameters:
  • omics_data – Feature matrix of shape (n_samples, n_features).

  • y_labels – Target labels for stratified CV evaluation.

  • methods – Graph-construction methods to search over.

  • seed – Random seed for reproducibility.

  • verbose – Log per-configuration progress.

  • trials – Optional cap on evaluated configurations (random subset).

  • centrality_mode – Centrality used for feature weighting in the proxy; one of "eigenvector" or "degree".

  • topology_weight – Blending factor in [0, 1] that controls how much the topological quality term contributes to the final score. 0 ignores topology; 1 ignores the proxy F1.

Returns:

A 3-tuple of (best_graph, best_params, results_df).

Raises:

RuntimeError – If every configuration fails.
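
The role of topology_weight can be illustrated with a convex combination of the two terms. The linear form below is an assumption that is consistent with the endpoints described above (0 ignores topology, 1 ignores the proxy F1); the actual scoring formula, the function name, and the candidate scores shown here are hypothetical.

```python
def blended_score(proxy_f1, avg_clustering, topology_weight):
    """Assumed linear blend of the proxy score and the topology term."""
    if not 0.0 <= topology_weight <= 1.0:
        raise ValueError("topology_weight must lie in [0, 1]")
    return (1.0 - topology_weight) * proxy_f1 + topology_weight * avg_clustering

# Made-up scores for two candidate configurations.
candidates = [
    {"method": "correlation", "proxy_f1": 0.80, "avg_clustering": 0.20},
    {"method": "gaussian", "proxy_f1": 0.70, "avg_clustering": 0.60},
]

def best_method(topology_weight):
    best = max(
        candidates,
        key=lambda c: blended_score(c["proxy_f1"], c["avg_clustering"], topology_weight),
    )
    return best["method"]

print(best_method(0.0), best_method(0.5))
```

With these made-up numbers the winner changes as the weight increases, which shows how the topology term can override a small advantage in proxy F1.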

bioneuralnet.network.similarity_network(X: DataFrame, k: int = 15, metric: str = 'cosine', mutual: bool = False, per_node: bool = True, self_loops: bool = False, normalize: bool = True) DataFrame[source]

Build a k-nearest neighbors similarity graph from feature vectors.

Pairwise similarities are computed using either cosine similarity or a Gaussian kernel on Euclidean distances. The similarity matrix is sparsified by keeping top-k neighbors per node (or via a global cutoff), optionally restricted to mutual neighbors, with optional self-loops and row-normalization.

Parameters:
  • X (pd.DataFrame) – Input data of shape (N, D) where N is the number of samples and D is the number of features.

  • k (int) – Number of neighbors to keep per node, or approximate neighbors per node when using a global cutoff.

  • metric (str) – Similarity metric; either “cosine” or “euclidean” (case-insensitive) where the latter uses a Gaussian kernel on squared distances.

  • mutual (bool) – If True, retain only edges where i is in the kNN of j and j is in the kNN of i.

  • per_node (bool) – If True, apply kNN per node; if False, apply a global threshold to keep approximately k edges per node.

  • self_loops (bool) – If True, add self-loop weights of 1 on the diagonal of the adjacency matrix.

  • normalize (bool) – If True, row-normalize the adjacency so each row sums to 1.

Returns:

Adjacency matrix of shape (D, D) representing the feature-feature similarity graph.

Return type:

pd.DataFrame
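
The effect of the mutual option can be sketched for the cosine metric. This NumPy illustration keeps an edge only when each node is in the other's top-k list; the function name is hypothetical, and self-loops and normalization are omitted.

```python
import numpy as np

def cosine_mutual_knn(X, k=2):
    """Cosine similarity between features, keeping only mutual kNN edges."""
    F = X.T.astype(np.float64)                           # (D, N): one row per feature
    # Assumes no feature is all zeros (norm > 0).
    unit = F / np.linalg.norm(F, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)                       # never pick self as neighbour
    top = np.argsort(-sim, axis=1)[:, :k]
    mask = np.zeros(sim.shape, dtype=bool)
    mask[np.arange(sim.shape[0])[:, None], top] = True   # i -> j if j in kNN(i)
    mutual = mask & mask.T                               # keep edge only if both directions
    return np.where(mutual, sim, 0.0)

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 6))       # 30 samples, 6 features (synthetic data)
A = cosine_mutual_knn(X, k=2)
print(A.round(3))
```

Mutual pruning yields a symmetric matrix, and a node may end up with fewer than k neighbors.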

bioneuralnet.network.threshold_network(X: DataFrame, b: float = 6.0, k: int = 15, mutual: bool = False, self_loops: bool = False, normalize: bool = True) DataFrame[source]

Build a soft-thresholded kNN co-expression graph, similar to WGCNA-style networks.

Absolute Pearson correlations between features are raised to a power b to obtain soft-thresholded similarities. A kNN mask keeps the top-k neighbors per node, optionally restricted to mutual neighbors, with optional self-loops and row-normalization.

Parameters:
  • X (pd.DataFrame) – Input data of shape (N, D) where N is the number of samples and D is the number of features.

  • b (float) – Soft-threshold exponent applied to absolute correlations to control network sparsity and hub emphasis.

  • k (int) – Number of neighbors to keep per node in the kNN graph.

  • mutual (bool) – If True, retain only edges where i and j are mutual kNN neighbors.

  • self_loops (bool) – If True, add self-loop weights of 1 on the diagonal of the adjacency matrix.

  • normalize (bool) – If True, row-normalize the adjacency so each row sums to 1.

Returns:

Adjacency matrix of shape (D, D) representing the soft-thresholded co-expression graph.

Return type:

pd.DataFrame
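
The soft-thresholding step on its own is a one-line transform, shown here on a small hand-written correlation matrix (hypothetical values) with the default exponent b = 6. The kNN mask, self-loops, and normalization are not included in this sketch.

```python
import numpy as np

b = 6.0
# Hypothetical Pearson correlations between three features.
corr = np.array([
    [1.0, 0.9, 0.5],
    [0.9, 1.0, -0.2],
    [0.5, -0.2, 1.0],
])
soft = np.abs(corr) ** b           # sign is discarded, weak correlations shrink fast
print(soft)
```

A correlation of 0.9 keeps a weight of about 0.53, while 0.5 drops to about 0.016, which is why larger exponents produce sparser, more hub-dominated networks.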

Modules

generate

pysmccnet

Sparse Multiple Canonical Correlation Network (SmCCNet 2.0).

tools