bioneuralnet.network

Network Construction and Analysis.

This module provides tools for generating, searching, and analyzing multi-omics networks. It includes methods for building networks from raw tabular data using similarity, correlation, soft thresholding, and Gaussian kNN, as well as phenotype-driven strategies such as PySmCCNet.

Functions

`auto_pysmccnet`(X, Y[, AdjustedCovar, ...])	Automated SmCCNet workflow with GPU acceleration.
`correlation_network`(X[, k, method, signed, ...])	Build a correlation-based graph from feature vectors with optional kNN sparsification.
`gaussian_knn_network`(X[, k, sigma, mutual, ...])	Build a Gaussian (RBF) kNN similarity graph from feature vectors.
`network_search`(omics_data, y_labels[, ...])	Search over graph-construction hyperparameters using a structural proxy.
`similarity_network`(X[, k, metric, mutual, ...])	Build a k-nearest neighbors similarity graph from feature vectors.
`threshold_network`(X[, b, k, mutual, ...])	Build a soft-thresholded kNN co-expression graph, similar to WGCNA-style networks.

Classes

`NetworkAnalyzer`(adjacency_matrix[, ...])	Performs GPU-accelerated network analysis.

class bioneuralnet.network.NetworkAnalyzer(adjacency_matrix: DataFrame, source_omics: list | None = None, device: str = 'cuda')

Bases: object

Performs GPU-accelerated network analysis.

This class leverages PyTorch tensors to speed up graph statistics, clustering computations, and edge analysis for large-scale omics networks.

Parameters:

adjacency_matrix (pd.DataFrame) – The input weighted adjacency matrix representing network connections.
source_omics (list | None) – Optional list of the original DataFrames used to build the network; used to assign an omics type to each feature.
device (str) – The target computing device, defaulting to ‘cuda’ if available.

basic_statistics(threshold: float = 0.5) → Dict[str, float | int | ndarray]

Computes fundamental graph metrics including density, degree statistics, and node isolation counts.

This provides a high-level overview of the network topology and connectivity at a specific threshold.

Parameters: threshold (float) – The threshold used to binarize the network before analysis.
Returns: A dictionary containing node count, edge count, density, average/max/min degree, and isolated node count.
Return type: dict
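
The metrics listed above can be illustrated with a small NumPy sketch. This is not the library's implementation: the helper name `basic_stats`, the strict `>` comparison, and the treatment of the graph as undirected are assumptions made for illustration.

```python
import numpy as np

def basic_stats(weights: np.ndarray, threshold: float = 0.5) -> dict:
    """Summarize the undirected graph obtained by thresholding `weights`."""
    # Assumption: an edge exists where the weight is strictly above the threshold.
    binary = (weights > threshold).astype(int)
    np.fill_diagonal(binary, 0)            # ignore self-loops
    degree = binary.sum(axis=1)
    n = binary.shape[0]
    edges = int(binary.sum() // 2)         # symmetric matrix: each edge appears twice
    return {
        "nodes": n,
        "edges": edges,
        "density": edges / (n * (n - 1) / 2) if n > 1 else 0.0,
        "avg_degree": float(degree.mean()),
        "max_degree": int(degree.max()),
        "min_degree": int(degree.min()),
        "isolated": int((degree == 0).sum()),
    }

W = np.array([
    [0.0, 0.9, 0.7, 0.1],
    [0.9, 0.0, 0.2, 0.0],
    [0.7, 0.2, 0.0, 0.3],
    [0.1, 0.0, 0.3, 0.0],
])
stats = basic_stats(W, threshold=0.5)
```

With this toy matrix only the weights 0.9 and 0.7 survive the cutoff, so the graph has two edges and the fourth node is isolated.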

clustering_coefficient_gpu(threshold: float = 0.5, sample_size: int | None = None) → Dict[str, float | ndarray]

Computes the local clustering coefficient for nodes using GPU-optimized matrix operations.

This measures the degree to which nodes tend to cluster together, using random sampling for efficiency on large graphs.

Parameters:

threshold (float) – The threshold used to define valid edges.
sample_size (Optional[int]) – The number of nodes to sample for calculation to save memory on massive graphs.

Returns:

Statistics including average, max, and min clustering coefficients, plus raw values and sample indices.

Return type:

dict
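
As a rough illustration of computing local clustering coefficients with matrix operations, the sketch below uses NumPy on the CPU rather than PyTorch on a GPU. The helper `local_clustering` and its `nodes` argument (a stand-in for the sampled node indices) are hypothetical names, and the input is assumed to be a symmetric 0/1 adjacency matrix.

```python
import numpy as np

def local_clustering(binary: np.ndarray, nodes=None) -> np.ndarray:
    """Local clustering coefficient for the selected nodes of an undirected graph."""
    a = binary.astype(float)
    np.fill_diagonal(a, 0.0)
    idx = np.arange(a.shape[0]) if nodes is None else np.asarray(nodes)
    rows = a[idx]                                   # adjacency rows of the sampled nodes
    degree = rows.sum(axis=1)
    # (rows @ a) counts 2-step walks; masking with rows keeps walks that close a
    # triangle. Each triangle at a node is counted twice (once per direction).
    triangles = ((rows @ a) * rows).sum(axis=1) / 2.0
    possible = degree * (degree - 1.0) / 2.0        # neighbor pairs that could be linked
    coeff = np.zeros_like(degree)
    np.divide(triangles, possible, out=coeff, where=possible > 0)
    return coeff

# Triangle 0-1-2 with a pendant node 3 attached to node 2.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])
coeff = local_clustering(A)
sampled = local_clustering(A, nodes=[2])
```

Restricting the computation to a subset of rows is what keeps memory bounded on large graphs: the intermediate product has one row per sampled node instead of one per node.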

connected_components(threshold: float = 0.5) → Dict[str, int | ndarray | List[int]]

Identifies the connected components (mutually disconnected subgraphs) of the network using breadth-first-search-style traversal.

This computation is performed on the CPU using SciPy because graph traversal is inherently sequential.

Parameters: threshold (float) – The threshold used to define connectivity.
Returns: Contains the count of components, label assignments for each node, and a size distribution list.
Return type: dict
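
For readers unfamiliar with component labeling, here is a minimal pure-Python breadth-first search. It illustrates the idea only; the method itself is documented as using SciPy, and the function name and return layout below are assumptions.

```python
from collections import deque

def label_components(adjacency: list[list[int]]) -> tuple[int, list[int], list[int]]:
    """Return (component count, per-node labels, component sizes) for a 0/1 adjacency."""
    n = len(adjacency)
    labels = [-1] * n                     # -1 marks a node that has not been visited
    sizes: list[int] = []
    for start in range(n):
        if labels[start] != -1:
            continue
        component = len(sizes)            # next unused label
        labels[start] = component
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor, connected in enumerate(adjacency[node]):
                if connected and labels[neighbor] == -1:
                    labels[neighbor] = component
                    queue.append(neighbor)
        sizes.append(size)
    return len(sizes), labels, sizes

# Two components: the path 0-1-2 and the pair 3-4.
graph = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
count, labels, sizes = label_components(graph)
```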

cross_omics_analysis(threshold: float = 0.5) → Dict[tuple, Dict]

Quantifies the connectivity density between different omics layers (e.g., RNA vs Protein).

This reveals whether the network structure is driven by within-omics correlations or cross-omics interactions.

Parameters: threshold (float) – The threshold used to count valid edges between features.
Returns: A nested dictionary mapping omics pairs to their edge counts and density statistics.
Return type: dict
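
One way to picture the per-pair densities is to slice the binary adjacency into blocks by omics label. The sketch below assumes a symmetric 0/1 matrix and a hypothetical `omics_labels` list giving the layer of each feature; the key names in the result are illustrative, not the library's.

```python
import numpy as np
from itertools import combinations_with_replacement

def cross_omics_density(binary: np.ndarray, omics_labels: list) -> dict:
    """Edge count and density for every pair of omics layers."""
    labels = np.asarray(omics_labels)
    result = {}
    for a, b in combinations_with_replacement(sorted(set(omics_labels)), 2):
        rows = np.flatnonzero(labels == a)
        cols = np.flatnonzero(labels == b)
        block = binary[np.ix_(rows, cols)]
        if a == b:
            # Within-layer block is symmetric: count each edge once.
            edges = int(np.triu(block, k=1).sum())
            possible = len(rows) * (len(rows) - 1) // 2
        else:
            edges = int(block.sum())
            possible = len(rows) * len(cols)
        result[(a, b)] = {
            "edges": edges,
            "density": edges / possible if possible else 0.0,
        }
    return result

B = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
])
summary = cross_omics_density(B, ["rna", "rna", "protein", "protein"])
```

In this toy graph the RNA layer is fully connected internally, the protein layer has no internal edges, and half of the possible RNA-protein pairs are linked.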

degree_distribution(threshold: float = 0.5) → DataFrame

Calculates the frequency distribution of node degrees across the entire network.

This helps identify if the network follows a scale-free power law or a random graph distribution.

Parameters: threshold (float) – The threshold used to binarize the network.
Returns: A DataFrame with columns for degree, count, and percentage of total nodes.
Return type: pd.DataFrame
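
The table described above can be approximated in a few lines of NumPy. The sketch returns a list of dictionaries instead of a DataFrame to stay dependency-light; the function name and row layout are assumptions.

```python
import numpy as np

def degree_table(binary: np.ndarray) -> list[dict]:
    """Count how many nodes have each degree in a 0/1 adjacency matrix."""
    degree = binary.sum(axis=1)
    values, counts = np.unique(degree, return_counts=True)
    n = binary.shape[0]
    return [
        {"degree": int(d), "count": int(c), "percentage": float(100.0 * c / n)}
        for d, c in zip(values, counts)
    ]

# Node 0 is linked to nodes 1 and 2; node 3 is isolated.
B = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
])
table = degree_table(B)
```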

edge_weight_analysis() → ndarray | None

Analyzes the statistical distribution of edge weights across the entire network.

This is useful for determining appropriate threshold values and understanding signal strength distribution.

Parameters: None.
Returns: An array of all non-zero edge weights, or None if no edges exist.
Return type: Optional[np.ndarray]
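
A hedged sketch of how the returned weights might inform a threshold choice: collect the non-zero weights, then inspect their percentiles. The helper below assumes a symmetric matrix and counts each undirected edge once, which may differ from the method's actual behavior.

```python
import numpy as np

def nonzero_edge_weights(weights: np.ndarray):
    """Non-zero weights from the upper triangle, or None if there are none."""
    upper = weights[np.triu_indices_from(weights, k=1)]
    nonzero = upper[upper != 0]
    return nonzero if nonzero.size else None

W = np.array([
    [0.0, 0.9, 0.7, 0.1],
    [0.9, 0.0, 0.2, 0.0],
    [0.7, 0.2, 0.0, 0.3],
    [0.1, 0.0, 0.3, 0.0],
])
w = nonzero_edge_weights(W)
# Candidate cutoffs: the quartiles of the observed weight distribution.
candidates = np.percentile(w, [25, 50, 75])
empty = nonzero_edge_weights(np.zeros((3, 3)))
```

Picking the median as the threshold, for example, would keep roughly the stronger half of the edges.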

find_strongest_edges(top_n: int = 50) → DataFrame

Retrieves the strongest edges in the network sorted by weight magnitude.

This isolates the most significant pairwise interactions between features.

Parameters: top_n (int) – The number of top weighted edges to return.
Returns: A DataFrame detailing the top interactions, including feature names and weights.
Return type: pd.DataFrame
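
Ranking by weight magnitude means a strongly negative weight outranks a weakly positive one. The following sketch shows that ordering on a toy matrix; the function name, the tuple layout, and the feature names are made up for illustration.

```python
import numpy as np

def strongest_edges(weights: np.ndarray, names: list[str], top_n: int = 50):
    """Top edges by absolute weight as (feature_a, feature_b, weight) tuples."""
    rows, cols = np.triu_indices_from(weights, k=1)   # each undirected pair once
    values = weights[rows, cols]
    order = np.argsort(-np.abs(values), kind="stable")[:top_n]
    return [
        (names[rows[i]], names[cols[i]], float(values[i]))
        for i in order
        if values[i] != 0
    ]

W = np.array([
    [0.0, 0.9, -0.8, 0.1],
    [0.9, 0.0, 0.2, 0.0],
    [-0.8, 0.2, 0.0, 0.3],
    [0.1, 0.0, 0.3, 0.0],
])
top = strongest_edges(W, ["g1", "g2", "p1", "p2"], top_n=2)
```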

hub_analysis(threshold: float = 0.5, top_n: int = 10) → DataFrame

Identifies and ranks the most highly connected ‘hub’ nodes in the network.

This is critical for finding central regulatory features or bottlenecks in the omics network.

Parameters:

threshold (float) – The threshold used to define network edges.
top_n (int) – The number of top degree nodes to retrieve.

Returns:

A table of the top N nodes including their rank, feature name, omics type, and degree.

Return type:

pd.DataFrame

threshold_network(threshold: float) → torch.Tensor

Generates a binary adjacency matrix by applying a hard threshold to the connection weights.

This converts continuous edge weights into a binary structure suitable for standard graph topology metrics.

Parameters: threshold (float) – The cutoff value above which an edge is considered to exist.
Returns: A binary tensor where 1 indicates an edge and 0 indicates no edge.
Return type: torch.Tensor

bioneuralnet.network.auto_pysmccnet(X: List[DataFrame | ndarray], Y: DataFrame | ndarray, AdjustedCovar: DataFrame | None = None, preprocess: bool = False, Kfold: int = 5, subSampNum: int = 100, DataType: List[str] | None = None, BetweenShrinkage: float = 2.0, ScalingPen: List[float] = [0.1, 0.1], saving_dir: str = '/home/docs/checkouts/readthedocs.org/user_builds/bioneuralnet/checkouts/latest/docs/source', tuneLength: int = 5, tuneRangeCCA: List[float] = [0.1, 0.5], tuneRangePLS: List[float] = [0.5, 0.9], EvalMethod: str = 'accuracy', ncomp_pls: int = 3, seed: int = 123, CutHeight: float = 0.9999999999, min_size: int = 10, max_size: int = 100, summarization: str = 'NetSHy', precomputed_fold_data: dict | None = None, device: torch.device | None = 'cpu', dtype: torch.dtype = torch.float64, rename: bool = True) → dict

Automated SmCCNet workflow with GPU acceleration.

Runs the complete SmCCNet pipeline supporting both CCA (continuous phenotype) and PLS (binary phenotype) modes. The workflow includes optional preprocessing, cross-validation for penalty tuning, subsampling for stability selection, and final network construction.

Parameters:

X (List[pd.DataFrame | np.ndarray]) – Input data matrices (omics layers) for integration.
Y (pd.DataFrame | np.ndarray) – Phenotype vector; numeric for CCA or binary (0/1) for PLS.
AdjustedCovar (pd.DataFrame | None) – Optional covariates to regress out from X before analysis.
preprocess (bool) – If True, center and scale data; if False, use raw input.
Kfold (int) – Number of cross-validation folds for penalty parameter tuning.
subSampNum (int) – Number of subsampling iterations for stability selection.
DataType (List[str] | None) – Names for each omics layer in X; defaults to generic names if None.
BetweenShrinkage (float) – Shrinkage factor for between-omics scaling weights.
ScalingPen (List[float]) – Penalty terms used for determining scaling factors.
saving_dir (str) – Directory path for saving output results.
tuneLength (int) – Number of candidate penalty parameters to test per omics layer.
tuneRangeCCA (List[float]) – Min and max penalty values for CCA (continuous phenotype).
tuneRangePLS (List[float]) – Min and max penalty values for PLS (binary phenotype).
EvalMethod (str) – Metric for PLS evaluation; one of ‘accuracy’, ‘auc’, ‘precision’, ‘recall’, or ‘f1’.
ncomp_pls (int) – Number of latent components to use for PLS models.
CutHeight (float) – Height threshold for hierarchical tree cutting in module extraction.
min_size (int) – Minimum number of nodes to retain a network module.
max_size (int) – Maximum module size; larger modules are pruned down.
summarization (str) – Network summarization method. Currently only ‘NetSHy’ is supported.
seed (int) – Random seed for reproducibility.
precomputed_fold_data (dict | None) – Precomputed CV folds to bypass internal fold generation.
device (torch.device | None) – PyTorch device, defaulting to ‘cpu’; if None, automatically selects GPU if available.
dtype (torch.dtype) – PyTorch data type for computations.
rename (bool) – If True, prefix datatype to column names; if False, use original column names.

Returns:

Dictionary containing results for ‘CCA’ or ‘PLS’ including adjacency matrices, processed data, and CV results.

Return type:

dict

bioneuralnet.network.correlation_network(X: DataFrame, k: int | None = 15, method: str = 'pearson', signed: bool = True, normalize: bool = True, mutual: bool = False, per_node: bool = True, threshold: float | None = None, self_loops: bool = False) → DataFrame

Build a correlation-based graph from feature vectors with optional kNN sparsification.

Pairwise correlations (Pearson or Spearman) are computed between features, mapped to similarity scores in [0, 1], and then optionally sparsified using per-node kNN or a global cutoff. Mutual pruning, self-loops, and row-normalization can be applied to obtain a final adjacency matrix.

Parameters:

X (pd.DataFrame) – Input data of shape (N, D) where N is the number of samples and D is the number of features.
k (int | None) – Number of neighbors used for sparsification. When per_node is True, k neighbors are kept per node; otherwise k sets a global budget of roughly k edges per node. If both k and threshold are None, a fully connected graph is returned (subject to self_loops).
method (str) – Correlation method; “pearson” for standard correlation or “spearman” for rank-based correlation.
signed (bool) – If True, use signed correlations mapped to [0, 1] via (C + 1)/2; if False, use absolute correlations in [0, 1].
normalize (bool) – If True, row-normalize the adjacency; if False, keep raw similarity weights.
mutual (bool) – If True, retain only edges that are present in both directions (i->j and j->i).
per_node (bool) – If True, apply kNN per node; if False, use a global cutoff determined by k or threshold.
threshold (float | None) – Similarity cutoff; when provided and per_node is False, overrides the k-based global cutoff.
self_loops (bool) – If True, add self-loop weights of 1 on the diagonal of the adjacency matrix.

Returns:

Adjacency matrix of shape (D, D) representing the feature-feature correlation graph.

Return type:

pd.DataFrame
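
The pipeline described above (correlate, map to [0, 1], keep top-k per node, optionally prune to mutual edges, row-normalize) can be sketched with plain NumPy. This is an independent illustration under stated assumptions, not the function's source: it covers only Pearson correlation and per-node kNN, and the name `correlation_knn` is hypothetical.

```python
import numpy as np

def correlation_knn(X, k=2, signed=True, mutual=False, normalize=True):
    """Feature-feature correlation graph from an (N samples, D features) array."""
    corr = np.corrcoef(X, rowvar=False)               # (D, D) Pearson correlations
    sim = (corr + 1.0) / 2.0 if signed else np.abs(corr)
    np.fill_diagonal(sim, -np.inf)                    # a node is never its own neighbor
    top = np.argsort(-sim, axis=1)[:, :k]             # k most similar features per row
    mask = np.zeros(sim.shape, dtype=bool)
    np.put_along_axis(mask, top, True, axis=1)
    if mutual:
        mask &= mask.T                                # keep i->j only if j->i also exists
    np.fill_diagonal(sim, 0.0)
    adj = np.where(mask, sim, 0.0)
    if normalize:
        row_sums = adj.sum(axis=1, keepdims=True)
        adj = np.divide(adj, row_sums, out=np.zeros_like(adj), where=row_sums > 0)
    return adj

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))                          # 20 samples, 5 features
adj = correlation_knn(X, k=2)
mutual_adj = correlation_knn(X, k=2, mutual=True, normalize=False)
```

Without mutual pruning the result is generally asymmetric, because j being among the top-k neighbors of i does not imply the reverse.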

bioneuralnet.network.gaussian_knn_network(X: DataFrame, k: int = 15, sigma: float | None = None, mutual: bool = False, self_loops: bool = True, normalize: bool = True) → DataFrame

Build a Gaussian (RBF) kNN similarity graph from feature vectors.

Pairwise Euclidean distances between features are converted to similarities using a Gaussian kernel with bandwidth sigma (or a median-distance heuristic). The graph is sparsified by keeping top-k neighbors per node, optionally restricted to mutual neighbors, with optional self-loops and row-normalization.

Parameters:

X (pd.DataFrame) – Input data of shape (N, D) where N is the number of samples and D is the number of features.
k (int) – Number of neighbors to keep per node in the kNN graph.
sigma (float | None) – Bandwidth parameter for the Gaussian kernel; if None, a median squared distance heuristic is used.
mutual (bool) – If True, retain only edges where i and j are mutual kNN neighbors.
self_loops (bool) – If True, add self-loop weights of 1 on the diagonal of the adjacency matrix.
normalize (bool) – If True, row-normalize the adjacency so each row sums to 1.

Returns:

Adjacency matrix of shape (D, D) representing the Gaussian-kernel feature similarity graph.

Return type:

pd.DataFrame
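
To make the kernel step concrete, the sketch below computes an RBF similarity between feature columns and keeps the top-k neighbors per node. The kernel form exp(-d² / (2σ²)) and the bandwidth rule (square root of the median off-diagonal squared distance) are assumptions; the library's exact parameterization and heuristic may differ.

```python
import numpy as np

def rbf_kernel(X, sigma=None):
    """Gaussian similarity between the columns (features) of X."""
    F = X.T                                                     # one row per feature
    sq = ((F[:, None, :] - F[None, :, :]) ** 2).sum(axis=-1)    # squared distances (D, D)
    if sigma is None:
        # Assumed heuristic: bandwidth from the median off-diagonal squared distance.
        off = sq[~np.eye(len(sq), dtype=bool)]
        sigma = float(np.sqrt(np.median(off)))
    return np.exp(-sq / (2.0 * sigma ** 2))

def keep_top_k(sim, k):
    """Zero out everything except each row's k largest off-diagonal entries."""
    work = sim.copy()
    np.fill_diagonal(work, -np.inf)
    top = np.argsort(-work, axis=1)[:, :k]
    rows = np.arange(sim.shape[0])[:, None]
    adj = np.zeros_like(sim)
    adj[rows, top] = sim[rows, top]
    return adj

# Three features measured on two samples; as points: (0, 0), (1, 0), (5, 0).
X = np.array([[0.0, 1.0, 5.0],
              [0.0, 0.0, 0.0]])
S = rbf_kernel(X)
A = keep_top_k(S, k=1)
```

Here the third feature selects the second as its nearest neighbor, but the second prefers the first, so the edge exists in one direction only; this is the case that mutual=True would prune.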

bioneuralnet.network.network_search(omics_data: DataFrame, y_labels, methods: list = ['correlation', 'threshold', 'similarity', 'gaussian'], seed: int = 1883, verbose: bool = True, trials: int | None = None, centrality_mode: str = 'eigenvector', topology_weight: float = 0.15, scoring: str = 'f1_macro') → tuple[DataFrame, dict, DataFrame]

Search over graph-construction hyperparameters using a structural proxy.

Each candidate configuration builds a graph, scores it with a fast centrality-weighted Ridge classifier proxy, and blends that score with a topological quality term (average clustering coefficient) to favour well-connected, informative graphs.

Parameters:

omics_data (pd.DataFrame) – Feature matrix of shape (n_samples, n_features).
y_labels – Target labels for stratified CV evaluation.
methods (list) – Graph-construction methods to search over; defaults to all of 'correlation', 'threshold', 'similarity', and 'gaussian'.
seed (int) – Random seed for reproducibility.
verbose (bool) – If True, log per-configuration progress.
trials (int | None) – Optional cap on the number of evaluated configurations (a random subset is used).
centrality_mode (str) – Centrality used for feature weighting in the proxy; one of "eigenvector" or "degree".
topology_weight (float) – Blending factor in [0, 1] that controls how much the topological quality term contributes to the final score. 0 ignores topology; 1 ignores the proxy F1.

Returns:

A 3-tuple of (best_graph, best_params, results_df).

Raises:

RuntimeError – If every configuration fails.
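
The role of topology_weight can be illustrated with a convex combination of the two terms. A linear blend is an assumption here: it matches the documented endpoints (0 ignores topology, 1 ignores the proxy score), but the actual formula is not shown on this page. The candidate scores below are invented numbers.

```python
def blended_score(proxy_score: float, topology_score: float,
                  topology_weight: float = 0.15) -> float:
    """Convex blend of a proxy score and a topology term (assumed form)."""
    if not 0.0 <= topology_weight <= 1.0:
        raise ValueError("topology_weight must lie in [0, 1]")
    return (1.0 - topology_weight) * proxy_score + topology_weight * topology_score

# Hypothetical (proxy score, average clustering) pairs for two configurations.
candidates = {
    "correlation": (0.80, 0.10),
    "gaussian": (0.78, 0.60),
}
best = max(candidates, key=lambda name: blended_score(*candidates[name]))
```

With the default weight of 0.15, the second configuration wins despite a slightly lower proxy score because its clustering term is much higher.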

bioneuralnet.network.similarity_network(X: DataFrame, k: int = 15, metric: str = 'cosine', mutual: bool = False, per_node: bool = True, self_loops: bool = False, normalize: bool = True) → DataFrame

Build a k-nearest neighbors similarity graph from feature vectors.

Pairwise similarities are computed using either cosine similarity or a Gaussian kernel on Euclidean distances. The similarity matrix is sparsified by keeping top-k neighbors per node (or via a global cutoff), optionally restricted to mutual neighbors, with optional self-loops and row-normalization.

Parameters:

X (pd.DataFrame) – Input data of shape (N, D) where N is the number of samples and D is the number of features.
k (int) – Number of neighbors to keep per node, or approximate neighbors per node when using a global cutoff.
metric (str) – Similarity metric; either “cosine” or “euclidean” (case-insensitive) where the latter uses a Gaussian kernel on squared distances.
mutual (bool) – If True, retain only edges where i is in the kNN of j and j is in the kNN of i.
per_node (bool) – If True, apply kNN per node; if False, apply a global threshold to keep approximately k edges per node.
self_loops (bool) – If True, add self-loop weights of 1 on the diagonal of the adjacency matrix.
normalize (bool) – If True, row-normalize the adjacency so each row sums to 1.

Returns:

Adjacency matrix of shape (D, D) representing the feature-feature similarity graph.

Return type:

pd.DataFrame
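
The per_node=False branch is the less obvious one, so the sketch below illustrates a global cutoff on cosine similarities. It assumes that "approximately k edges per node" means keeping the k*D largest off-diagonal entries of the similarity matrix; the helper names are hypothetical and the library may choose its cutoff differently.

```python
import numpy as np

def cosine_between_features(X):
    """Cosine similarity between the columns (features) of X."""
    F = X.T
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    unit = F / np.where(norms == 0, 1.0, norms)       # guard against zero vectors
    return unit @ unit.T

def global_cutoff(sim, k):
    """Keep entries at or above the value of the (k * D)-th largest off-diagonal entry."""
    d = sim.shape[0]
    off = sim[~np.eye(d, dtype=bool)]
    keep = min(k * d, off.size)
    cutoff = np.sort(off)[-keep]
    adj = np.where(sim >= cutoff, sim, 0.0)
    np.fill_diagonal(adj, 0.0)
    return adj

# Four features on two samples: two nearly parallel pairs.
X = np.array([[1.0, 1.0, 0.0, 0.1],
              [0.0, 0.1, 1.0, 1.0]])
S = cosine_between_features(X)
A = global_cutoff(S, k=1)
```

Unlike per-node kNN, a global cutoff can leave weakly connected nodes with no edges at all while dense regions keep more than k.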

bioneuralnet.network.threshold_network(X: DataFrame, b: float = 6.0, k: int = 15, mutual: bool = False, self_loops: bool = False, normalize: bool = True) → DataFrame

Build a soft-thresholded kNN co-expression graph, similar to WGCNA-style networks.

Absolute Pearson correlations between features are raised to a power b to obtain soft-thresholded similarities. A kNN mask keeps the top-k neighbors per node, optionally restricted to mutual neighbors, with optional self-loops and row-normalization.

Parameters:

X (pd.DataFrame) – Input data of shape (N, D) where N is the number of samples and D is the number of features.
b (float) – Soft-threshold exponent applied to absolute correlations to control network sparsity and hub emphasis.
k (int) – Number of neighbors to keep per node in the kNN graph.
mutual (bool) – If True, retain only edges where i and j are mutual kNN neighbors.
self_loops (bool) – If True, add self-loop weights of 1 on the diagonal of the adjacency matrix.
normalize (bool) – If True, row-normalize the adjacency so each row sums to 1.

Returns:

Adjacency matrix of shape (D, D) representing the soft-thresholded co-expression graph.

Return type:

pd.DataFrame
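
The effect of the exponent b is easiest to see on a small correlation matrix. The sketch below shows only the soft-thresholding step (absolute correlation raised to the power b); the kNN masking and normalization that follow are omitted, and the helper name is an assumption.

```python
import numpy as np

def soft_threshold(corr: np.ndarray, b: float = 6.0) -> np.ndarray:
    """Raise absolute correlations to the power b."""
    return np.abs(corr) ** b

corr = np.array([
    [1.0, 0.9, -0.5],
    [0.9, 1.0, 0.2],
    [-0.5, 0.2, 1.0],
])
soft = soft_threshold(corr, b=6.0)
# Ratio between the strong and the moderate edge, before and after the power transform.
ratio_before = abs(corr[0, 1]) / abs(corr[0, 2])
ratio_after = soft[0, 1] / soft[0, 2]
```

A correlation of 0.9 becomes about 0.53 while 0.5 drops to about 0.016, so the gap between strong and moderate edges widens from a factor of 1.8 to roughly 34. Larger b suppresses weak correlations more aggressively.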

Modules

`generate`
`pysmccnet`	Sparse Multiple Canonical Correlation Network (SmCCNet 2.0).
`tools`