bioneuralnet.network
Network Construction and Analysis.
This module provides tools for generating, searching, and analyzing multi-omics networks. It includes methods for building networks from raw tabular data using similarity, correlation, thresholding, and Gaussian KNN, as well as phenotype-driven strategies like PySmCCNet.
Functions
auto_pysmccnet | Automated SmCCNet workflow with GPU acceleration.
correlation_network | Build a correlation-based graph from feature vectors with optional kNN sparsification.
gaussian_knn_network | Build a Gaussian (RBF) kNN similarity graph from feature vectors.
network_search | Search over graph-construction hyperparameters using a structural proxy.
similarity_network | Build a k-nearest neighbors similarity graph from feature vectors.
threshold_network | Build a soft-thresholded kNN co-expression graph, similar to WGCNA-style networks.
Classes
NetworkAnalyzer | Performs GPU-accelerated network analysis.
- class bioneuralnet.network.NetworkAnalyzer(adjacency_matrix: DataFrame, source_omics: list | None = None, device: str = 'cuda')
Bases: object
Performs GPU-accelerated network analysis.
This class leverages PyTorch tensors to speed up graph statistics, clustering computations, and edge analysis for large-scale omics networks.
- Parameters:
adjacency_matrix (pd.DataFrame) – The input weighted adjacency matrix representing network connections.
source_omics (list | None) – Optional list of the original DataFrames used to build the network; used to assign an omics type to each node.
device (str) – The target computing device, defaulting to ‘cuda’ if available.
- basic_statistics(threshold: float = 0.5) → Dict[str, float | int | ndarray]
Computes fundamental graph metrics including density, degree statistics, and node isolation counts.
This provides a high-level overview of the network topology and connectivity at a specific threshold.
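As a rough illustration of the metrics named above, the following NumPy sketch computes density, degree statistics, and the isolated-node count from a thresholded matrix. It is not the class's implementation: the function name, the strict `>` comparison, and the dictionary keys are assumptions made for this example.

```python
import numpy as np

def basic_stats_sketch(weights: np.ndarray, threshold: float = 0.5) -> dict:
    """Illustrative only: density, degree summary, and isolated-node count."""
    # Assumption: an edge exists where the weight is strictly above the threshold.
    binary = (weights > threshold).astype(int)
    np.fill_diagonal(binary, 0)          # ignore self-loops
    n = binary.shape[0]
    degrees = binary.sum(axis=1)
    n_edges = int(binary.sum() // 2)     # undirected: each edge is counted twice
    possible = n * (n - 1) // 2
    return {
        "num_nodes": n,
        "num_edges": n_edges,
        "density": n_edges / possible if possible else 0.0,
        "mean_degree": float(degrees.mean()),
        "max_degree": int(degrees.max()),
        "isolated_nodes": int((degrees == 0).sum()),
    }

W = np.array([
    [0.0, 0.9, 0.2, 0.0],
    [0.9, 0.0, 0.7, 0.0],
    [0.2, 0.7, 0.0, 0.1],
    [0.0, 0.0, 0.1, 0.0],
])
stats = basic_stats_sketch(W, threshold=0.5)
```

With this toy matrix only two edges survive the 0.5 cutoff, so the fourth node is isolated and the density is 2 of 6 possible edges.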
- clustering_coefficient_gpu(threshold: float = 0.5, sample_size: int | None = None) → Dict[str, float | ndarray]
Computes the local clustering coefficient for nodes using GPU-optimized matrix operations.
This measures the degree to which nodes tend to cluster together, using random sampling for efficiency on large graphs.
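- Parameters:
threshold (float) – The threshold used to binarize the network.
sample_size (int | None) – Number of nodes to sample at random for the computation.
- Returns:
Statistics including average, max, and min clustering coefficients, plus raw values and sample indices.
- Return type:
Dict[str, float | np.ndarray]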
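The matrix formulation of the local clustering coefficient can be sketched on the CPU with NumPy. This is a minimal example under stated assumptions, not the GPU code path: the function name is hypothetical, the input is assumed to be a symmetric binary adjacency matrix, and random sampling is omitted.

```python
import numpy as np

def local_clustering_sketch(binary: np.ndarray) -> np.ndarray:
    """Local clustering coefficient per node via matrix products (illustrative)."""
    a = binary.astype(float)            # astype returns a copy, the input is untouched
    np.fill_diagonal(a, 0.0)
    degrees = a.sum(axis=1)
    # (A^3)_ii counts closed walks of length 3 starting at i, i.e. twice the
    # number of triangles through i. (A @ A) * A.T summed over columns is diag(A^3).
    triangles = ((a @ a) * a.T).sum(axis=1) / 2.0
    possible = degrees * (degrees - 1) / 2.0   # neighbor pairs that could be connected
    cc = np.zeros_like(degrees)
    mask = possible > 0
    cc[mask] = triangles[mask] / possible[mask]
    return cc

# Triangle 0-1-2 with a pendant node 3 attached to node 2.
B = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])
cc = local_clustering_sketch(B)
```

Nodes 0 and 1 sit in a closed triangle and get a coefficient of 1, node 2 has three neighbors of which only one pair is connected, and the pendant node has too few neighbors to form any triangle.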
- connected_components(threshold: float = 0.5) → Dict[str, int | ndarray | List[int]]
Identifies isolated subgraphs within the network using Breadth-First Search logic.
This computation is performed on the CPU using scipy due to the sequential nature of traversal algorithms.
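The traversal idea can be shown with a plain breadth-first search. The method itself is documented as using scipy; the sketch below deliberately swaps in a hand-written BFS over a binary adjacency matrix to make the sequential nature of the algorithm visible. The function name and the label-list output are assumptions for this example.

```python
from collections import deque

import numpy as np

def components_bfs_sketch(binary: np.ndarray) -> list:
    """Assign a component label to every node using breadth-first search."""
    n = binary.shape[0]
    labels = [-1] * n                    # -1 marks an unvisited node
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in np.flatnonzero(binary[node]):
                nb = int(nb)
                if labels[nb] == -1:
                    labels[nb] = current
                    queue.append(nb)
        current += 1
    return labels

# Two components: {0, 1, 2} and {3, 4}.
G = np.zeros((5, 5), dtype=int)
for i, j in [(0, 1), (1, 2), (3, 4)]:
    G[i, j] = G[j, i] = 1
labels = components_bfs_sketch(G)
n_components = max(labels) + 1
```

Each node is enqueued at most once, but the queue must be processed in order, which is why this step does not map well onto batched GPU matrix operations.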
- cross_omics_analysis(threshold: float = 0.5) → Dict[tuple, Dict]
Quantifies the connectivity density between different omics layers (e.g., RNA vs Protein).
This reveals whether the network structure is driven by within-omics correlations or cross-omics interactions.
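One way to picture this analysis is as edge density computed per block of the adjacency matrix, one block for every pair of omics layers. The sketch below assumes a symmetric binary matrix and a per-node list of omics labels; the function name, the label values, and the result keys are hypothetical.

```python
from itertools import combinations_with_replacement

import numpy as np

def cross_omics_density_sketch(binary: np.ndarray, omics_labels: list) -> dict:
    """Edge density within and between omics layers (illustrative)."""
    labels = np.asarray(omics_labels)
    result = {}
    for a, b in combinations_with_replacement(sorted(set(omics_labels)), 2):
        rows = np.flatnonzero(labels == a)
        cols = np.flatnonzero(labels == b)
        block = binary[np.ix_(rows, cols)]
        if a == b:
            # Within one layer, count each undirected edge once.
            edges = int(np.triu(block, k=1).sum())
            possible = len(rows) * (len(rows) - 1) // 2
        else:
            edges = int(block.sum())
            possible = len(rows) * len(cols)
        result[(a, b)] = {
            "edges": edges,
            "density": edges / possible if possible else 0.0,
        }
    return result

# Nodes 0-1 are RNA features, nodes 2-3 are protein features.
M = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (0, 2)]:
    M[i, j] = M[j, i] = 1
densities = cross_omics_density_sketch(M, ["rna", "rna", "protein", "protein"])
```

Comparing the within-layer entries with the cross-layer entry shows at a glance which kind of connection dominates the network.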
- degree_distribution(threshold: float = 0.5) → DataFrame
Calculates the frequency distribution of node degrees across the entire network.
This helps identify if the network follows a scale-free power law or a random graph distribution.
- Parameters:
threshold (float) – The threshold used to binarize the network.
- Returns:
A DataFrame with columns for degree, count, and percentage of total nodes.
- Return type:
pd.DataFrame
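The three documented columns can be derived from a binary adjacency matrix in a few lines. The sketch below returns a list of row dictionaries rather than a DataFrame to stay dependency-light; the function name is hypothetical, and wrapping the result in `pd.DataFrame(rows)` would give the tabular form.

```python
import numpy as np

def degree_distribution_sketch(binary: np.ndarray) -> list:
    """Rows of degree, count, and percentage of nodes (illustrative)."""
    degrees = binary.sum(axis=1).astype(int)
    values, counts = np.unique(degrees, return_counts=True)
    total = degrees.size
    return [
        {"degree": int(d), "count": int(c), "percentage": float(100.0 * c / total)}
        for d, c in zip(values, counts)
    ]

# Path 0-1-2 plus an isolated node 3: degrees are 1, 2, 1, 0.
P = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
])
rows = degree_distribution_sketch(P)
```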
- edge_weight_analysis() → ndarray | None
Analyzes the statistical distribution of edge weights across the entire network.
This is useful for determining appropriate threshold values and understanding signal strength distribution.
- Parameters:
None.
- Returns:
An array of all non-zero edge weights, or None if no edges exist.
- Return type:
Optional[np.ndarray]
- find_strongest_edges(top_n: int = 50) → DataFrame
Retrieves the strongest edges in the network sorted by weight magnitude.
This isolates the most significant pairwise interactions between features.
- Parameters:
top_n (int) – The number of top weighted edges to return.
- Returns:
A DataFrame detailing the top interactions, including feature names and weights.
- Return type:
pd.DataFrame
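Ranking edges by weight magnitude can be sketched as follows. The example assumes a symmetric weight matrix, so only the upper triangle is scanned; the function name, the tuple output, and the feature names are hypothetical.

```python
import numpy as np

def strongest_edges_sketch(weights: np.ndarray, names: list, top_n: int = 50) -> list:
    """Top edges by absolute weight as (source, target, weight) tuples."""
    iu, ju = np.triu_indices_from(weights, k=1)   # each undirected pair once
    vals = weights[iu, ju]
    order = np.argsort(-np.abs(vals), kind="stable")[:top_n]
    return [
        (names[iu[i]], names[ju[i]], float(vals[i]))
        for i in order
        if vals[i] != 0                           # skip absent edges
    ]

E = np.array([
    [0.0, 0.9, -0.95],
    [0.9, 0.0, 0.1],
    [-0.95, 0.1, 0.0],
])
top = strongest_edges_sketch(E, ["g1", "g2", "p1"], top_n=2)
```

Sorting on the absolute value keeps strong negative associations in the ranking, which matters when the adjacency matrix carries signed weights.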
- hub_analysis(threshold: float = 0.5, top_n: int = 10) → DataFrame
Identifies and ranks the most highly connected ‘hub’ nodes in the network.
This is critical for finding central regulatory features or bottlenecks in the omics network.
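A simple reading of hub ranking is to sort nodes by their degree in the thresholded network. The sketch below follows that reading; whether the class ranks by degree alone is an assumption, as are the function name and the node names.

```python
import numpy as np

def hubs_sketch(weights: np.ndarray, names: list,
                threshold: float = 0.5, top_n: int = 10) -> list:
    """Nodes ranked by degree after hard thresholding (illustrative)."""
    binary = weights > threshold          # new boolean array, input is untouched
    np.fill_diagonal(binary, False)
    degrees = binary.sum(axis=1)
    order = np.argsort(-degrees, kind="stable")[:top_n]
    return [(names[i], int(degrees[i])) for i in order]

H = np.array([
    [0.0, 0.9, 0.2, 0.0],
    [0.9, 0.0, 0.7, 0.0],
    [0.2, 0.7, 0.0, 0.1],
    [0.0, 0.0, 0.1, 0.0],
])
hubs = hubs_sketch(H, ["a", "b", "c", "d"], threshold=0.5, top_n=2)
```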
- threshold_network(threshold: float) → torch.Tensor
Generates a binary adjacency matrix by applying a hard threshold to the connection weights.
This converts continuous edge weights into a binary structure suitable for standard graph topology metrics.
- Parameters:
threshold (float) – The cutoff value above which an edge is considered to exist.
- Returns:
A binary tensor where 1 indicates an edge and 0 indicates no edge.
- Return type:
torch.Tensor
- bioneuralnet.network.auto_pysmccnet(X: List[DataFrame | ndarray], Y: DataFrame | ndarray, AdjustedCovar: DataFrame | None = None, preprocess: bool = False, Kfold: int = 5, subSampNum: int = 100, DataType: List[str] | None = None, BetweenShrinkage: float = 2.0, ScalingPen: List[float] = [0.1, 0.1], saving_dir: str = '/home/docs/checkouts/readthedocs.org/user_builds/bioneuralnet/checkouts/latest/docs/source', tuneLength: int = 5, tuneRangeCCA: List[float] = [0.1, 0.5], tuneRangePLS: List[float] = [0.5, 0.9], EvalMethod: str = 'accuracy', ncomp_pls: int = 3, seed: int = 123, CutHeight: float = 0.9999999999, min_size: int = 10, max_size: int = 100, summarization: str = 'NetSHy', precomputed_fold_data: dict | None = None, device: torch.device | None = 'cpu', dtype: torch.dtype = torch.float64, rename: bool = True) → dict
Automated SmCCNet workflow with GPU acceleration.
Runs the complete SmCCNet pipeline supporting both CCA (continuous phenotype) and PLS (binary phenotype) modes. The workflow includes optional preprocessing, cross-validation for penalty tuning, subsampling for stability selection, and final network construction.
- Parameters:
X (List[pd.DataFrame | np.ndarray]) – Input data matrices (omics layers) for integration.
Y (pd.DataFrame | np.ndarray) – Phenotype vector; numeric for CCA or binary (0/1) for PLS.
AdjustedCovar (pd.DataFrame | None) – Optional covariates to regress out from X before analysis.
preprocess (bool) – If True, center and scale data; if False, use raw input.
Kfold (int) – Number of cross-validation folds for penalty parameter tuning.
subSampNum (int) – Number of subsampling iterations for stability selection.
DataType (List[str] | None) – Names for each omics layer in X; defaults to generic names if None.
BetweenShrinkage (float) – Shrinkage factor for between-omics scaling weights.
ScalingPen (List[float]) – Penalty terms used for determining scaling factors.
saving_dir (str) – Directory path for saving output results.
tuneLength (int) – Number of candidate penalty parameters to test per omics layer.
tuneRangeCCA (List[float]) – Min and max penalty values for CCA (continuous phenotype).
tuneRangePLS (List[float]) – Min and max penalty values for PLS (binary phenotype).
EvalMethod (str) – Metric for PLS evaluation; one of ‘accuracy’, ‘auc’, ‘precision’, ‘recall’, or ‘f1’.
ncomp_pls (int) – Number of latent components to use for PLS models.
CutHeight (float) – Height threshold for hierarchical tree cutting in module extraction.
min_size (int) – Minimum number of nodes to retain a network module.
max_size (int) – Maximum module size; larger modules are pruned down.
summarization (str) – Network summarization method. Currently only ‘NetSHy’ is supported.
seed (int) – Random seed for reproducibility.
precomputed_fold_data (dict | None) – Precomputed CV folds to bypass internal fold generation.
device (torch.device | None) – PyTorch device, defaulting to ‘cpu’; if None, automatically selects GPU if available.
dtype (torch.dtype) – PyTorch data type for computations.
rename (bool) – If True, prefix datatype to column names; if False, use original column names.
- Returns:
Dictionary containing results for ‘CCA’ or ‘PLS’ including adjacency matrices, processed data, and CV results.
- Return type:
dict
- bioneuralnet.network.correlation_network(X: DataFrame, k: int | None = 15, method: str = 'pearson', signed: bool = True, normalize: bool = True, mutual: bool = False, per_node: bool = True, threshold: float | None = None, self_loops: bool = False) → DataFrame
Build a correlation-based graph from feature vectors with optional kNN sparsification.
Pairwise correlations (Pearson or Spearman) are computed between features, mapped to similarity scores in [0, 1], and then optionally sparsified using per-node kNN or a global cutoff. Mutual pruning, self-loops, and row-normalization can be applied to obtain a final adjacency matrix.
- Parameters:
X (pd.DataFrame) – Input data of shape (N, D) where N is the number of samples and D is the number of features.
k (int | None) – Number of neighbors for sparsification. When per_node is True, k neighbors are kept per node; otherwise k sets a global cutoff that keeps approximately k*N edges. If both k and threshold are None, a fully connected graph is returned (subject to self_loops).
method (str) – Correlation method; “pearson” for standard correlation or “spearman” for rank-based correlation.
signed (bool) – If True, use signed correlations mapped to [0, 1] via (C + 1)/2; if False, use absolute correlations in [0, 1].
normalize (bool) – If True, row-normalize the adjacency; if False, keep raw similarity weights.
mutual (bool) – If True, retain only edges that are present in both directions (i->j and j->i).
per_node (bool) – If True, apply kNN per node; if False, use a global cutoff determined by k or threshold.
threshold (float | None) – Similarity cutoff; when provided and per_node is False, overrides the k-based global cutoff.
self_loops (bool) – If True, add self-loop weights of 1 on the diagonal of the adjacency matrix.
- Returns:
Adjacency matrix of shape (D, D) representing the feature-feature correlation graph.
- Return type:
pd.DataFrame
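The pipeline described above (correlate, map to [0, 1], keep top-k per node, row-normalize) can be sketched with NumPy alone. This is a minimal example under stated assumptions and not the function's source: it covers only Pearson correlation with per-node kNN, and the function name is hypothetical.

```python
import numpy as np

def correlation_knn_sketch(X: np.ndarray, k: int = 2, signed: bool = True,
                           normalize: bool = True) -> np.ndarray:
    """Feature-feature correlation graph with per-node top-k sparsification."""
    C = np.corrcoef(X, rowvar=False)               # (D, D) correlations between columns
    S = (C + 1.0) / 2.0 if signed else np.abs(C)   # map similarities into [0, 1]
    np.fill_diagonal(S, 0.0)                       # no self-loops in this sketch
    D = S.shape[0]
    kk = min(k, D - 1)
    top = np.argsort(-S, axis=1)[:, :kk]           # kk largest similarities per row
    rows = np.repeat(np.arange(D), kk)
    A = np.zeros_like(S)
    A[rows, top.ravel()] = S[rows, top.ravel()]
    if normalize:
        sums = A.sum(axis=1, keepdims=True)
        A = np.divide(A, sums, out=np.zeros_like(A), where=sums > 0)
    return A

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 5))                  # 30 samples, 5 features
A_demo = correlation_knn_sketch(X_demo, k=2)
```

Per-node selection generally yields an asymmetric matrix, because feature i can be among the top neighbors of j without the reverse holding; that asymmetry is what the mutual option removes.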
- bioneuralnet.network.gaussian_knn_network(X: DataFrame, k: int = 15, sigma: float | None = None, mutual: bool = False, self_loops: bool = True, normalize: bool = True) → DataFrame
Build a Gaussian (RBF) kNN similarity graph from feature vectors.
Pairwise Euclidean distances between features are converted to similarities using a Gaussian kernel with bandwidth sigma (or a median-distance heuristic). The graph is sparsified by keeping top-k neighbors per node, optionally restricted to mutual neighbors, with optional self-loops and row-normalization.
- Parameters:
X (pd.DataFrame) – Input data of shape (N, D) where N is the number of samples and D is the number of features.
k (int) – Number of neighbors to keep per node in the kNN graph.
sigma (float | None) – Bandwidth parameter for the Gaussian kernel; if None, a median squared distance heuristic is used.
mutual (bool) – If True, retain only edges where i and j are mutual kNN neighbors.
self_loops (bool) – If True, add self-loop weights of 1 on the diagonal of the adjacency matrix.
normalize (bool) – If True, row-normalize the adjacency so each row sums to 1.
- Returns:
Adjacency matrix of shape (D, D) representing the Gaussian-kernel feature similarity graph.
- Return type:
pd.DataFrame
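The kernel step can be illustrated separately from the kNN sparsification. The sketch below assumes the common RBF form exp(-d^2 / (2 sigma^2)) and, when sigma is None, uses the median off-diagonal squared distance in place of sigma^2; the exact kernel scaling and heuristic used by the function may differ, and the function name is hypothetical.

```python
import numpy as np

def gaussian_similarity_sketch(X: np.ndarray, sigma=None) -> np.ndarray:
    """Dense Gaussian (RBF) similarity between the columns (features) of X."""
    F = X.T.astype(float)                            # one row per feature
    sq_norms = (F ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (F @ F.T)
    sq_dists = np.maximum(sq_dists, 0.0)             # clamp tiny negative round-off
    if sigma is None:
        # Assumption: median squared distance stands in for sigma^2.
        off_diag = sq_dists[~np.eye(len(F), dtype=bool)]
        sigma_sq = float(np.median(off_diag))
    else:
        sigma_sq = float(sigma) ** 2
    return np.exp(-sq_dists / (2.0 * sigma_sq))

# Two features whose sample vectors are (0, 0) and (3, 4): distance 5.
X_pair = np.array([[0.0, 3.0],
                   [0.0, 4.0]])
K_pair = gaussian_similarity_sketch(X_pair, sigma=5.0)

rng = np.random.default_rng(1)
K_auto = gaussian_similarity_sketch(rng.normal(size=(20, 6)))
```

In the two-feature case the squared distance is 25, so with sigma = 5 the off-diagonal similarity is exp(-0.5). A smaller sigma makes the kernel decay faster and the resulting graph more local.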
- bioneuralnet.network.network_search(omics_data: DataFrame, y_labels, methods: list = ['correlation', 'threshold', 'similarity', 'gaussian'], seed: int = 1883, verbose: bool = True, trials: int | None = None, centrality_mode: str = 'eigenvector', topology_weight: float = 0.15, scoring: str = 'f1_macro') → tuple[DataFrame, dict, DataFrame]
Search over graph-construction hyperparameters using a structural proxy.
Each candidate configuration builds a graph, scores it with a fast centrality-weighted Ridge classifier proxy, and blends that score with a topological quality term (average clustering coefficient) to favour well-connected, informative graphs.
- Parameters:
omics_data – Feature matrix of shape (n_samples, n_features).
y_labels – Target labels for stratified CV evaluation.
methods – Graph-construction methods to search over.
seed – Random seed for reproducibility.
verbose – Log per-configuration progress.
trials – Optional cap on evaluated configurations (random subset).
centrality_mode – Centrality used for feature weighting in the proxy; one of "eigenvector" or "degree".
topology_weight – Blending factor in [0, 1] that controls how much the topological quality term contributes to the final score. 0 ignores topology; 1 ignores the proxy F1.
- Returns:
A 3-tuple of (best_graph, best_params, results_df).
- Raises:
RuntimeError – If every configuration fails.
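The role of topology_weight can be made concrete with a linear blend of the two terms. The documentation only states the endpoints (0 ignores topology, 1 ignores the proxy score), so the convex combination below is an assumption consistent with those endpoints, and the function and argument names are hypothetical.

```python
def blended_score_sketch(proxy_score: float, avg_clustering: float,
                         topology_weight: float = 0.15) -> float:
    """Convex combination of a proxy score and a topology term (assumed form)."""
    if not 0.0 <= topology_weight <= 1.0:
        raise ValueError("topology_weight must lie in [0, 1]")
    return (1.0 - topology_weight) * proxy_score + topology_weight * avg_clustering

score_default = blended_score_sketch(0.80, 0.40)                       # default weight 0.15
score_proxy_only = blended_score_sketch(0.80, 0.40, topology_weight=0.0)
score_topology_only = blended_score_sketch(0.80, 0.40, topology_weight=1.0)
```

Under this form, a graph with a proxy score of 0.80 and an average clustering coefficient of 0.40 scores 0.74 at the default weight, so topology acts as a mild tie-breaker rather than the main criterion.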
- bioneuralnet.network.similarity_network(X: DataFrame, k: int = 15, metric: str = 'cosine', mutual: bool = False, per_node: bool = True, self_loops: bool = False, normalize: bool = True) → DataFrame
Build a k-nearest neighbors similarity graph from feature vectors.
Pairwise similarities are computed using either cosine similarity or a Gaussian kernel on Euclidean distances. The similarity matrix is sparsified by keeping top-k neighbors per node (or via a global cutoff), optionally restricted to mutual neighbors, with optional self-loops and row-normalization.
- Parameters:
X (pd.DataFrame) – Input data of shape (N, D) where N is the number of samples and D is the number of features.
k (int) – Number of neighbors to keep per node, or approximate neighbors per node when using a global cutoff.
metric (str) – Similarity metric; either “cosine” or “euclidean” (case-insensitive) where the latter uses a Gaussian kernel on squared distances.
mutual (bool) – If True, retain only edges where i is in the kNN of j and j is in the kNN of i.
per_node (bool) – If True, apply kNN per node; if False, apply a global threshold to keep approximately k edges per node.
self_loops (bool) – If True, add self-loop weights of 1 on the diagonal of the adjacency matrix.
normalize (bool) – If True, row-normalize the adjacency so each row sums to 1.
- Returns:
Adjacency matrix of shape (D, D) representing the feature-feature similarity graph.
- Return type:
pd.DataFrame
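The effect of the mutual option is easiest to see in a small sketch: an edge survives only when each endpoint lists the other among its k nearest neighbors. The example below uses cosine similarity and omits self-loops and normalization; the function name is hypothetical and the code is not taken from the library.

```python
import numpy as np

def cosine_mutual_knn_sketch(X: np.ndarray, k: int = 2) -> np.ndarray:
    """Cosine-similarity graph between features, keeping mutual kNN edges only."""
    F = X.T.astype(float)                               # one row per feature
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    unit = F / np.where(norms > 0, norms, 1.0)          # guard against zero vectors
    S = unit @ unit.T
    np.fill_diagonal(S, -np.inf)                        # never pick a node as its own neighbor
    D = S.shape[0]
    kk = min(k, D - 1)
    top = np.argsort(-S, axis=1)[:, :kk]
    knn = np.zeros((D, D), dtype=bool)
    knn[np.repeat(np.arange(D), kk), top.ravel()] = True
    mutual = knn & knn.T                                # i in kNN(j) and j in kNN(i)
    np.fill_diagonal(S, 0.0)
    return np.where(mutual, S, 0.0)

rng = np.random.default_rng(2)
A_mutual = cosine_mutual_knn_sketch(rng.normal(size=(20, 6)), k=2)
```

Mutual pruning always produces a symmetric matrix and can leave some nodes with fewer than k neighbors, or none at all.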
- bioneuralnet.network.threshold_network(X: DataFrame, b: float = 6.0, k: int = 15, mutual: bool = False, self_loops: bool = False, normalize: bool = True) → DataFrame
Build a soft-thresholded kNN co-expression graph, similar to WGCNA-style networks.
Absolute Pearson correlations between features are raised to a power b to obtain soft-thresholded similarities. A kNN mask keeps the top-k neighbors per node, optionally restricted to mutual neighbors, with optional self-loops and row-normalization.
- Parameters:
X (pd.DataFrame) – Input data of shape (N, D) where N is the number of samples and D is the number of features.
b (float) – Soft-threshold exponent applied to absolute correlations to control network sparsity and hub emphasis.
k (int) – Number of neighbors to keep per node in the kNN graph.
mutual (bool) – If True, retain only edges where i and j are mutual kNN neighbors.
self_loops (bool) – If True, add self-loop weights of 1 on the diagonal of the adjacency matrix.
normalize (bool) – If True, row-normalize the adjacency so each row sums to 1.
- Returns:
Adjacency matrix of shape (D, D) representing the soft-thresholded co-expression graph.
- Return type:
pd.DataFrame
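The soft-thresholding step on its own is a one-line transformation of the correlation matrix. The sketch below shows only that step, without the kNN mask or normalization, to illustrate how the exponent b suppresses weak correlations; the function name is hypothetical.

```python
import numpy as np

def soft_threshold_sketch(X: np.ndarray, b: float = 6.0) -> np.ndarray:
    """Absolute Pearson correlations between features raised to the power b."""
    C = np.corrcoef(X, rowvar=False)
    S = np.abs(C) ** b
    np.fill_diagonal(S, 0.0)            # no self-loops in this sketch
    return S

rng = np.random.default_rng(3)
X_soft = rng.normal(size=(50, 4))
X_soft[:, 1] = 2.0 * X_soft[:, 0]       # feature 1 is perfectly correlated with feature 0
S_soft = soft_threshold_sketch(X_soft, b=6.0)

strong = 0.9 ** 6                       # a correlation of 0.9 stays sizable
weak = 0.3 ** 6                         # a correlation of 0.3 is pushed close to zero
```

Because every absolute correlation is at most 1, raising it to a power above 1 can only shrink it, and it shrinks weak values far more than strong ones. That is how larger values of b sharpen the contrast between hub-like and peripheral connections.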
Modules