bioneuralnet.network.tools
Functions
- connected_components(csgraph, directed=True, connection='weak', …)
- Build a correlation-based graph from feature vectors with optional kNN sparsification.
- Evaluate a score by cross-validation.
- Build a Gaussian (RBF) kNN similarity graph from feature vectors.
- Retrieves a global logger configured to write to 'bioneuralnet.log'.
- network_search: Search over graph-construction hyperparameters using a structural proxy.
- Build a k-nearest neighbors similarity graph from feature vectors.
- Build a soft-thresholded kNN co-expression graph, similar to WGCNA-style networks.
Classes
- NetworkAnalyzer: Performs GPU-accelerated network analysis.
- Grid of parameters with a discrete number of values for each.
- Classifier using Ridge regression.
- Standardize features by removing the mean and scaling to unit variance.
- Class-wise stratified K-Fold cross-validator.
- Compressed Sparse Row matrix.
- class bioneuralnet.network.tools.NetworkAnalyzer(adjacency_matrix: DataFrame, source_omics: list | None = None, device: str = 'cuda')
Bases: object
Performs GPU-accelerated network analysis.
This class leverages PyTorch tensors to speed up graph statistics, clustering computations, and edge analysis for large-scale omics networks.
- Parameters:
adjacency_matrix (pd.DataFrame) – The input weighted adjacency matrix representing network connections.
source_omics (list | None) – Optional list of the original DataFrames used to build the network; used to assign an omics type to each node dynamically.
device (str) – The target computing device, defaulting to ‘cuda’ if available.
- basic_statistics(threshold: float = 0.5) → Dict[str, float | int | ndarray]
Computes fundamental graph metrics including density, degree statistics, and node isolation counts.
This provides a high-level overview of the network topology and connectivity at a specific threshold.
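As a rough illustration of the metrics named above, the following NumPy sketch computes density, degree statistics, and the isolated-node count from a small weight matrix. It is not the library's implementation: the function name `basic_statistics_sketch`, the dictionary keys, and the rule that an edge exists where the weight is strictly above the threshold are all assumptions made for this example.

```python
import numpy as np

def basic_statistics_sketch(weights: np.ndarray, threshold: float = 0.5) -> dict:
    """Illustrative only: summary metrics of an undirected graph at a threshold."""
    # Assumption: an edge exists where the weight is strictly above the threshold.
    adj = (weights > threshold).astype(int)
    np.fill_diagonal(adj, 0)  # ignore self-loops
    n = adj.shape[0]
    degrees = adj.sum(axis=1)
    n_edges = int(adj.sum() // 2)  # each undirected edge is counted twice
    max_edges = n * (n - 1) // 2
    return {
        "n_nodes": n,
        "n_edges": n_edges,
        "density": n_edges / max_edges if max_edges else 0.0,
        "mean_degree": float(degrees.mean()),
        "max_degree": int(degrees.max()),
        "isolated_nodes": int((degrees == 0).sum()),
    }

# Toy symmetric weight matrix with four features.
weights = np.array([
    [1.0, 0.8, 0.6, 0.1],
    [0.8, 1.0, 0.2, 0.0],
    [0.6, 0.2, 1.0, 0.3],
    [0.1, 0.0, 0.3, 1.0],
])
stats = basic_statistics_sketch(weights, threshold=0.5)
```

With this toy matrix only two weights exceed 0.5, so the sketch reports two edges out of six possible and one isolated node.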
- clustering_coefficient_gpu(threshold: float = 0.5, sample_size: int | None = None) → Dict[str, float | ndarray]
Computes the local clustering coefficient for nodes using GPU-optimized matrix operations.
This measures the degree to which nodes tend to cluster together, using random sampling for efficiency on large graphs.
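- Parameters:
threshold (float) – Defaults to 0.5.
sample_size (int | None) – Defaults to None.
- Returns:
Statistics including average, max, and min clustering coefficients, plus raw values and sample indices.
- Return type:
Dict[str, float | ndarray]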
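The idea of computing local clustering coefficients with matrix operations, optionally on a random sample of nodes, can be sketched on the CPU with NumPy. This is an assumption-laden illustration rather than the method's actual code: `local_clustering_sketch`, its `seed` argument, and the convention that nodes with fewer than two neighbours get a coefficient of 0 are all hypothetical choices.

```python
import numpy as np

def local_clustering_sketch(adj: np.ndarray, sample_size=None, seed: int = 0):
    """Illustrative only: local clustering from a binary, symmetric adjacency matrix."""
    n = adj.shape[0]
    rng = np.random.default_rng(seed)
    # Optionally evaluate only a random subset of nodes.
    if sample_size is None or sample_size >= n:
        nodes = np.arange(n)
    else:
        nodes = np.sort(rng.choice(n, size=sample_size, replace=False))
    a = adj.astype(float)
    degrees = a.sum(axis=1)
    # (A @ A)[i, j] counts two-step paths from i to j. Masking with A[i, j] keeps
    # the paths that close into a triangle; each triangle at i is counted twice.
    triangles = ((a[nodes] @ a) * a[nodes]).sum(axis=1) / 2.0
    possible = degrees[nodes] * (degrees[nodes] - 1) / 2.0
    coeffs = np.divide(triangles, possible,
                       out=np.zeros_like(triangles), where=possible > 0)
    return nodes, coeffs

# Triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])
nodes, coeffs = local_clustering_sketch(adj)
```

Here nodes 0 and 1 have coefficient 1, node 2 has 1/3 (one triangle out of three possible neighbour pairs), and the pendant node has 0. The same matrix expressions carry over to GPU tensor libraries, which is presumably what makes this formulation attractive for large graphs.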
- connected_components(threshold: float = 0.5) → Dict[str, int | ndarray | List[int]]
Identifies isolated subgraphs within the network using Breadth-First Search logic.
This computation is performed on the CPU using scipy due to the sequential nature of traversal algorithms.
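To make the traversal concrete, here is a minimal pure-Python breadth-first search that labels components of a small undirected graph. The description above says the method itself relies on scipy, so this is only a sketch of the underlying logic; `connected_components_bfs` and its return shape are assumptions for the example.

```python
from collections import deque

def connected_components_bfs(adj):
    """Illustrative only: label connected components of an undirected graph."""
    n = len(adj)
    labels = [-1] * n  # -1 marks a node that has not been visited yet
    n_components = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Start a new component and flood it with BFS.
        labels[start] = n_components
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in range(n):
                if adj[node][neighbor] and labels[neighbor] == -1:
                    labels[neighbor] = n_components
                    queue.append(neighbor)
        n_components += 1
    return n_components, labels

# Two pairs of connected nodes plus one isolated node.
adj = [
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
n_components, labels = connected_components_bfs(adj)
```

Each node is enqueued at most once, and every step depends on the queue state left by the previous one, which is why this kind of traversal is harder to parallelise than the matrix-based metrics.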
- cross_omics_analysis(threshold: float = 0.5) → Dict[tuple, Dict]
Quantifies the connectivity density between different omics layers (e.g., RNA vs Protein).
This reveals whether the network structure is driven by within-omics correlations or cross-omics interactions.
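One way to picture the within-layer versus cross-layer comparison is to count edges in each block of the thresholded adjacency matrix and divide by the number of possible edges in that block. The sketch below does this with NumPy; `cross_omics_density_sketch`, the per-node `omics_labels` list, and the result keys are hypothetical and not taken from the library.

```python
import numpy as np

def cross_omics_density_sketch(adj: np.ndarray, omics_labels) -> dict:
    """Illustrative only: edge density within and between omics layers."""
    labels = np.asarray(omics_labels)
    layers = sorted(set(omics_labels))
    result = {}
    for i, a in enumerate(layers):
        for b in layers[i:]:
            rows = np.flatnonzero(labels == a)
            cols = np.flatnonzero(labels == b)
            block = adj[np.ix_(rows, cols)]
            if a == b:
                # Within a layer, count each undirected edge once.
                n_edges = int(np.triu(block, k=1).sum())
                possible = len(rows) * (len(rows) - 1) // 2
            else:
                n_edges = int(block.sum())
                possible = len(rows) * len(cols)
            result[(a, b)] = {
                "edges": n_edges,
                "possible": possible,
                "density": n_edges / possible if possible else 0.0,
            }
    return result

# Binary adjacency: one RNA-RNA edge (0, 1) and one RNA-protein edge (0, 2).
adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
])
densities = cross_omics_density_sketch(adj, ["rna", "rna", "protein", "protein"])
```

In this toy case the RNA layer is fully connected internally (density 1.0) while only one of the four possible RNA-protein pairs is linked (density 0.25), which would suggest a structure driven mainly by within-omics correlation.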
- degree_distribution(threshold: float = 0.5) → DataFrame
Calculates the frequency distribution of node degrees across the entire network.
This helps identify if the network follows a scale-free power law or a random graph distribution.
- Parameters:
threshold (float) – The threshold used to binarize the network.
- Returns:
A DataFrame with columns for degree, count, and percentage of total nodes.
- Return type:
pd.DataFrame
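The degree, count, and percentage columns described above can be reproduced in a few lines of NumPy. This sketch returns plain dictionaries instead of a DataFrame to stay dependency-light; `degree_distribution_sketch` is a hypothetical name and the input is assumed to be an already binarized, symmetric adjacency matrix.

```python
import numpy as np

def degree_distribution_sketch(adj: np.ndarray) -> list:
    """Illustrative only: frequency table of node degrees."""
    degrees = adj.sum(axis=1)
    values, counts = np.unique(degrees, return_counts=True)
    total = adj.shape[0]
    return [
        {"degree": int(d), "count": int(c), "percentage": float(100.0 * c / total)}
        for d, c in zip(values, counts)
    ]

# Node 0 is linked to nodes 1 and 2; node 3 is isolated.
adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
])
rows = degree_distribution_sketch(adj)
```

The resulting rows could be passed to `pd.DataFrame(rows)` to obtain a table with the same three columns.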
- edge_weight_analysis() → ndarray | None
Analyzes the statistical distribution of edge weights across the entire network.
This is useful for determining appropriate threshold values and understanding signal strength distribution.
- Parameters:
None.
- Returns:
An array of all non-zero edge weights, or None if no edges exist.
- Return type:
Optional[np.ndarray]
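A minimal sketch of collecting non-zero edge weights and using them to pick a threshold might look as follows. It assumes a symmetric matrix and reads only the upper triangle so that each edge appears once; whether the library does the same is not stated here, and `edge_weights_sketch` is a hypothetical name.

```python
import numpy as np

def edge_weights_sketch(weights: np.ndarray):
    """Illustrative only: non-zero off-diagonal weights, or None if there are none."""
    # Assumption: the matrix is symmetric, so the upper triangle holds every edge once.
    upper = weights[np.triu_indices_from(weights, k=1)]
    nonzero = upper[upper != 0]
    return nonzero if nonzero.size else None

weights = np.array([
    [1.0, 0.8, 0.6, 0.1],
    [0.8, 1.0, 0.2, 0.0],
    [0.6, 0.2, 1.0, 0.3],
    [0.1, 0.0, 0.3, 1.0],
])
values = edge_weights_sketch(weights)
# A percentile of the observed weights is one simple way to choose a threshold.
median_weight = float(np.percentile(values, 50))
empty = edge_weights_sketch(np.zeros((3, 3)))
```

For this matrix five of the six node pairs carry a non-zero weight and the median weight is 0.3, so a threshold of 0.5 would keep only the strongest two edges.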
- find_strongest_edges(top_n: int = 50) → DataFrame
Retrieves the strongest edges in the network sorted by weight magnitude.
This isolates the most significant pairwise interactions between features.
- Parameters:
top_n (int) – The number of top weighted edges to return.
- Returns:
A DataFrame detailing the top interactions, including feature names and weights.
- Return type:
pd.DataFrame
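Ranking edges by weight magnitude can be sketched by sorting the upper-triangle entries on their absolute value. The function name `strongest_edges_sketch`, the tuple layout of the result, and the feature names in the example are assumptions for illustration only.

```python
import numpy as np

def strongest_edges_sketch(weights: np.ndarray, names, top_n: int = 50) -> list:
    """Illustrative only: top edges of a symmetric matrix ranked by |weight|."""
    rows, cols = np.triu_indices_from(weights, k=1)
    values = weights[rows, cols]
    # Sort by descending magnitude so strong negative weights rank highly too.
    order = np.argsort(-np.abs(values), kind="stable")[:top_n]
    return [
        (names[rows[i]], names[cols[i]], float(values[i]))
        for i in order
        if values[i] != 0
    ]

weights = np.array([
    [0.0, 0.2, -0.9],
    [0.2, 0.0, 0.5],
    [-0.9, 0.5, 0.0],
])
top_edges = strongest_edges_sketch(weights, ["gene_1", "gene_2", "protein_1"], top_n=2)
```

The strongly negative pair comes first here because the ranking uses magnitude rather than signed value.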
- hub_analysis(threshold: float = 0.5, top_n: int = 10) → DataFrame
Identifies and ranks the most highly connected ‘hub’ nodes in the network.
This is critical for finding central regulatory features or bottlenecks in the omics network.
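A simple reading of hub ranking is to sort nodes by their degree in the thresholded network. The sketch below takes that reading; the real method may use additional criteria, and `hub_sketch` together with its `(name, degree)` output format is hypothetical.

```python
import numpy as np

def hub_sketch(weights: np.ndarray, names, threshold: float = 0.5, top_n: int = 10) -> list:
    """Illustrative only: nodes ranked by degree in the thresholded network."""
    # Assumption: an edge exists where the weight is strictly above the threshold.
    adj = weights > threshold
    np.fill_diagonal(adj, False)  # ignore self-loops
    degrees = adj.sum(axis=1)
    order = np.argsort(-degrees, kind="stable")[:top_n]
    return [(names[i], int(degrees[i])) for i in order]

weights = np.array([
    [1.0, 0.8, 0.6, 0.1],
    [0.8, 1.0, 0.2, 0.0],
    [0.6, 0.2, 1.0, 0.3],
    [0.1, 0.0, 0.3, 1.0],
])
hubs = hub_sketch(weights, ["a", "b", "c", "d"], threshold=0.5, top_n=2)
```

The stable sort keeps ties in their original node order, which makes the ranking reproducible across runs.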
- threshold_network(threshold: float) → torch.Tensor
Generates a binary adjacency matrix by applying a hard threshold to the connection weights.
This converts continuous edge weights into a binary structure suitable for standard graph topology metrics.
- Parameters:
threshold (float) – The cutoff value above which an edge is considered to exist.
- Returns:
A binary tensor where 1 indicates an edge and 0 indicates no edge.
- Return type:
torch.Tensor
- bioneuralnet.network.tools.network_search(omics_data: DataFrame, y_labels, methods: list = ['correlation', 'threshold', 'similarity', 'gaussian'], seed: int = 1883, verbose: bool = True, trials: int | None = None, centrality_mode: str = 'eigenvector', topology_weight: float = 0.15, scoring: str = 'f1_macro') → tuple[DataFrame, dict, DataFrame]
Search over graph-construction hyperparameters using a structural proxy.
Each candidate configuration builds a graph, scores it with a fast centrality-weighted Ridge classifier proxy, and blends that score with a topological quality term (average clustering coefficient) to favour well-connected, informative graphs.
- Parameters:
omics_data – Feature matrix of shape (n_samples, n_features).
y_labels – Target labels for stratified CV evaluation.
methods – Graph-construction methods to search over.
seed – Random seed for reproducibility.
verbose – Log per-configuration progress.
trials – Optional cap on evaluated configurations (random subset).
centrality_mode – Centrality used for feature weighting in the proxy; one of "eigenvector" or "degree".
topology_weight – Blending factor in [0, 1] that controls how much the topological quality term contributes to the final score. 0 ignores topology; 1 ignores the proxy F1.
- Returns:
A 3-tuple of (best_graph, best_params, results_df).
- Raises:
RuntimeError – If every configuration fails.
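The role of topology_weight can be illustrated with a convex combination of the two terms. The exact formula used by network_search is not given here, so the blend below is an assumption chosen only because it matches the documented endpoints (0 ignores topology, 1 ignores the proxy score); `blended_score` and the candidate values are made up for the example.

```python
def blended_score(proxy_score: float, topology_score: float,
                  topology_weight: float = 0.15) -> float:
    """Illustrative only: assumed convex blend of proxy and topology scores."""
    if not 0.0 <= topology_weight <= 1.0:
        raise ValueError("topology_weight must lie in [0, 1]")
    return (1.0 - topology_weight) * proxy_score + topology_weight * topology_score

# Hypothetical candidates: proxy = cross-validated score, topology = average clustering.
candidates = [
    {"method": "correlation", "proxy": 0.80, "topology": 0.20},
    {"method": "gaussian", "proxy": 0.76, "topology": 0.60},
]

def best_method(weight: float) -> str:
    """Return the method with the highest blended score for a given weight."""
    scored = [(blended_score(c["proxy"], c["topology"], weight), c["method"])
              for c in candidates]
    return max(scored)[1]

best_default = best_method(0.15)
best_proxy_only = best_method(0.0)
```

With the default weight of 0.15 the better-clustered "gaussian" candidate scores 0.736 against 0.71 and wins, whereas a weight of 0 selects "correlation" on proxy score alone. This shows how even a small topology weight can change which configuration is reported as best.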