bioneuralnet.clustering
Network Clustering and Subgraph Detection.
This module implements hybrid algorithms for identifying phenotype-associated subgraphs in multi-omics networks. It combines global modularity optimization with local random-walk refinement, weighted by phenotypic correlation.
- Classes:
- HybridLouvain: The primary pipeline. Iteratively alternates between global partitioning (Louvain) and local refinement (PageRank) to find the most significant subgraph associated with a phenotype.
- CorrelatedLouvain: Extends standard Louvain by optimizing a hybrid objective: Q_hybrid = k_L * Modularity + (1 - k_L) * Correlation.
- CorrelatedPageRank: Performs a biased random walk (PageRank) followed by a sweep cut to minimize a hybrid conductance objective: Phi_hybrid = k_P * Conductance + (1 - k_P) * Correlation.
- Louvain: Standard Louvain community detection (based on modularity maximization). Serves as the base class and baseline method.
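The hybrid Louvain objective quoted above is a convex combination of two scores. A minimal sketch of that arithmetic follows; the function name `hybrid_quality` and the restriction of k_L to [0, 1] are assumptions made for illustration and are not part of the bioneuralnet API.

```python
def hybrid_quality(modularity: float, correlation: float, k_L: float) -> float:
    """Q_hybrid = k_L * Modularity + (1 - k_L) * Correlation.

    Hypothetical helper: it only reproduces the formula from the class summary.
    """
    # Assumption: k_L is a mixing weight, so it is kept inside [0, 1].
    if not 0.0 <= k_L <= 1.0:
        raise ValueError("k_L must lie in [0, 1]")
    return k_L * modularity + (1.0 - k_L) * correlation


# Made-up scores for one candidate partition.
q_mixed = hybrid_quality(modularity=0.4, correlation=0.6, k_L=0.8)

# With k_L = 1 the correlation term vanishes and only modularity remains.
q_modularity_only = hybrid_quality(modularity=0.4, correlation=0.6, k_L=1.0)
```

A larger k_L favors network structure (modularity); a smaller k_L favors association with the phenotype.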
Classes
- CorrelatedLouvain: Correlated Louvain community detection.
- CorrelatedPageRank: Correlated PageRank clustering on a multi-omics network.
- HybridLouvain: Hybrid Louvain-PageRank for significant subgraph detection.
- Louvain: Standard Louvain community detection.
- class bioneuralnet.clustering.CorrelatedLouvain
Bases: Louvain
Correlated Louvain community detection.
Inherits from Louvain.
- Parameters:
G (nx.Graph) – The input graph for community detection.
B (pd.DataFrame) – Omics data (n_samples x n_features). Column names must match nodes.
Y (Union[pd.Series, pd.DataFrame]) – Phenotype vector aligned with rows of B.
k_L (float) – Weight on modularity in combined objective (Eq. 9).
weight (str) – Edge attribute name for weights.
max_passes (int) – Maximum number of passes for Phase 1 optimization.
min_delta (float) – Convergence tolerance for objective gain.
seed (Optional[int]) – Random seed for reproducibility.
- property communities: Dict[int, List[Any]]
Retrieves the communities grouped by community ID.
Convenient for iterating over sets of nodes belonging to the same community.
- Returns:
A dictionary mapping community IDs to lists of nodes.
- Return type:
Dict[int, List[Any]]
Access the calculated combined quality score.
- Returns:
The Q* score.
- Return type:
float
Retrieve the top communities based on absolute correlation.
- property history: List[Dict[str, Any]]
Retrieves the history of the algorithm’s execution levels.
Provides insight into the convergence process and reduction of graph size.
- Returns:
A list of dictionaries containing stats for each level.
- Return type:
List[Dict[str, Any]]
- property modularity: float
Retrieves the final modularity score of the computed partition.
Requires that the run() method has been executed previously.
- Returns:
The modularity score.
- Return type:
float
- property partition: Dict[Any, int]
Retrieves the final partition of the graph.
Requires that the run() method has been executed previously.
- Returns:
A dictionary mapping nodes to community IDs.
- Return type:
Dict[Any, int]
Execute the Correlated Louvain algorithm.
- Returns:
Mapping of original nodes to community IDs.
- Return type:
Dict[Any, int]
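The partition (node to community ID) and the communities view (community ID to nodes) carry the same information, and the top communities are ranked by absolute correlation. A minimal sketch of both ideas, assuming plain dictionaries: the helper names and the correlation values are hypothetical and do not come from the library.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def group_by_community(partition: Dict[Any, int]) -> Dict[int, List[Any]]:
    """Invert a node -> community ID mapping into community ID -> nodes."""
    communities: Dict[int, List[Any]] = defaultdict(list)
    for node, community_id in partition.items():
        communities[community_id].append(node)
    return dict(communities)


def rank_by_abs_correlation(
    community_corr: Dict[int, float], n: int = 1
) -> List[Tuple[int, float]]:
    """Return the n communities with the largest |correlation|, strongest first."""
    ranked = sorted(community_corr.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:n]


# Toy partition of five nodes into three communities.
partition = {"g1": 0, "g2": 0, "g3": 1, "g4": 1, "g5": 2}
communities = group_by_community(partition)

# Made-up phenotype correlations per community, for illustration only.
community_corr = {0: 0.21, 1: -0.58, 2: 0.05}
top = rank_by_abs_correlation(community_corr, n=2)
```

Ranking on the absolute value means a strongly negative association (community 1 here) outranks a weak positive one.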
- class bioneuralnet.clustering.CorrelatedPageRank
Bases: object
Correlated PageRank clustering on a multi-omics network.
- Parameters:
graph (nx.Graph) – Weighted undirected NetworkX graph.
omics_data (pd.DataFrame) – Omics matrix (n_samples x n_features), columns = node ids.
phenotype_data (Union[pd.DataFrame, pd.Series]) – Phenotype vector aligned with rows of omics_data.
teleport_prob (float) – Teleportation probability (alpha). Default 0.10.
k_P (float) – Weight on conductance in combined objective (Eq. 5).
max_iter (int) – Max iterations for PageRank power iteration.
tol (float) – Convergence tolerance for PageRank.
min_cluster (int) – Minimum cluster size for sweep cut consideration.
seed (Optional[int]) – Random seed for reproducibility.
Build personalization vector based on each node’s correlation contribution.
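Compute the Pearson correlation between the phenotype and the first principal component (PC1) of the omics columns for the given nodes: Pearson(PC1(omics[:, nodes]), phenotype).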
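The quantity Pearson(PC1(omics[:, nodes]), phenotype) can be illustrated without any dependencies. The sketch below is not the library's implementation: it assumes the omics matrix is a list of rows, finds PC1 by power iteration on the scatter matrix, and uses hypothetical function names throughout.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def first_pc_scores(rows: Sequence[Sequence[float]], n_iter: int = 500) -> List[float]:
    """Project column-centered rows onto the leading principal axis."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Scatter matrix X^T X (the 1/(n-1) factor does not change the eigenvectors).
    scatter = [[sum(c[a] * c[b] for c in centered) for b in range(p)] for a in range(p)]
    # Simplification: a fixed non-uniform start vector; a start vector that is
    # orthogonal to PC1 would make the power iteration fail.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(n_iter):
        w = [sum(scatter[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    return [sum(c[j] * v[j] for j in range(p)) for c in centered]


def pc1_phenotype_correlation(omics_rows, node_columns, phenotype) -> float:
    """Pearson(PC1(omics[:, nodes]), phenotype) for the selected column indices."""
    sub = [[row[j] for j in node_columns] for row in omics_rows]
    return pearson(first_pc_scores(sub), phenotype)


# Toy data: columns 0 and 1 track the phenotype, column 2 is unrelated noise.
omics = [
    [1.0, 2.0, 7.0],
    [2.0, 4.1, 3.0],
    [3.0, 5.9, 9.0],
    [4.0, 8.2, 1.0],
    [5.0, 9.9, 5.0],
]
phenotype = [0.9, 2.1, 2.9, 4.2, 5.0]
rho = pc1_phenotype_correlation(omics, [0, 1], phenotype)
```

The sign of a principal component is arbitrary, so the sign of the resulting correlation is as well; comparing absolute values avoids depending on it.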
Execute Correlated PageRank clustering.
- Parameters:
seed_nodes (List[Any]) – Nodes to use as the teleport set.
- Returns:
Cluster performance and node list.
- Return type:
Dict
Identify the best cluster via sweep cut on PageRank scores.
- Parameters:
pr_scores (Dict[Any, float]) – Mapping of nodes to PageRank scores.
- Returns:
Best cluster details including nodes, conductance, and composite score.
- Return type:
Dict
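The two steps documented for this class, a PageRank walk that teleports to the seed nodes followed by a sweep cut over the resulting scores, can be sketched as follows. This is a simplified illustration under stated assumptions, not the class's code: the graph is a dict-of-dicts adjacency, nodes are swept in order of score divided by weighted degree (a common convention), only the conductance term is minimized (the correlation term of Phi_hybrid is left out), and the defaults for `max_iter` and `tol` are arbitrary choices.

```python
from typing import Any, Dict, List, Tuple

Adjacency = Dict[Any, Dict[Any, float]]


def personalized_pagerank(adj: Adjacency, seed_nodes: List[Any],
                          teleport_prob: float = 0.10,
                          max_iter: int = 200, tol: float = 1e-8) -> Dict[Any, float]:
    """Power iteration in which the walker jumps back to the seeds with teleport_prob."""
    degree = {u: sum(adj[u].values()) for u in adj}
    teleport = {u: (1.0 / len(seed_nodes) if u in seed_nodes else 0.0) for u in adj}
    scores = dict(teleport)
    for _ in range(max_iter):
        new = {u: teleport_prob * teleport[u] for u in adj}
        for u in adj:
            if degree[u] == 0.0:
                continue  # simplification: mass on isolated nodes is dropped
            share = (1.0 - teleport_prob) * scores[u] / degree[u]
            for v, w in adj[u].items():
                new[v] += share * w
        delta = sum(abs(new[u] - scores[u]) for u in adj)
        scores = new
        if delta < tol:
            break
    return scores


def sweep_cut(adj: Adjacency, pr_scores: Dict[Any, float],
              min_cluster: int = 2) -> Tuple[List[Any], float]:
    """Return the prefix of the degree-normalized ranking with minimum conductance."""
    degree = {u: sum(adj[u].values()) for u in adj}
    total_volume = sum(degree.values())
    order = sorted((u for u in adj if degree[u] > 0.0),
                   key=lambda u: pr_scores[u] / degree[u], reverse=True)
    inside, volume, cut = set(), 0.0, 0.0
    best_nodes, best_phi = [], float("inf")
    # The full node set is skipped because its complement is empty.
    for size, u in enumerate(order[:-1], start=1):
        for v, w in adj[u].items():
            if v in inside:
                cut -= w  # edge becomes internal
            elif v != u:
                cut += w  # edge now crosses the boundary
        inside.add(u)
        volume += degree[u]
        denom = min(volume, total_volume - volume)
        if size >= min_cluster and denom > 0.0:
            phi = cut / denom
            if phi < best_phi:
                best_nodes, best_phi = order[:size], phi
    return best_nodes, best_phi


# Toy graph: two triangles joined by the single edge c-d, unit weights.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
adj: Adjacency = {}
for u, v in edges:
    adj.setdefault(u, {})[v] = 1.0
    adj.setdefault(v, {})[u] = 1.0

scores = personalized_pagerank(adj, seed_nodes=["a"])
cluster, conductance = sweep_cut(adj, scores)
```

On this toy graph the sweep selects the triangle that contains the seed, because cutting the single bridge edge gives the lowest conductance.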
- class bioneuralnet.clustering.HybridLouvain(G: Graph | DataFrame, B: DataFrame, Y: DataFrame | Series, k_L: float = 0.8, teleport_prob: float = 0.05, k_P: float = 0.7, max_iter: int = 10, min_nodes: int = 3, weight: str = 'weight', seed: int | None = None)
Bases: object
Hybrid Louvain-PageRank for significant subgraph detection.
Iteratively refines a multi-omics network by alternating:
- Correlated Louvain to find the most phenotype-associated community
- Correlated PageRank to refine that community via sweep cut
The graph shrinks with each iteration. The subgraph with the highest phenotype correlation (rho) is tracked across all iterations and returned.
- Parameters:
G (Union[nx.Graph, pd.DataFrame]) – Weighted undirected graph or adjacency matrix DataFrame.
B (pd.DataFrame) – Omics data (n_samples x n_features).
Y (Union[pd.DataFrame, pd.Series]) – Phenotype vector.
k_L (float) – Weight on modularity for Correlated Louvain.
teleport_prob (float) – Teleportation probability for PageRank (alpha).
k_P (float) – Weight on conductance for PageRank sweep cut.
max_iter (int) – Maximum Hybrid iterations.
min_nodes (int) – Stop if graph shrinks below this size.
weight (str) – Edge attribute name for weights.
seed (Optional[int]) – Random seed.
- property best_subgraph: Tuple[List[Any], float, int]
Retrieves the nodes and performance metrics of the best subgraph found.
- property iterations: List[Dict[str, Any]]
Provides access to per-iteration details from the most recent run.
- Returns:
A list of result dictionaries for each iteration.
- Return type:
List[Dict[str, Any]]
- run(as_dfs: bool = False) -> Dict[str, Any] | List[DataFrame]
Execute the Hybrid Louvain-PageRank algorithm.
- Returns:
A dictionary with the following keys:
- best_nodes: nodes of the subgraph with the highest rho
- best_correlation: float
- best_iteration: int
- iterations: full per-iteration metadata
- all_subgraphs: {iteration_index: [nodes]}
- Return type:
Dict
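The control flow described for this class (find a community, refine it, shrink the graph, keep the best subgraph seen so far) can be sketched with stand-in callables. Everything below is an assumption made for illustration: `find_community`, `refine` and `score` are hypothetical placeholders for the Correlated Louvain step, the Correlated PageRank step and the rho computation, iterations are numbered from 0, and the stopping rules only approximate what `max_iter` and `min_nodes` are documented to do.

```python
from typing import Any, Callable, Dict, List


def hybrid_loop(nodes: List[Any],
                find_community: Callable[[List[Any]], List[Any]],
                refine: Callable[[List[Any], List[Any]], List[Any]],
                score: Callable[[List[Any]], float],
                max_iter: int = 10, min_nodes: int = 3) -> Dict[str, Any]:
    """Alternate the two steps on a shrinking node set and track the best rho."""
    current = list(nodes)
    best_nodes: List[Any] = []
    best_rho, best_iteration = float("-inf"), -1
    all_subgraphs: Dict[int, List[Any]] = {}
    for iteration in range(max_iter):
        if len(current) < min_nodes:
            break  # graph has shrunk below the minimum size
        community = find_community(current)    # stand-in for Correlated Louvain
        refined = refine(current, community)   # stand-in for the PageRank sweep cut
        if not refined:
            break
        rho = score(refined)
        all_subgraphs[iteration] = list(refined)
        if rho > best_rho:
            best_nodes, best_rho, best_iteration = list(refined), rho, iteration
        if len(refined) >= len(current):
            break  # no further shrinkage, so later iterations would repeat
        current = list(refined)
    return {
        "best_nodes": best_nodes,
        "best_correlation": best_rho,
        "best_iteration": best_iteration,
        "all_subgraphs": all_subgraphs,
    }


# Toy stand-ins: keep every node as the community, drop one node when refining,
# and score subgraphs so that a size of five is the best.
result = hybrid_loop(
    nodes=list(range(8)),
    find_community=lambda current: list(current),
    refine=lambda current, community: community[:-1],
    score=lambda sub: 1.0 - 0.1 * abs(len(sub) - 5),
)
```

Because the best subgraph is tracked across iterations, the result here comes from the third iteration even though the loop keeps shrinking the node set afterwards.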
- class bioneuralnet.clustering.Louvain(G: Graph, weight: str = 'weight', max_passes: int = 100, min_delta: float = 1e-10, seed: int | None = None)
Bases: object
Standard Louvain community detection.
This class encapsulates the multi-phase optimization algorithm for detecting communities in weighted graphs.
- property communities: Dict[int, List[Any]]
Retrieves the communities grouped by community ID.
Convenient for iterating over sets of nodes belonging to the same community.
- Returns:
A dictionary mapping community IDs to lists of nodes.
- Return type:
Dict[int, List[Any]]
- property history: List[Dict[str, Any]]
Retrieves the history of the algorithm’s execution levels.
Provides insight into the convergence process and reduction of graph size.
- Returns:
A list of dictionaries containing stats for each level.
- Return type:
List[Dict[str, Any]]
- property modularity: float
Retrieves the final modularity score of the computed partition.
Requires that the run() method has been executed previously.
- Returns:
The modularity score.
- Return type:
float
- property partition: Dict[Any, int]
Retrieves the final partition of the graph.
Requires that the run() method has been executed previously.
- Returns:
A dictionary mapping nodes to community IDs.
- Return type:
Dict[Any, int]
- run() -> Dict[Any, int]
Executes the full Louvain algorithm by alternating between local optimization and graph aggregation.
Loops until the modularity converges or the graph cannot be aggregated further.
- Returns:
A dictionary mapping original node identifiers to their final community IDs.
- Return type:
Dict[Any, int]
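The modularity score that Louvain maximizes can be computed directly from a partition. The sketch below uses the standard Newman definition for a weighted undirected graph; it is an independent illustration with a hypothetical function name and a dict-of-dicts adjacency, not the NumPy implementation used by this class.

```python
from collections import defaultdict
from typing import Any, Dict

Adjacency = Dict[Any, Dict[Any, float]]


def modularity(adj: Adjacency, partition: Dict[Any, int]) -> float:
    """Q = sum over communities of (internal weight / 2m) - (total degree / 2m)^2."""
    # Each undirected edge appears twice in the adjacency, so this sum equals 2m.
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())
    internal = defaultdict(float)  # internal edge weight per community, counted twice
    total = defaultdict(float)     # summed weighted degree per community
    for u, nbrs in adj.items():
        cu = partition[u]
        for v, w in nbrs.items():
            total[cu] += w
            if partition[v] == cu:
                internal[cu] += w
    return sum(internal[c] / two_m - (total[c] / two_m) ** 2 for c in total)


# Toy graph: two triangles joined by the single edge c-d, unit weights.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
adj: Adjacency = {}
for u, v in edges:
    adj.setdefault(u, {})[v] = 1.0
    adj.setdefault(v, {})[u] = 1.0

two_triangles = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_block = {node: 0 for node in adj}

q_split = modularity(adj, two_triangles)
q_single = modularity(adj, one_block)
```

Splitting the graph at the bridge edge scores higher than keeping all nodes in one community, which is the kind of gain the local optimization phase looks for.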
Modules
- Correlated Louvain Community Detection.
- Correlated PageRank Clustering.
- Hybrid Louvain-PageRank - Significant Subgraph Detection.
- Standard Louvain Method for Community Detection - NumPy Implementation.