bioneuralnet.clustering

Network Clustering and Subgraph Detection.

This module implements hybrid algorithms for identifying phenotype-associated subgraphs in multi-omics networks. It combines global modularity optimization with local random-walk refinement, weighted by phenotypic correlation.

Classes:

HybridLouvain: The primary pipeline. Iteratively alternates between global partitioning (Louvain) and local refinement (PageRank) to find the most significant subgraph associated with a phenotype.
CorrelatedLouvain: Extends standard Louvain by optimizing a hybrid objective: Q_hybrid = k_L * Modularity + (1 - k_L) * Correlation.
CorrelatedPageRank: Performs a biased random walk (PageRank) followed by a sweep cut to minimize a hybrid conductance objective: Phi_hybrid = k_P * Conductance + (1 - k_P) * Correlation.
Louvain: Standard Louvain community detection (based on modularity maximization). Serves as the base class and baseline method.
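Both hybrid objectives above share the same convex-combination form. The following minimal sketch evaluates that form for made-up input values. The function name `hybrid_objective` and the restriction of `k` to [0, 1] are assumptions for illustration, and the sketch does not say how the library computes or signs the correlation term.

```python
def hybrid_objective(structure_score, correlation, k):
    """Return k * structure_score + (1 - k) * correlation.

    structure_score stands for modularity (Q_hybrid) or conductance
    (Phi_hybrid); k plays the role of k_L or k_P respectively.
    """
    if not 0.0 <= k <= 1.0:
        # Assumption: the weight is a mixing coefficient in [0, 1].
        raise ValueError("k must lie in [0, 1]")
    return k * structure_score + (1.0 - k) * correlation


# Illustrative numbers only, not taken from any dataset.
q_hybrid = hybrid_objective(0.42, 0.65, k=0.2)    # k_L = 0.2
phi_hybrid = hybrid_objective(0.30, 0.65, k=0.5)  # k_P = 0.5
print(q_hybrid, phi_hybrid)
```

With a small k the correlation term dominates; with k close to 1 the objective reduces to the purely structural score.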

Classes

`CorrelatedLouvain`(G, B, Y[, k_L, weight, ...])	Correlated Louvain community detection.
`CorrelatedPageRank`(graph, omics_data, ...[, ...])	Correlated PageRank clustering on a multi-omics network.
`HybridLouvain`(G, B, Y[, k_L, teleport_prob, ...])	Hybrid Louvain-PageRank for significant subgraph detection.
`Louvain`(G[, weight, max_passes, min_delta, seed])	Standard Louvain community detection.

class bioneuralnet.clustering.CorrelatedLouvain(G: Graph, B: DataFrame, Y: Series | DataFrame, k_L: float = 0.2, weight: str = 'weight', max_passes: int = 50, min_delta: float = 1e-06, seed: int | None = None)

Bases: Louvain

Correlated Louvain community detection.

Inherits from Louvain.

Parameters:

G (nx.Graph) – The input graph for community detection.
B (pd.DataFrame) – Omics data (n_samples x n_features). Column names must match nodes.
Y (Union[pd.Series, pd.DataFrame]) – Phenotype vector aligned with rows of B.
k_L (float) – Weight on modularity in combined objective (Eq. 9).
weight (str) – Edge attribute name for weights.
max_passes (int) – Maximum number of passes for Phase 1 optimization.
min_delta (float) – Convergence tolerance for objective gain.
seed (Optional[int]) – Random seed for reproducibility.

property communities: Dict[int, List[Any]]

Retrieves the communities grouped by community ID.

Convenient for iterating over sets of nodes belonging to the same community.

Returns: A dictionary mapping community IDs to lists of nodes.
Return type: Dict[int, List[Any]]
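The relationship between the node-to-community mapping and the grouped view can be sketched in a few lines. This is an illustration of the data shapes only; `group_by_community` is a hypothetical helper, not part of the library, and the node names are invented.

```python
from collections import defaultdict


def group_by_community(partition):
    """Invert {node: community_id} into {community_id: [nodes]}."""
    communities = defaultdict(list)
    for node, community_id in partition.items():
        communities[community_id].append(node)
    return dict(communities)


# Toy partition with invented node names.
partition = {"geneA": 0, "geneB": 0, "protX": 1, "metabY": 1, "geneC": 2}
communities = group_by_community(partition)

for community_id, nodes in communities.items():
    print(community_id, nodes)
```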

get_combined_quality() → float

Access the calculated combined quality score.

Returns: The Q* score.
Return type: float

get_top_communities(n: int = 1) → List[Tuple[int, float, List[Any]]]

Retrieve the top communities based on absolute correlation.

Parameters: n (int) – Number of top communities to return.
Returns: Community data sorted by absolute correlation (rho).
Return type: List[Tuple[int, float, List[Any]]]
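Ranking by absolute correlation means that a strongly negative community outranks a weakly positive one. The sketch below shows that ordering on invented data. The tuple layout `(community_id, rho, nodes)` is inferred from the return type annotation and is an assumption, as is the helper name `top_by_abs_correlation`.

```python
def top_by_abs_correlation(scored, n=1):
    """Return the n entries with the largest |rho|.

    Each entry is assumed to be (community_id, rho, nodes).
    """
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)[:n]


# Invented scores for three communities.
scored = [
    (0, 0.31, ["a", "b"]),
    (1, -0.72, ["c", "d", "e"]),
    (2, 0.05, ["f"]),
]
top2 = top_by_abs_correlation(scored, n=2)
print(top2)
```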

property history: List[Dict[str, Any]]

Retrieves the history of the algorithm’s execution levels.

Provides insight into the convergence process and reduction of graph size.

Returns: A list of dictionaries containing stats for each level.
Return type: List[Dict[str, Any]]

property modularity: float

Retrieves the final modularity score of the computed partition.

Requires that the run() method has been executed previously.

Returns: The modularity score.
Return type: float

property partition: Dict[Any, int]

Retrieves the final partition of the graph.

Requires that the run() method has been executed previously.

Returns: A dictionary mapping nodes to community IDs.
Return type: Dict[Any, int]

run() → Dict[Any, int]

Execute the Correlated Louvain algorithm.

Returns: Mapping of original nodes to community IDs.
Return type: Dict[Any, int]

class bioneuralnet.clustering.CorrelatedPageRank(graph: Graph, omics_data: DataFrame, phenotype_data: DataFrame | Series, teleport_prob: float = 0.1, k_P: float = 0.5, max_iter: int = 100, tol: float = 1e-06, min_cluster: int = 2, seed: int | None = None)

Bases: object

Correlated PageRank clustering on a multi-omics network.

Parameters:

graph (nx.Graph) – Weighted undirected NetworkX graph.
omics_data (pd.DataFrame) – Omics matrix (n_samples x n_features), columns = node ids.
phenotype_data (Union[pd.DataFrame, pd.Series]) – Phenotype vector aligned with rows of omics_data.
teleport_prob (float) – Teleportation probability (alpha). Default 0.10.
k_P (float) – Weight on conductance in combined objective (Eq. 5).
max_iter (int) – Max iterations for PageRank power iteration.
tol (float) – Convergence tolerance for PageRank.
min_cluster (int) – Minimum cluster size for sweep cut consideration.
seed (Optional[int]) – Random seed for reproducibility.
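To make the roles of teleport_prob, max_iter and tol concrete, here is a minimal pure-Python sketch of personalized PageRank by power iteration on a weighted undirected graph. It is a textbook formulation, not the library's implementation: the function name, the adjacency-dict input format, the uniform restart over seeds and the L1 stopping rule are all assumptions.

```python
def personalized_pagerank(adj, seeds, teleport_prob=0.1, max_iter=100, tol=1e-6):
    """Power iteration for PageRank with restarts to the seed nodes.

    adj is {node: {neighbor: weight}} and is assumed symmetric.
    """
    nodes = list(adj)
    seed_list = [s for s in dict.fromkeys(seeds) if s in adj]
    if not seed_list:
        raise ValueError("no seed node is present in the graph")
    # Assumption: restart mass is spread uniformly over the seeds.
    restart = {u: 0.0 for u in nodes}
    for s in seed_list:
        restart[s] = 1.0 / len(seed_list)
    strength = {u: sum(adj[u].values()) for u in nodes}
    scores = dict(restart)
    for _ in range(max_iter):
        nxt = {u: teleport_prob * restart[u] for u in nodes}
        for u in nodes:
            walk_mass = (1.0 - teleport_prob) * scores[u]
            if strength[u] == 0:
                # Isolated node: return its mass to the seeds.
                for s in seed_list:
                    nxt[s] += walk_mass / len(seed_list)
                continue
            for v, w in adj[u].items():
                nxt[v] += walk_mass * w / strength[u]
        delta = sum(abs(nxt[u] - scores[u]) for u in nodes)
        scores = nxt
        if delta < tol:  # L1 change below tolerance
            break
    return scores


# Two triangles joined by the edge c-d, all weights 1.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
adj = {}
for u, v in edges:
    adj.setdefault(u, {})[v] = 1.0
    adj.setdefault(v, {})[u] = 1.0

scores = personalized_pagerank(adj, seeds=["a"], teleport_prob=0.1)
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

A larger teleport_prob keeps more probability mass near the seeds, while a smaller one lets the walk spread further across the graph.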

generate_weighted_personalization(nodes: List[Any], alpha_max: float | None = None) → Dict[Any, float]

Build personalization vector based on each node’s correlation contribution.

Parameters:

nodes (List[Any]) – Seed node list.
alpha_max (Optional[float]) – Maximum teleportation weight.

Returns:

Personalization mapping {node: weight}.

Return type:

Dict[Any, float]

phen_omics_corr(nodes: List[Any]) → Tuple[float, float]

Compute Pearson(PC1(omics[:, nodes]), phenotype).

Parameters: nodes (List[Any]) – List of node identifiers.
Returns: (correlation, p_value). Returns (0.0, 1.0) on failure.
Return type: Tuple[float, float]
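The quantity Pearson(PC1(omics[:, nodes]), phenotype) can be illustrated without external libraries. The sketch below centers the selected columns, finds the first principal component by power iteration, and correlates the resulting sample scores with the phenotype. The helper names and the toy data are assumptions, the p-value is omitted, and the sign of PC1 (and therefore of the correlation) is arbitrary in general.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def first_pc_scores(rows, n_iter=200):
    """Project samples onto the first principal component.

    rows is a list of samples, each a list of feature values.
    Uses power iteration on X^T X with a uniform starting vector.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    X = [[r[j] - means[j] for j in range(p)] for r in rows]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        proj = [sum(x[j] * v[j] for j in range(p)) for x in X]              # X v
        w = [sum(X[i][j] * proj[i] for i in range(n)) for j in range(p)]    # X^T X v
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0:
            break  # no variance along the current direction
        v = [c / norm for c in w]
    return [sum(x[j] * v[j] for j in range(p)) for x in X]


# Invented data: 5 samples, 2 features that both track the phenotype.
omics_subset = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 8.1], [5.0, 9.8]]
phenotype = [0.9, 2.1, 2.9, 4.2, 5.1]

rho = pearson(first_pc_scores(omics_subset), phenotype)
print(rho)
```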

run(seed_nodes: List[Any]) → Dict[str, Any]

Execute Correlated PageRank clustering.

Parameters: seed_nodes (List[Any]) – Nodes to use as the teleport set.
Returns: Cluster performance and node list.
Return type: Dict

sweep_cut(pr_scores: Dict[Any, float]) → Dict[str, Any]

Identify the best cluster via sweep cut on PageRank scores.

Parameters: pr_scores (Dict[Any, float]) – Mapping of nodes to PageRank scores.
Returns: Best cluster details including nodes, conductance, and composite score.
Return type: Dict
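A sweep cut orders nodes by score and evaluates every prefix of that ordering as a candidate cluster. The sketch below implements the classic conductance-only version with degree-normalized ordering. It is an assumption that the library orders nodes this way, and the correlation term of the composite score is deliberately left out; the function names and the toy scores are hypothetical.

```python
def conductance(adj, cluster):
    """cut(S) / min(vol(S), vol(V \\ S)) for a weighted undirected graph."""
    inside = set(cluster)
    cut = sum(w for u in inside for v, w in adj[u].items() if v not in inside)
    vol_in = sum(sum(adj[u].values()) for u in inside)
    vol_total = sum(sum(nbrs.values()) for nbrs in adj.values())
    denom = min(vol_in, vol_total - vol_in)
    return cut / denom if denom > 0 else 1.0


def sweep_cut(adj, pr_scores, min_cluster=2):
    """Return the prefix with the lowest conductance."""
    strength = {u: sum(adj[u].values()) for u in adj}
    # Assumption: order by score divided by weighted degree.
    order = sorted(
        (u for u in pr_scores if strength.get(u, 0) > 0),
        key=lambda u: pr_scores[u] / strength[u],
        reverse=True,
    )
    best = {"nodes": [], "conductance": float("inf")}
    # Proper prefixes only: the full node set is not a useful cluster.
    for size in range(min_cluster, len(order)):
        prefix = order[:size]
        phi = conductance(adj, prefix)
        if phi < best["conductance"]:
            best = {"nodes": prefix, "conductance": phi}
    return best


# Two triangles joined by the edge c-d, all weights 1.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
adj = {}
for u, v in edges:
    adj.setdefault(u, {})[v] = 1.0
    adj.setdefault(v, {})[u] = 1.0

# Invented scores concentrated on the left triangle.
pr_scores = {"a": 0.30, "b": 0.25, "c": 0.25, "d": 0.10, "e": 0.05, "f": 0.05}
best = sweep_cut(adj, pr_scores, min_cluster=2)
print(best)
```

On this toy graph the best prefix is the left triangle, because only the single bridge edge leaves it.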

class bioneuralnet.clustering.HybridLouvain(G: Graph | DataFrame, B: DataFrame, Y: DataFrame | Series, k_L: float = 0.8, teleport_prob: float = 0.05, k_P: float = 0.7, max_iter: int = 10, min_nodes: int = 3, weight: str = 'weight', seed: int | None = None)

Bases: object

Hybrid Louvain-PageRank for significant subgraph detection.

Iteratively refines a multi-omics network by alternating:

Correlated Louvain to find the most phenotype-associated community
Correlated PageRank to refine that community via sweep cut

The graph shrinks each iteration. The best subgraph by rho is tracked across all iterations and returned.
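The control flow described above (alternate two steps, shrink the node set, remember the best result) can be sketched with placeholder callables. Everything here is hypothetical: `hybrid_loop`, `find_community`, `refine` and `score` stand in for the Correlated Louvain step, the Correlated PageRank step and the phenotype correlation, and comparing by absolute rho is an assumption rather than documented behaviour.

```python
def hybrid_loop(nodes, find_community, refine, score, max_iter=10, min_nodes=3):
    """Alternate two refinement steps on a shrinking node set.

    Returns (best_record, history), where each record is a dict with
    the keys "iteration", "nodes" and "rho".
    """
    current = list(nodes)
    best = None
    history = []
    for iteration in range(max_iter):
        if len(current) < min_nodes:
            break  # graph too small to continue
        community = find_community(current)   # global step
        refined = refine(community)           # local step
        if not refined:
            break
        rho = score(refined)
        history.append({"iteration": iteration, "nodes": list(refined), "rho": rho})
        # Assumption: "best" means largest absolute correlation.
        if best is None or abs(rho) > abs(best["rho"]):
            best = history[-1]
        if len(refined) >= len(current):
            break  # no shrinkage, nothing more to refine
        current = list(refined)
    return best, history


# Toy stand-ins: drop the weakest node each round, score by mean weight.
weights = {"n1": 0.9, "n2": 0.8, "n3": 0.7, "n4": 0.1, "n5": 0.05}
best, history = hybrid_loop(
    list(weights),
    find_community=lambda ns: sorted(ns, key=weights.get, reverse=True)[: len(ns) - 1],
    refine=lambda ns: ns,
    score=lambda ns: sum(weights[n] for n in ns) / len(ns),
    max_iter=10,
    min_nodes=3,
)
print(best)
```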

Parameters:

G (Union[nx.Graph, pd.DataFrame]) – Weighted undirected graph or adjacency matrix DataFrame.
B (pd.DataFrame) – Omics data (n_samples x n_features).
Y (Union[pd.DataFrame, pd.Series]) – Phenotype vector.
k_L (float) – Weight on modularity for Correlated Louvain.
teleport_prob (float) – Teleportation probability for PageRank (alpha).
k_P (float) – Weight on conductance for PageRank sweep cut.
max_iter (int) – Maximum Hybrid iterations.
min_nodes (int) – Stop if graph shrinks below this size.
weight (str) – Edge attribute name for weights.
seed (Optional[int]) – Random seed.

property best_subgraph: Tuple[List[Any], float, int]

Retrieves the nodes and performance metrics of the best subgraph found.

Returns: (nodes, rho, iteration_index).
Return type: Tuple[List[Any], float, int]

property iterations: List[Dict[str, Any]]

Provides access to per-iteration details from the most recent run.

Returns: A list of result dictionaries for each iteration.
Return type: List[Dict[str, Any]]

run(as_dfs: bool = False) → Dict[str, Any] | List[DataFrame]

Execute the Hybrid Louvain-PageRank algorithm.

Returns:

best_nodes: nodes of the highest rho subgraph
best_correlation: float
best_iteration: int
iterations: full per-iteration metadata
all_subgraphs: {iteration_index: [nodes]}

Return type:

Dict

class bioneuralnet.clustering.Louvain(G: Graph, weight: str = 'weight', max_passes: int = 100, min_delta: float = 1e-10, seed: int | None = None)

Bases: object

Standard Louvain community detection.

This class encapsulates the multi-phase optimization algorithm for detecting communities in weighted graphs.

property communities: Dict[int, List[Any]]

Retrieves the communities grouped by community ID.

Convenient for iterating over sets of nodes belonging to the same community.

Returns: A dictionary mapping community IDs to lists of nodes.
Return type: Dict[int, List[Any]]

property history: List[Dict[str, Any]]

Retrieves the history of the algorithm’s execution levels.

Provides insight into the convergence process and reduction of graph size.

Returns: A list of dictionaries containing stats for each level.
Return type: List[Dict[str, Any]]

property modularity: float

Retrieves the final modularity score of the computed partition.

Requires that the run() method has been executed previously.

Returns: The modularity score.
Return type: float
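For reference, the standard weighted modularity of a partition can be computed directly from an adjacency mapping. The sketch below uses the usual Newman formulation, summing internal weight minus expected weight per community. The function name and input format are assumptions, and self-loops are not given any special treatment.

```python
def modularity(adj, partition):
    """Weighted modularity Q of a partition.

    adj is {node: {neighbor: weight}} with every undirected edge
    listed in both directions; partition is {node: community_id}.
    """
    two_m = sum(sum(nbrs.values()) for nbrs in adj.values())
    if two_m == 0:
        return 0.0
    internal = {}  # weight inside each community, counted in both directions
    total = {}     # summed weighted degree of each community
    for u, nbrs in adj.items():
        cu = partition[u]
        total[cu] = total.get(cu, 0.0) + sum(nbrs.values())
        for v, w in nbrs.items():
            if partition[v] == cu:
                internal[cu] = internal.get(cu, 0.0) + w
    return sum(
        internal.get(c, 0.0) / two_m - (total[c] / two_m) ** 2 for c in total
    )


# Two triangles joined by the edge c-d, all weights 1.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
adj = {}
for u, v in edges:
    adj.setdefault(u, {})[v] = 1.0
    adj.setdefault(v, {})[u] = 1.0

partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
q = modularity(adj, partition)
print(q)  # 5/14 for this graph and partition
```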

property partition: Dict[Any, int]

Retrieves the final partition of the graph.

Requires that the run() method has been executed previously.

Returns: A dictionary mapping nodes to community IDs.
Return type: Dict[Any, int]

run() → Dict[Any, int]

Executes the full Louvain algorithm by alternating between local optimization and graph aggregation.

Loops until the modularity converges or the graph cannot be aggregated further.

Returns: A dictionary mapping original node identifiers to their final community IDs.
Return type: Dict[Any, int]
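The aggregation half of that alternation collapses each community into a single super-node. The sketch below shows one common convention, in which weights between communities are summed and internal weight becomes a self-loop counted in both directions, so that every super-node keeps the total weighted degree of its members. The function name and input format are assumptions, and the library's NumPy implementation may differ in detail.

```python
def aggregate(adj, partition):
    """Collapse communities into super-nodes.

    adj is {node: {neighbor: weight}} with undirected edges listed in
    both directions; partition is {node: community_id}.
    Returns {community_id: {community_id: summed_weight}}.
    """
    coarse = {}
    for u, nbrs in adj.items():
        cu = partition[u]
        row = coarse.setdefault(cu, {})
        for v, w in nbrs.items():
            cv = partition[v]
            # Internal edges land on the self-loop entry row[cu].
            row[cv] = row.get(cv, 0.0) + w
    return coarse


# Two triangles joined by the edge c-d, all weights 1.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
adj = {}
for u, v in edges:
    adj.setdefault(u, {})[v] = 1.0
    adj.setdefault(v, {})[u] = 1.0

partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
coarse = aggregate(adj, partition)
print(coarse)
```

Each triangle becomes one super-node with a self-loop of weight 6 (three internal edges counted twice), and the bridge becomes a single edge of weight 1 between the two super-nodes.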

Modules

`correlated_louvain`	Correlated Louvain Community Detection.
`correlated_pagerank`	Correlated PageRank Clustering.
`hybrid_louvain`	Hybrid Louvain-PageRank - Significant Subgraph Detection.
`louvain`	Standard Louvain Method for Community Detection - NumPy Implementation.