bioneuralnet.clustering

Network Clustering and Subgraph Detection.

This module implements hybrid algorithms for identifying phenotype-associated subgraphs in multi-omics networks. It combines global modularity optimization with local random-walk refinement, weighted by phenotypic correlation.

Classes:
HybridLouvain: The primary pipeline. Iteratively alternates between global partitioning (Louvain) and local refinement (PageRank) to find the most significant subgraph associated with a phenotype.

CorrelatedLouvain: Extends standard Louvain by optimizing a hybrid objective: Q_hybrid = k_L * Modularity + (1 - k_L) * Correlation.

CorrelatedPageRank: Performs a biased random walk (PageRank) followed by a sweep cut to minimize a hybrid conductance objective: Phi_hybrid = k_P * Conductance + (1 - k_P) * Correlation.

Louvain: Standard Louvain community detection (based on modularity maximization). Serves as the base class and baseline method.
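Both hybrid objectives listed above are convex combinations of a structural term and a correlation term. The following is a minimal sketch of that arithmetic only; the helper names hybrid_quality and hybrid_conductance are illustrative and not part of the package, and how the correlation term is measured is left to the caller.

```python
def _check_weight(k: float) -> None:
    """Reject mixing weights outside the unit interval."""
    if not 0.0 <= k <= 1.0:
        raise ValueError("mixing weight must lie in [0, 1]")


def hybrid_quality(modularity: float, correlation: float, k_L: float = 0.2) -> float:
    """Q_hybrid = k_L * Modularity + (1 - k_L) * Correlation."""
    _check_weight(k_L)
    return k_L * modularity + (1.0 - k_L) * correlation


def hybrid_conductance(conductance: float, correlation: float, k_P: float = 0.5) -> float:
    """Phi_hybrid = k_P * Conductance + (1 - k_P) * Correlation."""
    _check_weight(k_P)
    return k_P * conductance + (1.0 - k_P) * correlation
```

With k_L = 1 the first objective reduces to plain modularity; smaller values shift weight toward the correlation term.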

Classes

CorrelatedLouvain(G, B, Y[, k_L, weight, ...])

Correlated Louvain community detection.

CorrelatedPageRank(graph, omics_data, ...[, ...])

Correlated PageRank clustering on a multi-omics network.

HybridLouvain(G, B, Y[, k_L, teleport_prob, ...])

Hybrid Louvain-PageRank for significant subgraph detection.

Louvain(G[, weight, max_passes, min_delta, seed])

Standard Louvain community detection.

class bioneuralnet.clustering.CorrelatedLouvain(G: Graph, B: DataFrame, Y: Series | DataFrame, k_L: float = 0.2, weight: str = 'weight', max_passes: int = 50, min_delta: float = 1e-06, seed: int | None = None)[source]

Bases: Louvain

Correlated Louvain community detection.

Inherits from Louvain.

Parameters:
  • G (nx.Graph) – The input graph for community detection.

  • B (pd.DataFrame) – Omics data (n_samples x n_features). Column names must match nodes.

  • Y (Union[pd.Series, pd.DataFrame]) – Phenotype vector aligned with rows of B.

  • k_L (float) – Weight on modularity in combined objective (Eq. 9).

  • weight (str) – Edge attribute name for weights.

  • max_passes (int) – Maximum number of passes for Phase 1 optimization.

  • min_delta (float) – Convergence tolerance for objective gain.

  • seed (Optional[int]) – Random seed for reproducibility.

property communities: Dict[int, List[Any]]

Retrieves the communities grouped by community ID.

Convenient for iterating over sets of nodes belonging to the same community.

Returns:

A dictionary mapping community IDs to lists of nodes.

Return type:

Dict[int, List[Any]]
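The relationship between a node-to-community partition and the grouped view returned here can be sketched as a simple inversion. The function name group_by_community is hypothetical and only illustrates the shape of the two dictionaries.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_community(partition: Dict[Any, int]) -> Dict[int, List[Any]]:
    """Invert {node: community_id} into {community_id: [nodes]}."""
    communities: Dict[int, List[Any]] = defaultdict(list)
    for node, community_id in partition.items():
        communities[community_id].append(node)
    return dict(communities)


# Example: three nodes split across two communities.
grouped = group_by_community({"geneA": 0, "geneB": 0, "geneC": 1})
```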

get_combined_quality() float[source]

Access the calculated combined quality score.

Returns:

The Q* score.

Return type:

float

get_top_communities(n: int = 1) List[Tuple[int, float, List[Any]]][source]

Retrieve the top communities based on absolute correlation.

Parameters:

n (int) – Number of top communities to return.

Returns:

Community data sorted by |rho|.

Return type:

List[Tuple[int, float, List[Any]]]
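Ranking by absolute correlation means a strongly negative community outranks a weakly positive one. A minimal sketch of that ordering follows, assuming each tuple is laid out as (community_id, rho, nodes), which is inferred from the return type; top_by_abs_correlation is an illustrative name.

```python
from typing import Any, List, Tuple

Scored = Tuple[int, float, List[Any]]


def top_by_abs_correlation(scored: List[Scored], n: int = 1) -> List[Scored]:
    """Return the n entries with the largest |rho|, strongest first."""
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)[:n]


candidates = [(0, 0.2, ["a"]), (1, -0.9, ["b", "c"]), (2, 0.5, ["d"])]
top_two = top_by_abs_correlation(candidates, n=2)
```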

property history: List[Dict[str, Any]]

Retrieves the history of the algorithm’s execution levels.

Provides insight into the convergence process and reduction of graph size.

Returns:

A list of dictionaries containing stats for each level.

Return type:

List[Dict[str, Any]]

property modularity: float

Retrieves the final modularity score of the computed partition.

Requires that the run() method has been executed previously.

Returns:

The modularity score.

Return type:

float
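For reference, the standard Newman modularity of a weighted undirected partition can be computed as in the sketch below. It assumes edges are given as a dict {(u, v): weight} with each undirected edge listed once; whether the class uses exactly this normalisation is not stated in this reference, and the function name is illustrative.

```python
from collections import defaultdict
from typing import Any, Dict, Tuple


def newman_modularity(edges: Dict[Tuple[Any, Any], float],
                      partition: Dict[Any, int]) -> float:
    """Q = sum over communities of (intra_weight / m) - (degree_sum / 2m)^2."""
    m = sum(edges.values())  # total edge weight
    if m == 0:
        return 0.0
    intra = defaultdict(float)    # weight of edges inside each community
    degree = defaultdict(float)   # summed weighted degree of each community
    for (u, v), w in edges.items():
        degree[partition[u]] += w
        degree[partition[v]] += w
        if partition[u] == partition[v]:
            intra[partition[u]] += w
    return sum(intra[c] / m - (degree[c] / (2.0 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (2, 3).
edges = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
         (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0, (2, 3): 1.0}
split = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
q = newman_modularity(edges, split)
```

For this toy graph each triangle contributes 3/7 - (7/14)^2, so the total is 5/14.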

property partition: Dict[Any, int]

Retrieves the final partition of the graph.

Requires that the run() method has been executed previously.

Returns:

A dictionary mapping nodes to community IDs.

Return type:

Dict[Any, int]

run() Dict[Any, int][source]

Execute the Correlated Louvain algorithm.

Returns:

Mapping of original nodes to community IDs.

Return type:

Dict[Any, int]

class bioneuralnet.clustering.CorrelatedPageRank(graph: Graph, omics_data: DataFrame, phenotype_data: DataFrame | Series, teleport_prob: float = 0.1, k_P: float = 0.5, max_iter: int = 100, tol: float = 1e-06, min_cluster: int = 2, seed: int | None = None)[source]

Bases: object

Correlated PageRank clustering on a multi-omics network.

Parameters:
  • graph (nx.Graph) – Weighted undirected NetworkX graph.

  • omics_data (pd.DataFrame) – Omics matrix (n_samples x n_features), columns = node ids.

  • phenotype_data (Union[pd.DataFrame, pd.Series]) – Phenotype vector aligned with rows of omics_data.

  • teleport_prob (float) – Teleportation probability (alpha). Default 0.10.

  • k_P (float) – Weight on conductance in combined objective (Eq. 5).

  • max_iter (int) – Max iterations for PageRank power iteration.

  • tol (float) – Convergence tolerance for PageRank.

  • min_cluster (int) – Minimum cluster size for sweep cut consideration.

  • seed (Optional[int]) – Random seed for reproducibility.

generate_weighted_personalization(nodes: List[Any], alpha_max: float | None = None) Dict[Any, float][source]

Build personalization vector based on each node’s correlation contribution.

Parameters:
  • nodes (List[Any]) – Seed node list.

  • alpha_max (Optional[float]) – Maximum teleportation weight.

Returns:

Personalization mapping {node: weight}.

Return type:

Dict[Any, float]

phen_omics_corr(nodes: List[Any]) Tuple[float, float][source]

Compute Pearson(PC1(omics[:, nodes]), phenotype).

Parameters:

nodes (List[Any]) – List of node identifiers.

Returns:

(correlation, p_value). Returns (0.0, 1.0) on failure.

Return type:

Tuple[float, float]
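The quantity Pearson(PC1(omics[:, nodes]), phenotype) can be illustrated without any numerical library. The sketch below is an assumption-laden stand-in, not the package's implementation: it finds the first principal component by power iteration, computes only the correlation (no p-value), and uses illustrative function names. The sign of PC1 is arbitrary, so only the magnitude of the result is meaningful.

```python
import math
from typing import List


def first_pc_scores(rows: List[List[float]], iters: int = 200) -> List[float]:
    """Project mean-centred rows onto their first principal component."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    X = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Non-uniform start vector; power iteration can stall if the start is
    # exactly orthogonal to the leading direction.
    v = [float(j + 1) for j in range(p)]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]
    for _ in range(iters):
        scores = [sum(x[j] * v[j] for j in range(p)) for x in X]
        # w = X^T (X v): one multiplication by the unnormalised covariance.
        w = [sum(X[i][j] * scores[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    return [sum(x[j] * v[j] for j in range(p)) for x in X]


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 0.0 if sa == 0.0 or sb == 0.0 else cov / (sa * sb)


def pc1_correlation(rows: List[List[float]], phenotype: List[float]) -> float:
    return pearson(first_pc_scores(rows), phenotype)


# Two perfectly collinear features that track the phenotype.
omics_rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
rho = pc1_correlation(omics_rows, [1.0, 2.0, 3.0, 4.0])
```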

run(seed_nodes: List[Any]) Dict[str, Any][source]

Execute Correlated PageRank clustering.

Parameters:

seed_nodes (List[Any]) – Nodes to use as the teleport set.

Returns:

Cluster performance and node list.

Return type:

Dict[str, Any]
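The role of the teleport set, teleport_prob, max_iter, and tol can be seen in a plain power iteration for personalized PageRank. This is a sketch under stated assumptions rather than the class's code: edges are a dict {(u, v): weight} with positive weights, teleport mass is spread uniformly over the seeds (the class instead weights seeds by correlation contribution), and the function name is illustrative.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Tuple


def personalized_pagerank(edges: Dict[Tuple[Any, Any], float],
                          seeds: Iterable[Any],
                          teleport_prob: float = 0.1,
                          max_iter: int = 100,
                          tol: float = 1e-6) -> Dict[Any, float]:
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    nodes = list(adj)
    degree = {n: sum(adj[n].values()) for n in nodes}
    seed_set = set(seeds) & set(nodes)
    if not seed_set:
        raise ValueError("at least one seed must be a node of the graph")
    teleport = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    scores = dict(teleport)
    for _ in range(max_iter):
        # Teleport step, then spread the remaining mass along weighted edges.
        nxt = {n: teleport_prob * teleport[n] for n in nodes}
        for u in nodes:
            share = (1.0 - teleport_prob) * scores[u] / degree[u]
            for v, w in adj[u].items():
                nxt[v] += share * w
        delta = sum(abs(nxt[n] - scores[n]) for n in nodes)
        scores = nxt
        if delta < tol:  # L1 change below tolerance: converged
            break
    return scores


# Two triangles joined by a bridge; the walk is seeded at node 0.
edges = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
         (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0, (2, 3): 1.0}
pr = personalized_pagerank(edges, seeds=[0])
```

The scores form a probability distribution, and nodes near the seed retain more mass than their mirror images on the far side of the bridge.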

sweep_cut(pr_scores: Dict[Any, float]) Dict[str, Any][source]

Identify the best cluster via sweep cut on PageRank scores.

Parameters:

pr_scores (Dict[Any, float]) – Mapping of nodes to PageRank scores.

Returns:

Best cluster details including nodes, conductance, and composite score.

Return type:

Dict[str, Any]
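A sweep cut orders nodes by score and evaluates each prefix of that ordering as a candidate cluster. The sketch below minimizes plain conductance only; the class's hybrid objective additionally blends in a correlation term. It assumes edges as {(u, v): weight} with each undirected edge listed once, no self-loops, every scored node present in the graph, and the common degree-normalised ordering (score / degree), which may differ from the ordering the class uses. The function name is illustrative.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def conductance_sweep_cut(edges: Dict[Tuple[Any, Any], float],
                          scores: Dict[Any, float],
                          min_cluster: int = 2) -> Tuple[List[Any], float]:
    adj = defaultdict(dict)
    degree = defaultdict(float)
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w
        degree[u] += w
        degree[v] += w
    total_volume = sum(degree.values())
    order = sorted(scores, key=lambda n: scores[n] / degree[n], reverse=True)

    members = set()
    volume = 0.0   # summed degree of the current prefix
    cut = 0.0      # weight of edges leaving the current prefix
    best_nodes: List[Any] = []
    best_phi = float("inf")
    # Skip the final prefix: the full ordering may leave an empty complement.
    for size, node in enumerate(order[:-1], start=1):
        for nbr, w in adj[node].items():
            # Edges to members stop crossing the cut; others start crossing.
            cut += -w if nbr in members else w
        members.add(node)
        volume += degree[node]
        phi = cut / min(volume, total_volume - volume)
        if size >= min_cluster and phi < best_phi:
            best_nodes, best_phi = order[:size], phi
    return best_nodes, best_phi


edges = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
         (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0, (2, 3): 1.0}
scores = {0: 0.30, 1: 0.30, 2: 0.30, 3: 0.06, 4: 0.02, 5: 0.02}
cluster, phi = conductance_sweep_cut(edges, scores)
```

On this toy graph the best prefix is the first triangle, which is separated from the rest by the single bridge edge.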

class bioneuralnet.clustering.HybridLouvain(G: Graph | DataFrame, B: DataFrame, Y: DataFrame | Series, k_L: float = 0.8, teleport_prob: float = 0.05, k_P: float = 0.7, max_iter: int = 10, min_nodes: int = 3, weight: str = 'weight', seed: int | None = None)[source]

Bases: object

Hybrid Louvain-PageRank for significant subgraph detection.

Iteratively refines a multi-omics network by alternating:

  1. Correlated Louvain to find the most phenotype-associated community

  2. Correlated PageRank to refine that community via sweep cut

The graph shrinks each iteration. The best subgraph by rho is tracked across all iterations and returned.
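The alternation described above can be sketched as a loop over caller-supplied steps. Everything in this block is hypothetical scaffolding: the three callables stand in for the Correlated Louvain step, the Correlated PageRank step, and the correlation score; the stopping rule on non-shrinking graphs and the use of |rho| to pick the best subgraph are assumptions, not documented behaviour.

```python
from typing import Any, Callable, Dict, List, Tuple


def hybrid_refine(nodes: List[Any],
                  find_community: Callable[[List[Any]], List[Any]],
                  refine_community: Callable[[List[Any], List[Any]], List[Any]],
                  correlation: Callable[[List[Any]], float],
                  max_iter: int = 10,
                  min_nodes: int = 3) -> Tuple[Tuple[List[Any], float, int], List[Dict[str, Any]]]:
    current = list(nodes)
    best: Tuple[List[Any], float, int] = ([], 0.0, -1)
    iterations: List[Dict[str, Any]] = []
    for it in range(max_iter):
        if len(current) < min_nodes:
            break  # graph too small to continue
        community = find_community(current)             # global step
        refined = refine_community(current, community)  # local step
        if not refined or len(refined) >= len(current):
            break  # assumed stopping rule: no shrinkage, nothing to refine
        rho = correlation(refined)
        iterations.append({"iteration": it, "nodes": refined, "rho": rho})
        if abs(rho) > abs(best[1]):
            best = (refined, rho, it)
        current = refined  # next iteration works on the smaller node set
    return best, iterations


# Toy stand-ins: keep the first half of the nodes and score by set size.
toy_rho = {4: 0.3, 2: -0.7}
best, log = hybrid_refine(
    list(range(8)),
    find_community=lambda cur: cur[: len(cur) // 2 + 1],
    refine_community=lambda cur, com: com[:-1],
    correlation=lambda ns: toy_rho[len(ns)],
)
```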

Parameters:
  • G (Union[nx.Graph, pd.DataFrame]) – Weighted undirected graph or adjacency matrix DataFrame.

  • B (pd.DataFrame) – Omics data (n_samples x n_features).

  • Y (Union[pd.DataFrame, pd.Series]) – Phenotype vector.

  • k_L (float) – Weight on modularity for Correlated Louvain.

  • teleport_prob (float) – Teleportation probability for PageRank (alpha).

  • k_P (float) – Weight on conductance for PageRank sweep cut.

  • max_iter (int) – Maximum Hybrid iterations.

  • min_nodes (int) – Stop if graph shrinks below this size.

  • weight (str) – Edge attribute name for weights.

  • seed (Optional[int]) – Random seed.

property best_subgraph: Tuple[List[Any], float, int]

Retrieves the nodes and performance metrics of the best subgraph found.

Returns:

(nodes, rho, iteration_index).

Return type:

Tuple[List[Any], float, int]

property iterations: List[Dict[str, Any]]

Provides access to per-iteration details from the most recent run.

Returns:

A list of result dictionaries for each iteration.

Return type:

List[Dict[str, Any]]

run(as_dfs: bool = False) Dict[str, Any] | List[DataFrame][source]

Execute the Hybrid Louvain-PageRank algorithm.

Returns:

  • best_nodes: nodes of the highest rho subgraph

  • best_correlation: float

  • best_iteration: int

  • iterations: full per-iteration metadata

  • all_subgraphs: {iteration_index: [nodes]}

Return type:

Dict[str, Any] | List[DataFrame]

class bioneuralnet.clustering.Louvain(G: Graph, weight: str = 'weight', max_passes: int = 100, min_delta: float = 1e-10, seed: int | None = None)[source]

Bases: object

Standard Louvain community detection.

This class encapsulates the multi-phase optimization algorithm for detecting communities in weighted graphs.

property communities: Dict[int, List[Any]]

Retrieves the communities grouped by community ID.

Convenient for iterating over sets of nodes belonging to the same community.

Returns:

A dictionary mapping community IDs to lists of nodes.

Return type:

Dict[int, List[Any]]

property history: List[Dict[str, Any]]

Retrieves the history of the algorithm’s execution levels.

Provides insight into the convergence process and reduction of graph size.

Returns:

A list of dictionaries containing stats for each level.

Return type:

List[Dict[str, Any]]

property modularity: float

Retrieves the final modularity score of the computed partition.

Requires that the run() method has been executed previously.

Returns:

The modularity score.

Return type:

float

property partition: Dict[Any, int]

Retrieves the final partition of the graph.

Requires that the run() method has been executed previously.

Returns:

A dictionary mapping nodes to community IDs.

Return type:

Dict[Any, int]

run() Dict[Any, int][source]

Executes the full Louvain algorithm by alternating between local optimization and graph aggregation.

Loops until the modularity converges or the graph cannot be aggregated further.

Returns:

A dictionary mapping original node identifiers to their final community IDs.

Return type:

Dict[Any, int]
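The aggregation half of the loop collapses each community into a single super-node and sums edge weights between communities. A minimal sketch follows, assuming edges as {(u, v): weight} with each undirected edge listed once and integer community IDs; the function name is illustrative, and the local-optimization half is not shown.

```python
from collections import defaultdict
from typing import Any, Dict, Tuple


def aggregate_graph(edges: Dict[Tuple[Any, Any], float],
                    partition: Dict[Any, int]) -> Dict[Tuple[int, int], float]:
    """Build the community-level graph; intra-community weight becomes a self-loop."""
    aggregated: Dict[Tuple[int, int], float] = defaultdict(float)
    for (u, v), w in edges.items():
        cu, cv = partition[u], partition[v]
        key = (cu, cv) if cu <= cv else (cv, cu)  # canonical undirected key
        aggregated[key] += w
    return dict(aggregated)


edges = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
         (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0, (2, 3): 1.0}
partition = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
coarse = aggregate_graph(edges, partition)
```

Each triangle becomes one super-node carrying a self-loop of weight 3, and the bridge survives as a single edge of weight 1 between them.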

Modules

correlated_louvain

Correlated Louvain Community Detection.

correlated_pagerank

Correlated PageRank Clustering.

hybrid_louvain

Hybrid Louvain-PageRank - Significant Subgraph Detection.

louvain

Standard Louvain Method for Community Detection - NumPy Implementation.