bioneuralnet.clustering.correlated_pagerank
Correlated PageRank Clustering.
This module implements a personalized PageRank algorithm combined with a phenotype-aware sweep cut to detect significant subgraphs.
References
Abdel-Hafiz et al. (2022), “Significant Subgraph Detection in Multi-omics Networks for Disease Pathway Identification,” Frontiers in Big Data.
- Algorithm:
The PageRank vector is computed as the stationary distribution of:
\[pr_{\alpha}(s) = \alpha s + (1 - \alpha)\, pr_{\alpha}(s)\, W\]
- Where:
\(\alpha\): Teleportation (restart) probability.
\(s\): Personalization vector (seed weights).
\(W\): Transition matrix.
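The recurrence above can be iterated directly. The following is a minimal sketch in plain Python, not the module's implementation: the function name `personalized_pagerank`, the dense list-of-rows matrix, and the three-node path graph are all illustrative assumptions.

```python
def personalized_pagerank(W, s, alpha=0.1, max_iter=500, tol=1e-6):
    """Iterate pr <- alpha * s + (1 - alpha) * pr W until the L1 change is below tol.

    W is a row-stochastic transition matrix (list of rows), s is a seed
    distribution summing to 1, and alpha is the teleportation probability.
    """
    n = len(s)
    pr = list(s)  # start from the seed distribution
    for _ in range(max_iter):
        # Row vector times matrix: (pr W)[j] = sum_i pr[i] * W[i][j]
        walk = [sum(pr[i] * W[i][j] for i in range(n)) for j in range(n)]
        new_pr = [alpha * s[j] + (1 - alpha) * walk[j] for j in range(n)]
        delta = sum(abs(a - b) for a, b in zip(new_pr, pr))
        pr = new_pr
        if delta < tol:
            break
    return pr


# Toy path graph 0 - 1 - 2 with unit weights; each row is divided by the node degree.
W = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
s = [1.0, 0.0, 0.0]  # all teleportation mass on node 0
pr = personalized_pagerank(W, s, alpha=0.1)
```

Because \(W\) is row-stochastic and \(s\) sums to 1, every iterate keeps total mass 1, and the seed node retains more mass than its mirror image at the other end of the path.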
Important
The networkx.pagerank implementation uses an alpha parameter representing the damping factor (link-following probability). Therefore, \(\text{nx\_alpha} = 1 - \alpha_{\text{theoretical}}\).
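The conversion can be made explicit in code. This is a hedged sketch: `networkx_damping` and `seeded_pagerank` are hypothetical helpers written for illustration and are not part of bioneuralnet. The only library call used is `networkx.pagerank` with its documented `alpha`, `personalization`, `max_iter`, `tol`, and `weight` arguments.

```python
def networkx_damping(teleport_prob):
    """Convert a teleportation probability into networkx's damping factor."""
    if not 0.0 < teleport_prob < 1.0:
        raise ValueError("teleport_prob must lie strictly between 0 and 1")
    return 1.0 - teleport_prob


def seeded_pagerank(graph, personalization, teleport_prob=0.1, max_iter=100, tol=1e-6):
    """Personalized PageRank via networkx, expressed in the teleport convention."""
    # Imported lazily so that networkx_damping stays usable without networkx.
    import networkx as nx

    return nx.pagerank(
        graph,
        alpha=networkx_damping(teleport_prob),  # damping = 1 - teleport
        personalization=personalization,
        max_iter=max_iter,
        tol=tol,
        weight="weight",
    )
```

With the default `teleport_prob` of 0.1, the value passed to networkx as `alpha` is 0.9.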
Notes
Sweep Cut Optimization
Nodes are sorted by PageRank-per-degree in descending order. For each prefix set \(S_i\), the algorithm minimizes the Hybrid Conductance, a combined objective in which \(k_P\) weights the conductance term against the correlation term.
- Where:
\(\Phi\): Standard conductance (\(cut / \min(vol(S), vol(V \setminus S))\)).
\(\rho\): Negative absolute Pearson correlation (\(-|\rho|\)).
\(k_P\): Trade-off weight (Default: 0.5).
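The sweep can be sketched in plain Python as follows. Two things here are assumptions rather than facts from this page: the combined score is taken to be the convex combination \(k_P \cdot \Phi(S) + (1 - k_P) \cdot \rho(S)\) (the exact objective is Eq. 5 of the cited paper), and `corr_fn` is a hypothetical callable that returns the phenotype correlation of a node set. The graph, scores, and `toy_corr` are made-up illustration data.

```python
def conductance(adj, subset):
    """cut(S) / min(vol(S), vol(V minus S)) for a weighted undirected graph."""
    inside = set(subset)
    cut = sum(w for u in inside for v, w in adj[u].items() if v not in inside)
    vol_in = sum(sum(adj[u].values()) for u in inside)
    vol_out = sum(sum(adj[u].values()) for u in adj if u not in inside)
    denom = min(vol_in, vol_out)
    return cut / denom if denom > 0 else float("inf")


def sweep_cut(adj, pr_scores, corr_fn, k_P=0.5, min_cluster=2):
    """Return (best_nodes, best_score) over prefixes of the PageRank ordering."""
    degree = {u: sum(adj[u].values()) for u in adj}
    # Sort by PageRank per weighted degree, highest first.
    order = sorted(
        (u for u in pr_scores if degree[u] > 0),
        key=lambda u: pr_scores[u] / degree[u],
        reverse=True,
    )
    best_nodes, best_score = None, float("inf")
    # Prefix sizes run from min_cluster up to (but excluding) the whole graph.
    for size in range(min_cluster, len(order)):
        prefix = order[:size]
        phi = conductance(adj, prefix)
        rho = -abs(corr_fn(prefix))  # negative absolute correlation
        score = k_P * phi + (1 - k_P) * rho  # ASSUMED convex combination
        if score < best_score:
            best_nodes, best_score = prefix, score
    return best_nodes, best_score


# Two triangles (a, b, c) and (d, e, f) joined by the bridge c - d.
adj = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "d": 1.0},
    "d": {"c": 1.0, "e": 1.0, "f": 1.0},
    "e": {"d": 1.0, "f": 1.0},
    "f": {"d": 1.0, "e": 1.0},
}
pr_scores = {"a": 0.30, "b": 0.25, "c": 0.27, "d": 0.09, "e": 0.05, "f": 0.04}


def toy_corr(nodes):
    # Hypothetical stand-in for the phenotype correlation of a node set.
    return 0.8 if set(nodes) <= {"a", "b", "c"} else 0.3


best_nodes, best_score = sweep_cut(adj, pr_scores, toy_corr)
```

In this toy graph the prefix `a, b, c` has both the lowest conductance (one cut edge over volume 7) and the highest correlation, so it is selected.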
Personalization Vector (Seed Weighting)
Teleportation probabilities for seeds are weighted by their marginal contribution \(\rho_i\) to the correlation, where \(\rho_i = |\rho(S)| - |\rho(S \setminus \{i\})|\). Values where \(\rho_i < 0\) are clamped to 0.
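The leave-one-out contribution and clamping can be sketched as below. This is an illustration under stated assumptions: `seed_weights` and `corr_fn` are hypothetical names, the weights are normalized to sum to 1, and a uniform fallback is used when every contribution is clamped to 0. Any scaling by the `alpha_max` argument of `generate_weighted_personalization` is not modeled here.

```python
def seed_weights(seeds, corr_fn):
    """Weight each seed by rho_i = |rho(S)| - |rho(S without i)|, clamped at 0."""
    full = abs(corr_fn(seeds))
    contrib = {}
    for node in seeds:
        rest = [n for n in seeds if n != node]
        rho_i = full - abs(corr_fn(rest))
        contrib[node] = max(rho_i, 0.0)  # negative contributions are clamped to 0
    total = sum(contrib.values())
    if total == 0:
        # ASSUMPTION: fall back to uniform weights when no seed contributes.
        return {node: 1.0 / len(seeds) for node in seeds}
    # ASSUMPTION: normalize so the weights form a probability distribution.
    return {node: c / total for node, c in contrib.items()}


# Hypothetical correlations for each seed subset, for illustration only.
table = {
    frozenset({"g1", "g2", "g3"}): 0.6,
    frozenset({"g2", "g3"}): 0.4,   # without g1 the correlation drops by 0.2
    frozenset({"g1", "g3"}): 0.5,   # without g2 it drops by 0.1
    frozenset({"g1", "g2"}): 0.7,   # without g3 it rises, so rho_3 < 0
}


def toy_corr(nodes):
    return table[frozenset(nodes)]


weights = seed_weights(["g1", "g2", "g3"], toy_corr)
```

Here `g3` lowers the correlation of the seed set, so its contribution is negative and it receives zero teleportation weight, while `g1` receives twice the weight of `g2`.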
Functions
Retrieves a global logger configured to write to 'bioneuralnet.log'.
Pearson correlation coefficient and p-value for testing non-correlation.
Classes
Correlated PageRank clustering on a multi-omics network.
Principal component analysis (PCA).
Standardize features by removing the mean and scaling to unit variance.
- class bioneuralnet.clustering.correlated_pagerank.CorrelatedPageRank(graph: Graph, omics_data: DataFrame, phenotype_data: DataFrame | Series, teleport_prob: float = 0.1, k_P: float = 0.5, max_iter: int = 100, tol: float = 1e-06, min_cluster: int = 2, seed: int | None = None)
Bases: object
Correlated PageRank clustering on a multi-omics network.
- Parameters:
graph (nx.Graph) – Weighted undirected NetworkX graph.
omics_data (pd.DataFrame) – Omics matrix (n_samples x n_features), columns = node ids.
phenotype_data (Union[pd.DataFrame, pd.Series]) – Phenotype vector aligned with rows of omics_data.
teleport_prob (float) – Teleportation probability (alpha). Default 0.10.
k_P (float) – Weight on conductance in the combined objective (Eq. 5). Default 0.5.
max_iter (int) – Max iterations for PageRank power iteration.
tol (float) – Convergence tolerance for PageRank.
min_cluster (int) – Minimum cluster size for sweep cut consideration.
seed (Optional[int]) – Random seed for reproducibility.
- generate_weighted_personalization(nodes: List[Any], alpha_max: float | None = None) → Dict[Any, float]
Build personalization vector based on each node’s correlation contribution.
- phen_omics_corr(nodes: List[Any]) → Tuple[float, float]
Compute Pearson(PC1(omics[:, nodes]), phenotype).
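To make the quantity concrete, here is a dependency-free sketch of "Pearson of PC1 against the phenotype". It is a stand-in, not the method's code: the helper names, the power-iteration PCA, and the toy data are assumptions, and only the correlation coefficient is computed (the method's return type is a two-element tuple). Columns are assumed to be non-constant so that standardization is defined.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def first_pc_scores(rows, n_iter=200):
    """Project standardized samples onto the leading principal axis."""
    n, p = len(rows), len(rows[0])
    # Standardize each column to zero mean and unit (population) variance.
    cols = []
    for j in range(p):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        cols.append([(v - mean) / std for v in col])
    # Covariance matrix of the standardized columns.
    cov = [
        [sum(cols[a][i] * cols[b][i] for i in range(n)) / n for b in range(p)]
        for a in range(p)
    ]
    # Power iteration for the leading eigenvector; the uneven start vector
    # avoids the obvious symmetric degenerate cases.
    v = [1.0 + j for j in range(p)]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return [sum(cols[j][i] * v[j] for j in range(p)) for i in range(n)]


# Toy data: 5 samples, 2 features that both track the phenotype.
omics = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 8.1], [5.0, 9.8]]
phenotype = [0.9, 2.1, 2.9, 4.2, 5.1]

r = pearson(first_pc_scores(omics), phenotype)
```

The sign of a principal component is arbitrary, so only the magnitude of `r` is meaningful, which is consistent with the use of \(|\rho|\) elsewhere on this page.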
- run(seed_nodes: List[Any]) → Dict[str, Any]
Execute Correlated PageRank clustering.
- Parameters:
seed_nodes (List[Any]) – Nodes to use as the teleport set.
- Returns:
Cluster performance and node list.
- Return type:
Dict
- sweep_cut(pr_scores: Dict[Any, float]) → Dict[str, Any]
Identify the best cluster via sweep cut on PageRank scores.
- Parameters:
pr_scores (Dict[Any, float]) – Mapping of nodes to PageRank scores.
- Returns:
Best cluster details including nodes, conductance, and composite score.
- Return type:
Dict