bioneuralnet.clustering.correlated_pagerank

Correlated PageRank Clustering.

This module implements a personalized PageRank algorithm combined with a phenotype-aware sweep cut to detect significant subgraphs.

References

Abdel-Hafiz et al. (2022), “Significant Subgraph Detection in Multi-omics Networks for Disease Pathway Identification,” Frontiers in Big Data.

Algorithm:

The PageRank vector is computed as the stationary distribution of:

\[\begin{split}pr_{\alpha}(s) = \alpha s + (1 - \alpha) \, pr_{\alpha}(s) \, W\end{split}\]

Where:

\(\alpha\): Teleportation (restart) probability.
\(s\): Personalization vector (seed weights).
\(W\): Transition matrix.

Important

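The networkx.pagerank implementation uses an alpha parameter that represents the damping factor (the link-following probability), not the teleportation probability. Therefore, \(\alpha_{\text{nx}} = 1 - \alpha_{\text{theoretical}}\). For example, a teleportation probability of 0.1 corresponds to a networkx alpha of 0.9.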
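To make the recurrence concrete, here is a minimal pure-Python sketch of the power iteration. It is an illustration under stated assumptions, not the module's implementation: the function name `personalized_pagerank` and the adjacency-dict graph format are hypothetical, and the graph is assumed to be undirected with no isolated nodes.

```python
def personalized_pagerank(adj, seeds, teleport_prob=0.1, max_iter=100, tol=1e-6):
    """Iterate pr = alpha * s + (1 - alpha) * pr W until the L1 change is below tol.

    adj:   {node: {neighbor: weight}}, symmetric, every node has at least one edge.
    seeds: {node: weight}; normalized here so the personalization vector sums to 1.
    """
    nodes = list(adj)
    total = sum(seeds.values())
    s = {v: seeds.get(v, 0.0) / total for v in nodes}
    degree = {v: sum(adj[v].values()) for v in nodes}

    pr = dict(s)  # start from the personalization vector
    for _ in range(max_iter):
        # Teleportation term: alpha * s
        nxt = {v: teleport_prob * s[v] for v in nodes}
        # Random-walk term: (1 - alpha) * pr W, with W[u][v] = w(u, v) / degree(u)
        for u in nodes:
            for v, w in adj[u].items():
                nxt[v] += (1.0 - teleport_prob) * pr[u] * w / degree[u]
        delta = sum(abs(nxt[v] - pr[v]) for v in nodes)
        pr = nxt
        if delta < tol:
            break
    return pr


# With networkx installed, the corresponding call would use the converted alpha:
# nx.pagerank(G, alpha=1 - teleport_prob, personalization=seeds, weight="weight")
```

Because \(W\) is row-stochastic and \(s\) sums to 1, every iterate also sums to 1, which is a convenient sanity check on any implementation.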

Notes

Sweep Cut Optimization

Nodes are sorted by PageRank-per-degree in descending order. For each prefix set \(S_i\) of this ordering, the algorithm evaluates the Hybrid Conductance and keeps the prefix that minimizes it:

\[\begin{split}\Phi_{hybrid} = k_P \Phi + (1 - k_P) \rho\end{split}\]

Where:

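\(\Phi\): Standard conductance (\(\mathrm{cut}(S) / \min(\mathrm{vol}(S), \mathrm{vol}(V \setminus S))\)).
\(\rho\): Negative absolute Pearson correlation between the subgraph's omics data and the phenotype; it enters the objective as \(-|\rho|\), so stronger correlation lowers the score.
\(k_P\): Trade-off weight (Default: 0.5).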
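The sweep described above can be sketched in pure Python as follows. This is a hedged illustration rather than the library's code: the function name, the adjacency-dict graph format, the returned keys, and the `abs_corr` callable (standing in for the phenotype correlation of a node set) are all assumptions made for the example, and the graph is assumed to have no self-loops.

```python
def sweep_cut_sketch(adj, pr_scores, abs_corr, k_P=0.5, min_cluster=2):
    """Return the prefix set minimizing k_P * Phi + (1 - k_P) * (-|rho|).

    adj:      {node: {neighbor: weight}}, symmetric, no self-loops.
    abs_corr: hypothetical callable mapping a list of nodes to |rho| in [0, 1].
    """
    degree = {v: sum(nbrs.values()) for v, nbrs in adj.items()}
    total_vol = sum(degree.values())

    # Sort by PageRank-per-degree, descending; skip nodes without edges.
    order = sorted(
        (v for v in pr_scores if degree.get(v, 0) > 0),
        key=lambda v: pr_scores[v] / degree[v],
        reverse=True,
    )

    in_set, vol, cut = set(), 0.0, 0.0
    best = None
    for i, v in enumerate(order, start=1):
        # Adding v: edges into the current set stop being cut, the rest become cut.
        inside = sum(w for u, w in adj[v].items() if u in in_set)
        cut += degree[v] - 2.0 * inside
        vol += degree[v]
        in_set.add(v)

        if i < min_cluster:
            continue
        denom = min(vol, total_vol - vol)
        if denom <= 0:  # the prefix covers the whole graph
            continue

        phi = cut / denom
        score = k_P * phi + (1.0 - k_P) * (-abs_corr(order[:i]))
        if best is None or score < best["score"]:
            best = {"nodes": order[:i], "conductance": phi, "score": score}
    return best
```

Updating `cut` and `vol` incrementally keeps the whole sweep linear in the number of edges, apart from the cost of the correlation callable, which is evaluated once per candidate prefix.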

Personalization Vector (Seed Weighting)

Teleportation probabilities for seeds are weighted by their marginal contribution to correlation:

\[\begin{split}\alpha_i = \frac{\rho_i}{\max(\rho_{seeds})} \cdot \alpha_{max}\end{split}\]

Where \(\rho_i = |\rho(S)| - |\rho(S \setminus \{i\})|\). Values where \(\rho_i < 0\) are clamped to 0.
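A small sketch of this leave-one-out weighting is shown below. The function name and the `abs_corr` callable are hypothetical, and two behaviors are assumptions of the example rather than documented behavior of the module: a single seed is scored against an empty set with \(|\rho| = 0\), and uniform weights are used when no seed has a positive contribution.

```python
def weighted_personalization_sketch(seeds, abs_corr, alpha_max=0.1):
    """alpha_i = rho_i / max(rho_seeds) * alpha_max, with negative rho_i clamped to 0.

    abs_corr: hypothetical callable mapping a list of nodes to |rho|.
    """
    full = abs_corr(seeds)
    contrib = {}
    for node in seeds:
        rest = [v for v in seeds if v != node]
        # Assumption: |rho| of the empty set is treated as 0.
        without = abs_corr(rest) if rest else 0.0
        # Marginal contribution rho_i, clamped at 0.
        contrib[node] = max(full - without, 0.0)

    top = max(contrib.values(), default=0.0)
    if top <= 0.0:
        # Assumption: fall back to uniform weights when no seed adds correlation.
        return {node: alpha_max for node in seeds}
    return {node: alpha_max * c / top for node, c in contrib.items()}
```

With this scaling the most informative seed receives exactly `alpha_max`, and a seed whose removal increases the correlation receives weight 0.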

Functions

`get_logger`(name)	Retrieves a global logger configured to write to 'bioneuralnet.log'.
`pearsonr`(x, y, *[, alternative, method, axis])	Pearson correlation coefficient and p-value for testing non-correlation.

Classes

`CorrelatedPageRank`(graph, omics_data, ...[, ...])	Correlated PageRank clustering on a multi-omics network.
`PCA`([n_components, copy, whiten, ...])	Principal component analysis (PCA).
`StandardScaler`(*[, copy, with_mean, with_std])	Standardize features by removing the mean and scaling to unit variance.

class bioneuralnet.clustering.correlated_pagerank.CorrelatedPageRank(graph: Graph, omics_data: DataFrame, phenotype_data: DataFrame | Series, teleport_prob: float = 0.1, k_P: float = 0.5, max_iter: int = 100, tol: float = 1e-06, min_cluster: int = 2, seed: int | None = None)

Bases: object

Correlated PageRank clustering on a multi-omics network.

Parameters:

graph (nx.Graph) – Weighted undirected NetworkX graph.
omics_data (pd.DataFrame) – Omics matrix (n_samples x n_features), columns = node ids.
phenotype_data (Union[pd.DataFrame, pd.Series]) – Phenotype vector aligned with rows of omics_data.
teleport_prob (float) – Teleportation probability (alpha). Default 0.10.
k_P (float) – Weight on conductance in the combined objective (Eq. 5); the correlation term receives weight 1 - k_P. Default 0.5.
max_iter (int) – Max iterations for PageRank power iteration.
tol (float) – Convergence tolerance for PageRank.
min_cluster (int) – Minimum cluster size for sweep cut consideration.
seed (Optional[int]) – Random seed for reproducibility.

generate_weighted_personalization(nodes: List[Any], alpha_max: float | None = None) → Dict[Any, float]

Build personalization vector based on each node’s correlation contribution.

Parameters:

nodes (List[Any]) – Seed node list.
alpha_max (Optional[float]) – Maximum teleportation weight.

Returns:

Personalization mapping {node: weight}.

Return type:

Dict[Any, float]

phen_omics_corr(nodes: List[Any]) → Tuple[float, float]

Compute Pearson(PC1(omics[:, nodes]), phenotype).

Parameters:

nodes (List[Any]) – List of node identifiers.

Returns:

(correlation, p_value). Returns (0.0, 1.0) on failure.

Return type:

Tuple[float, float]
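The quantity Pearson(PC1(omics[:, nodes]), phenotype) can be illustrated without any dependencies. The module lists `PCA`, `StandardScaler` and `pearsonr` among its names, which suggests the method is built on them; the sketch below swaps in a plain power iteration for the first principal component and returns only the correlation, not a p-value. All function names here are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def standardize(col):
    """Center a feature column and scale it to unit variance."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / sd if sd > 0 else 0.0 for v in col]


def first_pc_scores(columns, n_iter=200):
    """Project samples onto the leading principal axis of the standardized columns."""
    z = [standardize(c) for c in columns]
    p, n = len(z), len(z[0])
    cov = [[sum(z[i][k] * z[j][k] for k in range(n)) / n for j in range(p)]
           for i in range(p)]
    # Power iteration from a non-uniform start; a start vector orthogonal to the
    # leading axis would not converge, which this simple sketch does not guard against.
    vec = [float(i + 1) for i in range(p)]
    for _ in range(n_iter):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break
        vec = [v / norm for v in nxt]
    return [sum(vec[i] * z[i][k] for i in range(p)) for k in range(n)]


def pc1_phenotype_corr(columns, phenotype):
    """Correlation between PC1 of the selected feature columns and the phenotype."""
    return pearson(first_pc_scores(columns), phenotype)
```

The sign of a principal component is arbitrary, which is why the clustering objective works with the absolute value of this correlation.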

run(seed_nodes: List[Any]) → Dict[str, Any]

Execute Correlated PageRank clustering.

Parameters:

seed_nodes (List[Any]) – Nodes to use as the teleport set.

Returns:

Cluster performance and node list.

Return type:

Dict

sweep_cut(pr_scores: Dict[Any, float]) → Dict[str, Any]

Identify the best cluster via sweep cut on PageRank scores.

Parameters:

pr_scores (Dict[Any, float]) – Mapping of nodes to PageRank scores.

Returns:

Best cluster details including nodes, conductance, and composite score.

Return type:

Dict