bioneuralnet.clustering.correlated_pagerank

Correlated PageRank Clustering.

This module implements a personalized PageRank algorithm combined with a phenotype-aware sweep cut to detect significant subgraphs.

References

Abdel-Hafiz et al. (2022), “Significant Subgraph Detection in Multi-omics Networks for Disease Pathway Identification,” Frontiers in Big Data.

Algorithm:

The personalized PageRank vector \(pr_{\alpha}(s)\) is the solution (fixed point) of:

\[\begin{split}pr_{\alpha}(s) = \alpha s + (1 - \alpha) pr_{\alpha}(s) W\end{split}\]
Where:
  • \(\alpha\): Teleportation (restart) probability.

  • \(s\): Personalization vector (seed weights).

  • \(W\): Transition matrix.
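The recursion above can be iterated directly. The following is a minimal pure-Python sketch, not the module's implementation: the function name `personalized_pagerank`, the edge-list input, and the choice to start from \(s\) are illustrative assumptions, and \(W\) is assumed to be the row-stochastic matrix obtained by dividing each edge weight by the node's total weight.

```python
from typing import Dict, Hashable, Iterable, Tuple


def personalized_pagerank(
    edges: Iterable[Tuple[Hashable, Hashable, float]],
    seeds: Dict[Hashable, float],
    alpha: float = 0.1,
    max_iter: int = 100,
    tol: float = 1e-6,
) -> Dict[Hashable, float]:
    """Iterate pr <- alpha * s + (1 - alpha) * pr W on an undirected graph."""
    # Symmetric weighted adjacency: adj[u][v] is the weight of edge (u, v).
    adj: Dict[Hashable, Dict[Hashable, float]] = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w

    # Normalize the seed weights into the personalization vector s.
    # Assumption: every seed is a node of the graph.
    total = sum(seeds.values())
    s = {n: seeds.get(n, 0.0) / total for n in adj}

    pr = dict(s)  # start the iteration from s
    for _ in range(max_iter):
        nxt = {n: alpha * s[n] for n in adj}
        for u, nbrs in adj.items():
            out_weight = sum(nbrs.values())
            for v, w in nbrs.items():
                # W[u][v] = w / out_weight (row-stochastic transition matrix)
                nxt[v] += (1.0 - alpha) * pr[u] * w / out_weight
        delta = sum(abs(nxt[n] - pr[n]) for n in adj)
        pr = nxt
        if delta < tol:  # stop once successive iterates agree in L1 norm
            break
    return pr


edges = [("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 1.0), ("c", "d", 1.0)]
scores = personalized_pagerank(edges, seeds={"a": 1.0}, alpha=0.1, max_iter=500)
```

Each step preserves the total mass, so `scores` sums to 1. With a small \(\alpha\) the iteration contracts slowly (by at most a factor of \(1 - \alpha\) per step), which is why the example raises `max_iter`.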

Important

The networkx.pagerank implementation uses an alpha parameter that represents the damping factor (link-following probability), not the teleportation probability. Therefore, \(\text{nx_alpha} = 1 - \alpha_{\text{theoretical}}\).
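A short sketch of the conversion, assuming a toy graph and a single seed chosen only for illustration. The variable names `teleport_prob` and `nx_alpha` follow the text above; how the module itself calls `networkx.pagerank` is not shown here.

```python
import networkx as nx

teleport_prob = 0.1              # theoretical alpha (restart probability)
nx_alpha = 1.0 - teleport_prob   # damping factor expected by networkx

# Toy weighted undirected graph (illustrative data only).
G = nx.Graph()
G.add_weighted_edges_from(
    [("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 1.0), ("c", "d", 1.0)]
)

# Nodes missing from the personalization dict get zero restart weight.
scores = nx.pagerank(
    G,
    alpha=nx_alpha,
    personalization={"a": 1.0},
    max_iter=1000,   # generous limit: convergence is slow when nx_alpha is near 1
    tol=1e-6,
    weight="weight",
)
```

Passing `teleport_prob` directly as `alpha` would invert the behavior: the walk would restart 90% of the time instead of 10%.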

Notes

Sweep Cut Optimization

Nodes are sorted by PageRank-per-degree in descending order. The algorithm evaluates each prefix set \(S_i\) of this ordering and selects the one that minimizes the Hybrid Conductance:

\[\begin{split}\Phi_{hybrid} = k_P \Phi + (1 - k_P) \rho\end{split}\]
Where:
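  • \(\Phi\): Standard conductance (\(\mathrm{cut}(S) / \min(\mathrm{vol}(S), \mathrm{vol}(V \setminus S))\)).

  • \(\rho\): Correlation term, the negative absolute Pearson correlation between the phenotype and the first principal component of the omics features in \(S\) (\(-|\rho(S)|\)).

  • \(k_P\): Trade-off weight on the conductance term (default: 0.5).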
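As a worked illustration of the objective, the sketch below computes the conductance of one candidate set and combines it with a correlation value. The helper names `conductance` and `hybrid_conductance`, the adjacency-dict input, the fallback for an empty side, and the correlation value 0.6 are all assumptions made for the example.

```python
from typing import Dict, Hashable, Iterable

Adjacency = Dict[Hashable, Dict[Hashable, float]]


def conductance(adj: Adjacency, subset: Iterable[Hashable]) -> float:
    """cut(S) / min(vol(S), vol(V \\ S)) for a weighted undirected graph."""
    S = set(subset)
    # Total weight of edges leaving S.
    cut = sum(w for u in S for v, w in adj[u].items() if v not in S)
    vol_S = sum(sum(adj[u].values()) for u in S)
    vol_rest = sum(sum(adj[u].values()) for u in adj if u not in S)
    denom = min(vol_S, vol_rest)
    # Assumption: treat an empty side as the worst possible conductance.
    return cut / denom if denom > 0 else 1.0


def hybrid_conductance(phi: float, corr: float, k_P: float = 0.5) -> float:
    """k_P * Phi + (1 - k_P) * (-|corr|); lower is better."""
    return k_P * phi + (1.0 - k_P) * (-abs(corr))


# Two unit-weight triangles joined by the bridge c-d.
adj = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "d": 1.0},
    "d": {"c": 1.0, "e": 1.0, "f": 1.0},
    "e": {"d": 1.0, "f": 1.0},
    "f": {"d": 1.0, "e": 1.0},
}
phi = conductance(adj, ["a", "b", "c"])             # cut = 1, both volumes = 7
score = hybrid_conductance(phi, corr=0.6, k_P=0.5)  # 0.5 * (1/7) - 0.5 * 0.6
```

Because the correlation term enters with a negative sign, a strongly phenotype-correlated set can reach a negative score even when its conductance is nonzero.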

Personalization Vector (Seed Weighting)

Teleportation probabilities for seeds are weighted by their marginal contribution to correlation:

\[\begin{split}\alpha_i = \frac{\rho_i}{\max(\rho_{seeds})} \cdot \alpha_{max}\end{split}\]

Where \(\rho_i = |\rho(S)| - |\rho(S \setminus \{i\})|\). Negative values of \(\rho_i\) are clamped to 0.
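The weighting rule can be sketched as follows. This is an illustration under stated assumptions, not the module's code: `seed_weights` and `toy_abs_corr` are hypothetical names, the correlation function is a stand-in for \(|\rho(\cdot)|\), and the equal-weight fallback when every contribution is zero is an assumption (the text above does not specify that case).

```python
from typing import Callable, Dict, Hashable, List


def seed_weights(
    seeds: List[Hashable],
    abs_corr: Callable[[List[Hashable]], float],
    alpha_max: float = 0.1,
) -> Dict[Hashable, float]:
    """alpha_i = rho_i / max(rho_seeds) * alpha_max, with rho_i clamped at 0."""
    if not seeds:
        return {}
    full = abs_corr(seeds)  # |rho(S)|
    contrib = {}
    for i in seeds:
        rest = [n for n in seeds if n != i]
        # rho_i = |rho(S)| - |rho(S \ {i})|, clamped to be non-negative.
        contrib[i] = max(full - abs_corr(rest), 0.0)
    top = max(contrib.values())
    if top <= 0.0:
        # Assumption: fall back to equal weights when no seed contributes.
        return {n: alpha_max for n in seeds}
    return {n: alpha_max * c / top for n, c in contrib.items()}


# Stand-in for |rho(.)|: a toy additive score, used only to exercise the rule.
strength = {"g1": 0.3, "g2": 0.1, "g3": -0.05}


def toy_abs_corr(nodes: List[Hashable]) -> float:
    return abs(sum(strength[n] for n in nodes))


weights = seed_weights(["g1", "g2", "g3"], toy_abs_corr, alpha_max=0.1)
```

In this toy case removing `g3` increases the score, so its contribution is negative and its weight is clamped to 0, while `g1` contributes most and receives the full `alpha_max`.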

Functions

get_logger(name)

Retrieves a global logger configured to write to 'bioneuralnet.log'.

pearsonr(x, y, *[, alternative, method, axis])

Pearson correlation coefficient and p-value for testing non-correlation.

Classes

CorrelatedPageRank(graph, omics_data, ...[, ...])

Correlated PageRank clustering on a multi-omics network.

PCA([n_components, copy, whiten, ...])

Principal component analysis (PCA).

StandardScaler(*[, copy, with_mean, with_std])

Standardize features by removing the mean and scaling to unit variance.

class bioneuralnet.clustering.correlated_pagerank.CorrelatedPageRank(graph: Graph, omics_data: DataFrame, phenotype_data: DataFrame | Series, teleport_prob: float = 0.1, k_P: float = 0.5, max_iter: int = 100, tol: float = 1e-06, min_cluster: int = 2, seed: int | None = None)

Bases: object

Correlated PageRank clustering on a multi-omics network.

Parameters:
  • graph (nx.Graph) – Weighted undirected NetworkX graph.

  • omics_data (pd.DataFrame) – Omics matrix (n_samples x n_features), columns = node ids.

  • phenotype_data (Union[pd.DataFrame, pd.Series]) – Phenotype vector aligned with rows of omics_data.

  • teleport_prob (float) – Teleportation probability (alpha). Default 0.10.

  • k_P (float) – Weight on conductance in combined objective (Eq. 5).

  • max_iter (int) – Max iterations for PageRank power iteration.

  • tol (float) – Convergence tolerance for PageRank.

  • min_cluster (int) – Minimum cluster size for sweep cut consideration.

  • seed (Optional[int]) – Random seed for reproducibility.

generate_weighted_personalization(nodes: List[Any], alpha_max: float | None = None) → Dict[Any, float]

Build personalization vector based on each node’s correlation contribution.

Parameters:
  • nodes (List[Any]) – Seed node list.

  • alpha_max (Optional[float]) – Maximum teleportation weight.

Returns:

Personalization mapping {node: weight}.

Return type:

Dict[Any, float]

phen_omics_corr(nodes: List[Any]) → Tuple[float, float]

Compute Pearson(PC1(omics[:, nodes]), phenotype).

Parameters:

nodes (List[Any]) – List of node identifiers.

Returns:

(correlation, p_value). Returns (0.0, 1.0) on failure.

Return type:

Tuple[float, float]
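A rough sketch of the PC1-versus-phenotype correlation, for intuition only. The module lists scikit-learn's PCA and StandardScaler and SciPy's pearsonr among its names; this example swaps in plain NumPy (SVD and `np.corrcoef`) and returns only the correlation, so it does not reproduce the p-value. The function name `pc1_phenotype_corr` and the zero-variance fallback are assumptions.

```python
import numpy as np


def pc1_phenotype_corr(omics: np.ndarray, phenotype: np.ndarray) -> float:
    """Pearson correlation between PC1 of the standardized columns and y."""
    X = np.asarray(omics, dtype=float)      # shape (n_samples, n_selected_nodes)
    y = np.asarray(phenotype, dtype=float)  # shape (n_samples,)
    std = X.std(axis=0)
    # Assumption: return 0.0 when the correlation is undefined.
    if X.shape[1] == 0 or np.any(std == 0) or y.std() == 0:
        return 0.0
    Z = (X - X.mean(axis=0)) / std          # standardize each feature
    # Rows of vt are principal axes; projecting onto vt[0] gives PC1 scores.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    pc1 = Z @ vt[0]
    return float(np.corrcoef(pc1, y)[0, 1])


# Synthetic data: three features that are noisy copies of the phenotype.
rng = np.random.default_rng(0)
y = rng.normal(size=50)
X = np.column_stack([y + 0.1 * rng.normal(size=50) for _ in range(3)])
corr = pc1_phenotype_corr(X, y)
```

The sign of a principal component is arbitrary, which is consistent with the objective above using the absolute value of the correlation.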

run(seed_nodes: List[Any]) → Dict[str, Any]

Execute Correlated PageRank clustering.

Parameters:

seed_nodes (List[Any]) – Nodes to use as the teleport set.

Returns:

Cluster performance and node list.

Return type:

Dict

sweep_cut(pr_scores: Dict[Any, float]) → Dict[str, Any]

Identify the best cluster via sweep cut on PageRank scores.

Parameters:

pr_scores (Dict[Any, float]) – Mapping of nodes to PageRank scores.

Returns:

Best cluster details including nodes, conductance, and composite score.

Return type:

Dict
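The sweep itself can be sketched as below. This is a simplified illustration rather than the method's implementation: it scores prefixes with plain conductance (the correlation term is omitted), returns a tuple instead of the documented dict, and the names `sweep`, `conductance`, and the hand-picked `pr_scores` are assumptions.

```python
from typing import Callable, Dict, Hashable, List, Optional, Tuple

Adjacency = Dict[Hashable, Dict[Hashable, float]]


def conductance(adj: Adjacency, subset: List[Hashable]) -> float:
    S = set(subset)
    cut = sum(w for u in S for v, w in adj[u].items() if v not in S)
    vol_S = sum(sum(adj[u].values()) for u in S)
    vol_rest = sum(sum(adj[u].values()) for u in adj if u not in S)
    denom = min(vol_S, vol_rest)
    return cut / denom if denom > 0 else 1.0


def sweep(
    adj: Adjacency,
    pr_scores: Dict[Hashable, float],
    score_fn: Callable[[List[Hashable]], float],
    min_cluster: int = 2,
) -> Optional[Tuple[float, List[Hashable]]]:
    """Return (best score, best prefix) over the PageRank-per-degree ordering."""
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    # Keep nodes with positive score and degree, sorted by PR / degree.
    order = sorted(
        (u for u, p in pr_scores.items() if p > 0 and degree.get(u, 0) > 0),
        key=lambda u: pr_scores[u] / degree[u],
        reverse=True,
    )
    best = None
    # Skip prefixes below min_cluster and the full set (empty complement).
    for size in range(min_cluster, len(order)):
        prefix = order[:size]
        score = score_fn(prefix)
        if best is None or score < best[0]:
            best = (score, prefix)
    return best


# Two unit-weight triangles joined by the bridge c-d.
adj = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "d": 1.0},
    "d": {"c": 1.0, "e": 1.0, "f": 1.0},
    "e": {"d": 1.0, "f": 1.0},
    "f": {"d": 1.0, "e": 1.0},
}
pr_scores = {"a": 0.25, "b": 0.20, "c": 0.25, "d": 0.15, "e": 0.08, "f": 0.07}
best_score, best_nodes = sweep(adj, pr_scores, lambda S: conductance(adj, S))
```

To approximate the hybrid objective, `score_fn` could instead combine the conductance with a correlation value for each prefix, weighted by `k_P`.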