Data for PyFLANK, a Graph Neural Network Based Null Distribution Inference Model for FST Outlier Detection

Themes: Conversion, Feedstock Production

Keywords: AI/ML, Genetics, Genomics, Software

Citation

Zhang, Z., Jia, W., Gomes Viana, J.P., Hsieh, P., Yoshikuni, Y., Hudson, M. April 6, 2026. Data for: “PyFLANK, a Graph Neural Network Based Null Distribution Inference Model for FST Outlier Detection.” GitHub.

Overview

Figure: Type II error rate of different FST outlier detectors. pyFLANK demonstrates a Type II error rate comparable to that of other detection methods.

pyFLANK is an open-source, automated Python tool that detects FST outliers using a null distribution inferred from quasi-independent loci. It is inspired by the R package OutFLANK (https://doi.org/10.1086/682949). The tool integrates three approaches for identifying loci that follow the null distribution: graph neural network (GNN) inference, linkage disequilibrium (LD)-based inference, and user-defined input. Because pyFLANK uses GNN-based inference of quasi-independent loci, it yields a more accurate null model while requiring less parameter input from the user.
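The general idea of scoring every locus against a null built from a subset of loci can be illustrated with a minimal sketch. This is not pyFLANK's implementation: the function name `empirical_right_tail_pvalues` and its arguments are hypothetical, and a plain empirical right-tail p-value is swapped in for whatever parametric null fit the tool actually performs. The indices in `null_idx` stand for loci already judged quasi-independent by any of the three approaches above.

```python
from bisect import bisect_left


def empirical_right_tail_pvalues(fst_all, null_idx):
    """Right-tail p-value of each locus against an empirical null.

    fst_all  : per-locus FST values (hypothetical input)
    null_idx : indices of loci assumed to follow the null distribution
    """
    null = sorted(fst_all[i] for i in null_idx)
    m = len(null)
    pvals = []
    for f in fst_all:
        # Count null FST values that are at least as large as this locus.
        n_ge = m - bisect_left(null, f)
        # Add-one correction keeps p-values strictly above zero.
        pvals.append((n_ge + 1) / (m + 1))
    return pvals


# Toy data: four low-FST loci treated as the null set, one high-FST locus.
fst_values = [0.01, 0.02, 0.03, 0.04, 0.9]
p_values = empirical_right_tail_pvalues(fst_values, null_idx=[0, 1, 2, 3])
```

In this toy input the last locus exceeds every null value and therefore receives the smallest p-value the null set allows, 1 / (m + 1). The quality of the result depends entirely on how well the null set is chosen, which is the step the tool's three approaches address.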

FST is calculated using the estimator of Weir and Cockerham (1984).
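For readers unfamiliar with that estimator, the following is a minimal sketch of the single-locus, biallelic form with its variance components a, b, and c. The function name `weir_cockerham_fst` and its input layout are assumptions made for illustration and are not taken from the pyFLANK code base, which may organize the calculation differently (for example, by combining components across loci).

```python
def weir_cockerham_fst(n, p, h):
    """Single-locus Weir and Cockerham (1984) FST estimate (theta).

    n : number of diploid individuals sampled per population
    p : frequency of the focal allele per population
    h : observed proportion of heterozygotes per population
    Requires at least two populations and a mean sample size above 1.
    """
    r = len(n)
    n_total = sum(n)
    n_bar = n_total / r
    # Sample-size term that corrects for unequal population sizes.
    n_c = (n_total - sum(x * x for x in n) / n_total) / (r - 1)
    p_bar = sum(ni * pi for ni, pi in zip(n, p)) / n_total
    s2 = sum(ni * (pi - p_bar) ** 2 for ni, pi in zip(n, p)) / ((r - 1) * n_bar)
    h_bar = sum(ni * hi for ni, hi in zip(n, h)) / n_total

    shared = p_bar * (1 - p_bar) - (r - 1) / r * s2
    # a: between populations; b: between individuals; c: within individuals.
    a = (n_bar / n_c) * (s2 - (shared - h_bar / 4) / (n_bar - 1))
    b = (n_bar / (n_bar - 1)) * (shared - (2 * n_bar - 1) / (4 * n_bar) * h_bar)
    c = h_bar / 2

    total = a + b + c
    return a / total if total != 0 else float("nan")


# Two populations fixed for alternative alleles, no heterozygotes.
fst_fixed = weir_cockerham_fst(n=[50, 50], p=[0.0, 1.0], h=[0.0, 0.0])
```

With two populations fixed for alternative alleles the estimate is 1, and with identical allele frequencies it is slightly negative, which is expected behavior for this estimator rather than an error.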

Key Features

  1. Graph-based representation of the local dependency context of loci and of their dependence structure,
  2. GNN-based null distribution inference,
  3. Compatible with standard FST-based workflows,
  4. Designed to complement, not replace, LD pruning and clumping.

Data

GitHub: Software and necessary datasets

Related Publications