Our paper "Deep Unsupervised Identification of Selected Genes and SNPs in Pool-Seq Data from Evolving Populations", joint work by Julia Siekiera and Stefan Kramer, has been accepted as a poster presentation at RECOMB 2022-Genetics.
Abstract:
The exploration of selected single nucleotide polymorphisms (SNPs) to identify genetic diversity between populations under selection pressure is a fundamental task in population genetics. As underlying sequence reads and their alignment are error-prone and univariate statistical solutions like the Cochran-Mantel-Haenszel test (CMH) only take individual positions of the genome into account, the identification of selected SNPs remains a challenging process. Deep learning models, by contrast, are able to consider large input areas to integrate the decision of individual positions in the context of (hidden) neighboring patterns. We suggest an unsupervised deep learning pipeline to detect selected SNPs or genes between different types of population pairs by the application of both active learning and explainable AI methods. To provide a solution for various experimental designs, the effectiveness of direct genomic population comparison and the integration of drift simulation is investigated. In addition, we demonstrate how the extension of an autoencoder architecture can support the mapping of the genotype into a hidden representation upon which optimized selection detection is possible. The performance of the proposed method configurations is investigated on different simulated Pool-Seq (sequencing of pooled individuals) datasets of Drosophila melanogaster and compared to a univariate baseline. The evaluation demonstrates that deep neural networks offer the potential to recognize hidden patterns in the allele frequencies of evolved populations and to enhance the information given by univariate statistics.
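
To illustrate what "univariate" means for the CMH baseline mentioned in the abstract, the following is a minimal sketch of the textbook Cochran-Mantel-Haenszel statistic for a single SNP. It is not the implementation used in the paper. It assumes one 2x2 table of read counts per replicate (rows: base and evolved population; columns: reference and alternative allele), applies no continuity correction, and uses the hypothetical function name `cmh_test` and made-up counts.

```python
import math


def cmh_test(tables):
    """Textbook CMH test for one SNP (no continuity correction).

    tables: iterable of 2x2 count tables ((a, b), (c, d)), one per replicate.
            Assumed layout: rows = base/evolved population,
            columns = reference/alternative allele read counts.
    Returns (chi2, p_value) with one degree of freedom.
    """
    diff_sum = 0.0  # sum over replicates of observed minus expected 'a'
    var_sum = 0.0   # sum over replicates of the hypergeometric variance of 'a'
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a replicate without enough reads carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff_sum += a - row1 * col1 / n
        var_sum += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if var_sum == 0.0:
        return 0.0, 1.0  # degenerate case, e.g. a fixed allele in every replicate
    chi2 = diff_sum ** 2 / var_sum
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up read counts for one SNP in two replicate population pairs
chi2, p = cmh_test([((20, 10), (10, 20)), ((20, 10), (10, 20))])
print(f"chi2 = {chi2:.3f}, p = {p:.2e}")
```

The test looks only at the counts of one genomic position at a time, so it has to be repeated independently for every SNP. This is the limitation that the abstract contrasts with deep learning models that take neighboring positions into account.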