Boosting the power of next-generation sequencing analysis with deep learning
We are living in an era of rapid technological change. On the one hand, recent breakthroughs in deep learning (DL) have made artificial intelligence (AI) practical for solving complicated real-world problems, and AI has even surpassed humans in several tasks, such as playing Go. On the other hand, next-generation sequencing (NGS) technologies have made extraordinary progress, and NGS data are growing explosively in both scale and sample diversity.
A technical barrier remains in analyzing NGS data: systematic biases and batch effects usually confound real biological signals and lead to false-positive results. Currently, biases and batch effects are characterized empirically and separately for each type of NGS data and each platform, which requires considerable effort and experience. In this study, we propose a general solution that uses deep learning to characterize and correct systematic biases in NGS data. Based on curated NGS data pairs, we aim to learn features associated with potential biases automatically and to correct them through a mapping to less biased data signals.
In addition, we propose a framework to reconstruct high read-depth whole-genome sequencing (WGS) signals from low-depth WGS data. This makes low-depth WGS data more robust in downstream analysis and consequently offers a way to reduce the cost of large-scale, high-depth WGS studies.
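To make the idea of learning from paired low-depth and high-depth signals concrete, the following is a minimal sketch and not the method of this study. It assumes that a low-depth signal can be simulated by binomially thinning per-bin read counts, and it substitutes a smoothed least-squares rescaling for the deep learning model. All names, the bin counts, and the 10% keep fraction are hypothetical and chosen only for illustration.

```python
import random


def thin_counts(high_depth, keep_fraction, rng):
    """Simulate low-depth coverage: keep each read with probability keep_fraction."""
    return [sum(1 for _ in range(c) if rng.random() < keep_fraction)
            for c in high_depth]


def moving_average(signal, half_window):
    """Smooth a per-bin signal with a centered window, truncated at the edges."""
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half_window)
        hi = min(len(signal), i + half_window + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def fit_scale(inputs, targets):
    """Least-squares slope through the origin: argmin_a sum((a * x - y) ** 2)."""
    num = sum(x * y for x, y in zip(inputs, targets))
    den = sum(x * x for x in inputs)
    return num / den if den else 0.0


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)

# Hypothetical high-depth read counts per genomic bin, alternating between
# two coverage levels every 50 bins (assumption made for this toy example).
high = [rng.randint(90, 110) if (i // 50) % 2 == 0 else rng.randint(40, 60)
        for i in range(400)]

# Paired low-depth signal, simulated by keeping about 10% of the reads.
low = thin_counts(high, keep_fraction=0.1, rng=rng)

# Stand-in "model": smooth the low-depth signal, then learn one scale factor
# that maps it onto the paired high-depth signal.
features = moving_average(low, half_window=5)
scale = fit_scale(features, high)
reconstructed = [scale * f for f in features]

# Baseline for comparison: rescale each bin independently by 1 / keep_fraction.
naive = [c / 0.1 for c in low]

print("learned scale:", round(scale, 2))
print("MSE naive:", round(mean_squared_error(naive, high), 1))
print("MSE reconstructed:", round(mean_squared_error(reconstructed, high), 1))
```

In this toy setting the reconstruction borrows information from neighboring bins, so its error against the high-depth signal is lower than that of per-bin rescaling. A deep network trained on curated data pairs would take the place of the smoothing and scaling steps.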