METHODS AND SYSTEMS FOR PREDICTING DNA ACCESSIBILITY IN THE PAN-CANCER GENOME

    公开(公告)号:US20190392288A1

    公开(公告)日:2019-12-26

    申请号:US16559592

    申请日:2019-09-03

    摘要: Techniques are provided for predicting DNA accessibility. DNase-seq data files and RNA-seq data files for a plurality of cell types are paired by assigning DNase-seq data files to RNA-seq data files that are at least within a same biotype. A neural network is configured to be trained using batches of the paired data files, where configuring the neural network comprises configuring convolutional layers to process a first input comprising DNA sequence data from a paired data file to generate a convolved output, and fully connected layers following the convolutional layers to concatenate the convolved output with a second input comprising gene expression levels derived from RNA-seq data from the paired data file and process the concatenation to generate a DNA accessibility prediction output. The trained neural network is used to predict DNA accessibility in a genomic sample input comprising RNA-seq data and whole genome sequencing for a new cell type.