Duplication and deletion detection using transformation processing of depth vectors

    Publication No.: US09773031B1

    Publication date: 2017-09-26

    Application No.: US15489473

    Filing date: 2017-04-17

    IPC classification: G06F17/30

    Abstract: Techniques for accurately identifying duplications and deletions using depth vectors. A depth vector is generated for each of multiple clients based on a set of reads that is received and aligned to a reference data set. Transformation processing of the depth vectors is performed to produce multiple components. Each component is assigned an order based on the extent to which it accounts for cross-client differences in the depth vectors, and each component includes an intensity, multiple values, and multiple client weights. A subset of the components is identified based on the order. A sparse indicator and positional data for the sparse indicator can be determined from the components in the subset, and one or more clients can be identified as being associated with those components.
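The abstract does not name the transformation it applies. As a minimal sketch, assuming a singular value decomposition (SVD) stands in for the "transformation processing", the steps could look like the following. The singular value plays the role of the intensity, the right singular vector holds the per-position values, and the left singular vector holds the client weights. The function names, the median normalization, and the two cut-off parameters are illustrative assumptions, not details taken from the patent.

```python
import numpy as np


def decompose_depth_vectors(depth, n_keep=1):
    """Split a (clients x positions) depth matrix into ordered components.

    Assumption: SVD is used as the transformation; the source does not say so.
    """
    depth = np.asarray(depth, dtype=float)
    # Assumed normalization: divide each client by its median depth so that
    # overall coverage differences do not dominate the components.
    normalized = depth / np.median(depth, axis=1, keepdims=True)
    # Center each position across clients so components describe
    # cross-client differences rather than the shared depth profile.
    centered = normalized - normalized.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # np.linalg.svd returns singular values in descending order, so the
    # index k doubles as the component's order.
    return [
        {"order": k, "intensity": s[k], "values": vt[k], "client_weights": u[:, k]}
        for k in range(min(n_keep, len(s)))
    ]


def call_events(component, value_cut=0.5, weight_cut=0.5):
    """Return (client, first_position, last_position, kind) tuples.

    Hypothetical thresholds: keep positions and clients whose absolute value
    or weight is at least a fraction of the largest one. The flagged positions
    are reported as one span, which assumes they are contiguous.
    """
    values = component["values"]
    weights = component["client_weights"]
    positions = np.flatnonzero(np.abs(values) >= value_cut * np.abs(values).max())
    clients = np.flatnonzero(np.abs(weights) >= weight_cut * np.abs(weights).max())
    calls = []
    for c in clients:
        # The product is unaffected by the sign ambiguity of the SVD:
        # positive means depth above the cohort, negative means below.
        deviation = component["intensity"] * weights[c] * values[positions].mean()
        kind = "duplication" if deviation > 0 else "deletion"
        calls.append((int(c), int(positions[0]), int(positions[-1]), kind))
    return calls


# Toy data: six clients at uniform depth 30, with client 2 elevated to 45
# across positions 5 through 9.
depth = np.full((6, 20), 30.0)
depth[2, 5:10] = 45.0
top_component = decompose_depth_vectors(depth, n_keep=1)[0]
print(call_events(top_component))
```

In this toy case the first component alone carries the whole signal. On real data, several leading components may reflect systematic effects rather than events, so which components to keep is a separate decision that this sketch does not address.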

Load balancing and conflict processing in workflow with task dependencies

    Publication No.: US09811391B1

    Publication date: 2017-11-07

    Application No.: US15449579

    Filing date: 2017-03-03

    Abstract: Embodiments in the disclosure are directed to the use of distributed computing to align reads against multiple portions of a reference dataset. Aligned portions of the reference dataset that correspond with an above-threshold alignment score can be assessed for the presence of sparse indicators, which can be categorized and used to influence a determination of a state transition likelihood. Various tasks associated with the processing of reads (e.g., alignment, sparse indicator detection, and/or determination of a state transition likelihood) may be able to take advantage of parallel processing and can be distributed among the machines while considering the resource utilization of those machines. Different load-balancing mechanisms can be employed to achieve even resource utilization across the machines, and in some cases may involve assessing various processing characteristics that reflect a predicted resource expenditure and/or time profile for each task to be processed by a machine.
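The abstract leaves the load-balancing mechanism open. One simple possibility, shown here only as a sketch, is a greedy heuristic that places the most expensive task first and always gives the next task to the machine with the lowest load so far. The function name, the task and machine names, and the use of a single number as the "predicted resource expenditure" are all assumptions made for illustration; task dependencies and conflict processing are not modeled.

```python
import heapq


def assign_tasks(predicted_cost, current_load):
    """Greedy assignment of tasks to machines (illustrative, not the patented method).

    predicted_cost: dict mapping task id -> predicted cost (hypothetical units).
    current_load:   dict mapping machine id -> load already on that machine.
    Returns (assignment, final_load).
    """
    # Min-heap keyed on load, so the least-loaded machine is always on top.
    heap = [(load, machine) for machine, load in current_load.items()]
    heapq.heapify(heap)
    assignment = {}
    # Place the largest tasks first; small tasks then fill in the gaps.
    for task, cost in sorted(predicted_cost.items(), key=lambda kv: kv[1], reverse=True):
        load, machine = heapq.heappop(heap)
        assignment[task] = machine
        heapq.heappush(heap, (load + cost, machine))
    final_load = {machine: load for load, machine in heap}
    return assignment, final_load


# Hypothetical tasks named after the stages listed in the abstract.
tasks = {"align_a": 8, "align_b": 7, "detect": 3, "score": 2}
machines = {"m1": 0, "m2": 4}
assignment, final_load = assign_tasks(tasks, machines)
print(assignment)
print(final_load)
```

A heap keeps each placement at logarithmic cost in the number of machines. A scheduler closer to the one described would also need a time profile per task and some handling of ordering constraints between tasks.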