Quality score compression
    Granted invention patent

    Publication number: US11776663B2

    Publication date: 2023-10-03

    Application number: US17974978

    Application date: 2022-10-27

    Applicant: Illumina, Inc.

    CPC classification number: G16B50/50 H03M7/3071 H03M7/6011

    Abstract: Methods, systems, and computer programs for compressing nucleic acid sequence data. A method can include obtaining nucleic acid sequence data representing: (i) a read sequence, and (ii) a plurality of quality scores, determining whether the read sequence includes at least one “N” base, based on a determination that the read sequence includes at least one “N” base, generating, by one or more computers, a first encoding data set by using a first encoding process to encode each set of four quality scores of the read sequence into a single byte of memory, and using a second encoding process to encode the first encoded data set, thereby compressing the data to be compressed.
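
    The two-stage encoding in this abstract can be illustrated with a minimal sketch. It assumes that quality scores are restricted to four levels so that each fits in 2 bits; the bin values in `QUALITY_BINS`, the function names, and the use of zlib as the second encoding process are all assumptions for illustration, not details from the patent.

```python
import zlib

# Hypothetical 4-level quality alphabet; the abstract does not list the values.
QUALITY_BINS = (2, 12, 23, 37)
CODE_OF = {q: code for code, q in enumerate(QUALITY_BINS)}


def pack_quality_scores(scores):
    """First encoding: pack each group of four scores into one byte (2 bits each)."""
    packed = bytearray()
    for start in range(0, len(scores), 4):
        byte = 0
        for slot, q in enumerate(scores[start:start + 4]):
            byte |= CODE_OF[q] << (2 * slot)  # slot 0 occupies the lowest 2 bits
        packed.append(byte)
    return bytes(packed)


def unpack_quality_scores(packed, count):
    """Inverse of pack_quality_scores; count drops the padding in the last byte."""
    scores = []
    for byte in packed:
        for slot in range(4):
            scores.append(QUALITY_BINS[(byte >> (2 * slot)) & 0b11])
    return scores[:count]


def encode_read(read, scores):
    """Apply both encoding stages to a read that contains at least one N base."""
    if "N" not in read:
        # The abstract only describes the branch for reads containing N.
        raise NotImplementedError("no-N branch is not described in the abstract")
    # Second encoding: zlib stands in for the unspecified second process.
    return zlib.compress(pack_quality_scores(scores))
```

    Decoding reverses the stages: `zlib.decompress` followed by `unpack_quality_scores` with the original score count.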

    Data compression system and method of using

    Publication number: US11764806B2

    Publication date: 2023-09-19

    Application number: US17467282

    Application date: 2021-09-06

    Abstract: A system includes a non-transitory computer readable medium configured to store instructions thereon; and a processor connected to the non-transitory computer readable medium. The processor is configured to execute the instructions for generating a mask based on received data from a sensor, wherein the mask includes a plurality of importance values, and each region of the received data is designated a corresponding importance value of the plurality of importance values. The processor is configured to execute the instructions for encoding the received data based on the mask; and transmitting the encoded data to a decoder for defining reconstructed data. The processor is configured to execute the instructions for computing a loss based on the reconstructed data, the received data and the mask. The processor is configured to execute the instructions for providing training to an encoder for encoding the received data based on the computed loss.
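
    The loss in this abstract depends on the received data, the reconstructed data, and the importance mask, but its exact form is not given. One plausible reading, shown here only as an assumption, is an importance-weighted mean squared error over flattened regions; the function name and the normalization by total weight are hypothetical.

```python
def importance_weighted_loss(received, reconstructed, mask):
    """Weighted mean squared error: higher importance values contribute more."""
    if not (len(received) == len(reconstructed) == len(mask)):
        raise ValueError("received, reconstructed, and mask must have equal length")
    total_weight = sum(mask)
    if total_weight == 0:
        return 0.0  # nothing is marked as important
    weighted_error = sum(
        w * (x - y) ** 2 for x, y, w in zip(received, reconstructed, mask)
    )
    return weighted_error / total_weight
```

    Under this assumed form, a reconstruction error in a region with importance 0 does not affect the loss, so an encoder trained on it is free to spend fewer bits there.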

    FILE COMPRESSION USING SEQUENCE SPLITS AND SEQUENCE ALIGNMENT

    Publication number: US20230229632A1

    Publication date: 2023-07-20

    Application number: US17658928

    Application date: 2022-04-12

    CPC classification number: G06F16/1744 H03M7/6011

    Abstract: Compressing files is disclosed. An input file to be compressed is first aligned. Aligning the file includes splitting the file into sequences that can be aligned. When splitting the file into sequences or when performing subsequent recursive splitting, the splitting is based on a longest sequence match. The result is a compression matrix, where each row of the matrix corresponds to part of the file. A consensus sequence is determined from the compression matrix. Using the consensus sequence, pointer pairs are generated. Each pointer pair identifies a subsequence of the consensus matrix. The compressed file includes the pointer pairs and the consensus sequence.
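
    The output format described here (a consensus sequence plus pointer pairs into it) can be illustrated with a simplified sketch. It does not implement the alignment-based compression matrix from the abstract; instead it swaps in a greedy overlap merge to build a consensus string, and all function names are hypothetical.

```python
def build_consensus(parts):
    """Greedy stand-in for the consensus step: merge parts by suffix/prefix overlap."""
    consensus = ""
    for part in parts:
        if part in consensus:
            continue  # already covered by the consensus
        overlap = 0
        # Longest suffix of the consensus that is also a prefix of this part.
        for k in range(min(len(consensus), len(part)), 0, -1):
            if consensus.endswith(part[:k]):
                overlap = k
                break
        consensus += part[overlap:]
    return consensus


def compress(parts):
    """Return the consensus and one (start, end) pointer pair per part."""
    consensus = build_consensus(parts)
    pointers = []
    for part in parts:
        start = consensus.find(part)  # always found: every part was merged in
        pointers.append((start, start + len(part)))
    return consensus, pointers


def decompress(consensus, pointers):
    """Rebuild the original data by concatenating the referenced subsequences."""
    return "".join(consensus[start:end] for start, end in pointers)
```

    The sketch pays off only when parts of the file overlap or repeat, which is the situation the splitting and alignment steps in the abstract are meant to create.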

    SYSTEM AND METHOD FOR DATA COMPACTION AND ENCRYPTION OF ANONYMIZED DATASETS

    Publication number: US20230195311A1

    Publication date: 2023-06-22

    Application number: US18178556

    Application date: 2023-03-06

    Abstract: A system and method for encoding an anonymized dataset. A dataset may be pre-processed by dividing into a plurality of sourceblocks at all reasonable sourceblock lengths, and then counting how many times each sourceblock occurs in the dataset, resulting in a tally record of tokens and their count value. This tally record may then be anonymized and transmitted as an anonymized tally record to a data deconstruction engine which, combined with a library manager, creates a codebook and performs optimization techniques on the codebook. The received anonymized tally record may be parsed into individual tokens by identifying the tokens with the highest count value. The tokens may then be sent, in descending order of count value, to the library manager where each token may be assigned a codeword. Then a half-baked codebook is created using the tokens and each token's unique codeword, before sending the half-baked codebook to a system user.
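
    The tally and codeword-assignment steps can be sketched as follows. This is a minimal illustration under stated assumptions: sourceblocks are taken as non-overlapping chunks, the candidate lengths are passed in explicitly, codewords are plain integers, and the anonymization and codebook optimization steps are omitted because the abstract does not specify them. The function names are hypothetical.

```python
from collections import Counter


def tally_sourceblocks(data, lengths):
    """Count non-overlapping sourceblocks of each candidate length."""
    tally = Counter()
    for length in lengths:
        for start in range(0, len(data) - length + 1, length):
            tally[data[start:start + length]] += 1
    return tally


def build_codebook(tally):
    """Assign integer codewords in descending order of count (ties broken by token)."""
    ordered = sorted(tally.items(), key=lambda item: (-item[1], item[0]))
    return {token: codeword for codeword, (token, _count) in enumerate(ordered)}
```

    For example, `build_codebook(tally_sourceblocks("abababcd", (2,)))` gives the most frequent block `"ab"` codeword 0 and `"cd"` codeword 1, so frequent tokens receive the smallest codewords.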

    Domain adaptation
    Granted invention patent

    Publication number: US11664820B2

    Publication date: 2023-05-30

    Application number: US16891697

    Application date: 2020-06-03

    CPC classification number: H03M7/6064 G06N3/088 H03M7/6011

    Abstract: An apparatus, method and computer program is described comprising: initialising weights of a target encoder based on a source encoder; initialising weights of a target discriminator associated with the target encoder such that the target discriminator is initialised to match a source discriminator associated with the source encoder; applying some of a target data set to the target encoder to generate target encoder outputs; applying the target encoder outputs to the target discriminator to generate a first local loss function output; training the target encoder to seek to increase the first local loss function output; training the target discriminator to seek to decrease the first local loss function output; and synchronising weights of the target discriminator and the source discriminator.
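
    The opposing updates in this abstract can be illustrated with a toy sketch in which the encoder and discriminator are each a single scalar weight. Everything specific here is an assumption for illustration: the scalar models, the loss `-log(1 - D(E(x)))`, the learning rate, and synchronisation by averaging are not taken from the patent.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def local_loss(w_enc, w_disc, xs):
    """Mean of -log(1 - sigmoid(w_disc * w_enc * x)), which equals softplus of the logit."""
    return sum(math.log1p(math.exp(w_disc * w_enc * x)) for x in xs) / len(xs)


def adaptation_step(w_enc, w_disc, xs, lr=0.1):
    """One step: the encoder ascends the loss, the discriminator descends it."""
    # d(softplus(a))/da = sigmoid(a), with a = w_disc * w_enc * x.
    grad_enc = sum(sigmoid(w_disc * w_enc * x) * w_disc * x for x in xs) / len(xs)
    grad_disc = sum(sigmoid(w_disc * w_enc * x) * w_enc * x for x in xs) / len(xs)
    return w_enc + lr * grad_enc, w_disc - lr * grad_disc


def synchronise(w_target_disc, w_source_disc):
    """Assumed synchronisation rule: both discriminators take the average weight."""
    average = 0.5 * (w_target_disc + w_source_disc)
    return average, average


# Target models start as copies of the (hypothetical) source weights.
source_enc, source_disc = 0.5, 1.0
target_enc, target_disc = source_enc, source_disc
target_enc, target_disc = adaptation_step(target_enc, target_disc, [1.0, 2.0])
target_disc, source_disc = synchronise(target_disc, source_disc)
```

    In a realistic setting both models would be neural networks trained with an automatic differentiation library; the scalar version only shows the direction of each update.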
