Multiscale Quantization for Fast Similarity Search

    Publication No.: US20230123941A1

    Publication date: 2023-04-20

    Application No.: US18081376

    Filing date: 2022-12-14

    Applicant: Google LLC

    IPC classification: G06F16/33 G06F16/31 G06N20/00

    Abstract: The present disclosure provides systems and methods that include or otherwise leverage use of a multiscale quantization model that is configured to provide a quantized dataset. In particular, the multiscale quantization model can receive and perform vector quantization of a first dataset. The multiscale quantization model can generate a residual dataset based at least in part on a result of the vector quantization. The multiscale quantization model can apply a rotation matrix to the residual dataset to generate a rotated residual dataset that includes a plurality of rotated residuals. The multiscale quantization model can perform reparameterization of each rotated residual in the rotated residual dataset into a direction component and a scale component. The multiscale quantization model can perform product quantization of the direction components of the plurality of rotated residuals, and perform scalar quantization of the scale components of the plurality of rotated residuals.
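    The pipeline described in this abstract can be illustrated with a small pure-Python toy. It is a sketch under stated assumptions, not the patented implementation: codebooks come from plain k-means, a random signed permutation (which is an orthogonal matrix) stands in for the learned rotation, and the scale is quantized uniformly between the smallest and largest observed scale. All function and parameter names (train, encode, decode, n_coarse, n_sub, n_codes, n_levels) are hypothetical. Decoding reconstructs x as coarse centroid + R^T(quantized scale * quantized direction).

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v, codebook):
    """Index of the codeword closest to v."""
    return min(range(len(codebook)), key=lambda j: sq_dist(v, codebook[j]))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def make_rotation(dim, seed=0):
    """Assumption: a random signed permutation replaces the learned rotation."""
    rng = random.Random(seed)
    perm = list(range(dim))
    rng.shuffle(perm)
    return perm, [rng.choice((-1.0, 1.0)) for _ in range(dim)]


def rotate(v, rot):
    perm, signs = rot
    return [signs[i] * v[perm[i]] for i in range(len(v))]


def unrotate(v, rot):
    """Inverse (= transpose) of rotate."""
    perm, signs = rot
    out = [0.0] * len(v)
    for i in range(len(v)):
        out[perm[i]] = signs[i] * v[i]
    return out


def split_residual(x, coarse, rot):
    """Coarse VQ, then rotate the residual and split it into direction and scale."""
    ci = nearest(x, coarse)
    z = rotate([a - b for a, b in zip(x, coarse[ci])], rot)
    s = math.sqrt(sum(t * t for t in z))
    d = [t / s for t in z] if s > 0 else [0.0] * len(z)
    return ci, d, s


def train(data, n_coarse=4, n_sub=2, n_codes=8, n_levels=16, seed=0):
    dim = len(data[0])
    assert dim % n_sub == 0, "dimension must split evenly into subspaces"
    sub = dim // n_sub
    coarse = kmeans(data, n_coarse)
    rot = make_rotation(dim, seed)
    parts = [split_residual(x, coarse, rot) for x in data]
    # Product quantization: one codebook per subspace of the unit directions.
    pq = [kmeans([d[m * sub:(m + 1) * sub] for _, d, _ in parts], n_codes)
          for m in range(n_sub)]
    scales = [s for _, _, s in parts]
    return {"coarse": coarse, "rot": rot, "pq": pq, "sub": sub,
            "lo": min(scales), "hi": max(scales), "levels": n_levels}


def encode(x, model):
    """Return (coarse id, PQ codes of the direction, scalar code of the scale)."""
    ci, d, s = split_residual(x, model["coarse"], model["rot"])
    sub = model["sub"]
    pq_codes = [nearest(d[m * sub:(m + 1) * sub], cb) for m, cb in enumerate(model["pq"])]
    step = (model["hi"] - model["lo"]) / (model["levels"] - 1)
    sq = 0 if step == 0 else min(model["levels"] - 1, max(0, round((s - model["lo"]) / step)))
    return ci, pq_codes, sq


def decode(codes, model):
    ci, pq_codes, sq = codes
    d = [v for m, j in enumerate(pq_codes) for v in model["pq"][m][j]]
    s = model["lo"] + sq * (model["hi"] - model["lo"]) / (model["levels"] - 1)
    r = unrotate([s * v for v in d], model["rot"])
    return [a + b for a, b in zip(model["coarse"][ci], r)]
```

    Separating scale from direction lets the product quantizer spend its codewords on unit-norm subvectors while a single cheap scalar code carries the residual's magnitude.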

    ENTERPRISE KNOWLEDGE ASSISTANT WITH PERMISSIONS-AWARE AUTOMATED RESPONSES

    Publication No.: US20230103076A1

    Publication date: 2023-03-30

    Application No.: US17489787

    Filing date: 2021-09-30

    Abstract: Methods and apparatuses for providing a real-time enterprise knowledge assistant that automatically responds to user comments and questions via a graphical user interface are described. The enterprise knowledge assistant may display automated responses to questions provided by users within a persistent chat channel (or other communications channel). The information displayed or referenced (e.g., via a linked electronic document) within an automated response to a user's factual question may be determined based on access rights to linked documents and the number of electronic interactions between users, such as the number of times that users co-edited or collaborated on documents (e.g., programming code). Upon detection that at least a portion of a user's message within a chat channel has been classified as a factual question, the enterprise knowledge assistant may access question and answer pairings stored within a frequently asked questions database and display an authorized answer.
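    A permissions-aware answer selection of the kind this abstract describes might look like the following minimal sketch. Every piece is an assumption for illustration: a crude "ends with ? or starts with a wh-word" heuristic replaces the question classifier, FAQ retrieval is plain word overlap, and co-edit counts between the asker and the answer's author serve only as a tie-breaker. FaqEntry, authorized_answer, acl and coedits are hypothetical names.

```python
from dataclasses import dataclass, field

WH_WORDS = {"what", "where", "when", "who", "how", "which", "why"}


@dataclass
class FaqEntry:
    question: str
    answer: str
    author: str
    linked_docs: set = field(default_factory=set)


def looks_like_question(message):
    """Crude stand-in for the factual-question classifier (assumption)."""
    words = message.strip().lower().split()
    return message.strip().endswith("?") or (bool(words) and words[0] in WH_WORDS)


def authorized_answer(message, faq, user, acl, coedits):
    """acl: doc id -> set of users allowed to read it.
    coedits: (user_a, user_b) -> number of documents they co-edited.
    Returns the best-matching answer whose linked docs the user may read."""
    if not looks_like_question(message):
        return None
    words = set(message.lower().strip().strip("?").split())
    candidates = []
    for e in faq:
        overlap = len(words & set(e.question.lower().strip("?").split()))
        if overlap == 0:
            continue
        # Only answers whose every linked document is readable by the user.
        if all(user in acl.get(d, set()) for d in e.linked_docs):
            strength = coedits.get((user, e.author), 0) + coedits.get((e.author, user), 0)
            candidates.append((overlap, strength, e.answer))
    return max(candidates)[2] if candidates else None
```

    Filtering on access rights before ranking means an answer that references a document the asker cannot open is never shown, regardless of how well it matches.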

    DNA alignment using a hierarchical inverted index table

    Publication No.: US11594301B2

    Publication date: 2023-02-28

    Application No.: US15331239

    Filing date: 2016-10-21

    Abstract: System and method for constructing a hierarchical index table usable for matching a search sequence to reference data. The index table may be constructed to contain entries associated with an exhaustive list of all subsequences of a given length, wherein each entry contains the number and locations of matches of each subsequence in the reference data. The hierarchical index table may be constructed in an iterative manner, wherein entries for each lengthened subsequence are selectively and iteratively constructed based on the number of matches being greater than each of a set of respective thresholds. The hierarchical index table may be used to search for matches between a search sequence and reference data, and to perform misfit identification and characterization upon each respective candidate match.
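    The hierarchical construction can be sketched as a nested dictionary. Assumptions, not taken from the patent: each refinement level lengthens the subsequence by exactly one base, only subsequences that actually occur are stored (an absent key means zero matches, rather than an explicitly exhaustive table), and "misfits" are simply the mismatching offsets of an ungapped alignment. build_index, candidates and align are hypothetical names.

```python
from collections import defaultdict


def build_index(ref, base_len, thresholds):
    """Level 0 maps each length-base_len substring of ref to its count and
    locations. An entry at depth i whose count exceeds thresholds[i] gets a
    child table keyed on substrings one base longer."""

    def level(positions, length, depth):
        table = defaultdict(list)
        for p in positions:
            if p + length <= len(ref):
                table[ref[p:p + length]].append(p)
        node = {}
        for key, locs in table.items():
            entry = {"count": len(locs), "locs": locs, "children": None}
            if depth < len(thresholds) and len(locs) > thresholds[depth]:
                entry["children"] = level(locs, length + 1, depth + 1)
            node[key] = entry
        return node

    return level(range(len(ref)), base_len, 0)


def candidates(index, query, base_len):
    """Descend while the query is long enough and a longer entry exists;
    return the locations of the deepest (most specific) entry reached."""
    entry = index.get(query[:base_len])
    if entry is None:
        return []
    length = base_len
    while entry["children"] is not None and length < len(query):
        child = entry["children"].get(query[:length + 1])
        if child is None:
            break
        entry, length = child, length + 1
    return entry["locs"]


def align(ref, index, query, base_len):
    """Score each candidate by its misfits (query offsets that mismatch)."""
    hits = []
    for pos in candidates(index, query, base_len):
        if pos + len(query) <= len(ref):
            window = ref[pos:pos + len(query)]
            misfits = [i for i, (a, b) in enumerate(zip(query, window)) if a != b]
            hits.append((pos, misfits))
    return sorted(hits, key=lambda h: len(h[1]))
```

    For example, with reference "ACGTACGAACGTTT", a base length of 3 and a single threshold of 1, "ACG" occurs three times and is therefore refined into "ACGT" (positions 0 and 8) and "ACGA" (position 4), while rare subsequences stay as single-level entries.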

    Database generation from natural language text documents

    Publication No.: US11580150B1

    Publication date: 2023-02-14

    Application No.: US17877321

    Filing date: 2022-07-29

    Applicant: Dsilo, Inc.

    Abstract: Some embodiments may perform operations of a process that includes obtaining a natural language text document and using a machine learning model to generate a set of attributes based on a set of machine-learning-model-generated classifications in the document. The process may include performing hierarchical data extraction operations to populate the attributes, where different machine learning models may be used in sequence. The process may include using a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model augmented with a pooling operation to determine a BERT output via a multi-channel transformer model to generate vectors on a per-sentence level or other per-text-section level. The process may include using a finer-grain model to extract quantitative or categorical values of interest, where the context of the per-sentence level may be retained for the finer-grain model.

    Data analytics systems and methods
    Granted invention patent

    Publication No.: US11580149B2

    Publication date: 2023-02-14

    Application No.: US17347271

    Filing date: 2021-06-14

    Abstract: Data analytics systems and methods are disclosed herein. A parser can parse reference data from various data sources to store in a data structure. An uploader can receive study data designated by a researcher and store the study data in the data structure. A matcher can compare analyte nameset data in the study data with analyte nameset data from the reference data to generate one or more links, each correlating an instance of an analyte in the study data with an instance of that analyte in the reference data. Library overlays each include one or more modules to access reference data to generate organized associations of reference data. A calculation engine can receive a selection of one or more library overlay(s) and manipulate the reference data and study data according to the organized associations of the selected library overlay(s) to generate configured data stored in a collection of data caches for presentation to a researcher via a user interface.
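    The matcher step can be pictured as a join on normalized names. This minimal sketch assumes that a "nameset" is a set of synonyms per analyte and that names are compared after lowercasing and dropping non-alphanumeric characters; the patent does not specify the comparison, and normalize and match are hypothetical names.

```python
import re


def normalize(name):
    """Hypothetical normalization: lowercase and drop non-alphanumerics."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def match(study, reference):
    """study, reference: dict of analyte id -> iterable of names (its nameset).
    Returns sorted (study_id, reference_id) links whose namesets share a name."""
    by_name = {}
    for ref_id, names in reference.items():
        for n in names:
            by_name.setdefault(normalize(n), set()).add(ref_id)
    links = set()
    for study_id, names in study.items():
        for n in names:
            for ref_id in by_name.get(normalize(n), ()):
                links.add((study_id, ref_id))
    return sorted(links)
```

    Indexing the reference namesets once keeps matching linear in the total number of names rather than comparing every study analyte against every reference analyte.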