Cached updatable top-k index
    Granted patent

    Publication number: US11327980B2

    Publication date: 2022-05-10

    Application number: US16854709

    Application date: 2020-04-21

    Inventor: Issei Yoshida

    Abstract: A method is provided that stores, in a secondary memory, an index structure including, for each given word from among words included in documents, a group of document IDs of documents including the given word. The method stores an index structure subset in a main memory, which is faster than the secondary memory. The method acquires a keyword and identifies any documents including the keyword. The method finds top-K frequent words among the words included in the identified documents by: identifying, for each given group in descending order of the number of document IDs therein, the number of document IDs of the identified documents in the given group, from the subset when the number of document IDs in the given group is within the range, and from the index structure otherwise; and presenting the words of the top-K groups with the largest number of identified document IDs.
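The lookup described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `build_cache` and `top_k_words`, the range bounds `lo` and `hi`, the use of plain dictionaries to stand in for secondary and main memory, the early-exit test, and the choice to skip the keyword itself are all assumptions made for the example.

```python
import heapq

def build_cache(index, lo, hi):
    # Hypothetical main-memory subset: keep only groups whose size is in [lo, hi].
    return {w: ids for w, ids in index.items() if lo <= len(ids) <= hi}

def top_k_words(keyword, k, index, cache, lo, hi):
    """index: word -> set of document IDs (stands in for secondary memory)."""
    if k <= 0 or keyword not in index:
        return []

    def postings(word):
        # Read from the cached subset when the group size is within the range,
        # and from the full index structure otherwise.
        size = len(index[word])
        return cache[word] if lo <= size <= hi else index[word]

    hits = postings(keyword)  # documents that include the keyword
    best = []                 # min-heap of (count, word), at most k entries
    # Visit groups in descending order of their number of document IDs.
    for word in sorted(index, key=lambda w: len(index[w]), reverse=True):
        if word == keyword:   # assumption: the keyword itself is not reported
            continue
        # Assumed optimization: a group's size bounds its overlap, so no
        # remaining group can beat the current k-th best count.
        if len(best) == k and len(index[word]) <= best[0][0]:
            break
        count = len(postings(word) & hits)
        if count == 0:
            continue
        if len(best) < k:
            heapq.heappush(best, (count, word))
        elif count > best[0][0]:
            heapq.heapreplace(best, (count, word))
    return sorted(best, key=lambda cw: (-cw[0], cw[1]))
```

Calling `top_k_words("cat", 2, index, build_cache(index, 2, 3), 2, 3)` returns `(count, word)` pairs ordered by descending count, with ties broken alphabetically.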

    Word grouping using a plurality of models

    Publication number: US11308274B2

    Publication date: 2022-04-19

    Application number: US16415576

    Application date: 2019-05-17

    Abstract: A computer-implemented method is provided. The method includes acquiring a seed word; calculating a similarity score of each of a plurality of words relative to the seed word for each of a plurality of models to calculate a weighted sum of similarity scores for each of the plurality of words; outputting a plurality of candidate words among the plurality of words; acquiring annotations indicating at least one of preferred words and non-preferred words among the plurality of the candidate words; updating weights of the plurality of models in a manner to cause weighted sums of similarity scores for the preferred words to be relatively larger than the weighted sums of the similarity scores for the non-preferred words, based on the annotations; and grouping the plurality of candidate words output based on the weighted sum of similarity scores calculated with updated weights of the plurality of models.
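The weighting step can be illustrated with a small sketch. The abstract does not state an update rule, so the one below (shift each model's weight toward the models that score preferred words higher than non-preferred ones, then renormalize) is an assumption, as are the function names and the learning rate `lr`.

```python
def weighted_scores(scores, weights):
    """scores: word -> list of per-model similarity scores relative to the seed word."""
    return {w: sum(wt * s for wt, s in zip(weights, per_model))
            for w, per_model in scores.items()}

def update_weights(scores, weights, preferred, non_preferred, lr=0.5):
    # Assumed rule: raise the weight of a model in proportion to how much
    # higher it scores preferred words than non-preferred words on average.
    def mean(words, i):
        return sum(scores[w][i] for w in words) / len(words) if words else 0.0

    raw = [max(0.0, wt + lr * (mean(preferred, i) - mean(non_preferred, i)))
           for i, wt in enumerate(weights)]
    total = sum(raw)
    # Renormalize so the weights sum to 1; keep the old weights if all collapse to 0.
    return [x / total for x in raw] if total > 0 else list(weights)

def rank_candidates(scores, weights):
    # Order candidate words by their weighted sum, highest first.
    ws = weighted_scores(scores, weights)
    return sorted(ws, key=lambda w: -ws[w])
```

After an update, re-running `rank_candidates` with the new weights gives the ordering from which a group of candidate words could be cut, for instance by a score threshold.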

    Estimating output confidence for black-box API

    Publication number: US20210326533A1

    Publication date: 2021-10-21

    Application number: US16853420

    Application date: 2020-04-20

    Abstract: A computer-implemented method is provided for estimating output confidence of a black box Application Programming Interface (API). The method includes generating paraphrases for an input text. The method further includes calculating a distance between the input text and each respective one of the paraphrases. The method also includes sorting the paraphrases in ascending order of the distance. The method additionally includes selecting a top predetermined number of the paraphrases. The method further includes inputting the input text and the selected paraphrases into the API to obtain an output confidence score for each of the input text and the selected paraphrases. The method also includes estimating, by a hardware processor, the output confidence of the input text from a robustness of output scores of the input text and the selected paraphrases.
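The steps in this abstract can be sketched as below. The abstract names neither the distance measure nor the robustness formula, so both are assumptions here: distance is taken as one minus the `difflib.SequenceMatcher` similarity ratio, and robustness as the mean score minus the population standard deviation. The `api` argument is a hypothetical callable standing in for the black-box API, and the paraphrases are assumed to be supplied by the caller.

```python
import difflib
import statistics

def distance(a, b):
    # Assumed distance: 1 - character-level similarity ratio.
    return 1.0 - difflib.SequenceMatcher(None, a, b).ratio()

def estimate_confidence(text, paraphrases, api, top_n=3):
    """api: hypothetical callable mapping a text to a confidence score in [0, 1]."""
    # Sort paraphrases in ascending order of distance and keep the closest top_n.
    nearest = sorted(paraphrases, key=lambda p: distance(text, p))[:top_n]
    # Query the API for the input text and each selected paraphrase.
    scores = [api(t) for t in [text] + nearest]
    # Assumed robustness measure: penalize the mean score by its spread,
    # so unstable outputs across near-identical inputs lower the estimate.
    return max(0.0, statistics.mean(scores) - statistics.pstdev(scores))
```

Under this measure, an API that returns the same score for the input and its close paraphrases keeps that score, while an API whose score swings between them is assigned a lower estimate.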

    System, method, and program for aggregating data

    Publication number: US10733218B2

    Publication date: 2020-08-04

    Application number: US14970741

    Application date: 2015-12-16

    IPC classification: G06F16/31

    Abstract: A system to reduce a required memory area (storage capacity) and save time and effort for updating target attributes in aggregation processing is disclosed. The system for aggregating data includes an index storing unit for storing DtoK indices arranged in a predetermined order, each of the indices specifying a list of attributes included in a target data item from identification information of the target data item, and a word list that is a list of attributes included in a plurality of the target data items, and an aggregation processing unit for finding, for each attribute, target data items including the attribute and executing aggregation processing for aggregating attributes whose relation with the target data items meets a predetermined standard. A link is created for each attribute in the word list for sequentially following an element in the index for each target data item, and target data items are found based thereon.
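One possible reading of the linked DtoK index is sketched below. It is an interpretation under stated assumptions, not the disclosed design: the DtoK index is modeled as a list of attribute lists indexed by data item ID, the word list as a `head` dictionary pointing at each attribute's first occurrence, and the links as a `next_link` dictionary chaining each occurrence to the next one. All names are hypothetical, and co-occurrence counting stands in for the unspecified aggregation standard.

```python
from collections import Counter

def build_links(dtok):
    """dtok: list of attribute lists; the list position is the data item ID."""
    head = {}       # attribute -> first (item, slot) where it occurs
    next_link = {}  # (item, slot) -> next (item, slot) holding the same attribute
    last = {}       # attribute -> most recent (item, slot) seen so far
    for item, attrs in enumerate(dtok):
        for slot, attr in enumerate(attrs):
            here = (item, slot)
            if attr in last:
                next_link[last[attr]] = here
            else:
                head[attr] = here
            last[attr] = here
    return head, next_link

def items_with(attr, head, next_link):
    # Follow the attribute's chain through the DtoK index, item by item.
    node, prev = head.get(attr), None
    while node is not None:
        if node[0] != prev:  # skip repeats of the attribute within one item
            yield node[0]
        prev = node[0]
        node = next_link.get(node)

def co_occurrence_counts(attr, dtok, head, next_link):
    # Assumed aggregation: count attributes that share a data item with attr.
    counts = Counter()
    for item in items_with(attr, head, next_link):
        counts.update(a for a in set(dtok[item]) if a != attr)
    return counts
```

In this sketch no separate attribute-to-item posting lists are stored; the items for an attribute are recovered by walking its chain, which is the memory trade-off the abstract appears to describe.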