Three-dimensional probabilistic data structure

    Publication No.: US11829398B2

    Publication Date: 2023-11-28

    Application No.: US16845921

    Filing Date: 2020-04-10

    Abstract: Techniques are disclosed relating to probabilistic data structures. A database node may maintain a probabilistic data structure capable of encoding database keys. The probabilistic data structure may include a plurality of levels that are each capable of storing an indication of a transition between successive characters in a database key. The database node may insert a particular database key into the probabilistic data structure, and the particular database key may comprise a series of characters. The inserting may include setting, for each transition between successive characters of the series of characters, an indication in a corresponding level of the plurality of levels that is indicative of that transition. The database node may further maintain lineage information specifying one or more lineages that correspond to the transition.
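
Illustrative sketch (not taken from the patent): one way to picture the level-per-transition idea in the abstract above is a three-dimensional bitmap indexed by level, source character bucket, and destination character bucket. The class name `TransitionFilter`, the `levels` and `buckets` parameters, and the modulo bucketing are all assumptions made for illustration; lineage information is not modeled.

```python
class TransitionFilter:
    """Toy bitmap of character transitions per level (hypothetical design)."""

    def __init__(self, levels=8, buckets=16):
        self.levels = levels
        self.buckets = buckets
        # bits[level][src][dst] is True once that transition has been inserted.
        self.bits = [
            [[False] * buckets for _ in range(buckets)] for _ in range(levels)
        ]

    def _bucket(self, ch):
        # Assumed bucketing: fold the code point into a small range.
        return ord(ch) % self.buckets

    def _transitions(self, key):
        # The i-th pair of successive characters maps to level i.
        # Pairs beyond the last level are ignored (an assumption).
        for level, (src, dst) in enumerate(zip(key, key[1:])):
            if level >= self.levels:
                break
            yield level, self._bucket(src), self._bucket(dst)

    def insert(self, key):
        for level, src, dst in self._transitions(key):
            self.bits[level][src][dst] = True

    def may_contain(self, key):
        # False means definitely absent; True means possibly present.
        # Keys with fewer than two characters have no transitions and pass.
        return all(
            self.bits[level][src][dst]
            for level, src, dst in self._transitions(key)
        )


f = TransitionFilter()
f.insert("apple")
print(f.may_contain("apple"), f.may_contain("angle"))
```

As with other probabilistic structures, a negative answer is definite while a positive answer can be a false positive, because different keys may share every bucketed transition.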

    RANKING EXPLANATORY VARIABLES IN MULTIVARIATE ANALYSIS

    Publication No.: US20230252067A1

    Publication Date: 2023-08-10

    Application No.: US17650534

    Filing Date: 2022-02-10

    Abstract: A computer-implemented method, a computer program product, and a computer system for ranking explanatory variables in multivariate analysis. A computer system extracts words from documents related to categories, creates a histogram of the words in each category, and selects the top words in each histogram, where the top words are used as representative words for each category. The computer system generates respective feature vectors of explanatory variable candidates and a feature vector of an objective variable, where the feature vector of a given variable includes one element per category and the value of each element indicates whether the name of that variable is included in the top words of that category. The computer system calculates the cosine similarity between each of the respective feature vectors of the explanatory variable candidates and the feature vector of the objective variable. The computer system ranks the explanatory variable candidates based on the cosine similarity.
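
Illustrative sketch (not taken from the patent): the ranking step in the abstract above can be pictured as binary feature vectors compared by cosine similarity. The function names, the sample categories, and the sample words are hypothetical, and the top words per category are assumed to have been selected already.

```python
import math


def feature_vector(name, top_words_by_category):
    # One element per category: 1.0 if the variable name is a top word there.
    return [
        1.0 if name in words else 0.0
        for words in top_words_by_category.values()
    ]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Treat an all-zero vector as having no similarity to anything.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(candidates, objective, top_words_by_category):
    target = feature_vector(objective, top_words_by_category)
    scored = [
        (name, cosine(feature_vector(name, top_words_by_category), target))
        for name in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical top words per category.
top_words = {
    "finance": {"revenue", "price", "cost"},
    "marketing": {"revenue", "campaign", "price"},
    "logistics": {"cost", "shipping"},
}
ranked = rank_candidates(["shipping", "cost", "price"], "revenue", top_words)
print(ranked)
```

In this sample, `price` shares both of the objective variable's categories, `cost` shares one, and `shipping` shares none, so they rank in that order.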

    DOCUMENT SEARCH DEVICE, DOCUMENT SEARCH SYSTEM, DOCUMENT SEARCH PROGRAM, AND DOCUMENT SEARCH METHOD

    Publication No.: US20230229683A1

    Publication Date: 2023-07-20

    Application No.: US18002105

    Filing Date: 2021-07-19

    Applicant: SHOWA DENKO K.K.

    IPC Classification: G06F16/33

    CPC Classification: G06F16/3346 G06F16/3334

    Abstract: To improve precision while maintaining a balance between accuracy and comprehensiveness of a document search. According to one embodiment of the present invention, a document search device includes: an input reception unit configured to receive an input of a keyword for a document search; a document search unit configured to acquire, from a document, a hit character string matching a character string in which a portion of the characters of the keyword is replaced with a wildcard, together with the character strings before and after the hit character string, and to compute a likelihood of the hit character string based on the hit character string and the character strings before and after it; and a search result display unit configured to output a result of the document search based on the likelihood.
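
Illustrative sketch (not taken from the patent): the wildcard matching described in the abstract above can be approximated with regular expressions in which one keyword character at a time becomes a wildcard. The function names and the `context` parameter are hypothetical. The abstract does not say how the likelihood is computed, so the score here is a placeholder (the share of characters that equal the keyword); the surrounding strings are collected but not used in the score.

```python
import re


def wildcard_patterns(keyword):
    # Replace each single character of the keyword with "." in turn.
    for i in range(len(keyword)):
        yield re.escape(keyword[:i]) + "." + re.escape(keyword[i + 1:])


def search(document, keyword, context=10):
    hits = {}
    for pattern in wildcard_patterns(keyword):
        for m in re.finditer(pattern, document):
            hit = m.group(0)
            before = document[max(0, m.start() - context):m.start()]
            after = document[m.end():m.end() + context]
            # Placeholder likelihood: fraction of positions equal to the keyword.
            score = sum(a == b for a, b in zip(hit, keyword)) / len(keyword)
            # Key on position and text so repeated matches are stored once.
            hits[(m.start(), hit)] = (hit, before, after, score)
    return sorted(hits.values(), key=lambda h: h[3], reverse=True)


doc = "The polymer and the polyner samples were tested."
for hit, before, after, score in search(doc, "polymer"):
    print(repr(before), hit, repr(after), round(score, 2))
```

A fuller version would feed `before` and `after` into the scoring function, since the abstract states that the likelihood depends on the surrounding character strings as well as the hit.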

    Searching data repositories using pictograms and machine learning

    Publication No.: US11663256B2

    Publication Date: 2023-05-30

    Application No.: US17348362

    Filing Date: 2021-06-15

    Applicant: Kyndryl, Inc.

    Abstract: A pictogram repository is created of pictograms including expressions that are mapped to at least a portion of source code stored in a separate source code repository. A score is recorded for the developers of the source code stored in the source code repository. A source code search inquiry using at least one pictogram as a search query element is conducted, in which the at least one pictogram is matched to the pictograms in the pictogram repository. For matching source code, the developer's score is checked against a threshold value. Source code that meets the search query elements and whose developer's score meets the threshold value is retrieved.
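
Illustrative sketch (not taken from the patent): the retrieval flow in the abstract above can be pictured as a lookup from pictograms to source code entries followed by a developer-score filter. All names, the dictionary layout, and the sample data are hypothetical, and the machine learning component mentioned in the title is not modeled.

```python
def search_by_pictograms(query, pictogram_index, source_repo,
                         developer_scores, threshold):
    # Keep only the source entries mapped to every pictogram in the query.
    matching = None
    for pictogram in query:
        ids = pictogram_index.get(pictogram, set())
        matching = set(ids) if matching is None else matching & ids
    results = []
    for source_id in sorted(matching or ()):
        developer = source_repo[source_id]["developer"]
        # Retrieve only code whose developer meets the score threshold.
        if developer_scores.get(developer, 0.0) >= threshold:
            results.append(source_id)
    return results


# Hypothetical repositories; pictograms are written as text tokens here.
pictogram_index = {
    ":lock:": {"auth.py", "crypto.py"},
    ":globe:": {"auth.py", "http.py"},
}
source_repo = {
    "auth.py": {"developer": "dev_a"},
    "crypto.py": {"developer": "dev_b"},
    "http.py": {"developer": "dev_a"},
}
developer_scores = {"dev_a": 0.9, "dev_b": 0.4}

print(search_by_pictograms([":lock:"], pictogram_index, source_repo,
                           developer_scores, threshold=0.5))
```

Here `crypto.py` matches the `:lock:` pictogram but is filtered out because its developer's score falls below the threshold.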