Learning thematic similarity metric from article text units

    Publication No.: US10831793B2

    Publication Date: 2020-11-10

    Application No.: US16167552

    Filing Date: 2018-10-23

    Abstract: A method of estimating the thematic similarity of sentences, comprising: receiving a corpus of a plurality of documents describing a plurality of topics, where each document comprises a plurality of sentences arranged in a plurality of sections; constructing sentence triplets for at least some of the sentences, each sentence triplet comprising a respective sentence, a respective positive sentence selected randomly from the section comprising the respective sentence, and a respective negative sentence selected randomly from another section; training a first neural network with the sentence triplets to identify sentence-sentence vectors mapping each sentence with a shorter distance to its respective positive sentence than to its respective negative sentence; and outputting the first neural network for estimating thematic similarity between a pair of sentences by computing a distance between the sentence-sentence vectors produced for each sentence of the pair by the first neural network.
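The triplet construction step in the abstract can be illustrated with a minimal Python sketch. The function name `build_triplets`, the list-of-sections document representation, and the choice to draw the negative from another section of the same document are assumptions made for illustration; the abstract only states that the positive comes from the same section and the negative from another section.

```python
import random


def build_triplets(document, rng):
    """Build (sentence, positive, negative) triplets for one document.

    `document` is a list of sections; each section is a list of sentences.
    This representation is hypothetical, chosen only for this sketch.
    """
    triplets = []
    for s_idx, section in enumerate(document):
        # Candidate negatives: every sentence from any other section.
        others = [sent
                  for j, sec in enumerate(document) if j != s_idx
                  for sent in sec]
        # A positive needs a second sentence in the same section.
        if len(section) < 2 or not others:
            continue
        for a_idx, anchor in enumerate(section):
            same_section = [s for k, s in enumerate(section) if k != a_idx]
            positive = rng.choice(same_section)  # random, same section
            negative = rng.choice(others)        # random, another section
            triplets.append((anchor, positive, negative))
    return triplets
```

Passing a seeded `random.Random` instance keeps the sampling reproducible, which may help when comparing training runs.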

    METHOD FOR FINDING FAILING COMPONENTS IN A LARGE DISTRIBUTED STORAGE SYSTEM CONNECTIVITY

    Publication No.: US20200374182A1

    Publication Date: 2020-11-26

    Application No.: US16419058

    Filing Date: 2019-05-22

    IPC Classification: H04L12/24 H04L12/26 H04L29/08

    Abstract: Embodiments of the present systems and methods may provide techniques for finding failing components in a distributed storage system. For example, a method may comprise: measuring problems and health of a plurality of physical and logical components in a distributed storage system, the plurality of physical and logical components forming nodes of the distributed storage system; generating a graph of the nodes organized in a plurality of hierarchical levels; generating, for each node in the graph, a score summarizing the measured problems and health of the node; determining a highest score at a highest hierarchical level of the graph; and determining the associated node as a failing component at a most significant level.
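The scoring and selection steps described in the abstract can be sketched as follows. Everything concrete here is an assumption for illustration: the `Node` class, the convention that level 0 is the highest hierarchical level, the use of a single `problems` count as the measurement, and the rule that a node's score is its own count plus the scores of its children. The abstract does not specify how the score is computed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    level: int          # 0 = highest hierarchical level (assumed convention)
    problems: int = 0   # hypothetical per-node problem measurement
    children: list = field(default_factory=list)


def score(node):
    """Assumed scoring rule: own problems plus all descendants' problems."""
    return node.problems + sum(score(child) for child in node.children)


def failing_component(nodes):
    """Return the highest-scoring node at the highest hierarchical level."""
    top_level = min(node.level for node in nodes)
    candidates = [node for node in nodes if node.level == top_level]
    return max(candidates, key=score)


# Toy topology: two racks, each with one host; one disk reports problems.
disk1 = Node("disk1", level=2, problems=5)
disk2 = Node("disk2", level=2)
host_a = Node("hostA", level=1, problems=1, children=[disk1, disk2])
host_b = Node("hostB", level=1)
rack1 = Node("rack1", level=0, children=[host_a])
rack2 = Node("rack2", level=0, problems=1, children=[host_b])
all_nodes = [rack1, rack2, host_a, host_b, disk1, disk2]

worst = failing_component(all_nodes)
```

With this toy data, `rack1` is selected because the problems measured on its descendants roll up into its score.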

    LEARNING THEMATIC SIMILARITY METRIC FROM ARTICLE TEXT UNITS

    Publication No.: US20200125673A1

    Publication Date: 2020-04-23

    Application No.: US16167552

    Filing Date: 2018-10-23

    Abstract: A method of estimating the thematic similarity of sentences, comprising: receiving a corpus of a plurality of documents describing a plurality of topics, where each document comprises a plurality of sentences arranged in a plurality of sections; constructing sentence triplets for at least some of the sentences, each sentence triplet comprising a respective sentence, a respective positive sentence selected randomly from the section comprising the respective sentence, and a respective negative sentence selected randomly from another section; training a first neural network with the sentence triplets to identify sentence-sentence vectors mapping each sentence with a shorter distance to its respective positive sentence than to its respective negative sentence; and outputting the first neural network for estimating thematic similarity between a pair of sentences by computing a distance between the sentence-sentence vectors produced for each sentence of the pair by the first neural network.
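The training objective in the abstract (a shorter distance to the positive sentence than to the negative one) is commonly expressed as a triplet margin loss. The sketch below assumes that loss, Euclidean distance, and a margin of 1.0; the abstract names none of these, and the vectors are toy values standing in for the network's output.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero when the positive is closer than the negative by at least `margin`.

    The margin formulation is an assumption, not taken from the abstract.
    """
    return max(0.0,
               euclidean(anchor, positive)
               - euclidean(anchor, negative)
               + margin)


# Toy sentence vectors (hypothetical values, not real model output).
anchor, positive, negative = [0.0, 0.0], [0.0, 1.0], [3.0, 4.0]
satisfied = triplet_loss(anchor, positive, negative)   # positive is closer
violated = triplet_loss(anchor, negative, positive)    # roles swapped
```

Once a network is trained, the same `euclidean` function applied to the vectors of any two sentences would serve as the thematic distance between them.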

    LEARNING SENTIMENT COMPOSITION FROM SENTIMENT LEXICONS

    Publication No.: US20200065716A1

    Publication Date: 2020-02-27

    Application No.: US16191478

    Filing Date: 2018-11-15

    IPC Classification: G06N99/00 G06F17/18

    Abstract: A method including: generating, from a text corpus, a lexicon of unigrams and bigrams comprising an embedding for each of said unigrams and bigrams; training a machine learning classifier on a training set comprising a subset of said lexicon, wherein each of said unigrams and bigrams in said subset has a sentiment label; applying said machine learning classifier to said lexicon, to (i) predict a sentiment of each of said unigrams and bigrams, and (ii) update said lexicon with the predicted sentiments; and performing statistical analysis on said updated lexicon, to extract one or more sentiment composition lexicons, wherein each of said one or more sentiment composition lexicons is associated with a sentiment composition class.
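The pipeline in the abstract (train on a labeled subset, predict over the whole lexicon, then analyze the result) can be sketched with toy data. The nearest-centroid classifier, the two-dimensional embeddings, the `pos`/`neg` labels, and the pattern counting used as the "statistical analysis" are all assumptions for illustration; the abstract does not specify the classifier, the embedding method, or the composition classes.

```python
from collections import Counter


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def train(labeled, embeddings):
    """Stand-in classifier: one centroid per sentiment label."""
    by_label = {}
    for term, label in labeled.items():
        by_label.setdefault(label, []).append(embeddings[term])
    return {label: centroid(vs) for label, vs in by_label.items()}


def predict(model, vector):
    """Label of the nearest centroid."""
    return min(model, key=lambda label: sq_dist(model[label], vector))


def composition_patterns(embeddings, model):
    """Predict sentiment for every term, then count, for each bigram,
    the pattern (first word, second word, bigram) of predicted sentiments."""
    sentiments = {t: predict(model, v) for t, v in embeddings.items()}
    patterns = Counter()
    for term, bigram_sentiment in sentiments.items():
        words = term.split()
        if len(words) == 2 and all(w in sentiments for w in words):
            key = (sentiments[words[0]], sentiments[words[1]], bigram_sentiment)
            patterns[key] += 1
    return sentiments, patterns


# Hypothetical lexicon with toy embeddings; only two terms carry labels.
embeddings = {
    "good": [1.0, 0.0], "great": [0.9, 0.1],
    "bad": [-1.0, 0.0], "awful": [-0.9, -0.1],
    "not": [-0.2, 0.0], "not good": [-0.8, 0.0],
}
labeled = {"good": "pos", "bad": "neg"}

model = train(labeled, embeddings)
sentiments, patterns = composition_patterns(embeddings, model)
```

In this toy run, the bigram "not good" falls into the pattern where a positive second word yields a negative bigram, which is the kind of regularity a sentiment composition lexicon could group into a class.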