-
Publication No.: US12093645B2
Publication Date: 2024-09-17
Application No.: US17474364
Filing Date: 2021-09-14
Inventors: Eyal Shnarch, Ariel Gera, Alon Halfon, Lena Dankin, Leshem Choshen, Ranit Aharonov, Noam Slonim
IPC Classification: G06F40/279, G10L25/30
CPC Classification: G06F40/279, G10L25/30
Abstract: An example system includes a processor to pre-train a transformer-based language model on a general domain. The processor can inter-train the pre-trained transformer-based language model using partitioning and classification to generate an inter-trained transformer-based pre-trained language model. The processor can then fine-tune the inter-trained transformer-based pre-trained language model on a target task to generate a fine-tuned transformer-based language model.
-
Publication No.: US10831793B2
Publication Date: 2020-11-10
Application No.: US16167552
Filing Date: 2018-10-23
Inventors: Ranit Aharonov, Liat Ein Dor, Alon Halfon, Yosi Mass, Ilya Shnayderman, Noam Slonim, Elad Venezian
Abstract: A method of estimating a thematic similarity of sentences, comprising receiving a corpus of a plurality of documents describing a plurality of topics where each document comprises a plurality of sentences arranged in a plurality of sections, constructing sentence triplets for at least some of the sentences, each sentence triplet comprising a respective sentence, a respective positive sentence selected randomly from the section comprising the respective sentence and a respective negative sentence selected randomly from another section, training a first neural network with the sentence triplets to identify sentence-sentence vectors mapping each sentence with a shorter distance to its respective positive sentence compared to the distance to its respective negative sentence and outputting the first neural network for estimating thematic similarity between a pair of sentences by computing a distance between the sentence-sentence vectors produced for each sentence of the pair by the first neural network.
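Illustrative note (not part of the patent record): the triplet construction described in this abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented implementation. The function names `build_triplets` and `triplet_margin_loss`, the representation of a document as a list of sections, the Euclidean distance, and the `margin` value are all hypothetical choices made for illustration; the abstract only requires that the positive sentence end up closer to the anchor than the negative one.

```python
import math
import random


def build_triplets(document, seed=0):
    """Build (sentence, positive, negative) triplets from one document.

    `document` is assumed to be a list of sections, each a list of sentences.
    The positive is drawn at random from the same section as the sentence,
    the negative at random from a different section.
    """
    rng = random.Random(seed)
    triplets = []
    for i, section in enumerate(document):
        # Non-empty sections other than the current one supply negatives.
        other_sections = [s for j, s in enumerate(document) if j != i and s]
        for k, sentence in enumerate(section):
            # Every other sentence in the same section is a candidate positive.
            positives = [s for m, s in enumerate(section) if m != k]
            if not positives or not other_sections:
                continue  # no valid triplet for this sentence
            positive = rng.choice(positives)
            negative = rng.choice(rng.choice(other_sections))
            triplets.append((sentence, positive, negative))
    return triplets


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (an assumed training objective).

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

In a full system the vectors passed to `triplet_margin_loss` would come from the neural network being trained; here they are plain lists of floats so the sketch stays self-contained.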
-
Publication No.: US11308419B2
Publication Date: 2022-04-19
Application No.: US16191478
Filing Date: 2018-11-15
Inventors: Ranit Aharonov, Roy Bar-Haim, Alon Halfon, Charles Arthur Jochim, Amir Menczel, Noam Slonim, Orith Toledo-Ronen
IPC Classification: G06N20/00, G06F17/18, G06F40/30, G06F40/284
Abstract: A method including: generating, from a text corpus, a lexicon of unigrams and bigrams comprising an embedding for each of said unigrams and bigrams; training a machine learning classifier on a training set comprising a subset of said lexicon, wherein each of said unigrams and bigrams in said subset has a sentiment label; applying said machine learning classifier to said lexicon, to (i) predict a sentiment of each of said unigrams and bigrams, and (ii) update said lexicon with the predicted sentiments; and performing statistical analysis on said updated lexicon, to extract one or more sentiment composition lexicons, wherein each of said one or more sentiment composition lexicons is associated with a sentiment composition class.
-
Publication No.: US20200374182A1
Publication Date: 2020-11-26
Application No.: US16419058
Filing Date: 2019-05-22
Inventors: Elliot Karl Kolodner, Anna Levin, Alon Halfon
Abstract: Embodiments of the present systems and methods may provide techniques for finding failing components in a distributed storage system. For example, a method may comprise measuring problems and health of a plurality of physical and logical components in a distributed storage system, the plurality of physical and logical components forming nodes of the distributed storage system, and generating a graph of the nodes organized in a plurality of hierarchical levels, generating, for each node in the graph, a score summarizing the measured problems and health of the node, determining a highest score at a highest hierarchical level of the graph and determining the associated node as a failing component at a most significant level.
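Illustrative note (not part of the patent record): one possible reading of the scoring step in this abstract is sketched below in Python. Everything specific here is an assumption made for illustration: the `Node` fields, the convention that level 0 is the highest hierarchical level, the scoring formula in `node_score`, and the `threshold` used to decide whether a level contains a failing component. The abstract does not specify any of these.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    level: int      # assumption: 0 is the highest hierarchical level
    problems: int   # assumption: count of measured problems
    health: float   # assumption: 1.0 is fully healthy, 0.0 is failed


def node_score(node):
    """Hypothetical score: more problems and lower health give a higher score."""
    return node.problems + (1.0 - node.health)


def find_failing_component(nodes, threshold=1.0):
    """Return the highest-scoring node at the highest level that looks unhealthy.

    Levels are visited from the top of the hierarchy downward. The first level
    whose worst node reaches `threshold` yields the reported component.
    Returns None if no node reaches the threshold.
    """
    for level in sorted({n.level for n in nodes}):
        candidates = [n for n in nodes if n.level == level]
        worst = max(candidates, key=node_score)
        if node_score(worst) >= threshold:
            return worst
    return None
```

For example, given a healthy cluster node at level 0, two rack nodes at level 1, and a disk node at level 2, the function reports the worst rack rather than the worst disk, because the rack sits at a more significant level of the hierarchy.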
-
Publication No.: US20200125673A1
Publication Date: 2020-04-23
Application No.: US16167552
Filing Date: 2018-10-23
Inventors: Ranit Aharonov, Liat Ein Dor, Alon Halfon, Yosi Mass, Ilya Shnayderman, Noam Slonim, Elad Venezian
Abstract: A method of estimating a thematic similarity of sentences, comprising receiving a corpus of a plurality of documents describing a plurality of topics where each document comprises a plurality of sentences arranged in a plurality of sections, constructing sentence triplets for at least some of the sentences, each sentence triplet comprising a respective sentence, a respective positive sentence selected randomly from the section comprising the respective sentence and a respective negative sentence selected randomly from another section, training a first neural network with the sentence triplets to identify sentence-sentence vectors mapping each sentence with a shorter distance to its respective positive sentence compared to the distance to its respective negative sentence and outputting the first neural network for estimating thematic similarity between a pair of sentences by computing a distance between the sentence-sentence vectors produced for each sentence of the pair by the first neural network.
-
Publication No.: US20200065716A1
Publication Date: 2020-02-27
Application No.: US16191478
Filing Date: 2018-11-15
Inventors: Ranit Aharonov, Roy Bar-Haim, Alon Halfon, Charles Arthur Jochim, Amir Menczel, Noam Slonim, Orith Toledo-Ronen
Abstract: A method including: generating, from a text corpus, a lexicon of unigrams and bigrams comprising an embedding for each of said unigrams and bigrams; training a machine learning classifier on a training set comprising a subset of said lexicon, wherein each of said unigrams and bigrams in said subset has a sentiment label; applying said machine learning classifier to said lexicon, to (i) predict a sentiment of each of said unigrams and bigrams, and (ii) update said lexicon with the predicted sentiments; and performing statistical analysis on said updated lexicon, to extract one or more sentiment composition lexicons, wherein each of said one or more sentiment composition lexicons is associated with a sentiment composition class.