Unsupervised topic modeling for short texts
    Granted invention patent (in force)

    Publication number: US09575952B2

    Publication date: 2017-02-21

    Application number: US14519427

    Filing date: 2014-10-21

    CPC classification number: G06F17/2715 G06F17/2785 G10L25/30 H04W4/14

    Abstract: Topics are determined for short text messages using an unsupervised topic model. In a training corpus created from a number of short text messages, a vocabulary of words is identified, and for each word a distributed vector representation is obtained by processing windows of the corpus having a fixed length. The corpus is modeled as a Gaussian mixture model in which Gaussian components represent topics. To determine a topic of a sample short text message, a posterior distribution over the corpus topics is obtained using the Gaussian mixture model.
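
The inference step in this abstract, a posterior over topics from a Gaussian mixture fitted on word vectors, can be sketched roughly as follows. This is a minimal illustration, not the patented method: the 2-D word vectors, mixture weights, means, and variances are toy values standing in for a trained model, and the names `log_gaussian_diag` and `topic_posterior` are hypothetical. It also assumes diagonal covariances and that every word of a message is drawn independently from one topic component, which the abstract does not state.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def topic_posterior(message, word_vectors, weights, means, variances):
    """Posterior over topics for one message; out-of-vocabulary words are skipped."""
    vectors = [word_vectors[w] for w in message.lower().split() if w in word_vectors]
    log_scores = []
    for pi_k, mu_k, var_k in zip(weights, means, variances):
        # Assumption: words are independent given the topic component.
        score = math.log(pi_k)
        for vec in vectors:
            score += log_gaussian_diag(vec, mu_k, var_k)
        log_scores.append(score)
    # Normalize in log space (log-sum-exp) to avoid underflow.
    top = max(log_scores)
    exps = [math.exp(s - top) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy stand-ins for learned parameters (assumed, not from the patent).
WORD_VECTORS = {
    "goal": (1.0, 0.1), "match": (0.9, 0.0), "team": (1.1, -0.1),
    "vote": (-1.0, 0.1), "election": (-0.9, 0.0), "senate": (-1.1, -0.1),
}
WEIGHTS = [0.5, 0.5]                      # mixture weights, one per topic
MEANS = [(1.0, 0.0), (-1.0, 0.0)]         # topic 0 ~ sports, topic 1 ~ politics
VARIANCES = [(0.1, 0.1), (0.1, 0.1)]      # diagonal covariances

posterior = topic_posterior("great goal in the match",
                            WORD_VECTORS, WEIGHTS, MEANS, VARIANCES)
print(posterior)
```

With these toy values the message is assigned almost entirely to the first component. A message containing no in-vocabulary words falls back to the mixture weights.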

    System and method for enriching spoken language translation with dialog acts

    Publication number: US09501470B2

    Publication date: 2016-11-22

    Application number: US13761549

    Filing date: 2013-02-07

    CPC classification number: G06F17/28 G06F17/279 G06F17/289

    Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for enriching spoken language translation with dialog acts. The method includes receiving a source speech signal, tagging dialog acts associated with the received source speech signal using a classification model, dialog acts being domain independent descriptions of an intended action a speaker carries out by uttering the source speech signal, producing an enriched hypothesis of the source speech signal incorporating the dialog act tags, and outputting a natural language response of the enriched hypothesis in a target language. Tags can be grouped into sets such as statement, acknowledgement, abandoned, agreement, question, appreciation, and other. The step of producing an enriched translation of the source speech signal uses a dialog act specific translation model containing a phrase translation table.

    System and method for unsupervised text normalization using distributed representation of words

    Publication number: US20230075113A1

    Publication date: 2023-03-09

    Application number: US18055338

    Filing date: 2022-11-14

    Abstract: A system, method and computer-readable storage devices for providing unsupervised normalization of noisy text using distributed representation of words. The system receives, from a social media forum, a word having a non-canonical spelling in a first language. The system determines a context of the word in the social media forum, identifies the word in a vector space model, and selects “n-best” vector paths in the vector space model, where the n-best vector paths are neighbors to the vector space path based on the context and the non-canonical spelling. The system can then select, based on a similarity cost, a best path from the n-best vector paths and identify a word associated with the best path as the canonical version.
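
One loose way to picture the n-best selection and similarity cost described in this abstract is sketched below. It is an illustration under stated assumptions rather than the claimed method: the 3-D vectors are toy values standing in for context-trained distributed representations, neighbors are ranked by cosine similarity, and the "similarity cost" is assumed here to be an equal-weight blend of cosine distance and normalized edit distance. The names `normalize`, `n`, and `alpha` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def normalize(word, vectors, canonical, n=3, alpha=0.5):
    """Pick a canonical form: n nearest neighbors first, then lowest blended cost."""
    query = vectors[word]
    n_best = sorted(canonical,
                    key=lambda c: cosine(query, vectors[c]),
                    reverse=True)[:n]

    def cost(candidate):
        # Assumed cost: semantic distance blended with spelling distance.
        semantic = 1.0 - cosine(query, vectors[candidate])
        lexical = edit_distance(word, candidate) / max(len(word), len(candidate))
        return alpha * semantic + (1 - alpha) * lexical

    return min(n_best, key=cost)

# Toy vectors (assumed); a real system would learn these from forum text.
VECTORS = {
    "tmrw": (0.9, 0.1, 0.0),
    "tomorrow": (1.0, 0.0, 0.0),
    "today": (0.8, 0.3, 0.0),
    "yesterday": (0.7, 0.4, 0.1),
    "banana": (0.0, 0.1, 1.0),
}
CANONICAL = ["tomorrow", "today", "yesterday", "banana"]

result = normalize("tmrw", VECTORS, CANONICAL)
print(result)
```

Restricting the cost comparison to the n nearest neighbors keeps spelling similarity from promoting a word that is lexically close but semantically unrelated.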

    System and method for unsupervised text normalization using distributed representation of words

    Publication number: US10671807B2

    Publication date: 2020-06-02

    Application number: US16139192

    Filing date: 2018-09-24

    Abstract: A system, method and computer-readable storage devices for providing unsupervised normalization of noisy text using distributed representation of words. The system receives, from a social media forum, a word having a non-canonical spelling in a first language. The system determines a context of the word in the social media forum, identifies the word in a vector space model, and selects “n-best” vector paths in the vector space model, where the n-best vector paths are neighbors to the vector space path based on the context and the non-canonical spelling. The system can then select, based on a similarity cost, a best path from the n-best vector paths and identify a word associated with the best path as the canonical version.

    Unsupervised Topic Modeling For Short Texts
    Invention application

    Publication number: US20180157639A1

    Publication date: 2018-06-07

    Application number: US15888385

    Filing date: 2018-02-05

    CPC classification number: G06F17/2715 G06F17/2785 G10L25/30 H04W4/14

    Abstract: Topics are determined for short text messages using an unsupervised topic model. In a training corpus created from a number of short text messages, a vocabulary of words is identified, and for each word a distributed vector representation is obtained by processing windows of the corpus having a fixed length. The corpus is modeled as a Gaussian mixture model in which Gaussian components represent topics. To determine a topic of a sample short text message, a posterior distribution over the corpus topics is obtained using the Gaussian mixture model.
