-
Publication number: US09575952B2
Publication date: 2017-02-21
Application number: US14519427
Application date: 2014-10-21
Applicant: AT&T Intellectual Property I, L.P.
Inventor: Vivek Kumar Rangarajan Sridhar
IPC: G06F17/27
CPC classification number: G06F17/2715 , G06F17/2785 , G10L25/30 , H04W4/14
Abstract: Topics are determined for short text messages using an unsupervised topic model. In a training corpus created from a number of short text messages, a vocabulary of words is identified, and for each word a distributed vector representation is obtained by processing windows of the corpus having a fixed length. The corpus is modeled as a Gaussian mixture model in which Gaussian components represent topics. To determine a topic of a sample short text message, a posterior distribution over the corpus topics is obtained using the Gaussian mixture model.
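The last step of the abstract, obtaining a posterior distribution over topics from a Gaussian mixture model, can be sketched as below. This is only an illustration under stated assumptions, not the patented method: the two-dimensional word vectors, the two fitted components (weight, mean, diagonal variance), and the choice to treat a message's in-vocabulary words as independent draws from a single topic are all hypothetical.

```python
import math

# Hypothetical 2-D word vectors; a real system would learn these from
# fixed-length windows of the training corpus.
WORD_VECTORS = {
    "flight": (0.9, 0.1),
    "delay": (0.8, 0.2),
    "gate": (1.0, 0.0),
    "pizza": (0.1, 0.9),
    "dinner": (0.2, 0.8),
}

# Hypothetical fitted mixture: (weight, mean, diagonal variance) per topic.
TOPICS = [
    (0.5, (0.9, 0.1), (0.05, 0.05)),
    (0.5, (0.1, 0.9), (0.05, 0.05)),
]


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def topic_posterior(message):
    """Posterior over topics for one message.

    Assumption: in-vocabulary words are independent draws from one topic,
    so their log-likelihoods add up per component.
    """
    vectors = [WORD_VECTORS[w] for w in message.lower().split() if w in WORD_VECTORS]
    if not vectors:
        # No known words: fall back to the normalized mixture weights.
        total_weight = sum(weight for weight, _, _ in TOPICS)
        return [weight / total_weight for weight, _, _ in TOPICS]
    scores = [
        math.log(weight) + sum(log_gaussian(v, mean, var) for v in vectors)
        for weight, mean, var in TOPICS
    ]
    peak = max(scores)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(s - peak) for s in scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]
```

Calling `topic_posterior("flight delay at gate")` returns a list with one probability per topic; the most probable topic is the index of the largest entry.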
-
Publication number: US09501470B2
Publication date: 2016-11-22
Application number: US13761549
Application date: 2013-02-07
Applicant: AT&T Intellectual Property I, L.P.
Inventor: Srinivas Bangalore , Vivek Kumar Rangarajan Sridhar
CPC classification number: G06F17/28 , G06F17/279 , G06F17/289
Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for enriching spoken language translation with dialog acts. The method includes receiving a source speech signal, tagging dialog acts associated with the received source speech signal using a classification model, dialog acts being domain independent descriptions of an intended action a speaker carries out by uttering the source speech signal, producing an enriched hypothesis of the source speech signal incorporating the dialog act tags, and outputting a natural language response of the enriched hypothesis in a target language. Tags can be grouped into sets such as statement, acknowledgement, abandoned, agreement, question, appreciation, and other. The step of producing an enriched translation of the source speech signal uses a dialog act specific translation model containing a phrase translation table.
-
13.
Publication number: US20230075113A1
Publication date: 2023-03-09
Application number: US18055338
Application date: 2022-11-14
Applicant: AT&T Intellectual Property I, L.P.
Inventor: Vivek Kumar Rangarajan Sridhar
IPC: G06F40/232 , G06F40/58
Abstract: A system, method and computer-readable storage devices for providing unsupervised normalization of noisy text using distributed representation of words. The system receives, from a social media forum, a word having a non-canonical spelling in a first language. The system determines a context of the word in the social media forum, identifies the word in a vector space model, and selects “n-best” vector paths in the vector space model, where the n-best vector paths are neighbors to the vector space path based on the context and the non-canonical spelling. The system can then select, based on a similarity cost, a best path from the n-best vector paths and identify a word associated with the best path as the canonical version.
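The two-stage selection in this abstract (an n-best list of vector-space neighbors, then a best candidate chosen by a similarity cost) can be sketched roughly as follows. Everything concrete here is an assumption for illustration: the three-dimensional vectors, the `CANONICAL` word list, cosine similarity as the neighbor measure, and a cost that combines vector similarity with a string-similarity ratio from Python's `difflib`. The abstract does not specify any of these, and candidates are simplified to single words rather than "vector paths".

```python
import math
from difflib import SequenceMatcher

# Hypothetical embeddings; a real system would learn them from forum text.
VECTORS = {
    "tmrw": (0.90, 0.10, 0.20),
    "tomorrow": (0.88, 0.12, 0.22),
    "today": (0.80, 0.20, 0.10),
    "tonight": (0.85, 0.15, 0.30),
    "banana": (0.05, 0.90, 0.40),
}
# Hypothetical list of canonical spellings.
CANONICAL = {"tomorrow", "today", "tonight", "banana"}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def n_best(word, n=3):
    """Return the n canonical words closest to `word` in vector space."""
    query = VECTORS[word]
    scored = [(cosine(query, VECTORS[c]), c) for c in CANONICAL]
    return sorted(scored, reverse=True)[:n]


def normalize(word, n=3):
    """Pick the n-best neighbor with the lowest similarity cost."""
    if word in CANONICAL or word not in VECTORS:
        return word  # already canonical, or nothing known about it

    def cost(item):
        vector_similarity, candidate = item
        lexical = SequenceMatcher(None, word, candidate).ratio()
        # Assumed cost: low when both similarities are high.
        return 1.0 - vector_similarity * lexical

    return min(n_best(word, n), key=cost)[1]
```

With these toy vectors, `normalize("tmrw")` picks `"tomorrow"`, because it is both a close vector neighbor and the closest match in spelling.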
-
14.
Publication number: US10671807B2
Publication date: 2020-06-02
Application number: US16139192
Application date: 2018-09-24
Applicant: AT&T Intellectual Property I, L.P.
Inventor: Vivek Kumar Rangarajan Sridhar
IPC: G06F40/232 , G06F40/58 , G06Q50/00
Abstract: A system, method and computer-readable storage devices for providing unsupervised normalization of noisy text using distributed representation of words. The system receives, from a social media forum, a word having a non-canonical spelling in a first language. The system determines a context of the word in the social media forum, identifies the word in a vector space model, and selects “n-best” vector paths in the vector space model, where the n-best vector paths are neighbors to the vector space path based on the context and the non-canonical spelling. The system can then select, based on a similarity cost, a best path from the n-best vector paths and identify a word associated with the best path as the canonical version.
-
Publication number: US20180157639A1
Publication date: 2018-06-07
Application number: US15888385
Application date: 2018-02-05
Applicant: AT&T Intellectual Property I, L.P.
Inventor: Vivek Kumar Rangarajan Sridhar
CPC classification number: G06F17/2715 , G06F17/2785 , G10L25/30 , H04W4/14
Abstract: Topics are determined for short text messages using an unsupervised topic model. In a training corpus created from a number of short text messages, a vocabulary of words is identified, and for each word a distributed vector representation is obtained by processing windows of the corpus having a fixed length. The corpus is modeled as a Gaussian mixture model in which Gaussian components represent topics. To determine a topic of a sample short text message, a posterior distribution over the corpus topics is obtained using the Gaussian mixture model.
-