Patent search ap:("AT&T INTELLECTUAL PROPERTY I, L.P.") AND inv:"Vivek Kumar Rangarajan Sridhar"

11.

Granted patent
Unsupervised topic modeling for short texts (Status: Active)

Publication number: US09575952B2

Publication date: 2017-02-21

Application number: US14519427

Filing date: 2014-10-21

Applicant: AT&T Intellectual Property I, L.P.

Inventor: Vivek Kumar Rangarajan Sridhar

IPC: G06F17/27

CPC classification number: G06F17/2715 , G06F17/2785 , G10L25/30 , H04W4/14

Abstract: Topics are determined for short text messages using an unsupervised topic model. In a training corpus created from a number of short text messages, a vocabulary of words is identified, and for each word a distributed vector representation is obtained by processing windows of the corpus having a fixed length. The corpus is modeled as a Gaussian mixture model in which Gaussian components represent topics. To determine a topic of a sample short text message, a posterior distribution over the corpus topics is obtained using the Gaussian mixture model.
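
The posterior step in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the patent: the toy 2-D word vectors, the hand-set mixture parameters, the diagonal covariances, and the simplification that all in-vocabulary words of a message are scored under one shared topic.

```python
import math

# Hypothetical toy 2-D word vectors; a real system would learn them from
# fixed-length windows of the training corpus.
WORD_VECTORS = {
    "goal": (1.0, 0.9),
    "match": (0.9, 1.1),
    "vote": (-1.0, -0.8),
    "senate": (-1.1, -1.0),
}

# Hypothetical mixture parameters per topic: (weight, mean, diagonal variance).
# In practice these would be fitted to the corpus, for example with EM.
TOPICS = {
    "sports": (0.5, (1.0, 1.0), (0.2, 0.2)),
    "politics": (0.5, (-1.0, -1.0), (0.2, 0.2)),
}


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def topic_posterior(message):
    """Posterior over topics for one short message.

    Assumption: every in-vocabulary word is scored under the same topic,
    so per-word log-likelihoods are summed for each component.
    Out-of-vocabulary words are ignored.
    """
    vectors = [WORD_VECTORS[w] for w in message.lower().split() if w in WORD_VECTORS]
    log_scores = {}
    for name, (weight, mean, var) in TOPICS.items():
        log_scores[name] = math.log(weight) + sum(
            log_gaussian(v, mean, var) for v in vectors
        )
    # Normalize in log space to avoid underflow.
    peak = max(log_scores.values())
    total = sum(math.exp(s - peak) for s in log_scores.values())
    return {name: math.exp(s - peak) / total for name, s in log_scores.items()}


print(topic_posterior("great goal in the match"))
```

A message with no in-vocabulary words falls back to the mixture weights, since no likelihood terms are added.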

12.

Granted patent
System and method for enriching spoken language translation with dialog acts (Status: Active)

Publication number: US09501470B2

Publication date: 2016-11-22

Application number: US13761549

Filing date: 2013-02-07

Applicant: AT&T Intellectual Property I, L.P.

Inventor: Srinivas Bangalore, Vivek Kumar Rangarajan Sridhar

IPC: G06F17/28 , G06F17/27

CPC classification number: G06F17/28 , G06F17/279 , G06F17/289

Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for enriching spoken language translation with dialog acts. The method includes receiving a source speech signal, tagging dialog acts associated with the received source speech signal using a classification model, dialog acts being domain independent descriptions of an intended action a speaker carries out by uttering the source speech signal, producing an enriched hypothesis of the source speech signal incorporating the dialog act tags, and outputting a natural language response of the enriched hypothesis in a target language. Tags can be grouped into sets such as statement, acknowledgement, abandoned, agreement, question, appreciation, and other. The step of producing an enriched translation of the source speech signal uses a dialog act specific translation model containing a phrase translation table.
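
The idea of a dialog-act-specific phrase translation table can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the patented method: the input is already-recognized text rather than a speech signal, the tagger `tag_dialog_act` is a hypothetical rule-based stand-in for the classification model, only two of the tag sets are covered, and the English-to-German phrase tables are invented for the example.

```python
# Hypothetical phrase tables, one per dialog act (English to German).
PHRASE_TABLES = {
    "statement": {"you are ready": "du bist bereit."},
    "question": {"you are ready": "bist du bereit?"},
}

# Hypothetical cue words that mark an utterance as a question.
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how", "are", "is", "do"}


def tag_dialog_act(utterance, rising_intonation=False):
    """Toy stand-in for the dialog act classifier.

    `rising_intonation` is an assumed prosodic flag that a real system
    would derive from the source speech signal.
    """
    words = utterance.lower().split()
    if rising_intonation or (words and words[0] in QUESTION_WORDS):
        return "question"
    return "statement"


def translate(utterance, rising_intonation=False):
    """Tag the utterance, then look it up in the table for that dialog act."""
    act = tag_dialog_act(utterance, rising_intonation)
    table = PHRASE_TABLES[act]
    # Fall back to the untranslated text when the phrase is not in the table.
    return act, table.get(utterance.lower(), utterance)


print(translate("you are ready"))
print(translate("you are ready", rising_intonation=True))
```

The same source words map to different target phrases depending on the tag, which is the point of keeping a separate table per dialog act.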

13.

Patent application
SYSTEM AND METHOD FOR UNSUPERVISED TEXT NORMALIZATION USING DISTRIBUTED REPRESENTATION OF WORDS (Status: Active)

Publication number: US20230075113A1

Publication date: 2023-03-09

Application number: US18055338

Filing date: 2022-11-14

Applicant: AT&T Intellectual Property I, L.P.

Inventor: Vivek Kumar Rangarajan Sridhar

IPC: G06F40/232 , G06F40/58

Abstract: A system, method and computer-readable storage devices for providing unsupervised normalization of noisy text using distributed representation of words. The system receives, from a social media forum, a word having a non-canonical spelling in a first language. The system determines a context of the word in the social media forum, identifies the word in a vector space model, and selects “n-best” vector paths in the vector space model, where the n-best vector paths are neighbors to the vector space path based on the context and the non-canonical spelling. The system can then select, based on a similarity cost, a best path from the n-best vector paths and identify a word associated with the best path as the canonical version.
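
The n-best selection followed by a similarity cost can be sketched as below. The abstract does not specify the cost function, so this sketch makes its own assumptions: hypothetical toy 3-D embeddings, cosine similarity to pick the n nearest canonical neighbors, and a cost that adds vector distance to a spelling distance computed with the standard-library `difflib.SequenceMatcher`.

```python
import math
from difflib import SequenceMatcher

# Hypothetical toy embeddings; a real system would learn them from forum text.
EMBEDDINGS = {
    "tmrw": (0.9, 0.1, 0.2),
    "tomorrow": (0.88, 0.12, 0.22),
    "today": (0.8, 0.2, 0.1),
    "later": (0.7, 0.3, 0.3),
    "banana": (0.0, 1.0, 0.0),
}

# Hypothetical list of canonical (in-dictionary) words.
CANONICAL = ["tomorrow", "today", "later", "banana"]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def n_best_neighbors(word, n=3):
    """Return the n canonical words closest to `word` in the vector space."""
    query = EMBEDDINGS[word]
    scored = [(cosine(query, EMBEDDINGS[c]), c) for c in CANONICAL]
    return sorted(scored, reverse=True)[:n]


def normalize(word, n=3):
    """Pick the candidate with the lowest combined cost (an assumed cost)."""

    def cost(item):
        similarity, candidate = item
        spelling = SequenceMatcher(None, word, candidate).ratio()
        # Vector distance plus spelling distance; lower is better.
        return (1.0 - similarity) + (1.0 - spelling)

    return min(n_best_neighbors(word, n), key=cost)[1]


print(normalize("tmrw"))
```

Restricting the rerank to the n nearest neighbors keeps the spelling comparison cheap, since only a handful of candidates are scored per noisy word.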

14.

Granted patent
System and method for unsupervised text normalization using distributed representation of words (Status: Under examination, published)

Publication number: US10671807B2

Publication date: 2020-06-02

Application number: US16139192

Filing date: 2018-09-24

Applicant: AT&T Intellectual Property I, L.P.

Inventor: Vivek Kumar Rangarajan Sridhar

IPC: G06F40/232 , G06F40/58 , G06Q50/00

Abstract: A system, method and computer-readable storage devices for providing unsupervised normalization of noisy text using distributed representation of words. The system receives, from a social media forum, a word having a non-canonical spelling in a first language. The system determines a context of the word in the social media forum, identifies the word in a vector space model, and selects “n-best” vector paths in the vector space model, where the n-best vector paths are neighbors to the vector space path based on the context and the non-canonical spelling. The system can then select, based on a similarity cost, a best path from the n-best vector paths and identify a word associated with the best path as the canonical version.

15.

Patent application
Unsupervised Topic Modeling For Short Texts (Status: Under examination, published)

Publication number: US20180157639A1

Publication date: 2018-06-07

Application number: US15888385

Filing date: 2018-02-05

Applicant: AT&T Intellectual Property I, L.P.

Inventor: Vivek Kumar Rangarajan Sridhar

IPC: G06F17/27 , H04W4/14 , G10L25/30

CPC classification number: G06F17/2715 , G06F17/2785 , G10L25/30 , H04W4/14

Abstract: Topics are determined for short text messages using an unsupervised topic model. In a training corpus created from a number of short text messages, a vocabulary of words is identified, and for each word a distributed vector representation is obtained by processing windows of the corpus having a fixed length. The corpus is modeled as a Gaussian mixture model in which Gaussian components represent topics. To determine a topic of a sample short text message, a posterior distribution over the corpus topics is obtained using the Gaussian mixture model.
