Patent search: ap:("Geoffrey G. Zweig" OR "Yun-Cheng Ju") AND inv:"Geoffrey G. Zweig", page 1

1.

Invention application
SEARCHING A DATABASE OF LISTINGS (In force)

Publication (announcement) number: US20080281806A1

Publication (announcement) date: 2008-11-13

Application number: US11746847

Filing date: 2007-05-10

Applicant(s): Ye-Yi Wang, Dong Yu, Yun-Cheng Ju, Alejandro Acero, Geoffrey G. Zweig

Inventor(s): Ye-Yi Wang, Dong Yu, Yun-Cheng Ju, Alejandro Acero, Geoffrey G. Zweig

IPC classification: G06F17/30

CPC classification: G06F17/30663, G06F3/0641, G06F17/3069, G10L15/187, G10L15/197

Abstract: A database having listings rather than long documents is searched using a term frequency-inverse document frequency (Tf/Idf) algorithm.
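The abstract names Tf/Idf search over short listings but gives no further detail. The following is a minimal sketch of that general idea only, not the patented method: the tokenizer, the smoothed IDF formula, the cosine scoring, and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for short listings.
    return text.lower().split()


def build_index(listings):
    """Return (idf, vectors): IDF weights and one TF-IDF vector per listing."""
    docs = [Counter(tokenize(listing)) for listing in listings]
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(doc.keys())  # document frequency: one count per listing
    # Assumption: smoothed IDF, so every indexed term keeps a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return idf, vectors


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query, listings, idf, vectors, top_k=3):
    """Rank listings by cosine similarity to the query; drop zero scores."""
    counts = Counter(tokenize(query))
    q = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scored = sorted(
        ((cosine(q, v), listing) for v, listing in zip(vectors, listings)),
        reverse=True,
    )
    return [(listing, score) for score, listing in scored[:top_k] if score > 0]


listings = ["Joe's Pizza Seattle", "Pizza Hut Bellevue", "Seattle Public Library"]
idf, vectors = build_index(listings)
results = search("seattle library", listings, idf, vectors)
```

With these three hypothetical listings, the query shares two terms with "Seattle Public Library" and one with "Joe's Pizza Seattle", so the library listing ranks first and the listing with no shared terms is dropped.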

2.

Granted invention patent
Searching a database of listings (In force)

Publication (announcement) number: US09218412B2

Publication (announcement) date: 2015-12-22

Application number: US11746847

Filing date: 2007-05-10

Applicant(s): Ye-Yi Wang, Dong Yu, Yun-Cheng Ju, Alejandro Acero, Geoffrey G. Zweig

Inventor(s): Ye-Yi Wang, Dong Yu, Yun-Cheng Ju, Alejandro Acero, Geoffrey G. Zweig

IPC classification: G06F7/00, G06F17/30, G06F3/06, G10L15/187, G10L15/197

CPC classification: G06F17/30663, G06F3/0641, G06F17/3069, G10L15/187, G10L15/197

Abstract: A database having listings rather than long documents is searched using a term frequency-inverse document frequency (Tf/Idf) algorithm.

3.

Invention application
Automated Data Cleanup (In force)

Publication (announcement) number: US20100076752A1

Publication (announcement) date: 2010-03-25

Application number: US12561521

Filing date: 2009-09-17

Applicant(s): Geoffrey G. Zweig, Yun-Cheng Ju

Inventor(s): Geoffrey G. Zweig, Yun-Cheng Ju

IPC classification: G06F17/21, G10L15/26

CPC classification: G10L15/063, G06F17/2735, G10L15/187

Abstract: The described implementations relate to automated data cleanup. One system includes a language model generated from language model seed text and a dictionary of possible data substitutions. This system also includes a transducer configured to cleanse a corpus utilizing the language model and the dictionary.

4.

Granted invention patent
Structured models of repetition for speech recognition (In force)

Publication (announcement) number: US08965765B2

Publication (announcement) date: 2015-02-24

Application number: US12233826

Filing date: 2008-09-19

Applicant(s): Geoffrey G. Zweig, Xiao Li, Dan Bohus, Alejandro Acero, Eric J. Horvitz

Inventor(s): Geoffrey G. Zweig, Xiao Li, Dan Bohus, Alejandro Acero, Eric J. Horvitz

IPC classification: G10L15/00, G10L15/18

CPC classification: G10L15/1822

Abstract: Described is a technology by which a structured model of repetition is used to determine the words spoken by a user, and/or a corresponding database entry, based in part on a prior utterance. For a repeated utterance, a joint probability analysis is performed on (at least some of) the corresponding word sequences (as recognized by one or more recognizers) and associated acoustic data. For example, a generative probabilistic model or a maximum entropy model may be used in the analysis. The second utterance may be a repetition of the first utterance using the exact words, or another structural transformation thereof relative to the first utterance, such as an extension that adds one or more words, a truncation that removes one or more words, or a whole or partial spelling of one or more words.

5.

Invention application
SPEECH RECOGNITION UTILIZING MULTITUDE OF SPEECH FEATURES (Under examination, published)

Publication (announcement) number: US20080312921A1

Publication (announcement) date: 2008-12-18

Application number: US12195123

Filing date: 2008-08-20

Applicant(s): Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuging Gao, Rameah A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig

Inventor(s): Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuging Gao, Rameah A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig

IPC classification: G10L15/00, G10L15/04

CPC classification: G10L15/063, G10L15/02, G10L15/14, G10L2015/085

Abstract: In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.
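As a rough illustration of the log-linear posterior the abstract describes, the sketch below scores each candidate linguistic unit as a weighted sum of its observed features and normalizes the exponentiated scores. The feature names, weights, and the `posterior` function are hypothetical; the abstract does not specify any of them.

```python
import math


def posterior(weights, features_by_unit):
    """Log-linear posterior over candidate units.

    weights: feature name -> learned weight (hypothetical values below).
    features_by_unit: unit -> {feature name: observed value}.
    A feature absent for a unit simply contributes nothing to its score,
    which is one way to read "not all features need to appear".
    """
    scores = {
        unit: sum(weights.get(name, 0.0) * value for name, value in feats.items())
        for unit, feats in features_by_unit.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {unit: math.exp(s - top) for unit, s in scores.items()}
    z = sum(exps.values())      # normalizer over all candidate units
    return {unit: e / z for unit, e in exps.items()}


# Hypothetical weights and observations, for illustration only.
weights = {"acoustic": 1.0, "lm": 0.5}
candidates = {
    "cat": {"acoustic": 2.0, "lm": 1.0},
    "cap": {"acoustic": 2.0},  # the "lm" feature is missing here
}
probs = posterior(weights, candidates)
```

Here "cat" scores 2.5 and "cap" scores 2.0, so "cat" receives the larger share of the probability mass and the two posteriors sum to one.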

6.

Invention application
DETERMINING SYNONYM-ANTONYM POLARITY IN TERM VECTORS (Under examination, published)

Publication (announcement) number: US20140067368A1

Publication (announcement) date: 2014-03-06

Application number: US13597277

Filing date: 2012-08-29

Applicant(s): Wen-tau Yih, Geoffrey G. Zweig, John C. Platt

Inventor(s): Wen-tau Yih, Geoffrey G. Zweig, John C. Platt

IPC classification: G06F17/27

CPC classification: G06F17/2795, G06F16/3338, G06F17/2785

Abstract: A document-term matrix may be generated based on a corpus. A term representation matrix may be generated based on modifying a plurality of elements of the document-term matrix based on antonym information included in the corpus. Similarities may be determined based on a plurality of elements of the term representation matrix.

7.

Granted invention patent
Method for clustering closely resembling data objects (In force)

Publication (announcement) number: US06349296B1

Publication (announcement) date: 2002-02-19

Application number: US09642017

Filing date: 2000-08-21

Applicant(s): Andrei Z. Broder, Steven C. Glassman, Charles G. Nelson, Mark S. Manasse, Geoffrey G. Zweig

Inventor(s): Andrei Z. Broder, Steven C. Glassman, Charles G. Nelson, Mark S. Manasse, Geoffrey G. Zweig

IPC classification: G06F17/30

CPC classification: G06F17/3071, Y10S707/99932, Y10S707/99933, Y10S707/99935, Y10S707/99944

Abstract: A computer-implemented method determines the resemblance of data objects such as Web pages. Each data object is partitioned into a sequence of tokens. The tokens are grouped into overlapping sets of the tokens to form shingles. Each shingle is represented by a unique identification element encoded as a fingerprint. A minimum element from each of the images of the set of fingerprints associated with a document under each of a plurality of pseudo random permutations of the set of all fingerprints is selected to generate a sketch of each data object. The sketches characterize the resemblance of the data objects. The sketches can be further partitioned into a plurality of groups. Each group is fingerprinted to form a feature. Data objects that share more than a certain number of features are estimated to be nearly identical.
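The steps in this abstract (tokens, shingles, fingerprints, minimum elements under several permutations, grouped features) can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: seeded BLAKE2b hashes stand in for both the fingerprints and the pseudo-random permutations, and the shingle width, number of permutations, group size, and threshold are arbitrary example values.

```python
import hashlib


def fingerprint(data: bytes, seed: int = 0) -> int:
    # Assumption: a salted 64-bit BLAKE2b digest serves as fingerprint and,
    # with a different seed per "permutation", as a pseudo-random ordering.
    h = hashlib.blake2b(data, digest_size=8, salt=seed.to_bytes(8, "little"))
    return int.from_bytes(h.digest(), "little")


def shingles(text: str, w: int = 3) -> set:
    """Overlapping runs of w consecutive tokens."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + w]) for i in range(max(1, len(tokens) - w + 1))}


def sketch(text: str, num_perms: int = 24, w: int = 3) -> list:
    """Keep the minimum hashed shingle under each seeded hash."""
    encoded = [s.encode() for s in shingles(text, w)]
    return [min(fingerprint(s, seed) for s in encoded) for seed in range(num_perms)]


def resemblance(sk_a: list, sk_b: list) -> float:
    """Fraction of sketch positions that agree; estimates shingle-set overlap."""
    return sum(x == y for x, y in zip(sk_a, sk_b)) / len(sk_a)


def features(sk: list, group_size: int = 4) -> list:
    """Partition the sketch into groups and fingerprint each group."""
    return [
        fingerprint(repr(sk[i:i + group_size]).encode(), seed=i // group_size)
        for i in range(0, len(sk), group_size)
    ]


def nearly_identical(text_a: str, text_b: str, min_shared: int = 2) -> bool:
    fa, fb = features(sketch(text_a)), features(sketch(text_b))
    return sum(x == y for x, y in zip(fa, fb)) >= min_shared


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "patent listings describe speech recognition and database search methods"
same = nearly_identical(doc_a, doc_a)
different = nearly_identical(doc_a, doc_b)
```

Because a feature matches only when every minimum in its group matches, requiring several shared features is a much stricter test than comparing individual sketch positions, which fits the abstract's "nearly identical" criterion.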

8.

Invention application
THREE-DIMENSIONAL OBJECT BROWSING IN DOCUMENTS (In force)

Publication (announcement) number: US20140037218A1

Publication (announcement) date: 2014-02-06

Application number: US13567105

Filing date: 2012-08-06

Applicant(s): Geoffrey G. Zweig, Eric J. Stollnitz, Richard Szeliski, Sudipta Sinha, Johannes Kopf

Inventor(s): Geoffrey G. Zweig, Eric J. Stollnitz, Richard Szeliski, Sudipta Sinha, Johannes Kopf

IPC classification: G06K9/68

CPC classification: G06F17/30268

Abstract: A document that includes a representation of a two-dimensional (2-D) image may be obtained. A selection indicator indicating a selection of at least a portion of the 2-D image may be obtained. A match correspondence may be determined between the selected portion of the 2-D image and a three-dimensional (3-D) image object stored in an object database, the match correspondence based on a web crawler analysis result. A 3-D rendering of the 3-D image object that corresponds to the selected portion of the 2-D image may be initiated.

9.

Invention application
STRUCTURED MODELS OF REPITITION FOR SPEECH RECOGNITION (In force)

Publication (announcement) number: US20100076765A1

Publication (announcement) date: 2010-03-25

Application number: US12233826

Filing date: 2008-09-19

Applicant(s): Geoffrey G. Zweig, Xiao Li, Dan Bohus, Alejandro Acero, Eric J. Horvitz

Inventor(s): Geoffrey G. Zweig, Xiao Li, Dan Bohus, Alejandro Acero, Eric J. Horvitz

IPC classification: G10L15/00

CPC classification: G10L15/1822

Abstract: Described is a technology by which a structured model of repetition is used to determine the words spoken by a user, and/or a corresponding database entry, based in part on a prior utterance. For a repeated utterance, a joint probability analysis is performed on (at least some of) the corresponding word sequences (as recognized by one or more recognizers) and associated acoustic data. For example, a generative probabilistic model or a maximum entropy model may be used in the analysis. The second utterance may be a repetition of the first utterance using the exact words, or another structural transformation thereof relative to the first utterance, such as an extension that adds one or more words, a truncation that removes one or more words, or a whole or partial spelling of one or more words.

10.

Granted invention patent
Automatic construction of unique signatures and confusable sets for database access (In force)

Publication (announcement) number: US07251599B2

Publication (announcement) date: 2007-07-31

Application number: US10315411

Filing date: 2002-12-10

Applicant(s): Benoit Maison, Geoffrey G. Zweig

Inventor(s): Benoit Maison, Geoffrey G. Zweig

IPC classification: G10L15/02

CPC classification: G10L15/18

Abstract: Methods and arrangements for facilitating database access in speech recognition. A plurality of possible subsequences corresponding to a database entry are ascertained, a record of such subsequences and their correspondence to database entries is created, and either or both of the following are carried out: unique signatures are ascertained via determining whether a subsequence corresponding to a given database entry does not also correspond to at least one other database entry; and/or multiple occurrences of a given subsequence are found, with corresponding database entries being grouped into a confusion set.
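The bookkeeping this abstract describes can be sketched as a mapping from subsequences to the database entries that contain them. The sketch below assumes, purely for illustration, that a "subsequence" is a contiguous run of lowercase words within an entry; the abstract does not define the term, and all function names are hypothetical.

```python
from collections import defaultdict


def subsequences(entry: str) -> set:
    # Assumption: subsequences are contiguous word runs of the entry.
    words = entry.lower().split()
    return {
        " ".join(words[i:j])
        for i in range(len(words))
        for j in range(i + 1, len(words) + 1)
    }


def build_record(entries):
    """Map each subsequence to the set of entries it corresponds to."""
    record = defaultdict(set)
    for entry in entries:
        for sub in subsequences(entry):
            record[sub].add(entry)
    return record


def unique_signatures(record):
    """Subsequences that correspond to exactly one entry, grouped by entry."""
    signatures = defaultdict(set)
    for sub, matched in record.items():
        if len(matched) == 1:
            (entry,) = matched
            signatures[entry].add(sub)
    return dict(signatures)


def confusion_sets(record):
    """Subsequences shared by several entries, with the entries they confuse."""
    return {sub: matched for sub, matched in record.items() if len(matched) > 1}


entries = ["John Smith", "John Doe"]  # hypothetical database entries
record = build_record(entries)
signatures = unique_signatures(record)
confusable = confusion_sets(record)
```

For these two entries, "smith" and "john smith" identify the first entry uniquely, while "john" alone matches both and therefore places them in one confusion set.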
