Lattice-based unsupervised maximum likelihood linear regression for speaker adaptation
    1.
    Granted invention patent
    Legal status: In force

    Publication number: US07216077B1

    Publication date: 2007-05-08

    Application number: US09670251

    Filing date: 2000-09-26

    IPC classification: G10L15/06 G10L15/14

    CPC classification: G10L15/065

    Abstract: Methods and arrangements using lattice-based information for unsupervised speaker adaptation. By performing adaptation against a word lattice, correct models are more likely to be used in estimating a transform. Further, a particular type of lattice proposed herein enables the use of a natural confidence measure given by the posterior occupancy probability of a state; that is, the statistics of a particular state are updated with the current frame only if the a posteriori probability of the state at that particular time is greater than a predetermined threshold.
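    The confidence rule in the abstract (a frame updates a state's statistics only when that state's posterior exceeds a threshold) can be illustrated with a short Python sketch. It assumes the state posterior occupancy probabilities have already been computed by a forward-backward pass over the lattice, and it shows only the thresholded accumulation; estimating the MLLR transform from these statistics is not shown. The function name, the default threshold of 0.9, and the choice to weight each selected frame by its posterior are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

def accumulate_thresholded_stats(posteriors, features, threshold=0.9):
    """Accumulate per-state occupancy and first-order statistics,
    keeping only (frame, state) pairs whose posterior exceeds `threshold`.

    posteriors: (T, S) state occupancy probabilities per frame, assumed to
                come from a forward-backward pass over a word lattice.
    features:   (T, D) acoustic feature vectors.
    Returns (occupancy, first_order) with shapes (S,) and (S, D).
    """
    posteriors = np.asarray(posteriors, dtype=float)
    features = np.asarray(features, dtype=float)
    # Confidence mask: a frame contributes to a state only when that state's
    # posterior at that frame is above the threshold.
    mask = posteriors > threshold
    # Assumption: selected frames are weighted by their posterior, as in
    # standard MLLR statistics; unselected pairs contribute nothing.
    weights = np.where(mask, posteriors, 0.0)   # (T, S)
    occupancy = weights.sum(axis=0)             # (S,)
    first_order = weights.T @ features          # (S, D)
    return occupancy, first_order
```

    With posteriors `[[0.95, 0.05], [0.6, 0.4], [0.1, 0.9]]` and a threshold of 0.8, the second frame is ambiguous and updates no state at all, which is the behavior the abstract describes.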

    Information extraction from documents with regular expression matching
    4.
    Granted invention patent
    Legal status: In force

    Publication number: US06842796B2

    Publication date: 2005-01-11

    Application number: US09898289

    Filing date: 2001-07-03

    IPC classification: G06F17/27 G10L15/00 G06F3/00

    Abstract: Techniques are provided for enumerating regularly identifiable or stereotypical phrases that people commonly use to convey particular information, and for specifying where exactly in these phrases that information is to be found. In one embodiment, such phrases are referred to as “regular expressions.” Using the enumerated phrases, the invention can automatically identify them in an input data stream and then identify and extract the information being sought (e.g., important or relevant information) that is associated with each phrase.
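    The phrase catalogue described above maps naturally onto regular expressions with named groups: the pattern encodes the stereotypical phrasing, and the named group marks where in that phrasing the sought information sits. The Python sketch below uses the standard `re` module; the two fields (`phone`, `time`) and their phrasings are hypothetical examples, not patterns taken from the patent.

```python
import re

# Hypothetical catalogue of stereotypical phrases. In each pattern the named
# group "value" marks where the sought information appears in the phrase.
PHRASE_PATTERNS = {
    "phone": [
        re.compile(r"(?:call me|reach me) at (?P<value>\d{3}[-. ]\d{4})", re.IGNORECASE),
        re.compile(r"my number is (?P<value>\d{3}[-. ]\d{4})", re.IGNORECASE),
    ],
    "time": [
        re.compile(r"(?:meet|see you) at (?P<value>\d{1,2}(?::\d{2})?\s?(?:am|pm))",
                   re.IGNORECASE),
    ],
}

def extract_information(text):
    """Scan text with every catalogued phrase; return (field, value) pairs."""
    found = []
    for field, patterns in PHRASE_PATTERNS.items():
        for pattern in patterns:
            for match in pattern.finditer(text):
                found.append((field, match.group("value")))
    return found
```

    For example, `extract_information("Hi, it's Dana. Call me at 555-0123 or meet at 3 pm tomorrow.")` returns `[("phone", "555-0123"), ("time", "3 pm")]`. Keeping the location of the information inside the pattern (the named group) rather than in separate post-processing code keeps each catalogue entry self-describing.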

    Structured models of repetition for speech recognition
    5.
    Granted invention patent
    Legal status: In force

    Publication number: US08965765B2

    Publication date: 2015-02-24

    Application number: US12233826

    Filing date: 2008-09-19

    IPC classification: G10L15/00 G10L15/18

    CPC classification: G10L15/1822

    Abstract: Described is a technology by which a structured model of repetition is used to determine the words spoken by a user, and/or a corresponding database entry, based in part on a prior utterance. For a repeated utterance, a joint probability analysis is performed on (at least some of) the corresponding word sequences (as recognized by one or more recognizers) and associated acoustic data. For example, a generative probabilistic model or a maximum entropy model may be used in the analysis. The second utterance may be a repetition of the first utterance using the exact words, or another structural transformation of the first utterance, such as an extension that adds one or more words, a truncation that removes one or more words, or a whole or partial spelling of one or more words.
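    A toy version of the repetition idea can be written in Python: candidate database entries are ranked by combining the recognizer scores for the first utterance with those for a second utterance that is an exact repeat, a truncation, or an extension of the entry. This is a simple log-linear combination with made-up transformation priors, standing in for the generative or maximum entropy models the abstract mentions; it ignores acoustic data and the spelling case, and all names and numbers are hypothetical.

```python
import math

# Hypothetical prior log-probabilities of how the second utterance relates
# to the intended entry (illustrative values, not from the patent).
TRANSFORM_LOGPRIOR = {
    "exact": math.log(0.6),
    "truncation": math.log(0.2),
    "extension": math.log(0.2),
}

def relation(entry, hyp):
    """Classify a hypothesis (tuple of words) relative to an entry."""
    if hyp == entry:
        return "exact"
    if len(hyp) < len(entry) and entry[:len(hyp)] == hyp:
        return "truncation"   # words dropped from the end
    if len(hyp) > len(entry) and hyp[:len(entry)] == entry:
        return "extension"    # words added at the end
    return None

def joint_score(entry, nbest1, nbest2):
    """Joint log-score of an entry given two n-best lists of (words, logprob)."""
    first = [lp for words, lp in nbest1 if words == entry]
    second = []
    for words, lp in nbest2:
        rel = relation(entry, words)
        if rel is not None:
            second.append(lp + TRANSFORM_LOGPRIOR[rel])
    if not first or not second:
        return -math.inf  # entry is unsupported by one of the utterances
    return max(first) + max(second)

def rank_entries(entries, nbest1, nbest2):
    """Entries supported by both utterances, best joint score first."""
    scored = [(joint_score(e, nbest1, nbest2), e) for e in entries]
    scored = [item for item in scored if item[0] > -math.inf]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [entry for _, entry in scored]
```

    For example, if the first utterance's n-best list slightly prefers `("pizza", "plaza")` (log-score -1.0) over `("pizza", "palace")` (-1.2), but the second utterance is recognized mainly as the extension `("pizza", "palace", "downtown")` (-0.8), the joint ranking puts `("pizza", "palace")` first.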

    DETERMINING SYNONYM-ANTONYM POLARITY IN TERM VECTORS
    7.
    Invention application
    Legal status: Under examination (published)

    Publication number: US20140067368A1

    Publication date: 2014-03-06

    Application number: US13597277

    Filing date: 2012-08-29

    IPC classification: G06F17/27

    Abstract: A document-term matrix may be generated based on a corpus. A term representation matrix may be generated by modifying a plurality of elements of the document-term matrix based on antonym information included in the corpus. Similarities may be determined based on a plurality of elements of the term representation matrix.
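    One simple way to realize the antonym-based modification is to treat each thesaurus entry as a "document" and give antonyms the opposite sign of synonyms in the document-term matrix, so that the cosine similarity between term columns comes out positive for synonyms and negative for antonyms. The NumPy sketch below assumes such a thesaurus-style corpus and binary cell values; it omits term weighting and any dimensionality reduction, and the function names are hypothetical.

```python
import numpy as np

def polarity_term_matrix(groups, vocab):
    """Build a (num_groups x num_terms) matrix from thesaurus-style groups.

    Each group is (synonyms, antonyms). Synonym cells get +1 and antonym
    cells get -1, so the sign of a cell encodes polarity.
    """
    index = {term: j for j, term in enumerate(vocab)}
    matrix = np.zeros((len(groups), len(vocab)))
    for i, (synonyms, antonyms) in enumerate(groups):
        for term in synonyms:
            matrix[i, index[term]] = 1.0
        for term in antonyms:
            matrix[i, index[term]] = -1.0
    return matrix

def term_similarity(matrix, vocab, a, b):
    """Cosine similarity between the column vectors of terms a and b.

    Assumes both terms occur in at least one group (non-zero columns).
    """
    va = matrix[:, vocab.index(a)]
    vb = matrix[:, vocab.index(b)]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))
```

    With `vocab = ["hot", "warm", "cold", "chilly"]` and the opposing groups `(["hot", "warm"], ["cold", "chilly"])` and `(["cold", "chilly"], ["hot", "warm"])`, `hot` and `warm` score close to +1 while `hot` and `cold` score close to -1. Without the sign flip, antonyms would look as similar as synonyms because they occur in the same entries.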

    Method for clustering closely resembling data objects
    8.
    Granted invention patent
    Legal status: In force

    Publication number: US06349296B1

    Publication date: 2002-02-19

    Application number: US09642017

    Filing date: 2000-08-21

    IPC classification: G06F17/30

    Abstract: A computer-implemented method determines the resemblance of data objects such as Web pages. Each data object is partitioned into a sequence of tokens. The tokens are grouped into overlapping sets to form shingles. Each shingle is represented by a unique identification element encoded as a fingerprint. For each of a plurality of pseudo-random permutations of the set of all fingerprints, the minimum element of the image of the data object's fingerprint set under that permutation is selected; together these minima form a sketch of each data object. The sketches characterize the resemblance of the data objects. The sketches can be further partitioned into a plurality of groups. Each group is fingerprinted to form a feature. Data objects that share more than a certain number of features are estimated to be nearly identical.
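    The shingle, sketch, and feature pipeline can be sketched in Python. Two substitutions are assumptions of this sketch, not details from the abstract: fingerprints are taken from truncated SHA-1 digests, and the pseudo-random permutations are simulated with random linear maps modulo a Mersenne prime. The shingle width, number of permutations, group size, and sharing threshold are illustrative defaults.

```python
import hashlib
import random

PRIME = (1 << 61) - 1  # Mersenne prime; every 56-bit fingerprint is below it

def fingerprint(text):
    """56-bit fingerprint from a truncated SHA-1 digest (stand-in choice)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest()[:7], "big")

def shingles(text, w=4):
    """Set of overlapping w-token windows of a whitespace-tokenized text."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}

def make_permutations(k, seed=0):
    """k random maps x -> (a*x + b) mod PRIME; each is a bijection on [0, PRIME)."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(k)]

def sketch(text, perms, w=4):
    """Minimum permuted fingerprint under each permutation (needs >= w tokens)."""
    fps = {fingerprint(s) for s in shingles(text, w)}
    return [min((a * x + b) % PRIME for x in fps) for a, b in perms]

def resemblance(sk1, sk2):
    """Fraction of permutations whose minima agree; estimates shingle-set Jaccard."""
    return sum(x == y for x, y in zip(sk1, sk2)) / len(sk1)

def features(sk, group_size=4):
    """Fingerprint each consecutive group of sketch values to form one feature."""
    return {fingerprint(",".join(map(str, sk[i:i + group_size])))
            for i in range(0, len(sk), group_size)}

def nearly_identical(f1, f2, min_shared=1):
    """Near-duplicates if the objects share more than min_shared features."""
    return len(f1 & f2) > min_shared
```

    Comparing a small set of features instead of full sketches makes near-duplicate detection cheap to index, at the cost of only catching pairs whose resemblance is high enough for whole groups of minima to agree.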

    THREE-DIMENSIONAL OBJECT BROWSING IN DOCUMENTS
    9.
    Invention application
    Legal status: In force

    Publication number: US20140037218A1

    Publication date: 2014-02-06

    Application number: US13567105

    Filing date: 2012-08-06

    IPC classification: G06K9/68

    CPC classification: G06F17/30268

    Abstract: A document that includes a representation of a two-dimensional (2-D) image may be obtained. A selection indicator indicating a selection of at least a portion of the 2-D image may be obtained. A match correspondence may be determined between the selected portion of the 2-D image and a three-dimensional (3-D) image object stored in an object database, the match correspondence being based on a web crawler analysis result. A 3-D rendering of the 3-D image object that corresponds to the selected portion of the 2-D image may be initiated.

    STRUCTURED MODELS OF REPITITION FOR SPEECH RECOGNITION
    10.
    Invention application
    Legal status: In force

    Publication number: US20100076765A1

    Publication date: 2010-03-25

    Application number: US12233826

    Filing date: 2008-09-19

    IPC classification: G10L15/00

    CPC classification: G10L15/1822

    Abstract: Described is a technology by which a structured model of repetition is used to determine the words spoken by a user, and/or a corresponding database entry, based in part on a prior utterance. For a repeated utterance, a joint probability analysis is performed on (at least some of) the corresponding word sequences (as recognized by one or more recognizers) and associated acoustic data. For example, a generative probabilistic model or a maximum entropy model may be used in the analysis. The second utterance may be a repetition of the first utterance using the exact words, or another structural transformation of the first utterance, such as an extension that adds one or more words, a truncation that removes one or more words, or a whole or partial spelling of one or more words.