Determining word boundary likelihoods in potentially incomplete text

    Publication No.: US09239888B1

    Publication date: 2016-01-19

    Application No.: US14560091

    Filing date: 2014-12-04

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining word boundary likelihoods in potentially incomplete text. In one aspect, a method includes selecting query sequences from the query, each query sequence being at least a portion of a word n-gram, the word n-gram being a subsequence of up to n words selected from the second sequence of words of the query, and for each query sequence: determining one or more query sequence keys for the query sequence; determining at least one of a word boundary count and a non-word boundary count for each query sequence key, each word-boundary count and non-word boundary count being dependent on the context of the query sequence; and associating, in a data storage device, the at least one word boundary count and non-word boundary counts with each query sequence key.
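    The counting step in the abstract can be illustrated with a minimal sketch. The abstract does not say how query sequence keys are derived or how context determines each count, so the following are assumptions made only for illustration: a query sequence is a character prefix of a word n-gram, its key is the prefix text itself, a word boundary is counted when the prefix ends exactly at the end of a word in the logged query, and logged queries are treated as complete. The names `query_sequences`, `build_counts`, and `boundary_likelihood` are hypothetical and are not taken from the patent.

```python
from collections import defaultdict


def query_sequences(words, n):
    """Yield (sequence, ends_at_word_boundary) pairs for one query.

    Assumption: a query sequence is a character prefix of a word n-gram
    (up to n words) that ends inside, or at the end of, the n-gram's
    last word.
    """
    for start in range(len(words)):
        for size in range(1, n + 1):
            gram = words[start:start + size]
            if len(gram) < size:
                break  # ran past the end of the query
            text = " ".join(gram)
            last_word_start = len(text) - len(gram[-1])
            for end in range(last_word_start + 1, len(text) + 1):
                # The prefix ends at a word boundary only when it covers
                # the whole last word of the n-gram.
                yield text[:end], end == len(text)


def build_counts(queries, n=2):
    """Map each key to [word_boundary_count, non_word_boundary_count].

    Assumption: the key is the query sequence text itself, and an
    in-memory dict stands in for the data storage device.
    """
    counts = defaultdict(lambda: [0, 0])
    for query in queries:
        for seq, at_boundary in query_sequences(query.split(), n):
            counts[seq][0 if at_boundary else 1] += 1
    return counts


def boundary_likelihood(counts, key):
    """Fraction of observations where `key` ended at a word boundary."""
    wb, nwb = counts.get(key, (0, 0))
    total = wb + nwb
    return wb / total if total else None


counts = build_counts(["new york", "newark airport", "new car"], n=2)
print(counts["new"], boundary_likelihood(counts, "new"))
```

    With these three sample queries, the key "new" ends a word twice ("new york", "new car") and falls inside a word once ("newark"), so a trailing "new" in a partially typed query would be scored as more likely complete than not under this sketch.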

    2. Determining word boundary likelihoods in potentially incomplete text
    Granted invention patent (in force)

    Publication No.: US08930399B1

    Publication date: 2015-01-06

    Application No.: US13739591

    Filing date: 2013-01-11

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining word boundary likelihoods in potentially incomplete text. In one aspect, a method includes selecting query sequences from the query, each query sequence being at least a portion of a word n-gram, the word n-gram being a subsequence of up to n words selected from the second sequence of words of the query, and for each query sequence: determining one or more query sequence keys for the query sequence; determining at least one of a word boundary count and a non-word boundary count for each query sequence key, each word-boundary count and non-word boundary count being dependent on the context of the query sequence; and associating, in a data storage device, the at least one word boundary count and non-word boundary counts with each query sequence key.

