System and a Method for Generating Semantically Similar Sentences for Building a Robust SLM
    42.
    Invention application
    Legal status: in force

    Publication number: US20130018649A1

    Publication date: 2013-01-17

    Application number: US13181923

    Filing date: 2011-07-13

    CPC classification number: G06F17/274 G06F17/2795 G06F17/2881 G10L15/26

    Abstract: A system and method are described for generating semantically similar sentences for a statistical language model. A semantic class generator determines for each word in an input utterance a set of corresponding semantically similar words. A sentence generator computes a set of candidate sentences each containing at most one member from each set of semantically similar words. A sentence verifier grammatically tests each candidate sentence to determine a set of grammatically correct sentences semantically similar to the input utterance. Also note that the generated semantically similar sentences are not restricted to be selected from an existing sentence database.
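The three components named in the abstract can be pictured with a minimal sketch. It is not the patented implementation: the lexicon, the function names, and the grammar check are all hypothetical stand-ins, and the generator picks exactly one member per class, where the abstract says "at most one".

```python
from itertools import product

# Hypothetical toy lexicon: word -> semantically similar words.
LEXICON = {
    "book": ["reserve"],
    "flight": ["plane", "ticket"],
}

def semantic_classes(utterance, lexicon=LEXICON):
    """Semantic class generator: one list of alternatives per input word."""
    return [[word] + lexicon.get(word, []) for word in utterance.split()]

def candidate_sentences(classes):
    """Sentence generator: take exactly one member from each class.

    The abstract says "at most one"; exactly one is a simplification.
    """
    return [" ".join(choice) for choice in product(*classes)]

def verify(sentences, is_grammatical):
    """Sentence verifier: keep the candidates the grammar check accepts."""
    return [s for s in sentences if is_grammatical(s)]

def toy_grammar_check(sentence):
    """Stand-in for a real parser: rejects one hard-coded phrase."""
    return "a plane" not in sentence

candidates = candidate_sentences(semantic_classes("book a flight"))
accepted = verify(candidates, toy_grammar_check)
print(accepted)
```

Because the candidates come from a Cartesian product over the word classes, they are generated rather than looked up, which matches the abstract's remark that the output is not restricted to an existing sentence database.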

    E-MAIL THREAD HIERARCHY DETECTION
    43.
    Invention application
    Legal status: in force

    Publication number: US20120066227A1

    Publication date: 2012-03-15

    Application number: US12879454

    Filing date: 2010-09-10

    CPC classification number: G06Q10/107 G06F17/30946

    Abstract: A plurality of segments is generated from an e-mail collection by parsing the content of its e-mails. A corresponding segment signature is created for each segment, and a signature index is populated using the generated segment signatures. After a query e-mail is received, a plurality of query segments is generated from the content of the query e-mail, and a corresponding query segment signature is generated for each query segment. A query root segment is identified and a corresponding query root segment signature is generated. A set of root segment signatures of the signature index is identified, and the query root segment signature is compared with each root segment signature from the signature index. A subset of the signature index is identified using a match between the root segment signature and the query root segment signature. An e-mail thread hierarchy is built using the identified subset of the signature index.
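The signature-index idea can be sketched as follows. This is a rough illustration under stated assumptions, not the method claimed in the application: quoted text is assumed to be separated by a hypothetical `-----Original Message-----` marker, the root segment is assumed to be the last (oldest) segment, signatures are SHA-1 hashes of whitespace-normalized text, and the "hierarchy" is reduced to ordering matches by quoting depth.

```python
import hashlib
from collections import defaultdict

MARKER = "-----Original Message-----"  # assumed quote delimiter

def split_segments(body):
    """Parse an e-mail body into segments, newest first."""
    return [part.strip() for part in body.split(MARKER) if part.strip()]

def signature(segment):
    """Segment signature: hash of lower-cased, whitespace-normalized text."""
    normalized = " ".join(segment.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def build_index(emails):
    """Signature index: root signature -> [(email_id, segment signatures)]."""
    index = defaultdict(list)
    for email_id, body in emails.items():
        sigs = [signature(s) for s in split_segments(body)]
        if sigs:
            index[sigs[-1]].append((email_id, sigs))  # last segment = root
    return index

def thread_for(query_body, index):
    """Return ids of indexed e-mails sharing the query's root, oldest first."""
    sigs = [signature(s) for s in split_segments(query_body)]
    if not sigs:
        return []
    matches = index.get(sigs[-1], [])
    # Fewer segments means less quoted history, so the e-mail is earlier.
    return [eid for eid, _ in sorted(matches, key=lambda m: len(m[1]))]

emails = {
    "m1": "Lunch?",
    "m2": "Sure\n" + MARKER + "\nLunch?",
    "m3": "Unrelated",
}
index = build_index(emails)
query = "When?\n" + MARKER + "\nSure\n" + MARKER + "\nLunch?"
print(thread_for(query, index))
```

Comparing only root signatures first narrows the index to one candidate thread before any per-segment work is done, which is the role the abstract gives to the root segment match.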

    METHOD FOR ASSESSING PRONUNCIATION ABILITIES
    44.
    Invention application
    Legal status: in force

    Publication number: US20090171661A1

    Publication date: 2009-07-02

    Application number: US12147898

    Filing date: 2008-06-27

    CPC classification number: G09B19/04 G10L15/26

    Abstract: Techniques for assessing pronunciation abilities of a user are provided. The techniques include recording a sentence spoken by a user, performing a classification of the spoken sentence, wherein the classification is performed with respect to at least one N-ordered class, and wherein the spoken sentence is represented by a set of at least one acoustic feature extracted from the spoken sentence, and determining a score based on the classification, wherein the score is used to determine an optimal set of at least one question to assess pronunciation ability of the user without human intervention.
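A toy sketch of the classify, score, and select-questions flow described above. Everything concrete here is an assumption made for illustration: the two-dimensional "acoustic features", the class centroids, the nearest-centroid classifier, and the rule that the best questions are those whose difficulty is closest to the score. The abstract does not specify any of these.

```python
# Hypothetical centroids for N = 3 ordered classes (0 = weakest, 2 = strongest),
# each over two made-up acoustic features.
CENTROIDS = [(0.2, 0.9), (0.5, 0.5), (0.9, 0.1)]

def classify(features, centroids=CENTROIDS):
    """Return the index of the nearest centroid (squared Euclidean distance)."""
    def distance(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))
    return min(range(len(centroids)), key=lambda i: distance(centroids[i]))

def score(class_index, n_classes=len(CENTROIDS)):
    """Map an ordered class index to a score in [0, 1]."""
    return class_index / (n_classes - 1)

def next_questions(current_score, questions, k=2):
    """Pick the k questions whose difficulty is closest to the score."""
    return sorted(questions, key=lambda q: abs(q[1] - current_score))[:k]

# (question id, assumed difficulty in [0, 1])
QUESTIONS = [("easy", 0.1), ("medium", 0.5), ("hard", 0.9)]

spoken = (0.8, 0.2)  # features extracted from a recorded sentence (made up)
s = score(classify(spoken))
print(s, next_questions(s, QUESTIONS))
```

Because the classes are ordered, the class index itself carries the ranking, so a score can be read off directly without a separate regression step.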

    Methods, apparatus and computer programs for characterizing web resources
    45.
    Granted invention patent
    Legal status: lapsed

    Publication number: US07516397B2

    Publication date: 2009-04-07

    Application number: US10901275

    Filing date: 2004-07-28

    CPC classification number: G06F17/30864 G06F17/30896

    Abstract: Methods, apparatus and computer programs are provided for characterizing Web-based information resources based on their interactions. A Web-based information resource is a single Web document or a collection of related Web documents. Unlike simple text documents, Web documents contain hyperlinks and other HTML tags. Different types of interactions, including inbound hyperlinks, outbound hyperlinks and internal links associated with a Web-based information resource, are used to characterize the Web-based information resource. A DOM tree representing the tag structure of a Web-based information resource is used to identify text items likely to be useful as context for a hyperlink anchor text, and the anchor text is combined with the context to generate a representation. The representation of Web-based information resources based on interactions can be used for clustering and classification, and in Web mining applications such as query disambiguation and automatic taxonomy generation.
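The step that pairs hyperlink anchor text with nearby context can be sketched using Python's standard `html.parser`. This is a simplified illustration, not the patented procedure: it assumes the context of a link is the full text of its enclosing block element, that the block tags listed in `BLOCK_TAGS` are the relevant containers, and that blocks are not nested.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "li", "td", "div"}  # assumed context containers

class AnchorContext(HTMLParser):
    """Collect (href, anchor text, context) for each link in flat blocks."""

    def __init__(self):
        super().__init__()
        self.block_text = []     # text seen in the current block element
        self.anchor_text = None  # text fragments of the open <a>, if any
        self.href = None
        self.pending = []        # links waiting for their block to close
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.block_text = []
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.anchor_text = []

    def handle_data(self, data):
        self.block_text.append(data)
        if self.anchor_text is not None:
            self.anchor_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_text is not None:
            anchor = "".join(self.anchor_text).strip()
            self.pending.append((self.href, anchor))
            self.anchor_text = None
        elif tag in BLOCK_TAGS:
            context = " ".join("".join(self.block_text).split())
            self.results += [(h, a, context) for h, a in self.pending]
            self.pending = []
            self.block_text = []

parser = AnchorContext()
parser.feed('<p>Read the <a href="/spec">DOM spec</a> for details.</p>')
parser.close()
print(parser.results)
```

Each resulting tuple combines the anchor text with its context, which is the kind of representation the abstract describes feeding into clustering or classification.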
