Semantic and Text Matching Techniques for Network Search
    21.
    Type: Invention patent application
    Status: Granted

    Publication No.: US20110072021A1

    Publication date: 2011-03-24

    Application No.: US12563357

    Filing date: 2009-09-21

    IPC classification: G06F17/30

    CPC classification: G06F17/30864

    Abstract: In one embodiment, access a search query comprising one or more query words, at least one of the query words representing one or more query concepts; access a network document identified for a search query by a search engine, the network document comprising one or more document words, at least one of the document words representing one or more document concepts; semantic-text match the search query and the network document to determine one or more negative semantic-text matches; and construct one or more negative features based on the negative semantic-text matches.


    ABBREVIATION HANDLING IN WEB SEARCH
    22.
    Type: Invention patent application
    Status: Granted

    Publication No.: US20110010353A1

    Publication date: 2011-01-13

    Application No.: US12884708

    Filing date: 2010-09-17

    IPC classification: G06F17/30

    CPC classification: G06F17/30672

    Abstract: A method for handling abbreviations in web queries includes building a dictionary of possible word expansions for potential abbreviations related to query terms received and anticipated to be received by a search engine; accepting a query including an abbreviation from a searching user, where a probability of finding a most probably-correct expansion in the dictionary is a first probability, and a probability that the expansion is the abbreviation itself is a second probability; determining a ratio between the first and second probabilities; expanding the abbreviation in accordance with the most probably-correct expansion when the ratio is above a first threshold value; and highlighting the abbreviation with a suggested expansion of the most probably-correct expansion for the user so that the user may accept the suggested expansion when the ratio is between a second, lower threshold value and the first threshold value.
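
The two-threshold decision in this abstract can be illustrated with a minimal Python sketch. The function name, the threshold values, and the "keep" outcome for ratios below the lower threshold are assumptions made for illustration; the abstract does not give concrete values or say what happens below the second threshold.

```python
def decide_expansion(p_expansion, p_abbreviation, high=4.0, low=1.5):
    """Choose how to treat an abbreviation in a query.

    p_expansion:    probability of the most probably-correct expansion
    p_abbreviation: probability that the abbreviation should stay as typed
    high, low:      hypothetical first and second (lower) thresholds
    """
    # Ratio between the first and second probabilities.
    if p_abbreviation <= 0:
        ratio = float("inf")
    else:
        ratio = p_expansion / p_abbreviation

    if ratio > high:
        return "expand"    # rewrite the query with the expansion
    if ratio >= low:
        return "suggest"   # highlight and offer the expansion to the user
    return "keep"          # assumed behaviour: leave the query unchanged


print(decide_expansion(0.9, 0.1))
print(decide_expansion(0.6, 0.3))
print(decide_expansion(0.3, 0.6))
```

Keeping the thresholds as parameters makes it easy to tune how aggressively queries are rewritten versus merely annotated.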


    SYSTEM AND METHOD FOR IMPROVED SEARCH RELEVANCE USING PROXIMITY BOOSTING
    23.
    Type: Invention patent application
    Status: Under examination (published)

    Publication No.: US20100191758A1

    Publication date: 2010-07-29

    Application No.: US12360008

    Filing date: 2009-01-26

    IPC classification: G06F17/30

    CPC classification: G06F16/951 G06F16/353

    Abstract: A system and method for improved search relevance using proximity boosting. A query for a web search is received from a user, via a network, wherein the query comprises a plurality of query tokens. One or more concepts are identified in the query wherein each of concepts comprises at least two query tokens. A relative concept strength is determined for each of the identified concepts. The query is then rewritten for submission to a search engine wherein for each of the one or more concepts, a syntax rule associated with the respective relative concept strength of the concept is applied to the query tokens comprising the concept such that the rewritten query represents the one or more concepts whereby the proximity of the one or more concepts in a search result returned by the search engine to the user in response to the rewritten query is boosted.
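
As a rough illustration of the rewriting step, the sketch below applies a different syntax rule to each concept depending on its strength. Everything concrete here is an assumption: the strength cut-offs, the quoted-phrase and `~3` proximity notation, and the function name are hypothetical, and concept identification and strength scoring are taken as given inputs.

```python
def rewrite_query(tokens, concepts):
    """Rewrite a tokenized query so that concept tokens stay close together.

    tokens:   list of query tokens
    concepts: list of (start, end, strength) spans over tokens, end exclusive,
              assumed valid and non-overlapping, each covering >= 2 tokens
    """
    spans = {start: (end, strength) for start, end, strength in concepts}
    out, i = [], 0
    while i < len(tokens):
        if i in spans:
            end, strength = spans[i]
            phrase = " ".join(tokens[i:end])
            if strength >= 0.8:
                out.append(f'"{phrase}"')      # strong concept: exact phrase
            elif strength >= 0.4:
                out.append(f'"{phrase}"~3')    # medium: tokens within 3 positions
            else:
                out.append(phrase)             # weak: leave tokens unconstrained
            i = end
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


query = ["new", "york", "pizza", "delivery", "cheap"]
print(rewrite_query(query, [(0, 2, 0.9), (2, 4, 0.5)]))
# prints: "new york" "pizza delivery"~3 cheap
```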


    SYSTEM AND METHOD FOR RANKING WEB SEARCHES WITH QUANTIFIED SEMANTIC FEATURES
    24.
    Type: Invention patent application
    Status: Under examination (published)

    Publication No.: US20100191740A1

    Publication date: 2010-07-29

    Application No.: US12360016

    Filing date: 2009-01-26

    IPC classification: G06F17/30

    CPC classification: G06F16/9535

    Abstract: A system and method for ranking web searches with quantified semantic features. A query for a web search is received from a user. The query is segmented and tagged into one or more linguistic segments using linguistic analysis. At least some of the linguistic segments are tagged with a linguistic type. A query execution plan is generated comprising the linguistic segments and, for each of the linguistic segments tagged with a linguistic type, at least one tag attribute comprising at least one domain specific feature of the linguistic type. A search is performed for documents matching the query. Each of the documents is scored for each of the linguistic segments of the query execution plan using the tag attributes of the respective linguistic segment. The documents are ranked using a function that uses the scores of the documents. A ranked list of the documents is transmitted back to the user.


    TOPICAL RANKING IN INFORMATION RETRIEVAL
    25.
    Type: Invention patent application
    Status: Under examination (published)

    Publication No.: US20100185623A1

    Publication date: 2010-07-22

    Application No.: US12354533

    Filing date: 2009-01-15

    IPC classification: G06F17/30

    CPC classification: G06F16/951 G06F16/334

    Abstract: An aggregate ranking model is generated, which comprises a general ranking model and one or more topical training models. Each topical ranking model is associated with a topic, or topic class, and for use in ranking search result items determined to belong to the topic, or topic class. As one example, the topical ranking model is trained using a set of topical training data, e.g., training data determined to belong to the topic, or topic class, a general ranking model and a residue, or error, determined from a general ranking generated by the general ranking model for the topical training data, with the topical ranking model being trained to minimize the general ranking model's error in the aggregate ranking model.
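
The residual-training idea in this abstract can be sketched in a few lines of Python. This is an illustration under strong simplifying assumptions: each item has a single numeric feature, the topical model is a one-variable least-squares line, and the aggregate score is the sum of the two models. The abstract does not specify the model family or how the models are combined.

```python
def fit_topical_model(xs, labels, general_model):
    """Fit a topical model to the general model's error on topical data."""
    # Residue: what the general model gets wrong on the topical training set.
    residuals = [y - general_model(x) for x, y in zip(xs, labels)]

    # Closed-form one-variable least squares on (x, residual) pairs.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_r = sum(residuals) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (r - mean_r) for x, r in zip(xs, residuals))
    slope = cov / var_x if var_x else 0.0
    intercept = mean_r - slope * mean_x
    return lambda x: slope * x + intercept


def aggregate_score(x, general_model, topical_model):
    """Assumed combination: general score plus the topical correction."""
    return general_model(x) + topical_model(x)


general = lambda x: 0.5 * x                    # stand-in general ranker
xs = [0.0, 1.0, 2.0, 3.0]                      # toy topical feature values
labels = [2.0 * x + 1.0 for x in xs]           # toy relevance labels
topical = fit_topical_model(xs, labels, general)
print(aggregate_score(4.0, general, topical))
```

Because the topical model only learns the correction, the general model can be shared unchanged across all topics.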


    SELECTIVE TERM WEIGHTING FOR WEB SEARCH BASED ON AUTOMATIC SEMANTIC PARSING
    26.
    Type: Invention patent application
    Status: Under examination (published)

    Publication No.: US20100114878A1

    Publication date: 2010-05-06

    Application No.: US12256371

    Filing date: 2008-10-22

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06F16/334

    Abstract: A method is provided for selecting relevant documents returned from a search query. When a search engine finds search terms in documents, the document score is based on the frequency of the occurrence of those terms, the category of the term, and the section of the document in which the term is found. Each (category type, document section) pair is assigned a weight that is used to modify the contribution of term frequency. The weights are determined in an offline process using historical data and human validation. Through this empirical process, the weight assignments are made to correlate high relevance scores with documents that humans would find relevant to a search query.
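
A minimal Python sketch of scoring with (category type, document section) weights follows. The weight table, category names, section names, and the simple multiply-and-sum combination are all hypothetical; in the abstract the weights come from an offline process over historical data with human validation, which is not modeled here.

```python
# Hypothetical weights for (term category, document section) pairs.
WEIGHTS = {
    ("product", "title"): 3.0,
    ("product", "body"): 1.0,
    ("location", "title"): 2.0,
    ("location", "body"): 0.5,
}


def score_document(query_terms, document, weights=WEIGHTS, default_weight=1.0):
    """Sum term frequencies, each scaled by its (category, section) weight.

    query_terms: dict mapping term -> category assigned by a semantic parser
    document:    dict mapping section name -> section text
    """
    score = 0.0
    for term, category in query_terms.items():
        for section, text in document.items():
            # Term frequency within this section (whitespace tokenization).
            tf = text.lower().split().count(term.lower())
            score += weights.get((category, section), default_weight) * tf
    return score


doc = {"title": "pizza in boston", "body": "best pizza pizza places"}
query = {"pizza": "product", "boston": "location"}
print(score_document(query, doc))  # 3.0*1 + 1.0*2 + 2.0*1 = 7.0
```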


    AUTOMATIC QUERY CONCEPTS IDENTIFICATION AND DRIFTING FOR WEB SEARCH
    27.
    Type: Invention patent application
    Status: Under examination (published)

    Publication No.: US20100094835A1

    Publication date: 2010-04-15

    Application No.: US12252220

    Filing date: 2008-10-15

    IPC classification: G06F7/06 G06F17/30 G06N5/02

    Abstract: Techniques are described for automatically determining which terms in a search query may be augmented by contextually similar terms such that more relevant results can be displayed to a user. Contextually similar words are determined based on training data, including a web corpus and a query log. Once contextually similar words are determined, they may be inserted into a search query and used to find more relevant results. Consequently, documents that contain helpful information but may not have exact word matches may be found more readily by a search engine.


    NAME VERIFICATION USING MACHINE LEARNING
    28.
    Type: Invention patent application
    Status: Under examination (published)

    Publication No.: US20090248595A1

    Publication date: 2009-10-01

    Application No.: US12060154

    Filing date: 2008-03-31

    IPC classification: G06F15/18

    CPC classification: G06F17/2765

    Abstract: Computer-enabled methods, apparatus, and computer-readable media are provided for verifying that a given network name, such as a URL, is an official, e.g., registered, approved, or otherwise officially recognized, network name that refers to or identifies a principal, such as a business. These techniques involve receiving a principal name and a given network name, receiving at least one feature attribute from at least one database of feature attributes, wherein the at least one feature attribute comprises a characteristic of the principal name or a characteristic of the network name, and invoking a logistic regression method to generate a probability, based upon the at least one feature attribute, that the given network name is an official network name for the principal name. The logistic regression method may include a gradient boosting tree model that generates the probability based upon the at least one feature attribute.


    Method and apparatus for generating a speech-recognition-based call-routing system
    29.
    Type: Granted invention patent
    Status: Granted

    Publication No.: US07206389B1

    Publication date: 2007-04-17

    Application No.: US10753590

    Filing date: 2004-01-07

    IPC classification: H04M1/64

    Abstract: A computerized method is provided for electronically directing a call to a class, such that an utterance spoken by a speaker and received by a call-routing system is classified by the call-routing system as being associated with the class, such that the call-routing system includes a speech-recognition module, a feature-extraction module, and a classification module. The method includes extracting features from recognized speech; weighting elements of a feature vector with respective speech-recognition scores, wherein each weighting element is associated with one of the features; ranking classes to which the features are associated; and electronically directing the call to a highest-ranking class.
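
The extract, weight, rank, and route steps in this abstract can be sketched as follows. The word-level features, the per-class weight table, and the linear scoring are assumptions chosen for illustration; the abstract does not specify the feature set or the classifier.

```python
def rank_classes(recognized, class_weights):
    """Rank call-routing classes for one recognized utterance.

    recognized:    list of (word, recognition_score) pairs from the recognizer
    class_weights: dict mapping class name -> {feature word: weight}
                   (a hypothetical, pre-trained linear classifier)
    """
    # Feature vector: each element is weighted by its recognition score,
    # keeping the highest score when a word is recognized more than once.
    features = {}
    for word, score in recognized:
        features[word] = max(features.get(word, 0.0), score)

    # Linear score per class over the confidence-weighted features.
    scores = {
        cls: sum(weights.get(f, 0.0) * v for f, v in features.items())
        for cls, weights in class_weights.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


utterance = [("pay", 0.8), ("bill", 0.9), ("cancel", 0.2)]
classes = {
    "billing": {"pay": 1.0, "bill": 1.0},
    "cancellations": {"cancel": 2.0},
}
ranking = rank_classes(utterance, classes)
print(ranking[0])  # the call is directed to the highest-ranking class
```

Weighting by recognition score means a low-confidence word such as "cancel" above contributes little, even when its class weight is high.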
