1. MINING WEB SEARCH USER BEHAVIOR TO ENHANCE WEB SEARCH RELEVANCE

    Type: Patent application

    Status: Pending (published)

    Publication number: US20070208730A1

    Publication date: 2007-09-06

    Application number: US11457733

    Filing date: 2006-07-14

    IPC classification: G06F17/30

    CPC classification: G06F16/337 G06F16/9535

    Abstract: Systems and methods that estimate user preference via automatic interpretation of user behavior. A user behavior component associated with a search engine can automatically interpret the collective behavior of users (e.g., web search users). Such a feedback component can include user behavior features and predictive models (e.g., from a user behavior component) that are robust to noise, which can be present in observed user interactions with the search results (e.g., malicious and/or irrational user activity).
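
    The abstract does not list the specific features or models. The following is a minimal sketch under two assumptions: click logs use a hypothetical record format (user_id, query, url, position, clicked), and a hypothetical table gives the expected click-through rate at each result position. One noise-robust feature under these assumptions is the deviation of a result's observed click-through rate from the rate expected at the positions where it was shown. Each user's contribution is capped so that a single malicious or irrational user cannot dominate the estimate.

```python
from collections import defaultdict


def click_deviation(log, position_ctr, max_events_per_user=5):
    """Per (query, url): observed CTR minus the CTR expected at the shown positions.

    Hypothetical inputs: log holds (user_id, query, url, position, clicked)
    records, one per impression; position_ctr maps a position to an assumed
    background CTR. Events beyond max_events_per_user for the same user,
    query, and url are ignored to damp noisy or malicious activity.
    """
    per_user = defaultdict(int)      # (user, query, url) -> events kept
    clicks = defaultdict(int)        # (query, url) -> clicks kept
    impressions = defaultdict(int)   # (query, url) -> impressions kept
    expected = defaultdict(float)    # (query, url) -> summed prior CTRs
    for user, query, url, position, clicked in log:
        if per_user[(user, query, url)] >= max_events_per_user:
            continue                 # cap this user's influence
        per_user[(user, query, url)] += 1
        key = (query, url)
        impressions[key] += 1
        clicks[key] += int(clicked)
        expected[key] += position_ctr.get(position, 0.0)
    return {
        key: clicks[key] / impressions[key] - expected[key] / impressions[key]
        for key in impressions
    }
```

    Suppose one user clicks a result 100 times and ten other users skip it. Only five of the repeated clicks are counted, so the result's deviation stays negative and is not inflated. The names `click_deviation`, `position_ctr`, and `max_events_per_user` are illustrative only.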

2. Utilizing information redundancy to improve text searches

    Type: Patent application

    Status: Lapsed

    Publication number: US20060116996A1

    Publication date: 2006-06-01

    Application number: US11336360

    Filing date: 2006-01-20

    IPC classification: G06F17/30

    Abstract: Architecture for improving text searches using information redundancy. A search component is coupled with an analysis component to rerank documents returned in a search according to redundancy values. Each returned document is used to develop a corresponding word probability distribution, which is then used to rerank the returned documents according to the associated redundancy values. In another aspect, a query component is coupled with a projection component to project answer redundancy from one document search to another. This includes obtaining the benefit of considerable answer redundancy from a second data source by projecting the success of the search of the second data source against a first data source.
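
    The abstract says that each returned document yields a word probability distribution and that these distributions drive a redundancy-based reranking. It does not define the redundancy value. The sketch below assumes one simple measure: the expected probability, under the pooled distribution of all *other* returned documents, of a word drawn from the document. This measure is an assumption for illustration, not the patented formula.

```python
from collections import Counter


def word_distribution(text):
    """Maximum-likelihood unigram distribution over lowercase tokens."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def rerank_by_redundancy(docs):
    """Order docs so that those whose words recur across the others rank first.

    Assumed redundancy value for doc i: sum over words w of
    p_i(w) * p_others(w), where p_others is the average distribution of the
    remaining documents.
    """
    dists = [word_distribution(d) for d in docs]
    scores = []
    for i, dist in enumerate(dists):
        others = [d for j, d in enumerate(dists) if j != i]
        pooled = Counter()
        for d in others:
            for w, p in d.items():
                pooled[w] += p / len(others)
        # Counter returns 0 for words absent from the other documents
        scores.append(sum(p * pooled[w] for w, p in dist.items()))
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in order]
```

    Documents that repeat an answer found in other results move up, and outliers move down. Smoothing the distributions, or weighting words by rarity, are natural refinements that this sketch omits.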

3. Cost-benefit approach to automatically composing answers to questions by extracting information from large unstructured corpora

    Type: Patent application

    Status: In force

    Publication number: US20050033711A1

    Publication date: 2005-02-10

    Application number: US10635274

    Filing date: 2003-08-06

    Abstract: The present invention relates to a system and methodology to facilitate extraction of information from large unstructured corpora such as the World Wide Web and/or other unstructured sources. Information in the form of answers to questions can be automatically composed from such sources via probabilistic models and cost-benefit analyses that guide the resource-intensive information-extraction procedures employed by a knowledge-based question answering system. The analyses can leverage predictions, provided by Bayesian or other statistical models, of the ultimate quality of the answers generated by the system. Such predictions, when coupled with a utility model, can give the system the ability to decide how many queries to issue to a search engine (or engines), given the cost of queries and the expected value of query results in refining an ultimate answer. Given a preference model, the information-extraction actions with the highest expected utility can be taken. In this manner, the accuracy of answers to questions can be balanced against the cost of the information extraction and analysis needed to compose them.
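
    The core cost-benefit decision can be illustrated with a single, simplified choice: how many queries to issue. The sketch assumes that a predictive model (the abstract mentions Bayesian or other statistical models) has already produced a hypothetical list `predicted_quality`, where entry n is the probability of a correct final answer after n queries. It also assumes that utility is linear: the value of a correct answer minus a fixed cost per query. Both assumptions are illustrative.

```python
def best_query_count(predicted_quality, answer_value, cost_per_query):
    """Return (n, expected_utility) maximizing value * quality(n) - cost * n.

    predicted_quality[n] is an assumed model estimate of the probability of
    a correct answer when n queries are issued (index 0 means no queries).
    """
    best_n, best_u = 0, float("-inf")
    for n, quality in enumerate(predicted_quality):
        utility = answer_value * quality - cost_per_query * n
        if utility > best_u:
            best_n, best_u = n, utility
    return best_n, best_u
```

    Take predicted qualities of 0.0, 0.40, 0.55, 0.62, and 0.65 for zero to four queries, an answer value of 10, and a cost of 1 per query. Expected utility peaks at two queries (5.5 - 2 = 3.5). A third query raises accuracy by less than it costs.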

4. COST-BENEFIT APPROACH TO AUTOMATICALLY COMPOSING ANSWERS TO QUESTIONS BY EXTRACTING INFORMATION FROM LARGE UNSTRUCTURED CORPORA

    Publication number: US20060294037A1

    Publication date: 2006-12-28

    Application number: US11469136

    Filing date: 2006-08-31

    IPC classification: G06N5/02 G06F17/00

    Abstract: The present invention relates to a system and methodology to facilitate extraction of information from large unstructured corpora such as the World Wide Web and/or other unstructured sources. Information in the form of answers to questions can be automatically composed from such sources via probabilistic models and cost-benefit analyses that guide the resource-intensive information-extraction procedures employed by a knowledge-based question answering system. The analyses can leverage predictions, provided by Bayesian or other statistical models, of the ultimate quality of the answers generated by the system. Such predictions, when coupled with a utility model, can give the system the ability to decide how many queries to issue to a search engine (or engines), given the cost of queries and the expected value of query results in refining an ultimate answer. Given a preference model, the information-extraction actions with the highest expected utility can be taken. In this manner, the accuracy of answers to questions can be balanced against the cost of the information extraction and analysis needed to compose them.

5. SYSTEMS AND METHODS FOR IMPROVED SPELL CHECKING

    Type: Patent application

    Status: Pending (published)

    Publication number: US20070106937A1

    Publication date: 2007-05-10

    Application number: US11620171

    Filing date: 2007-01-05

    IPC classification: G06F17/00

    Abstract: The present invention leverages iterative transformations of search query strings, along with statistics extracted from search query logs and/or web data, to provide possible alternative spellings for the search query strings. This provides a spell-checking means that can be tailored to give individualized suggestions to each user. By utilizing search query logs, the present invention can account for substrings that are not found in a lexicon but are still acceptable as a search query of interest. This makes it possible to propose higher-quality alternative spellings that go beyond the content of the lexicon. One instance of the present invention operates at the substring level by utilizing word unigram and/or bigram statistics extracted from query logs, combined with an iterative search. This provides substantially better spelling alternatives for a given query than substring matching alone. Other instances can receive input data from sources other than a search query input.
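
    The abstract describes iterative, log-driven correction at the substring level. The sketch below is a deliberately simplified word-level variant and is not the patented method. It assumes that candidate corrections are words within one edit of a query word that also appear in a hypothetical list of logged queries. A query is then rewritten one word at a time for as long as an add-one-smoothed unigram/bigram score, built from that log, improves.

```python
import math
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one deletion, transposition, replacement, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


class QueryLogSpeller:
    """Word-level corrector scored by unigram/bigram counts from a query log."""

    def __init__(self, queries):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for q in queries:
            words = q.split()
            self.unigrams.update(words)
            self.bigrams.update(zip(words, words[1:]))
        self.total = sum(self.unigrams.values())
        self.vocab = set(self.unigrams)

    def score(self, words):
        """Add-one-smoothed log probability of a word sequence."""
        v = len(self.vocab) + 1  # +1 reserves mass for unseen words
        s = math.log((self.unigrams[words[0]] + 1) / (self.total + v))
        for prev, cur in zip(words, words[1:]):
            s += math.log((self.bigrams[(prev, cur)] + 1) / (self.unigrams[prev] + v))
        return s

    def correct(self, query, max_iters=3):
        """Iteratively apply the best single-word change until none helps."""
        words = query.split()
        for _ in range(max_iters):
            best = words
            for i, w in enumerate(words):
                for cand in edits1(w) & self.vocab:
                    trial = words[:i] + [cand] + words[i + 1:]
                    if self.score(trial) > self.score(best):
                        best = trial
            if best == words:
                break  # no single-word change improves the score
            words = best
        return " ".join(words)
```

    Given a log containing "britney spears", the query "britny spears" is rewritten to "britney spears" because the corrected word is both frequent and a known bigram partner of "spears". Because candidates come from the log rather than a dictionary, names absent from any lexicon can still be suggested.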

6. Using popularity data for ranking

    Type: Patent application

    Status: In force

    Publication number: US20070100824A1

    Publication date: 2007-05-03

    Application number: US11266026

    Filing date: 2005-11-03

    IPC classification: G06F7/00

    CPC classification: G06F17/30864

    Abstract: A unique ranking system and method that facilitate improving the ranking and ordering of objects to further enhance the quality, accuracy, and delivery of search results in response to a search query. The system and method involve monitoring and tracking an object in terms of the number of times it has been accessed and, optionally, by whom, when, and for how long it was accessed, as well as its access rate. The user's interaction with the object can be tracked as well. By tracking the objects, a popularity measure can be determined. Popularity-based rankings can be computed from the popularity measure or some function thereof. The popularity measure can be affected by the access time, who accessed the object, the access duration, or the user's interaction with the object upon access. The popularity-based rankings can be utilized by a search component to improve the quality and retrieval of search results.
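
    The abstract lists access time and access duration as inputs to the popularity measure but gives no formula. The sketch below assumes a hypothetical access record of (object_id, timestamp_in_days, dwell_seconds). It also assumes a weighting in which each access counts once, decays with a configurable half-life, and counts only half if the dwell time was very short. Who accessed the object is not modeled.

```python
from collections import defaultdict


def popularity_scores(accesses, now, half_life_days=30.0, min_dwell_s=5.0):
    """Popularity per object from (object_id, timestamp_days, dwell_seconds).

    Assumed weighting: an access is worth 1, halved for every half_life_days
    of age; accesses shorter than min_dwell_s seconds are worth half as much.
    """
    scores = defaultdict(float)
    for obj, ts, dwell in accesses:
        decay = 0.5 ** ((now - ts) / half_life_days)
        weight = 1.0 if dwell >= min_dwell_s else 0.5
        scores[obj] += decay * weight
    return dict(scores)


def rank_by_popularity(candidates, scores):
    """Sort candidate objects by popularity; unseen objects score 0."""
    return sorted(candidates, key=lambda o: scores.get(o, 0.0), reverse=True)
```

    An object accessed twice today (once briefly) scores 1.5. An object accessed twice a month ago scores 1.0 and ranks lower. In practice, such a score would more likely be blended with textual relevance than used on its own.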

7. System and method for generating alternative search terms

    Type: Patent application

    Status: Pending (published)

    Publication number: US20060161520A1

    Publication date: 2006-07-20

    Application number: US11034777

    Filing date: 2005-01-14

    IPC classification: G06F17/30

    CPC classification: G06F16/951 G06F16/3322

    Abstract: A system and related techniques accept user search or query terms over the Internet or another network or connection. In addition to presenting regularly generated search results, according to embodiments of the invention the search engine and related logic may examine the search string for suggested refinements or improvements to the search terms, in an attempt to derive improved results, or results closer to the user's search intent. In one regard, according to embodiments of the invention, the alternative search logic may attempt to extract related or more meaningful search terms from sources including past usage patterns of users and other data. The alternative search logic may thus examine the user's search terms to determine a substring match to prior searches, for instance those stored by the search host for all users. In embodiments, the alternative search logic may likewise present search extensions or refinement paths selected by prior users running the same search, as an indicator of likely content or source relevance. In further embodiments, the alternative search logic may perform a reverse query lookup to trace queries that resulted in the same Web site or other hit as the present search, and present those other queries as possible alternatives for the user to pursue. These and other search refinements may be performed, taking advantage of usage patterns and other information to improve search quality beyond straightforward spelling-type correction.
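
    Two of the techniques in the abstract, substring matching against prior searches and reverse query lookup, can be sketched over a hypothetical click log of (query, clicked_url) pairs. The data layout and the ordering rule (most shared URLs first, then alphabetical) are assumptions. Refinement paths of prior users are not modeled.

```python
from collections import defaultdict


class AlternativeQueries:
    """Suggest alternative queries from a hypothetical (query, clicked_url) log."""

    def __init__(self, click_log):
        self.urls_by_query = defaultdict(set)
        self.queries_by_url = defaultdict(set)
        for query, url in click_log:
            self.urls_by_query[query].add(url)
            self.queries_by_url[url].add(query)

    def substring_matches(self, query):
        """Prior queries that contain the current query as a substring."""
        return sorted(q for q in self.urls_by_query if query in q and q != query)

    def reverse_lookup(self, query):
        """Other queries that led to the same URLs, most shared URLs first."""
        shared = defaultdict(int)
        for url in self.urls_by_query.get(query, ()):
            for other in self.queries_by_url[url]:
                if other != query:
                    shared[other] += 1
        return sorted(shared, key=lambda q: (-shared[q], q))
```

    If users who searched "jaguar" clicked the same pages as users who searched "jaguar cars" and "big cats", both queries surface as alternatives. The one sharing more clicked pages is listed first.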

8. Application programming interface for text mining and search

    Publication number: US20060101037A1

    Publication date: 2006-05-11

    Application number: US11172638

    Filing date: 2005-07-01

    IPC classification: G06F7/00

    CPC classification: G06F17/30902

    Abstract: Systems and methods are described that allow programmatic access to search engine results and query logs in a structured form. The search results can be retrieved from the search engine in an intermediary form that contains the information in the HTML pages provided to web browsers (potentially with additional information). This intermediary form can then be broken down on the client machine, using local resources, to assemble the structured objects. The library also provides caching of the search results, both on the local machine and in a remote database. When the results for a query exist in the caches, they can be retrieved from there instead of querying the search engine. Documents and/or web pages can also be cached. The library can also be directed to operate only from the cache, effectively exposing a local data set instead of the remote search engine.
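
    The abstract does not give the library's actual interface. The cache-first behavior and the cache-only mode can be sketched with a hypothetical wrapper around any function that maps a query to a list of result URLs. Only an in-memory local cache is shown; the remote-database cache is omitted.

```python
from typing import Callable, Dict, List, Optional


class CachedSearchClient:
    """Cache-first wrapper around a search function (hypothetical interface)."""

    def __init__(self, fetch: Optional[Callable[[str], List[str]]], cache_only: bool = False):
        self._fetch = fetch                    # assumed: query -> list of result URLs
        self._cache: Dict[str, List[str]] = {}
        self.cache_only = cache_only
        self.remote_calls = 0

    def search(self, query: str) -> List[str]:
        if query in self._cache:
            return self._cache[query]          # served locally, no remote call
        if self.cache_only or self._fetch is None:
            raise KeyError(f"{query!r} is not cached and remote search is disabled")
        self.remote_calls += 1
        results = self._fetch(query)
        self._cache[query] = results
        return results
```

    Repeated queries hit the search engine only once. With `cache_only` set, the client answers solely from previously cached results, which matches the abstract's description of exposing a local data set.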

9. Accounting for behavioral variability in web search

    Type: Granted patent

    Status: In force

    Publication number: US07743047B2

    Publication date: 2010-06-22

    Application number: US11904103

    Filing date: 2007-09-26

    IPC classification: G06F7/00

    CPC classification: G06F17/30867

    Abstract: The concept of variability pertains to whether users exhibit consistent search interaction patterns, for example, in terms of interaction flow or information targeted. Methods are provided for analyzing variability and then adapting search-related functionality (e.g., processes and/or interfaces) to account for variability characteristics, for example, to account for predictable search interaction behavior.
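
    The abstract does not name a variability metric. One common way to quantify whether users target the same information for a query is the entropy of the distribution of clicked results, which is the measure assumed here. The threshold separating predictable from variable queries is also a hypothetical parameter.

```python
import math
from collections import Counter


def click_entropy(clicked_urls):
    """Shannon entropy (bits) of the distribution of clicked URLs for a query."""
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_predictable(clicked_urls, threshold_bits=1.0):
    """Assumed rule: click entropy below the threshold means consistent behavior."""
    return click_entropy(clicked_urls) < threshold_bits
```

    If every user clicks the same result, the entropy is 0 bits. If clicks spread evenly over four results, it is 2 bits. A search interface might, for example, promote the dominant result for low-entropy queries and show more diverse results for high-entropy ones.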

10. Spell checker with arbitrary length string-to-string transformations to improve noisy channel spelling correction

    Type: Granted patent

    Status: Lapsed

    Publication number: US07290209B2

    Publication date: 2007-10-30

    Application number: US11182388

    Filing date: 2005-07-15

    IPC classification: G06N3/00

    CPC classification: G06F17/273 G10L15/183

    Abstract: A spell checker based on the noisy channel model has a source model and an error model. The source model determines how likely a word w in a dictionary is to have been generated. The error model determines how likely the word w was to have been incorrectly entered as the string s (e.g., mistyped or incorrectly interpreted by a speech recognition system) according to the probabilities of string-to-string edits. The string-to-string edits allow conversion of one arbitrary-length character sequence to another arbitrary-length character sequence.
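
    In noisy-channel terms, the corrector picks the dictionary word w that maximizes P(w) * P(s | w). The sketch approximates P(s | w) by the single best way of splitting w and s into aligned substring pairs (alpha, beta), multiplying the edit probabilities of those pairs. It assumes a hypothetical table `edit_probs` of substring edits, a fixed probability `p_identity` for an unchanged character, and a maximum substring length of 3. All values are illustrative, not taken from the patent.

```python
from functools import lru_cache


def channel_prob(w, s, edit_probs, max_len=3, p_identity=0.95):
    """Best-partition approximation of P(s | w) with arbitrary-length edits.

    w and s are split into aligned substring pairs (alpha -> beta); each pair
    scores edit_probs[(alpha, beta)], or p_identity for an unchanged single
    character. Substrings are limited to max_len characters.
    """
    @lru_cache(maxsize=None)
    def best(i, j):
        # Highest probability of transforming w[:i] into s[:j].
        if i == 0 and j == 0:
            return 1.0
        result = 0.0
        for a in range(max(0, i - max_len), i + 1):
            for b in range(max(0, j - max_len), j + 1):
                if a == i and b == j:
                    continue  # empty-to-empty edit would not make progress
                alpha, beta = w[a:i], s[b:j]
                if alpha == beta and len(alpha) == 1:
                    p = p_identity
                else:
                    p = edit_probs.get((alpha, beta), 0.0)
                if p > 0.0:
                    result = max(result, best(a, b) * p)
        return result

    return best(len(w), len(s))


def correct(s, word_probs, edit_probs):
    """argmax over dictionary words w of P(w) * P(s | w)."""
    return max(word_probs, key=lambda w: word_probs[w] * channel_prob(w, s, edit_probs))
```

    With a single multi-character edit ("ph" to "f") at probability 0.3, the string "fysical" is explained by "physical" with probability 0.3 * 0.95^6. No allowed edits turn "fiscal" into "fysical", so "physical" is chosen. A character-level model would need two separate, unlikely edits to capture the same confusion.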
