Accounting for behavioral variability in web search
    1.
    Type: Granted invention patent
    Status: In force

    Publication No.: US07743047B2

    Publication date: 2010-06-22

    Application No.: US11904103

    Filing date: 2007-09-26

    IPC classification: G06F7/00

    CPC classification: G06F17/30867

    Abstract: The concept of variability pertains to whether users exhibit consistent search interaction patterns, for example, in terms of interaction flow or information targeted. Methods are provided for analyzing variability, and then adapting search-related functionality (e.g., processes and/or interfaces) to account for variability characteristics, for example, to account for predictable search interaction behavior.

    Question answering over structured content on the web
    2.
    Type: Invention patent application
    Status: Lapsed

    Publication No.: US20070094285A1

    Publication date: 2007-04-26

    Application No.: US11256503

    Filing date: 2005-10-21

    IPC classification: G06F7/00

    Abstract: Structured content and associated metadata from the Web are leveraged to provide specific answer string responses to user questions. The structured content can also be indexed at crawl-time to facilitate searching of the content at search-time. Ranking techniques can also be employed to facilitate providing an optimum answer string and/or a top K list of answer strings for a query. Ranking can be based on trainable algorithms that utilize feature vectors for candidate answer strings. In one instance, at crawl-time, structured content is indexed and automatically associated with metadata relating to the structured content and the source web page. At search-time, candidate indexed structured content is then utilized to extract an appropriate answer string in response to a user query.

    MINING WEB SEARCH USER BEHAVIOR TO ENHANCE WEB SEARCH RELEVANCE
    3.
    Type: Invention patent application
    Status: Under examination (published)

    Publication No.: US20070208730A1

    Publication date: 2007-09-06

    Application No.: US11457733

    Filing date: 2006-07-14

    IPC classification: G06F17/30

    CPC classification: G06F16/337 G06F16/9535

    Abstract: Systems and methods that estimate user preference via automatic interpretation of user behavior. A user behavior component associated with a search engine can automatically interpret collective behavior of users (e.g., web search users). Such feedback component can include user behavior features and predictive models (e.g., from a user behavior component) that are robust to noise, which can be present in observed user interactions with the search results (e.g., malicious and/or irrational user activity).

    SYSTEMS AND METHODS FOR IMPROVED SPELL CHECKING
    4.
    Type: Invention patent application
    Status: Under examination (published)

    Publication No.: US20070106937A1

    Publication date: 2007-05-10

    Application No.: US11620171

    Filing date: 2007-01-05

    IPC classification: G06F17/00

    Abstract: The present invention leverages iterative transformations of search query strings along with statistics extracted from search query logs and/or web data to provide possible alternative spellings for the search query strings. This provides a spell checking means that can be influenced to provide individualized suggestions for each user. By utilizing search query logs, the present invention can account for substrings not found in a lexicon but still acceptable as a search query of interest. This allows a means to provide a higher quality proposal for alternative spellings, beyond the content of the lexicon. One instance of the present invention operates at a substring level by utilizing word unigram and/or bigram statistics extracted from query logs combined with an iterative search. This provides substantially better spelling alternatives for a given query than employing only substring matching. Other instances can receive input data from sources other than a search query input.

    Using popularity data for ranking
    5.
    Type: Invention patent application
    Status: In force

    Publication No.: US20070100824A1

    Publication date: 2007-05-03

    Application No.: US11266026

    Filing date: 2005-11-03

    IPC classification: G06F7/00

    CPC classification: G06F17/30864

    Abstract: A unique ranking system and method that facilitates improving the ranking and ordering of objects to further enhance the quality, accuracy, and delivery of search results in response to a search query. The system and method involve monitoring and tracking an object in terms of the number of times it has been accessed and optionally by whom, when, for how long, and an access rate. The user's interaction with the object can be tracked as well. By tracking the objects, a popularity measure can be determined. Popularity-based rankings can be computed based on the popularity measure or some function thereof. The popularity measure can be affected by the access time, who accessed it, access duration or the user's interaction with the object upon access. The popularity-based rankings can be utilized by a search component to improve the quality and retrieval of search results.
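
    The popularity measure described in this abstract can be illustrated with a minimal sketch. The scoring formula below (an exponential recency decay multiplied by a logarithmic dwell-time factor) is an assumption chosen for illustration and is not taken from the patent; the names `Access`, `popularity`, `rank`, and `half_life_days` are likewise hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Access:
    """One logged access to an object (hypothetical record layout)."""
    object_id: str
    age_days: float    # how long ago the access happened
    duration_s: float  # how long the user stayed on the object


def popularity(accesses: List[Access], half_life_days: float = 30.0) -> Dict[str, float]:
    """Sum a per-access score for each object.

    Assumed scoring: recent accesses count more (weight halves every
    half_life_days) and longer visits count more (log-damped dwell time).
    """
    scores: Dict[str, float] = {}
    for a in accesses:
        recency = 0.5 ** (a.age_days / half_life_days)
        dwell = math.log1p(a.duration_s)
        scores[a.object_id] = scores.get(a.object_id, 0.0) + recency * dwell
    return scores


def rank(accesses: List[Access]) -> List[str]:
    """Order object ids from most to least popular."""
    scores = popularity(accesses)
    return sorted(scores, key=scores.get, reverse=True)


log = [Access("a", 0, 60), Access("a", 1, 30), Access("b", 90, 600)]
print(rank(log))
```

    In this sketch, two recent visits to object `a` outweigh one long visit to object `b` made 90 days earlier; a search component could blend such a score with its existing relevance score.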

    System and method for generating alternative search terms
    6.
    Type: Invention patent application
    Status: Under examination (published)

    Publication No.: US20060161520A1

    Publication date: 2006-07-20

    Application No.: US11034777

    Filing date: 2005-01-14

    IPC classification: G06F17/30

    CPC classification: G06F16/951 G06F16/3322

    Abstract: A system and related techniques accept user search or query terms over the Internet or other network or connection. In addition to presenting regularly generated search results, according to embodiments of the invention the search engine and related logic may examine the search string for suggested refinements or improvements to the search terms, to attempt to derive improved results or results closer to the user's search intent. According to embodiments of the invention in one regard, the alternative search logic may attempt to extract related or more meaningful search terms from sources including past usage patterns by users, and other data. That alternative search logic may thus examine the user's search terms to determine a substring match to prior searches, for instance stored by the search host for all users. In embodiments, the alternative search logic may likewise present user search extensions or refinement paths selected by prior users running the same search, as an indicator of likely content or source relevance. In further embodiments, the alternative search logic may perform a reverse query lookup to trace queries which resulted in the same Web site or other hit as the present search, and present those other queries as possible alternatives for the user to pursue. These and other search refinements may be performed, taking advantage of usage patterns and other information to improve search quality beyond straightforward spelling-type correction.

    Application programming interface for text mining and search
    7.

    Publication No.: US20060101037A1

    Publication date: 2006-05-11

    Application No.: US11172638

    Filing date: 2005-07-01

    IPC classification: G06F7/00

    CPC classification: G06F17/30902

    Abstract: Systems and methods are described that allow programmatic access to search engine results and query logs in a structured form. The search results can be retrieved from the search engine in an intermediary form that contains the information that is in the HTML pages provided to web browsers (potentially with additional information). This intermediary form can then be broken down on the client machine, using local resources, to assemble the structured objects. The library also provides for caching of the search results. This can be provided both on the local machine and on a remote database. When the results for a query exist in the caches, they can be retrieved from such location instead of querying the search engine. Documents and/or web pages can also be cached. The library can also be directed to operate only from the cache, effectively exposing a local data set instead of the remote search engine.
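
    The cache lookup order described in this abstract (local cache, then remote cache, then the search engine, with an optional cache-only mode) might look roughly like the sketch below. The class name `CachedSearchClient`, the `search_fn` callable, and the use of plain dictionaries as caches are assumptions for illustration; they do not reflect the actual library's API.

```python
from typing import Callable, Dict, List, Optional


class CachedSearchClient:
    """Cache-first wrapper around a search function (illustrative only)."""

    def __init__(
        self,
        search_fn: Callable[[str], List[str]],
        remote_cache: Optional[Dict[str, List[str]]] = None,
        cache_only: bool = False,
    ) -> None:
        self.search_fn = search_fn  # stand-in for the remote search engine
        self.local_cache: Dict[str, List[str]] = {}
        # A dict stands in for the remote database cache.
        self.remote_cache = remote_cache if remote_cache is not None else {}
        self.cache_only = cache_only

    def results(self, query: str) -> Optional[List[str]]:
        # 1. Local cache on this machine.
        if query in self.local_cache:
            return self.local_cache[query]
        # 2. Remote cache; copy the hit into the local cache.
        if query in self.remote_cache:
            hits = self.remote_cache[query]
            self.local_cache[query] = hits
            return hits
        # 3. In cache-only mode the search engine is never contacted.
        if self.cache_only:
            return None
        # 4. Fall back to the search engine and populate both caches.
        hits = self.search_fn(query)
        self.local_cache[query] = hits
        self.remote_cache[query] = hits
        return hits
```

    With `cache_only=True`, the client exposes only previously cached queries, which matches the abstract's description of operating on a local data set instead of the remote search engine.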

    Spell checker with arbitrary length string-to-string transformations to improve noisy channel spelling correction
    8.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US07290209B2

    Publication date: 2007-10-30

    Application No.: US11182388

    Filing date: 2005-07-15

    IPC classification: G06N3/00

    CPC classification: G06F17/273 G10L15/183

    Abstract: A spell checker based on the noisy channel model has a source model and an error model. The source model determines how likely a word w in a dictionary is to have been generated. The error model determines how likely the word w was to have been incorrectly entered as the string s (e.g., mistyped or incorrectly interpreted by a speech recognition system) according to the probabilities of string-to-string edits. The string-to-string edits allow conversion of one arbitrary length character sequence to another arbitrary length character sequence.
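
    As a rough illustration of the noisy channel model in this abstract, the sketch below picks the dictionary word w that maximizes P(w) * P(s | w), where P(s | w) is the best-scoring way to split w and s into aligned substrings and multiply the substring edit probabilities. All probabilities, the `MATCH` constant, the `MAX_LEN` limit on substring length, and the function names are made-up assumptions; a real system would estimate them from data.

```python
from functools import lru_cache

# Hypothetical source model: P(w) for each dictionary word.
SOURCE = {"their": 0.6, "there": 0.3, "three": 0.1}

# Hypothetical string-to-string edit probabilities P(alpha -> beta).
EDITS = {("ei", "ie"): 0.1, ("e", ""): 0.05, ("", "e"): 0.05}
MATCH = 0.9   # assumed probability that a character is typed unchanged
MAX_LEN = 2   # longest substring considered on either side of an edit


def error_prob(w: str, s: str) -> float:
    """P(s | w): best product of edit probabilities over all alignments."""

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> float:
        if i == len(w) and j == len(s):
            return 1.0
        top = 0.0
        for a in range(MAX_LEN + 1):
            for b in range(MAX_LEN + 1):
                if a == 0 and b == 0:
                    continue
                if i + a > len(w) or j + b > len(s):
                    continue
                alpha, beta = w[i:i + a], s[j:j + b]
                if a == 1 and b == 1 and alpha == beta:
                    p = MATCH
                else:
                    p = EDITS.get((alpha, beta), 0.0)
                if p > 0.0:
                    top = max(top, p * best(i + a, j + b))
        return top

    return best(0, 0)


def correct(s: str) -> str:
    """Return the dictionary word maximizing P(w) * P(s | w)."""
    return max(SOURCE, key=lambda w: SOURCE[w] * error_prob(w, s))


print(correct("thier"))
```

    Here the two-character edit "ei" -> "ie" explains the typo in a single step, which is the kind of multi-character transformation the string-to-string error model allows beyond single-character edits.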

    Reducing human overhead in text categorization
    9.
    Type: Invention patent application
    Status: In force

    Publication No.: US20070183655A1

    Publication date: 2007-08-09

    Application No.: US11350701

    Filing date: 2006-02-09

    Applicants: Arnd Konig, Eric Brill

    Inventors: Arnd Konig, Eric Brill

    IPC classification: G06K9/62

    CPC classification: G06K9/6282

    Abstract: A unique multi-stage classification system and method that facilitates reducing human resources or costs associated with text classification while still obtaining a desired level of accuracy is provided. The multi-stage classification system and method involve a pattern-based classifier and a machine learning classifier. The pattern-based classifier is trained on discriminative patterns as identified by humans rather than machines, which allows a smaller training set to be employed. Given humans' superior abilities to reason over text, discriminative patterns can be more accurately and more readily identified by them. Unlabeled items can be initially processed by the pattern-based classifier and if no pattern match exists, then the unlabeled data can be processed by the machine learning classifier. By employing the classifiers in this manner, less human involvement is required in the classification process. Moreover, classification accuracy is maintained and/or improved.
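
    The two-stage flow in this abstract (pattern-based classifier first, machine learning classifier only when no pattern matches) can be sketched as follows. The regular expressions, labels, and the `stand_in_ml_classifier` function are hypothetical placeholders; in particular, the second stage here is a trivial stub standing in for a trained model.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical human-authored discriminative patterns and their labels.
PATTERNS = [
    (re.compile(r"\brefund\b|\binvoice\b", re.IGNORECASE), "billing"),
    (re.compile(r"\bpassword\b|\blog ?in\b", re.IGNORECASE), "account"),
]


def pattern_classify(text: str) -> Optional[str]:
    """Stage 1: return the label of the first matching pattern, if any."""
    for pattern, label in PATTERNS:
        if pattern.search(text):
            return label
    return None


def stand_in_ml_classifier(text: str) -> str:
    """Stage 2 placeholder: a real system would call a trained model here."""
    return "shipping" if "deliver" in text.lower() else "other"


def classify(
    text: str, fallback: Callable[[str], str] = stand_in_ml_classifier
) -> Tuple[str, str]:
    """Return (label, stage), where stage records which classifier decided."""
    label = pattern_classify(text)
    if label is not None:
        return label, "pattern"
    return fallback(text), "ml"


print(classify("Please refund my order"))
print(classify("When will it be delivered?"))
```

    Returning the deciding stage alongside the label makes it easy to measure how many items the human-authored patterns handle on their own.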

    USER INTENT DISCOVERY
    10.
    Type: Invention patent application
    Status: Under examination (published)

    Publication No.: US20070162442A1

    Publication date: 2007-07-12

    Application No.: US11618136

    Filing date: 2006-12-29

    IPC classification: G06F17/30

    Abstract: A system that facilitates determining a user's intent given a user search query comprises a search engine that is employed to search over a collection of objects within a data store to retrieve a user search result set. The objects within the result set are associated with queries that were previously utilized to locate such objects. A level of relatedness between the previous queries and the user search query is determined, and previous queries that are associated with a result set that is novel and related to the user search result set are returned to the user.
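
    One possible reading of the "novel and related" criterion in this abstract is sketched below: a previous query is suggested when its result set overlaps the user's result set enough to be related, yet also contains at least one object the user has not seen. The Jaccard overlap measure, the `min_overlap` threshold, and the function name `suggest_queries` are assumptions for illustration, not details from the patent.

```python
from typing import Dict, Iterable, List, Set


def suggest_queries(
    user_results: Iterable[str],
    query_log: Dict[str, Set[str]],
    min_overlap: float = 0.25,
) -> List[str]:
    """Return previous queries whose results are related and novel.

    query_log maps each previous query to the objects it located.
    Relatedness is measured (by assumption) as Jaccard overlap between
    the two result sets; novelty means at least one unseen object.
    """
    seen = set(user_results)
    scored = []
    for query, results in query_log.items():
        results = set(results)
        if not results:
            continue
        overlap = len(results & seen) / len(results | seen)
        novel = results - seen
        if overlap >= min_overlap and novel:
            scored.append((overlap, query))
    # Most related first; ties broken alphabetically for stable output.
    return [q for _, q in sorted(scored, key=lambda t: (-t[0], t[1]))]


log = {
    "jaguar car": {"d1", "d2", "d4"},
    "jaguar animal": {"d9"},
    "jaguar": {"d1", "d2", "d3"},
}
print(suggest_queries({"d1", "d2", "d3"}, log))
```

    In this example, "jaguar" is dropped because it adds nothing new, and "jaguar animal" is dropped because it shares no results with the user's set.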
