Ranking based on reference contexts
    1.
    Granted patent
    Ranking based on reference contexts (status: in force)

    Publication number: US08577893B1

    Publication date: 2013-11-05

    Application number: US10800006

    Filing date: 2004-03-15

    IPC classification: G06F7/00

    CPC classification: G06F17/30684

    Abstract: A system ranks documents based on contexts associated with the documents. The system identifies a reference in a first document, where the reference is associated with a second document. The system analyzes a portion of the first document associated with the reference, identifies a rare word (or words) from the portion, creates a context identifier based on the rare word(s), and ranks the second document based on the context identifier.

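    The abstract names a pipeline (reference, surrounding portion, rare words, context identifier, ranking) but fixes none of its parameters. The Python sketch below is one illustrative reading and is not the patented method. It makes several assumptions. The "portion" is a window of `radius` words on either side of the link. The rare words are the `k` words with the lowest document frequency. The context identifier is their sorted tuple. A target document is ranked by how many of its referencing contexts contain a query term. All function names, parameters, and the toy data are hypothetical.

```python
from collections import Counter


def context_window(words, link_index, radius=3):
    """Words within `radius` positions of the reference, excluding the reference token."""
    start = max(0, link_index - radius)
    return words[start:link_index] + words[link_index + 1 : link_index + 1 + radius]


def context_identifier(window, doc_freq, k=2):
    """Assumed identifier: sorted tuple of the k rarest words (lowest document frequency).

    Words missing from doc_freq count as frequency 0, i.e. maximally rare.
    """
    rarest = sorted(set(window), key=lambda w: (doc_freq.get(w, 0), w))[:k]
    return tuple(sorted(rarest))


def rank_targets(references, doc_freq, query_terms, radius=3, k=2):
    """references: (words, link_index, target_id) triples.

    Scores each target by how many of its context identifiers contain a query term
    and returns (target_id, score) pairs, highest score first.
    """
    scores = Counter()
    for words, link_index, target in references:
        cid = context_identifier(context_window(words, link_index, radius), doc_freq, k)
        if any(term in cid for term in query_terms):
            scores[target] += 1
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy document frequencies and references (LINK marks the reference position).
doc_freq = {"see": 40, "the": 100, "for": 80, "details": 60, "about": 70,
            "jaguar": 3, "engine": 5, "specs": 7, "habitat": 4}
refs = [
    ("see the jaguar engine specs LINK for details".split(), 5, "carsite"),
    ("read about jaguar habitat LINK".split(), 4, "zoo"),
]
print(rank_targets(refs, doc_freq, ["engine"]))  # [('carsite', 1)]
print(rank_targets(refs, doc_freq, ["jaguar"]))  # [('carsite', 1), ('zoo', 1)]
```

    Because only rare words enter the identifier, common words such as "the" or "for" near a link do not affect which contexts a target is associated with.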

    SYSTEM AND METHOD FOR PROVIDING SEARCH QUERY REFINEMENTS
    3.
    Patent application
    SYSTEM AND METHOD FOR PROVIDING SEARCH QUERY REFINEMENTS (status: lapsed)

    Publication number: US20120054216A1

    Publication date: 2012-03-01

    Application number: US13289348

    Filing date: 2011-11-04

    IPC classification: G06F17/30

    Abstract: A system and method for providing search query refinements are presented. A stored query and a stored document are associated as a logical pairing. A weight is assigned to the logical pairing. The search query is issued and a set of search documents is produced. At least one search document is matched to at least one stored document. The stored query and the assigned weight associated with the matching at least one stored document are retrieved. At least one cluster is formed based on the stored query and the assigned weight associated with the matching at least one stored document. The stored query associated with the matching at least one stored document are scored for the at least one cluster relative to at least one other cluster. At least one such scored search query is suggested as a set of query refinements.

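    The following is a minimal Python sketch of the refinement flow. It fills in details the abstract leaves open, and every one of them is an assumption. Stored query/document pairs and their weights live in a dict. The unspecified clustering step is replaced by a simple rule: stored queries that match the same result document land in the same cluster, found with union-find. A cluster's score is its share of the total matched weight. Each cluster contributes its highest-weight query as a suggestion. `suggest_refinements` and the sample data are hypothetical.

```python
from collections import defaultdict


def suggest_refinements(pairs, result_docs, max_suggestions=3):
    """pairs: {(stored_query, stored_doc): weight}; result_docs: documents returned for the issued query."""
    results = set(result_docs)
    # Match result documents to stored documents; keep the associated (query, doc, weight).
    matched = [(q, d, w) for (q, d), w in pairs.items() if d in results]
    if not matched:
        return []

    # Union-find over stored queries (assumed clustering rule: a shared matched document).
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_query_for_doc = {}
    for q, d, _ in matched:
        find(q)  # register q as its own cluster
        if d in first_query_for_doc:
            parent[find(q)] = find(first_query_for_doc[d])  # merge the two clusters
        else:
            first_query_for_doc[d] = q

    # Total weight per stored query, then group queries by cluster root.
    query_weight = defaultdict(float)
    for q, _, w in matched:
        query_weight[q] += w
    clusters = defaultdict(list)
    for q in query_weight:
        clusters[find(q)].append(q)

    # Score each cluster by its share of all matched weight; suggest its heaviest query.
    total = sum(query_weight.values()) or 1.0  # avoid division by zero if all weights are 0
    scored = []
    for members in clusters.values():
        share = sum(query_weight[q] for q in members) / total
        best = max(members, key=lambda q: (query_weight[q], q))
        scored.append((share, best))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [query for _, query in scored[:max_suggestions]]


pairs = {
    ("jaguar car", "d1"): 3.0,
    ("jaguar price", "d1"): 1.0,
    ("jaguar price", "d2"): 1.0,
    ("jaguar animal", "d3"): 2.0,
    ("big cats", "d3"): 0.5,
    ("unrelated", "d9"): 5.0,
}
print(suggest_refinements(pairs, ["d1", "d2", "d3"]))  # ['jaguar car', 'jaguar animal']
```

    Offering one query per cluster, rather than the globally top-weighted queries, is meant to make the suggestions cover different interpretations of an ambiguous query.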

    Determining quality of linked documents
    5.
    Granted patent
    Determining quality of linked documents (status: in force)

    Publication number: US07783639B1

    Publication date: 2010-08-24

    Application number: US10879520

    Filing date: 2004-06-30

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30864

    Abstract: A ranking component ranks documents, such as web pages or web sites, to obtain a ranking score that defines a quality judgment of the document. The ranking score of a particular document is based on the ranking score of the documents which link to it and based on affiliation among the documents.

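    The abstract gives no formula. One way to illustrate "a score based on the scores of linking documents, adjusted for affiliation" is a PageRank-style iteration. In this version, links between affiliated documents count for less: two documents are treated as affiliated when they share a hypothetical owner id, and such links are down-weighted by `affiliated_weight`. This is a swapped-in, simplified technique, not the patented method. Documents without usable outlinks simply leak score, so the totals are not normalized. All names and data are illustrative.

```python
def affiliated_rank(links, owner, damping=0.85, affiliated_weight=0.0, iterations=50):
    """links: {doc: [docs it links to]}; owner: {doc: owner id} (hypothetical affiliation data).

    PageRank-style iteration where a link between two documents with the same known owner
    carries `affiliated_weight` instead of 1.0.
    """
    docs = set(links) | {t for targets in links.values() for t in targets}
    n = len(docs)
    score = {d: 1.0 / n for d in docs}

    def affiliated(a, b):
        return a in owner and b in owner and owner[a] == owner[b]

    for _ in range(iterations):
        new = {d: (1.0 - damping) / n for d in docs}
        for src, targets in links.items():
            weights = {t: affiliated_weight if affiliated(src, t) else 1.0 for t in targets}
            total = sum(weights.values())
            if total == 0:
                continue  # every outlink is affiliated and fully discounted
            for t, w in weights.items():
                new[t] += damping * score[src] * w / total
        score = new
    return score


links = {
    "spam1": ["spam2"], "spam3": ["spam2"],  # one owner linking to its own page
    "blog1": ["news"], "blog2": ["news"],    # independent owners linking to news
}
owner = {"spam1": "acme", "spam2": "acme", "spam3": "acme",
         "blog1": "alice", "blog2": "bob", "news": "daily"}
scores = affiliated_rank(links, owner)
plain = affiliated_rank(links, owner, affiliated_weight=1.0)
print(round(scores["spam2"], 4), round(scores["news"], 4))  # 0.025 0.0675
print(round(plain["spam2"], 4))                              # 0.0675
```

    With `affiliated_weight=1.0` the two pages end up with the same score. With the default of 0.0, the self-promoting cluster gains nothing from its internal links.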

    Locating meaningful stopwords or stop-phrases in keyword-based retrieval systems
    6.
    Granted patent
    Locating meaningful stopwords or stop-phrases in keyword-based retrieval systems (status: in force)

    Publication number: US07409383B1

    Publication date: 2008-08-05

    Application number: US10813590

    Filing date: 2004-03-31

    IPC classification: G06F17/30 G06F7/00 G06F17/21

    Abstract: A stopword detection component detects stopwords (also stop-phrases) in search queries input to keyword-based information retrieval systems. Potential stopwords are initially identified by comparing the terms in the search query to a list of known stopwords. Context data is then retrieved based on the search query and the identified stopwords. In one implementation, the context data includes documents retrieved from a document index. In another implementation, the context data includes categories relevant to the search query. Sets of retrieved context data are compared to one another to determine if they are substantially similar. If the sets of context data are substantially similar, this fact may be used to infer that the removal of the potential stopword(s) is not material to the search. If the sets of context data are not substantially similar, the potential stopword can be considered material to the search and should not be removed from the query.

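    The comparison step can be sketched in Python under these assumptions. The "context data" are the top-k document ids returned by a search function. "Substantially similar" means a Jaccard overlap at or above a threshold. Each potential stopword is tested by removing it on its own. The abstract also covers category-based context data and stop-phrases, which this sketch omits. `material_stopwords`, `toy_search`, the 0.5 threshold, and the toy index are illustrative choices, not details from the patent.

```python
def jaccard(a, b):
    """Overlap of two result lists as sets; two empty lists count as identical."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def material_stopwords(query, stopword_list, search, top_k=10, threshold=0.5):
    """Return the potential stopwords in `query` whose removal changes the results substantially."""
    terms = query.split()
    baseline = search(terms)[:top_k]
    material = []
    for i, term in enumerate(terms):
        if term.lower() not in stopword_list:
            continue  # not a potential stopword
        reduced = terms[:i] + terms[i + 1:]
        if jaccard(baseline, search(reduced)[:top_k]) < threshold:
            material.append(term)  # context changed: keep this term in the query
    return material


# Toy conjunctive search over a tiny hypothetical index.
DOCS = {
    "d1": "the who live in concert",
    "d2": "the who tour dates",
    "d3": "who health report",
    "d4": "who vaccine guidance",
    "d5": "who member states",
    "d6": "the funny cat videos",
    "d7": "the cat videos compilation",
}
STOPWORDS = {"the", "a", "of", "in"}


def toy_search(terms):
    return [doc_id for doc_id, text in sorted(DOCS.items())
            if all(t in text.split() for t in terms)]


print(material_stopwords("the who", STOPWORDS, toy_search))         # ['the']
print(material_stopwords("the cat videos", STOPWORDS, toy_search))  # []
```

    Here dropping "the" from "the who" pulls in unrelated health-organization documents, so the term is kept. Dropping it from "the cat videos" leaves the results unchanged, so it can safely be removed.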

    System and method for providing search query refinements
    7.
    Granted patent
    System and method for providing search query refinements (status: lapsed)

    Publication number: US08645407B2

    Publication date: 2014-02-04

    Application number: US13289348

    Filing date: 2011-11-04

    IPC classification: G06F17/30

    Abstract: A system and method for providing search query refinements are presented. A stored query and a stored document are associated as a logical pairing. A weight is assigned to the logical pairing. The search query is issued and a set of search documents is produced. At least one search document is matched to at least one stored document. The stored query and the assigned weight associated with the matching at least one stored document are retrieved. At least one cluster is formed based on the stored query and the assigned weight associated with the matching at least one stored document. The stored query associated with the matching at least one stored document are scored for the at least one cluster relative to at least one other cluster. At least one such scored search query is suggested as a set of query refinements.

    Personally identifiable information detection

    Publication number: US08561185B1

    Publication date: 2013-10-15

    Application number: US13109646

    Filing date: 2011-05-17

    IPC classification: G06F21/00

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for privacy protection. In one aspect, a method includes accessing personally identifiable information (PII) type definitions that characterize PII types; identifying PII type information included in content of a web page, the PII type information being information matching at least one PII type definition; identifying secondary information included in the content of the web page, the secondary information being information that is predefined as being associated with PII type information; determining a risk score from the PII type information and the secondary information; and classifying the web page as a personal information exposure risk if the risk score meets a confidentiality threshold, wherein the personal information exposure risk is indicative of the web page including personally identifiable information.
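    The following is a minimal Python sketch of the claimed flow. It assumes regular expressions serve as the PII type definitions, a fixed keyword set serves as the predefined secondary information, and a weighted sum serves as the risk score. The specific patterns and weights are hypothetical choices for illustration. So are two further choices: secondary terms only count when a PII-type match is present, and the confidentiality threshold is 3.0.

```python
import re

# Hypothetical PII type definitions: type name -> pattern.
PII_TYPES = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}
# Hypothetical secondary terms predefined as associated with PII-type information.
SECONDARY_TERMS = {"ssn", "social", "security", "born", "address", "password"}


def risk_score(text, pii_weight=2.0, secondary_weight=1.0):
    """Weighted count of PII-type matches plus secondary terms.

    Assumption: secondary terms only add risk when at least one PII-type match exists.
    """
    pii_hits = sum(len(pattern.findall(text)) for pattern in PII_TYPES.values())
    if pii_hits == 0:
        return 0.0
    words = re.findall(r"[a-z]+", text.lower())
    secondary_hits = sum(1 for w in words if w in SECONDARY_TERMS)
    return pii_weight * pii_hits + secondary_weight * secondary_hits


def is_exposure_risk(text, threshold=3.0):
    """Classify page text as a personal information exposure risk if the score meets the threshold."""
    return risk_score(text) >= threshold


print(risk_score("Contact me at jane@example.com"))            # 2.0
print(is_exposure_risk("SSN 123-45-6789, born 1970"))         # True
print(is_exposure_risk("Update your address and password"))   # False
```

    A lone email address stays below the threshold. A number shaped like an SSN, appearing next to words such as "SSN" and "born", crosses it. This mirrors the abstract's use of secondary information as corroborating context.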