Patent search: ap:("Simon Tong" OR "Jeffrey Adgate Dean" OR "Sanjay Ghemawat") AND inv:"Sanjay Ghemawat" (page 2)

11.

Patent application
Removing documents (legal status: in force)

Publication No.: US20070043721A1

Publication date: 2007-02-22

Application No.: US11208005

Filing date: 2005-08-22

Applicants: Sanjay Ghemawat, John Piscitello, Simon Tong, Matt Cutts

Inventors: Sanjay Ghemawat, John Piscitello, Simon Tong, Matt Cutts

IPC classification: G06F7/00

CPC classification: G06F17/3053, G06F17/30867

Abstract: A system may present information regarding a document and provide an option for removing the document. The system may also receive selection of the option and remove the document when the option is selected. The system may aggregate information regarding documents that have been removed by a group of users and assign scores to a set of documents based on the aggregated information.


12.

Granted patent
System and method for analyzing data records (legal status: in force)

Publication No.: US09405808B2

Publication date: 2016-08-02

Application No.: US13407632

Filing date: 2012-02-28

Applicants: Robert C. Pike, Sean Quinlan, Sean M. Dorward, Jeffrey Dean, Sanjay Ghemawat

Inventors: Robert C. Pike, Sean Quinlan, Sean M. Dorward, Jeffrey Dean, Sanjay Ghemawat

IPC classification: G06F17/30, G06F11/14

CPC classification: G06F17/30501, G06F11/1482, G06F17/30545, G06F17/30598, Y10S707/99933, Y10S707/99937

Abstract: A method and system for analyzing data records includes allocating groups of records to respective processes of a first plurality of processes executing in parallel. In each respective process of the first plurality of processes, for each record in the group of records allocated to the respective process, a query is applied to the record so as to produce zero or more values. Zero or more emit operators are applied to each of the zero or more produced values so as to add corresponding information to an intermediate data structure. Information from a plurality of the intermediate data structures is aggregated to produce output data.
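
The flow described in this abstract (per-record query, emit into an intermediate structure, then aggregation) can be illustrated with a minimal Python sketch. It is not the patented implementation: the word-extracting `query`, the use of `Counter` as the intermediate data structure, and the thread pool standing in for parallel processes are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def query(record):
    """Hypothetical query: yields zero or more values for one record."""
    for word in record.split():
        if word.isalpha():
            yield word.lower()


def process_group(records):
    """One worker: apply the query to each record and emit into a table."""
    table = Counter()  # stands in for the intermediate data structure
    for record in records:
        for value in query(record):
            table[value] += 1  # the "emit" step adds information to the table
    return table


def analyze(groups):
    """Run one worker per group, then aggregate the intermediate tables."""
    with ThreadPoolExecutor() as pool:
        tables = list(pool.map(process_group, groups))
    output = Counter()
    for table in tables:
        output.update(table)  # sums counts across workers
    return output


# Two groups of records; "42" and the empty record produce zero values.
groups = [["a b", "b 42"], ["b c", ""]]
result = analyze(groups)
```

Because each worker writes only to its own table, no locking is needed until the final aggregation step.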

13.

Granted patent
Representative document selection for a set of duplicate documents (legal status: in force)

Publication No.: US08868559B2

Publication date: 2014-10-21

Application No.: US13599707

Filing date: 2012-08-30

Applicants: Daniel Dulitz, Alexandre A. Verstak, Sanjay Ghemawat, Jeffrey A. Dean

Inventors: Daniel Dulitz, Alexandre A. Verstak, Sanjay Ghemawat, Jeffrey A. Dean

IPC classification: G06F7/00, G06F17/30

CPC classification: G06F17/30867, G06F17/3053, G06F17/3071, G06F17/30864, Y10S707/99931, Y10S707/99932, Y10S707/99935, Y10S707/99954

Abstract: Systems and methods for indexing a representative document from a set of duplicate documents are disclosed. Disclosed systems and methods comprise selecting a first document in a plurality of documents on the basis that the first document is associated with a query independent score. Each respective document in the plurality of documents has a fingerprint that indicates that the respective document has substantially identical content to every other document in the plurality of documents. Disclosed systems and methods further comprise indexing, in accordance with the query independent score, the first document thereby producing an indexed first document. With respect to the plurality of documents, only the indexed first document is included in a document index.
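
As a rough illustration of the idea in this abstract, the sketch below groups documents by a content fingerprint and keeps only the highest-scored member of each group in the index. The fingerprint function (SHA-256 over whitespace-normalized, lowercased text), the score values, and the example URLs are hypothetical; the abstract does not specify how fingerprints or query-independent scores are computed.

```python
import hashlib
from collections import defaultdict


def fingerprint(content):
    """Hypothetical fingerprint: hash of lowercased, whitespace-normalized text."""
    normalized = " ".join(content.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def select_representatives(documents):
    """documents: iterable of (url, content, score) tuples.

    Returns {fingerprint: url}, keeping one representative per duplicate set:
    the member with the highest query-independent score (ties fall back to
    the lexicographically largest URL, an arbitrary choice for this sketch).
    """
    groups = defaultdict(list)
    for url, content, score in documents:
        groups[fingerprint(content)].append((score, url))
    return {fp: max(members)[1] for fp, members in groups.items()}


docs = [
    ("a.example/x", "Hello  World", 0.2),
    ("b.example/y", "hello world", 0.9),  # duplicate of a.example/x
    ("c.example/z", "Other page", 0.5),
]
index = select_representatives(docs)
indexed_urls = sorted(index.values())
```

Only `b.example/y` and `c.example/z` end up in `indexed_urls`; the lower-scored duplicate is left out of the index.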


14.

Granted patent
Systems and methods for replicating data (legal status: in force)

Publication No.: US08504518B1

Publication date: 2013-08-06

Application No.: US13274043

Filing date: 2011-10-14

Applicants: Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung

Inventors: Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung

IPC classification: G06F17/30

CPC classification: H04L67/1095, G06F17/30174, G06F17/30215

Abstract: A system that facilitates the distribution and redistribution of chunks of data among multiple servers, may identify servers to store replicas of the chunks based on at least one of utilization, prior data distribution, and failure correlation properties, and place the replicas at the identified servers. The system may monitor total numbers of replicas available in the system, identify chunks that have a total number of replicas below one or more thresholds, assign priorities to the identified chunks, and re-replicate the identified chunks based on the assigned priorities. The system may monitor utilization of the servers, select one or more of the replicas to redistribute based on the utilization of the servers, select one or more of the servers to which to move the one or more replicas, and move the one or more replicas to the selected one or more servers.
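
Two of the steps named in this abstract, prioritizing under-replicated chunks and choosing a destination server by utilization, can be sketched as follows. This is a simplified illustration under stated assumptions: a single threshold of 3 replicas, priority determined only by how few replicas remain, and utilization as the only placement criterion (the abstract also mentions prior data distribution and failure correlation, which are omitted here). The chunk and server names are hypothetical.

```python
def rereplication_order(replica_counts, threshold=3):
    """Return ids of chunks below the threshold, fewest live replicas first.

    replica_counts: {chunk_id: number of live replicas}
    """
    below = [(count, chunk) for chunk, count in replica_counts.items()
             if count < threshold]
    return [chunk for count, chunk in sorted(below)]


def pick_target(utilization, holders):
    """Pick the least-utilized server that does not already hold the chunk.

    utilization: {server: fraction of capacity in use}
    holders: set of servers that already store a replica
    Returns None when every server already holds a replica.
    """
    candidates = {s: u for s, u in utilization.items() if s not in holders}
    return min(candidates, key=candidates.get, default=None)


counts = {"c1": 3, "c2": 1, "c3": 2, "c4": 1}
order = rereplication_order(counts)

utilization = {"s1": 0.9, "s2": 0.2, "s3": 0.1}
target = pick_target(utilization, holders={"s3"})
```

Chunks with a single remaining replica (`c2`, `c4`) come first in `order` because they are closest to being lost.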


15.

Granted patent
Using text surrounding hypertext links when indexing and generating page summaries (legal status: in force)

Publication No.: US08495483B1

Publication date: 2013-07-23

Application No.: US10386110

Filing date: 2003-03-12

Applicants: Jeffrey A. Dean, Martin Farach-Colton, Sanjay Ghemawat, Benedict Gomes, Georges R. Harik

Inventors: Jeffrey A. Dean, Martin Farach-Colton, Sanjay Ghemawat, Benedict Gomes, Georges R. Harik

IPC classification: G06F17/00, G06F17/30

CPC classification: G06F17/30864

Abstract: Web quotes are gathered from web pages that link to a web page of interest. The web quote may include text from the paragraphs that contain the hypertext links to the page of interest as well as text from other portions of the linked web page, such as text from a nearby header. The obtained web quotes may be ranked based on quality or relevance and may then be incorporated into a search engine's document index or into summary information returned to users in response to a search query.


16.

Patent application
REPRESENTATIVE DOCUMENT SELECTION FOR A SET OF DUPLICATE DOCUMENTS (legal status: in force)

Publication No.: US20120323896A1

Publication date: 2012-12-20

Application No.: US13599707

Filing date: 2012-08-30

Applicants: Daniel Dulitz, Alexandre A. Verstak, Sanjay Ghemawat, Jeffrey A. Dean

Inventors: Daniel Dulitz, Alexandre A. Verstak, Sanjay Ghemawat, Jeffrey A. Dean

IPC classification: G06F17/30

CPC classification: G06F17/30867, G06F17/3053, G06F17/3071, G06F17/30864, Y10S707/99931, Y10S707/99932, Y10S707/99935, Y10S707/99954

Abstract: Systems and methods for indexing a representative document from a set of duplicate documents are disclosed. Disclosed systems and methods comprise selecting a first document in a plurality of documents on the basis that the first document is associated with a query independent score. Each respective document in the plurality of documents has a fingerprint that indicates that the respective document has substantially identical content to every other document in the plurality of documents. Disclosed systems and methods further comprise indexing, in accordance with the query independent score, the first document thereby producing an indexed first document. With respect to the plurality of documents, only the indexed first document is included in a document index.


17.

Granted patent
Representative document selection for sets of duplicate documents in a web crawler system (legal status: in force)

Publication No.: US08260781B2

Publication date: 2012-09-04

Application No.: US13186414

Filing date: 2011-07-19

Applicants: Daniel Dulitz, Alexandre A. Verstak, Sanjay Ghemawat, Jeffrey A. Dean

Inventors: Daniel Dulitz, Alexandre A. Verstak, Sanjay Ghemawat, Jeffrey A. Dean

IPC classification: G06F7/00, G06F17/30

CPC classification: G06F17/30867, G06F17/3053, G06F17/3071, G06F17/30864, Y10S707/99931, Y10S707/99932, Y10S707/99935, Y10S707/99954

Abstract: Duplicate documents are detected in a web crawler system. Upon receiving a newly crawled document, a set of documents, if any, sharing the same content as the newly crawled document is identified. Information identifying the newly crawled document and the selected set of documents is merged into information identifying a new set of documents. Duplicate documents are included and excluded from the new set of documents based on a query independent metric for each such document. A single representative document for the new set of documents is identified in accordance with a set of predefined conditions.


18.

Patent application
System and Method for Analyzing Data Records (legal status: in force)

Publication No.: US20120215787A1

Publication date: 2012-08-23

Application No.: US13407632

Filing date: 2012-02-28

Applicants: Robert C. Pike, Sean Quinlan, Sean M. Dorward, Jeffrey Dean, Sanjay Ghemawat

Inventors: Robert C. Pike, Sean Quinlan, Sean M. Dorward, Jeffrey Dean, Sanjay Ghemawat

IPC classification: G06F17/30

CPC classification: G06F17/30501, G06F11/1482, G06F17/30545, G06F17/30598, Y10S707/99933, Y10S707/99937

Abstract: A method and system for analyzing data records includes allocating groups of records to respective processes of a first plurality of processes executing in parallel. In each respective process of the first plurality of processes, for each record in the group of records allocated to the respective process, a query is applied to the record so as to produce zero or more values. Zero or more emit operators are applied to each of the zero or more produced values so as to add corresponding information to an intermediate data structure. Information from a plurality of the intermediate data structures is aggregated to produce output data.


19.

Patent application
Representative Document Selection for Sets of Duplicate Documents in a Web Crawler System (legal status: in force)

Publication No.: US20110276561A1

Publication date: 2011-11-10

Application No.: US13186414

Filing date: 2011-07-19

Applicants: Daniel Dulitz, Alexandre A. Verstak, Sanjay Ghemawat, Jeffrey A. Dean

Inventors: Daniel Dulitz, Alexandre A. Verstak, Sanjay Ghemawat, Jeffrey A. Dean

IPC classification: G06F17/30

CPC classification: G06F17/30867, G06F17/3053, G06F17/3071, G06F17/30864, Y10S707/99931, Y10S707/99932, Y10S707/99935, Y10S707/99954

Abstract: Duplicate documents are detected in a web crawler system. Upon receiving a newly crawled document, a set of documents, if any, sharing the same content as the newly crawled document is identified. Information identifying the newly crawled document and the selected set of documents is merged into information identifying a new set of documents. Duplicate documents are included and excluded from the new set of documents based on a query independent metric for each such document. A single representative document for the new set of documents is identified in accordance with a set of predefined conditions.


20.

Granted patent
Document compression system and method for use with tokenspace repository (legal status: in force)

Publication No.: US07917480B2

Publication date: 2011-03-29

Application No.: US10917739

Filing date: 2004-08-13

Applicants: Jeffrey Dean, Gautham K. Thambidorai, Sanjay Ghemawat, Benedict Anthony Gomes, Olcan Sercinoglu

Inventors: Jeffrey Dean, Gautham K. Thambidorai, Sanjay Ghemawat, Benedict Anthony Gomes, Olcan Sercinoglu

IPC classification: G06F7/00, G06F17/00, G06F15/18

CPC classification: G06F17/30864, G06F17/30011, G06F17/30371, G06F17/30613

Abstract: The disclosed embodiments enable multi-stage query scoring, including “snippet” generation, through incremental document reconstruction facilitated by a multi-tiered mapping scheme. The mapping scheme includes a first mapping between unique tokens contained in a set of documents and unique global token identifiers (e.g., 32-bit integers) contained in a global-lexicon (i.e., dictionary). The mapping scheme also includes a second mapping between the global token identifiers and a set of fixed-length local token identifiers (e.g., 8-bit integers) contained in one or more mini-lexicons (i.e., sub-dictionaries). Each mini-lexicon is associated with a range of token positions in the tokenized documents. The first and second mappings are used to encode/decode documents into local token identifiers having fixed widths which can be compactly stored in the tokenspace repository. The use of fixed-length local token identifiers allows for fast and efficient decoding of tokenized documents.
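
The two-tier mapping in this abstract can be sketched in Python as follows. This is an illustrative toy, not the patented encoding: the whitespace tokenizer, the fixed `block_size` of 4 positions per mini-lexicon, and the list-based mini-lexicon layout are assumptions. Each block of positions gets its own mini-lexicon, so every token is stored as a single byte (an 8-bit local identifier) that indexes into that block's list of global identifiers.

```python
def build_global_lexicon(documents):
    """First mapping: unique token -> global token identifier."""
    lexicon = {}
    for doc in documents:
        for token in doc.split():
            lexicon.setdefault(token, len(lexicon))
    return lexicon


def encode(tokens, lexicon, block_size=4):
    """Second mapping: global ids -> 8-bit local ids, one mini-lexicon per block.

    block_size must be at most 256 so that every local id fits in one byte.
    """
    mini_lexicons, local_ids = [], bytearray()
    for start in range(0, len(tokens), block_size):
        mini = []        # local id (list index) -> global id
        local_of = {}    # global id -> local id within this block
        for token in tokens[start:start + block_size]:
            gid = lexicon[token]
            if gid not in local_of:
                local_of[gid] = len(mini)
                mini.append(gid)
            local_ids.append(local_of[gid])
        mini_lexicons.append(mini)
    return mini_lexicons, bytes(local_ids)


def decode(mini_lexicons, local_ids, lexicon, block_size=4):
    """Rebuild the token sequence from fixed-width local ids."""
    token_of = {gid: token for token, gid in lexicon.items()}
    return [token_of[mini_lexicons[pos // block_size][lid]]
            for pos, lid in enumerate(local_ids)]


text = "to be or not to be that is"
tokens = text.split()
lexicon = build_global_lexicon([text])
minis, encoded = encode(tokens, lexicon)
decoded = decode(minis, encoded, lexicon)
```

Because every local identifier has the same width, the token at position `pos` can be decoded directly from `encoded[pos]` and the mini-lexicon for block `pos // block_size`, without scanning earlier tokens.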

