Patent search: ap:("Daniel Dulitz" OR "Alexandre A. Verstak" OR "Sanjay Ghemawat" OR "Jeffrey A. Dean") AND inv:"Sanjay Ghemawat" (page 7)

61.

Granted patent
Anchor tag indexing in a web crawler system (in force)

Publication No.: US09305091B2

Publication date: 2016-04-05

Application No.: US13300516

Filing date: 2011-11-18

Applicant: Huican Zhu, Jeffrey Dean, Sanjay Ghemawat, Bwolen Po-Jen Yang, Anurag Acharya

Inventor: Huican Zhu, Jeffrey Dean, Sanjay Ghemawat, Bwolen Po-Jen Yang, Anurag Acharya

IPC classification: G06F17/00, G06F17/30, G06F17/27, G06F17/22

CPC classification: G06F17/30014, G06F17/2235, G06F17/241, G06F17/2705, G06F17/30321, G06F17/30864

Abstract: Provided is a method and system for indexing documents in a collection of linked documents. A link log, including one or more pairings of source documents and target documents, is accessed. A sorted anchor map, containing one or more target document to source document pairings, is generated. The pairings in the sorted anchor map are ordered based on target document identifiers.
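
Illustrative sketch (not taken from the patent): the abstract describes inverting a link log of source-to-target pairings into a map ordered by target document identifier. The function name `build_sorted_anchor_map` and the integer document identifiers below are assumptions made for this example only.

```python
from collections import defaultdict

def build_sorted_anchor_map(link_log):
    """Invert (source_id, target_id) pairs into target -> sources pairings.

    The result is ordered by target document identifier, as the abstract
    describes. Identifiers are plain integers here (an assumption).
    """
    sources_by_target = defaultdict(list)
    for source_id, target_id in link_log:
        sources_by_target[target_id].append(source_id)
    # Sort the targets, and sort each target's sources for a stable result.
    return [(target, sorted(sources_by_target[target]))
            for target in sorted(sources_by_target)]

# Hypothetical link log: document 3 links to 20, document 1 to 10, and so on.
link_log = [(3, 20), (1, 10), (2, 20)]
anchor_map = build_sorted_anchor_map(link_log)
```

With this input, `anchor_map` groups both links into document 20 under a single entry that follows the entry for document 10.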

62.

Granted patent
Organizing data in a distributed storage system (in force)

Publication No.: US09069835B2

Publication date: 2015-06-30

Application No.: US13898411

Filing date: 2013-05-20

Applicant: Jeffrey Adgate Dean, Michael James Boyer Epstein, Andrew Fikes, Sanjay Ghemawat, Wilson Cheng-Yi Hsieh, Alexander Lloyd, Yasushi Saito, Michal Piotr Szymaniak, Sebastian Kanthak, Chris Jorgen Taylor

Inventor: Jeffrey Adgate Dean, Michael James Boyer Epstein, Andrew Fikes, Sanjay Ghemawat, Wilson Cheng-Yi Hsieh, Alexander Lloyd, Yasushi Saito, Michal Piotr Szymaniak, Sebastian Kanthak, Chris Jorgen Taylor

IPC classification: G06F17/30, G06F3/06

CPC classification: G06F17/30575, G06F3/0611, G06F3/0617, G06F3/065, G06F3/067

Abstract: A distributed storage system is provided. The distributed storage system includes multiple front-end servers and zones for managing data for clients. Data within the distributed storage system is associated with a plurality of accounts and divided into a plurality of groups, each group including a plurality of splits, each split being associated with a respective account, and each group having multiple tablets and each tablet managed by a respective tablet server of the distributed storage system. Data associated with different accounts may be replicated within the distributed storage system using different data replication policies. There is no limit to the amount of data for an account by adding new splits to the distributed storage system. In response to a client request for a particular account's data, a front-end server communicates such request to a particular zone that has the client-requested data and returns the client-requested data to the requesting client.
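
Illustrative sketch (not taken from the patent): the account, split, group, and zone hierarchy in the abstract can be pictured with a few small classes. All class and method names here are hypothetical, tablets and replication policies are left out, and the front end simply scans zones in order, which is an assumption rather than the routing the patent claims.

```python
from dataclasses import dataclass, field

@dataclass
class Split:
    account: str                       # each split belongs to one account
    data: dict = field(default_factory=dict)

@dataclass
class Group:
    splits: list = field(default_factory=list)

@dataclass
class Zone:
    name: str
    groups: list = field(default_factory=list)

    def lookup(self, account, key):
        """Return the value for (account, key) if this zone holds it."""
        for group in self.groups:
            for split in group.splits:
                if split.account == account and key in split.data:
                    return split.data[key]
        return None

class FrontEnd:
    def __init__(self, zones):
        self.zones = zones

    def get(self, account, key):
        """Forward the request to the first zone that has the data."""
        for zone in self.zones:
            value = zone.lookup(account, key)
            if value is not None:
                return zone.name, value
        return None, None

# An account grows by adding splits; here "alice" has two splits in one group.
group = Group(splits=[Split("alice", {"a": 1}), Split("alice", {"b": 2}),
                      Split("bob", {"a": 9})])
front_end = FrontEnd([Zone("zone-1"), Zone("zone-2", groups=[group])])
```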

63.

Granted patent
System and method of accessing a document efficiently through multi-tier web caching (in force)

Publication No.: US08788475B2

Publication date: 2014-07-22

Application No.: US13536701

Filing date: 2012-06-28

Applicant: Eric Russell Fredricksen, Fritz John Schneider, Jeffrey Adgate Dean, Sanjay Ghemawat, Niels Provos, Georges Harik

Inventor: Eric Russell Fredricksen, Fritz John Schneider, Jeffrey Adgate Dean, Sanjay Ghemawat, Niels Provos, Georges Harik

IPC classification: G06F17/30

CPC classification: G06F17/30902, G06F17/30011, Y10S707/99931, Y10S707/99932

Abstract: Upon receipt of a document request, a client assistant examines its cache for the document. If not successful, a server searches for the requested document in its cache. If the server copy is still not fresh or not found, the server seeks the document from its host. If the host cannot provide the copy, the server seeks it from a document repository. Certain documents are identified from the document repository as being fresh or stable. Information about each of these identified documents is transmitted to the server which inserts entries into an index if the index does not already contain an entry for the document. If and when this particular document is requested, the document will not be present in the server, however the server will contain an entry directing the server to obtain the document from the document repository rather than the document's web host.
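
Illustrative sketch (not taken from the patent): the lookup order in the abstract is client cache, then server cache, then the web host, then the document repository. The function `fetch` and the use of plain dictionaries for every tier are assumptions for this example; freshness checks and the server index entry that redirects to the repository are omitted.

```python
def fetch(url, client_cache, server_cache, host, repository):
    """Return (document, tier_name) from the first tier that holds the URL."""
    tiers = [
        ("client cache", client_cache),
        ("server cache", server_cache),
        ("web host", host),
        ("document repository", repository),
    ]
    for tier_name, store in tiers:
        document = store.get(url)
        if document is not None:
            return document, tier_name
    return None, None

# Hypothetical tiers: only the repository holds the requested page.
result = fetch("http://example.com/a", {}, {}, {},
               {"http://example.com/a": "<html>a</html>"})
```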

64.

Patent application
Generating Content Snippets Using a Tokenspace Repository (in force)

Publication No.: US20130212076A1

Publication date: 2013-08-15

Application No.: US13685581

Filing date: 2012-11-26

Applicant: Jeffrey Dean, Gauthaum K. Thambidorai, Sanjay Ghemawat, Benedict Anthony Gomes, Olcan Sercinoglu

Inventor: Jeffrey Dean, Gauthaum K. Thambidorai, Sanjay Ghemawat, Benedict Anthony Gomes, Olcan Sercinoglu

IPC classification: G06F17/30

CPC classification: G06F17/30864, G06F17/30011, G06F17/30371, G06F17/30613

Abstract: A search engine server system receives from a client system a search query and identifies a set of documents in accordance with the search query. A content snippet corresponding to content in a respective document of the identified set of documents is generated, the content snippet associated with at least one query term of the one or more query terms in the search query. A response to the search query is returned to the client system, the response including information identifying at least the respective document and including the content snippet. Generating the content snippet includes performing a first decompression operation on first token identifiers, from a compressed document repository, to provide a set of second token identifiers, and performing a second decompression operation on the set of second token identifiers to recover uncompressed content comprising a portion of the respective document.
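
Illustrative sketch (not taken from the patent): the abstract describes two decompression stages, from first token identifiers to second token identifiers, and from those to text. The abstract does not say how either stage works, so this example assumes the simplest possible reading: document-local identifiers are mapped through a per-document table to global identifiers, which are then mapped through a global vocabulary to words. Every name and value below is hypothetical.

```python
# Hypothetical global vocabulary: second token identifier -> token text.
GLOBAL_VOCAB = {100: "the", 205: "quick", 317: "brown", 42: "fox"}

def first_stage(local_ids, local_table):
    """First decompression: document-local ids -> global token ids."""
    return [local_table[i] for i in local_ids]

def second_stage(global_ids):
    """Second decompression: global token ids -> uncompressed text."""
    return " ".join(GLOBAL_VOCAB[g] for g in global_ids)

def make_snippet(doc_local_ids, local_table, start, length):
    """Decode only the requested portion of the document."""
    portion = doc_local_ids[start:start + length]
    return second_stage(first_stage(portion, local_table))

local_table = [100, 205, 317, 42]      # local id 0 -> global id 100, etc.
document = [0, 1, 2, 3]                # "the quick brown fox" in local ids
snippet = make_snippet(document, local_table, start=1, length=2)
```

Only the slice that feeds the snippet is decoded, so the rest of the document stays in its compressed form.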

65.

Granted patent
Anchor tag indexing in a web crawler system (in force)

Publication No.: US08484548B1

Publication date: 2013-07-09

Application No.: US11936421

Filing date: 2007-11-07

Applicant: Huican Zhu, Jeffrey Dean, Sanjay Ghemawat, Bwolen Po-Jen Yang, Anurag Acharya

Inventor: Huican Zhu, Jeffrey Dean, Sanjay Ghemawat, Bwolen Po-Jen Yang, Anurag Acharya

IPC classification: G06F17/00

CPC classification: G06F17/30014, G06F17/2235, G06F17/241, G06F17/2705, G06F17/30321, G06F17/30864

Abstract: Provided is a method and system for indexing documents in a collection of linked documents. A link log, including one or more pairings of source documents and target documents, is accessed. A sorted anchor map, containing one or more target document to source document pairings, is generated. The pairings in the sorted anchor map are ordered based on target document identifiers.

66.

Patent application
System and Method of Accessing a Document Efficiently Through Multi-Tier Web Caching (in force)

Publication No.: US20120271852A1

Publication date: 2012-10-25

Application No.: US13536701

Filing date: 2012-06-28

Applicant: Eric Russell Fredricksen, Fritz John Schneider, Jeffrey Adgate Dean, Sanjay Ghemawat, Niels Provos, Georges Harik

Inventor: Eric Russell Fredricksen, Fritz John Schneider, Jeffrey Adgate Dean, Sanjay Ghemawat, Niels Provos, Georges Harik

IPC classification: G06F17/30

CPC classification: G06F17/30902, G06F17/30011, Y10S707/99931, Y10S707/99932

Abstract: Upon receipt of a document request, a client assistant examines its cache for the document. If not successful, a server searches for the requested document in its cache. If the server copy is still not fresh or not found, the server seeks the document from its host. If the host cannot provide the copy, the server seeks it from a document repository. Certain documents are identified from the document repository as being fresh or stable. Information about each of these identified documents is transmitted to the server which inserts entries into an index if the index does not already contain an entry for the document. If and when this particular document is requested, the document will not be present in the server, however the server will contain an entry directing the server to obtain the document from the document repository rather than the document's web host.

67.

Granted patent
Garbage collecting systems and methods (in force)

Publication No.: US07865536B1

Publication date: 2011-01-04

Application No.: US10608039

Filing date: 2003-06-30

Applicant: Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung

Inventor: Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung

IPC classification: G06F7/00, G06F17/00, G06F12/00

CPC classification: H04L67/1095, G06F17/30174, G06F17/30215

Abstract: A system facilitates the deletion of data, such as files, orphaned chunks, and stale replicas. The system may identify a file to be deleted, rename the identified file, permanently delete the renamed file a predetermined amount of time after renaming the identified file as part of a garbage collection process, receive, from the servers, information concerning chunks stored by the servers, and identify, to the servers, ones of the chunks that do not exist possibly due to the permanent deletion of the renamed file. The system may further provide a mapping of file names to chunks, identify chunks, as orphaned chunks, that are not reachable from any of the file names, delete the orphaned chunks, receive, from the servers, information concerning chunks stored by the servers, and identify, to the servers, ones of the chunks that are orphaned chunks. The system may also associate version information with replicas of chunks, identify stale replicas based on the associated version information, delete the stale replicas, receive, from the servers, information concerning replicas stored by the servers, and identify, to the servers, ones of the replicas that are stale replicas.
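
Illustrative sketch (not taken from the patent): the abstract's first two mechanisms are deferred deletion (rename now, delete permanently after a delay) and detection of orphaned chunks that no file name reaches. The class `Master`, the `.deleted.` prefix, and the numeric timestamps are assumptions for this example; stale-replica versioning is omitted.

```python
class Master:
    def __init__(self, grace_seconds):
        self.grace_seconds = grace_seconds
        self.files = {}        # file name -> list of chunk ids
        self.renamed_at = {}   # hidden file name -> time of renaming

    def delete(self, name, now):
        """Rename the file instead of removing it immediately."""
        hidden = ".deleted." + name
        self.files[hidden] = self.files.pop(name)
        self.renamed_at[hidden] = now

    def collect(self, now):
        """Permanently drop renamed files older than the grace period."""
        for hidden, renamed in list(self.renamed_at.items()):
            if now - renamed >= self.grace_seconds:
                del self.files[hidden]
                del self.renamed_at[hidden]

    def orphaned(self, reported_chunks):
        """Chunks a server reports that no file name maps to any longer."""
        reachable = {c for chunks in self.files.values() for c in chunks}
        return set(reported_chunks) - reachable

master = Master(grace_seconds=60)
master.files = {"a": ["c1", "c2"], "b": ["c3"]}
master.delete("a", now=0)
master.collect(now=30)                            # too early, nothing dropped
before = master.orphaned(["c1", "c2", "c3"])
master.collect(now=60)                            # grace period has elapsed
after = master.orphaned(["c1", "c2", "c3"])
```

Until the grace period passes, the renamed file still references its chunks, so a server reporting them is told nothing is orphaned.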

68.

Granted patent
Large-scale data processing in a distributed and parallel processing enviornment (in force)

Publication No.: US07756919B1

Publication date: 2010-07-13

Application No.: US10871245

Filing date: 2004-06-18

Applicant: Jeffrey Dean, Sanjay Ghemawat

Inventor: Jeffrey Dean, Sanjay Ghemawat

IPC classification: G06F15/16

CPC classification: G06F17/30339, G06F9/4881, G06F9/54, G06F17/30377, G06F17/30445

Abstract: A large-scale data processing system and method includes one or more application-independent map modules configured to read input data and to apply at least one application-specific map operation to the input data to produce intermediate data values, wherein the map operation is automatically parallelized across multiple processors in the parallel processing environment. A plurality of intermediate data structures are used to store the intermediate data values. One or more application-independent reduce modules are configured to retrieve the intermediate data values and to apply at least one application-specific reduce operation to the intermediate data values to provide output data.
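
Illustrative sketch (not taken from the patent): the split between application-independent modules and application-specific map and reduce operations can be shown in a single process. The driver `map_reduce` stands in for the framework, and the word-count functions stand in for user code; all names are assumptions, and the parallelization across processors that the abstract describes is not modeled.

```python
from collections import defaultdict

def map_reduce(inputs, map_fn, reduce_fn):
    """Application-independent driver: knows nothing about word counting."""
    intermediate = defaultdict(list)          # intermediate data structure
    for record in inputs:
        for key, value in map_fn(record):     # application-specific map
            intermediate[key].append(value)
    # Application-specific reduce, applied once per intermediate key.
    return {key: reduce_fn(key, values)
            for key, values in intermediate.items()}

def word_count_map(line):
    """Emit (word, 1) for every word in one input line."""
    for word in line.split():
        yield word, 1

def word_count_reduce(word, counts):
    """Sum the partial counts collected for one word."""
    return sum(counts)

counts = map_reduce(["a b a", "b c"], word_count_map, word_count_reduce)
```

Swapping in different map and reduce functions changes the computation without touching the driver.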

69.

Patent application
System and Method for Large-Scale Data Processing Using an Application-Independent Framework (in force)

Publication No.: US20100122065A1

Publication date: 2010-05-13

Application No.: US12686292

Filing date: 2010-01-12

Applicant: Jeffrey Dean, Sanjay Ghemawat

Inventor: Jeffrey Dean, Sanjay Ghemawat

IPC classification: G06F9/38, G06F9/30

CPC classification: G06F17/30339, G06F9/4881, G06F9/54, G06F17/30377, G06F17/30445

Abstract: A large-scale data processing system and method for processing data in a distributed and parallel processing environment. The system includes an application-independent framework for processing data having a plurality of application-independent map modules and reduce modules. These application-independent modules use application-independent operators to automatically handle parallelization of computations across the distributed and parallel processing environment when performing user-specified data processing operations. The system also includes a plurality of user-specified, application-specific operators, for use with the application-independent framework to perform a user-specified data processing operation on a user-specified set of input files. The application-specific operators include: a map operator and a reduce operator. The map operator is applied by the application-independent map modules to input data in the user-specified set of input files to produce intermediate data values. The reduce operator is applied by the application-independent reduce modules to process the intermediate data values to produce final output data.

70.

Granted patent
System and method for analyzing data records (in force)

Publication No.: US07590620B1

Publication date: 2009-09-15

Application No.: US10954692

Filing date: 2004-09-29

Applicant: Robert C. Pike, Sean Quinlan, Sean M. Dorward, Jeffrey Dean, Sanjay Ghemawat

Inventor: Robert C. Pike, Sean Quinlan, Sean M. Dorward, Jeffrey Dean, Sanjay Ghemawat

IPC classification: G06F17/30

CPC classification: G06F17/30501, G06F11/1482, G06F17/30545, G06F17/30598, Y10S707/99933, Y10S707/99937

Abstract: A method and system for analyzing data records includes allocating groups of records to respective processes of a first plurality of processes executing in parallel. In each respective process of the first plurality of processes, for each record in the group of records allocated to the respective process, a query is applied to the record so as to produce zero or more values. Zero or more emit operators are applied to each of the zero or more produced values so as to add corresponding information to an intermediate data structure. Information from a plurality of the intermediate data structures is aggregated to produce output data.
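
Illustrative sketch (not taken from the patent): the abstract describes groups of records handled by parallel processes, a per-record query that yields zero or more values, an emit step that adds them to an intermediate structure, and a final aggregation. The example below uses a thread pool and `collections.Counter` as stand-ins; the query itself (keep words starting with "e") and all function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def query(record):
    """Produce zero or more values for one record (hypothetical query)."""
    return [word for word in record.split() if word.startswith("e")]

def process_group(records):
    """Run the query over one group and emit values into a local table."""
    table = Counter()                  # intermediate data structure
    for record in records:
        for value in query(record):
            table[value] += 1          # "emit": add the value to the table
    return table

def analyze(groups):
    """Process groups in parallel, then aggregate the intermediate tables."""
    with ThreadPoolExecutor() as pool:
        tables = list(pool.map(process_group, groups))
    total = Counter()
    for table in tables:
        total.update(table)
    return total

groups = [["error eof ok", "ok"], ["error"]]
totals = analyze(groups)
```

The record "ok" produces no values at all, which is the zero-value case the abstract allows for.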
