Patent search: ap:("Jeffrey Dean" OR "Sanjay Ghemawat") AND inv:"Jeffrey Dean" (results page 1)

1.

Granted patent
System and method for analyzing data records (status: active)

Publication number: US09405808B2

Publication date: 2016-08-02

Application number: US13407632

Filing date: 2012-02-28

Applicants: Robert C. Pike, Sean Quinlan, Sean M. Dorward, Jeffrey Dean, Sanjay Ghemawat

Inventors: Robert C. Pike, Sean Quinlan, Sean M. Dorward, Jeffrey Dean, Sanjay Ghemawat

IPC classes: G06F17/30, G06F11/14

CPC classes: G06F17/30501, G06F11/1482, G06F17/30545, G06F17/30598, Y10S707/99933, Y10S707/99937

Abstract: A method and system for analyzing data records includes allocating groups of records to respective processes of a first plurality of processes executing in parallel. In each respective process of the first plurality of processes, for each record in the group of records allocated to the respective process, a query is applied to the record so as to produce zero or more values. Zero or more emit operators are applied to each of the zero or more produced values so as to add corresponding information to an intermediate data structure. Information from a plurality of the intermediate data structures is aggregated to produce output data.
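The abstract describes a per-record query whose values are passed to emit operators that fill a per-process intermediate structure, followed by an aggregation step. The Python sketch below illustrates that flow under stated assumptions, not the patented implementation: threads stand in for the parallel processes, the hypothetical `query` splits a text record into words, the hypothetical `emit_count` operator counts each value in a `Counter`, and records are allocated to groups round-robin.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List


def query(record: str) -> List[str]:
    # Hypothetical query: yields zero or more values per record (here, its words).
    return record.split()


def emit_count(value: str, table: Counter) -> None:
    # Hypothetical emit operator: records one occurrence of the value
    # in the intermediate data structure.
    table[value] += 1


def process_group(group: List[str]) -> Counter:
    # Work done by one parallel worker on its allocated group of records.
    table: Counter = Counter()  # this worker's intermediate data structure
    for record in group:
        for value in query(record):
            emit_count(value, table)
    return table


def analyze(records: List[str], n_workers: int = 4) -> Counter:
    # Allocate records to groups round-robin, one group per worker.
    groups = [records[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(process_group, groups))
    # Aggregate the per-worker intermediate tables into the output data.
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

For example, `analyze(["a b", "b c", "", "c c"])` counts `c` three times, `b` twice and `a` once; the empty record produces zero values and therefore triggers no emits.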

2.

Patent application
System and Method for Analyzing Data Records (status: active)

Publication number: US20120215787A1

Publication date: 2012-08-23

Application number: US13407632

Filing date: 2012-02-28

Applicants: Robert C. Pike, Sean Quinlan, Sean M. Dorward, Jeffrey Dean, Sanjay Ghemawat

Inventors: Robert C. Pike, Sean Quinlan, Sean M. Dorward, Jeffrey Dean, Sanjay Ghemawat

IPC classes: G06F17/30

CPC classes: G06F17/30501, G06F11/1482, G06F17/30545, G06F17/30598, Y10S707/99933, Y10S707/99937

Abstract: A method and system for analyzing data records includes allocating groups of records to respective processes of a first plurality of processes executing in parallel. In each respective process of the first plurality of processes, for each record in the group of records allocated to the respective process, a query is applied to the record so as to produce zero or more values. Zero or more emit operators are applied to each of the zero or more produced values so as to add corresponding information to an intermediate data structure. Information from a plurality of the intermediate data structures is aggregated to produce output data.

3.

Granted patent
Document compression system and method for use with tokenspace repository (status: active)

Publication number: US07917480B2

Publication date: 2011-03-29

Application number: US10917739

Filing date: 2004-08-13

Applicants: Jeffrey Dean, Gautham K. Thambidorai, Sanjay Ghemawat, Benedict Anthony Gomes, Olcan Sercinoglu

Inventors: Jeffrey Dean, Gautham K. Thambidorai, Sanjay Ghemawat, Benedict Anthony Gomes, Olcan Sercinoglu

IPC classes: G06F7/00, G06F17/00, G06F15/18

CPC classes: G06F17/30864, G06F17/30011, G06F17/30371, G06F17/30613

Abstract: The disclosed embodiments enable multi-stage query scoring, including “snippet” generation, through incremental document reconstruction facilitated by a multi-tiered mapping scheme. The mapping scheme includes a first mapping between unique tokens contained in a set of documents and unique global token identifiers (e.g., 32-bit integers) contained in a global-lexicon (i.e., dictionary). The mapping scheme also includes a second mapping between the global token identifiers and a set of fixed-length local token identifiers (e.g., 8-bit integers) contained in one or more mini-lexicons (i.e., sub-dictionaries). Each mini-lexicon is associated with a range of token positions in the tokenized documents. The first and second mappings are used to encode/decode documents into local token identifiers having fixed widths which can be compactly stored in the tokenspace repository. The use of fixed-length local token identifiers allows for fast and efficient decoding of tokenized documents.

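The two mappings in this abstract can be illustrated with a small Python sketch. It is a simplification under stated assumptions: it encodes a single token stream rather than a full repository, assigns global ids in order of first appearance, and gives each mini-lexicon a fixed range of 256 token positions, so at most 256 distinct global ids occur in a range and every local id fits in one byte. `RANGE`, `build_global_lexicon`, `encode`, and `decode` are hypothetical names, not taken from the patent.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical: number of token positions covered by each mini-lexicon.
# With at most 256 positions per range, local ids always fit in 8 bits.
RANGE = 256


def build_global_lexicon(tokens: List[str]) -> Dict[str, int]:
    # First mapping: unique token -> global token id (in order of first use).
    lexicon: Dict[str, int] = {}
    for tok in tokens:
        lexicon.setdefault(tok, len(lexicon))
    return lexicon


def encode(tokens: List[str], lexicon: Dict[str, int]) -> Tuple[List[List[int]], bytes]:
    # Second mapping: within each position range, global id -> 8-bit local id.
    mini_lexicons: List[List[int]] = []  # mini_lexicons[r][local_id] == global_id
    out = bytearray()
    for start in range(0, len(tokens), RANGE):
        local_of: Dict[int, int] = {}
        table: List[int] = []
        for tok in tokens[start:start + RANGE]:
            gid = lexicon[tok]
            if gid not in local_of:
                local_of[gid] = len(table)
                table.append(gid)
            out.append(local_of[gid])  # one fixed-width byte per token
        mini_lexicons.append(table)
    return mini_lexicons, bytes(out)


def decode(data: bytes, mini_lexicons: List[List[int]], lexicon: Dict[str, int],
           start: int = 0, end: Optional[int] = None) -> List[str]:
    # Fixed width allows jumping straight to any position: byte p belongs to
    # range p // RANGE, so only the requested span [start, end) is decoded.
    token_of = {gid: tok for tok, gid in lexicon.items()}
    end = len(data) if end is None else end
    return [token_of[mini_lexicons[p // RANGE][data[p]]] for p in range(start, end)]
```

Because every token occupies exactly one byte, decoding positions 1 to 3 of "the cat sat on the mat" touches only two bytes and returns `["cat", "sat"]`, which is the kind of partial, incremental reconstruction the abstract relies on for snippets.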

4.

Granted patent
System and method for large-scale data processing using an application-independent framework (status: active)

Publication number: US08612510B2

Publication date: 2013-12-17

Application number: US12686292

Filing date: 2010-01-12

Applicants: Jeffrey Dean, Sanjay Ghemawat

Inventors: Jeffrey Dean, Sanjay Ghemawat

IPC classes: G06F15/16

CPC classes: G06F17/30339, G06F9/4881, G06F9/54, G06F17/30377, G06F17/30445

Abstract: A large-scale data processing system and method for processing data in a distributed and parallel processing environment. The system includes an application-independent framework for processing data having a plurality of application-independent map modules and reduce modules. These application-independent modules use application-independent operators to automatically handle parallelization of computations across the distributed and parallel processing environment when performing user-specified data processing operations. The system also includes a plurality of user-specified, application-specific operators, for use with the application-independent framework to perform a user-specified data processing operation on a user-specified set of input files. The application-specific operators include: a map operator and a reduce operator. The map operator is applied by the application-independent map modules to input data in the user-specified set of input files to produce intermediate data values. The reduce operator is applied by the application-independent reduce modules to process the intermediate data values to produce final output data.

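The split between application-independent modules and user-supplied operators can be sketched in Python. In this single-process sketch, the hypothetical `run_mapreduce` plays the role of the framework (applying the map operator, partitioning intermediate pairs by key hash, grouping values per key, and applying the reduce operator), while `wc_map` and `wc_reduce` are hypothetical user operators for word counting. A real deployment would distribute the map and reduce tasks across machines; this code does not.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, List, Tuple

MapOp = Callable[[Any], Iterable[Tuple[Hashable, Any]]]
ReduceOp = Callable[[Hashable, List[Any]], Any]


def run_mapreduce(inputs: Iterable[Any], map_op: MapOp, reduce_op: ReduceOp,
                  n_partitions: int = 3) -> Dict[Hashable, Any]:
    # Map phase: framework code applies the user's map operator to every input
    # and routes each intermediate (key, value) pair to a partition by key hash.
    partitions: List[Dict[Hashable, List[Any]]] = [
        defaultdict(list) for _ in range(n_partitions)
    ]
    for item in inputs:
        for key, value in map_op(item):
            partitions[hash(key) % n_partitions][key].append(value)
    # Reduce phase: partitions are independent, so they could be reduced
    # in parallel; here they are processed one after another.
    output: Dict[Hashable, Any] = {}
    for part in partitions:
        for key, values in part.items():
            output[key] = reduce_op(key, values)
    return output


# Hypothetical application-specific operators: word count.
def wc_map(line: str) -> List[Tuple[str, int]]:
    return [(word, 1) for word in line.split()]


def wc_reduce(word: str, counts: List[int]) -> int:
    return sum(counts)
```

`run_mapreduce(["a b a", "b c"], wc_map, wc_reduce)` returns `{"a": 2, "b": 2, "c": 1}`. Swapping in different operators changes the application without touching the framework function, which is the separation the abstract describes.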

5.

Patent application
Anchor Tag Indexing in a Web Crawler System (status: active)

Publication number: US20120066576A1

Publication date: 2012-03-15

Application number: US13300516

Filing date: 2011-11-18

Applicants: Huican Zhu, Jeffrey Dean, Sanjay Ghemawat, Bwolen Po-Jen Yang, Anurag Acharya

Inventors: Huican Zhu, Jeffrey Dean, Sanjay Ghemawat, Bwolen Po-Jen Yang, Anurag Acharya

IPC classes: G06F15/00

CPC classes: G06F17/30014, G06F17/2235, G06F17/241, G06F17/2705, G06F17/30321, G06F17/30864

Abstract: Provided is a method and system for indexing documents in a collection of linked documents. A link log, including one or more pairings of source documents and target documents is accessed. A sorted anchor map, containing one or more target document to source document pairings, is generated. The pairings in the sorted anchor map are ordered based on target document identifiers.

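The inversion from a link log to a sorted anchor map can be sketched in Python. The `Link` record, its `anchor_text` field, and the helper names are assumptions for illustration; the abstract itself only requires source/target pairings ordered by target document identifier. Sorting on the target id places all links into a given document next to each other, so they can be located with two binary searches.

```python
from bisect import bisect_left, bisect_right
from typing import List, NamedTuple


class Link(NamedTuple):
    source: int        # id of the document containing the link
    target: int        # id of the document the link points to
    anchor_text: str   # hypothetical: text of the anchor tag


def build_sorted_anchor_map(link_log: List[Link]) -> List[Link]:
    # Order the pairings by target document id (source id breaks ties).
    return sorted(link_log, key=lambda link: (link.target, link.source))


def anchors_for(anchor_map: List[Link], target: int) -> List[Link]:
    # All entries for one target are contiguous, so binary search finds the run.
    targets = [link.target for link in anchor_map]
    lo = bisect_left(targets, target)
    hi = bisect_right(targets, target)
    return anchor_map[lo:hi]
```

Given the log `[Link(1, 7, "foo"), Link(2, 3, "bar"), Link(3, 7, "baz")]`, the sorted map lists target 3 first, and `anchors_for(..., 7)` returns the two links from documents 1 and 3, which is what an indexer needs to attach incoming anchor text to document 7.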

6.

Granted patent
System and method for providing load balanced processing (status: active)

Publication number: US07386616B1

Publication date: 2008-06-10

Application number: US10447256

Filing date: 2003-05-27

Applicants: Monika Hildegard Henzinger, Deborah Anne Wallach, Jeffrey Dean, Sanjay Ghemawat, Benjamin Thomas Smith, Luiz Andre Barroso

Inventors: Monika Hildegard Henzinger, Deborah Anne Wallach, Jeffrey Dean, Sanjay Ghemawat, Benjamin Thomas Smith, Luiz Andre Barroso

IPC classes: G06F15/173

CPC classes: G06F9/505, H04L67/1002, H04L67/1008, H04L67/1019

Abstract: A system and method for providing load balanced processing is described. One or more files selected from a set of files are logically duplicated. At least one file and at least one logically duplicated file, is stored at one of a plurality of servers as specified in a load balancing layout. Execution of each operation in an operation stream is scheduled on the server storing at least one staged file required by the operation.

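A minimal Python sketch of the layout-then-schedule idea follows. The abstract does not specify these policies, so they are assumptions: the files to duplicate are supplied as a "hot" list, the layout places each file round-robin and its logical duplicate on the next server, each operation needs exactly one file, and the scheduler picks the least-loaded server that stores that file. `make_layout` and `schedule` are hypothetical names.

```python
from typing import Dict, List


def make_layout(files: List[str], hot: List[str], servers: List[str]) -> Dict[str, List[str]]:
    # Load balancing layout: file -> servers that store it.
    # Each file is placed round-robin; files in `hot` also get a logical
    # duplicate on the next server (assumed policy).
    layout: Dict[str, List[str]] = {}
    n = len(servers)
    for i, name in enumerate(files):
        placement = [servers[i % n]]
        if name in hot and n > 1:
            placement.append(servers[(i + 1) % n])
        layout[name] = placement
    return layout


def schedule(ops: List[str], layout: Dict[str, List[str]], servers: List[str]) -> List[str]:
    # Each op is named by the one file it needs; run it on the least-loaded
    # server that stores that file (ties go to the first server listed).
    load = {s: 0 for s in servers}
    assignment: List[str] = []
    for needed_file in ops:
        server = min(layout[needed_file], key=lambda s: load[s])
        load[server] += 1
        assignment.append(server)
    return assignment
```

With servers `s0`, `s1`, `s2` and only file `a` marked hot, `a` is stored on `s0` and `s1`, so three consecutive operations on `a` are scheduled on `s0`, `s1`, `s0` instead of all landing on one server.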

7.

Granted patent
System and method for searching an extended database (status: active)

Publication number: US07174346B1

Publication date: 2007-02-06

Application number: US10676650

Filing date: 2003-09-30

Applicants: Kourosh Gharachorloo, Fay Wen Chang, Deborah Anne Wallach, Sanjay Ghemawat, Jeffrey Dean

Inventors: Kourosh Gharachorloo, Fay Wen Chang, Deborah Anne Wallach, Sanjay Ghemawat, Jeffrey Dean

IPC classes: G06F17/30

CPC classes: G06F17/30864, G06F17/30619, Y10S707/99942, Y10S707/99943, Y10S707/99944, Y10S707/99945

Abstract: Once a search query is received from a user, a standard index is searched based on the search query. The standard index forms part of a set of replicated standard indexes having multiple instances of the standard index. A signal is then determined based on the search of the standard index. When the received signal meets predefined criteria, an extended index is searched. The extended index forms part of a set of extended indexes having at least one instance of the extended index. There are fewer instances of the extended index than instances of the standard index. Extended search results are then obtained from the extended index and at least a portion of the extended search results is transmitted towards a user.

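The two-tier flow can be sketched in Python. The abstract leaves the details open, so the sketch assumes in-memory inverted indexes, round-robin choice among replicas, a signal equal to the number of standard-index hits, a "fewer than `min_results` hits" criterion, and merging of the standard and extended result sets. `lookup` and `TieredSearcher` are hypothetical names.

```python
import itertools
from typing import Dict, List, Set

Index = Dict[str, Set[int]]  # term -> ids of documents containing it


def lookup(index: Index, query: str) -> Set[int]:
    # Conjunctive (AND) query over an inverted index; returns a fresh set.
    terms = query.split()
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


class TieredSearcher:
    def __init__(self, standard: List[Index], extended: List[Index], min_results: int = 2):
        # The abstract requires fewer extended instances than standard ones.
        if not standard or not extended or len(extended) >= len(standard):
            raise ValueError("need fewer extended replicas than standard replicas")
        self._standard = itertools.cycle(standard)  # round-robin over replicas
        self._extended = itertools.cycle(extended)
        self.min_results = min_results

    def search(self, query: str) -> List[int]:
        results = lookup(next(self._standard), query)
        signal = len(results)  # assumed signal: number of standard-index hits
        if signal < self.min_results:  # assumed predefined criterion
            results |= lookup(next(self._extended), query)
        return sorted(results)
```

In this sketch a common query answered well by the replicated standard index never reaches the extended tier, which is why the extended index can get by with fewer instances.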

8.

Granted patent
Generating content snippets using a tokenspace repository (status: active)

Publication number: US08321445B2

Publication date: 2012-11-27

Application number: US13040220

Filing date: 2011-03-03

Applicants: Jeffrey Dean, Gauthaum K. Thambidorai, Sanjay Ghemawat, Benedict Anthony Gomes, Olcan Sercinoglu

Inventors: Jeffrey Dean, Gauthaum K. Thambidorai, Sanjay Ghemawat, Benedict Anthony Gomes, Olcan Sercinoglu

IPC classes: G06F7/00, G06F17/30

CPC classes: G06F17/30864, G06F17/30011, G06F17/30371, G06F17/30613

Abstract: A search engine server system receives from a client system a search query and identifies a set of documents in accordance with the search query. A content snippet corresponding to content in a respective document of the identified set of documents is generated, the content snippet associated with at least one query term of the one or more query terms in the search query. A response to the search query is returned to the client system, the response including information identifying at least the respective document and including the content snippet. Generating the content snippet includes performing a first decompression operation on first token identifiers, from a compressed document repository, to provide a set of second token identifiers, and performing a second decompression operation on the set of second token identifiers to recover uncompressed content comprising a portion of the respective document.

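The two decompression stages can be illustrated with a Python sketch in the spirit of the fixed-width token identifiers described for the tokenspace repository. It assumes the document fits in one mini-lexicon (so each first token id is a single byte), that second token ids index a global lexicon list, and that the snippet is a window of `radius` tokens around the first query-term hit. Query terms are matched on global ids, so the second stage only decodes the snippet window. `generate_snippet` and its parameters are hypothetical.

```python
from typing import Sequence


def generate_snippet(first_ids: bytes, mini_lexicon: Sequence[int],
                     global_lexicon: Sequence[str], query: str, radius: int = 3) -> str:
    terms = set(query.lower().split())
    # Global ids of the query terms (global_lexicon[gid] is the token text).
    term_ids = {gid for gid, tok in enumerate(global_lexicon) if tok.lower() in terms}
    # First decompression: 8-bit first token ids -> second (global) token ids.
    second_ids = [mini_lexicon[b] for b in first_ids]
    for pos, gid in enumerate(second_ids):
        if gid in term_ids:
            window = second_ids[max(0, pos - radius):pos + radius + 1]
            # Second decompression: global ids -> text, for the window only.
            return " ".join(global_lexicon[g] for g in window)
    return ""  # no query term occurs in the document
```

For a document whose first token ids decode to "the quick brown fox jumps over the lazy dog", the query "fox" with `radius=2` yields the snippet "quick brown fox jumps over"; the rest of the document is never converted back to text.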

9.

Granted patent
System and method for analyzing data records (status: active)

Publication number: US08126909B2

Publication date: 2012-02-28

Application number: US12533955

Filing date: 2009-07-31

Applicants: Robert C. Pike, Sean Quinlan, Sean M. Dorward, Jeffrey Dean, Sanjay Ghemawat

Inventors: Robert C. Pike, Sean Quinlan, Sean M. Dorward, Jeffrey Dean, Sanjay Ghemawat

IPC classes: G06F17/30

CPC classes: G06F17/30501, G06F11/1482, G06F17/30545, G06F17/30598, Y10S707/99933, Y10S707/99937

Abstract: A method and system for analyzing data records includes allocating groups of records to respective processes of a first plurality of processes executing in parallel. In each respective process of the first plurality of processes, for each record in the group of records allocated to the respective process, a query is applied to the record so as to produce zero or more values. Zero or more emit operators are applied to each of the zero or more produced values so as to add corresponding information to an intermediate data structure. Information from a plurality of the intermediate data structures is aggregated to produce output data.

10.

Patent application
Query Processing System and Method for Use with Tokenspace Repository (status: active)

Publication number: US20110153577A1

Publication date: 2011-06-23

Application number: US13040220

Filing date: 2011-03-03

Applicants: Jeffrey Dean, Gauthaum K. Thambidorai, Sanjay Ghemawat, Benedict Anthony Gomes, Olcan Sercinoglu

Inventors: Jeffrey Dean, Gauthaum K. Thambidorai, Sanjay Ghemawat, Benedict Anthony Gomes, Olcan Sercinoglu

IPC classes: G06F17/30

CPC classes: G06F17/30864, G06F17/30011, G06F17/30371, G06F17/30613

Abstract: A search engine server system receives from a client system a search query and identifies a set of documents in accordance with the search query. A content snippet corresponding to content in a respective document of the identified set of documents is generated, the content snippet associated with at least one query term of the one or more query terms in the search query. A response to the search query is returned to the client system, the response including information identifying at least the respective document and including the content snippet. Generating the content snippet includes performing a first decompression operation on first token identifiers, from a compressed document repository, to provide a set of second token identifiers, and performing a second decompression operation on the set of second token identifiers to recover uncompressed content comprising a portion of the respective document.
