Organizing Data in a Distributed Storage System
    31.
    Patent application
    Legal status: Active

    Publication number: US20130339295A1

    Publication date: 2013-12-19

    Application number: US13898411

    Filing date: 2013-05-20

    IPC classification: G06F17/30

    Abstract: A distributed storage system is provided. The distributed storage system includes multiple front-end servers and zones for managing data for clients. Data within the distributed storage system is associated with a plurality of accounts and divided into a plurality of groups. Each group includes a plurality of splits, each split being associated with a respective account, and each group has multiple tablets, each tablet being managed by a respective tablet server of the distributed storage system. Data associated with different accounts may be replicated within the distributed storage system using different data replication policies. The amount of data for an account is not limited, because new splits can be added to the distributed storage system. In response to a client request for a particular account's data, a front-end server communicates the request to a particular zone that has the client-requested data and returns that data to the requesting client.
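
    The account, split, group, and tablet hierarchy in this abstract can be illustrated with a small sketch. It is only an illustration under stated assumptions: the class names, the `max_splits` placement rule, and the zone-to-tablet-server mapping are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Split:
    """A unit of data that belongs to exactly one account."""
    split_id: str
    account: str


@dataclass
class Group:
    """A collection of splits whose data is held in several tablets."""
    group_id: str
    splits: List[Split] = field(default_factory=list)
    # Hypothetical layout: zone name -> tablet server managing the tablet there.
    tablets: Dict[str, str] = field(default_factory=dict)


def add_split(groups: List[Group], split: Split, max_splits: int = 2) -> Group:
    """Place a new split in the first group with room (assumed policy)."""
    for group in groups:
        if len(group.splits) < max_splits:
            group.splits.append(split)
            return group
    raise RuntimeError("no group has room for the split")


def zones_for_account(groups: List[Group], account: str) -> Set[str]:
    """Zones a front-end server could route a request for `account` to."""
    return {
        zone
        for group in groups
        if any(s.account == account for s in group.splits)
        for zone in group.tablets
    }
```

    An account grows simply by receiving more splits, and a front-end server would pick one of the zones returned by `zones_for_account` when serving that account's data.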

    System and method of accessing a document efficiently through multi-tier web caching
    32.
    Granted patent
    Legal status: Active

    Publication number: US08275790B2

    Publication date: 2012-09-25

    Application number: US12251413

    Filing date: 2008-10-14

    IPC classification: G06F17/30

    Abstract: Upon receipt of a document request, a client assistant examines its cache for the document. If that is not successful, a server searches for the requested document in its cache. If the server copy is not found or is not fresh, the server seeks the document from its host. If the host cannot provide a copy, the server seeks it from a document repository. Certain documents are identified from the document repository as being fresh or stable. Information about each of these identified documents is transmitted to the server, which inserts an entry into an index if the index does not already contain an entry for the document. If and when such a document is requested, the document will not be present in the server; however, the server will contain an entry directing it to obtain the document from the document repository rather than from the document's web host.
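
    The lookup order described in this abstract (client cache, server cache, web host, document repository) can be sketched as a simple fall-through. This is a minimal illustration, not the patented method: the function name, the `(content, is_fresh)` cache layout, and the two fetcher callables are assumptions, and the index entries that redirect the server to the repository are left out.

```python
from typing import Callable, Dict, Optional, Tuple

# A fetcher returns the document content, or None if it cannot provide it.
Fetcher = Callable[[str], Optional[str]]


def fetch_document(
    url: str,
    client_cache: Dict[str, str],
    server_cache: Dict[str, Tuple[str, bool]],  # url -> (content, is_fresh)
    fetch_from_host: Fetcher,
    fetch_from_repository: Fetcher,
) -> Tuple[Optional[str], str]:
    """Return (content, tier), where tier names the level that answered."""
    # Tier 1: the client assistant's own cache.
    if url in client_cache:
        return client_cache[url], "client cache"
    # Tier 2: the server cache, used only if the copy is present and fresh.
    cached = server_cache.get(url)
    if cached is not None and cached[1]:
        return cached[0], "server cache"
    # Tier 3: the document's web host.
    content = fetch_from_host(url)
    if content is not None:
        return content, "web host"
    # Tier 4: the document repository.
    content = fetch_from_repository(url)
    if content is not None:
        return content, "repository"
    return None, "not found"
```

    Passing the fetchers in as callables keeps the sketch independent of any particular network library.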

    Systems and methods for replicating data
    33.
    Granted patent
    Legal status: Active

    Publication number: US08065268B1

    Publication date: 2011-11-22

    Application number: US12727138

    Filing date: 2010-03-18

    IPC classification: G06F17/30

    Abstract: A system facilitates the distribution and redistribution of chunks of data among multiple servers. The system may identify servers to store a replica of the data based on at least one of utilization of the servers, prior data distribution involving the servers, and failure correlation properties associated with the servers, and place the replicas of the data at the identified servers. The system may also monitor total numbers of replicas of the chunks available in the system, identify chunks that have a total number of replicas below one or more chunk thresholds, assign priorities to the identified chunks, and re-replicate the identified chunks based substantially on the assigned priorities. The system may further monitor utilization of the servers, determine whether to redistribute any of the replicas, select one or more of the replicas to redistribute based on the utilization of the servers, select one or more of the servers to which to move the one or more replicas, and move the one or more replicas to the selected one or more servers.
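
    The re-replication step can be illustrated with a short sketch. It assumes a single replica threshold, uses the replica count as the priority (fewest replicas first), and picks targets by utilization alone; the abstract also names prior data distribution and failure correlation as placement criteria, which are not modeled here. All function and parameter names are hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple


def rereplication_order(replicas: Dict[str, Set[str]], threshold: int) -> List[str]:
    """Return chunks with fewer than `threshold` replicas, most urgent first."""
    heap: List[Tuple[int, str]] = [
        (len(servers), chunk)
        for chunk, servers in replicas.items()
        if len(servers) < threshold
    ]
    heapq.heapify(heap)
    # Popping yields the chunk with the fewest replicas first.
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


def pick_target(chunk_servers: Set[str], utilization: Dict[str, float]) -> str:
    """Pick the least-utilized server that does not already hold the chunk."""
    candidates = [s for s in utilization if s not in chunk_servers]
    if not candidates:
        raise ValueError("no server available for a new replica")
    # Ties on utilization are broken by server name for determinism.
    return min(candidates, key=lambda s: (utilization[s], s))
```

    With a threshold of 3, a chunk that has one replica is handled before a chunk that has two, and each new replica goes to the emptiest server that does not yet hold that chunk.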

    System and method of accessing a document efficiently through multi-tier web caching
    34.
    Granted patent
    Legal status: Active

    Publication number: US07437364B1

    Publication date: 2008-10-14

    Application number: US10882795

    Filing date: 2004-06-30

    IPC classification: G06F17/30

    Abstract: Upon receipt of a document request, a client assistant examines its cache for the document. If that is not successful, a server searches for the requested document in its cache. If the server copy is not found or is not fresh, the server seeks the document from its host. If the host cannot provide a copy, the server seeks it from a document repository. Certain documents are identified from the document repository as being fresh or stable. Information about each of these identified documents is transmitted to the server, which inserts an entry into an index if the index does not already contain an entry for the document. If and when such a document is requested, the document will not be present in the server; however, the server will contain an entry directing it to obtain the document from the document repository rather than from the document's web host.

    Connectivity server for locating linkage information between Web pages
    36.
    Granted patent
    Legal status: Expired

    Publication number: US6073135A

    Publication date: 2000-06-06

    Application number: US37350

    Filing date: 1998-03-10

    IPC classification: G06F17/30

    Abstract: A server computer is provided for representing and navigating the connectivity of Web pages. The Web pages include links to other Web pages. The links and Web pages have associated names (URLs). The names of the Web pages are sorted in a memory of the connectivity server. The sorted names are delta encoded while periodically storing full names as checkpoints in the memory. Each delta encoded name and checkpoint has a unique identification. A list of pairs of identifications representing existent links is sorted twice, first according to the first identification of each pair to produce an inlist, and second according to the second identification of each pair to produce an outlist. An array of elements is stored in the memory, with one array element for each Web page. Each element includes a first pointer to one of the checkpoints, a second pointer to an associated inlist of the Web page, and a third pointer to an associated outlist of the Web page. The array is indexed by a particular identification to locate connected Web pages.
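
    The delta encoding with periodic checkpoints can be sketched as follows. This is one plausible reading, not the patent's actual encoding: each non-checkpoint name is stored as the length of the prefix it shares with the previous name plus the remaining suffix, the position in the list stands in for the unique identification, and the checkpoint interval of 3 is arbitrary. The inlist and outlist structures are not shown.

```python
from os.path import commonprefix
from typing import List, Tuple, Union

# An entry is a full name (checkpoint) or (shared prefix length, suffix).
Entry = Union[str, Tuple[int, str]]


def encode(sorted_urls: List[str], interval: int = 3) -> List[Entry]:
    """Delta encode sorted names, storing every `interval`-th name in full."""
    entries: List[Entry] = []
    for i, url in enumerate(sorted_urls):
        if i % interval == 0:
            entries.append(url)  # checkpoint: stored in full
        else:
            shared = len(commonprefix([sorted_urls[i - 1], url]))
            entries.append((shared, url[shared:]))
    return entries


def decode(entries: List[Entry], ident: int, interval: int = 3) -> str:
    """Rebuild the name at position `ident` from its nearest checkpoint."""
    start = ident - ident % interval
    url = entries[start]
    assert isinstance(url, str), "a checkpoint must hold a full name"
    # Replay the deltas between the checkpoint and the requested entry.
    for entry in entries[start + 1 : ident + 1]:
        shared, suffix = entry
        url = url[:shared] + suffix
    return url
```

    Sorting first makes neighbouring names share long prefixes, so the deltas stay short, while the checkpoints bound how many deltas must be replayed to recover any one name.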

    Associating summaries with pointers in persistent data structures
    37.
    Granted patent
    Legal status: Active

    Publication number: US09002860B1

    Publication date: 2015-04-07

    Application number: US13366934

    Filing date: 2012-02-06

    Applicant: Sanjay Ghemawat

    Inventor: Sanjay Ghemawat

    IPC classification: G06F17/30 G06F12/02

    Abstract: Methods for organizing and retrieving data values in a persistent data structure are provided. Data values are grouped into data blocks and pointers are obtained for each data block. In addition, one or more summaries, related to properties of the data block, are created and associated with the data block's pointer. The summaries allow for a more efficient retrieval of data values from the data structure by preventing unnecessary retrieval calls to persistent storage when the summaries do not match query criteria.
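
    A small sketch shows how a summary stored next to a block pointer can avoid storage reads. It assumes the summary is the block's minimum and maximum value and that queries are range queries; the abstract does not say which properties are summarized. `BlockStore` is a hypothetical in-memory stand-in for persistent storage that counts reads.

```python
from typing import Dict, List, Tuple

Summary = Tuple[int, int]  # (minimum value, maximum value) of a block


class BlockStore:
    """In-memory stand-in for persistent storage that counts block reads."""

    def __init__(self) -> None:
        self.blocks: Dict[int, List[int]] = {}
        self.reads = 0

    def write(self, values: List[int]) -> Tuple[int, Summary]:
        """Store a block and return its pointer together with its summary."""
        pointer = len(self.blocks)
        self.blocks[pointer] = list(values)
        return pointer, (min(values), max(values))

    def read(self, pointer: int) -> List[int]:
        self.reads += 1
        return self.blocks[pointer]


def query_range(
    store: BlockStore, index: List[Tuple[int, Summary]], low: int, high: int
) -> List[int]:
    """Return values in [low, high], reading only blocks that may match."""
    hits: List[int] = []
    for pointer, (block_min, block_max) in index:
        if block_max < low or block_min > high:
            continue  # the summary rules this block out; no storage read
        hits.extend(v for v in store.read(pointer) if low <= v <= high)
    return hits
```

    With three blocks covering 1 to 9, 20 to 25, and 40 to 48, a query for the range 21 to 30 reads only the second block.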

    Identification of semantic units from within a search query
    38.
    Granted patent
    Legal status: Active

    Publication number: US08719262B1

    Publication date: 2014-05-06

    Application number: US13616094

    Filing date: 2012-09-14

    IPC classification: G06F17/30

    Abstract: A search engine for searching a corpus improves the relevancy of the results by classifying multiple terms in a search query as a single semantic unit. A semantic unit locator of the search engine generates a subset of documents that are generally relevant to the query based on the individual terms within the query. Combinations of search terms that define potential semantic units from the query are then evaluated against the subset of documents to determine which combinations of search terms should be classified as a semantic unit. The resultant semantic units are used to refine the results of the search.
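
    The two-stage idea in this abstract can be sketched in a few lines. The scoring rule is an assumption made for illustration: a document counts as generally relevant if it contains every query term, only adjacent term pairs are considered as candidate units, and a pair is accepted when it appears as an adjacent phrase in at least `min_fraction` of the relevant documents. The patent's actual evaluation criteria are not given in the abstract.

```python
from typing import List


def contains_adjacent(tokens: List[str], first: str, second: str) -> bool:
    """True if `first` is immediately followed by `second` in `tokens`."""
    return any(a == first and b == second for a, b in zip(tokens, tokens[1:]))


def find_semantic_units(
    query: str, documents: List[str], min_fraction: float = 0.5
) -> List[str]:
    """Return adjacent query-term pairs that behave like a single unit."""
    terms = query.lower().split()
    tokenized = [doc.lower().split() for doc in documents]
    # Stage 1: documents relevant to the individual terms (assumed: all present).
    subset = [tokens for tokens in tokenized if all(t in tokens for t in terms)]
    if not subset:
        return []
    # Stage 2: evaluate each candidate pair against that subset only.
    units: List[str] = []
    for first, second in zip(terms, terms[1:]):
        matches = sum(contains_adjacent(tokens, first, second) for tokens in subset)
        if matches / len(subset) >= min_fraction:
            units.append(f"{first} {second}")
    return units
```

    For the query "new york hotels", a corpus in which "new" and "york" usually appear together would yield "new york" as a unit, while "york hotels" would be rejected if that pair is rarely adjacent.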

    System and method for large-scale data processing using an application-independent framework
    40.
    Granted patent
    Legal status: Active

    Publication number: US08612510B2

    Publication date: 2013-12-17

    Application number: US12686292

    Filing date: 2010-01-12

    IPC classification: G06F15/16

    Abstract: A large-scale data processing system and method are provided for processing data in a distributed and parallel processing environment. The system includes an application-independent framework for processing data having a plurality of application-independent map modules and reduce modules. These application-independent modules use application-independent operators to automatically handle parallelization of computations across the distributed and parallel processing environment when performing user-specified data processing operations. The system also includes a plurality of user-specified, application-specific operators, for use with the application-independent framework to perform a user-specified data processing operation on a user-specified set of input files. The application-specific operators include a map operator and a reduce operator. The map operator is applied by the application-independent map modules to input data in the user-specified set of input files to produce intermediate data values. The reduce operator is applied by the application-independent reduce modules to process the intermediate data values to produce final output data.
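
    The split between the application-independent framework and the user-supplied operators can be illustrated with a single-process sketch. `run_job` plays the role of the framework and the two word-count functions are the application-specific map and reduce operators. All names are hypothetical, and the sketch omits what the real system is about: distributing the work across many machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

MapOp = Callable[[str], Iterable[Tuple[str, int]]]
ReduceOp = Callable[[str, List[int]], int]


def run_job(inputs: Iterable[str], map_op: MapOp, reduce_op: ReduceOp) -> Dict[str, int]:
    """Application-independent driver: map, group by key, then reduce."""
    intermediate: Dict[str, List[int]] = defaultdict(list)
    # Map phase: apply the user's map operator to every input record.
    for record in inputs:
        for key, value in map_op(record):
            intermediate[key].append(value)
    # Reduce phase: apply the user's reduce operator to each key's values.
    return {key: reduce_op(key, values) for key, values in intermediate.items()}


def word_count_map(line: str) -> Iterator[Tuple[str, int]]:
    """Application-specific map operator: emit (word, 1) for each word."""
    for word in line.split():
        yield word.lower(), 1


def word_count_reduce(word: str, counts: List[int]) -> int:
    """Application-specific reduce operator: sum the counts for one word."""
    return sum(counts)
```

    Because `run_job` knows nothing about words or counts, a different job only needs a different pair of operators.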
