System and method of accessing a document efficiently through multi-tier web caching
    63.
    Type: Granted patent
    Status: In force

    Publication No.: US08788475B2

    Publication date: 2014-07-22

    Application No.: US13536701

    Filing date: 2012-06-28

    IPC classification: G06F17/30

    Abstract: Upon receipt of a document request, a client assistant examines its cache for the document. If not successful, a server searches for the requested document in its cache. If the server copy is still not fresh or not found, the server seeks the document from its host. If the host cannot provide the copy, the server seeks it from a document repository. Certain documents are identified from the document repository as being fresh or stable. Information about each of these identified documents is transmitted to the server which inserts entries into an index if the index does not already contain an entry for the document. If and when this particular document is requested, the document will not be present in the server; however, the server will contain an entry directing the server to obtain the document from the document repository rather than the document's web host.
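
The lookup order described in the abstract (client cache, server cache, web host, document repository) can be sketched as follows. This is a minimal illustration, not the patented implementation: the class names `Entry`, `Server`, and `ClientAssistant`, the `source` field, and the in-memory dictionaries standing in for the host and the repository are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional, Tuple


@dataclass
class Entry:
    """Hypothetical server index entry."""
    content: Optional[str]   # None means "index entry only, no cached copy"
    fresh: bool = True
    source: str = "host"     # where to fetch from when no usable copy is cached


@dataclass
class Server:
    fetch_from_host: Callable[[str], Optional[str]]   # stand-in for a web fetch
    repository: Dict[str, str]                        # stand-in for the document repository
    index: Dict[str, Entry] = field(default_factory=dict)

    def seed_from_repository(self, urls: Iterable[str]) -> None:
        """Add index-only entries for documents reported as fresh or stable."""
        for url in urls:
            if url not in self.index:   # only if the index has no entry yet
                self.index[url] = Entry(content=None, source="repository")

    def get(self, url: str) -> Tuple[Optional[str], str]:
        entry = self.index.get(url)
        if entry is not None and entry.content is not None and entry.fresh:
            return entry.content, "server-cache"
        if entry is not None and entry.source == "repository" and url in self.repository:
            # The index entry directs the server to the repository, not the host.
            content, tier = self.repository[url], "repository"
        else:
            content, tier = self.fetch_from_host(url), "host"
            if content is None:         # host could not provide a copy
                content, tier = self.repository.get(url), "repository"
        if content is None:
            return None, "miss"
        self.index[url] = Entry(content=content)
        return content, tier


class ClientAssistant:
    def __init__(self, server: Server) -> None:
        self.server = server
        self.cache: Dict[str, str] = {}

    def get(self, url: str) -> Tuple[Optional[str], str]:
        if url in self.cache:           # first tier: the client's own cache
            return self.cache[url], "client-cache"
        content, tier = self.server.get(url)
        if content is not None:
            self.cache[url] = content
        return content, tier


# Usage with made-up URLs and contents.
host = {"http://a.example/": "page A"}
repo = {"http://a.example/": "page A (archived)", "http://b.example/": "page B (archived)"}
server = Server(fetch_from_host=host.get, repository=repo)
server.seed_from_repository(["http://b.example/"])
client = ClientAssistant(server)
```

In this sketch the second element of each result names the tier that served the request, which makes the fallback order easy to observe.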

    Generating Content Snippets Using a Tokenspace Repository
    64.
    Type: Patent application
    Status: In force

    Publication No.: US20130212076A1

    Publication date: 2013-08-15

    Application No.: US13685581

    Filing date: 2012-11-26

    IPC classification: G06F17/30

    Abstract: A search engine server system receives from a client system a search query and identifies a set of documents in accordance with the search query. A content snippet corresponding to content in a respective document of the identified set of documents is generated, the content snippet associated with at least one query term of the one or more query terms in the search query. A response to the search query is returned to the client system, the response including information identifying at least the respective document and including the content snippet. Generating the content snippet includes performing a first decompression operation on first token identifiers, from a compressed document repository, to provide a set of second token identifiers, and performing a second decompression operation on the set of second token identifiers to recover uncompressed content comprising a portion of the respective document.
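
The two decompression steps can be illustrated with a small sketch. The abstract does not say how either level is encoded, so the scheme here is purely an assumption: first-level identifiers index a hypothetical `phrase_table` of second-level identifier sequences, and second-level identifiers index a hypothetical `lexicon` of token strings. The snippet window logic is likewise only illustrative.

```python
from typing import Dict, Iterable, List, Optional, Sequence


def decode_first_stage(first_ids: Sequence[int],
                       phrase_table: Dict[int, Sequence[int]]) -> List[int]:
    """First decompression: expand each first-level id into second-level ids."""
    second_ids: List[int] = []
    for fid in first_ids:
        second_ids.extend(phrase_table[fid])
    return second_ids


def decode_second_stage(second_ids: Sequence[int],
                        lexicon: Dict[int, str]) -> List[str]:
    """Second decompression: map second-level ids back to token text."""
    return [lexicon[sid] for sid in second_ids]


def make_snippet(first_ids: Sequence[int],
                 phrase_table: Dict[int, Sequence[int]],
                 lexicon: Dict[int, str],
                 query_terms: Iterable[str],
                 window: int = 2) -> Optional[str]:
    """Return the tokens around the first query-term match, or None."""
    tokens = decode_second_stage(decode_first_stage(first_ids, phrase_table), lexicon)
    wanted = {term.lower() for term in query_terms}
    for i, token in enumerate(tokens):
        if token.lower() in wanted:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            return " ".join(tokens[lo:hi])
    return None


# Made-up data: the stored document is just four first-level ids.
lexicon = {0: "the", 1: "server", 2: "checks", 3: "its",
           4: "cache", 5: "before", 6: "contacting", 7: "host"}
phrase_table = {0: (0, 1), 1: (2, 3, 4), 2: (5, 6), 3: (0, 7)}
document = [0, 1, 2, 3]

snippet = make_snippet(document, phrase_table, lexicon, ["cache"])
```

For brevity the sketch decodes the whole document; a real system would presumably decode only the portion needed for the snippet, which is what the abstract's phrase "a portion of the respective document" suggests.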

    System and Method of Accessing a Document Efficiently Through Multi-Tier Web Caching
    66.
    Type: Patent application
    Status: In force

    Publication No.: US20120271852A1

    Publication date: 2012-10-25

    Application No.: US13536701

    Filing date: 2012-06-28

    IPC classification: G06F17/30

    Abstract: Upon receipt of a document request, a client assistant examines its cache for the document. If not successful, a server searches for the requested document in its cache. If the server copy is still not fresh or not found, the server seeks the document from its host. If the host cannot provide the copy, the server seeks it from a document repository. Certain documents are identified from the document repository as being fresh or stable. Information about each of these identified documents is transmitted to the server which inserts entries into an index if the index does not already contain an entry for the document. If and when this particular document is requested, the document will not be present in the server; however, the server will contain an entry directing the server to obtain the document from the document repository rather than the document's web host.

    Garbage collecting systems and methods
    67.
    Type: Granted patent
    Status: In force

    Publication No.: US07865536B1

    Publication date: 2011-01-04

    Application No.: US10608039

    Filing date: 2003-06-30

    IPC classification: G06F7/00 G06F17/00 G06F12/00

    Abstract: A system facilitates the deletion of data, such as files, orphaned chunks, and stale replicas. The system may identify a file to be deleted, rename the identified file, permanently delete the renamed file a predetermined amount of time after renaming the identified file as part of a garbage collection process, receive, from the servers, information concerning chunks stored by the servers, and identify, to the servers, ones of the chunks that do not exist possibly due to the permanent deletion of the renamed file. The system may further provide a mapping of file names to chunks, identify chunks, as orphaned chunks, that are not reachable from any of the file names, delete the orphaned chunks, receive, from the servers, information concerning chunks stored by the servers, and identify, to the servers, ones of the chunks that are orphaned chunks. The system may also associate version information with replicas of chunks, identify stale replicas based on the associated version information, delete the stale replicas, receive, from the servers, information concerning replicas stored by the servers, and identify, to the servers, ones of the replicas that are stale replicas.
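
The three kinds of garbage named in the abstract (deleted files, orphaned chunks, stale replicas) can be sketched in one small model. Everything concrete here is an assumption for illustration: the `Master` class, the `.deleted.` prefix used for renamed files, the explicit `now` arguments, and the rule that a replica is stale when its version is lower than the recorded one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

HIDDEN_PREFIX = ".deleted."   # hypothetical naming scheme for renamed files


@dataclass
class Master:
    grace_seconds: float
    files: Dict[str, List[str]] = field(default_factory=dict)    # file name -> chunk ids
    versions: Dict[str, int] = field(default_factory=dict)       # chunk id -> current version
    deleted_at: Dict[str, float] = field(default_factory=dict)   # hidden name -> rename time

    def delete(self, name: str, now: float) -> None:
        """Deletion only renames the file; its data survives the grace period."""
        hidden = HIDDEN_PREFIX + name
        self.files[hidden] = self.files.pop(name)
        self.deleted_at[hidden] = now

    def collect(self, now: float) -> None:
        """Permanently drop renamed files whose grace period has expired."""
        for hidden, renamed_at in list(self.deleted_at.items()):
            if now - renamed_at >= self.grace_seconds:
                del self.files[hidden]
                del self.deleted_at[hidden]

    def review_report(self, report: Dict[str, int]) -> Set[str]:
        """Given a server's report (chunk id -> held version), return the
        chunk ids that server may discard: orphaned chunks and stale replicas."""
        reachable = {c for chunks in self.files.values() for c in chunks}
        discard = set()
        for chunk_id, version in report.items():
            if chunk_id not in reachable:                      # orphaned chunk
                discard.add(chunk_id)
            elif version < self.versions.get(chunk_id, 0):     # stale replica
                discard.add(chunk_id)
        return discard


# Usage with made-up names; times are plain numbers of seconds.
master = Master(grace_seconds=3600,
                files={"/a": ["c1", "c2"], "/b": ["c3"]},
                versions={"c1": 2, "c2": 1, "c3": 5})
report = {"c1": 2, "c2": 1, "c3": 4, "c9": 1}

master.delete("/a", now=0)
master.collect(now=10)                          # grace period not over yet
before_expiry = master.review_report(report)
master.collect(now=3600)                        # renamed file is now removed
after_expiry = master.review_report(report)
```

Before the grace period expires, only the unknown chunk `c9` and the out-of-date replica of `c3` are reported; once the renamed file is removed, its chunks `c1` and `c2` become unreachable and are reported as well.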

    Large-scale data processing in a distributed and parallel processing environment
    68.
    Type: Granted patent
    Status: In force

    Publication No.: US07756919B1

    Publication date: 2010-07-13

    Application No.: US10871245

    Filing date: 2004-06-18

    IPC classification: G06F15/16

    Abstract: A large-scale data processing system and method includes one or more application-independent map modules configured to read input data and to apply at least one application-specific map operation to the input data to produce intermediate data values, wherein the map operation is automatically parallelized across multiple processors in the parallel processing environment. A plurality of intermediate data structures are used to store the intermediate data values. One or more application-independent reduce modules are configured to retrieve the intermediate data values and to apply at least one application-specific reduce operation to the intermediate data values to provide output data.
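
The split between an application-independent framework and application-specific map and reduce operations can be sketched as below. This is a single-machine illustration under stated assumptions: `run_map_reduce` is a hypothetical name, a thread pool stands in for the multiple processors of the abstract, and word counting is just a convenient example application.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Sequence, Tuple


def run_map_reduce(inputs: Sequence[str],
                   map_fn: Callable[[str], Iterable[Tuple[str, int]]],
                   reduce_fn: Callable[[str, List[int]], int],
                   workers: int = 4) -> Dict[str, int]:
    """Application-independent driver: it knows nothing about word counting."""
    # Map phase: the framework, not the application, handles the parallelism.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(lambda record: list(map_fn(record)), inputs))

    # Intermediate data structure: values grouped by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce phase: apply the application-specific reduce to each group.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# Application-specific operations for a word-count job.
def word_count_map(line: str) -> Iterable[Tuple[str, int]]:
    for word in line.split():
        yield word.lower(), 1


def word_count_reduce(word: str, counts: List[int]) -> int:
    return sum(counts)


counts = run_map_reduce(["the cache", "The server cache"],
                        word_count_map, word_count_reduce)
```

Swapping in a different pair of map and reduce functions changes the job without touching the driver, which is the separation the abstract describes.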

    System and Method for Large-Scale Data Processing Using an Application-Independent Framework
    69.
    Type: Patent application
    Status: In force

    Publication No.: US20100122065A1

    Publication date: 2010-05-13

    Application No.: US12686292

    Filing date: 2010-01-12

    IPC classification: G06F9/38 G06F9/30

    Abstract: A large-scale data processing system and method for processing data in a distributed and parallel processing environment. The system includes an application-independent framework for processing data having a plurality of application-independent map modules and reduce modules. These application-independent modules use application-independent operators to automatically handle parallelization of computations across the distributed and parallel processing environment when performing user-specified data processing operations. The system also includes a plurality of user-specified, application-specific operators, for use with the application-independent framework to perform a user-specified data processing operation on a user-specified set of input files. The application-specific operators include: a map operator and a reduce operator. The map operator is applied by the application-independent map modules to input data in the user-specified set of input files to produce intermediate data values. The reduce operator is applied by the application-independent reduce modules to process the intermediate data values to produce final output data.

    System and method for analyzing data records
    70.
    Type: Granted patent
    Status: In force

    Publication No.: US07590620B1

    Publication date: 2009-09-15

    Application No.: US10954692

    Filing date: 2004-09-29

    IPC classification: G06F17/30

    Abstract: A method and system for analyzing data records includes allocating groups of records to respective processes of a first plurality of processes executing in parallel. In each respective process of the first plurality of processes, for each record in the group of records allocated to the respective process, a query is applied to the record so as to produce zero or more values. Zero or more emit operators are applied to each of the zero or more produced values so as to add corresponding information to an intermediate data structure. Information from a plurality of the intermediate data structures is aggregated to produce output data.
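
The per-record query, the emit operator, and the final aggregation can be sketched as follows. The names `process_group`, `aggregate`, and `status_query`, the record layout, and the choice of a summing table as the intermediate data structure are assumptions for illustration. The groups are processed one after another here; in the system described, each group would be handled by a separate parallel process.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Emit = Callable[..., None]


def process_group(records: Iterable[Record],
                  query: Callable[[Record, Emit], None]) -> Counter:
    """Apply the query to every record in one group and collect whatever
    it emits into that group's intermediate table."""
    table: Counter = Counter()

    def emit(key: str, value: int = 1) -> None:
        table[key] += value

    for record in records:
        query(record, emit)   # the query may call emit zero or more times
    return table


def aggregate(tables: Iterable[Counter]) -> Counter:
    """Merge the intermediate tables of all groups into the output."""
    total: Counter = Counter()
    for table in tables:
        total.update(table)   # Counter.update adds counts key by key
    return total


# Example query: count server errors per host (made-up log records).
def status_query(record: Record, emit: Emit) -> None:
    if record["status"] >= 500:
        emit(record["host"])


groups: List[List[Record]] = [
    [{"host": "a", "status": 500}, {"host": "b", "status": 200}],
    [{"host": "a", "status": 503}, {"host": "c", "status": 500}],
]
errors_per_host = aggregate(process_group(g, status_query) for g in groups)
```

Because the query sees one record at a time and only communicates through `emit`, the groups can be processed in any order or in parallel without changing the aggregated result.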
