Pruning of blob replicas
    2.
    Granted patent (in force)

    Publication No.: US08744997B2

    Publication date: 2014-06-03

    Application No.: US13022213

    Filing date: 2011-02-07

    IPC classification: G06F17/00 G06F7/00

    Abstract: A system and method for generating and distributing replica removal requests for objects in a distributed storage system are provided. Replica removal requests for objects in a distributed storage system are generated based at least in part on replication policies for the objects. A respective replica removal request instructs a respective instance of the distributed storage system to remove a respective replica of the respective object so as to at least partially satisfy replication policies for the respective object. Then the replica removal requests for the objects in the distributed storage system are distributed to respective instances of the distributed storage system corresponding to the replica removal requests for execution.
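The two steps in this abstract (generate removal requests from replication policies, then route each request to the instance that must execute it) can be illustrated with a minimal sketch. It assumes the policy is just a maximum replica count per object; `RemovalRequest`, `generate_removal_requests`, `distribute`, and the tie-break rule are hypothetical names and choices, not taken from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RemovalRequest:
    instance: str   # instance asked to drop its replica
    object_id: str  # object whose replica should be removed


def generate_removal_requests(replicas, max_replicas):
    """Emit one request per replica in excess of the object's policy.

    replicas:     dict of object_id -> list of instances holding a replica
    max_replicas: dict of object_id -> maximum replica count allowed (assumed policy)
    """
    requests = []
    for object_id, instances in replicas.items():
        excess = max(len(instances) - max_replicas[object_id], 0)
        # Hypothetical tie-break: drop replicas from the alphabetically last instances.
        for instance in sorted(instances, reverse=True)[:excess]:
            requests.append(RemovalRequest(instance, object_id))
    return requests


def distribute(requests):
    """Group requests by the instance that must execute them."""
    by_instance = defaultdict(list)
    for request in requests:
        by_instance[request.instance].append(request)
    return dict(by_instance)
```

A real policy would likely weigh location, storage class, and replica age; the count-only policy here is only meant to show the generate-then-distribute shape.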

    Index server architecture using tiered and sharded phrase posting lists
    3.
    Granted patent (in force)

    Publication No.: US08682901B1

    Publication date: 2014-03-25

    Application No.: US13332278

    Filing date: 2011-12-20

    IPC classification: G01F7/00

    Abstract: An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are extracted from the document collection. Documents are then indexed according to their included phrases, using phrase posting lists. The phrase posting lists are stored in a cluster of index servers. The phrase posting lists can be tiered into groups, and sharded into partitions. Phrases in a query are identified based on possible phrasifications. A query schedule is created from the phrases, and then optimized to reduce query processing and communication costs. The execution of the query schedule is managed to further reduce or eliminate query processing operations at various ones of the index servers.
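To make "tiered into groups, and sharded into partitions" concrete, here is a minimal sketch. It assumes tiers are chosen by posting-list length and shards by a stable hash of the document id; both rules, and the names `assign_tier`, `assign_shard`, and `place_postings`, are illustrative assumptions rather than the scheme the patent claims.

```python
import zlib


def assign_tier(posting_list_length, tier_limits):
    """Return the index of the first tier whose length limit fits the list."""
    for tier, limit in enumerate(tier_limits):
        if posting_list_length <= limit:
            return tier
    return len(tier_limits)  # longest lists fall into the final tier


def assign_shard(doc_id, num_shards):
    """Pick a shard for a document with a stable hash of its id."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def place_postings(doc_ids, tier_limits, shards_per_tier):
    """Map each (tier, shard) location to the doc ids stored there."""
    tier = assign_tier(len(doc_ids), tier_limits)
    placement = {}
    for doc_id in doc_ids:
        shard = assign_shard(doc_id, shards_per_tier[tier])
        placement.setdefault((tier, shard), []).append(doc_id)
    return placement
```

With `tier_limits=[10, 100]` and `shards_per_tier=[1, 2, 4]`, a three-document posting list lands in tier 0 on a single shard, while longer lists are spread over more shards.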

    Index server architecture using tiered and sharded phrase posting lists
    4.
    Granted patent (in force)

    Publication No.: US07693813B1

    Publication date: 2010-04-06

    Application No.: US11694780

    Filing date: 2007-03-30

    IPC classification: G06F17/30

    Abstract: An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are extracted from the document collection. Documents are then indexed according to their included phrases, using phrase posting lists. The phrase posting lists are stored in a cluster of index servers. The phrase posting lists can be tiered into groups, and sharded into partitions. Phrases in a query are identified based on possible phrasifications. A query schedule is created from the phrases, and then optimized to reduce query processing and communication costs. The execution of the query schedule is managed to further reduce or eliminate query processing operations at various ones of the index servers.
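The "possible phrasifications" of a query can be read as the different ways of splitting the query words into phrases. The sketch below enumerates them under the assumption that any single word is allowed and that a multi-word run counts only if it appears in a known phrase set; `phrasifications` and `known_phrases` are hypothetical names.

```python
def phrasifications(words, known_phrases):
    """Yield every split of `words` into single words or known multi-word phrases."""
    if not words:
        yield []
        return
    for end in range(1, len(words) + 1):
        head = " ".join(words[:end])
        # Single words are always allowed; longer runs must be known phrases.
        if end == 1 or head in known_phrases:
            for rest in phrasifications(words[end:], known_phrases):
                yield [head] + rest
```

For the query "new york city hotels" with known phrases "new york" and "new york city", this yields three candidate phrasifications, from all single words to `["new york city", "hotels"]`.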

    Modifying a hierarchical data structure according to a pseudo-rendering of a structured document by annotating and merging nodes
    5.
    Granted patent (in force)

    Publication No.: US09069855B2

    Publication date: 2015-06-30

    Application No.: US13053154

    Filing date: 2011-03-21

    Applicant: Yonatan Zunger

    Inventor: Yonatan Zunger

    IPC classification: G06F17/22 G06F17/30

    摘要: A structured document is translated into an initial hierarchical data structure in accordance with syntactic elements defined in the structured document. The initial hierarchical data structure includes a plurality of nodes, and each node corresponds to one of the syntactic elements. The method then annotates a node with a set of attributes including geometric parameters of semantic elements in the structured document that are associated with the node in accordance with a pseudo-rendering of the structured document. Finally, the method merges the nodes in the initial hierarchical data structure into a tree of merged nodes in accordance with their respective attributes and a set of predefined rules such that each merged node is associated with a semantically distinct region of the pseudo-rendered document. The predefined rules include rules for merging nodes associated with semantic elements that have nearby positions and/or compatible attributes in the pseudo-rendered document.
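The merge rule described here (combine nodes whose elements sit near each other and have compatible attributes in the pseudo-rendering) can be sketched in one dimension. This is a simplified illustration only: `Box`, `merge_regions`, the single `style` attribute, and the `max_gap` threshold are assumptions, and a real implementation would work on a tree with full two-dimensional geometry.

```python
from dataclasses import dataclass


@dataclass
class Box:
    top: int     # vertical start of the pseudo-rendered element
    bottom: int  # vertical end of the pseudo-rendered element
    style: str   # stand-in for "compatible attributes", e.g. a font class


def merge_regions(boxes, max_gap):
    """Greedily merge boxes that are vertically close and share a style."""
    regions = []
    for box in sorted(boxes, key=lambda b: b.top):
        last = regions[-1] if regions else None
        close = last is not None and box.top - last.bottom <= max_gap
        if close and box.style == last.style:
            last.bottom = max(last.bottom, box.bottom)  # extend the region
        else:
            regions.append(Box(box.top, box.bottom, box.style))
    return regions
```

Each returned `Box` plays the role of one merged node, that is, one candidate semantically distinct region.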

    Method and system for efficiently replicating data in non-relational databases

    Publication No.: US08380659B2

    Publication date: 2013-02-19

    Application No.: US12703167

    Filing date: 2010-02-09

    Applicant: Yonatan Zunger

    Inventor: Yonatan Zunger

    IPC classification: G06F7/00

    摘要: A method replicates data between instances of a distributed database. The method identifies at least two instances of the database at distinct geographic locations. The method tracks changes to the database by storing deltas. Each delta has a row identifier that identifies the piece of data modified, a sequence identifier that specifies the order in which the deltas are applied to the data, and an instance identifier that specifies where the delta was created. The method determines which deltas to send using an egress map that specifies which combinations of row identifier and sequence identifier have been acknowledged as received at other instances. The method builds a transmission matrix that identifies deltas that have not yet been acknowledged as received. The method then transmits deltas identified in the transmission matrix. After receiving acknowledgement that transmitted deltas have been incorporated into databases at other instances, the method updates the egress map.
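The delta bookkeeping in this abstract can be sketched as follows. The sketch assumes the egress map stores, per peer instance and row, the highest acknowledged sequence identifier, and it flattens the "transmission matrix" into a sorted list; `Delta`, `build_transmission`, and `acknowledge` are hypothetical names, not the patent's data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delta:
    row_id: str       # which piece of data was modified
    sequence_id: int  # order in which deltas apply to that row
    instance_id: str  # instance where the delta was created


def build_transmission(deltas, egress_map, peer):
    """Return the deltas that `peer` has not yet acknowledged.

    egress_map: dict of peer -> {row_id: highest acknowledged sequence_id}
    """
    acked = egress_map.get(peer, {})
    pending = [d for d in deltas if d.sequence_id > acked.get(d.row_id, -1)]
    return sorted(pending, key=lambda d: (d.row_id, d.sequence_id))


def acknowledge(egress_map, peer, sent):
    """Record that `peer` has incorporated every delta in `sent`."""
    acked = egress_map.setdefault(peer, {})
    for d in sent:
        acked[d.row_id] = max(acked.get(d.row_id, -1), d.sequence_id)
```

The egress map is updated only after acknowledgement, so a delta whose transmission is lost is simply selected again on the next pass.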

    Method and System for Efficiently Replicating Data in Non-Relational Databases
    7.
    Patent application (in force)

    Publication No.: US20120310903A1

    Publication date: 2012-12-06

    Application No.: US13588993

    Filing date: 2012-08-17

    Applicant: Yonatan Zunger

    Inventor: Yonatan Zunger

    IPC classification: G06F7/00 G06F17/30

    摘要: A method replicates data between instances of a distributed database. The method identifies at least two instances of the database at distinct geographic locations. The method tracks changes to the database by storing deltas. Each delta has a row identifier that identifies the piece of data modified, a sequence identifier that specifies the order in which the deltas are applied to the data, and an instance identifier that specifies where the delta was created. The method determines which deltas to send using an egress map that specifies which combinations of row identifier and sequence identifier have been acknowledged as received at other instances. The method builds a transmission matrix that identifies deltas that have not yet been acknowledged as received. The method then transmits deltas identified in the transmission matrix. After receiving acknowledgement that transmitted deltas have been incorporated into databases at other instances, the method updates the egress map.

    Determining Semantically Distinct Regions of a Document

    Publication No.: US20110173528A1

    Publication date: 2011-07-14

    Application No.: US13053156

    Filing date: 2011-03-21

    Applicant: Yonatan Zunger

    Inventor: Yonatan Zunger

    IPC classification: G06F17/00

    摘要: A structured document is translated into an initial hierarchical data structure in accordance with syntactic elements defined in the structured document. The initial hierarchical data structure includes a plurality of nodes, and each node corresponds to one of the syntactic elements. The method then annotates a node with a set of attributes including geometric parameters of semantic elements in the structured document that are associated with the node in accordance with a pseudo-rendering of the structured document. Finally, the method merges the nodes in the initial hierarchical data structure into a tree of merged nodes in accordance with their respective attributes and a set of predefined rules such that each merged node is associated with a semantically distinct region of the pseudo-rendered document. The predefined rules include rules for merging nodes associated with semantic elements that have nearby positions and/or compatible attributes in the pseudo-rendered document.

    Automatic determination of whether a document includes an image gallery
    9.
    Granted patent (in force)

    Publication No.: US07788258B1

    Publication date: 2010-08-31

    Application No.: US10871030

    Filing date: 2004-06-21

    IPC classification: G06F7/00 G06F17/30

    摘要: Image galleries are automatically located within documents, such as web pages. Documents that are determined to contain image galleries may be treated differently when storing the document for later retrieval by an image search engine. In one implementation, the image galleries are automatically located within a document by calculating position information indicating relative positions of images in the document. The document may be determined to contain an image gallery when the position information indicates that the images in the document are generally evenly distributed.
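One way to test whether images are "generally evenly distributed" is to look at how much the gaps between consecutive images vary. The sketch below uses the coefficient of variation of those gaps; this measure, the thresholds, and the name `looks_like_gallery` are assumptions for illustration, not the patent's stated method.

```python
from statistics import mean, pstdev


def looks_like_gallery(image_offsets, min_images=4, max_spread=0.25):
    """Guess whether images are evenly spaced along the document.

    image_offsets: position of each image, e.g. its byte or token offset.
    """
    if len(image_offsets) < min_images:
        return False
    ordered = sorted(image_offsets)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    average = mean(gaps)
    if average == 0:
        return False
    # Coefficient of variation: small when the gaps are nearly equal.
    return pstdev(gaps) / average <= max_spread
```

Offsets such as 100, 200, 300, 400, 500 pass the check, while a page with three images clustered at the top and one far below does not.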

    Phrase extraction using subphrase scoring

    Publication No.: US09355169B1

    Publication date: 2016-05-31

    Application No.: US13615541

    Filing date: 2012-09-13

    IPC classification: G06F17/30

    CPC classification: G06F17/30616 G06F17/30864

    Abstract: An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are extracted from the document collection. Documents are then indexed according to their included phrases, using phrase posting lists. The phrase posting lists are stored in a cluster of index servers. The phrase posting lists can be tiered into groups, and sharded into partitions. Phrases in a query are identified based on possible phrasifications. A query schedule is created from the phrases, and then optimized to reduce query processing and communication costs. The execution of the query schedule is managed to further reduce or eliminate query processing operations at various ones of the index servers.
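The idea of ordering query work to cut processing cost, and of skipping operations that can no longer matter, can be shown with a standard heuristic: intersect the shortest posting lists first and stop as soon as the result is empty. This is a generic technique offered as an illustration, not the scheduler described in the patent; `intersect_postings` is a hypothetical name.

```python
def intersect_postings(posting_lists):
    """Intersect doc-id lists, starting with the shortest to shrink work early."""
    if not posting_lists:
        return []
    ordered = sorted(posting_lists, key=len)
    result = set(ordered[0])
    for postings in ordered[1:]:
        result &= set(postings)
        if not result:
            break  # no document can match; skip the remaining lists
    return sorted(result)
```

In a distributed setting the early exit corresponds to never contacting the servers that hold the remaining, longer lists.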