PROCESSING HIERARCHICAL DATA IN A MAP-REDUCE FRAMEWORK
    44.
    Type: Patent application
    Legal status: In force

    Publication No.: US20120324459A1

    Publication date: 2012-12-20

    Application No.: US13598280

    Filing date: 2012-08-29

    IPC classification: G06F9/46

    CPC classification: G06F9/46 G06F9/5066

    Abstract: Methods and arrangements for processing hierarchical data in a map-reduce framework. Hierarchical data is accepted, and a map-reduce job is performed on the hierarchical data. This performing of a map-reduce job includes determining a cost of partitioning the data, determining a cost of redefining the job and thereupon selectively performing at least one step taken from the group consisting of: partitioning the data and redefining the job.
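
The selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the claimed method: the nested-dict representation of hierarchical data, the helper names `count_nodes`, `depth`, and `choose_step`, and both toy cost models are assumptions introduced here. The abstract also allows performing both steps, while this sketch only picks the cheaper one.

```python
from typing import Any, Callable


def count_nodes(tree: Any) -> int:
    """Count every dict node in a nested-dict hierarchy (leaf values count as 0)."""
    if not isinstance(tree, dict):
        return 0
    return 1 + sum(count_nodes(child) for child in tree.values())


def depth(tree: Any) -> int:
    """Depth of the hierarchy, counting only dict levels."""
    if not isinstance(tree, dict):
        return 0
    return 1 + max((depth(child) for child in tree.values()), default=0)


def choose_step(data: Any,
                partition_cost: Callable[[Any], float],
                redefine_cost: Callable[[Any], float]) -> str:
    """Evaluate both (hypothetical) cost models and return the cheaper step."""
    p = partition_cost(data)
    r = redefine_cost(data)
    return "partition_data" if p <= r else "redefine_job"


# Toy hierarchy and toy cost models, for illustration only.
data = {"a": {"b": {"c": 1}}, "d": {"e": 2}}
step = choose_step(
    data,
    partition_cost=count_nodes,             # assume one unit per node touched
    redefine_cost=lambda t: 2 * depth(t),   # assume two units per hierarchy level
)
print(step)
```

Passing the cost models as callables keeps the decision logic separate from how costs are estimated, which the abstract leaves unspecified.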

    SYSTEMS AND METHODS FOR PROCESSING HIERARCHICAL DATA IN A MAP-REDUCE FRAMEWORK
    45.
    Type: Patent application
    Legal status: In force

    Publication No.: US20120311589A1

    Publication date: 2012-12-06

    Application No.: US13118628

    Filing date: 2011-05-31

    IPC classification: G06F9/46

    CPC classification: G06F9/46 G06F9/5066

    Abstract: Methods and arrangements for processing hierarchical data in a map-reduce framework. Hierarchical data is accepted, and a map-reduce job is performed on the hierarchical data. This performing of a map-reduce job includes determining a cost of partitioning the data, determining a cost of redefining the job and thereupon selectively performing at least one step taken from the group consisting of: partitioning the data and redefining the job.

    SYSTEMS AND METHODS FOR STANDARDIZATION AND DE-DUPLICATION OF ADDRESSES USING TAXONOMY
    47.
    Type: Patent application
    Legal status: In force

    Publication No.: US20120047179A1

    Publication date: 2012-02-23

    Application No.: US12859607

    Filing date: 2010-08-19

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30961

    Abstract: Systems and associated methods for address standardization and applications related thereto are described. Embodiments exploit a common context in a taxonomy and a given address to detect and correct deviations in the address. Embodiments establish a possible path from a root of the taxonomy to a leaf in the taxonomy that can possibly generate a given address. Given a new address, embodiments use complete addresses, and/or segments or elements thereof, to compute the representations of the elements and find a closest matching leaf in the taxonomy. Embodiments then traverse the path to a root node to detect the agreement and disagreement between the path and the address entry. Taxonomical structure is thus used to detect, segregate and standardize the expected fields.
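
The leaf-matching and path-traversal idea in the abstract can be illustrated with a small sketch. Everything concrete here is an assumption: the toy taxonomy, the function names, the 0.8 cutoff, and the use of plain string similarity from Python's `difflib` in place of whatever element representations the embodiments actually compute.

```python
import difflib

# Hypothetical toy taxonomy stored as child -> parent; None marks the root.
PARENT = {
    "India": None,
    "Karnataka": "India",
    "Bangalore": "Karnataka",
    "Koramangala": "Bangalore",
    "Indiranagar": "Bangalore",
    "Maharashtra": "India",
    "Mumbai": "Maharashtra",
    "Andheri": "Mumbai",
}
INNER = {p for p in PARENT.values() if p is not None}
LEAVES = [n for n in PARENT if n not in INNER]


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def closest_leaf(tokens):
    """Return the leaf most similar to any single address token."""
    return max(LEAVES, key=lambda leaf: max(similarity(t, leaf) for t in tokens))


def path_to_root(leaf):
    """Walk parent links from a leaf up to the root."""
    path, node = [], leaf
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def standardize(address, cutoff=0.8):
    """Match an address to a leaf, then check each path node against the address."""
    tokens = [t.strip(",.") for t in address.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        raise ValueError("address has no usable tokens")
    lowered = [t.lower() for t in tokens]
    leaf = closest_leaf(tokens)
    agree, disagree = [], []
    for node in path_to_root(leaf):
        hit = difflib.get_close_matches(node.lower(), lowered, n=1, cutoff=cutoff)
        (agree if hit else disagree).append(node)
    return {
        "leaf": leaf,
        "standard_path": list(reversed(path_to_root(leaf))),  # root first
        "agree": agree,
        "disagree": disagree,  # path nodes missing from or contradicted by the address
    }


result = standardize("12 Main Rd, Koramangla, Bangalor, Karnataka")
print(result)
```

In this sketch the misspelled tokens still match their taxonomy nodes, and the root-to-leaf path supplies the standardized spelling along with any fields the address omitted.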

    Single pass workload directed clustering of XML documents
    48.
    Type: Granted patent
    Legal status: Lapsed

    Publication No.: US07512615B2

    Publication date: 2009-03-31

    Application No.: US10703250

    Filing date: 2003-11-07

    IPC classification: G06F7/00 G06F17/00

    Abstract: A method and system for clustering of XML documents is disclosed. The method operates under specified memory-use constraints. The system implements the method and scans an XML document, assigns edge-weights according to the application workload, and maps clusters of XML nodes to disk pages, all in a single parser-controlled pass over the XML data. Application workload information is used to generate XML clustering solutions that lead to substantial reduction in page faults for the workload under consideration. Several approaches for representing workload information are disclosed. For example, the workload may list the XPath operators invoked during the application along with their invocation frequencies. The application workload can be further refined by incorporating additional features such as query importance or query compilation costs. XML access patterns could also be modeled using stochastic approaches.
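
As a rough illustration of a single-pass, workload-weighted page assignment, the sketch below streams an XML document once and greedily places each node on its parent's page when the edge is "hot" and the page has room. The greedy rule, the `workload` dictionary (tag to invocation frequency), the `page_capacity` and `min_weight` parameters, and the function name are all assumptions made for this example; the patent's actual clustering algorithm is not given in the abstract.

```python
import io
import xml.etree.ElementTree as ET


def cluster_single_pass(xml_text, workload, page_capacity=3, min_weight=1):
    """Assign XML nodes to pages in one streaming pass.

    workload maps a tag to a (hypothetical) access frequency, used here as the
    weight of the edge from the parent to that node.
    """
    pages = []   # pages[i] is the list of tags stored on page i
    stack = []   # page index of each currently open ancestor
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            weight = workload.get(elem.tag, 0)
            parent_page = stack[-1] if stack else None
            if (parent_page is not None
                    and weight >= min_weight
                    and len(pages[parent_page]) < page_capacity):
                page = parent_page          # co-locate with the parent
            else:
                pages.append([])            # open a new page
                page = len(pages) - 1
            pages[page].append(elem.tag)
            stack.append(page)
        else:
            stack.pop()
            elem.clear()                    # release parsed content to bound memory
    return pages


doc = "<lib><book><title/><author/></book><book><title/><note/></book></lib>"
pages = cluster_single_pass(doc, {"book": 5, "title": 9, "author": 2})
print(pages)
```

Only the stack of open ancestors and the page lists are kept while parsing, which mirrors the abstract's emphasis on a memory-constrained single pass.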

    Method for estimating storage requirements for a multi-dimensional clustering data configuration
    50.
    Type: Granted patent
    Legal status: Lapsed

    Publication No.: US07440986B2

    Publication date: 2008-10-21

    Application No.: US10993567

    Filing date: 2004-11-19

    IPC classification: G06F17/30

    Abstract: A storage requirements estimating system estimates the storage required for a proposed multidimensional clustering data configuration by modeling wasted space. The amount of wasted space is modeled by calculating the cardinality of the unique value of the clustering key for the proposed configuration. Cardinality may be determined by estimation techniques. Specific values for wasted space and total space may be determined in response to the determined cardinality. Comparison of estimates for different proposed clustering configurations facilitates a selection among proposed multidimensional clustering data configurations.
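
One way to picture the wasted-space modeling is the simplified estimate below. It assumes rows are spread evenly across the distinct clustering-key values (cells) and that each cell occupies whole blocks, so the partly filled last block of every cell counts as waste. This model, the function names, and the sample numbers are assumptions for illustration and are not taken from the patent.

```python
import math


def estimate_mdc_storage(total_rows, cell_cardinality, rows_per_block, block_bytes):
    """Estimate total and wasted bytes for one proposed clustering configuration."""
    rows_per_cell = total_rows / cell_cardinality          # even-distribution assumption
    blocks_per_cell = math.ceil(rows_per_cell / rows_per_block)
    total_blocks = cell_cardinality * blocks_per_cell
    needed_blocks = total_rows / rows_per_block             # blocks if packed with no waste
    wasted_blocks = total_blocks - needed_blocks
    return {
        "total_bytes": total_blocks * block_bytes,
        "wasted_bytes": int(wasted_blocks * block_bytes),
    }


def pick_configuration(estimates):
    """Return the name of the configuration with the smallest total footprint."""
    return min(estimates, key=lambda name: estimates[name]["total_bytes"])


# Two hypothetical configurations that differ only in clustering-key cardinality.
estimates = {
    "A": estimate_mdc_storage(1_000_000, 100, rows_per_block=1000, block_bytes=32768),
    "B": estimate_mdc_storage(1_000_000, 3000, rows_per_block=1000, block_bytes=32768),
}
best = pick_configuration(estimates)
print(estimates, best)
```

Under these assumptions, a higher key cardinality produces more partly filled blocks, so comparing the estimates across configurations shows the storage cost of clustering on finer-grained keys.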
