Abstract:
A new file system partition is added to an existing partition in disk storage space by creating a new file in the storage space of the existing partition and giving this file the attributes of a partition. This new file having partition attributes is referred to as a "raw file." An apparatus in a computing system for creating and accessing a raw file comprises a storage system controller for creating a raw file of a predetermined size with the attributes of a partition, a storage space driver for accessing storage space in a data storage system, and a storage access control for translating an access request for a raw file to an actual address for the raw file, so that the storage space driver can access the raw file based on that address. Computer-implemented steps create a first file of a predetermined size in a first disk file system, allocate storage locations in the first disk file system to accommodate the storage space required by the first file, store a first file allocation map indicating the storage locations allocated to the first file, and convert the first file to a raw file with a unique identifier as a file partition using the same storage locations allocated to the first file. The raw file is accessed by transforming the access request for the raw file to an actual address for a storage device driver.
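The translation step described above can be illustrated with a minimal sketch. It assumes the file allocation map is an ordered list of contiguous block extents and that requests address the raw file by byte offset; the names `Extent` and `translate` are hypothetical and do not come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """One contiguous run of blocks allocated to the file (assumed layout)."""
    start_block: int   # first block of the run on the underlying device
    block_count: int   # number of contiguous blocks in the run


def translate(allocation_map, block_size, offset):
    """Map a byte offset inside the raw file to a byte address on the device."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    remaining = offset
    for extent in allocation_map:
        extent_bytes = extent.block_count * block_size
        if remaining < extent_bytes:
            # The offset falls inside this extent.
            return extent.start_block * block_size + remaining
        remaining -= extent_bytes
    raise ValueError("offset lies beyond the end of the raw file")
```

With extents `[Extent(100, 2), Extent(500, 3)]` and 512-byte blocks, offset 1024 is the first byte of the second extent, so it resolves to device address 500 * 512.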
Abstract:
A method and system are disclosed for generating a decision-tree classifier in parallel in a multi-processor system, from a training set of records. The method comprises the steps of: partitioning the records among the processors, each processor generating an attribute list for each attribute, and the processors cooperatively generating a decision tree by repeatedly partitioning the records using the attribute lists. For each node, each processor determines its best split test and, along with other processors, selects the best overall split for the records at that node. Preferably, the Gini index and class histograms are used in determining the best splits. Also, each processor builds a hash table using the attribute list of the split attribute and shares it with other processors. The hash tables are used for splitting the remaining attribute lists. The created tree is then pruned based on the minimum description length (MDL) principle, which encodes the tree and split tests in an MDL-based code, and determines whether to prune and how to prune each node based on the code length of the node.
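As an illustration of the split evaluation mentioned above, the following sketch scans one sorted attribute list on a single processor, keeps two class histograms (records below and above the candidate threshold), and scores each candidate with the weighted Gini index. The function names and the midpoint threshold rule are assumptions made for this example; the cooperative selection across processors is not shown.

```python
from collections import Counter


def gini(histogram, total):
    """Gini index of a class histogram holding `total` records."""
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in histogram.values())


def best_split(attribute_list):
    """Return (score, threshold) for the lowest weighted Gini index.

    attribute_list: list of (value, class_label) pairs sorted by value.
    """
    n = len(attribute_list)
    below = Counter()                                   # records at or below the cut
    above = Counter(label for _, label in attribute_list)
    best = (float("inf"), None)
    for i in range(n - 1):
        value, label = attribute_list[i]
        below[label] += 1
        above[label] -= 1
        next_value = attribute_list[i + 1][0]
        if value == next_value:
            continue                                    # cannot cut between equal values
        n_below, n_above = i + 1, n - (i + 1)
        score = (n_below / n) * gini(below, n_below) + (n_above / n) * gini(above, n_above)
        if score < best[0]:
            best = (score, (value + next_value) / 2)
    return best
```

For the list `[(1, "a"), (2, "a"), (3, "b"), (4, "b")]` the cut at 2.5 separates the classes perfectly, so it receives a score of 0.0.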
Abstract:
A phrase recognition method breaks streams of text into text "chunks" and selects certain chunks as "phrases" useful for automated full text searching. The phrase recognition method uses a carefully assembled list of partition elements to partition the text into the chunks, and selects phrases from the chunks according to a small number of frequency-based definitions. The method can also incorporate additional processes such as categorization of proper names to enhance phrase recognition. The method selects phrases quickly and efficiently, referring simply to the phrases themselves and the frequency with which they are encountered, rather than relying on complex, time-consuming, resource-consuming grammatical analysis, or on collocation schemes of limited applicability, or on heuristic text analysis of limited reliability or utility.
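The chunk-then-count idea can be sketched as follows. The partition list here is a small illustrative set of function words plus punctuation, and the frequency rule (multi-word chunks seen at least `min_count` times) is an assumption for the example; the abstract does not give the actual list or definitions.

```python
import re
from collections import Counter

# Illustrative partition elements only; the real list is not given in the abstract.
PARTITION_ELEMENTS = {"the", "a", "an", "of", "in", "is", "and", "for", "to", "by", "on", "with"}


def chunk(text):
    """Split text into chunks at partition elements and punctuation."""
    chunks, current = [], []
    for token in re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower()):
        if token in PARTITION_ELEMENTS or not token.isalnum():
            if current:
                chunks.append(" ".join(current))
                current = []
        else:
            current.append(token)
    if current:
        chunks.append(" ".join(current))
    return chunks


def select_phrases(texts, min_count=2):
    """Keep multi-word chunks that occur at least min_count times overall."""
    counts = Counter(c for text in texts for c in chunk(text) if " " in c)
    return {c for c, n in counts.items() if n >= min_count}
```

No grammatical analysis is involved: selection depends only on the chunks themselves and how often they recur.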
Abstract:
A computer data storage management system includes a memory employing a hierarchical data structure comprising a plurality of nodes (root, branch and leaf), in particular a multi-dimensional information database. The branch nodes are index nodes and the leaf nodes are data nodes. The index nodes are arranged in an index tree structure. When extra information inserted into the memory results in index node overflow, the index node is split and, in certain specified circumstances, an index entry will become disposed at an index tree level higher than the hierarchical level to which it corresponds, i.e. is promoted. Whilst this makes the index tree unbalanced, it facilitates the addition of information to and the searching of such a database.
Abstract:
Computerized tools are provided for modeling database designs and specifying queries of the data contained therein. Once it is determined that an information system needs to be created, the Fact Compiler of the present invention is invoked to create it. After creating the information system, the user creates a fact tree as a prelude to generating queries to the system. After creating the fact tree, the user verifies that it is correct using the Tree Interpreter of the present invention. Once the fact tree has been verified, the Query Mapper of the present invention is used to generate information system queries.
Abstract:
A system that implements a scalable data storage service may maintain tables in a non-relational data store on behalf of clients. The system may provide a Web services interface through which service requests are received, and an API usable to request that a table be created, deleted, or described; that an item be stored, retrieved, deleted, or its attributes modified; or that a table be queried (or scanned) with filtered items and/or their attributes returned. An asynchronous workflow may be invoked to create or delete a table. Items stored in tables may be partitioned and indexed using a simple or composite primary key. The system may not impose pre-defined limits on table size, and may employ a flexible schema. The service may provide a best-effort or committed throughput model. The system may automatically scale and/or re-partition tables in response to detecting workload changes, node failures, or other conditions or anomalies.
Abstract:
Methods, systems, and computer program products are provided for generating application-aware data partitioning to support parallel computing. A label for a user defined data partitioning (UDP) key is generated by a labeling process to configure data partitions of original data. The UDP is labeled by the labeling process to include at least one key property excluded from the original data. The data partitions are evenly distributed to co-locate and balance the data partitions and corresponding computations performed by computational servers. A data record of the data partitions is retrieved by performing an all-node parallel search of the computational servers using the UDP key.
Abstract:
A method and system are provided for maintaining customer data in a data store system utilizing a scalable partitioning framework. More specifically, the data store of a customer service system is divided into multiple partitions by a partitionable key of customer data so that each partition owns a subset of the customer data. Because the partitions store mutually exclusive subsets of the customer data, blackout or brownout problems can be confined to one partition and, thus, the availability of the entire system is increased. Moreover, a set of partitionable keys, a minimal unit to be moved between partitions, is grouped and associated with a partition. By eliminating direct dependencies between the partitions and the partitionable keys, the system gains great flexibility in migrating customer data between partitions and adding a new partition.
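One way to picture the indirection between partitionable keys and partitions is a two-level lookup: a key maps to a group (called a bucket here), and the bucket maps to a partition. The sketch below is an assumption-laden illustration; the class `PartitionDirectory`, the fixed bucket count, and the SHA-256 based key hashing are all hypothetical choices, not details from the abstract.

```python
import hashlib


class PartitionDirectory:
    """Maps partitionable keys to buckets, and buckets to partitions."""

    def __init__(self, bucket_count, partitions):
        self.bucket_count = bucket_count
        # Initial round-robin assignment of buckets to partitions (assumed policy).
        self.bucket_to_partition = {
            bucket: partitions[bucket % len(partitions)]
            for bucket in range(bucket_count)
        }

    def bucket_for(self, key):
        """Stable key -> bucket mapping; it never changes when data migrates."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.bucket_count

    def partition_for(self, key):
        return self.bucket_to_partition[self.bucket_for(key)]

    def migrate_bucket(self, bucket, new_partition):
        """Move one bucket (the minimal unit) to another partition."""
        self.bucket_to_partition[bucket] = new_partition
```

Because keys depend only on their bucket, adding a partition or rebalancing touches the bucket-to-partition table and leaves the key-to-bucket mapping unchanged.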
Abstract:
A system that implements a scalable data storage service may maintain tables in a non-relational data store on behalf of service clients. Each table may include multiple items. Each item may include one or more attributes, each containing a name-value pair. The system may provide an API through which clients can query tables maintained by the service. Items may be partitioned and indexed in a table according to a simple or composite primary key contained in all items in the table. A composite primary key may include a hash key attribute and a range key attribute. The range key attribute may be usable to order items having the same hash key attribute value, and to partition them dependent on a range of range key attribute values. A query request may specify a logical or mathematical expression dependent on range key attribute values and may be directed to multiple partitions.
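The role of the composite primary key can be illustrated with a small in-memory sketch: items sharing a hash key are kept ordered by range key, so a query with a range condition becomes a slice of that ordering. The `Table` class and its `put` and `query` methods are hypothetical names for this example and do not represent the service's API; partitioning across nodes is not modeled.

```python
import bisect
from collections import defaultdict


class Table:
    """Toy table indexed by a composite (hash key, range key) primary key."""

    def __init__(self):
        self._range_keys = defaultdict(list)   # hash key -> sorted range keys
        self._items = {}                       # (hash key, range key) -> attributes

    def put(self, hash_key, range_key, attributes):
        """Store or overwrite the item identified by the composite key."""
        if (hash_key, range_key) not in self._items:
            bisect.insort(self._range_keys[hash_key], range_key)
        self._items[(hash_key, range_key)] = dict(attributes)

    def query(self, hash_key, low, high):
        """Return items with this hash key and low <= range key <= high, in order."""
        keys = self._range_keys.get(hash_key, [])
        start = bisect.bisect_left(keys, low)
        stop = bisect.bisect_right(keys, high)
        return [self._items[(hash_key, k)] for k in keys[start:stop]]
```

For example, after storing items with range keys 1, 2, and 3 under hash key `"u1"`, `query("u1", 2, 3)` returns the items for range keys 2 and 3 in that order.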