Abstract:
A business nonstandard character processing apparatus includes a plurality of work processing parts for carrying out processes using nonstandard character data registered in a system nonstandard character file, one or a plurality of nonstandard character files provided in correspondence with work identification information, and a nonstandard character registration processing part that registers, as the system nonstandard character file, the nonstandard character file provided in correspondence with specified work identification information.
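The registration step can be pictured with a minimal sketch. Everything named here (`NonstandardCharacterRegistry`, the work identifiers, the private-use code points, and the glyph names) is hypothetical and chosen only for illustration; the abstract does not specify file formats or interfaces.

```python
class NonstandardCharacterRegistry:
    """Toy model: one nonstandard character file per work ID, plus one system file."""

    def __init__(self, files_by_work):
        # work ID -> {code point: glyph name}; stands in for the per-work files
        self.files_by_work = files_by_work
        # the file that the work processing parts actually read from
        self.system_file = {}

    def register(self, work_id):
        """Install the file for the specified work ID as the system file."""
        self.system_file = dict(self.files_by_work[work_id])

    def lookup(self, code_point):
        """Resolve a code point against the currently registered system file."""
        return self.system_file.get(code_point)


registry = NonstandardCharacterRegistry({
    "payroll": {0xE000: "glyph_payroll_A"},
    "billing": {0xE000: "glyph_billing_A"},
})
registry.register("payroll")
payroll_glyph = registry.lookup(0xE000)
registry.register("billing")
billing_glyph = registry.lookup(0xE000)
```

In this sketch the same code point resolves to a different glyph depending on which work's file is currently registered.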
Abstract:
A technique for enhanced data mining of large-scale relational databases is described. User-defined functions (UDFs) are created by a user and distributed by a managing node of a database to each node of the database. Upon the issuance of a prespecified SQL command, the UDF is executed by each node against the data that node controls. Specifically, targeted tuples in the data controlled by each node are scored based on criteria contained in the UDF. A new data field is added to each targeted tuple, and the score is placed therein. The score is then used to determine, for example, whether a customer represented by the tuple should be included in an advertising campaign, or to tailor a mailing to that customer.
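The flow described above can be sketched with Python's standard `sqlite3` module, using separate in-memory databases as stand-ins for nodes. The table layout, the `score_customer` criteria, and the threshold are assumptions made for illustration, not details from the abstract.

```python
import sqlite3


def score_customer(total_spent, visits):
    """Hypothetical scoring criteria: a weighted sum of spend and visit count."""
    return 0.01 * total_spent + 0.5 * visits


def make_node(rows):
    """Create one in-memory database standing in for a node and its local data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, total_spent REAL, visits INTEGER)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    return conn


def distribute_udf(nodes, name, nargs, func):
    """Managing-node step: register the UDF on every node."""
    for conn in nodes:
        conn.create_function(name, nargs, func)


def score_all(nodes):
    """Each node adds a score field and fills it by running the UDF on its own rows."""
    for conn in nodes:
        conn.execute("ALTER TABLE customers ADD COLUMN score REAL")
        conn.execute("UPDATE customers SET score = score_customer(total_spent, visits)")


def campaign_targets(nodes, threshold):
    """Collect the IDs of customers whose stored score meets the threshold."""
    ids = []
    for conn in nodes:
        rows = conn.execute("SELECT id FROM customers WHERE score >= ?", (threshold,))
        ids.extend(row[0] for row in rows)
    return sorted(ids)


nodes = [
    make_node([(1, 500.0, 4), (2, 20.0, 1)]),
    make_node([(3, 1200.0, 10), (4, 80.0, 2)]),
]
distribute_udf(nodes, "score_customer", 2, score_customer)
score_all(nodes)
targets = campaign_targets(nodes, 5.0)
```

Here `targets` holds the customers that would be selected for the campaign under the assumed threshold.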
Abstract:
A system, method, and article of manufacture for supporting summary tables in a distributed database environment are disclosed. The system generally comprises a central program and a plurality of remote database systems that may be heterogeneous. The central program is configured to communicate with the database systems and to support summary tables (also referred to as materialized views) within the central program or within one or more of the database systems. The summary tables may contain summary data from one or more of the database systems. The central program may initiate the generation of summary tables, which may be populated local to the central program or local to one or more of the database systems. The central program may also maintain or coordinate maintenance of the summary tables. In addition, the central program may be configured to receive user queries on one or more of the database systems and to generate optimized query plans based upon the user queries, taking the summary tables into account.
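As a rough illustration of a central program that prefers a summary table when one exists, consider the sketch below. It uses in-memory `sqlite3` databases as stand-ins for the remote systems; the `sales` schema, the `sales_by_region` summary table, and the two-way plan choice are assumptions, and summary maintenance (refresh after updates) is deliberately left out.

```python
import sqlite3


def make_remote(rows):
    """Stand-in for one remote database system holding raw sales rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    db.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return db


def has_summary(central):
    """Check whether the summary table exists local to the central program."""
    row = central.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'sales_by_region'"
    ).fetchone()
    return row is not None


def build_summary(central, remotes):
    """Populate a summary table in the central database from every remote."""
    central.execute("CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total REAL)")
    totals = {}
    for db in remotes:
        query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
        for region, total in db.execute(query):
            totals[region] = totals.get(region, 0.0) + total
    central.executemany("INSERT INTO sales_by_region VALUES (?, ?)", list(totals.items()))


def region_total(central, remotes, region):
    """Answer from the summary table when available, otherwise query each remote."""
    if has_summary(central):
        row = central.execute(
            "SELECT total FROM sales_by_region WHERE region = ?", (region,)
        ).fetchone()
        return "summary", (row[0] if row else 0.0)
    total = 0.0
    for db in remotes:
        value = db.execute(
            "SELECT SUM(amount) FROM sales WHERE region = ?", (region,)
        ).fetchone()[0]
        total += value or 0.0  # SUM over no rows yields NULL
    return "remote", total


remotes = [
    make_remote([("east", 10.0), ("east", 5.0), ("west", 7.0)]),
    make_remote([("east", 7.0)]),
]
central = sqlite3.connect(":memory:")
before = region_total(central, remotes, "east")
build_summary(central, remotes)
after = region_total(central, remotes, "east")
```

Both calls return the same total; only the plan label differs, which is the point of considering summary tables during planning.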
Abstract:
A reference table, which may not be stored, has columns associated with data attributes and rows containing related words assigned to those attributes in a collection of data. The stored data include at least one macroword thesaurus associated with an attribute and with a prefix length shorter than a word length of said attribute, and reference table row identifier lists respectively associated with thesaurus entries. Each macroword thesaurus associated with an attribute and with a prefix length has a respective entry for each prefix value having this prefix length and matching a corresponding prefix of at least one word assigned to this data attribute in the collection of data.
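A small sketch may help make the macroword idea concrete. It assumes fixed-length date strings as the words of a hypothetical `date` attribute, so that a prefix of length 4 corresponds to a year; the function names and data are illustrative only.

```python
from collections import defaultdict


def build_thesaurus(rows, attribute):
    """Map each distinct word of the attribute to the list of row IDs containing it."""
    thesaurus = defaultdict(list)
    for row_id, row in enumerate(rows):
        thesaurus[row[attribute]].append(row_id)
    return dict(thesaurus)


def build_macroword_thesaurus(thesaurus, prefix_length):
    """One entry per prefix value that matches at least one word; row ID lists are merged."""
    macro = defaultdict(set)
    for word, row_ids in thesaurus.items():
        macro[word[:prefix_length]].update(row_ids)
    return {prefix: sorted(ids) for prefix, ids in macro.items()}


rows = [
    {"date": "19990115"},
    {"date": "20010704"},
    {"date": "19991231"},
    {"date": "20010704"},
]
words = build_thesaurus(rows, "date")
by_year = build_macroword_thesaurus(words, 4)
by_month = build_macroword_thesaurus(words, 6)
```

A query such as "all rows from 1999" can then be answered from the single `by_year` entry rather than by merging the row lists of every matching full-length word.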
Abstract:
A system for generating a value for a first attribute includes a database having one or more dimensions that each include one or more members. The database includes one or more storage locations that are each associated with one member from each dimension in a set of one or more of the dimensions. A server evaluates an expression including at least one second attribute that depends on a set of one or more of the dimensions, the expression mapping at least one member of a first dimension on which the first attribute depends to at least one member of a second dimension on which the second attribute depends. The value for the first attribute is generated according to the expression. The server and database may operate in an on-line analytical processing (OLAP) environment.
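One possible reading of this mapping, sketched under stated assumptions: a first attribute (`usd_cost`) depends on product and location dimensions, a second attribute (`exchange_rate`) depends on a currency dimension, and the expression maps each location member to a currency member. All names and numbers below are hypothetical.

```python
# Second attribute: depends only on the currency dimension (made-up rates).
exchange_rate = {"EUR": 1.25, "JPY": 0.0078125}

# Stored values: one storage location per (product, location) member pair.
local_cost = {("widget", "paris"): 100.0, ("widget", "tokyo"): 8000.0}

# The mapping used by the expression: location member -> currency member.
location_to_currency = {"paris": "EUR", "tokyo": "JPY"}


def usd_cost(product, location):
    """First attribute: evaluate the expression for one (product, location) cell."""
    currency = location_to_currency[location]
    return local_cost[(product, location)] * exchange_rate[currency]


paris_cost = usd_cost("widget", "paris")
tokyo_cost = usd_cost("widget", "tokyo")
```

The mapping lets the two attributes be combined even though they are defined over different dimensions.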
Abstract:
A system for tracking the lineage of data in a database is disclosed. Data within the tables are tracked by attaching lineage information to the data, preferably by adding a lineage identifier to each row in a table. Data that share a common lineage can be identified by virtue of sharing a common lineage identifier. The lineage identifier can then be used to trace the source of the data, i.e., data having a common identifier share a common history. Preferably, the lineage data type is an identifier that is universally unique and is optimized to have little impact on the performance of the database, for example, by making the identifier large enough to ensure its uniqueness while minimizing storage size. More preferably, the lineage data type is a sixteen-byte number.
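A minimal sketch of row-level lineage tagging follows, using Python's `uuid` module to obtain a sixteen-byte identifier. The `orders` and `lineage_log` tables, the source names, and the use of a random UUID are assumptions for illustration; the abstract does not say how the identifier is generated or where source details are kept.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, lineage BLOB)")
# Hypothetical side table recording where each lineage identifier came from.
conn.execute("CREATE TABLE lineage_log (lineage BLOB PRIMARY KEY, source TEXT)")


def load_batch(conn, source, items):
    """Insert a batch of rows that all carry the same 16-byte lineage identifier."""
    lineage = uuid.uuid4().bytes
    conn.execute("INSERT INTO lineage_log VALUES (?, ?)", (lineage, source))
    conn.executemany(
        "INSERT INTO orders (item, lineage) VALUES (?, ?)",
        [(item, lineage) for item in items],
    )
    return lineage


def source_of(conn, order_id):
    """Trace a row back to its source through its lineage identifier."""
    row = conn.execute(
        "SELECT l.source FROM orders o JOIN lineage_log l ON o.lineage = l.lineage "
        "WHERE o.id = ?",
        (order_id,),
    ).fetchone()
    return row[0] if row else None


def rows_sharing_lineage(conn, lineage):
    """Return the IDs of all rows that share a common lineage identifier."""
    rows = conn.execute("SELECT id FROM orders WHERE lineage = ? ORDER BY id", (lineage,))
    return [row[0] for row in rows]


batch_a = load_batch(conn, "feed_a.csv", ["pen", "ink"])
batch_b = load_batch(conn, "feed_b.csv", ["paper"])
```

Rows loaded together share one identifier, so a single equality filter on the lineage column recovers everything with a common history.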