Abstract:
A system and method is described for database split generation in a massively parallel or distributed database environment including a plurality of databases and a data warehouse layer providing data summarization and querying functionality. A database table accessor of the system obtains, from an associated client application, a query for data in a table of the data warehouse layer, wherein the query includes a user preference. The system obtains table data representative of properties of the table, and determines a splits generator in accordance with one or more of the user preference or the properties of the table. The system generates, by the selected splits generator, table splits dividing the user query into a plurality of query splits, and outputs the plurality of query splits to an associated plurality of mappers for execution by the associated plurality of mappers of each of the plurality of query splits against the table.
Abstract:
A system and method is described for database split generation in a massively parallel or distributed database environment including a plurality of databases and a data warehouse layer providing data summarization and querying functionality. A database table accessor of the system obtains, from an associated client application, a query for data in a table of the data warehouse layer, wherein the query includes a user preference. The system obtains table data representative of properties of the table, and determines a splits generator in accordance with one or more of the user preference or the properties of the table. The system generates, by the selected splits generator, table splits dividing the user query into a plurality of query splits, and outputs the plurality of query splits to an associated plurality of mappers for execution by the associated plurality of mappers of each of the plurality of query splits against the table.
Abstract:
A system and method for marshaling database data from a native interface layer, to a Java layer, using a linear array. In accordance with an embodiment, a request is received from a software application to query or access data stored at the database. At a database driver native interface layer, the system obtains cell data from the database, determines cell coordinates and a cell metadata, and linearizes the cell data if required. The linearized data is then flushed to a linear byte array in the database driver presentation layer, and the cell coordinates and cell metadata are provided for use by a compact data handler and the application in accessing the data.
Abstract:
In accordance with an embodiment, the system enables access to a sharded database using a cache and a shard topology. A shard-aware client application connecting to a sharded database can use a connection pool (e.g., a Universal Connection Pool, UCP), to store or access connections to different shards or chunks of the sharded database within a shared pool. As new connections are created, a shard topology layer can be built at the database driver layer, which learns and caches shard key ranges to locations of shards. The shard topology layer enables subsequent connection requests from a client application to use a fast key path access to the appropriate shard or chunk.
Abstract:
A system and method for efficient storage and retrieval of fragmented data using a pseudo linear dynamic byte array is provided. In accordance with an embodiment, the system comprises a database driver which provides access by a software application to a database. The database driver uses a dynamic byte array to enable access by the application to data in the database, including determining a size of a required data to be stored in memory, and successively allocating and copying the required data into the dynamic byte array as a succession of blocks. The data stored within the succession of blocks can then be accessed and provided to the application.