Abstract:
An improved system and method for writing data that depends upon multiple reads in a distributed database is provided. A client may read several data records and then send a request to a database server to perform a transaction that writes a data record dependent upon the multiple data records read. The database server may receive the request and perform the transaction by latching the master data record to be written and validating the data records upon which the write depends. These data records may be validated by verifying that they are still the current versions of the data records stored in the distributed database. Data-intensive applications may use this transaction type in large-scale distributed database systems to provide stronger consistency without significantly degrading performance and scalability.
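The transaction above can be illustrated with a minimal single-process sketch. It assumes each record carries a version number, that "latching" is modeled with a per-record `threading.Lock`, and that the client sends the versions it read as a read set. `MasterStore`, `write_if_current`, and `read_set` are hypothetical names, not an API from the source. For brevity, the dependencies are validated without being latched themselves, so this sketch does not close every race that a real server would need to handle.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Record:
    value: object
    version: int
    # Per-record latch; only the master record being written is latched.
    latch: threading.Lock = field(default_factory=threading.Lock, repr=False)


class MasterStore:
    """Hypothetical in-memory stand-in for the master copies of records."""

    def __init__(self) -> None:
        self.records: Dict[str, Record] = {}

    def read(self, key: str):
        rec = self.records[key]
        return rec.value, rec.version

    def put(self, key: str, value: object) -> None:
        """Unconditional write; creates the record or bumps its version."""
        rec = self.records.get(key)
        if rec is None:
            self.records[key] = Record(value, 1)
            return
        with rec.latch:
            rec.value = value
            rec.version += 1

    def write_if_current(self, key: str, value: object, read_set: Dict[str, int]) -> bool:
        """Write `key` only if every record in `read_set` (key -> version the
        client read) is still the current version; otherwise abort."""
        target = self.records[key]
        with target.latch:  # latch the master record to be written
            for dep_key, seen_version in read_set.items():
                if self.records[dep_key].version != seen_version:
                    return False  # a dependency changed since it was read
            target.value = value
            target.version += 1
            return True
```

A write of `c` that depends on `a` and `b` succeeds only while both are unchanged. Once another client rewrites `a`, the same request is rejected and the client must re-read and retry.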
Abstract:
An improved system and method for parallel retrieval of data from a distributed database is provided. A parallel interface may be provided for use by a cluster of client machines for parallel retrieval of partial results from parallel execution of a database query by a cluster of database servers storing a distributed database. A query interface may be augmented for inputting a database query and specifying the number of instances of parallel retrieval of results from query execution. To do so, a commercial query language may be augmented for sending a query request that may include a parameter specifying the database query and an additional parameter specifying the desired retrieval parallelism. The augmented query interface may return a list of retrieval point addresses for retrieving the partial results assigned to each of the retrieval point addresses from parallel execution of the database query.
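The following sketch simulates this interface in a single process. Each "server" evaluates a predicate over its own partition, the partial results are assigned round-robin to the requested number of retrieval points, and client threads then fetch one retrieval point each. The `rp://` address scheme, the round-robin assignment, and the function names are all hypothetical. The abstract does not specify how partial results are mapped to retrieval points.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]


def parallel_query(
    partitions: List[List[Row]],
    predicate: Callable[[Row], bool],
    parallelism: int,
) -> Tuple[List[str], Dict[str, List[Row]]]:
    """Hypothetical augmented query call: returns `parallelism` retrieval-point
    addresses plus a map standing in for the partial results behind each one."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    addresses = [f"rp://retrieval-point/{i}" for i in range(parallelism)]
    results: Dict[str, List[Row]] = {addr: [] for addr in addresses}
    for server_id, rows in enumerate(partitions):
        # Each server executes the query over its own partition...
        partial = [row for row in rows if predicate(row)]
        # ...and its partial result is assigned to one retrieval point.
        results[addresses[server_id % parallelism]].extend(partial)
    return addresses, results


def fetch_all(addresses: List[str], results: Dict[str, List[Row]]) -> List[List[Row]]:
    """Each client (here, a thread) retrieves one retrieval point's results."""
    with ThreadPoolExecutor(max_workers=len(addresses)) as pool:
        return list(pool.map(lambda addr: results[addr], addresses))
```

With three partitions and a requested parallelism of 2, the partial results of servers 0 and 2 land on the first retrieval point and those of server 1 on the second.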
Abstract:
Methods and apparatuses are provided for dynamically reorganizing the data within a replicated database system. One method, for example, includes performing a split operation across a plurality of replicated databases that divides an existing partition into two new partitions, wherein the existing partition comprises a plurality of data records and each of the two new partitions includes at least a portion of those data records, and allowing at least one type of access to the data records during the split operation.
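A minimal sketch of such a split follows. It assumes range partitions keyed by strings, picks the median key of one replica as the split point so that every replica splits identically, and uses reads as the access type that remains allowed: new partitions are staged first while reads continue to hit the old partition, and then swapped in. `Replica`, `split_partition`, and the two-phase staging are illustrative assumptions. The sketch does not model replication messaging or concurrent writes.

```python
from bisect import bisect_right
from typing import Callable, Dict, List, Optional, Tuple

Partition = Tuple[str, Dict[str, int]]  # (lowest key in range, records)


class Replica:
    """Hypothetical replica holding range partitions sorted by low key."""

    def __init__(self, records: Dict[str, int]) -> None:
        self.partitions: List[Partition] = [("", dict(records))]

    def get(self, key: str) -> Optional[int]:
        lows = [low for low, _ in self.partitions]
        idx = bisect_right(lows, key) - 1
        return self.partitions[idx][1].get(key)


def split_partition(
    replicas: List[Replica],
    index: int,
    on_progress: Optional[Callable[[], None]] = None,
) -> str:
    """Split partition `index` into two new partitions on every replica."""
    # Choose one split key (the median) so all replicas split identically.
    keys = sorted(replicas[0].partitions[index][1])
    if len(keys) < 2:
        raise ValueError("need at least two records to split")
    split_key = keys[len(keys) // 2]

    # Phase 1: build both new partitions; the old one stays readable.
    staged = []
    for rep in replicas:
        low, recs = rep.partitions[index]
        left = {k: v for k, v in recs.items() if k < split_key}
        right = {k: v for k, v in recs.items() if k >= split_key}
        staged.append((rep, [(low, left), (split_key, right)]))
        if on_progress:
            on_progress()  # reads issued here are served by the old partition

    # Phase 2: replace the old partition with the two new ones on each replica.
    for rep, new_parts in staged:
        rep.partitions[index:index + 1] = new_parts
    return split_key
```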
Abstract:
In a large-scale transaction such as the bulk loading of new records into an ordered, distributed database, a transaction limit such as an insert limit may be chosen, and partitions on overfull storage servers may be designated to be moved to underfull storage servers. The move assignments may be based, at least in part, on the degree to which a storage server is underfull and on the move and insertion costs of the partitions to be moved.
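One way to make these move assignments concrete is a greedy heuristic, sketched below under several assumptions. A server counts as overfull when its planned inserts exceed the insert limit. The move cost of a partition is taken to be the number of existing records that must be copied, and partitions are moved in order of lowest move cost per insert offloaded, always to the currently most underfull server. The cost model, the greedy order, and `plan_moves` are hypothetical. The abstract does not state the actual optimization procedure.

```python
from typing import Dict, List, Tuple

# (partition id, existing records to copy if moved, records to insert)
PartitionInfo = Tuple[str, int, int]


def plan_moves(
    servers: Dict[str, List[PartitionInfo]], insert_limit: int
) -> Tuple[List[Tuple[str, str, str]], Dict[str, int]]:
    """Greedy sketch: move partitions off servers whose planned inserts exceed
    `insert_limit` onto the most underfull server that can absorb them."""
    load = {s: sum(ins for _, _, ins in parts) for s, parts in servers.items()}
    moves = []
    for src in sorted(servers, key=lambda s: load[s], reverse=True):
        # Cheapest first: fewest existing records copied per insert offloaded.
        candidates = sorted(
            (p for p in servers[src] if p[2] > 0), key=lambda p: p[1] / p[2]
        )
        for pid, _existing, ins in candidates:
            if load[src] <= insert_limit:
                break  # source is no longer overfull
            # Most underfull server = largest remaining insert budget.
            dst = max(load, key=lambda s: insert_limit - load[s])
            if dst == src or load[dst] + ins > insert_limit:
                continue  # no server can take this partition without overflowing
            moves.append((pid, src, dst))
            load[src] -= ins
            load[dst] += ins
    return moves, load
```

In this sketch, a server with 120 planned inserts against a limit of 60 sheds its cheapest partitions to the emptiest servers until it drops to or below the limit.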
Abstract:
Techniques that support trail-based exploration by a user of a repository of documents are described herein. In one embodiment, trail definition data that specifies a trail is received. The trail includes an ordered series of waypoints including a trailhead, intermediate waypoints, and one or more trailends. In some embodiments, deadends may also be defined in the trail. A particular waypoint in the ordered series of waypoints is established as the current waypoint. Search terms can be received from a user to cause a search to be performed. It is then determined whether the search satisfies matching criteria associated with the waypoint that immediately follows the current waypoint in the ordered series of waypoints. If so, the user advances to the next waypoint. Otherwise, the user remains at the current waypoint. Finally, if a trailend is reached, then an action such as rewarding the user in some way may be performed.
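The waypoint logic reads naturally as a small state machine. The sketch below assumes a single linear trail, models each waypoint's matching criteria as a predicate over the search terms, and records a reward on reaching a trailend. `Waypoint`, `keyword_match`, and `TrailSession` are hypothetical names. Branching trails with several trailends and deadends are left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Waypoint:
    name: str
    matches: Callable[[List[str]], bool]  # matching criteria for a search
    is_trailend: bool = False


def keyword_match(*required: str) -> Callable[[List[str]], bool]:
    """Hypothetical criterion: every required word appears in the search terms."""
    def check(terms: List[str]) -> bool:
        lowered = {t.lower() for t in terms}
        return all(word in lowered for word in required)
    return check


class TrailSession:
    def __init__(self, waypoints: List[Waypoint]) -> None:
        self.waypoints = waypoints
        self.current = 0  # start at the trailhead
        self.rewards: List[str] = []

    def search(self, terms: List[str]) -> str:
        """Run a search; advance only if it matches the next waypoint."""
        nxt = self.current + 1
        if nxt < len(self.waypoints) and self.waypoints[nxt].matches(terms):
            self.current = nxt
            if self.waypoints[nxt].is_trailend:
                self.rewards.append(self.waypoints[nxt].name)  # e.g. reward the user
        return self.waypoints[self.current].name
```

A search that does not satisfy the next waypoint's criteria leaves the user where they are. Only a matching search moves them forward one step.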
Abstract:
Computer-implemented methods, modules, and clients relate to an expanded, pruned sample table for testing database queries against a base table. The expanded, pruned sample table is formed from the base table by a process of initial sampling, synthesis, and pruning.
Abstract:
A technique is described that reduces the complexity and resource consumption associated with performing record expiry in a distributed database system. In accordance with the technique, a record is checked to see if it has expired only when it has been accessed for a read or a write. If at the time of a read a record is determined to have expired, then it is not served. If at the time of a write a record is determined to have expired, then the write is treated as an insertion of a new record, and steps are taken to treat the insertion consistently with regard to the previous expired version. A background process is used to delete records that have not been written to or actively deleted by a client after expiration.
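The lazy expiry check can be sketched as a key-value store that tests expiration only inside `read` and `write`, with a separate `sweep` standing in for the background deletion process. To keep an insertion over an expired record distinguishable from the old version, the sketch bumps a generation counter. That counter, the injectable clock, and the class name are assumptions for illustration, not details from the source.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Entry = Tuple[object, float, int]  # (value, expires_at, generation)


class ExpiringStore:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.data: Dict[str, Entry] = {}

    def _expired(self, entry: Entry) -> bool:
        return entry[1] <= self.clock()

    def read(self, key: str) -> Optional[object]:
        entry = self.data.get(key)
        if entry is None or self._expired(entry):
            return None  # expired records are never served
        return entry[0]

    def write(self, key: str, value: object, ttl: float) -> bool:
        """Returns True if the write was treated as an insertion."""
        entry = self.data.get(key)
        if entry is None:
            inserted, generation = True, 0
        elif self._expired(entry):
            # Insert over an expired version: bump a (hypothetical) generation
            # counter so the new record is distinguishable from the old one.
            inserted, generation = True, entry[2] + 1
        else:
            inserted, generation = False, entry[2]
        self.data[key] = (value, self.clock() + ttl, generation)
        return inserted

    def sweep(self) -> int:
        """Background pass: delete records that expired without being rewritten."""
        dead = [k for k, e in self.data.items() if self._expired(e)]
        for k in dead:
            del self.data[k]
        return len(dead)
```

No timer fires per record. Expiry costs nothing until a record is touched, and the sweep reclaims space for records nobody touches again.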
Abstract:
A novel method is employed for collecting optimizer statistics for optimizing database queries by gathering feedback from the query execution engine about the observed cardinality of predicates and by constructing and maintaining multidimensional histograms. This exploits the correlation between data columns without requiring an inefficient data scan. The maximum entropy principle is used to approximate the true data distribution by a histogram distribution that is as “simple” as possible while remaining consistent with the observed predicate cardinalities. The method readily adapts to changes in the underlying data, automatically detecting and eliminating inconsistent feedback information in an efficient manner. The size of the histogram is controlled by retaining only the most “important” feedback.
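The abstract does not name a solver for the maximum-entropy step. One standard choice is iterative scaling (iterative proportional fitting), sketched below. Each feedback record is treated as a constraint "the buckets covered by this predicate hold `observed` rows", and the buckets in each region are repeatedly rescaled to satisfy it. Starting from a uniform histogram, the multiplicative updates keep the solution in exponential form, so when the constraints are consistent the limit is the maximum-entropy histogram. The bucket layout and function name are assumptions. Detecting inconsistent feedback and pruning unimportant feedback are not shown.

```python
from typing import Dict, FrozenSet, Hashable, List, Tuple

Constraint = Tuple[FrozenSet[Hashable], float]  # (buckets a predicate covers, observed rows)


def max_entropy_histogram(
    buckets: List[Hashable], constraints: List[Constraint], sweeps: int = 200
) -> Dict[Hashable, float]:
    """Iterative scaling: repeatedly rescale the buckets covered by each
    feedback constraint so that their total matches the observed cardinality."""
    freq = {b: 1.0 for b in buckets}  # uniform starting point
    for _ in range(sweeps):
        for region, observed in constraints:
            current = sum(freq[b] for b in region)
            if current > 0:
                factor = observed / current
                for b in region:
                    freq[b] *= factor
    return freq
```

For a 2x2 grid with a known total of 100 rows, 30 rows in the first row band, and 40 rows in the first column band, the result approaches the independence solution (12, 18, 28, 42). With no feedback about the correlation between the two columns, that is the "simplest" distribution consistent with what was observed.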
Abstract:
A system and method are provided for deriving user intent from a query. The system includes a query engine and an advertisement engine. The query engine receives a query from the user and analyzes it to determine a query intent that is matched to a domain. The query may be further analyzed to derive predicate values based on the query and the domain hierarchy. The domain and associated information may then be matched to a list of advertisements. Each advertisement may be assigned an ad match score based on a correlation between the query information and the various listing information provided in the advertisement.
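A simple ad match score can be sketched once the query intent has been derived. The sketch assumes the intent is a slash-separated domain path plus predicate values. An ad is eligible if its domain equals the intent's domain or is an ancestor of it in the hierarchy, and its score is the fraction of query predicates that its listing matches exactly. `QueryIntent`, `Ad`, and this scoring rule are hypothetical. The query analysis itself and the actual correlation measure are not specified in the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QueryIntent:
    domain: str  # e.g. "autos/used"
    predicates: Dict[str, str] = field(default_factory=dict)


@dataclass
class Ad:
    ad_id: str
    domain: str
    listing: Dict[str, str] = field(default_factory=dict)


def ad_match_score(intent: QueryIntent, ad: Ad) -> float:
    """Fraction of query predicates matched by the ad's listing, provided the
    ad's domain is the intent's domain or one of its ancestors."""
    in_domain = ad.domain == intent.domain or intent.domain.startswith(ad.domain + "/")
    if not in_domain:
        return 0.0
    if not intent.predicates:
        return 1.0
    hits = sum(1 for k, v in intent.predicates.items() if ad.listing.get(k) == v)
    return hits / len(intent.predicates)


def rank_ads(intent: QueryIntent, ads: List[Ad]) -> List[str]:
    """Ids of ads with a positive score, best first (stable for ties)."""
    scored = [(ad_match_score(intent, ad), ad.ad_id) for ad in ads]
    return [ad_id for score, ad_id in sorted(scored, key=lambda p: -p[0]) if score > 0]
```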