Abstract:
An apparatus and a method for constructing a multidimensional index tree that minimizes the time to access data objects and is resilient to skew in the data. This is achieved through successive partitioning of all given data objects, one level at a time, starting with a single partition and proceeding top-down until each final partition fits within a leaf node. The data objects are subdivided via a global optimization approach that minimizes the area, overlap, and perimeter of the minimum bounding rectangles covered by each node. The invention divides the index construction problem into two subproblems: the first addresses the tightness of the packing (in terms of area, overlap, and perimeter) using a small fan-out at each index node, and the second handles the fan-out issue to improve index page utilization. These two stages are referred to as binarization and compression. The binarization stage constructs a binary tree such that the entries in the leaf nodes correspond to the spatial data objects. The compression stage converts the binary tree into a tree in which all nodes other than the leaf nodes and the parents of the leaf nodes have a branch factor of M. In the binarization stage, a weighting or skew factor provides flexibility in determining the number of data objects included in each partition, so as to obtain a tree structure with desirable query performance. Thus the constructed index tree is not required to be height-balanced. This provides a means to trade off imbalance in the index tree in order to reduce the number of pages that must be accessed by a query.
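Purely as an illustration of the binarization idea (the compression stage is omitted), the sketch below recursively splits a set of minimum bounding rectangles into two partitions, scoring each candidate cut by combined area, overlap, and perimeter, and bounding the allowed imbalance with a skew factor. All names (Rect, binarize, leaf_capacity, skew) and the exact cost formula are illustrative assumptions, not the patented procedure.

```python
# Illustrative sketch of a binarization-style top-down split; not the patented method.
from dataclasses import dataclass
from typing import List

@dataclass
class Rect:
    x1: float; y1: float; x2: float; y2: float

def mbr(rects: List[Rect]) -> Rect:
    """Minimum bounding rectangle of a group of rectangles."""
    return Rect(min(r.x1 for r in rects), min(r.y1 for r in rects),
                max(r.x2 for r in rects), max(r.y2 for r in rects))

def area(r: Rect) -> float:
    return (r.x2 - r.x1) * (r.y2 - r.y1)

def perimeter(r: Rect) -> float:
    return 2 * ((r.x2 - r.x1) + (r.y2 - r.y1))

def overlap(a: Rect, b: Rect) -> float:
    w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    return w * h

def split_cost(left: List[Rect], right: List[Rect]) -> float:
    """Cost of a candidate split: tighter packing means lower cost."""
    a, b = mbr(left), mbr(right)
    return area(a) + area(b) + overlap(a, b) + perimeter(a) + perimeter(b)

def binarize(rects: List[Rect], leaf_capacity: int = 4, skew: float = 0.3):
    """Top-down construction of a (possibly unbalanced) binary tree.

    `skew` bounds how unevenly the objects may be divided: each side must
    receive at least skew * len(rects) objects, so the tree may trade
    height balance for tighter bounding rectangles.
    """
    if len(rects) <= leaf_capacity:
        return {"leaf": rects}
    lo = max(1, int(skew * len(rects)))
    hi = len(rects) - lo
    best = None
    # Consider sort orders along both axes, then every admissible cut point.
    for key in (lambda r: r.x1, lambda r: r.y1):
        ordered = sorted(rects, key=key)
        for cut in range(lo, hi + 1):
            cost = split_cost(ordered[:cut], ordered[cut:])
            if best is None or cost < best[0]:
                best = (cost, ordered[:cut], ordered[cut:])
    _, left, right = best
    return {"left": binarize(left, leaf_capacity, skew),
            "right": binarize(right, leaf_capacity, skew)}
```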
Abstract:
A method for dynamically placing objects in slots on a web page in response to a current client request for the web page comprises the steps of classifying users into user groups based on one or more user characteristics, accumulating self-learning data based on user click behavior for each user group, matching the current client request with a corresponding user group, and scheduling real-time selection of the slots for the objects on the web page based on the self-learning data of the corresponding user group.
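A minimal sketch of how such slot scheduling might look, assuming per-group click and impression counters as the self-learning data; the grouping rule and all names (classify_user, SlotScheduler, choose_objects) are illustrative, not the claimed method.

```python
# Illustrative sketch of group-based slot selection; names and rules are assumptions.
from collections import defaultdict

def classify_user(request: dict) -> str:
    """Map a client request to a user group from coarse characteristics (hypothetical rule)."""
    return f"{request.get('country', 'unknown')}/{request.get('device', 'unknown')}"

class SlotScheduler:
    def __init__(self):
        # Self-learning data: clicks and impressions per (group, object).
        self.clicks = defaultdict(int)
        self.views = defaultdict(int)

    def record_impression(self, group: str, obj: str) -> None:
        self.views[(group, obj)] += 1

    def record_click(self, group: str, obj: str) -> None:
        self.clicks[(group, obj)] += 1

    def ctr(self, group: str, obj: str) -> float:
        """Smoothed click-through estimate for one object in one group."""
        return (self.clicks[(group, obj)] + 1) / (self.views[(group, obj)] + 2)

    def choose_objects(self, request: dict, candidates: list, n_slots: int) -> list:
        """Fill the page's slots with the objects this group clicks most."""
        group = classify_user(request)
        ranked = sorted(candidates, key=lambda o: self.ctr(group, o), reverse=True)
        return ranked[:n_slots]

# Example: a request from the "us/mobile" group gets that group's best-performing object.
sched = SlotScheduler()
sched.record_impression("us/mobile", "bannerA"); sched.record_click("us/mobile", "bannerA")
sched.record_impression("us/mobile", "bannerB")
print(sched.choose_objects({"country": "us", "device": "mobile"},
                           ["bannerA", "bannerB"], n_slots=1))
```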
Abstract:
A system and method for caching objects of non-uniform size. A caching logic includes a selection logic and an admission control logic. The admission control logic determines whether an object that is accessed but not currently in the cache may be cached at all. The admission control logic uses an auxiliary LRU stack which contains the identities and time stamps of the objects that have been recently accessed; thus, the memory required is relatively small. The auxiliary stack serves as a dynamic popularity list, and an object may be admitted to the cache if and only if it appears on the popularity list. The selection logic selects one or more of the objects in the cache that are to be purged when a new object enters the cache. The order of removal of the objects is prioritized based on both the size and the frequency of access of the object, and may be adjusted by a time-to-obsolescence (TTO) factor. To reduce the time required to compare the space-time product of each object in the cache, the objects may be classified into size ranges having geometrically increasing intervals. Specifically, multiple LRU stacks are maintained independently, wherein each LRU stack contains only objects in a predetermined range of sizes. In order to choose candidates for replacement, only the least recently used objects in each group need be considered.
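The sketch below illustrates, under assumptions, the two mechanisms described: an auxiliary LRU popularity list gating admission, and one LRU stack per geometric size class whose least-recently-used members are the only eviction candidates compared. The space-time product here uses elapsed time since last access as a crude stand-in for access frequency, and the TTO adjustment is omitted; class and method names are illustrative.

```python
# Illustrative sketch of popularity-gated admission plus size-class LRU eviction.
import math
from collections import OrderedDict

class SizeAwareCache:
    def __init__(self, capacity: int, popularity_len: int = 64):
        self.capacity = capacity
        self.used = 0
        self.clock = 0
        # Auxiliary LRU stack of recently seen object ids (admission control).
        self.popularity = OrderedDict()
        self.popularity_len = popularity_len
        # One LRU stack per geometric size class: class k holds sizes in [2^k, 2^(k+1)).
        self.classes = {}            # class index -> OrderedDict(obj_id -> (size, last_access))

    def _class_of(self, size: int) -> int:
        return int(math.log2(max(size, 1)))

    def _note_seen(self, obj_id: str) -> bool:
        """Update the popularity list; return True if the object was already on it."""
        seen = obj_id in self.popularity
        self.popularity[obj_id] = self.clock
        self.popularity.move_to_end(obj_id)
        if len(self.popularity) > self.popularity_len:
            self.popularity.popitem(last=False)
        return seen

    def _evict_until(self, need: int) -> None:
        """Evict the LRU object of whichever class has the largest space-time product."""
        while self.used + need > self.capacity:
            best = None
            for k, stack in self.classes.items():
                if not stack:
                    continue
                obj_id, (size, last) = next(iter(stack.items()))  # LRU end of this class
                product = size * (self.clock - last)
                if best is None or product > best[0]:
                    best = (product, k, obj_id, size)
            _, k, obj_id, size = best
            del self.classes[k][obj_id]
            self.used -= size

    def access(self, obj_id: str, size: int) -> None:
        self.clock += 1
        k = self._class_of(size)
        stack = self.classes.setdefault(k, OrderedDict())
        if obj_id in stack:                      # hit: refresh recency
            stack[obj_id] = (size, self.clock)
            stack.move_to_end(obj_id)
            self._note_seen(obj_id)
            return
        # Miss: admit only if the object already appears on the popularity list.
        if self._note_seen(obj_id) and size <= self.capacity:
            self._evict_until(size)
            stack[obj_id] = (size, self.clock)
            self.used += size
```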
Abstract:
A computer method of removing simple and strict redundant association rules generated from large collections of data. A compact set of rules, devoid of many redundancies in the discovered data patterns, is presented to an end user. The method is directed primarily to on-line applications such as the Internet and intranets. Given a number of large itemsets as input, simple redundancies are removed by generating, for each large itemset, all of its maximal ancestors (the frontier set). The maximal ancestors share a hierarchical relationship with the large itemset from which they were derived and further satisfy an inequality whereby the ratio of the respective support values is less than the reciprocal of a user-defined confidence value. The resulting compact rule set is displayed to an end user at some specified level of support and confidence. The method is also able to generate the full set of rules from the compact set.
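As an illustration of the maximal-ancestor (frontier) idea, the sketch below keeps, for each large itemset, only the subset-maximal ancestors whose support ratio passes the confidence test and emits one rule per surviving ancestor. The exact definitions and the names (maximal_ancestors, compact_rules) are assumptions for illustration, not the patented procedure.

```python
# Illustrative sketch of frontier-set pruning of association rules.
from itertools import combinations

def maximal_ancestors(itemset: frozenset, support: dict, min_conf: float):
    """Ancestors (proper subsets) of `itemset` that pass the confidence test,
    i.e. support(ancestor) <= support(itemset) / min_conf, kept only if they
    are maximal under set inclusion."""
    target = support[itemset]
    passing = [frozenset(a) for n in range(1, len(itemset))
               for a in combinations(itemset, n)
               if frozenset(a) in support
               and support[frozenset(a)] <= target / min_conf]
    # Keep only ancestors not strictly contained in another passing ancestor.
    return [a for a in passing if not any(a < b for b in passing)]

def compact_rules(support: dict, min_conf: float):
    """One rule per (large itemset, maximal ancestor): ancestor -> rest."""
    rules = []
    for itemset in support:
        if len(itemset) < 2:
            continue
        for a in maximal_ancestors(itemset, support, min_conf):
            conf = support[itemset] / support[a]
            rules.append((set(a), set(itemset - a), conf))
    return rules

# Example with hypothetical supports (fractions of transactions).
support = {frozenset("A"): 0.6, frozenset("B"): 0.7, frozenset("C"): 0.5,
           frozenset("AB"): 0.5, frozenset("ABC"): 0.45}
for lhs, rhs, conf in compact_rules(support, min_conf=0.8):
    print(lhs, "->", rhs, f"conf={conf:.2f}")
```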
Abstract:
A method and system of collaboratively caching information to allow improved caching decisions by a lower-level or sibling node. In a caching hierarchy, the client and/or servers may factor in the caching status at the higher levels when deciding whether to cache an object and which objects are to be replaced. The PICS protocol may be used to pass the caching information of some or all of the upper hierarchy down the hierarchy. Furthermore, the caching status information can also be used to direct an object request to the closest higher-level proxy which has potentially cached the object, instead of blindly requesting it from the next immediate higher-level proxy. A selection policy used to select objects for replacement in the cache may be prioritized not only on the size and the frequency of access of the object, but also on the access time required to get the object if it is not cached. The selection policy may also include a selection weight factor, wherein each object is assigned a selection weight based on its replacement cost, the object size, and how frequently it is modified. Non-uniform-size objects may be classified into ranges of selection weights having geometrically increasing intervals. Multiple LRU stacks may be maintained independently, wherein each stack contains objects in a certain range of selection weights. In order to choose candidates for replacement, only the least recently used objects in each group need be considered.
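Two of the ideas above, sketched under assumptions: a selection weight combining replacement cost, object size, and update frequency, and request routing that prefers the nearest upstream proxy whose advertised caching status claims the object. The weight formula and data layout are illustrative, no actual PICS label parsing is shown, and the per-weight-range LRU stacks mirror the size-class stacks in the earlier sketch, so they are omitted here.

```python
# Illustrative sketch of a selection weight and status-aware upstream routing.
def selection_weight(replacement_cost: float, size: float, update_rate: float) -> float:
    """Higher weight = more worth keeping: costly to refetch, small, rarely modified."""
    return replacement_cost / (size * (1.0 + update_rate))

def pick_upstream(obj_url: str, proxies: list) -> dict:
    """Choose the closest upstream proxy that reports the object as cached.

    `proxies` is ordered nearest-first; each entry is a dict like
    {"host": ..., "cached": set_of_urls} built from status hints passed
    down the hierarchy. Falls back to the immediate parent if none claims it.
    """
    for proxy in proxies:
        if obj_url in proxy["cached"]:
            return proxy
    return proxies[0]

# Example: the grandparent proxy claims the object, so the immediate parent is skipped.
proxies = [{"host": "parent.example", "cached": set()},
           {"host": "grandparent.example", "cached": {"/img/logo.png"}}]
print(selection_weight(replacement_cost=120.0, size=30.0, update_rate=0.1))  # ~3.64
print(pick_upstream("/img/logo.png", proxies)["host"])                       # grandparent.example
```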
Abstract:
A system and method for providing similarity indexing and searching in multidimensional databases. In one aspect, given a set of data points in a multidimensional space, the values of the data points on each dimension are partitioned into a plurality of grids, wherein each grid is assigned a grid value. Given a target data point, similarity candidates (i.e., data points that are similar to the target data point) are identified based on matching grid values. An inverted grid index, comprising an index on the data points falling into each grid of each dimension, is utilized to identify similarity candidates. A similarity selection process is employed to select the closest identified similarity candidates for output, using a similarity function to measure the closeness of each identified similarity candidate to the target data point. A preferred similarity function is one that considers the subset of dimensions in which a candidate point falls within a grid similar to that of the target point. In addition, a correlation effect among the grids in different dimensions may be captured as a factor in the similarity function.
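A minimal sketch of the inverted grid index and grid-match similarity, assuming equal-width grids per dimension and a similarity score equal to the number of dimensions on which the candidate and target fall in the same grid; the correlation factor mentioned above is not modeled, and all names are illustrative.

```python
# Illustrative sketch of an inverted grid index for similarity candidates.
from collections import defaultdict

class GridIndex:
    def __init__(self, points, cells_per_dim=10):
        """points: list of equal-length numeric tuples."""
        self.points = points
        self.dims = len(points[0])
        self.lo = [min(p[d] for p in points) for d in range(self.dims)]
        self.hi = [max(p[d] for p in points) for d in range(self.dims)]
        self.cells = cells_per_dim
        # Inverted index: (dimension, grid value) -> ids of points in that grid.
        self.inverted = defaultdict(set)
        for i, p in enumerate(points):
            for d in range(self.dims):
                self.inverted[(d, self._grid(d, p[d]))].add(i)

    def _grid(self, d, value):
        span = (self.hi[d] - self.lo[d]) or 1.0
        return min(self.cells - 1, int((value - self.lo[d]) / span * self.cells))

    def query(self, target, k=3):
        """Candidates share at least one grid value with the target; the score is
        the number of dimensions on which the grids match (higher means closer)."""
        matches = defaultdict(int)
        for d in range(self.dims):
            for i in self.inverted[(d, self._grid(d, target[d]))]:
                matches[i] += 1
        ranked = sorted(matches, key=matches.get, reverse=True)
        return [(self.points[i], matches[i]) for i in ranked[:k]]

# Example: the stored points closest to (2.1, 2.0) by grid-match count.
idx = GridIndex([(1.0, 1.0), (2.0, 2.2), (9.0, 9.0), (2.2, 8.0)])
print(idx.query((2.1, 2.0)))
```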
Abstract:
The present invention relates to the problem of scheduling work for employees and/or other resources in a help desk or similar environment. The employees have different levels of training and different availabilities. The jobs, which occur as a result of dynamically arising events, consist of multiple tasks ordered by chain precedence. Each job and/or task carries with it a penalty which is a step function of the time taken to complete it, the deadlines and penalties having been negotiated as part of one or more service level agreement contracts. The goal is to minimize the total amount of penalties paid. The invention consists of a pair of heuristic schemes for this difficult scheduling problem, one greedy and one randomized. The greedy scheme provides a quick initial solution, while the greedy and randomized schemes are combined in order to think more deeply about particular problem instances. The invention also includes a scheme for determining how much time to allocate to thinking about each of several potential problem instance variants.
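A minimal sketch of how the greedy and randomized schemes might pair up, assuming jobs given as chains of (skill, duration) tasks and step-function penalties keyed to deadlines: the greedy pass assigns each task to the earliest-free qualified employee, and the randomized scheme retries shuffled job orders within a fixed budget and keeps the cheapest schedule. All field names and the penalty encoding are illustrative assumptions.

```python
# Illustrative sketch of a greedy schedule plus randomized restarts.
import random

def penalty(finish: float, steps) -> float:
    """Step-function penalty: `steps` is a list of (deadline, penalty_if_later)."""
    return sum(p for deadline, p in steps if finish > deadline)

def greedy_schedule(jobs, employees):
    """jobs: list of {'tasks': [(skill, duration), ...], 'steps': [...]}.
    employees: dict skill -> list of next-free times (one entry per employee)."""
    free = {s: times[:] for s, times in employees.items()}   # copy availability
    total = 0.0
    for job in jobs:
        t = 0.0
        for skill, duration in job["tasks"]:                 # chain precedence
            i = min(range(len(free[skill])), key=lambda k: max(free[skill][k], t))
            start = max(free[skill][i], t)
            t = start + duration
            free[skill][i] = t
        total += penalty(t, job["steps"])
    return total

def randomized_schedule(jobs, employees, tries=200, seed=0):
    """Greedy gives the quick first answer; random job orders refine it."""
    rng = random.Random(seed)
    best = greedy_schedule(jobs, employees)
    for _ in range(tries):
        shuffled = jobs[:]
        rng.shuffle(shuffled)
        best = min(best, greedy_schedule(shuffled, employees))
    return best

# Example: two jobs competing for a single "network" specialist.
jobs = [{"tasks": [("network", 3.0), ("db", 1.0)], "steps": [(4.0, 100.0)]},
        {"tasks": [("network", 2.0)], "steps": [(2.0, 50.0)]}]
employees = {"network": [0.0], "db": [0.0, 0.0]}
print(greedy_schedule(jobs, employees), randomized_schedule(jobs, employees))
```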
Abstract:
A storage system includes a plurality of data vats, and a processor including an optimizing unit that optimizes a value of data stored in the storage system. The optimizing unit optimizes the value by computing and implementing an optimal decision for allocating new data to a first data vat of the plurality of data vats, moving existing data from at least a second data vat of the plurality of data vats to the first data vat, and deleting existing data from the first data vat, based on an amount of data in each of the plurality of data vats.
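One simple greedy reading of that decision, sketched under assumptions: new data is allocated to the highest-value vat, existing data is deleted from that vat only if its capacity would be exceeded, and any spare room is used to move data in from a lower-value vat. The per-vat value_rate/capacity model and all names are illustrative, not the claimed optimization.

```python
# Illustrative sketch of a one-step allocate/move/delete decision across vats.
def plan(vats, new_amount):
    """vats: dict name -> {'used': ..., 'capacity': ..., 'value_rate': ...}.
    Assumes at least two vats. Returns (first_vat, moved_in, deleted) amounts."""
    ranked = sorted(vats, key=lambda v: vats[v]["value_rate"], reverse=True)
    first, second = ranked[0], ranked[1]
    cap, used = vats[first]["capacity"], vats[first]["used"]
    # Delete just enough existing data from the first vat to fit the new data.
    deleted = max(0.0, used + new_amount - cap)
    room = cap - (used - deleted + new_amount)
    # With any room left over, move data from the second vat into the first.
    moved_in = min(room, vats[second]["used"])
    return first, moved_in, deleted

vats = {"gold": {"used": 90.0, "capacity": 100.0, "value_rate": 5.0},
        "bronze": {"used": 10.0, "capacity": 100.0, "value_rate": 1.0}}
print(plan(vats, new_amount=5.0))    # room left: move bronze data into gold
print(plan(vats, new_amount=30.0))   # over capacity: delete existing gold data instead
```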
Abstract:
A method (and system) of storing data in a value-based storage system includes optimizing a value of the data stored in the value-based storage system.