Abstract:
Techniques for classifying structural data with a skewed distribution are disclosed. By way of example, a method for classifying structural input data comprises a computer system performing the following steps. Multiple classifiers are constructed, wherein each classifier is constructed on a subset of training data, using one or more selected composite features from the subset of training data. A consensus among the multiple classifiers is computed in accordance with a voting scheme such that at least a portion of the structural input data is assigned to a particular class in accordance with the computed consensus. Such techniques for structured data classification are capable of handling skewed class distribution and partial feature coverage issues.
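The consensus step can be illustrated with a short sketch. It assumes plain majority voting, which is only one possible voting scheme; the abstract does not say which scheme is used or how composite features are selected. The three toy classifiers and the feature names (`ring`, `nitro`, `chain`) are hypothetical stand-ins for models trained on different subsets of the training data.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

Classifier = Callable[[dict], Hashable]


def majority_vote(classifiers: Sequence[Classifier], sample: dict) -> Hashable:
    """Return the label chosen by the most classifiers (ties: first label seen)."""
    votes = Counter(clf(sample) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical classifiers: each stands in for a model built on its own
# training subset, using its own composite feature.
def clf_a(s: dict) -> str:
    return "active" if s["ring"] and s["nitro"] else "inactive"


def clf_b(s: dict) -> str:
    return "active" if s["chain"] >= 3 else "inactive"


def clf_c(s: dict) -> str:
    return "active" if s["nitro"] else "inactive"


ENSEMBLE = [clf_a, clf_b, clf_c]

if __name__ == "__main__":
    sample = {"ring": True, "nitro": True, "chain": 1}
    print(majority_vote(ENSEMBLE, sample))
```

Because each classifier only looks at its own features, a sample that a single classifier covers poorly can still be labeled by the others, which is one way to read the abstract's point about partial feature coverage.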
Abstract:
A computer implemented method, apparatus, and computer usable program code for processing multi-way stream correlations. Stream data are received for correlation. A task is formed for continuously partitioning a multi-way stream correlation workload into smaller workload pieces. Each of the smaller workload pieces may be processed by a single host. The stream data are sent to different hosts for correlation processing.
Abstract:
A method (and system) of storing data in a value-based storage system includes optimizing a value of data stored in the value-based storage system.
Abstract:
A system and method for learning models from scarce and/or skewed training data includes partitioning a data stream into a sequence of time windows. A most likely current class distribution to classify portions of the data stream is determined based on observing training data in a current time window and based on concept drift probability patterns using historical information.
Abstract:
There are provided a system and method for resource adaptive, real-time new event detection. The method includes capturing, from among documents in a document streaming environment, first-story documents that mention previously unmentioned events. The method further includes dynamically adjusting a thoroughness of the capturing step by controlling a number of keywords and documents considered according to a real-time system load.
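The load-dependent adjustment might look like the sketch below. It assumes a linear mapping from a normalized system load (0.0 idle, 1.0 saturated) to the number of keywords and documents considered; the abstract does not specify the mapping, and the function name, parameters, and default limits are hypothetical.

```python
from typing import Tuple


def adaptive_budget(
    load: float,
    max_keywords: int = 100,
    min_keywords: int = 10,
    max_docs: int = 1000,
    min_docs: int = 50,
) -> Tuple[int, int]:
    """Shrink the keyword and document budgets linearly as load rises.

    `load` is clamped to [0, 1]. All limits are illustrative assumptions.
    """
    load = min(max(load, 0.0), 1.0)
    keywords = round(max_keywords - load * (max_keywords - min_keywords))
    docs = round(max_docs - load * (max_docs - min_docs))
    return keywords, docs


if __name__ == "__main__":
    for load in (0.0, 0.5, 1.0):
        print(load, adaptive_budget(load))
```

Under light load the detector compares each incoming document against more keywords and more earlier documents; under heavy load it trades thoroughness for throughput.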
Abstract:
The present invention relates to a method for creating a meta-document. The method collects at least one hyperlinked document based on a seed document and cross-references the documents within the collection. Cross-referencing includes resolving an anchor and an object, and indexing the resolved anchor and object based on respective locations within a meta-document. The method organizes the collected documents and seed documents. The method also publishes the meta-document including the cross-referenced documents. Preferably, the method of collecting includes accepting the seed document having an anchor pointing to an object, and adding a document containing the object to the collection. In addition, collecting includes the step of manually modifying the collection. The meta-document is a collection of the seed document and the hyperlinked document. Further, the index is one of a footnote, an end note, a table of contents, and an appendix.
Abstract:
Systems and methods for collaborative web caching among geographically distributed cache servers, particularly, latency-sensitive hashing systems and methods for collaborative web caching among geographically distributed proxy caches. Network latency delays as well as proxy load conditions are taken into consideration during hashing. As a result, requests can be hashed into geographically closer proxy caches if the load conditions permit. Otherwise, requests will be hashed into geographically distant proxy caches to better balance the load among the caches.
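One way to picture latency-sensitive hashing is the sketch below. It is an illustration under stated assumptions, not the disclosed method: a URL is first hashed to a small candidate set of proxies (here via a rendezvous-style ranking, an assumed choice), the lowest-latency candidate is used if its load is under a limit, and otherwise the least-loaded candidate is used. The `Proxy` fields, the parameter `k`, and the `load_limit` threshold are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Proxy:
    name: str
    latency_ms: float  # measured network latency from the requester
    load: float        # current utilization in [0, 1]


def candidates(url: str, proxies: Sequence[Proxy], k: int = 2) -> List[Proxy]:
    """Deterministically pick k candidate proxies for a URL by hashing."""
    def score(p: Proxy) -> str:
        return hashlib.sha256(f"{url}|{p.name}".encode()).hexdigest()
    return sorted(proxies, key=score)[:k]


def choose_proxy(url: str, proxies: Sequence[Proxy], k: int = 2,
                 load_limit: float = 0.8) -> Proxy:
    """Prefer the closest candidate unless it is overloaded."""
    cands = candidates(url, proxies, k)
    not_overloaded = [p for p in cands if p.load < load_limit]
    if not_overloaded:
        return min(not_overloaded, key=lambda p: p.latency_ms)
    # Every candidate is overloaded: fall back to the least-loaded one.
    return min(cands, key=lambda p: p.load)


if __name__ == "__main__":
    pool = [Proxy("near", 10.0, 0.95), Proxy("far", 80.0, 0.20)]
    print(choose_proxy("http://example.com/a", pool).name)
```

Restricting the choice to a hashed candidate set keeps the mapping from URL to cache mostly stable, which preserves hit rates, while the latency and load checks decide among the candidates.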
Abstract:
A system and method for generating a classification using time sequences comprises inputting a set of time-dependent feature variable graphs along with a set of time-dependent category variable graphs; finding frequent shapes in the time-dependent feature variable graphs; utilizing the frequent shapes to generate combinations of frequent shapes; generating rules relating one or more patterns of combinations of frequent shapes to a category variable; and performing a categorization utilizing the rules generated.
Abstract:
A system and method that enables a given sending user to specify a set of delivery policies and have them used for the electronic delivery of a given message, the message potentially having several heterogeneous parts (e.g., text and pictures) each of which is handled differently, and delivered to multiple heterogeneous devices (e.g., PCs, smartphones, fax machines), and possibly to several distinct recipients. The factors with which a sender can qualify their delivery policies include: time/date; transmission cost; whether the transmission can be forwarded; receiving device capability; and network reliability, speed, and transmission security. Methods are also provided enabling a sender to specify that particular transmissions be redirected or copied, e.g., “send fax copy to my broker and my accountant.” In one embodiment, the delivery policies may be specified using PICS.
Abstract:
A method and apparatus to dynamically maintain META-tag information specifying categorization and/or degree of compound documents, which are collections or hierarchies of collections of objects (possibly web pages), for efficient retrieval of leaf or intermediate objects with specific characteristics without the need to search any content of the collection. The specific characteristics and the contents of the collection can change constantly both qualitatively and quantitatively (including the insertion, deletion and update of objects). While dynamically maintaining the META-tag information, there are no inclusion restrictions on these compound documents, i.e., any collection can contain itself either directly or recursively; and not all objects within a META-tagged compound document are required to participate. The PICS protocol may be used to specify this META-tag information with both categorization and degree; to reflect the obsolescence, currency or freshness of an object; to validate a given object using a digital signature; and to enable charging for the META-tag service. Aggregation methods are provided to enable maximization, minimization, and averaging; to limit the propagation of META-tags; and to handle the time-out of META-tag and information validity.
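The aggregation over self-containing collections can be sketched as follows. This is a minimal illustration, assuming the containment structure is a mapping from each collection to its members and that each participating object carries a single numeric degree; neither representation comes from the abstract. A visited set handles collections that contain themselves directly or recursively, and objects without a rating are skipped because participation is optional.

```python
from statistics import mean
from typing import Callable, Dict, Hashable, Iterable, Optional, Sequence


def aggregate(
    root: Hashable,
    contains: Dict[Hashable, Iterable[Hashable]],
    ratings: Dict[Hashable, float],
    how: Callable[[Sequence[float]], float] = max,
) -> Optional[float]:
    """Aggregate the ratings of every object reachable from `root`.

    `how` may be max, min, or statistics.mean. Returns None when no
    reachable object participates (has a rating).
    """
    seen, stack, values = set(), [root], []
    while stack:
        node = stack.pop()
        if node in seen:  # guards against direct or recursive self-inclusion
            continue
        seen.add(node)
        if node in ratings:  # non-participating objects are skipped
            values.append(ratings[node])
        stack.extend(contains.get(node, ()))
    return how(values) if values else None


if __name__ == "__main__":
    # Hypothetical compound document: "site" contains itself and "b" contains "c".
    contains = {"site": ["a", "b", "site"], "b": ["c"]}
    ratings = {"a": 2, "c": 4}
    print(aggregate("site", contains, ratings, max))
    print(aggregate("site", contains, ratings, min))
    print(aggregate("site", contains, ratings, mean))
```

This sketch recomputes the aggregate on demand; limits on META-tag propagation and time-outs, which the abstract also mentions, are not modeled here.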