摘要:
Techniques for similarity searching are provided. Structural data in a database is searched against one or more structural queries. A desired minimum degree of similarity between the one or more queries and the structural data in the database is first specified. One or more indices are then used to exclude from consideration any structural data in the database that does not share the minimum degree of similarity with one or more of the queries.
摘要:
Systems and methods for parallel stream item counting are disclosed. A data stream is partitioned into portions and the portions are assigned to a plurality of processing cores. A sequential kernel is executed at each processing core to compute a local count for items in an assigned portion of the data stream for that processing core. The counts are aggregated for all the processing cores to determine a final count for the items in the data stream. A frequency-aware counting method (FCM) for data streams includes dynamically capturing relative frequency phases of items from a data stream and placing the items in a sketch structure using a plurality of hash functions where a number of hash functions is based on the frequency phase of the item. A zero-frequency table is provided to reduce errors due to absent items.
摘要:
Methods and apparatus generate an index for use in a document retrieval system where the index is organized by type and keyword. Redundancy in the index is reduced by organizing type entries in a hierarchy of internal and leaf nodes. Determining whether to generate an inverted list for a type is based on the position of the type in the hierarchy; generally inverted lists are generated only for types corresponding to leaf nodes. Redundancy is further reduced by re-using inverted lists generated for keywords for types when there is an overlap between keywords and types. Search performance using the document retrieval index is improved by adding entries corresponding to combinations of keywords and types. The intersections of inverted lists associated with the keywords and types comprising the combinations are determined and added to the index for use in search operations. Determining whether to add an entry for a keyword-type combination is made on a cost-benefit analysis dependent, at least in part, on the proximity of the keyword to type in documents containing the combination.
摘要:
Systems and methods are provided for real-time classification of streaming data. In particular, systems and methods for real-time classification of continuous data streams implement micro-clustering methods for offline and online processing of training data to build and dynamically update training models that are used for classification, as well as incrementally clustering the data over contiguous segments of a continuous data stream (in real-time) into a plurality of micro-clusters from which target profiles are constructed which define/model the behavior of the data in individual segments of the data stream.
摘要:
A method (and system) of assigning a sales opportunity, includes creating an assignment model based on clustering historical sales opportunities, and providing a scoring mechanism on a plurality of sales agents for automatically optimizing an assignment of at least one sales opportunity to at least one of the plurality of sales agents.
摘要:
Systems and methods are provided for resource adaptive workload management. In a method thereof, at least one execution objective is received for at least one of a plurality of queries under execution. A progress status of, and an amount of resource consumed by, each of the plurality of queries are monitored. A remaining resource requirement for each of the plurality of queries is estimated, based on the progress status of, and the amount of resource consumed by, each of the plurality of queries. Resource allocation is adjusted based on the at least one execution objective and the estimates of the remaining resource requirements.
摘要:
A method (and system) of assigning a sales opportunity, includes creating an assignment model based on clustering historical sales opportunities, and providing a scoring mechanism on a plurality of sales agents for automatically optimizing an assignment of at least one sales opportunity to at least one of the plurality of sales agents.
摘要:
Interoperability is enabled between participants in a network by determining values associated with a value metric defined for at least a portion of the network. Information flow is directed between two or more of the participants based at least in part on semantic models corresponding to the participants and on the values associated with the value metric. The semantic models may define interactions between the participants and define at least a portion of information produced or consumed by the participants. The determination of the values and the direction of the information flow may be performed multiple times in order to modify the one or more value metrics. The direction of information flow may allow participants to be deleted from the network, may allow participants to be added to the network, or may allow behavior of the participants to be modified.
摘要:
Systems and methods for parallel stream item counting are disclosed. A data stream is partitioned into portions and the portions are assigned to a plurality of processing cores. A sequential kernel is executed at each processing core to compute a local count for items in an assigned portion of the data stream for that processing core. The counts are aggregated for all the processing cores to determine a final count for the items in the data stream. A frequency-aware counting method (FCM) for data streams includes dynamically capturing relative frequency phases of items from a data stream and placing the items in a sketch structure using a plurality of hash functions where a number of hash functions is based on the frequency phase of the item. A zero-frequency table is provided to reduce errors due to absent items.
摘要:
Interoperability is enabled between participants in a network by determining values associated with a value metric defined for at least a portion of the network. Information flow is directed between two or more of the participants based at least in part on semantic models corresponding to the participants and on the values associated with the value metric. The semantic models may define interactions between the participants and define at least a portion of information produced or consumed by the participants. The determination of the values and the direction of the information flow may be performed multiple times in order to modify the one or more value metrics. The direction of information flow may allow participants to be deleted from the network, may allow participants to be added to the network, or may allow behavior of the participants to be modified.