摘要:
Systems and methods for object classification are provided. An object is identified along with the attributes that describe that object. These attributes are grouped into attribute patterns. Classes to be used in the classification are also identified. For each identified class a sketch table containing a plurality of parallel hash tables is created and trained using known objects with known classifications. For the object to be classified, each attribute pattern is processed using the all of the hash functions for each sketch table. This results in a plurality of values under each sketch table for a single attribute pattern. The lowest value is selected for each sketch table. The distribution of values across all sketch tables is evaluated for each attribute pattern. This produces a discriminatory power for each attribute pattern. Those attribute patterns having a discriminatory power above a given threshold are selected. The selected attribute patterns and associated sketch table values are added. The sketch table with the largest overall sum is identified, and the class associated with that sketch table is assigned to the object to which the attribute patterns belong.
摘要:
The present invention derives product characterizations for products offered at an e-commerce site based on the text descriptions of the products provided at the site. A customer characterization is generated for any customer browsing the e-commerce site. The characterizations include an aggregation of derived product characterizations associated with products bought and/or browsed by that customer. A peer group is formed by clustering customers having similar customer characterizations. Recommendations are then made to a customer based on the processed characterization and peer group data.
摘要:
An object and attributes that describe that object are identified. The attributes are grouped into attribute patterns, and classification classes are identified. For each identified class a sketch table containing a plurality of parallel hash tables is created. For the object to be classified, each attribute pattern is processed using the all of the hash functions for each sketch table, resulting in a plurality of values under each sketch table for a single attribute pattern. The lowest value is selected for each sketch table. The distribution of values across all sketch tables is evaluated for each attribute pattern, producing a discriminatory power for each attribute pattern. Attribute patterns having a discriminatory power above a given threshold are selected and added to the associated sketch table values. The sketch table with the largest overall sum is identified, and the associated class is assigned to the object belonging to the attribute patterns.
摘要:
Novel methods and systems for the privacy preserving mining of string data with the use of simple template based models. Such template based models are effective in practice, and preserve important statistical characteristics of the strings such as intra-record distances. Discussed herein is the condensation model for anonymization of string data. Summary statistics are created for groups of strings, and use these statistics are used to generate pseudo-strings. It will be seen that the aggregate behavior of a new set of strings maintains key characteristics such as composition, the order of the intra-string distances, and the accuracy of data mining algorithms such as classification. The preservation of intra-string distances is a key goal in many string and biological applications which are deeply dependent upon the computation of such distances, while it can be shown that the accuracy of applications such as classification are not affected by the anonymization process.
摘要:
A method for providing product recommendations to customers in an e-commerce environment includes the step of generating content and compatibility representations of products corresponding to a plurality of customers. A similarity function is calculated between pairs of content attributes corresponding to the products. A similarity function is calculated between pairs of compatibility attributes corresponding to the products. The plurality of customers are clustered into a plurality of peer groups. For a given customer, a closest peer group of the plurality of peer groups is determined. At least one potential recommendation is then generated for the given customer based on the closest peer group.
摘要:
A system and method for resource adaptive classification of data streams. Embodiments of systems and methods provide classifying data received in a computer, including discretizing the received data, constructing an intermediate data structure from said received data as training instances, performing subspace sampling on said received data as test instances and adaptively classifying said received data based on statistics of said subspace sampling.
摘要:
Novel methods and systems for the privacy preserving mining of string data with the use of simple template based models. Such template based models are effective in practice, and preserve important statistical characteristics of the strings such as intra-record distances. Discussed herein is the condensation model for anonymization of string data. Summary statistics are created for groups of strings, and use these statistics are used to generate pseudo-strings. It will be seen that the aggregate behavior of a new set of strings maintains key characteristics such as composition, the order of the intra-string distances, and the accuracy of data mining algorithms such as classification. The preservation of intra-string distances is a key goal in many string and biological applications which are deeply dependent upon the computation of such distances, while it can be shown that the accuracy of applications such as classification are not affected by the anonymization process.
摘要:
In accordance with the present invention, a method for selecting a channel and delivery time for digital objects for a broadcast delivery service including multiple channels of varying bandwidths includes the steps of selecting digital objects to be sent over the multiple channels, generating a schedule and pricing for the digital objects based on the digital object selected and existing delivery commitments and manipulating the schedule and pricing to provide a profitable delivery of the digital objects. A system is also included.
摘要:
A method for scheduling delivery of digital objects over a network, in accordance with the invention, includes the steps of providing a user interface for selecting objects to be transmitted thereto, selecting at least one object to be transmitted to the user interface, identifying and receiving in-progress object transmissions corresponding to the at least one selected object, identifying portions of the at least one object not yet received to request transmission of the portions of the at least one object not yet received and receiving remaining portions of the at least one object during additional in-progress transmissions. A system is also included.
摘要:
Systems and methods for embedding metadata such as personal patient information within actual medical data signals obtained from a patient are provided wherein two watermarks, a robust watermark and a fragile watermark are embedded in a given medical data signal. The robust watermark includes a binary coded representation of the metadata that is incorporated into the frequency domain of the medical data signal using discrete Fourier transformations and additive embedding. Error correcting code can also be added to the binary representation of the metadata using Hamming coding. A given robust watermark can be incorporated multiple times in the medical data signal. The fragile watermark is added on top of the modified medical signal containing the robust watermark in the spatial domain of the modified medical signal. The fragile watermark utilizes hash function to generate random sequences that are incorporated through the medical data signal.