Abstract:
A distributed data clustering system having an integrator and at least two computing units. Each computing unit is loaded with common global parameter values and a particular local data set. Each computing unit then generates local sufficient statistics based on the local data set and global parameter values. The integrator employs the local sufficient statistics of all the computing units to update the global parameter values.
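The abstract does not name a clustering algorithm. The sketch below assumes k-means, where the "global parameter values" are the centroids and each unit's "local sufficient statistics" are per-cluster point counts and coordinate sums. All function names are hypothetical. It is meant only to illustrate the integrator/computing-unit split.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Stats = Tuple[List[int], List[List[float]]]  # (per-cluster counts, per-cluster coordinate sums)


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` by squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))


def local_statistics(data: Sequence[Point], centroids: Sequence[Point]) -> Stats:
    """Computing unit: sufficient statistics of its local data under the global centroids."""
    dim = len(centroids[0])
    counts = [0] * len(centroids)
    sums = [[0.0] * dim for _ in centroids]
    for point in data:
        k = nearest(point, centroids)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += point[d]
    return counts, sums


def integrate(stats: Sequence[Stats], centroids: Sequence[Point]) -> List[Point]:
    """Integrator: combine every unit's statistics and recompute the global centroids."""
    dim = len(centroids[0])
    total_counts = [0] * len(centroids)
    total_sums = [[0.0] * dim for _ in centroids]
    for counts, sums in stats:
        for k in range(len(centroids)):
            total_counts[k] += counts[k]
            for d in range(dim):
                total_sums[k][d] += sums[k][d]
    updated: List[Point] = []
    for k, old in enumerate(centroids):
        if total_counts[k] == 0:
            updated.append(tuple(old))  # no points anywhere: keep the previous centroid
        else:
            updated.append(tuple(s / total_counts[k] for s in total_sums[k]))
    return updated


def distributed_kmeans(partitions: Sequence[Sequence[Point]],
                       centroids: Sequence[Point], iterations: int = 10) -> List[Point]:
    """Alternate local statistics (one call per unit) with global integration."""
    current = list(centroids)
    for _ in range(iterations):
        stats = [local_statistics(part, current) for part in partitions]
        current = integrate(stats, current)
    return current
```

Only counts and sums leave each unit, never raw points. Because these statistics add up exactly, the distributed result matches what a single unit would compute over the pooled data.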
Abstract:
Generating masks for de-duplication in a database where distributed entities provide activity data for said database. Determining from activity input data which entities add variable data to a given data field. Generating a list of the masks which effectively remove the variable data portion in the field. Consolidating input data using the generated masks.
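One simple way to picture these masks is the following sketch. It assumes that an entity "adds variable data" when its values for a field share a stable prefix followed by a varying tail, such as a store number. The mask keeps that prefix. The prefix heuristic, the `VARIABLE_TAIL` character set, and the function names are all hypothetical, not taken from the abstract.

```python
import os
import string
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # (entity_id, field_value)

# Hypothetical: characters assumed to belong to the variable tail (store numbers, IDs, ...).
VARIABLE_TAIL = string.digits + " #-*/"


def build_masks(records: Iterable[Record], min_prefix: int = 3) -> Dict[str, str]:
    """Map each entity that appends variable data to the stable prefix that survives masking."""
    values: Dict[str, set] = defaultdict(set)
    for entity, value in records:
        values[entity].add(value)
    masks: Dict[str, str] = {}
    for entity, distinct in values.items():
        if len(distinct) < 2:
            continue  # one distinct value: no evidence that this entity adds variable data
        prefix = os.path.commonprefix(sorted(distinct)).rstrip(VARIABLE_TAIL)
        if len(prefix) >= min_prefix:
            masks[entity] = prefix
    return masks


def apply_mask(entity: str, value: str, masks: Dict[str, str]) -> str:
    """Replace the field with the entity's stable prefix when a mask applies."""
    prefix = masks.get(entity)
    if prefix is not None and value.startswith(prefix):
        return prefix
    return value


def consolidate(records: Iterable[Record], masks: Dict[str, str]) -> Dict[Tuple[str, str], List[str]]:
    """Group raw values under (entity, masked value) so that duplicates collapse together."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for entity, value in records:
        groups[(entity, apply_mask(entity, value, masks))].append(value)
    return dict(groups)
```

For example, "ACME STORE #1234" and "ACME STORE #5678" from the same entity produce the mask "ACME STORE" and consolidate into one group. An entity whose values never vary gets no mask at all.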
Abstract:
A process for rapid data recovery, data cleaning, and automated self-maintenance of the data recovery mechanism is provided. Dirty input data records are used both with a fast indexing table and to build and revise it; the table's index keys point to the clean data records with which the input data should rightly be associated. Mechanisms for automated revision of the indexing table are described. Said table forms a tool useful in data mining and knowledge discovery for the analysis of heuristic processes.
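Here is a minimal sketch of such a self-maintaining index. It assumes the index keys are normalized strings and that near-misses are resolved by fuzzy matching. The normalization rule, the 0.8 similarity cutoff, and the use of Python's `difflib` are illustrative choices, not details from the abstract. Whenever a fuzzy match succeeds, the dirty variant is added as a new key, so the next lookup for it is a direct hit.

```python
import difflib
import re
from typing import Dict, Optional


def normalize(text: str) -> str:
    """Hypothetical cleaning rule: lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


class RecoveryIndex:
    """Index keys (normalized strings) pointing at IDs of clean records."""

    def __init__(self, clean_records: Dict[int, str]) -> None:
        self.clean = dict(clean_records)
        self.index: Dict[str, int] = {normalize(v): rid for rid, v in self.clean.items()}

    def lookup(self, dirty: str, cutoff: float = 0.8) -> Optional[int]:
        key = normalize(dirty)
        if key in self.index:
            return self.index[key]  # fast path: direct hit on an existing key
        matches = difflib.get_close_matches(key, list(self.index), n=1, cutoff=cutoff)
        if not matches:
            return None  # no confident association; leave the record for manual review
        rid = self.index[matches[0]]
        self.index[key] = rid  # self-maintenance: the dirty variant becomes a key of its own
        return rid
```

With the clean record "Acme Widgets Inc", the misspelled input "Acme Widgts Inc" resolves by fuzzy match and is then stored as the key "acmewidgtsinc". Unmatched inputs are never added to the index.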
Abstract:
Provided are, among other things, systems, methods and techniques for document-based processing. In one implementation, a document is input; features are extracted from it; an index is queried using at least a subset of the extracted features and, in response, identifications for selected document classifiers are received from a larger pool of document classifiers; the document is processed using individual ones of the selected document classifiers, thereby generating corresponding classifier outputs; and then, based on such classifier outputs, (1) the document is categorized within a computer database and/or (2) feedback information is provided to a user.
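The routing step can be illustrated with the sketch below, under these assumptions: features are lowercase word tokens, the index is an inverted map from trigger words to classifier names, and each classifier returns a score in [0, 1]. Only classifiers returned by the index are run. The keyword classifiers, the trigger words, and the 0.5 threshold are hypothetical.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

Classifier = Callable[[Set[str]], float]


def extract_features(document: str) -> Set[str]:
    """Hypothetical feature extractor: the set of lowercase alphabetic tokens."""
    return set(re.findall(r"[a-z]+", document.lower()))


def keyword_classifier(keywords: Set[str]) -> Classifier:
    """Toy classifier: fraction of its keywords present in the document."""
    def score(features: Set[str]) -> float:
        return len(features & keywords) / len(keywords)
    return score


class ClassifierRouter:
    def __init__(self, pool: Dict[str, Classifier], triggers: Dict[str, Set[str]]) -> None:
        self.pool = pool
        self.index: Dict[str, Set[str]] = defaultdict(set)  # feature -> classifier names
        for name, words in triggers.items():
            for word in words:
                self.index[word].add(name)

    def select(self, features: Set[str]) -> Set[str]:
        """Query the index with the document's features to pick a subset of the pool."""
        selected: Set[str] = set()
        for feature in features:
            selected |= self.index.get(feature, set())
        return selected

    def categorize(self, document: str, threshold: float = 0.5) -> Tuple[List[str], Dict[str, float]]:
        features = extract_features(document)
        outputs = {name: self.pool[name](features) for name in self.select(features)}
        categories = sorted(name for name, score in outputs.items() if score >= threshold)
        return categories, outputs  # outputs could also serve as feedback to a user
```

The design point is that the index prunes the pool. Here the "resume" classifier is never run on an invoice because none of its trigger words appear in the document.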
Abstract:
A computing system includes a component that performs a function and generates a noise, and a microphone that receives an input including the noise. The computing system can monitor a component for an event that produces a noise.
Abstract:
Provided are, among other things, systems, methods and techniques for classifying a collection of documents. A term is identified based on an indication of the ability of the term's presence within a given document to predict whether the given document should be classified into an identified category. A document index is then queried using the identified term and, in response, search results that define a candidate set of documents are received. Finally, a classifier is applied to documents within the candidate set to determine which of the documents should be classified into the identified category.
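The three steps can be sketched as follows. The sketch assumes that "ability to predict" is measured as the precision of a term on a small labeled sample (with a minimum-support filter), that the document index is a plain inverted index, and that the final classifier is any boolean function. All names and data are hypothetical. The code also assumes that at least one term meets the support threshold.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    return set(text.lower().split())


def best_term(labeled: List[Tuple[str, bool]], min_support: int = 2) -> str:
    """Pick the term whose presence best predicts membership: highest precision on labeled data."""
    present: Dict[str, int] = defaultdict(int)
    positive: Dict[str, int] = defaultdict(int)
    for text, in_category in labeled:
        for term in tokenize(text):
            present[term] += 1
            if in_category:
                positive[term] += 1
    candidates = sorted(t for t in present if present[t] >= min_support)
    # max() keeps the first maximum, so ties resolve alphabetically.
    return max(candidates, key=lambda t: (positive[t] / present[t], present[t]))


def build_index(collection: Dict[int, str]) -> Dict[str, Set[int]]:
    """Inverted index: term -> IDs of documents containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in collection.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def classify_collection(collection: Dict[int, str], labeled: List[Tuple[str, bool]],
                        classifier: Callable[[str], bool]) -> Tuple[str, Set[int], List[int]]:
    term = best_term(labeled)
    candidates = build_index(collection).get(term, set())  # the search results
    selected = sorted(d for d in candidates if classifier(collection[d]))
    return term, candidates, selected
```

The classifier only sees the candidate set, which saves work on large collections. The trade-off is that a relevant document lacking the chosen term, such as document 3 in the test data, is never scored.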
Abstract:
An exemplary embodiment of the present invention provides a computer implemented method of developing a classifier. The method includes obtaining a set of training data comprising labeled cases. The method also includes training a classifier based, at least in part, on the training data. The method also includes applying the classifier to a plurality of unlabeled cases to generate classification scores for each of the unlabeled cases, wherein each classification score corresponds to an instance of a corresponding case. Furthermore, the classification score corresponding to a first instance in a case is computed based, at least in part, on a value of a case-centric feature corresponding to the first instance, wherein the value of the case-centric feature is based, at least in part, on characteristics of the first instance and a second instance in the case.
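To make the idea of a case-centric feature concrete, here is a small sketch. It assumes each instance has a single raw numeric value and defines the case-centric feature, hypothetically, as that value minus the mean of the other instances in the same case. The classifier is a plain logistic regression trained by gradient descent. None of these specific choices come from the abstract.

```python
import math
from typing import List, Sequence, Tuple


def case_centric(values: Sequence[float]) -> List[float]:
    """Each instance's raw value minus the mean raw value of the other instances in its case."""
    n = len(values)
    if n == 1:
        return [0.0]
    total = sum(values)
    return [v - (total - v) / (n - 1) for v in values]


def featurize(case: Sequence[float]) -> List[Tuple[float, float]]:
    """Per-instance features: (instance-level raw value, case-centric value)."""
    return list(zip(case, case_centric(case)))


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for large negative z
    return e / (1.0 + e)


def train(cases, labels, epochs=2000, lr=0.1):
    """Logistic regression fitted by batch gradient descent on labeled cases."""
    X = [f for case in cases for f in featurize(case)]
    y = [label for case_labels in labels for label in case_labels]
    w0 = w1 = b = 0.0
    for _ in range(epochs):
        g0 = g1 = gb = 0.0
        for (x_raw, x_case), target in zip(X, y):
            err = sigmoid(w0 * x_raw + w1 * x_case + b) - target
            g0 += err * x_raw
            g1 += err * x_case
            gb += err
        w0 -= lr * g0 / len(X)
        w1 -= lr * g1 / len(X)
        b -= lr * gb / len(X)
    return w0, w1, b


def score_cases(cases, model):
    """One classification score per instance of every (unlabeled) case."""
    w0, w1, b = model
    return [[sigmoid(w0 * x_raw + w1 * x_case + b) for x_raw, x_case in featurize(case)]
            for case in cases]
```

In the test data, raw values alone cannot separate positives from negatives across cases. A positive of 1.0 sits below negatives of 4.0. The case-centric value, however, is 1.0 for every positive and -0.5 for every negative, which lets the classifier pick out the instance that stands out within its own case.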
Abstract:
A database includes a list of members of a first group, a list of members of a second group, and ratings for at least some of the members of the second group. The database is accessed. The ratings are attributed to the members of the first group. A machine learning training set is built for a particular member of the first group. The training set includes class labels corresponding to the particular member's ratings for the members of the second group, and features that include supplied and predicted ratings from at least a subset of processed members of the first group. A predictor for the particular member of the first group is trained based on the machine learning training set. The predictor corresponding to the particular member is used to generate predicted ratings for one or more members of the second group the particular member has not rated.
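The sequential scheme can be sketched as follows. Treat first-group members as users and second-group members as items, and process users one at a time. For each user, the class labels are that user's own ratings, and the features for an item are the supplied or predicted ratings that earlier-processed users gave it. As a simple stand-in for the unspecified predictor, this sketch uses a hypothetical k-nearest-neighbour regressor. The first user has nothing to learn from, so that user's missing ratings default to their own mean.

```python
from typing import Dict, List, Sequence, Tuple

Ratings = Dict[str, Dict[str, float]]  # member of first group -> {member of second group: rating}
Row = Tuple[List[float], float]        # (feature vector, class label)


def knn_predict(train_rows: Sequence[Row], query: Sequence[float], k: int = 1) -> float:
    """Average label of the k training rows nearest to `query` (squared Euclidean distance)."""
    ordered = sorted(train_rows, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)))
    nearest = ordered[:k]
    return sum(label for _, label in nearest) / len(nearest)


def fill_ratings(users: List[str], items: List[str], ratings: Ratings, k: int = 1) -> Ratings:
    """Process users in order; each one's predictor uses the filled ratings of earlier users."""
    filled: Ratings = {}
    processed: List[str] = []
    for user in users:
        own = ratings.get(user, {})
        if not own:
            continue  # no supplied ratings, so no class labels to train on
        if not processed:
            mean = sum(own.values()) / len(own)  # first user: nothing to learn from yet
            filled[user] = {item: own.get(item, mean) for item in items}
        else:
            def features(item: str) -> List[float]:
                # Supplied or predicted ratings that earlier users gave this item.
                return [filled[p][item] for p in processed]

            train_rows = [(features(item), rating) for item, rating in own.items()]
            filled[user] = {item: own[item] if item in own else knn_predict(train_rows, features(item), k)
                            for item in items}
        processed.append(user)
    return filled
```

Because earlier users' predicted ratings are fed forward as features, every later user has a complete feature vector for every item, even when the earlier users never rated it.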
Abstract:
In a method for assessing a plurality of electronic devices, cooling efficiencies for the plurality of electronic devices are calculated, where the cooling efficiencies comprise measures of energy usage requirements to respectively maintain the plurality of electronic devices within predetermined temperature ranges. In addition, the plurality of electronic devices are ranked according to their cooling efficiencies and the plurality of electronic devices are stored according to their rankings.
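The calculate/rank/store sequence can be shown with a short sketch. It assumes a hypothetical efficiency measure: cooling energy (kWh) spent per hour that the device stays within its temperature range, where lower values mean more efficient cooling. Rankings are stored in an SQLite table. The `Device` fields, the table name, and the metric are illustrative only.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Device:
    name: str
    cooling_energy_kwh: float  # hypothetical: cooling energy spent keeping the device in range
    hours_in_range: float      # hypothetical: time over which that energy was measured


def cooling_efficiency(device: Device) -> float:
    """Cooling energy needed per hour of in-range operation (kWh/h); lower is better."""
    return device.cooling_energy_kwh / device.hours_in_range


def rank_and_store(devices: Sequence[Device], conn: sqlite3.Connection) -> List[str]:
    """Rank devices by cooling efficiency and persist the ranking."""
    ranked = sorted(devices, key=cooling_efficiency)
    conn.execute("CREATE TABLE IF NOT EXISTS cooling_rank ("
                 "position INTEGER PRIMARY KEY, device TEXT, kwh_per_hour REAL)")
    conn.executemany("INSERT INTO cooling_rank VALUES (?, ?, ?)",
                     [(i + 1, d.name, cooling_efficiency(d)) for i, d in enumerate(ranked)])
    conn.commit()
    return [d.name for d in ranked]
```

For example, with `conn = sqlite3.connect(":memory:")`, devices using 6, 12, and 18 kWh over 24 hours are stored in that order at 0.25, 0.5, and 0.75 kWh/h.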