Abstract:
Methods, systems, and programs for distributed machine learning on a cluster including a plurality of nodes are disclosed. A machine learning process is performed in each of the plurality of nodes based on a respective subset of training data to calculate a local parameter. The training data is partitioned over the plurality of nodes. A plurality of operation nodes are determined from the plurality of nodes based on a status of the machine learning process performed in each of the plurality of nodes. The plurality of operation nodes are connected to form a network topology. An aggregated parameter is generated by merging local parameters calculated in each of the plurality of operation nodes in accordance with the network topology.
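The aggregation step described in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the node status is a boolean `done` flag, that the network topology is a binary tree, and that merging means elementwise averaging; the abstract fixes none of these choices, and the names `Node`, `select_operation_nodes`, `tree_merge`, and `aggregate` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    node_id: int
    local_param: List[float]  # parameter computed from this node's data subset
    done: bool                # hypothetical status flag for local training


def select_operation_nodes(nodes: List[Node]) -> List[Node]:
    """Keep only nodes whose local learning process has finished (assumed rule)."""
    return [n for n in nodes if n.done]


def tree_merge(params: List[List[float]]) -> List[float]:
    """Sum parameter vectors pairwise, level by level, as in a binary tree."""
    level = [list(p) for p in params]
    while len(level) > 1:
        merged = []
        for i in range(0, len(level) - 1, 2):
            merged.append([a + b for a, b in zip(level[i], level[i + 1])])
        if len(level) % 2 == 1:
            merged.append(level[-1])  # odd node out moves up unchanged
        level = merged
    return level[0]


def aggregate(nodes: List[Node]) -> List[float]:
    """Average the local parameters of the operation nodes."""
    ops = select_operation_nodes(nodes)
    if not ops:
        raise ValueError("no operation nodes available")
    total = tree_merge([n.local_param for n in ops])
    return [x / len(ops) for x in total]
```

In this sketch a node that has not finished is simply left out of the topology, so a straggler does not block the merge.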
Abstract:
Methods, systems, and apparatuses for generating relevance functions for ranking documents obtained in searches are provided. One or more features to be used as predictor variables in the construction of a relevance function are determined. The relevance function is parameterized by one or more coefficients. An ideal query error is defined that measures, for a given query, a difference between a ranking generated by the relevance function and a ranking based on a training set. According to a structured output learning framework, values for the coefficients of the relevance function are determined to substantially minimize an objective function that depends on a continuous upper bound of the defined ideal query error. The query error is determined using a structured output learning technique. The query error is defined as a maximum over a set of permutations.
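One common way to obtain a continuous upper bound defined as a maximum over permutations is a structured hinge loss. The sketch below follows that standard form and is not necessarily the construction used in the patent. It assumes the ideal query error is 1 - NDCG and that a ranking's compatibility with the model scores is a position-discounted sum; both choices, and all function names, are illustrative. The maximum is found by brute force, so it only suits very small document sets.

```python
from itertools import permutations
from math import log2


def dcg(relevances_in_rank_order):
    """Discounted cumulative gain of relevance labels listed best rank first."""
    return sum((2 ** r - 1) / log2(pos + 2)
               for pos, r in enumerate(relevances_in_rank_order))


def ndcg_error(order, labels):
    """Ideal query error (assumed here): 1 - NDCG of the ranking `order`."""
    ideal = dcg(sorted(labels, reverse=True))
    if ideal == 0:
        return 0.0
    return 1.0 - dcg([labels[i] for i in order]) / ideal


def ranking_score(order, scores):
    """Compatibility of a ranking with model scores (position-discounted sum)."""
    return sum(scores[i] / log2(pos + 2) for pos, i in enumerate(order))


def structured_upper_bound(scores, labels):
    """Max over permutations of error + score margin against the target ranking."""
    n = len(labels)
    target = sorted(range(n), key=lambda i: -labels[i])
    target_score = ranking_score(target, scores)
    return max(
        ndcg_error(p, labels) + ranking_score(p, scores) - target_score
        for p in permutations(range(n))
    )


def predicted_error(scores, labels):
    """Ideal query error of the ranking obtained by sorting on model scores."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return ndcg_error(order, labels)
```

Because the bound is a maximum of finitely many functions that are affine in the scores, it is continuous in the coefficients of any relevance function whose scores depend continuously on them, whereas the ideal error itself changes in jumps.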
Abstract:
Methods, systems, and apparatuses for generating relevance functions for ranking documents obtained in searches are provided. One or more features to be used as predictor variables in the construction of a relevance function are determined. The relevance function is parameterized by one or more coefficients. A query error is defined that measures a difference between a relevance ranking generated by the relevance function and a training set relevance ranking based on a query and a set of scored documents associated with the query. The query error is a continuous function of the coefficients and aims to approximate error measures commonly used in Information Retrieval. Values for the coefficients of the relevance function are determined that substantially minimize an objective function that depends on the defined query error.
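To make the idea of a query error that is continuous in the coefficients concrete, the following sketch smooths 1 - NDCG by replacing each document's integer rank with a sigmoid-based soft rank. This is one well-known style of surrogate, chosen here as an assumption; the abstract does not say which approximation is used. The linear relevance function, the `temperature` parameter, and all names are illustrative.

```python
from math import exp, log2


def relevance(w, x):
    """Linear relevance function with coefficients w (an assumed form)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def smooth_query_error(w, features, labels, temperature=1.0):
    """Smooth surrogate for 1 - NDCG, continuous in the coefficients w."""
    scores = [relevance(w, x) for x in features]
    n = len(scores)
    ideal = sum((2 ** r - 1) / log2(pos + 2)
                for pos, r in enumerate(sorted(labels, reverse=True)))
    if ideal == 0:
        return 0.0
    smooth_dcg = 0.0
    for i in range(n):
        # Soft rank: 1 plus a smooth count of documents scored above document i.
        soft_rank = 1.0 + sum(
            sigmoid((scores[j] - scores[i]) / temperature)
            for j in range(n) if j != i
        )
        smooth_dcg += (2 ** labels[i] - 1) / log2(1.0 + soft_rank)
    return 1.0 - smooth_dcg / ideal
```

With a small `temperature` and no tied scores, the soft ranks approach the true ranks, so the surrogate approaches the exact 1 - NDCG; larger values give a smoother function that is easier to minimize with gradient-based methods.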
Abstract:
Support vector machines are used to classify data contained within a structured dataset such as a plurality of signals generated by a spectral analyzer. The signals are pre-processed to ensure alignment of peaks across the spectra. Similarity measures are constructed to provide a basis for comparison of pairs of samples of the signal. A support vector machine is trained to discriminate between different classes of the samples and to identify the most predictive features within the spectra. In a preferred embodiment, feature selection is performed to reduce the number of features that must be considered.