Abstract:
The subject disclosure pertains to systems and methods for training machine learning systems. Many cost functions are not smooth or differentiable and cannot easily be used during training of a machine learning system. The machine learning system can include a set of estimated gradients based at least in part upon the ranked or sorted results generated by the learning system. The estimated gradients can be selected to reflect the requirements of a cost function and utilized instead of the cost function to determine or modify the parameters of the learning system during training of the learning system.
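As a rough illustration of training with estimated gradients in place of a non-smooth ranking cost, the sketch below assigns each ranked item a gradient derived from the sorted results and uses it to update a linear scorer. The pairwise form, the logarithmic rank discount, and the names `estimated_gradients` and `train_step` are assumptions made for this example; the abstract does not specify them.

```python
import math


def estimated_gradients(scores, labels):
    """Return one estimated gradient per item, derived from the sorted results.

    Assumption: gradients are built pairwise and weighted by how far apart
    the two items sit in the current ranking (a log discount on rank).
    """
    n = len(scores)
    # Rank 0 is the highest-scoring item in the current ordering.
    order = sorted(range(n), key=lambda i: -scores[i])
    rank = {item: position for position, item in enumerate(order)}
    lambdas = [0.0] * n
    for i in range(n):
        for j in range(n):
            if labels[i] > labels[j]:
                # The force shrinks once item i already outscores item j.
                force = 1.0 / (1.0 + math.exp(scores[i] - scores[j]))
                weight = abs(1.0 / math.log2(rank[i] + 2)
                             - 1.0 / math.log2(rank[j] + 2))
                lambdas[i] += force * weight  # push the more relevant item up
                lambdas[j] -= force * weight  # push the less relevant item down
    return lambdas


def train_step(weights, features, labels, lr=0.1):
    """Update a linear scorer using the estimated gradients, not a cost."""
    scores = [sum(w * x for w, x in zip(weights, f)) for f in features]
    lambdas = estimated_gradients(scores, labels)
    # For a linear scorer, d(score_i)/d(w_k) is simply features[i][k].
    return [w + lr * sum(lam * f[k] for lam, f in zip(lambdas, features))
            for k, w in enumerate(weights)]
```

The cost function itself is never evaluated or differentiated here; only the per-item gradients drive the parameter update.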
Abstract:
A general probabilistic formulation referred to as ‘Conditional Harmonic Mixing’ is provided, in which links between classification nodes are directed, a conditional probability matrix is associated with each link, and the number of classes can vary from node to node. A posterior class probability at each node is updated by minimizing a divergence between its distribution and that predicted by its neighbors. For arbitrary graphs, as long as each unlabeled point is reachable from at least one training point, a solution generally exists, is unique, and can be found by solving a sparse linear system iteratively. In one aspect, an automated data classification system is provided. The system includes a data set having at least one labeled category node in the data set. A semi-supervised learning component employs directed arcs to determine the label of at least one other unlabeled category node in the data set.
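The iterative update can be sketched as follows, assuming the divergence is the sum of KL divergences from each neighbor's prediction to the node's posterior, in which case the minimizer is the mean of the incoming predictions. The function names, the data layout, and the in-place sweep order are assumptions for illustration only.

```python
def matvec(matrix, vector):
    """Multiply a conditional probability matrix by a class distribution."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def chm_iterate(posteriors, labeled, links, sweeps=100):
    """Iteratively update the posteriors of unlabeled nodes.

    posteriors: dict mapping node -> list of class probabilities.
    labeled:    set of nodes whose posteriors stay fixed (training points).
    links:      list of (source, target, matrix) directed arcs, where
                matrix[c_target][c_source] = P(c_target | c_source), so the
                two nodes may have different numbers of classes.
    """
    incoming = {}
    for source, target, matrix in links:
        incoming.setdefault(target, []).append((source, matrix))

    for _ in range(sweeps):
        for node, arcs in incoming.items():
            if node in labeled:
                continue
            # Each incoming arc predicts a distribution for this node.
            predictions = [matvec(matrix, posteriors[source])
                           for source, matrix in arcs]
            count = len(predictions)
            # Assumed update rule: average the neighbors' predictions.
            posteriors[node] = [sum(p[c] for p in predictions) / count
                                for c in range(len(predictions[0]))]
    return posteriors
```

Each sweep touches only the arcs that exist, which is what makes the underlying linear system sparse. A fixed sweep count stands in for a proper convergence check.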
Abstract:
The present invention relates to a system and methodology to facilitate database processing in accordance with a plurality of applications. In one aspect, a large database of objects is processed, wherein the objects can be represented as points in a vector space, and two or more objects are deemed ‘close’ if the Euclidean distance between the points is small. This can apply to substantially any type of object, provided a suitable distance measure can be defined. In another aspect, a ‘test’ object having a vector x is processed to determine whether there exists an object y in the database such that the distance between x and y falls below a threshold t. If several objects in the database satisfy this criterion, a list of objects can be returned, together with their corresponding distances. If no object satisfies the criterion, an indication of this condition can be provided, along with information relating to the condition.
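The threshold query described above can be sketched as a brute-force scan. This is a minimal illustration only: the function name `find_close_objects` and the dictionary layout are assumptions, and a linear scan would not scale to the large databases the abstract has in mind.

```python
import math


def find_close_objects(database, x, t):
    """Return (name, distance) pairs for every object y with dist(x, y) < t.

    database: dict mapping an object name to its vector (hypothetical layout).
    An empty list signals that no object satisfied the criterion.
    """
    matches = []
    for name, y in database.items():
        distance = math.dist(x, y)  # Euclidean distance (Python 3.8+)
        if distance < t:
            matches.append((name, distance))
    # Closest objects first, so the caller sees the best match at index 0.
    matches.sort(key=lambda pair: pair[1])
    return matches
```

Returning the distances alongside the names matches the abstract's description of a result list, and the empty list serves as the indication that nothing fell below the threshold.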