Training a learning system with arbitrary cost functions
    21.
    Patent application
    Legal status: In force

    Publication No.: US20070094171A1

    Publication date: 2007-04-26

    Application No.: US11305395

    Filing date: 2005-12-16

    IPC classification: G06F15/18

    CPC classification: G06N3/08

    Abstract: The subject disclosure pertains to systems and methods for training machine learning systems. Many cost functions are not smooth or differentiable and cannot easily be used during training of a machine learning system. The machine learning system can include a set of estimated gradients based at least in part upon the ranked or sorted results generated by the learning system. The estimated gradients can be selected to reflect the requirements of a cost function and utilized instead of the cost function to determine or modify the parameters of the learning system during training of the learning system.
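
The idea of training from estimated gradients rather than from the cost function itself can be sketched as follows. This is an illustrative sketch only, not the method claimed in the application: the linear scorer, the pairwise logistic "force" rule, and all names (`score`, `estimated_gradients`, `train`, `ranking`) are assumptions made for the example.

```python
import math

def score(w, x):
    """Hypothetical linear scoring model: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def estimated_gradients(scores, labels):
    """Return one estimated gradient per item, derived from the current scores.

    Assumed rule: for every pair where item i is more relevant than item j,
    push i up and j down. The force is larger when the pair is mis-ordered.
    """
    lambdas = [0.0] * len(scores)
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                force = 1.0 / (1.0 + math.exp(scores[i] - scores[j]))
                lambdas[i] += force
                lambdas[j] -= force
    return lambdas

def train(features, labels, lr=0.1, epochs=200):
    """Update the weights using the estimated gradients, never the cost itself."""
    w = [0.0] * len(features[0])
    for _ in range(epochs):
        scores = [score(w, x) for x in features]
        lambdas = estimated_gradients(scores, labels)
        # Chain rule: d(score)/d(w) = x, so each item moves w along lambda * x.
        for lam, x in zip(lambdas, features):
            for k in range(len(w)):
                w[k] += lr * lam * x[k]
    return w

def ranking(w, features):
    """Item indices sorted by descending score."""
    return sorted(range(len(features)),
                  key=lambda i: score(w, features[i]), reverse=True)

if __name__ == "__main__":
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up toy data
    labels = [0, 1, 2]                               # higher means more relevant
    print(ranking(train(features, labels), features))
```

The non-smooth ranking cost never appears in the update. Only the per-item estimated gradients do, which is the substitution the abstract describes.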

    Leveraging unlabeled data with a probabilistic graphical model
    22.
    Patent application
    Legal status: In force

    Publication No.: US20070005341A1

    Publication date: 2007-01-04

    Application No.: US11170989

    Filing date: 2005-06-30

    IPC classification: G06F17/27

    CPC classification: G06F17/3071

    Abstract: A general probabilistic formulation referred to as ‘Conditional Harmonic Mixing’ is provided, in which links between classification nodes are directed, a conditional probability matrix is associated with each link, and where the numbers of classes can vary from node to node. A posterior class probability at each node is updated by minimizing a divergence between its distribution and that predicted by its neighbors. For arbitrary graphs, as long as each unlabeled point is reachable from at least one training point, a solution generally always exists, is unique, and can be found by solving a sparse linear system iteratively. In one aspect, an automated data classification system is provided. The system includes a data set having at least one labeled category node in the data set. A semi-supervised learning component employs directed arcs to determine the label of at least one other unlabeled category node in the data set.
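
The iterative update described in the abstract can be sketched on a toy graph. This is a minimal sketch under stated assumptions, not the application's implementation: it assumes the divergence-minimizing update reduces to averaging the predictions that arrive over a node's incoming links, and the function names and data layout (`mat_vec`, `chm_iterate`, the `links` dictionary) are hypothetical.

```python
def mat_vec(m, v):
    """Multiply a conditional probability matrix by a class distribution.

    m[r][c] is read as P(target class r | source class c), so each column
    sums to 1 and the matrix may be rectangular when class counts differ.
    """
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def chm_iterate(posteriors, labeled, links, sweeps=100):
    """Repeatedly replace each unlabeled node's posterior with the mean of
    the predictions made by its incoming neighbors (assumed update rule).

    posteriors: dict node -> list of class probabilities (initial guess)
    labeled:    set of nodes whose posteriors stay fixed
    links:      dict target node -> list of (source node, matrix)
    """
    p = {node: list(dist) for node, dist in posteriors.items()}
    for _ in range(sweeps):
        for node, incoming in links.items():
            if node in labeled or not incoming:
                continue
            preds = [mat_vec(m, p[src]) for src, m in incoming]
            p[node] = [sum(col) / len(preds) for col in zip(*preds)]
    return p

if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # Chain a - b - c - d, with a and d labeled and b, c unlabeled.
    posteriors = {"a": [1.0, 0.0], "b": [0.5, 0.5],
                  "c": [0.5, 0.5], "d": [0.0, 1.0]}
    links = {"b": [("a", identity), ("c", identity)],
             "c": [("b", identity), ("d", identity)]}
    print(chm_iterate(posteriors, {"a", "d"}, links))
```

Each sweep is one pass of an iterative solver over the sparse linear system implied by the links. A production version would test for convergence instead of running a fixed number of sweeps.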

    SYSTEM AND METHOD PROVIDING AUTOMATED MARGIN TREE ANALYSIS AND PROCESSING OF SAMPLED DATA
    23.
    Patent application
    Legal status: Lapsed

    Publication No.: US20060271512A1

    Publication date: 2006-11-30

    Application No.: US11462932

    Filing date: 2006-08-07

    IPC classification: G06F17/30

    Abstract: The present invention relates to a system and methodology to facilitate database processing in accordance with a plurality of various applications. In one aspect, a large database of objects is processed, wherein the objects can be represented as points in a vector space, and two or more objects are deemed ‘close’ if a Euclidean distance between the points is small. This can apply for substantially any type of object, provided a suitable distance measure can be defined. In another aspect, a ‘test’ object having a vector x, is processed to determine if there exists an object y in the database such that the distance between x and y falls below a threshold t. If several objects in the database satisfy this criterion, a list of objects can be returned, together with their corresponding distances. If no objects were to satisfy the criterion, an indication of this condition can also be provided, but in addition, the condition or information relating to the condition can be provided.
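
The threshold query in the abstract can be illustrated with a brute-force baseline. This is a sketch only: it scans every object and does not implement the margin tree index named in the title, and the names `euclidean` and `find_close`, as well as the toy database, are assumptions made for the example.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def find_close(database, x, t):
    """Return (key, distance) pairs for every object whose distance to the
    test vector x falls below the threshold t, nearest first.

    An empty list is the indication that no object satisfied the criterion.
    """
    hits = [(key, euclidean(x, y)) for key, y in database.items()]
    hits = [(key, dist) for key, dist in hits if dist < t]
    hits.sort(key=lambda pair: pair[1])
    return hits

if __name__ == "__main__":
    database = {"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [1.0, 0.0]}  # toy data
    print(find_close(database, [0.0, 0.0], 2.0))
    print(find_close(database, [10.0, 10.0], 1.0))
```

A linear scan costs one distance computation per stored object, which is the cost an index structure over a large database is meant to avoid.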
