-
Publication No.: US07296018B2
Publication date: 2007-11-13
Application No.: US10749518
Filing date: 2004-01-02
Applicant: Naoki Abe, John Langford
Inventor: Naoki Abe, John Langford
IPC classification: G06F17/00
CPC classification: G06N99/005, Y10S707/99936, Y10S707/99943
Abstract: Outlier detection methods and apparatus have light computational resource requirements, especially for storage, and yet achieve state-of-the-art predictive performance. The outlier detection problem is first reduced to a classification learning problem, and then selective sampling based on uncertainty of prediction is applied to further reduce the amount of data required for data analysis, resulting in enhanced predictive performance. The reduction to classification essentially consists of using the unlabeled normal data as positive examples and randomly generated synthesized examples as negative examples. Selective sampling makes use of an underlying, arbitrary classification learning algorithm and the data labeled by the above procedure, and proceeds iteratively. Each iteration consists of selecting a smaller sub-sample from the input data, training the underlying classification algorithm with the selected data, and storing the classifier output by the classification algorithm. The selection is done by essentially choosing examples that are harder to classify with the classifiers obtained in the preceding iterations. The final output hypothesis is a voting function of the classifiers obtained in the iterations of the above procedure.
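The procedure described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented method: the decision-stump base learner, the uniform bounding-box generator for synthetic negatives, the acceptance-probability formula, and all function and parameter names (`label_for_classification`, `train_stump`, `vote`, `fit_outlier_detector`, `base_rate`) are hypothetical choices made for the example.

```python
import random

def label_for_classification(normal, rng):
    """Reduction step: real points become positives (+1); points drawn
    uniformly from the data's bounding box become negatives (-1).
    The uniform generator is an assumption of this sketch."""
    dims = len(normal[0])
    lo = [min(x[d] for x in normal) for d in range(dims)]
    hi = [max(x[d] for x in normal) for d in range(dims)]
    synthetic = [tuple(rng.uniform(lo[d], hi[d]) for d in range(dims))
                 for _ in normal]
    return [(x, 1) for x in normal] + [(x, -1) for x in synthetic]

def train_stump(examples):
    """Stand-in base learner: a one-feature threshold rule chosen to
    minimize training error on the given examples."""
    best = None
    for d in range(len(examples[0][0])):
        for thr in sorted({x[d] for x, _ in examples}):
            for sign in (1, -1):
                errors = sum(1 for x, y in examples
                             if (sign if x[d] <= thr else -sign) != y)
                if best is None or errors < best[0]:
                    best = (errors, d, thr, sign)
    _, dim, cut, pol = best
    return lambda x: pol if x[dim] <= cut else -pol

def vote(classifiers, x):
    """Voting function: average prediction in [-1, 1] (0.0 if no classifiers)."""
    if not classifiers:
        return 0.0
    return sum(h(x) for h in classifiers) / len(classifiers)

def fit_outlier_detector(normal, rounds=10, base_rate=0.3, seed=0):
    rng = random.Random(seed)
    labeled = label_for_classification(normal, rng)
    classifiers = []
    for _ in range(rounds):
        # Selective sampling: keep an example with higher probability when
        # the classifiers so far are uncertain about it (vote near zero).
        sample = [(x, y) for x, y in labeled
                  if rng.random() < base_rate * (1.0 - 0.9 * abs(vote(classifiers, x)))]
        if not sample:  # guard against an empty sub-sample
            sample = [rng.choice(labeled)]
        classifiers.append(train_stump(sample))
    return classifiers
```

A point `x` would then be scored with `vote(classifiers, x)`, with lower values suggesting an outlier. Only the stored classifiers and one sub-sample at a time need to be held in memory, which is the storage saving the abstract points to.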
-
Publication No.: US08006157B2
Publication date: 2011-08-23
Application No.: US11863704
Filing date: 2007-09-28
Applicant: Naoki Abe, John Langford
Inventor: Naoki Abe, John Langford
CPC classification: G06N99/005, Y10S707/99936, Y10S707/99943
Abstract: Outlier detection methods and apparatus have light computational resource requirements, especially for storage, and yet achieve state-of-the-art predictive performance. The outlier detection problem is first reduced to a classification learning problem, and then selective sampling based on uncertainty of prediction is applied to further reduce the amount of data required for data analysis, resulting in enhanced predictive performance. The reduction to classification essentially consists of using the unlabeled normal data as positive examples and randomly generated synthesized examples as negative examples. Selective sampling makes use of an underlying, arbitrary classification learning algorithm and the data labeled by the above procedure, and proceeds iteratively. Each iteration consists of selecting a smaller sub-sample from the input data, training the underlying classification algorithm with the selected data, and storing the classifier output by the classification algorithm. The selection is done by essentially choosing examples that are harder to classify with the classifiers obtained in the preceding iterations. The final output hypothesis is a voting function of the classifiers obtained in the iterations of the above procedure.
-
Publication No.: US20050091524A1
Publication date: 2005-04-28
Application No.: US10690778
Filing date: 2003-10-22
Applicant: Naoki Abe, Carl Abrams, Chidanand Apte, Bishwaranjan Bhattacharjee, Kenneth Goldman, Matthias Gruetzner, Matthew Hilbert, John Langford, Sriram Padmanabhan, Charles Tresser, Kathleen Troidle, Philip Yu
Inventor: Naoki Abe, Carl Abrams, Chidanand Apte, Bishwaranjan Bhattacharjee, Kenneth Goldman, Matthias Gruetzner, Matthew Hilbert, John Langford, Sriram Padmanabhan, Charles Tresser, Kathleen Troidle, Philip Yu
CPC classification: G06Q20/4016, G06F21/121, G06F21/602, G06F21/6218, G06F21/6245, G06F21/72, G06F2221/07, G06F2221/2101, G06Q20/341, G06Q20/3829, G06Q2220/00, G07F7/1008, G07F7/1083, G07F19/207, H04L63/04
Abstract: Various embodiments for maintaining security and confidentiality of data and operations within a fraud detection system are disclosed. Each of these embodiments utilizes a secure architecture in which: (1) access to data is limited to only approved or authorized entities; (2) confidential details in received data can be readily identified and concealed; and (3) confidential details that have become non-confidential can be identified and exposed.
-
Publication No.: US09064364B2
Publication date: 2015-06-23
Application No.: US10690778
Filing date: 2003-10-22
Applicant: Naoki Abe, Carl E. Abrams, Chidanand V. Apte, Bishwaranjan Bhattacharjee, Kenneth A. Goldman, Matthias Gruetzner, Matthew A. Hilbert, John Langford, Sriram K. Padmanabhan, Charles P. Tresser, Kathleen M. Troidle, Philip S. Yu
Inventor: Naoki Abe, Carl E. Abrams, Chidanand V. Apte, Bishwaranjan Bhattacharjee, Kenneth A. Goldman, Matthias Gruetzner, Matthew A. Hilbert, John Langford, Sriram K. Padmanabhan, Charles P. Tresser, Kathleen M. Troidle, Philip S. Yu
CPC classification: G06Q20/4016, G06F21/121, G06F21/602, G06F21/6218, G06F21/6245, G06F21/72, G06F2221/07, G06F2221/2101, G06Q20/341, G06Q20/3829, G06Q2220/00, G07F7/1008, G07F7/1083, G07F19/207, H04L63/04
Abstract: Various embodiments for maintaining security and confidentiality of data and operations within a fraud detection system are disclosed. Each of these embodiments utilizes a secure architecture in which: (1) access to data is limited to only approved or authorized entities; (2) confidential details in received data can be readily identified and concealed; and (3) confidential details that have become non-confidential can be identified and exposed.
-
Publication No.: US20080022177A1
Publication date: 2008-01-24
Application No.: US11863704
Filing date: 2007-09-28
Applicant: Naoki Abe, John Langford
Inventor: Naoki Abe, John Langford
IPC classification: G06F11/30
CPC classification: G06N99/005, Y10S707/99936, Y10S707/99943
Abstract: Outlier detection methods and apparatus have light computational resource requirements, especially for storage, and yet achieve state-of-the-art predictive performance. The outlier detection problem is first reduced to a classification learning problem, and then selective sampling based on uncertainty of prediction is applied to further reduce the amount of data required for data analysis, resulting in enhanced predictive performance. The reduction to classification essentially consists of using the unlabeled normal data as positive examples and randomly generated synthesized examples as negative examples. Selective sampling makes use of an underlying, arbitrary classification learning algorithm and the data labeled by the above procedure, and proceeds iteratively. Each iteration consists of selecting a smaller sub-sample from the input data, training the underlying classification algorithm with the selected data, and storing the classifier output by the classification algorithm. The selection is done by essentially choosing examples that are harder to classify with the classifiers obtained in the preceding iterations. The final output hypothesis is a voting function of the classifiers obtained in the iterations of the above procedure.
-
Publication No.: US20050160340A1
Publication date: 2005-07-21
Application No.: US10749518
Filing date: 2004-01-02
Applicant: Naoki Abe, John Langford
Inventor: Naoki Abe, John Langford
CPC classification: G06N99/005, Y10S707/99936, Y10S707/99943
Abstract: Outlier detection methods and apparatus have light computational resource requirements, especially for storage, and yet achieve state-of-the-art predictive performance. The outlier detection problem is first reduced to a classification learning problem, and then selective sampling based on uncertainty of prediction is applied to further reduce the amount of data required for data analysis, resulting in enhanced predictive performance. The reduction to classification essentially consists of using the unlabeled normal data as positive examples and randomly generated synthesized examples as negative examples. Selective sampling makes use of an underlying, arbitrary classification learning algorithm and the data labeled by the above procedure, and proceeds iteratively. Each iteration consists of selecting a smaller sub-sample from the input data, training the underlying classification algorithm with the selected data, and storing the classifier output by the classification algorithm. The selection is done by essentially choosing examples that are harder to classify with the classifiers obtained in the preceding iterations. The final output hypothesis is a voting function of the classifiers obtained in the iterations of the above procedure.
-
Publication No.: US20130290223A1
Publication date: 2013-10-31
Application No.: US13458545
Filing date: 2012-04-27
Applicant: Olivier Chapelle, John Langford, Miroslav Dudik, Alekh Agarwal
Inventor: Olivier Chapelle, John Langford, Miroslav Dudik, Alekh Agarwal
IPC classification: G06F15/18
CPC classification: G06N99/005, G06F15/18
Abstract: Method, system, and programs for distributed machine learning on a cluster including a plurality of nodes are disclosed. A machine learning process is performed in each of the plurality of nodes based on a respective subset of training data to calculate a local parameter. The training data is partitioned over the plurality of nodes. A plurality of operation nodes are determined from the plurality of nodes based on a status of the machine learning process performed in each of the plurality of nodes. The plurality of operation nodes are connected to form a network topology. An aggregated parameter is generated by merging local parameters calculated in each of the plurality of operation nodes in accordance with the network topology.
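The flow in this abstract (local training per node, selection of operation nodes by status, merging along a topology) can be illustrated with a small Python sketch. Everything concrete here is an assumption made for the example: the local model is a one-parameter least-squares slope, the topology is a binary tree, the merge rule is an example-count-weighted average, and the names `local_fit`, `tree_merge`, and `aggregate` are hypothetical.

```python
def local_fit(partition):
    """Local learning on one node: least-squares slope through the origin
    for that node's (x, y) pairs, returned with the example count."""
    sxy = sum(x * y for x, y in partition)
    sxx = sum(x * x for x, _ in partition)
    return sxy / sxx, len(partition)

def tree_merge(params):
    """Merge (parameter, weight) pairs pairwise, level by level, as values
    would flow up a binary-tree topology toward the root."""
    level = list(params)
    while len(level) > 1:
        merged = []
        for i in range(0, len(level) - 1, 2):
            (w1, n1), (w2, n2) = level[i], level[i + 1]
            merged.append(((w1 * n1 + w2 * n2) / (n1 + n2), n1 + n2))
        if len(level) % 2:  # an unpaired node passes through unchanged
            merged.append(level[-1])
        level = merged
    return level[0]

def aggregate(partitions, status):
    """Keep only nodes whose learning process reports success, then merge."""
    operational = [p for p, ok in zip(partitions, status) if ok]
    return tree_merge([local_fit(p) for p in operational])
```

For example, `aggregate(partitions, [True, True, False, True])` leaves the third node out of the topology, so a failed or stalled node does not contribute to the aggregated parameter.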
-
Publication No.: US20090287618A1
Publication date: 2009-11-19
Application No.: US12123270
Filing date: 2008-05-19
CPC classification: H04L51/12, G06N99/005
Abstract: Embodiments are directed towards using a community of weighted results from local and global message classifiers to determine whether a message is spam. Each local classifier may receive a message that is to be evaluated to determine whether it is spam. A local classifier receives the message and performs a classification of the message. The local classifier may receive predictions of whether the message is spam from at least one global classifier. The local and global predictions are combined using, in one embodiment, a regression analysis to generate a single local message classification. Combining the local and global predictions is directed towards enabling a community of predictions to be used to classify messages. The user may then re-classify this output, which in turn is used as feedback to modify the weights applied to the local and received global predictions for a next message.
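One way to picture the weighted combination and the feedback step is an online logistic regression over the classifiers' predictions. The sketch below is an assumption-laden illustration, not the patented embodiment: the abstract says only "a regression analysis", so the logistic form, the centering of predictions at 0.5, the learning rate, and the class name `CommunityCombiner` are all hypothetical.

```python
import math

class CommunityCombiner:
    """Combines one local and several global spam probabilities."""

    def __init__(self, n_global, lr=0.5):
        # One weight for the local prediction plus one per global classifier.
        self.weights = [1.0] * (n_global + 1)
        self.lr = lr

    def combine(self, local_pred, global_preds):
        """Return a single spam probability from all predictions."""
        feats = [local_pred] + list(global_preds)
        z = sum(w * (p - 0.5) for w, p in zip(self.weights, feats))
        return 1.0 / (1.0 + math.exp(-z))

    def feedback(self, local_pred, global_preds, is_spam):
        """User re-classification: nudge each weight toward the classifiers
        that agreed with the user (one gradient step on the log loss)."""
        feats = [local_pred] + list(global_preds)
        err = (1.0 if is_spam else 0.0) - self.combine(local_pred, global_preds)
        self.weights = [w + self.lr * err * (p - 0.5)
                        for w, p in zip(self.weights, feats)]
```

After a user marks a message as spam, a global classifier that predicted spam gains weight relative to a local classifier that predicted not-spam, so the next similar message is scored closer to the user's judgment.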
-
Publication No.: US08174974B2
Publication date: 2012-05-08
Application No.: US12617442
Filing date: 2009-11-12
IPC classification: H04L12/26
CPC classification: H04L47/823, H04L47/72, H04L47/781, H04L47/801, H04L47/805, H04L47/822, H04L47/824
Abstract: Embodiments are directed towards employing an admission controller (AC) network device to coordinate voluntary requests by traffic source devices (TSDs) to transmit traffic over a network. The TSDs submit voluntary requests to transmit network traffic during an allocated time frame to the AC. The AC monitors historical network traffic data and, based on various allocation policies, provides permission to at least some of the TSDs in the form of a nonexclusive lease of bandwidth with a rate cap for an allocated time frame. The TSDs receiving the lease voluntarily agree to transmit traffic not exceeding the rate cap for the time frame of the lease. TSDs that receive a zero rate cap voluntarily agree not to transmit. However, urgent network traffic bypasses the AC. The allocation policies used to determine the rate cap and number of permitted senders include a reactive approach, a predictive approach, and a predictive-reactive approach.
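A reactive allocation policy of the kind mentioned in this abstract might look roughly like the following Python sketch. The abstract does not give a formula, so the equal split of remaining headroom, the first-come ordering of permitted senders, and the names `allocate_rate_caps`, `capacity`, `recent_load`, and `max_senders` are all assumptions made for illustration. Urgent traffic, which bypasses the AC, is not modeled.

```python
def allocate_rate_caps(requests, capacity, recent_load, max_senders):
    """Return {sender: rate_cap} for one lease time frame.

    requests     -- sender ids that asked to transmit, in arrival order
    capacity     -- link capacity (any consistent bandwidth unit)
    recent_load  -- recently observed traffic, in the same unit
    max_senders  -- policy limit on the number of permitted senders
    """
    headroom = max(0.0, capacity - recent_load)
    # Permit the earliest requesters, but nobody if there is no headroom.
    permitted = requests[:max_senders] if headroom > 0 else []
    cap = headroom / len(permitted) if permitted else 0.0
    # A zero rate cap means the sender is asked not to transmit.
    return {s: (cap if s in permitted else 0.0) for s in requests}
```

With `capacity=100.0`, `recent_load=40.0`, and `max_senders=2`, three requesters `a`, `b`, `c` would receive caps of 30.0, 30.0, and 0.0. The leases are nonexclusive and compliance is voluntary, so this function only computes the caps and does not enforce them.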
-
Publication No.: US20120016642A1
Publication date: 2012-01-19
Application No.: US12836188
Filing date: 2010-07-14
Applicant: Lihong Li, Wei Chu, John Langford, Robert Schapire
Inventor: Lihong Li, Wei Chu, John Langford, Robert Schapire
IPC classification: G06F17/10, G06F15/173
CPC classification: G06Q30/0255, G06Q30/02, G06Q30/0269
Abstract: Methods and apparatus for performing computer-implemented personalized recommendations are disclosed. User information pertaining to a plurality of features of a plurality of users may be obtained. In addition, item information pertaining to a plurality of features of a plurality of items may be obtained. A plurality of sets of coefficients of a linear model may be obtained based at least in part on the user information and/or the item information such that each of the plurality of sets of coefficients corresponds to a different one of the plurality of items, where each of the plurality of sets of coefficients includes a plurality of coefficients, each of the plurality of coefficients corresponding to one of the plurality of features. In addition, at least one of the plurality of coefficients may be shared among the plurality of sets of coefficients for the plurality of items. Each of a plurality of scores for a user may be calculated using the linear model based at least in part upon a corresponding one of the plurality of sets of coefficients associated with a corresponding one of the plurality of items, where each of the plurality of scores indicates a level of interest in a corresponding one of the plurality of items. A plurality of confidence intervals may be ascertained, each of the plurality of confidence intervals indicating a range representing a level of confidence in a corresponding one of the plurality of scores associated with a corresponding one of the plurality of items. One of the plurality of items for which a sum of a corresponding one of the plurality of scores and a corresponding one of the plurality of confidence intervals is highest may be recommended.
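The selection rule at the end of this abstract (recommend the item whose score plus confidence interval is highest) can be sketched in Python as follows. This is an illustration under assumptions: the abstract does not say how the confidence intervals are computed, so the count-based width `alpha / sqrt(1 + pulls)` is a simple stand-in, and the names `recommend`, `item_coefs`, `shared_coefs`, `pulls`, and `alpha` are hypothetical.

```python
import math

def recommend(user, item_coefs, shared_coefs, pulls, alpha=1.0):
    """Return (item, upper_bound) maximizing score + confidence width.

    user         -- the user's feature vector
    item_coefs   -- {item: item-specific coefficients}
    shared_coefs -- coefficients shared by every item's coefficient set
    pulls        -- {item: number of times the item was shown so far}
    """
    best_item, best_bound = None, -math.inf
    for item, own in item_coefs.items():
        coefs = own + shared_coefs  # full coefficient set for this item
        score = sum(c * f for c, f in zip(coefs, user))
        # Stand-in confidence width: shrinks as the item is shown more often.
        width = alpha / math.sqrt(1 + pulls.get(item, 0))
        if score + width > best_bound:
            best_item, best_bound = item, score + width
    return best_item, best_bound
```

Adding the interval width to the score favors items whose estimates are still uncertain, so a rarely shown item can be recommended over one with a higher score but a tight interval.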