-
Publication No.: US20120092040A1
Publication Date: 2012-04-19
Application No.: US13335333
Filing Date: 2011-12-22
Applicant: Ning-Yi Xu, Feng-Hsiung Hsu, Xiong-Fei Cai
Inventor: Ning-Yi Xu, Feng-Hsiung Hsu, Xiong-Fei Cai
IPC Classification: H03K19/177
CPC Classification: G06F17/30675
Abstract: Accelerator systems and methods are disclosed that utilize FPGA technology to achieve better parallelism and flexibility. The accelerator system may be used to implement a relevance-ranking algorithm, such as RankBoost, for a training process. The algorithm and related data structures may be organized to enable streaming data access and thus increase the training speed. The data may be compressed so that the system and method can operate on larger data sets. At least a portion of the approximated RankBoost algorithm may be implemented as a single-instruction, multiple-data (SIMD) architecture with multiple processing engines (PEs) in the FPGA. Thus, large data sets can be loaded into memories associated with the FPGA to increase the speed of the relevance-ranking algorithm.
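The abstract above describes mapping an approximated RankBoost weak-learner search onto multiple processing engines that stream over the training data. The following Python sketch illustrates that idea in software only, assuming feature values have already been quantized into bins and that per-document weights pi(d) have been derived from the pair-weight distribution; the names (train_weak_ranker, binned_features, doc_weights, n_pes) are illustrative and do not come from the patent.

```python
import numpy as np

def train_weak_ranker(binned_features, doc_weights, n_bins=256, n_pes=4):
    """One round of an approximated RankBoost weak-learner search.

    A minimal software sketch of the SIMD/PE idea from the abstract, not the
    FPGA implementation: each hypothetical "processing engine" owns a slice of
    the feature columns and accumulates a weighted histogram of binned feature
    values in a single streaming pass over the documents.

    binned_features: (n_docs, n_features) uint8 array of quantized feature values.
    doc_weights:     (n_docs,) per-document weights pi(d), assumed to be derived
                     from the pair-weight distribution in a pre-processing step.
    """
    n_docs, n_features = binned_features.shape
    histograms = np.zeros((n_features, n_bins))

    # Distribute feature columns across the PEs; each PE streams the same
    # document weights and updates only its own histogram slice.
    for pe_cols in np.array_split(np.arange(n_features), n_pes):
        for f in pe_cols:
            np.add.at(histograms[f], binned_features[:, f], doc_weights)

    # Suffix sum over bins gives r(f, theta): the total weight of documents
    # whose binned value is at or above the threshold bin theta.
    r = np.cumsum(histograms[:, ::-1], axis=1)[:, ::-1]

    # The favored weak ranker maximizes |r(f, theta)|.
    best_feature, best_bin = np.unravel_index(np.argmax(np.abs(r)), r.shape)
    return best_feature, best_bin, r[best_feature, best_bin]
```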
-
Publication No.: US08583569B2
Publication Date: 2013-11-12
Application No.: US13335333
Filing Date: 2011-12-22
Applicant: Ning-Yi Xu, Xiong-Fei Cai, Feng-Hsiung Hsu
Inventor: Ning-Yi Xu, Xiong-Fei Cai, Feng-Hsiung Hsu
IPC Classification: G06F15/18
CPC Classification: G06F17/30675
Abstract: Accelerator systems and methods are disclosed that utilize FPGA technology to achieve better parallelism and flexibility. The accelerator system may be used to implement a relevance-ranking algorithm, such as RankBoost, for a training process. The algorithm and related data structures may be organized to enable streaming data access and thus increase the training speed. The data may be compressed so that the system and method can operate on larger data sets. At least a portion of the approximated RankBoost algorithm may be implemented as a single-instruction, multiple-data (SIMD) architecture with multiple processing engines (PEs) in the FPGA. Thus, large data sets can be loaded into memories associated with the FPGA to increase the speed of the relevance-ranking algorithm.
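The abstract also notes that the data may be compressed so that larger training sets fit in accelerator-attached memory. One plausible reading is fixed-width quantile binning of feature values; the sketch below shows only that interpretation, and quantize_features and its parameters are hypothetical names, not the patent's actual encoding.

```python
import numpy as np

def quantize_features(features, n_bins=256):
    """Compress float feature columns into fixed-width integer bins.

    A rough illustration of the data-compression idea in the abstract (the
    on-board encoding is not described here): per-feature quantile boundaries
    map 32-bit floats to 8-bit bin indices, shrinking the data roughly 4x so
    larger training sets fit in accelerator-attached memory.
    """
    quantiles = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    boundaries = np.quantile(features, quantiles, axis=0)  # (n_bins - 1, n_features)
    binned = np.empty(features.shape, dtype=np.uint8)
    for f in range(features.shape[1]):
        binned[:, f] = np.searchsorted(boundaries[:, f], features[:, f])
    return binned, boundaries
```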
-
Publication No.: US20100076911A1
Publication Date: 2010-03-25
Application No.: US12238012
Filing Date: 2008-09-25
Applicant: Ning-Yi Xu, Junyan Chen, Rui Gao, Xiong-Fei Cai, Feng-Hsiung Hsu
Inventor: Ning-Yi Xu, Junyan Chen, Rui Gao, Xiong-Fei Cai, Feng-Hsiung Hsu
IPC Classification: G06F15/18
CPC Classification: G06F17/30864, G06N99/005
Abstract: A method is provided that uses a RankBoost-based algorithm to automatically select features for further ranking-model training. The method iteratively applies a set of ranking candidates to a training data set comprising a plurality of ranking objects having a known pairwise ranking order. Each round of iteration applies a weight distribution over ranking-object pairs, yields a ranking result for each ranking candidate, identifies a favored ranking candidate for the round based on the ranking results, and updates the weight distribution to be used in the next round by increasing the weights of ranking-object pairs that are poorly ranked by the favored ranking candidate. The method then infers a target feature set from the favored ranking candidates identified in the iterations.
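As a rough, software-only illustration of the boosting loop described above, the sketch below applies a weight distribution over document pairs, picks the candidate with the lowest weighted pairwise error each round, and re-weights the pairs that candidate ranks poorly. The inputs (candidate_scores, pairs) and the doubling re-weighting rule are assumptions made for illustration; the patent's exact update rule is not reproduced here.

```python
import numpy as np

def select_features(candidate_scores, pairs, n_rounds=50):
    """Boosting-style feature selection loosely following the abstract.

    candidate_scores: (n_candidates, n_docs) scores each ranking candidate
                      assigns to the documents (a hypothetical pre-computed input).
    pairs:            list of (i, j) document index pairs where document i is
                      known to rank above document j.
    Returns the candidate indices favored across the rounds, standing in for
    the "target feature set" used in further model training.
    """
    pairs = np.asarray(pairs)
    weights = np.full(len(pairs), 1.0 / len(pairs))   # initial pair-weight distribution
    favored = []

    for _ in range(n_rounds):
        # A candidate ranks a pair correctly if it scores doc i above doc j;
        # its weighted error is the total weight of the pairs it gets wrong.
        correct = candidate_scores[:, pairs[:, 0]] > candidate_scores[:, pairs[:, 1]]
        errors = (weights * ~correct).sum(axis=1)

        best = int(np.argmin(errors))                 # favored candidate this round
        favored.append(best)

        # Increase the weights of pairs the favored candidate ranked poorly so
        # later rounds concentrate on them (a generic boosting-style update).
        weights = np.where(correct[best], weights, weights * 2.0)
        weights /= weights.sum()

    return sorted(set(favored))
```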
-
Publication No.: US08301638B2
Publication Date: 2012-10-30
Application No.: US12238012
Filing Date: 2008-09-25
Applicant: Ning-Yi Xu, Feng-Hsiung Hsu, Rui Gao, Xiong-Fei Cai, Junyan Chen
Inventor: Ning-Yi Xu, Feng-Hsiung Hsu, Rui Gao, Xiong-Fei Cai, Junyan Chen
CPC Classification: G06F17/30864, G06N99/005
Abstract: A method is provided that uses a RankBoost-based algorithm to automatically select features for further ranking-model training. The method iteratively applies a set of ranking candidates to a training data set comprising a plurality of ranking objects having a known pairwise ranking order. Each round of iteration applies a weight distribution over ranking-object pairs, yields a ranking result for each ranking candidate, identifies a favored ranking candidate for the round based on the ranking results, and updates the weight distribution to be used in the next round by increasing the weights of ranking-object pairs that are poorly ranked by the favored ranking candidate. The method then infers a target feature set from the favored ranking candidates identified in the iterations.
-
Publication No.: US08131659B2
Publication Date: 2012-03-06
Application No.: US12238239
Filing Date: 2008-09-25
Applicant: Ning-Yi Xu, Xiong-Fei Cai, Rui Gao, Jing Yan, Feng-Hsiung Hsu
Inventor: Ning-Yi Xu, Xiong-Fei Cai, Rui Gao, Jing Yan, Feng-Hsiung Hsu
IPC Classification: G06Q30/00
CPC Classification: G06N3/063
Abstract: Accelerator systems and methods are disclosed that utilize FPGA technology to achieve better parallelism and processing speed. A field-programmable gate array (FPGA) is configured with hardware logic that performs computations associated with a neural-network training algorithm, in particular a Web relevance-ranking algorithm such as LambdaRank. The training data is first processed and organized by a host computing device and then streamed to the FPGA for direct access, enabling high-bandwidth computation and increased training speed. Thus, large data sets such as those related to Web relevance ranking can be processed. The FPGA may include a processing element that performs the computations of a hidden layer of the neural-network training algorithm. Parallel computing may be realized using a single-instruction, multiple-data (SIMD) architecture with multiple arithmetic logic units in the FPGA.
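The abstract above assigns the hidden-layer computation of a LambdaRank-style neural network to FPGA processing elements with many parallel arithmetic logic units. The numpy sketch below shows only the corresponding software computation: a batched hidden-layer forward pass and a simplified pairwise lambda-gradient step. The function names, the tanh activation, and the omission of the NDCG-based scaling are assumptions made for brevity, not details from the patent.

```python
import numpy as np

def hidden_layer_scores(features, w_hidden, b_hidden, w_out):
    """Score a streamed batch of documents with a one-hidden-layer ranker.

    A minimal numpy sketch of the hidden-layer computation the abstract
    assigns to an FPGA processing element; the matrix products stand in for
    the many arithmetic logic units evaluating hidden units in parallel.
    The weights and layer sizes here are illustrative only.
    """
    hidden = np.tanh(features @ w_hidden + b_hidden)   # hidden-layer activations
    return hidden @ w_out                              # one relevance score per document

def lambda_gradients(scores, pairs):
    """Pairwise LambdaRank-style gradients for the documents of one query.

    Simplified: for each pair (i, j) with i more relevant than j, the score
    gap is pushed apart with a logistic weight; the NDCG-based |delta|
    scaling used by full LambdaRank is omitted.
    """
    lambdas = np.zeros_like(scores)
    for i, j in pairs:
        rho = 1.0 / (1.0 + np.exp(scores[i] - scores[j]))
        lambdas[i] += rho
        lambdas[j] -= rho
    return lambdas
```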
-
Publication No.: US20100076915A1
Publication Date: 2010-03-25
Application No.: US12238239
Filing Date: 2008-09-25
Applicant: Ning-Yi Xu, Xiong-Fei Cai, Rui Gao, Jing Yan, Feng-Hsiung Hsu
Inventor: Ning-Yi Xu, Xiong-Fei Cai, Rui Gao, Jing Yan, Feng-Hsiung Hsu
IPC Classification: G06N3/08
CPC Classification: G06N3/063
Abstract: Accelerator systems and methods are disclosed that utilize FPGA technology to achieve better parallelism and processing speed. A field-programmable gate array (FPGA) is configured with hardware logic that performs computations associated with a neural-network training algorithm, in particular a Web relevance-ranking algorithm such as LambdaRank. The training data is first processed and organized by a host computing device and then streamed to the FPGA for direct access, enabling high-bandwidth computation and increased training speed. Thus, large data sets such as those related to Web relevance ranking can be processed. The FPGA may include a processing element that performs the computations of a hidden layer of the neural-network training algorithm. Parallel computing may be realized using a single-instruction, multiple-data (SIMD) architecture with multiple arithmetic logic units in the FPGA.
-
Publication No.: US08868470B2
Publication Date: 2014-10-21
Application No.: US12942736
Filing Date: 2010-11-09
Applicant: Ning-Yi Xu, Feng-Hsiung Hsu, Feng Yan
Inventor: Ning-Yi Xu, Feng-Hsiung Hsu, Feng Yan
CPC Classification: G06F9/5061
Abstract: Systems, methods, and devices are described for implementing learning algorithms on data sets. A data set may be partitioned into a plurality of data partitions that may be distributed to two or more processors, such as graphics processing units. The data partitions may be processed in parallel by each of the processors to determine local counts associated with the data partitions. The local counts may then be aggregated to form a global count that reflects the local counts for the data set. The partitioning may be performed by a data-partition algorithm, and the processing and the aggregating may be performed by a parallel collapsed Gibbs sampling (CGS) algorithm and/or a parallel collapsed variational Bayesian (CVB) algorithm. In addition, the CGS and/or CVB algorithms may be associated with the data-partition algorithm and may be parallelized to train a latent Dirichlet allocation model.
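As a software-only illustration of the partition/local-count/global-count scheme described above, the sketch below runs a simplified collapsed Gibbs sweep over each data partition against a snapshot of the global topic-word counts and then aggregates the local count deltas. The variable names (partitions, topic_word, doc_topic) and the single-machine loop standing in for multiple processors are assumptions, not details from the patent.

```python
import numpy as np

def parallel_cgs_sweep(partitions, z, topic_word, topic_sum, doc_topic,
                       n_topics, alpha=0.1, beta=0.01, rng=np.random):
    """One sweep of a simplified parallel collapsed Gibbs sampler for LDA.

    Illustrative only: in the scheme the abstract describes, each partition
    would run on its own processor (e.g. a GPU) against a snapshot of the
    global topic-word counts, accumulating local count deltas that are
    aggregated afterwards. Here the partitions are processed in a plain loop.

    partitions: list of lists of (doc_id, word_id, token_index) tokens.
    z:          dict mapping token_index -> current topic assignment.
    """
    vocab_size = topic_word.shape[1]
    local_deltas = []

    for part in partitions:                      # each iteration = one worker
        snapshot = topic_word.copy()             # stale global counts for sampling
        snapshot_sum = topic_sum.copy()
        delta = np.zeros_like(topic_word)

        for doc, word, tok in part:
            k_old = z[tok]
            doc_topic[doc, k_old] -= 1
            snapshot[k_old, word] -= 1
            snapshot_sum[k_old] -= 1

            # Collapsed Gibbs conditional p(topic | everything else).
            p = ((doc_topic[doc] + alpha)
                 * (snapshot[:, word] + beta)
                 / (snapshot_sum + vocab_size * beta))
            k_new = rng.choice(n_topics, p=p / p.sum())

            z[tok] = k_new
            doc_topic[doc, k_new] += 1
            snapshot[k_new, word] += 1
            snapshot_sum[k_new] += 1
            delta[k_old, word] -= 1               # local count delta for this worker
            delta[k_new, word] += 1

        local_deltas.append(delta)

    # Aggregate the local counts into the global count.
    for delta in local_deltas:
        topic_word += delta
    topic_sum[:] = topic_word.sum(axis=1)
    return z, topic_word, topic_sum, doc_topic
```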
-
Publication No.: US20120117008A1
Publication Date: 2012-05-10
Application No.: US12942736
Filing Date: 2010-11-09
Applicant: Ning-Yi Xu, Feng-Hsiung Hsu, Feng Yan
Inventor: Ning-Yi Xu, Feng-Hsiung Hsu, Feng Yan
CPC Classification: G06F9/5061
Abstract: Systems, methods, and devices are described for implementing learning algorithms on data sets. A data set may be partitioned into a plurality of data partitions that may be distributed to two or more processors, such as graphics processing units. The data partitions may be processed in parallel by each of the processors to determine local counts associated with the data partitions. The local counts may then be aggregated to form a global count that reflects the local counts for the data set. The partitioning may be performed by a data-partition algorithm, and the processing and the aggregating may be performed by a parallel collapsed Gibbs sampling (CGS) algorithm and/or a parallel collapsed variational Bayesian (CVB) algorithm. In addition, the CGS and/or CVB algorithms may be associated with the data-partition algorithm and may be parallelized to train a latent Dirichlet allocation model.