In-line network accelerator
    Granted patent

    Publication number: US10129153B2

    Publication date: 2018-11-13

    Application number: US15595925

    Application date: 2017-05-15

    Abstract: A smart NIC (Network Interface Card) is provided with features to enable the smart NIC to operate as an in-line NIC between a host's NIC and a network. The smart NIC provides pass-through transmission of network flows for the host. Packets sent to and from the host pass through the smart NIC. As a pass-through point, the smart NIC is able to accelerate the performance of the pass-through network flows by analyzing packets, inserting packets, dropping packets, inserting or recognizing congestion information, and so forth. In addition, the smart NIC provides a lightweight transport protocol (LTP) module that enables it to establish connections with other smart NICs. The LTP connections allow the smart NICs to exchange data without passing network traffic through their respective hosts.
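
    The following minimal Python sketch illustrates the pass-through decision described in the abstract: frames carrying a (hypothetical) LTP EtherType terminate at the smart NIC itself and never reach the host, while ordinary traffic passes through between the host's NIC and the network. All names, classes, and the EtherType value are illustrative assumptions, not details from the patent.

        from dataclasses import dataclass, field

        LTP_ETHERTYPE = 0x88B5  # assumed value; the patent does not name one

        @dataclass
        class Frame:
            ethertype: int
            payload: bytes

        @dataclass
        class Port:
            name: str
            sent: list = field(default_factory=list)

            def transmit(self, frame: Frame) -> None:
                self.sent.append(frame)

        class LtpModule:
            """Terminates smart-NIC-to-smart-NIC traffic locally, bypassing the host."""

            def __init__(self) -> None:
                self.received = []

            def receive(self, frame: Frame) -> None:
                self.received.append(frame)

        def forward(frame: Frame, from_network: bool,
                    host_port: Port, network_port: Port, ltp: LtpModule) -> None:
            if frame.ethertype == LTP_ETHERTYPE:
                ltp.receive(frame)            # LTP traffic never reaches the host
            elif from_network:
                host_port.transmit(frame)     # network -> host pass-through
            else:
                network_port.transmit(frame)  # host -> network pass-through

        # An LTP frame is consumed by the NIC; an IPv4 frame passes through to the host.
        host, net, ltp = Port("host"), Port("net"), LtpModule()
        forward(Frame(LTP_ETHERTYPE, b"accelerator data"), True, host, net, ltp)
        forward(Frame(0x0800, b"ipv4 packet"), True, host, net, ltp)
        assert len(ltp.received) == 1 and len(host.sent) == 1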

    Customized Integrated Circuit For Serial Performance Of Smith Waterman Analysis

    Publication number: US20180137237A1

    Publication date: 2018-05-17

    Application number: US15349725

    Application date: 2016-11-11

    CPC classification number: G16B30/00 G06F7/02 G06F17/10 G16B30/10

    Abstract: Comparisons between two nucleotide sequences can be performed by customized integrated circuitry that can implement a Smith Waterman analysis in series, as opposed to the parallel implementations known in the art. Serial performance enables such customized integrated circuitry to take advantage of optimizations, including enveloping thresholds that demarcate between cells of a two-dimensional matrix for which nucleotide comparisons are to be performed and cells for which no such comparison need be performed and, instead, a value of zero can simply be entered. Additionally, such customized integrated circuitry facilitates the combination of multiple control units, each directing the comparison of a unique pair of nucleotides, with a single calculation engine that can generate values for individual cells of the two-dimensional matrices by which such pairs of nucleotides are compared.
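
    As a rough illustration of the enveloping-threshold idea, the Python sketch below computes Smith-Waterman scores serially, one cell at a time, and simply leaves a zero in any cell outside an assumed band around the diagonal. The band width and the scoring parameters are illustrative assumptions, not values from the patent.

        def smith_waterman_banded(a: str, b: str, band: int = 4,
                                  match: int = 2, mismatch: int = -1,
                                  gap: int = -2) -> int:
            """Serial Smith-Waterman; cells outside the envelope stay zero."""
            rows, cols = len(a) + 1, len(b) + 1
            H = [[0] * cols for _ in range(rows)]
            best = 0
            for i in range(1, rows):          # strictly serial: one cell at a time
                for j in range(1, cols):
                    if abs(i - j) > band:     # outside the envelope:
                        continue              # skip the comparison, leave zero
                    score = match if a[i - 1] == b[j - 1] else mismatch
                    H[i][j] = max(0,
                                  H[i - 1][j - 1] + score,  # align a[i-1] with b[j-1]
                                  H[i - 1][j] + gap,        # gap in b
                                  H[i][j - 1] + gap)        # gap in a
                    best = max(best, H[i][j])
            return best

        assert smith_waterman_banded("GATTACA", "GATTACA") == 14  # 7 matches x 2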

    MACHINE LEARNING CLASSIFICATION ON HARDWARE ACCELERATORS WITH STACKED MEMORY
    Patent application (pending, published)

    Publication number: US20160379137A1

    Publication date: 2016-12-29

    Application number: US14754323

    Application date: 2015-06-29

    CPC classification number: G06N20/00 G06F9/46 G06F9/50 Y02D10/22

    Abstract: A method is provided for processing a machine learning classification model on an acceleration component. The machine learning classification model includes a plurality of decision trees, the decision trees including a first amount of decision tree data. The acceleration component includes an acceleration component die and a memory stack disposed in an integrated circuit package. The acceleration component die includes an acceleration component memory having a second amount of memory less than the first amount of decision tree data. The memory stack has a memory bandwidth greater than about 50 GB/sec and a power efficiency greater than about 20 MB/sec/mW. The method includes slicing the model into a plurality of model slices, each of the model slices having a third amount of decision tree data less than or equal to the second amount of memory; storing the plurality of model slices on the memory stack; and, for each of the model slices, copying the model slice to the acceleration component memory and processing the model slice using a set of input data on the acceleration component to produce a slice result.
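
    The control flow of the claimed method can be sketched in Python as below: pack trees into slices that each fit the (smaller) accelerator memory, then process one slice at a time and combine the per-slice results. The tree sizes, the capacity figure, and the summing of slice results are illustrative assumptions; only the slice/copy/process structure follows the abstract.

        Tree = tuple  # (size_in_bytes, partial_score): a toy stand-in for tree data

        def slice_model(trees, capacity):
            """Greedily pack trees into slices no larger than accelerator memory."""
            slices, used = [[]], 0
            for size, score in trees:
                if used + size > capacity and slices[-1]:
                    slices.append([])         # start a new model slice
                    used = 0
                slices[-1].append((size, score))
                used += size
            return slices

        def process_slice(model_slice):
            """Stand-in for copying a slice to accelerator memory and scoring it."""
            return sum(score for _, score in model_slice)

        trees = [(40, 0.2), (50, -0.1), (30, 0.4), (90, 0.1)]
        slices = slice_model(trees, capacity=100)        # packs into three slices
        result = sum(process_slice(s) for s in slices)   # combine slice results
        assert abs(result - 0.6) < 1e-9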


    CONVOLUTIONAL NEURAL NETWORKS ON HARDWARE ACCELERATORS
    Patent application (pending, published)

    Publication number: US20160379109A1

    Publication date: 2016-12-29

    Application number: US14754367

    Application date: 2015-06-29

    CPC classification number: G06N3/063 G06F15/7803 G06N3/04 G06N3/0454

    Abstract: A hardware acceleration component is provided for implementing a convolutional neural network. The hardware acceleration component includes an array of N rows and M columns of functional units, an array of N input data buffers configured to store input data, and an array of M weights data buffers configured to store weights data. Each of the N input data buffers is coupled to a corresponding one of the N rows of functional units. Each of the M weights data buffers is coupled to a corresponding one of the M columns of functional units. Each functional unit in a row is configured to receive a same set of input data. Each functional unit in a column is configured to receive a same set of weights data from the weights data buffer coupled to that column. Each of the functional units is configured to perform a convolution of the received input data and the received weights data, and the M columns of functional units are configured to provide M planes of output data.
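
    A behavioral Python sketch of the array follows: every functional unit in row i reads the same input buffer, every unit in column j reads the same weights buffer, and column j's units together produce output plane j. The dimensions and the dot-product stand-in for convolution are simplifying assumptions.

        N, M, K = 3, 4, 5   # input rows, weight columns, elements per buffer
        inputs = [[float(i * K + k) for k in range(K)] for i in range(N)]   # N buffers
        weights = [[0.1 * (j + 1)] * K for j in range(M)]                   # M buffers

        def functional_unit(x, w):
            """One FU: 'convolve' (here, dot-product) its input and weight vectors."""
            return sum(xi * wi for xi, wi in zip(x, w))

        # Row i broadcasts inputs[i] across its row; column j broadcasts weights[j]
        # down its column; column j's N results form output plane j.
        output_planes = [[functional_unit(inputs[i], weights[j]) for i in range(N)]
                         for j in range(M)]
        assert len(output_planes) == M and all(len(p) == N for p in output_planes)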


    Machine learning classification on hardware accelerators with stacked memory

    Publication number: US10452995B2

    Publication date: 2019-10-22

    Application number: US14754323

    Application date: 2015-06-29

    Abstract: A method is provided for processing a machine learning classification model on an acceleration component. The machine learning classification model includes a plurality of decision trees, the decision trees including a first amount of decision tree data. The acceleration component includes an acceleration component die and a memory stack disposed in an integrated circuit package. The acceleration component die includes an acceleration component memory having a second amount of memory less than the first amount of decision tree data. The memory stack has a memory bandwidth greater than about 50 GB/sec and a power efficiency greater than about 20 MB/sec/mW. The method includes slicing the model into a plurality of model slices, each of the model slices having a third amount of decision tree data less than or equal to the second amount of memory; storing the plurality of model slices on the memory stack; and, for each of the model slices, copying the model slice to the acceleration component memory and processing the model slice using a set of input data on the acceleration component to produce a slice result.

    Lightweight transport protocol
    Granted patent

    Publication number: US09888095B2

    Publication date: 2018-02-06

    Application number: US14752713

    Application date: 2015-06-26

    CPC classification number: H04L69/165 H04L12/4633 H04L49/25 H04L49/30

    Abstract: A smart NIC (Network Interface Card) is provided with features to enable the smart NIC to operate as an in-line NIC between a host's NIC and a network. The smart NIC provides pass-through transmission of network flows for the host. Packets sent to and from the host pass through the smart NIC. As a pass-through point, the smart NIC is able to accelerate the performance of the pass-through network flows by analyzing packets, inserting packets, dropping packets, inserting or recognizing congestion information, and so forth. In addition, the smart NIC provides a lightweight transport protocol (LTP) module that enables it to establish connections with other smart NICs. The LTP connections allow the smart NICs to exchange data without passing network traffic through their respective hosts.

    SERVER SYSTEMS WITH HARDWARE ACCELERATORS INCLUDING STACKED MEMORY
    Patent application (pending, published)

    Publication number: US20160379686A1

    Publication date: 2016-12-29

    Application number: US14754295

    Application date: 2015-06-29

    Abstract: A server unit component is provided that includes a host component including a CPU, and an acceleration component coupled to the host component. The acceleration component includes an acceleration component die and a memory stack. The acceleration component die and the memory stack are disposed in an integrated circuit package. The memory stack has a memory bandwidth greater than about 50 GB/sec and a power efficiency greater than about 20 MB/sec/mW.
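
    A quick back-of-envelope check ties the two quoted figures together (assuming 1 GB = 1000 MB): sustaining 50 GB/sec at 20 MB/sec/mW implies a memory power budget of about 2.5 W.

        bandwidth_mb_per_s = 50 * 1000        # 50 GB/sec expressed in MB/sec
        efficiency_mb_per_s_per_mw = 20       # quoted power efficiency
        power_mw = bandwidth_mb_per_s / efficiency_mb_per_s_per_mw
        print(power_mw / 1000, "W")           # -> 2.5 W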


    Convolutional neural networks on hardware accelerators

    Publication number: US11200486B2

    Publication date: 2021-12-14

    Application number: US16440948

    Application date: 2019-06-13

    Abstract: A hardware acceleration component is provided for implementing a convolutional neural network. The hardware acceleration component includes an array of N rows and M columns of functional units, an array of N input data buffers configured to store input data, and an array of M weights data buffers configured to store weights data. Each of the N input data buffers is coupled to a corresponding one of the N rows of functional units. Each of the M weights data buffers is coupled to a corresponding one of the M columns of functional units. Each functional unit in a row is configured to receive a same set of input data. Each functional unit in a column is configured to receive a same set of weights data from the weights data buffer coupled to that column. Each of the functional units is configured to perform a convolution of the received input data and the received weights data, and the M columns of functional units are configured to provide M planes of output data.

    Deep neural network partitioning on servers

    Publication number: US10452971B2

    Publication date: 2019-10-22

    Application number: US14754384

    Application date: 2015-06-29

    Abstract: A method is provided for implementing a deep neural network on a server component that includes a host component including a CPU and a hardware acceleration component coupled to the host component. The deep neural network includes a plurality of layers. The method includes partitioning the deep neural network into a first segment and a second segment, the first segment including a first subset of the plurality of layers, the second segment including a second subset of the plurality of layers, configuring the host component to implement the first segment, and configuring the hardware acceleration component to implement the second segment.
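
    A minimal Python sketch of the partitioning: split the layer list at an index, run the first segment on the host, and feed its output to the second segment standing in for the acceleration component. The toy layers and the split point are illustrative assumptions.

        layers = [lambda x: x + 1, lambda x: x * 2,
                  lambda x: x - 3, lambda x: x * x]   # a toy four-layer network

        def partition(net, split):
            """First subset of layers for the host, second for the accelerator."""
            return net[:split], net[split:]

        def run(segment, x):
            for layer in segment:
                x = layer(x)
            return x

        host_segment, accel_segment = partition(layers, split=2)
        intermediate = run(host_segment, 1.0)      # host component runs segment 1
        result = run(accel_segment, intermediate)  # accelerator runs segment 2
        assert result == ((1.0 + 1) * 2 - 3) ** 2  # == 1.0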
