-
Publication No.: US20230305841A1
Publication date: 2023-09-28
Application No.: US17701308
Filing date: 2022-03-22
Inventors: Shubham Jain, Geoffrey Burr, Yasuteru Kohda
CPC classification: G06F9/30036, G06F9/30032, G06F9/3877
Abstract: Efficient data layout and alignment techniques for effectively executing AI workloads in wide-vector accelerator systems are provided. In one aspect, a method for processing AI workloads includes: logically dividing a data vector into a hierarchy of segments and sub-segments with each of the segments including more than one of the sub-segments, wherein each of the sub-segments includes words, and each of the words includes data-bits; and physically mapping the data-bits such that the words belonging to a same given one of the sub-segments are mapped contiguously across all of the segments. An AI accelerator system is also provided.
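The mapping described in this abstract can be read as an index permutation. The sketch below is one possible interpretation, not the patented layout: it assumes the logical order is segment-major and that the physical order groups sub-segment k of every segment together. The names `physical_index` and `remap`, and the word-level (rather than bit-level) granularity, are illustrative assumptions.

```python
def physical_index(seg, sub, word, num_segments, words_per_sub):
    """Physical slot of a word, with same-numbered sub-segments kept adjacent.

    Assumed layout: all copies of sub-segment `sub` (one per segment) are
    stored back to back, ordered by segment.
    """
    return (sub * num_segments + seg) * words_per_sub + word


def remap(vector, num_segments, subs_per_segment, words_per_sub):
    """Permute a logically ordered vector into the assumed physical order."""
    expected = num_segments * subs_per_segment * words_per_sub
    if len(vector) != expected:
        raise ValueError("vector length does not match the hierarchy")
    out = [None] * expected
    for seg in range(num_segments):
        for sub in range(subs_per_segment):
            for word in range(words_per_sub):
                # Logical order: segment-major, then sub-segment, then word.
                logical = (seg * subs_per_segment + sub) * words_per_sub + word
                dst = physical_index(seg, sub, word, num_segments, words_per_sub)
                out[dst] = vector[logical]
    return out


if __name__ == "__main__":
    # 2 segments x 2 sub-segments x 2 words
    print(remap(list(range(8)), 2, 2, 2))
```

With two segments of two sub-segments each, sub-segment 0 of both segments lands in the first half of the physical vector and sub-segment 1 in the second half.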
-
Publication No.: US20190278727A1
Publication date: 2019-09-12
Application No.: US16425396
Filing date: 2019-05-29
Inventors: Yasuteru Kohda, Nobuyuki Ohba
IPC classification: G06F13/364, H04L29/08
Abstract: A method for arbitrating data transfer requests from a plurality of nodes includes specifying one or more nodes among the plurality of nodes, the one or more nodes satisfying a predetermined condition, and selecting, if two or more nodes are specified among the plurality of nodes, one node from the two or more nodes using priority information, the priority information indicating correspondence between the plurality of nodes and a plurality of priorities each assigned to one of the plurality of nodes, the correspondence changing so that the plurality of priorities are assigned equally to each of the plurality of nodes and high and low relations appear equally between pairs of priorities each assigned to a pair of nodes of the plurality of nodes.
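One simple way to obtain the two fairness properties named in this abstract is to cycle through every permutation of the priority order. This is only an illustrative sketch and is not claimed to be the patented scheme: the class name `PermutationArbiter`, the choice to advance the order on every arbitration, and the use of "is requesting" as the predetermined condition are all assumptions.

```python
from itertools import permutations


class PermutationArbiter:
    """Grants one requester per call, cycling through all priority orders.

    Over a full cycle of n! orders, every node holds every priority level
    equally often, and for every pair of nodes each one outranks the other
    in exactly half of the orders.
    """

    def __init__(self, num_nodes):
        self._orders = list(permutations(range(num_nodes)))
        self._step = 0

    def grant(self, requesting):
        """Return the winning node id, or None if nobody is requesting."""
        order = self._orders[self._step % len(self._orders)]
        self._step += 1  # assumption: the order changes on every arbitration
        candidates = set(requesting)
        for node in order:  # order[0] has the highest priority
            if node in candidates:
                return node
        return None


if __name__ == "__main__":
    arbiter = PermutationArbiter(3)
    print([arbiter.grant([0, 1, 2]) for _ in range(6)])
```

A plain rotation of the priority order would equalize how often each node holds each priority, but not the pairwise high/low relations, which is why the sketch enumerates all permutations. That becomes impractical for large node counts.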
-
Publication No.: US20190139840A1
Publication date: 2019-05-09
Application No.: US16234852
Filing date: 2018-12-28
Abstract: A chip intermediate body includes a semiconductor region including plural chip areas. The chip areas respectively are cut out as semiconductor chips. A cut region is provided along edges of the chip areas, the cut region being cut to cut out the semiconductor chips. A contact region is provided opposite to the chip areas across the cut region, the contact region being configured to be contacted by a probe of a test unit to test the chip areas, and electric wiring is provided continuously with the cut region to connect the chip areas and the contact region.
-
Publication No.: US09794877B2
Publication date: 2017-10-17
Application No.: US14785658
Filing date: 2014-04-11
Inventors: Yasunao Katayama, Yasuteru Kohda, Kohji Takano
IPC classification: H04W52/02
CPC classification: H04W52/0216, H04W52/0229, H04W52/0248, Y02D70/00
Abstract: Transmitting device, receiving device, communication device, programs, transmission method, and receiving method for wireless communication of continuous data in the form of packets. A transmitting device includes a data receiving unit that receives continuous data from a network, the continuous data including actual data and null data; a packetizing unit that deletes at least a part of the null data from the continuous data to generate a packet for wireless communication; a transmitting unit that modulates the packet into a radio carrier wave and wirelessly transmits the resulting packet; and a control unit that causes the transmitting unit to stop transmission of the radio carrier wave during at least a part of a time period in which no such packet is transmitted wirelessly.
-
Publication No.: US20170170890A1
Publication date: 2017-06-15
Application No.: US15272630
Filing date: 2016-09-22
Inventors: Kohji Takano, Daiju Nakano, Yasuteru Kohda
CPC classification: H04B7/0868, H04B1/04, H04B1/709, H04B7/0408
Abstract: Methods and systems for receiving radio frequency (RF) signals include adjusting a digital baseband signal from a first RF front-end to compensate for errors based on a correlation value from a first correlator. A set of digital baseband signals from a set of respective additional RF front-ends is adjusted to compensate for errors based on correlation values from a second correlator. The adjusted digital baseband signal from the first RF front-end and the adjusted set of digital baseband signals from the set of respective additional RF front-ends are combined.
-
Publication No.: US12045612B2
Publication date: 2024-07-23
Application No.: US17931537
Filing date: 2022-09-12
CPC classification: G06F9/30036, G06F9/3555, G06N20/00
Abstract: An efficient pipelined implementation of digital scaling, offset and aggregation operation supports element-by-element programmable scale and offset factors. The method includes time-multiplexed parallel pipelining of a plurality of digital data words, each of the plurality of digital data words encoding an N-bit signed integer, from one of a plurality of receive-registers through a datapath that can either (1) store the plurality of digital data words directly in a dedicated first memory, (2) store the plurality of digital data words directly in a dedicated second memory, or (3) direct the plurality of digital data words into a parallel set of fused-multiply-add units. The method further includes multiplying each digital data word by a corresponding data-word retrieved from the dedicated first memory to form product data words and adding the product data words to a corresponding data-word retrieved from the dedicated second memory to form output sum-and-product data words.
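The three datapath options in this abstract amount to loading per-element scale factors, loading per-element offsets, or computing `x * scale + offset` element by element. The following is a minimal behavioral sketch under that reading. The class `ScaleOffsetUnit` and its method names are hypothetical, and it models neither the pipelining, the time multiplexing, nor the N-bit integer width.

```python
class ScaleOffsetUnit:
    """Behavioral model of an element-wise scale-and-offset datapath."""

    def __init__(self, width):
        self.width = width
        self.scale = [1] * width   # stands in for the dedicated first memory
        self.offset = [0] * width  # stands in for the dedicated second memory

    def _check(self, words):
        if len(words) != self.width:
            raise ValueError("expected %d words" % self.width)

    def load_scale(self, words):
        """Path (1): store incoming words as scale factors."""
        self._check(words)
        self.scale = list(words)

    def load_offset(self, words):
        """Path (2): store incoming words as offsets."""
        self._check(words)
        self.offset = list(words)

    def fma(self, words):
        """Path (3): multiply by the scale, then add the offset, per element."""
        self._check(words)
        return [x * s + o for x, s, o in zip(words, self.scale, self.offset)]


if __name__ == "__main__":
    unit = ScaleOffsetUnit(3)
    unit.load_scale([2, 3, 4])
    unit.load_offset([1, -1, 0])
    print(unit.fma([5, -2, 7]))
```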
-
Publication No.: US20240220572A1
Publication date: 2024-07-04
Application No.: US18092183
Filing date: 2022-12-30
Abstract: A compute engine is configured to perform self-attention computations by delaying performance of a division operation of a softmax computation, the performance including iteratively computing a first matrix multiplication of a given row vector of a first matrix and each column vector of a second matrix while determining a first scalar element representing a maximum value of the iterative first matrix multiplications; iteratively subtracting a corresponding determined first scalar element from a result of each computed first matrix multiplication and computing an elementwise exponential function based on a result of the subtraction operation to generate a plurality of elements of a given row vector of a fourth matrix; iteratively computing a second matrix multiplication of a given row vector of the fourth matrix and each column vector of a third matrix while summing the given row vectors of the fourth matrix; and computing a row vector of an output matrix.
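Delaying the softmax division means the exponentials are multiplied into the value matrix unnormalized, and a single division by their sum is applied at the end. The sketch below illustrates this for one output row in plain Python. It assumes the first, second and third matrices play the roles of queries, keys and values, that keys are supplied as rows (so each row is a column of the transposed key matrix), and it omits any score scaling. The function names are illustrative.

```python
import math


def attention_row_delayed(q, keys, values):
    """One output row of self-attention with the softmax division deferred."""
    # First multiplication: one score per key, tracking the running maximum.
    scores = []
    max_score = -math.inf
    for k in keys:
        s = sum(a * b for a, b in zip(q, k))
        scores.append(s)
        max_score = max(max_score, s)
    # Subtract the maximum and exponentiate (a row of the "fourth matrix").
    exps = [math.exp(s - max_score) for s in scores]
    # Second multiplication on unnormalized weights, summing them alongside.
    denom = 0.0
    numer = [0.0] * len(values[0])
    for e, v in zip(exps, values):
        denom += e
        for c, x in enumerate(v):
            numer[c] += e * x
    # The single deferred division.
    return [n / denom for n in numer]


def attention_row_reference(q, keys, values):
    """Conventional order: normalize the weights first, then multiply."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))]


if __name__ == "__main__":
    q = [0.5, -1.0]
    keys = [[1.0, 2.0], [3.0, 4.0], [-1.0, 0.5]]
    values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    print(attention_row_delayed(q, keys, values))
    print(attention_row_reference(q, keys, values))
```

Both functions give the same result up to rounding. The deferred form performs one division per output element instead of one per attention weight.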
-
Publication No.: US20240211532A1
Publication date: 2024-06-27
Application No.: US18083011
Filing date: 2022-12-16
Abstract: Systems and methods for performing layer normalization are described. A circuit can receive a sequence of input data across a plurality of clock cycles, where the sequence of input data represents a portion of an input vector. The circuit can determine a plurality of sums and a plurality of sums of squares corresponding to the sequence of input data. The circuit can determine, based on the plurality of sums of squares, a first scalar representing an inverse square-root of a variance of vector elements in the input vector. The circuit can determine a second scalar representing a negation of a product of the first scalar and a mean of the vector elements in the input vector. The circuit can determine, based on the first scalar, the second scalar and the received sequence of input data, an output vector that is a normalization of the input vector.
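The two scalars in this abstract reduce normalization to a single multiply-add per element: with a = 1/sqrt(variance) and b = -(a * mean), each output is a * x + b. The sketch below accumulates the sum and sum of squares chunk by chunk, with each chunk standing in for the data received in one clock cycle. The function name `layer_norm_stream`, the `eps` stabilizer, and the use of the population variance E[x^2] - mean^2 are assumptions that the abstract does not state.

```python
import math


def layer_norm_stream(chunks, eps=1e-5):
    """Normalize a vector delivered as a sequence of chunks.

    `eps` is an assumed numerical stabilizer; it is not part of the abstract.
    """
    total = 0.0
    total_sq = 0.0
    count = 0
    for chunk in chunks:  # one chunk per (modeled) clock cycle
        total += sum(chunk)
        total_sq += sum(x * x for x in chunk)
        count += len(chunk)
    mean = total / count
    variance = total_sq / count - mean * mean
    a = 1.0 / math.sqrt(variance + eps)  # first scalar: inverse square root
    b = -(a * mean)                      # second scalar: negated product
    # Second pass over the buffered data: one multiply-add per element.
    return [a * x + b for chunk in chunks for x in chunk]


if __name__ == "__main__":
    print(layer_norm_stream([[1.0, 2.0], [3.0, 4.0]]))
```

Note that computing the variance as E[x^2] - mean^2 can lose precision when the mean is large relative to the spread; the sketch keeps that form only because it matches the sums and sums of squares named in the abstract.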
-
Publication No.: US11144489B2
Publication date: 2021-10-12
Application No.: US16425396
Filing date: 2019-05-29
Inventors: Yasuteru Kohda, Nobuyuki Ohba
IPC classification: H04L29/08, G06F13/364, H04L12/803
Abstract: A method for arbitrating data transfer requests from a plurality of nodes includes specifying one or more nodes among the plurality of nodes, the one or more nodes satisfying a predetermined condition, and selecting, if two or more nodes are specified among the plurality of nodes, one node from the two or more nodes using priority information, the priority information indicating correspondence between the plurality of nodes and a plurality of priorities each assigned to one of the plurality of nodes, the correspondence changing so that the plurality of priorities are assigned equally to each of the plurality of nodes and high and low relations appear equally between pairs of priorities each assigned to a pair of nodes of the plurality of nodes.
-
Publication No.: US10999191B2
Publication date: 2021-05-04
Application No.: US16739929
Filing date: 2020-01-10
Inventors: Yasuteru Kohda, Nobuyuki Ohba
IPC classification: H04L12/721, H04L29/08, H04L12/931, H04L12/937, H04L12/865
Abstract: A method is provided for packet broadcasting in a mesh-interconnected multi-computer network having a plurality of routers interconnected to a plurality of arbiters. The method includes live-lock free arbitering, by each of the plurality of arbiters, between two or more packet broadcast requests using a shared priority matrix, implemented by a binary matrix, that selects one of the two or more packet broadcast requests and includes a column for each of the plurality of routers, the shared priority matrix being shared amongst the plurality of arbiters and storing priority information determined from summing the matrix column values and relating to a correspondence between a plurality of packet broadcast requests, including the two or more packet broadcast requests, with respect to priorities assigned to each of the plurality of packet broadcast requests. Each of the columns of the shared priority matrix corresponds to a respective one of the routers.