-
Publication Number: US11681602B2
Publication Date: 2023-06-20
Application Number: US16896216
Filing Date: 2020-06-09
Inventors: Lin Li, Xiaoyang Li, Zhiqiang Hui, Zheng Wang, Zongpu Qi
CPC Classification: G06F11/3419, G06F9/48, G06F11/3024, G06F11/3089, G06F11/3466, G06F2209/508
Abstract: A performance analysis system includes a picker module and a calculation circuit. The picker module is placed in a processing device to capture a plurality of pieces of time information of a unit circuit for each of a plurality of tasks during the total execution time of processing the plurality of tasks. The calculation circuit performs an interval analysis operation on the time information. The interval analysis operation includes: calculating an overlap period between a current task and a previous task; and counting the time occupied by the unit circuit during the total execution time according to the relation between the current time interval of the current task corresponding to the unit circuit and the overlap period.
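A minimal Python sketch of the interval analysis the abstract describes, assuming each task reports a (start, end) interval during which the unit circuit is busy; the function name and data layout are illustrative and not taken from the patent.

```python
# Illustrative only: accumulate the busy time of one unit circuit across tasks,
# subtracting the overlap between each task and the previous one so that
# overlapping periods are not counted twice.

def accumulate_busy_time(intervals):
    """intervals: list of (start, end) tuples in task order."""
    busy = 0
    prev_start = prev_end = None
    for start, end in intervals:
        if prev_end is None:
            overlap = 0
        else:
            # Overlap period between the current task and the previous task.
            overlap = max(0, min(end, prev_end) - max(start, prev_start))
        busy += (end - start) - overlap
        prev_start, prev_end = start, end
    return busy

# Three tasks on one unit circuit; the second overlaps the first by 2 cycles.
print(accumulate_busy_time([(0, 10), (8, 15), (20, 25)]))   # 10 + (7 - 2) + 5 = 20
```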
-
Publication Number: US10310761B2
Publication Date: 2019-06-04
Application Number: US15797617
Filing Date: 2017-10-30
Inventors: Zongpu Qi, Di Hu, Wei Zhao, Zheng Wang, Xiaoyang Li
IPC Classification: G06F3/06
Abstract: A storage device includes a memory unit, an access monitor, and a memory configurator. The memory unit includes a plurality of memory blocks. The access monitor is configured to monitor whether an access mode of the memory unit is a continuous-access mode or a random-access mode, and to generate a monitor signal accordingly. The memory configurator generates a configuration signal according to the monitor signal, configuring any of the memory blocks to be in either a cache mode or an SRAM mode.
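A rough software sketch of the idea, assuming the monitor classifies an address stream as continuous when most consecutive accesses are sequential, and that continuous streams map the blocks to an SRAM mode while random streams map them to a cache mode; the threshold and the policy are assumptions for illustration, not the patented logic.

```python
# Assumed policy for illustration: a mostly-sequential address stream is treated
# as continuous access and the blocks are used as directly addressed SRAM;
# otherwise the blocks behave as a cache.

SEQUENTIAL_THRESHOLD = 0.75   # fraction of sequential accesses needed for "continuous"

def classify_access_mode(addresses, stride=1):
    sequential = sum(1 for prev, cur in zip(addresses, addresses[1:]) if cur == prev + stride)
    ratio = sequential / max(1, len(addresses) - 1)
    return "continuous" if ratio >= SEQUENTIAL_THRESHOLD else "random"

def configure_blocks(num_blocks, access_mode):
    mode = "sram" if access_mode == "continuous" else "cache"
    return [mode] * num_blocks          # the "configuration signal" in this model

trace = [100, 101, 102, 103, 104, 500, 105, 106]
mode = classify_access_mode(trace)
print(mode, configure_blocks(4, mode))   # random ['cache', 'cache', 'cache', 'cache']
```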
-
Publication Number: US10754648B2
Publication Date: 2020-08-25
Application Number: US16027983
Filing Date: 2018-07-05
Inventors: Jing Chen, Xiaoyang Li, Weilin Wang, Jiin Lai
IPC Classification: G06F9/30
Abstract: A microprocessor capable of executing a micro-instruction for series calculation is provided. The microprocessor includes an instruction decoder and an execution circuit for series calculation. The instruction decoder decodes the micro-instruction, whose source operands correspond to an undetermined number x and a plurality of coefficients a0 to an (for x^0 to x^n). The execution circuit for series calculation includes at least one multiplier for calculating exponentiation values of x (e.g., x^p) and at least one multiply-and-accumulate unit (MAU) for combining x, the exponentiation values of x, and the coefficients a0 to an to complete the series calculation.
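A minimal sketch of the series calculation a0 + a1*x + ... + an*x^n structured the way the abstract describes it: a multiplier produces successive exponentiation values of x, and a multiply-and-accumulate step folds in each coefficient. The interface is made up for this example.

```python
# Illustrative only: evaluate a0 + a1*x + a2*x^2 + ... + an*x^n with one running
# power of x (the "multiplier") and one multiply-and-accumulate per coefficient.

def evaluate_series(x, coefficients):
    """coefficients: [a0, a1, ..., an]."""
    acc = 0.0
    power = 1.0                   # x^0
    for a in coefficients:
        acc += a * power          # multiply-and-accumulate: acc += a_p * x^p
        power *= x                # multiplier: next exponentiation value of x
    return acc

print(evaluate_series(2.0, [1.0, 2.0, 3.0]))   # 1 + 2*2 + 3*4 = 17.0
```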
-
Publication Number: US10776108B2
Publication Date: 2020-09-15
Application Number: US16163790
Filing Date: 2018-10-18
Inventors: Jing Chen, Xiaoyang Li, Juanli Song, Zhenhua Huang, Weilin Wang, Jiin Lai
Abstract: A microprocessor provides at least two storage areas and uses a datapath for Booth multiplication. According to a first and a second field of a microinstruction, the datapath obtains multiplicand number supply data from the first storage area and multiplier number supply data from the second storage area. The datapath operates according to a word length indicated in a third field of the microinstruction. The datapath extracts multi-bit acquisitions for Booth multiplication from the multiplier number supply data, divides the multiplicand number supply data into multiplicand numbers according to the word length, and performs Booth multiplication on the multiplicand numbers based on the multi-bit acquisitions to obtain partial products. According to the word length, the datapath selects a part of the partial products to be shifted and added to generate a plurality of products.
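A functional sketch of radix-4 Booth multiplication, the recoding scheme the abstract builds on: overlapping 3-bit acquisitions of the multiplier select digits in {-2, -1, 0, +1, +2}, each digit forms a shifted partial product, and the partial products are summed. This is a plain scalar model; the patented word-length splitting of the supply data is not reproduced.

```python
# Illustrative radix-4 Booth multiplication: each overlapping 3-bit acquisition
# of the multiplier is recoded into a digit, the digit scales the multiplicand
# into a partial product, and the shifted partial products are summed.

BOOTH_DIGIT = {  # (b[i+1], b[i], b[i-1]) -> recoded digit
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}

def booth_multiply(multiplicand, multiplier, width):
    """Multiply two signed integers that fit in an even `width` bits."""
    assert width % 2 == 0

    def bit(i):
        # b[-1] is the implicit 0 below the LSB; Python's >> sign-extends negatives.
        return 0 if i < 0 else (multiplier >> i) & 1

    product = 0
    for j in range(width // 2):
        digit = BOOTH_DIGIT[(bit(2 * j + 1), bit(2 * j), bit(2 * j - 1))]
        product += (digit * multiplicand) << (2 * j)   # shifted partial product
    return product

print(booth_multiply(13, -7, 8), 13 * -7)   # both -91
```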
-
Publication Number: US10754646B2
Publication Date: 2020-08-25
Application Number: US16163776
Filing Date: 2018-10-18
Inventors: Jing Chen, Xiaoyang Li, Juanli Song, Zhenhua Huang, Weilin Wang, Jiin Lai
Abstract: A microprocessor with Booth multiplication is provided, in which several acquisition registers are used. For a first word length, a first acquisition register stores an unsigned ending acquisition of a first multiplier number carried in multiplier number supply data, and a third acquisition register stores a starting acquisition of a second multiplier number carried in the multiplier number supply data. For a second word length that is longer than the first word length, a fourth acquisition register stores a middle acquisition of a third multiplier number carried in the multiplier number supply data. A partial product selection circuit selects a partial product of the Booth multiplication based on either the third acquisition register (corresponding to the first word length) or the fourth acquisition register (corresponding to the second word length).
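A small sketch of why different word lengths need different acquisition registers: the same multiplier supply data yields different sets of overlapping 3-bit acquisitions depending on how it is split into words, so the acquisitions at word boundaries must be stored and selected separately. The packing and naming here are assumptions, not the patent's register layout.

```python
# Illustrative only: the same 16-bit multiplier supply data produces different
# overlapping 3-bit acquisitions at word boundaries when it is interpreted as one
# 16-bit multiplier number versus two packed 8-bit multiplier numbers.

def acquisitions(data, total_bits, word_len):
    """Split `data` into `word_len`-bit words and list each word's Booth acquisitions."""
    def bit(word, i):
        return 0 if i < 0 else (word >> i) & 1

    groups = []
    for w in range(total_bits // word_len):
        word = (data >> (w * word_len)) & ((1 << word_len) - 1)
        groups.append([(bit(word, 2 * j + 1), bit(word, 2 * j), bit(word, 2 * j - 1))
                       for j in range(word_len // 2)])
    return groups

data = 0xB3C5
print(acquisitions(data, 16, 16))   # one 16-bit word: 8 acquisitions
print(acquisitions(data, 16, 8))    # two 8-bit words: boundary acquisitions differ
```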
-
Publication Number: US20200244281A1
Publication Date: 2020-07-30
Application Number: US16425973
Filing Date: 2019-05-30
Inventors: Lin Li, Zheng Wang, Xiaoyang Li, Zongpu Qi
Abstract: An accelerated compression method and apparatus are provided. The accelerated compression apparatus includes a look-ahead buffer, a string matching processing pipeline, and a control circuit. A string to be compressed, extracted from a data register, is stored in the look-ahead buffer. The string to be compressed includes Q characters, and a repeat flag is stored in the look-ahead buffer for each character of the string. P instances are issued in parallel in each issue cycle. When all the characters included in the P substrings corresponding to the P instances are identical to each other, the control circuit sets the repeat flags of the start characters corresponding to the last (P−1) instances among the P instances to a set state. An instance in which the repeat flag of any of its characters is not in the set state is sent to the string matching processing pipeline for a matching operation, while an instance in which the repeat flags of all of its characters are in the set state is prevented from being sent to the string matching processing pipeline.
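A simplified software sketch of the repeat-flag filtering, assuming P = 4 instances per issue cycle and 3-character substrings; the buffer layout and parameters are assumptions for illustration only.

```python
# Illustrative only: flag runs of identical characters, then skip instances whose
# substring characters are all flagged instead of sending them to the pipeline.

P = 4          # instances issued per cycle (assumed)
SUB_LEN = 3    # characters per substring (assumed minimum match length)

def filter_instances(buffer):
    n = len(buffer)
    repeat = [False] * n
    # Pass 1: wherever the P substrings of an issue group are all identical,
    # set the repeat flags of the start characters of the last P - 1 instances.
    for base in range(0, n - (P + SUB_LEN - 1) + 1, P):
        if len(set(buffer[base: base + P + SUB_LEN - 1])) == 1:
            for k in range(1, P):
                repeat[base + k] = True
    # Pass 2: an instance goes to the matching pipeline only if at least one of
    # its characters is not flagged.
    to_pipeline, skipped = [], []
    for start in range(0, n - SUB_LEN + 1):
        (skipped if all(repeat[start: start + SUB_LEN]) else to_pipeline).append(start)
    return to_pipeline, skipped

sent, skipped = filter_instances("aaaaaaaaaaaabcdabcd")
print("skipped:", skipped)   # start positions filtered out of the pipeline
print("sent:", sent)
```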
-
Publication Number: US20190286974A1
Publication Date: 2019-09-19
Application Number: US16004454
Filing Date: 2018-06-11
Inventors: Xiaoyang Li, Mengchen Yang, Zhenhua Huang, Weilin Wang, Jiin Lai
Abstract: A processing circuit and its neural network computation method are provided. The processing circuit includes multiple processing elements (PEs), multiple auxiliary memories, a system memory, and a configuration module. The PEs perform computation processes. Each auxiliary memory corresponds to one of the PEs and is coupled to two other auxiliary memories. The system memory is coupled to all of the auxiliary memories and is configured to be accessed by the PEs. The configuration module is coupled to the PEs, the auxiliary memories, and the system memory to form a network-on-chip (NoC) structure. The configuration module statically configures the computation operations of the PEs and the data transmissions on the NoC structure according to a neural network computation. Accordingly, the neural network computation is optimized and high computation performance is provided.
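A loose software model of the described layout, assuming the auxiliary memories are chained so that each PE consumes data from its own auxiliary memory and forwards results to the next one, with the first PE fed from system memory; the per-PE operations and scheduling are illustrative assumptions, not the patented NoC configuration.

```python
# Illustrative only: PEs chained through their auxiliary memories; the first PE
# is fed from system memory and the last writes results back. The operations per
# PE stand in for the statically configured computation steps.

from collections import deque

class PE:
    def __init__(self, name, op):
        self.name = name
        self.op = op            # statically configured computation
        self.aux = deque()      # this PE's auxiliary memory, modeled as a queue

def run_ring(pes, system_memory):
    for value in system_memory["input"]:
        pes[0].aux.append(value)                  # system memory -> first auxiliary memory
    results = []
    for i, pe in enumerate(pes):
        while pe.aux:
            out = pe.op(pe.aux.popleft())
            if i + 1 < len(pes):
                pes[i + 1].aux.append(out)        # transfer over the ring link
            else:
                results.append(out)               # last PE writes back
    system_memory["output"] = results
    return system_memory

pes = [PE("pe0", lambda x: 2 * x), PE("pe1", lambda x: x + 1), PE("pe2", lambda x: max(x, 0))]
print(run_ring(pes, {"input": [1, -3, 4]})["output"])   # [3, 0, 9]
```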
-
Publication Number: US20190243790A1
Publication Date: 2019-08-08
Application Number: US15979466
Filing Date: 2018-05-15
Inventors: Xiaoyang Li, Chen Chen, Zhenhua Huang, Weilin Wang, Jiin Lai
CPC Classification: G06F13/28, G06F9/3887, G06F13/1668, G06N3/02
Abstract: A direct memory access (DMA) engine and a method thereof are provided. The DMA engine controls data transmission from a source memory to a destination memory and includes a task configuration storing module, a control module, and a computing module. The task configuration storing module stores task configurations. The control module reads source data from the source memory according to a task configuration. The computing module performs a function computation on the source data in response to the task configuration. The control module then writes the destination data produced by the function computation to the destination memory according to the task configuration. Accordingly, on-the-fly computation is achieved during data transfer between memories.
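A minimal sketch of a DMA transfer with on-the-fly computation, assuming a task configuration that carries offsets, a length, and an optional per-element function; the field names are made up for this example.

```python
# Illustrative only: copy `length` elements from source to destination, applying
# the configured function to each element as it passes through the engine.

def dma_transfer(source, destination, task):
    """task: dict with 'src_offset', 'dst_offset', 'length', and optional 'function'."""
    fn = task.get("function", lambda x: x)       # identity when no computation is configured
    for i in range(task["length"]):
        value = source[task["src_offset"] + i]             # read from source memory
        destination[task["dst_offset"] + i] = fn(value)    # compute, then write

src = [3, -1, 4, -1, 5, -9]
dst = [0] * len(src)
dma_transfer(src, dst, {"src_offset": 0, "dst_offset": 0, "length": len(src),
                        "function": lambda x: max(x, 0)})  # e.g. an on-the-fly ReLU
print(dst)   # [3, 0, 4, 0, 5, 0]
```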
-
Publication Number: US20190213478A1
Publication Date: 2019-07-11
Application Number: US15928114
Filing Date: 2018-03-22
Inventors: Xiaoyang Li, Jing Chen
Abstract: A micro-processor circuit and a method of performing a neural network operation are provided. The micro-processor circuit is suitable for performing the neural network operation and includes a parameter generation module, a compute module, and truncation logic. The parameter generation module receives in parallel a plurality of input parameters and a plurality of weight parameters of the neural network operation, and generates in parallel a plurality of sub-output parameters according to the input parameters and the weight parameters. The compute module receives the sub-output parameters in parallel and sums them to generate a summed parameter. The truncation logic receives the summed parameter and performs a truncation operation on it to generate a plurality of output parameters of the neural network operation.
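A single-node sketch of the described dataflow, assuming fixed-point values: sub-output parameters are produced from input/weight pairs, summed, and truncated to a narrower output format. The bit widths and saturation policy are assumptions.

```python
# Illustrative only: one output value computed from parallel input/weight pairs,
# summed, then truncated (shifted and saturated) to a narrower fixed-point format.

def neural_node(inputs, weights, frac_bits=4, out_bits=8):
    sub_outputs = [x * w for x, w in zip(inputs, weights)]   # parameter generation
    summed = sum(sub_outputs)                                # compute module
    truncated = summed >> frac_bits                          # truncation logic
    lo, hi = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    return max(lo, min(hi, truncated))                       # saturate to output width

print(neural_node([3, -2, 5, 1], [7, 4, -1, 10]))   # (21 - 8 - 5 + 10) >> 4 = 1
```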
-
Publication Number: US11256633B2
Publication Date: 2022-02-22
Application Number: US16558618
Filing Date: 2019-09-03
Inventors: Xiaoyang Li, Chen Chen, Zongpu Qi, Tao Li, Xuehua Han, Wei Zhao, Dongxue Gao
IPC Classification: G06F13/16, G06F9/48, G06F12/1027
Abstract: A processing system includes at least one core, a plurality of accelerator function units (AFUs), and a memory access unit. The memory access unit includes at least one pipeline resource and an arbitrator. The core develops a plurality of tasks. Each AFU executes at least one of the tasks, each of which corresponds to several memory access requests. In each clock period, the arbitrator selects one of the AFUs using a round-robin method and transmits a corresponding memory access request of the selected AFU to the pipeline resource, so that the selected AFU executes the memory access request through the pipeline resource to read or write data related to the task.
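A minimal sketch of round-robin arbitration among AFUs for a shared pipeline resource, assuming one grant per clock period starting from the position after the last granted AFU; the data structures are illustrative, not the patent's interfaces.

```python
# Illustrative only: each clock period the arbitrator scans the AFUs starting
# just after the last one granted and forwards the first pending request it
# finds to the shared pipeline resource.

from collections import deque

def round_robin_arbiter(request_queues, cycles):
    """request_queues: one deque of pending memory access requests per AFU."""
    n = len(request_queues)
    last_granted = -1
    grants = []
    for _ in range(cycles):
        for offset in range(1, n + 1):
            candidate = (last_granted + offset) % n
            if request_queues[candidate]:
                grants.append((candidate, request_queues[candidate].popleft()))
                last_granted = candidate
                break                 # one request enters the pipeline per clock period
    return grants

queues = [deque(["rd A0", "wr A1"]), deque(["rd B0"]), deque(), deque(["wr D0", "rd D1"])]
print(round_robin_arbiter(queues, 5))
```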