Providing self-resetting multi-producer multi-consumer semaphores in distributed processor-based systems

    Publication No.: US11144368B2

    Publication Date: 2021-10-12

    Application No.: US16443954

    Filing Date: 2019-06-18

    IPC Classification: G06F9/46 G06F9/52

    Abstract: Providing self-resetting multi-producer multi-consumer semaphores in distributed processor-based systems is disclosed. In one aspect, a synchronization management circuit provides a semaphore including a counting semaphore value indicator, a current wait count indicator, and a target wait count indicator. When a consumer completes a wait operation, the synchronization management circuit adjusts the value of the current wait count indicator towards the value of the target wait count indicator, and compares the value of the current wait count indicator to the value of the target wait count indicator. If the value of the current wait count indicator has reached the value of the target wait count indicator, the synchronization management circuit infers that all consumers have observed the semaphore, and accordingly resets both the counting semaphore value indicator and the current wait count indicator to an initial wait value to place the semaphore in its initial state for reuse.
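The reset rule in this abstract can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions, not the patented circuit: the class name, the `signal_threshold` field, the non-blocking `try_wait` method, the count-up direction, and the initial value of 0 are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SelfResettingSemaphore:
    """Software model of the three indicators named in the abstract."""

    target_wait_count: int       # consumers expected to observe the semaphore
    signal_threshold: int        # hypothetical: producer posts needed before a wait completes
    value: int = 0               # counting semaphore value indicator (assumed initial value 0)
    current_wait_count: int = 0  # current wait count indicator (assumed initial value 0)

    def post(self) -> None:
        """A producer signals the semaphore."""
        self.value += 1

    def try_wait(self) -> bool:
        """Return True if a consumer's wait completes, False if it must retry.

        Assumption: consumers only observe the value; they do not decrement it.
        """
        if self.value < self.signal_threshold:
            return False
        # Move the current wait count one step towards the target.
        self.current_wait_count += 1
        if self.current_wait_count == self.target_wait_count:
            # Every consumer has observed the semaphore: restore the initial state.
            self.value = 0
            self.current_wait_count = 0
        return True


sem = SelfResettingSemaphore(target_wait_count=2, signal_threshold=2)
sem.post()
sem.post()
first = sem.try_wait()   # first consumer completes; no reset yet
second = sem.try_wait()  # second consumer completes and triggers the reset
```

After the second wait completes, both indicators are back at their initial values, so the same object can be reused for the next round without an explicit reset by any producer or consumer.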

Providing flexible matrix processors for performing neural network convolution in matrix-processor-based devices

    Publication No.: US10936943B2

    Publication Date: 2021-03-02

    Application No.: US16117952

    Filing Date: 2018-08-30

    Abstract: Providing flexible matrix processors for performing neural network convolution in matrix-processor-based devices is disclosed. In this regard, a matrix-processor-based device provides a central processing unit (CPU) and a matrix processor. The matrix processor reorganizes a plurality of weight matrices and a plurality of input matrices into swizzled weight matrices and swizzled input matrices, respectively, that have regular dimensions natively supported by the matrix processor. The matrix-processor-based device then performs a convolution operation using the matrix processor to perform matrix multiplication/accumulation operations for the regular dimensions of the weight matrices and the input matrices, and further uses the CPU to execute instructions for handling the irregular dimensions of the weight matrices and the input matrices (e.g., by executing a series of nested loops, as a non-limiting example). The matrix-processor-based device thus provides efficient hardware acceleration by taking advantage of dimensional regularity, while maintaining the flexibility to handle different variations of convolution.
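One way to picture the division of labor described here is a convolution written as nested loops around a matrix multiply/accumulate step. The sketch below is an illustration only: the function names are hypothetical, `matmul_accumulate` is a plain-Python stand-in for the matrix processor, and treating the channel dimensions as "regular" and the kernel height and width as "irregular" is an assumption rather than something the abstract specifies. Stride 1 and no padding are also assumed.

```python
def matmul_accumulate(acc, a, b):
    """Stand-in for the matrix processor: acc += a @ b on list-of-lists matrices."""
    for i in range(len(a)):
        for j in range(len(b[0])):
            acc[i][j] += sum(a[i][k] * b[k][j] for k in range(len(b)))


def conv2d_shift_gemm(inputs, weights):
    """Convolve inputs[c][y][x] with weights[k][c][r][s] (stride 1, no padding)."""
    C, H, W = len(inputs), len(inputs[0]), len(inputs[0][0])
    K, R, S = len(weights), len(weights[0][0]), len(weights[0][0][0])
    out_h, out_w = H - R + 1, W - S + 1
    # One accumulator row per output channel, one column per output pixel.
    acc = [[0] * (out_h * out_w) for _ in range(K)]

    # CPU-side nested loops over the (assumed) irregular kernel dimensions.
    for r in range(R):
        for s in range(S):
            # Swizzle: gather a K x C weight matrix for this kernel offset...
            w_mat = [[weights[k][c][r][s] for c in range(C)] for k in range(K)]
            # ...and a C x (out_h * out_w) matrix of correspondingly shifted inputs.
            x_mat = [
                [inputs[c][y + r][x + s] for y in range(out_h) for x in range(out_w)]
                for c in range(C)
            ]
            # Regular-dimension work handed to the matrix multiply/accumulate unit.
            matmul_accumulate(acc, w_mat, x_mat)

    # Un-flatten each accumulator row back into an out_h x out_w image.
    return [[row[y * out_w:(y + 1) * out_w] for y in range(out_h)] for row in acc]


image = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]  # 1 channel, 3 x 3
kernel = [[[[1, 0], [0, 1]]]]                # 1 output channel, 1 input channel, 2 x 2
result = conv2d_shift_gemm(image, kernel)    # [[[6, 8], [12, 14]]]
```

Each pass through the inner loop performs one dense matrix product of fixed shape, which is the part a hardware unit with natively supported dimensions could take over, while the outer loops absorb whatever kernel size a given layer happens to use.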

Providing efficient floating-point operations using matrix processors in processor-based systems

    Publication No.: US10747501B2

    Publication Date: 2020-08-18

    Application No.: US16118099

    Filing Date: 2018-08-30

    Abstract: Providing efficient floating-point operations using matrix processors in processor-based systems is disclosed. In this regard, a matrix-processor-based device provides a matrix processor comprising a positive partial sum accumulator and a negative partial sum accumulator. As the matrix processor processes pairs of floating-point operands, the matrix processor calculates an intermediate product based on a first floating-point operand and a second floating-point operand and determines a sign of the intermediate product. Based on the sign, the matrix processor normalizes the intermediate product with a partial sum fraction of the positive partial sum accumulator or the negative partial sum accumulator, then adds the intermediate product to the positive partial sum accumulator or the negative partial sum accumulator. After processing all pairs of floating-point operands, the matrix processor subtracts the negative partial sum accumulator from the positive partial sum accumulator to generate a final sum, then renormalizes the final sum a single time.
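The two-accumulator scheme can be modeled in software as a dot product that keeps positive and negative products apart and rounds only once at the end. The following is a rough sketch under assumptions, not the hardware design: the helper names are hypothetical, each accumulator is modeled as an unbounded Python integer fraction paired with a binary exponent, and the intermediate product is computed with the host's ordinary float multiply.

```python
import math


def to_fixed(value):
    """Split a non-negative float into (fraction, exponent) with value == fraction * 2**exponent."""
    mantissa, exponent = math.frexp(value)  # value == mantissa * 2**exponent
    return int(mantissa * (1 << 53)), exponent - 53


def align_add(acc, term):
    """Align two (fraction, exponent) pairs to a common exponent, then add the fractions."""
    (acc_frac, acc_exp), (term_frac, term_exp) = acc, term
    exp = min(acc_exp, term_exp)
    return (acc_frac << (acc_exp - exp)) + (term_frac << (term_exp - exp)), exp


def dot_split_accumulators(xs, ys):
    """Dot product using separate positive and negative partial sum accumulators."""
    pos = (0, 0)  # positive partial sum accumulator
    neg = (0, 0)  # negative partial sum accumulator (stores magnitudes)
    for x, y in zip(xs, ys):
        product = x * y                  # intermediate product
        term = to_fixed(abs(product))
        if product >= 0:                 # the sign selects the accumulator
            pos = align_add(pos, term)
        else:
            neg = align_add(neg, term)
    # Subtract once, then convert back to a float once (the single renormalization).
    exp = min(pos[1], neg[1])
    diff = (pos[0] << (pos[1] - exp)) - (neg[0] << (neg[1] - exp))
    return math.ldexp(diff, exp)


naive = sum(x * y for x, y in zip([1e16, 1.0, -1e16], [1.0, 1.0, 1.0]))
split = dot_split_accumulators([1e16, 1.0, -1e16], [1.0, 1.0, 1.0])
```

Because neither accumulator ever mixes signs, every per-term step is a magnitude addition with no cancellation, and the only subtraction and rounding happen after the last operand pair. In this example the left-to-right `sum` loses the `1.0` term when it is added to `1e16`, whereas the split version retains it.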