Publication number: US20240320047A1
Publication date: 2024-09-26
Application number: US18575147
Filing date: 2022-02-24
Applicant: Intel Corporation
Inventor: Jianhui Li, Zhennan Qin, Jiong Gong, Jingze Cui, Yijie Mei, Yunfei Song
IPC: G06F9/50
CPC classification number: G06F9/5027
Abstract: Systems, apparatuses and methods may provide for technology that identifies a data layout associated with input tensors and output tensors, generates a micro-kernel based at least in part on the data layout, and generates a nested outer loop for a kernel, wherein the micro-kernel performs one or more subtasks associated with a task represented by the kernel. The technology also includes micro-kernel code caches, fused kernel generators and cyclic dependence free graph partitioning for deep learning workloads.
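The split the abstract describes, a micro-kernel that handles one subtask and a nested outer loop that covers the whole task, can be illustrated with a blocked matrix multiplication. This is only a minimal sketch under stated assumptions, not the method claimed in the application: the function names `micro_kernel` and `blocked_matmul`, the tile sizes `mb`, `nb`, `kb`, and the row-major list-of-lists layout are all hypothetical choices made for illustration.

```python
def micro_kernel(a, b, c, i0, j0, k0, mb, nb, kb):
    """Subtask: accumulate one (mb x nb) tile of C from an (mb x kb) tile of A
    and a (kb x nb) tile of B. Assumes row-major list-of-lists tensors
    (a hypothetical layout chosen for this sketch)."""
    for i in range(i0, i0 + mb):
        for j in range(j0, j0 + nb):
            acc = c[i][j]
            for k in range(k0, k0 + kb):
                acc += a[i][k] * b[k][j]
            c[i][j] = acc


def blocked_matmul(a, b, mb=2, nb=2, kb=2):
    """Kernel: a nested outer loop over tiles that delegates each tile to the
    micro-kernel. Tile sizes are illustrative defaults, not tuned values."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, mb):          # outer loop over row tiles of C
        for j0 in range(0, n, nb):      # outer loop over column tiles of C
            for k0 in range(0, k, kb):  # outer loop over the reduction dimension
                micro_kernel(
                    a, b, c, i0, j0, k0,
                    min(mb, m - i0),    # clip edge tiles to the tensor bounds
                    min(nb, n - j0),
                    min(kb, k - k0),
                )
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(blocked_matmul(a, b))  # [[58, 64], [139, 154]]
```

In this sketch the data layout only shows up as the indexing pattern inside `micro_kernel`; a layout-aware generator of the kind the abstract mentions would presumably emit a different inner routine per layout while leaving the outer loop structure unchanged.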