SKELETON-BASED ACTION RECOGNITION USING BI-DIRECTIONAL SPATIAL-TEMPORAL TRANSFORMER

Invention Application

US20220374629A1 SKELETON-BASED ACTION RECOGNITION USING BI-DIRECTIONAL SPATIAL-TEMPORAL TRANSFORMER (Legal status: in force)

Patent Title: SKELETON-BASED ACTION RECOGNITION USING BI-DIRECTIONAL SPATIAL-TEMPORAL TRANSFORMER
Application No.: US17315319

Application Date: 2021-05-09
Publication No.: US20220374629A1

Publication Date: 2022-11-24
Inventors: Bo Wu, Chuang Gan, Dakuo Wang, Kaizhi Qian
Applicant: International Business Machines Corporation
Applicant Address: US NY Armonk
Assignee: International Business Machines Corporation
Current Assignee: International Business Machines Corporation
Current Assignee Address: US NY Armonk
Main IPC: G06K9/00
IPC: G06K9/00; G06T7/246; G06K9/62; G06F3/01

Abstract:

A bi-directional spatial-temporal transformer neural network (BDSTT) is trained to predict original coordinates of a skeletal joint in a specific frame through relative relationships of the skeletal joint to other joints and to the state of the skeletal joint in other frames. Obtain a plurality of frames comprising coordinates of the skeletal joint and coordinates of other joints. Produce a spatially masked frame by masking the original coordinates of the skeletal joint. Provide the specific frame, the spatially masked frame, and at least one more frame to a coordinate prediction head of the BDSTT. Obtain, from the coordinate prediction head, a prediction of coordinates for the skeletal joint. Adjust parameters of the BDSTT until a mean-squared error, between the prediction of coordinates for the skeletal joint and the original coordinates of the skeletal joint, converges.
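The training objective in the abstract (mask a joint, predict its original coordinates from spatial and temporal context, and adjust parameters until the mean-squared error converges) can be illustrated with a small pure-Python sketch. This is not the BDSTT. The transformer is replaced by a hypothetical three-weight linear "prediction head". That head sees the same joint in the previous and next frames (temporal context) and the mean of the spatially masked frame (spatial context). The zero-valued mask, the toy linear-motion data, the helper names, the learning rate, and the convergence tolerance are all assumptions for illustration and are not taken from the patent.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Frame = List[Coord]  # one (x, y, z) triple per skeletal joint

# Assumption: a masked joint is replaced by zeros; the abstract does not say
# what value a masked joint takes.
MASK: Coord = (0.0, 0.0, 0.0)


def make_sequence(num_frames: int, num_joints: int) -> List[Frame]:
    """Hypothetical toy data: every joint moves along a straight line."""
    return [
        [(j + 0.1 * t, 2.0 * j - 0.05 * t, 0.5 * t) for j in range(num_joints)]
        for t in range(num_frames)
    ]


def spatially_mask(frame: Frame, joint: int) -> Frame:
    """Return a copy of the frame with one joint's original coordinates hidden."""
    masked = list(frame)
    masked[joint] = MASK
    return masked


def build_samples(frames: List[Frame]) -> List[Tuple[Coord, float]]:
    """One (features, target) pair per masked joint, frame, and axis.

    Features: the same joint in the previous and next frames (temporal
    context) and the mean of the spatially masked frame (spatial context).
    The target is the joint's original, unmasked coordinate.
    """
    samples = []
    for t in range(1, len(frames) - 1):
        for j in range(len(frames[t])):
            masked = spatially_mask(frames[t], j)
            for d in range(3):
                spatial = sum(c[d] for c in masked) / len(masked)
                feats = (frames[t - 1][j][d], frames[t + 1][j][d], spatial)
                samples.append((feats, frames[t][j][d]))
    return samples


def mse(params: Sequence[float], samples) -> float:
    """Mean-squared error between predicted and original coordinates."""
    return sum((sum(w * f for w, f in zip(params, feats)) - y) ** 2
               for feats, y in samples) / len(samples)


def train(samples, lr=0.002, tol=1e-10, max_steps=10000):
    """Gradient descent on the MSE until the loss stops changing."""
    # Hypothetical linear head (NOT the BDSTT): weights for the previous
    # frame, the next frame, and the masked-frame mean.
    params = [0.0, 0.0, 0.0]
    history = [mse(params, samples)]
    for _ in range(max_steps):
        grad = [0.0, 0.0, 0.0]
        for feats, y in samples:
            err = sum(w * f for w, f in zip(params, feats)) - y
            for k in range(3):
                grad[k] += 2.0 * err * feats[k] / len(samples)
        params = [w - lr * g for w, g in zip(params, grad)]
        history.append(mse(params, samples))
        if abs(history[-2] - history[-1]) < tol:  # treat as converged
            break
    return params, history


frames = make_sequence(num_frames=5, num_joints=4)
params, history = train(build_samples(frames))
print(f"MSE: {history[0]:.3f} -> {history[-1]:.6f} after {len(history) - 1} steps")
```

Because every toy joint moves linearly, averaging its previous and next positions reproduces the masked coordinates exactly, so the MSE falls toward zero. Real skeleton sequences would not admit such a perfect fit. That gap is where a learned spatial-temporal model such as the one described in the abstract would be needed.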

Published/Granted Documents

US11854305B2 Skeleton-based action recognition using bi-directional spatial-temporal transformer (grant date: 2023-12-26)

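IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06K	Graphical data reading (image or video recognition or understanding G06V); presentation of data; record carriers; handling record carriers
G06K9/00	Methods or arrangements for recognising patterns (methods or arrangements for graphic reading or for converting the pattern of mechanical parameters, e.g. force or presence, into electrical signals G06K11/00) (image or video recognition or understanding G06V) (speech recognition G10L15/00)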