Patent search: ap:("INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES") AND inv:"Zheng Lian"

1.

Granted invention patent
Dialogue emotion correction method based on graph neural network (valid)

Publication number: US12100418B2

Publication date: 2024-09-24

Application number: US17472511

Application date: 2021-09-10

Applicant: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES

Inventor: Jianhua Tao, Zheng Lian, Bin Liu, Xuefei Liu

IPC: G10L25/63, G06F18/25, G06F40/166, G06F40/211, G06F40/216, G06F40/284, G06F40/289, G06F40/30, G06N20/00, G06N20/20, G06V20/40, G06V40/16, G10L15/02, G10L15/26, G10L25/30

CPC classification number: G10L25/63, G06F18/253, G06F40/166, G06F40/211, G06F40/216, G06F40/284, G06F40/289, G06F40/30, G06N20/00, G06N20/20, G06V20/41, G06V40/166, G06V40/168, G10L15/02, G10L15/26, G10L25/30

Abstract: Disclosed is a dialogue emotion correction method based on a graph neural network, including: extracting acoustic features, text features, and image features from a video file and fusing them into multi-modal features; obtaining an emotion prediction result for each sentence of a dialogue in the video file by using the multi-modal features; fusing the emotion prediction result of each sentence with interaction information between talkers in the video file to obtain interaction-information-fused emotion features; combining the interaction-information-fused emotion features with the context-dependence relationship in the dialogue to obtain time-series-information-fused emotion features; and correcting, by using the time-series-information-fused emotion features, the previously obtained emotion prediction result of each sentence so as to obtain a more accurate emotion recognition result.

2.

Granted invention patent
Automatic lie detection method and apparatus for interactive scenarios, device and medium (valid)

Publication number: US11238289B1

Publication date: 2022-02-01

Application number: US17389364

Application date: 2021-07-30

Applicant: Institute of Automation, Chinese Academy of Sciences

Inventor: Jianhua Tao, Zheng Lian, Bin Liu, Licai Sun

IPC: G06K9/00, G06N3/04, G06F16/783, A61B5/16

Abstract: An automatic lie detection method and apparatus for interactive scenarios, a device and a medium to improve the accuracy of automatic lie detection are provided. The method includes: segmenting three modalities, namely a video, an audio and a text, of a to-be-detected sample; extracting short-term features of the three modalities; integrating the short-term features of the three modalities in the to-be-detected sample to obtain long-term features of the three modalities corresponding to each dialogue; integrating the long-term features of the three modalities by a self-attention mechanism to obtain a multi-modal feature of each dialogue; integrating the multi-modal feature of each dialogue with interactive information by a graph neural network to obtain a multi-modal feature integrated with the interactive information; and predicting a lie level of each dialogue according to the multi-modal feature integrated with the interactive information.

3.

Granted invention patent
Multimodal dimensional emotion recognition method (valid)

Publication number: US11281945B1

Publication date: 2022-03-22

Application number: US17468994

Application date: 2021-09-08

Applicant: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES

Inventor: Jianhua Tao, Licai Sun, Bin Liu, Zheng Lian

IPC: G06K9/00, G06K9/62, G10L25/63, G06F40/279, G10L25/30, G06V20/40, G06V40/16

Abstract: A multimodal dimensional emotion recognition method includes: acquiring a frame-level audio feature, a frame-level video feature, and a frame-level text feature from an audio, a video, and a corresponding text of a sample to be tested; performing temporal contextual modeling on the frame-level audio feature, the frame-level video feature, and the frame-level text feature respectively by using a temporal convolutional network to obtain a contextual audio feature, a contextual video feature, and a contextual text feature; performing weighted fusion on these three features by using a gated attention mechanism to obtain a multimodal feature; splicing the multimodal feature and these three features together to obtain a spliced feature, and then performing further temporal contextual modeling on the spliced feature by using a temporal convolutional network to obtain a contextual spliced feature; and performing regression prediction on the contextual spliced feature to obtain a final dimensional emotion prediction result.
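
Illustrative note: the weighted-fusion and splicing steps in this abstract can be pictured with a small sketch. This is not the patented implementation. It assumes each modality has already been reduced to a vector of the same length, that the gate produces one scalar score per modality via a dot product, and that "splicing" means concatenation. The function name `gated_attention_fusion` and the `gate_weights` parameter are hypothetical, and the temporal convolutional network stages are omitted.

```python
import math

def gated_attention_fusion(features, gate_weights):
    """Fuse equally sized modality vectors with softmax-normalised gate scores."""
    # One scalar score per modality: dot product with its (hypothetical) gate weights.
    scores = [sum(f * w for f, w in zip(feat, wts))
              for feat, wts in zip(features, gate_weights)]
    # Numerically stable softmax turns the scores into weights that sum to 1.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the modality vectors, element by element.
    dim = len(features[0])
    fused = [sum(w * feat[i] for w, feat in zip(weights, features))
             for i in range(dim)]
    return fused, weights

audio, video, text = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]
zero_gates = [[0.0, 0.0]] * 3  # untrained gates: every modality weighted equally
fused, weights = gated_attention_fusion([audio, video, text], zero_gates)
spliced = fused + audio + video + text  # "splicing" read here as concatenation
```

With all-zero gate weights the three modalities receive equal weight, so the fused vector is their element-wise mean; in a trained model the gate weights would be learned.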

4.

Granted invention patent
Multi-modal lie detection method and apparatus, and device (valid)

Publication number: US11244119B1

Publication date: 2022-02-08

Application number: US17389383

Application date: 2021-07-30

Applicant: Institute of Automation, Chinese Academy of Sciences

Inventor: Jianhua Tao, Licai Sun, Bin Liu, Zheng Lian

IPC: G06F40/35, G10L15/08, G10L15/02, G10L15/24, G06N3/04, G06K9/62, G06K9/00

Abstract: A multi-modal lie detection method and apparatus, and a device to improve the accuracy of automatic lie detection are provided. The multi-modal lie detection method includes inputting original data of three modalities, namely a to-be-detected audio, a to-be-detected video and a to-be-detected text; performing feature extraction on the input contents to obtain deep features of the three modalities; explicitly depicting first-order, second-order and third-order interactive relationships of the deep features of the three modalities to obtain an integrated multi-modal feature of each word; performing context modeling on the integrated multi-modal feature of each word to obtain a final feature of each word; and pooling the final feature of each word to obtain global features, and then obtaining a lie classification result by a fully-connected layer.
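
Illustrative note: one common way to make first-, second-, and third-order interactions between three modalities explicit is to combine the unimodal features with their pairwise and three-way products. The sketch below takes that reading as an assumption; the abstract does not state how the patent computes these relationships, and the function name `interaction_features` is hypothetical.

```python
from itertools import product

def interaction_features(audio, video, text):
    """Concatenate first-, second-, and third-order interaction terms."""
    # First order: the unimodal features themselves.
    first = list(audio) + list(video) + list(text)
    # Second order: every product between elements of two different modalities.
    second = []
    for x, y in ((audio, video), (audio, text), (video, text)):
        second.extend(a * b for a, b in product(x, y))
    # Third order: every product across all three modalities.
    third = [a * v * t for a, v, t in product(audio, video, text)]
    return first + second + third

fused = interaction_features([1.0, 2.0], [3.0], [4.0, 5.0])
print(len(fused))  # 5 first-order + 8 second-order + 4 third-order = 17
```

The output length grows with the product of the modality dimensions, so a practical system would likely apply this to compact per-word features or use a low-rank approximation.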

5.

Granted invention patent
Expression recognition method under natural scene (valid)

Publication number: US11216652B1

Publication date: 2022-01-04

Application number: US17470135

Application date: 2021-09-09

Applicant: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES

Inventor: Jianhua Tao, Mingyuan Xiao, Bin Liu, Zheng Lian

IPC: G06K9/62, G06K9/00, G06T5/00, G06T5/40, G06K9/46, G06N3/04

Abstract: An expression recognition method under a natural scene comprises: converting an input video into a video frame sequence at a specified frame rate, and performing facial expression labeling on the video frame sequence to obtain a video frame labeled sequence; removing the impact of natural light, non-face areas, and the impact of head posture on facial expression from the video frame labeled sequence to obtain an expression video frame sequence; augmenting the expression video frame sequence to obtain a video preprocessed frame sequence; from the video preprocessed frame sequence, extracting HOG features that characterize facial appearance and shape features, extracting second-order features that describe a face creasing degree, and extracting facial pixel-level deep neural network features by using a deep neural network; then, performing vector fusion on these three kinds of features to obtain facial feature fusion vectors for training; and inputting the facial feature fusion vectors into a support vector machine for expression classification.
