-
Publication number: US12100418B2
Publication date: 2024-09-24
Application number: US17472511
Application date: 2021-09-10
Inventor: Jianhua Tao , Zheng Lian , Bin Liu , Xuefei Liu
IPC: G10L25/63 , G06F18/25 , G06F40/166 , G06F40/211 , G06F40/216 , G06F40/284 , G06F40/289 , G06F40/30 , G06N20/00 , G06N20/20 , G06V20/40 , G06V40/16 , G10L15/02 , G10L15/26 , G10L25/30
CPC classification number: G10L25/63 , G06F18/253 , G06F40/166 , G06F40/211 , G06F40/216 , G06F40/284 , G06F40/289 , G06F40/30 , G06N20/00 , G06N20/20 , G06V20/41 , G06V40/166 , G06V40/168 , G10L15/02 , G10L15/26 , G10L25/30
Abstract: Disclosed is a dialogue emotion correction method based on a graph neural network, including: extracting acoustic features, text features, and image features from a video file and fusing them into multi-modal features; obtaining an emotion prediction result for each sentence of a dialogue in the video file by using the multi-modal features; fusing the emotion prediction result of each sentence with interaction information between talkers in the video file to obtain interaction-information-fused emotion features; combining the interaction-information-fused emotion features with the context-dependence relationships in the dialogue to obtain time-series-information-fused emotion features; and correcting the previously obtained emotion prediction result of each sentence by using the time-series-information-fused emotion features, so as to obtain a more accurate emotion recognition result.
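The correction idea in this abstract can be illustrated with a very small sketch. It is not the patented model: it assumes each sentence already has a probability vector over emotions, treats sentences as graph nodes, connects each sentence to its neighbours within a hypothetical `window`, and gives same-talker neighbours a hypothetical extra weight. The function name and all parameters are illustrative assumptions.

```python
def correct_predictions(probs, speakers, window=1,
                        same_speaker_weight=2.0, self_weight=0.5):
    """One neighbour-averaging step over a dialogue graph (illustrative only).

    probs    -- per-sentence emotion probability vectors
    speakers -- talker id for each sentence
    """
    n = len(probs)
    corrected = []
    for i in range(n):
        # Neighbours are the sentences within `window` positions of sentence i.
        lo, hi = max(0, i - window), min(n, i + window + 1)
        neighbours = [j for j in range(lo, hi) if j != i]
        if not neighbours:
            corrected.append(list(probs[i]))
            continue
        # Assumed edge weights: same-talker edges count more than cross-talker edges.
        weights = [same_speaker_weight if speakers[j] == speakers[i] else 1.0
                   for j in neighbours]
        total = sum(weights)
        context = [sum(w * probs[j][k] for w, j in zip(weights, neighbours)) / total
                   for k in range(len(probs[i]))]
        # Blend the original prediction with the aggregated context.
        corrected.append([self_weight * p + (1.0 - self_weight) * c
                          for p, c in zip(probs[i], context)])
    return corrected


probs = [[0.9, 0.1], [0.4, 0.6], [0.8, 0.2]]
speakers = ["A", "B", "A"]
corrected = correct_predictions(probs, speakers)
```

Because every step is a convex combination, each corrected vector still sums to 1. A trained graph neural network would learn the edge weights instead of fixing them by hand.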
-
Publication number: US11238289B1
Publication date: 2022-02-01
Application number: US17389364
Application date: 2021-07-30
Inventor: Jianhua Tao , Zheng Lian , Bin Liu , Licai Sun
IPC: G06K9/00 , G06N3/04 , G06F16/783 , A61B5/16
Abstract: An automatic lie detection method and apparatus for interactive scenarios, a device, and a medium are provided to improve the accuracy of automatic lie detection. The method includes: segmenting three modalities, namely video, audio, and text, of a to-be-detected sample; extracting short-term features of the three modalities; integrating the short-term features of the three modalities in the to-be-detected sample to obtain long-term features of the three modalities corresponding to each dialogue; integrating the long-term features of the three modalities by a self-attention mechanism to obtain a multi-modal feature of each dialogue; integrating the multi-modal feature of each dialogue with interactive information by a graph neural network to obtain a multi-modal feature integrated with the interactive information; and predicting a lie level of each dialogue according to the multi-modal feature integrated with the interactive information.
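The self-attention integration step can be sketched as scaled dot-product attention over the three modality vectors. This is a minimal illustration under stated assumptions, not the patented network: the learned query, key, and value projections are omitted (identity projections are assumed), the three long-term features are assumed to share one dimension, and averaging the attended vectors into a single feature is an assumed choice. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_fuse(features):
    """Fuse equal-length modality vectors with scaled dot-product self-attention."""
    dim = len(features[0])
    scale = math.sqrt(dim)
    attended = []
    for query in features:
        # Similarity of this modality to every modality (including itself).
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in features]
        weights = softmax(scores)
        attended.append([sum(w * value[i] for w, value in zip(weights, features))
                         for i in range(dim)])
    # Assumed pooling: average the attended vectors into one multi-modal feature.
    return [sum(vec[i] for vec in attended) / len(attended) for i in range(dim)]


video, audio, text = [1.0, 2.0], [1.0, 2.0], [1.0, 2.0]
fused = self_attention_fuse([video, audio, text])
```

With three identical inputs the attention weights are uniform, so the fused feature equals the input vector, which makes a convenient sanity check.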
-
Publication number: US11281945B1
Publication date: 2022-03-22
Application number: US17468994
Application date: 2021-09-08
Inventor: Jianhua Tao , Licai Sun , Bin Liu , Zheng Lian
Abstract: A multimodal dimensional emotion recognition method includes: acquiring a frame-level audio feature, a frame-level video feature, and a frame-level text feature from the audio, the video, and the corresponding text of a sample to be tested; performing temporal contextual modeling on the frame-level audio feature, the frame-level video feature, and the frame-level text feature respectively by using a temporal convolutional network to obtain a contextual audio feature, a contextual video feature, and a contextual text feature; performing weighted fusion on the three contextual features by using a gated attention mechanism to obtain a multimodal feature; splicing the multimodal feature and the three contextual features together to obtain a spliced feature, and then performing further temporal contextual modeling on the spliced feature by using a temporal convolutional network to obtain a contextual spliced feature; and performing regression prediction on the contextual spliced feature to obtain a final dimensional emotion prediction result.
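The weighted-fusion and splicing steps can be sketched for a single frame as follows. This is a rough illustration, not the patented gated attention mechanism: it assumes the three contextual features have the same length, scores each modality with a hypothetical gate vector `gate_weights`, and normalizes the scores with a softmax. All names are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(audio, video, text, gate_weights):
    """Return the weighted sum of three feature vectors and the weights used."""
    feats = [audio, video, text]
    # One scalar score per modality: dot product with the (assumed) gate vector.
    scores = [sum(w * x for w, x in zip(gate_weights, feat)) for feat in feats]
    alphas = softmax(scores)
    fused = [sum(a * feat[i] for a, feat in zip(alphas, feats))
             for i in range(len(audio))]
    return fused, alphas


def splice(fused, audio, video, text):
    """Concatenate the fused feature with the three contextual features."""
    return fused + audio + video + text


audio, video, text = [3.0, 0.0], [0.0, 3.0], [0.0, 0.0]
fused, alphas = gated_fusion(audio, video, text, gate_weights=[0.0, 0.0])
spliced = splice(fused, audio, video, text)
```

With an all-zero gate vector every modality gets weight 1/3, so the fused feature is the plain average. In a trained model the gate parameters would be learned and the spliced vector would feed the second temporal convolutional network.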
-
Publication number: US11244119B1
Publication date: 2022-02-08
Application number: US17389383
Application date: 2021-07-30
Inventor: Jianhua Tao , Licai Sun , Bin Liu , Zheng Lian
Abstract: A multi-modal lie detection method and apparatus, and a device, are provided to improve the accuracy of automatic lie detection. The multi-modal lie detection method includes: inputting original data of three modalities, namely a to-be-detected audio, a to-be-detected video, and a to-be-detected text; performing feature extraction on the input contents to obtain deep features of the three modalities; explicitly depicting first-order, second-order, and third-order interactive relationships of the deep features of the three modalities to obtain an integrated multi-modal feature of each word; performing context modeling on the integrated multi-modal feature of each word to obtain a final feature of each word; and pooling the final features of the words to obtain global features, and then obtaining a lie classification result by a fully-connected layer.
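One common way to make first-, second-, and third-order interactions explicit is a tensor-fusion-style outer product: prepend a constant 1 to each modality vector and multiply every triple of entries. The sketch below uses that technique as an assumption; the abstract does not say that the patent computes the interactions this way. The mean pooling over words is likewise an assumed choice, and the function names are hypothetical.

```python
def interaction_features(audio, video, text):
    """Flattened outer product of the three vectors, each padded with a 1.

    The padding makes the result contain the single features (first order),
    pairwise products (second order), and triple products (third order).
    """
    a1, v1, t1 = [1.0] + audio, [1.0] + video, [1.0] + text
    return [a * v * t for a in a1 for v in v1 for t in t1]


def mean_pool(word_features):
    """Average per-word feature vectors into one global feature vector."""
    count = len(word_features)
    return [sum(column) / count for column in zip(*word_features)]


word_feature = interaction_features([2.0], [3.0], [5.0])
global_feature = mean_pool([word_feature, word_feature])
```

For the one-dimensional inputs above, the result holds the constant 1, the three raw values, the three pairwise products (6, 10, 15), and the triple product 30. The output length grows as the product of the padded dimensions, so real systems usually reduce the feature sizes first.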
-
Publication number: US11216652B1
Publication date: 2022-01-04
Application number: US17470135
Application date: 2021-09-09
Inventor: Jianhua Tao , Mingyuan Xiao , Bin Liu , Zheng Lian
Abstract: An expression recognition method under a natural scene comprises: converting an input video into a video frame sequence at a specified frame rate, and performing facial expression labeling on the video frame sequence to obtain a video frame labeled sequence; removing the impact of natural light, non-face areas, and the impact of head posture on facial expression from the video frame labeled sequence to obtain an expression video frame sequence; augmenting the expression video frame sequence to obtain a video preprocessed frame sequence; from the video preprocessed frame sequence, extracting HOG features that characterize facial appearance and shape, extracting second-order features that describe the degree of facial creasing, and extracting facial pixel-level deep neural network features by using a deep neural network; then performing vector fusion on these three kinds of features to obtain facial feature fusion vectors for training; and inputting the facial feature fusion vectors into a support vector machine for expression classification.
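The vector-fusion step can be pictured as concatenating the three feature vectors into one input for the classifier. The sketch below assumes simple concatenation with per-block L2 normalization so that no feature type dominates by scale; the abstract does not specify the fusion rule, and the function names are hypothetical. Feature extraction and the support vector machine itself are out of scope here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(hog, second_order, deep):
    """Concatenate the normalized HOG, second-order, and deep feature blocks."""
    fused = []
    for block in (hog, second_order, deep):
        fused.extend(l2_normalize(block))
    return fused


fusion_vector = fuse_features([3.0, 4.0], [0.0], [2.0])
```

The resulting vector has one entry per input feature and can be passed, frame by frame, to any classifier that accepts fixed-length vectors.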