-
1.
Publication (Announcement) No.: US12100418B2
Publication (Announcement) Date: 2024-09-24
Application No.: US17472511
Filing Date: 2021-09-10
Inventor: Jianhua Tao , Zheng Lian , Bin Liu , Xuefei Liu
IPC: G10L25/63 , G06F18/25 , G06F40/166 , G06F40/211 , G06F40/216 , G06F40/284 , G06F40/289 , G06F40/30 , G06N20/00 , G06N20/20 , G06V20/40 , G06V40/16 , G10L15/02 , G10L15/26 , G10L25/30
CPC classification number: G10L25/63 , G06F18/253 , G06F40/166 , G06F40/211 , G06F40/216 , G06F40/284 , G06F40/289 , G06F40/30 , G06N20/00 , G06N20/20 , G06V20/41 , G06V40/166 , G06V40/168 , G10L15/02 , G10L15/26 , G10L25/30
Abstract: Disclosed is a dialogue emotion correction method based on a graph neural network, including: extracting acoustic features, text features, and image features from a video file and fusing them into multi-modal features; obtaining an emotion prediction result for each sentence of a dialogue in the video file by using the multi-modal features; fusing the emotion prediction result of each sentence with the interaction information between talkers in the video file to obtain interaction-information-fused emotion features; combining the interaction-information-fused emotion features with the context-dependence relationship in the dialogue to obtain time-series-information-fused emotion features; and correcting, by using the time-series-information-fused emotion features, the previously obtained emotion prediction result of each sentence so as to obtain a more accurate emotion recognition result.
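The pipeline the abstract describes maps naturally onto a small graph-plus-recurrent model. Below is a minimal PyTorch sketch, assuming 128-dimensional fused features, seven emotion classes, and a row-normalized adjacency matrix encoding talker interaction; the class name, layer choices, and all dimensions are illustrative assumptions, not the patented architecture.

```python
import torch
import torch.nn as nn

class EmotionCorrector(nn.Module):
    def __init__(self, feat_dim=128, n_emotions=7):
        super().__init__()
        self.gcn = nn.Linear(feat_dim, feat_dim)                 # one GCN-style layer
        self.gru = nn.GRU(feat_dim, feat_dim, batch_first=True)  # context over time
        self.correct = nn.Linear(feat_dim + n_emotions, n_emotions)

    def forward(self, utter_feats, init_logits, adj):
        # utter_feats: (B, T, D) fused multi-modal features, one row per sentence
        # init_logits: (B, T, E) first-pass emotion predictions to be corrected
        # adj:         (B, T, T) row-normalized talker-interaction graph
        interact = torch.relu(self.gcn(adj @ utter_feats))  # interaction-fused
        temporal, _ = self.gru(interact)                    # time-series-fused
        return self.correct(torch.cat([temporal, init_logits], dim=-1))

# Toy usage: one dialogue, 5 sentences, 128-d features, 7 emotion classes.
x, logits = torch.randn(1, 5, 128), torch.randn(1, 5, 7)
adj = torch.softmax(torch.randn(1, 5, 5), dim=-1)
corrected = EmotionCorrector()(x, logits, adj)  # (1, 5, 7) refined logits
```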
-
2.
Publication (Announcement) No.: US11281945B1
Publication (Announcement) Date: 2022-03-22
Application No.: US17468994
Filing Date: 2021-09-08
Inventor: Jianhua Tao , Licai Sun , Bin Liu , Zheng Lian
Abstract: A multimodal dimensional emotion recognition method includes: acquiring a frame-level audio feature, a frame-level video feature, and a frame-level text feature from an audio, a video, and a corresponding text of a sample to be tested; performing temporal contextual modeling on the frame-level audio feature, the frame-level video feature, and the frame-level text feature respectively by using a temporal convolutional network to obtain a contextual audio feature, a contextual video feature, and a contextual text feature; performing weighted fusion on these three features by using a gated attention mechanism to obtain a multimodal feature; splicing the multimodal feature and these three features together to obtain a spliced feature, and then performing further temporal contextual modeling on the spliced feature by using a temporal convolutional network to obtain a contextual spliced feature; and performing regression prediction on the contextual spliced feature to obtain a final dimensional emotion prediction result.
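As a rough illustration, the following PyTorch sketch uses a single dilated convolution with a residual connection as a stand-in for each temporal convolutional network and a learned softmax gate for the weighted fusion; `TCNBlock`, `GatedFusion`, and all sizes are assumptions, not the patented design.

```python
import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    """A single dilated convolution plus residual; a stand-in for a full TCN."""
    def __init__(self, dim, dilation=1):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, 3, padding=dilation, dilation=dilation)

    def forward(self, x):                  # x: (B, T, D)
        return torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2) + x

class GatedFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(3 * dim, 3)  # one fusion weight per modality

    def forward(self, a, v, t):            # each (B, T, D)
        w = torch.softmax(self.gate(torch.cat([a, v, t], -1)), -1)
        return w[..., 0:1] * a + w[..., 1:2] * v + w[..., 2:3] * t

a, v, t = (torch.randn(2, 50, 64) for _ in range(3))  # frame-level features
ctx = [TCNBlock(64)(x) for x in (a, v, t)]            # contextual features
m = GatedFusion(64)(*ctx)                             # fused multimodal feature
spliced = torch.cat([m] + ctx, dim=-1)                # (2, 50, 256) spliced
pred = nn.Linear(256, 1)(TCNBlock(256)(spliced))      # per-frame regression
```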
-
3.
Publication (Announcement) No.: US11908240B2
Publication (Announcement) Date: 2024-02-20
Application No.: US17471384
Filing Date: 2021-09-10
Inventor: Jianhua Tao , Hao Zhang , Bin Liu , Wenxiang She
IPC: G06V40/16 , G06N3/049 , G06F18/214
CPC classification number: G06V40/176 , G06F18/2148 , G06N3/049 , G06V40/168 , G06V40/172
Abstract: Disclosed is a micro-expression recognition method based on a multi-scale spatiotemporal feature neural network, in which spatial features and temporal features of a micro-expression are obtained from micro-expression video frames and combined to form more robust micro-expression features. At the same time, since a micro-expression occurs in local areas of the face, the active local areas of the face during the occurrence of the micro-expression and the overall area of the face are combined for micro-expression recognition.
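A minimal PyTorch sketch of the idea, assuming two 3-D convolution kernel scales as the "multi-scale" component and an eye region as one active local area; the branch structure, class count, and dimensions are illustrative only.

```python
import torch
import torch.nn as nn

class SpatioTemporalBranch(nn.Module):
    """Two 3-D convolution scales capture spatial and temporal cues jointly."""
    def __init__(self):
        super().__init__()
        self.small = nn.Conv3d(3, 8, kernel_size=3, padding=1)
        self.large = nn.Conv3d(3, 8, kernel_size=5, padding=2)
        self.pool = nn.AdaptiveAvgPool3d(1)

    def forward(self, clip):                        # clip: (B, 3, T, H, W)
        f = torch.cat([self.small(clip), self.large(clip)], dim=1)
        return self.pool(torch.relu(f)).flatten(1)  # (B, 16)

global_branch, local_branch = SpatioTemporalBranch(), SpatioTemporalBranch()
face = torch.randn(4, 3, 16, 64, 64)   # whole-face clips, 16 frames
eyes = torch.randn(4, 3, 16, 24, 48)   # an assumed active local region
feats = torch.cat([global_branch(face), local_branch(eyes)], dim=-1)
logits = nn.Linear(32, 5)(feats)       # 5 micro-expression classes (assumed)
```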
-
4.
Publication (Announcement) No.: US11238289B1
Publication (Announcement) Date: 2022-02-01
Application No.: US17389364
Filing Date: 2021-07-30
Inventor: Jianhua Tao , Zheng Lian , Bin Liu , Licai Sun
IPC: G06K9/00 , G06N3/04 , G06F16/783 , A61B5/16
Abstract: An automatic lie detection method and apparatus for interactive scenarios, a device, and a medium to improve the accuracy of automatic lie detection are provided. The method includes: segmenting three modalities, namely a video, an audio, and a text, of a to-be-detected sample; extracting short-term features of the three modalities; integrating the short-term features of the three modalities in the to-be-detected sample to obtain long-term features of the three modalities corresponding to each dialogue; integrating the long-term features of the three modalities by a self-attention mechanism to obtain a multi-modal feature of each dialogue; integrating the multi-modal feature of each dialogue with interactive information by a graph neural network to obtain a multi-modal feature integrated with the interactive information; and predicting a lie level of each dialogue according to the multi-modal feature integrated with the interactive information.
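The following PyTorch sketch illustrates the last two integration steps, assuming 64-dimensional long-term features, self-attention across the three modality vectors of each dialogue turn, and a who-responds-to-whom adjacency with one linear graph layer; none of these choices is claimed to be the patented network.

```python
import torch
import torch.nn as nn

class InteractiveLieDetector(nn.Module):
    def __init__(self, dim=64, levels=3):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.graph = nn.Linear(dim, dim)    # one message-passing graph layer
        self.head = nn.Linear(dim, levels)  # lie level per dialogue turn

    def forward(self, a, v, t, adj):
        # a, v, t: (N, D) long-term features for N dialogue turns
        modal = torch.stack([a, v, t], dim=1)      # (N, 3, D)
        fused, _ = self.attn(modal, modal, modal)  # self-attention fusion
        x = fused.mean(dim=1)                      # (N, D) multi-modal feature
        x = torch.relu(self.graph(adj @ x)) + x    # integrate interaction info
        return self.head(x)

a, v, t = (torch.randn(6, 64) for _ in range(3))   # 6 dialogue turns
adj = torch.eye(6) + torch.diag(torch.ones(5), 1)  # who-responds-to-whom links
pred = InteractiveLieDetector()(a, v, t, adj / adj.sum(1, keepdim=True))
```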
-
5.
Publication (Announcement) No.: US11194972B1
Publication (Announcement) Date: 2021-12-07
Application No.: US17464421
Filing Date: 2021-09-01
Inventor: Jianhua Tao , Ke Xu , Bin Liu , Yongwei Li
IPC: G06F17/00 , G06F40/30 , G06F40/284 , G06N3/04
Abstract: Disclosed is a semantic sentiment analysis method fusing in-depth features and time sequence models, including: converting a text into a uniformly formatted matrix of word vectors; extracting local semantic emotional text features and contextual semantic emotional text features from the matrix of word vectors; weighting the local semantic emotional text features and the contextual semantic emotional text features by using an attention mechanism to generate fused semantic emotional text features; connecting the local semantic emotional text features, the contextual semantic emotional text features and the fused semantic emotional text features to generate global semantic emotional text features; and performing final text emotional semantic analysis and recognition by using a softmax classifier and taking the global semantic emotional text features as input.
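A minimal PyTorch sketch of the feature flow, assuming a 1-D convolution extracts the local features and a BiLSTM serves as the time-sequence model for the contextual features; `SentimentNet` and its dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SentimentNet(nn.Module):
    def __init__(self, emb=100, dim=64, classes=3):
        super().__init__()
        self.conv = nn.Conv1d(emb, dim, kernel_size=3, padding=1)  # local features
        self.lstm = nn.LSTM(emb, dim // 2, batch_first=True,
                            bidirectional=True)                    # contextual features
        self.attn = nn.Linear(2 * dim, 2)      # attention weights over the two views
        self.clf = nn.Linear(3 * dim, classes)

    def forward(self, words):                  # words: (B, T, emb) word vectors
        local = torch.relu(self.conv(words.transpose(1, 2))).transpose(1, 2)
        ctx, _ = self.lstm(words)              # (B, T, dim) each
        w = torch.softmax(self.attn(torch.cat([local, ctx], -1)), -1)
        fused = w[..., 0:1] * local + w[..., 1:2] * ctx        # attention fusion
        glob = torch.cat([local, ctx, fused], -1).mean(dim=1)  # global features
        return torch.softmax(self.clf(glob), dim=-1)           # softmax classifier

probs = SentimentNet()(torch.randn(2, 20, 100))  # 2 texts, 20 word vectors each
```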
-
6.
Publication (Announcement) No.: US11266338B1
Publication (Announcement) Date: 2022-03-08
Application No.: US17389381
Filing Date: 2021-07-30
Inventor: Jianhua Tao , Mingyue Niu , Bin Liu , Qifei Li
Abstract: An automatic depression detection method includes the following steps: inputting audio and video files that contain original data in both audio and video modalities; conducting segmentation and feature extraction on the audio and video files to obtain a plurality of audio segment-level features and video segment-level features; combining the segment-level features into an audio-level feature and a video-level feature respectively by utilizing a feature evolution pooling objective function; conducting attentional computation on the segment-level features to obtain a video-attention audio feature and an audio-attention video feature; splicing the audio-level feature, the video-level feature, the video-attention audio feature, and the audio-attention video feature to form a multimodal spatio-temporal representation; and inputting the multimodal spatio-temporal representation into support vector regression to predict the depression level of individuals in the input audio and video files.
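A numpy/scikit-learn sketch of the representation building. The feature evolution pooling objective is approximated here by the leading singular direction of each segment sequence, and the attentional computation by scaled dot-product weights; both are assumptions for illustration, as are all dimensions and the toy data.

```python
import numpy as np
from sklearn.svm import SVR

def evolution_pool(segs):
    """Collapse (n_segments, dim) into one (dim,) vector via the leading
    singular direction, a stand-in for feature evolution pooling."""
    _, _, vt = np.linalg.svd(segs - segs.mean(0), full_matrices=False)
    v = vt[0]
    return v if v @ segs.mean(0) >= 0 else -v  # resolve the sign ambiguity

def cross_attention(q_segs, kv_segs):
    """Scaled dot-product attention of one modality's segments over the other's."""
    scores = q_segs @ kv_segs.T / np.sqrt(kv_segs.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return (w @ kv_segs).mean(axis=0)

rng = np.random.default_rng(0)
audio = rng.standard_normal((12, 32))    # 12 audio segment-level features
video = rng.standard_normal((12, 32))    # 12 video segment-level features
rep = np.concatenate([evolution_pool(audio), evolution_pool(video),
                      cross_attention(video, audio),   # video-attention audio
                      cross_attention(audio, video)])  # audio-attention video
X = rng.standard_normal((20, rep.size))  # stand-in training representations
y = rng.random(20) * 24                  # toy depression scores
model = SVR().fit(X, y)                  # support vector regression
print(model.predict(rep[None, :]))
```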
-
7.
Publication (Announcement) No.: US11244119B1
Publication (Announcement) Date: 2022-02-08
Application No.: US17389383
Filing Date: 2021-07-30
Inventor: Jianhua Tao , Licai Sun , Bin Liu , Zheng Lian
Abstract: A multi-modal lie detection method and apparatus, and a device to improve the accuracy of automatic lie detection are provided. The multi-modal lie detection method includes: inputting original data of three modalities, namely a to-be-detected audio, a to-be-detected video, and a to-be-detected text; performing feature extraction on the input contents to obtain deep features of the three modalities; explicitly depicting first-order, second-order, and third-order interactive relationships of the deep features of the three modalities to obtain an integrated multi-modal feature of each word; performing context modeling on the integrated multi-modal feature of each word to obtain a final feature of each word; and pooling the final feature of each word to obtain global features, and then obtaining a lie classification result by a fully-connected layer.
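One plausible reading of "explicitly depicting first-order, second-order, and third-order interactive relationships" is tensor-fusion-style outer products, sketched below in PyTorch with deliberately small per-modality dimensions; this interpretation, the GRU context model, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

def interactions(a, v, t):
    """First-, second-, and third-order interactions per word via outer products."""
    first = torch.cat([a, v, t], -1)
    second = torch.cat([torch.einsum('...i,...j->...ij', a, v).flatten(-2),
                        torch.einsum('...i,...j->...ij', a, t).flatten(-2),
                        torch.einsum('...i,...j->...ij', v, t).flatten(-2)], -1)
    third = torch.einsum('...i,...j,...k->...ijk', a, v, t).flatten(-3)
    return torch.cat([first, second, third], -1)

d = 8                                                  # small per-modality dim
a, v, t = (torch.randn(2, 12, d) for _ in range(3))    # 12 words, 3 modalities
x = interactions(a, v, t)                              # (2, 12, 3d + 3d^2 + d^3)
ctx, _ = nn.GRU(x.shape[-1], 32, batch_first=True)(x)  # context per word
logits = nn.Linear(32, 2)(ctx.mean(dim=1))             # pooled -> lie / truth
```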
-
8.
Publication (Announcement) No.: US11216652B1
Publication (Announcement) Date: 2022-01-04
Application No.: US17470135
Filing Date: 2021-09-09
Inventor: Jianhua Tao , Mingyuan Xiao , Bin Liu , Zheng Lian
Abstract: An expression recognition method for natural scenes comprises: converting an input video into a video frame sequence at a specified frame rate, and performing facial expression labeling on the video frame sequence to obtain a labeled video frame sequence; removing the impact of natural light, non-face areas, and head posture on facial expression from the labeled video frame sequence to obtain an expression video frame sequence; augmenting the expression video frame sequence to obtain a preprocessed video frame sequence; from the preprocessed video frame sequence, extracting HOG features that characterize facial appearance and shape, extracting second-order features that describe the degree of face creasing, and extracting facial pixel-level deep neural network features by using a deep neural network; performing vector fusion on these three types of features to obtain facial feature fusion vectors for training; and inputting the facial feature fusion vectors into a support vector machine for expression classification.
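A scikit-image/scikit-learn sketch of the fused-feature classifier. The second-order crease descriptor is approximated by gradient covariance statistics, and the deep branch by a fixed random projection; both placeholders, like the HOG parameters and toy data, are assumptions rather than the patent's networks.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def crease_features(face):
    """Covariance statistics of the image gradients, a rough second-order
    stand-in for the face-creasing descriptor."""
    gy, gx = np.gradient(face)
    return np.cov(np.stack([gx.ravel(), gy.ravel()])).ravel()  # 4 values

rng = np.random.default_rng(0)
proj = rng.standard_normal((32, 64 * 64))  # placeholder for "deep" features

def fuse(face):
    h = hog(face, orientations=9, pixels_per_cell=(16, 16))    # HOG branch
    return np.concatenate([h, crease_features(face), proj @ face.ravel()])

faces = rng.random((10, 64, 64))           # preprocessed face frames (toy)
X = np.stack([fuse(f) for f in faces])     # facial feature fusion vectors
y = rng.integers(0, 7, size=10)            # 7 expression labels (assumed)
clf = SVC().fit(X, y)                      # SVM expression classifier
print(clf.predict(X[:2]))
```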
-
9.
Publication (Announcement) No.: US11963771B2
Publication (Announcement) Date: 2024-04-23
Application No.: US17472191
Filing Date: 2021-09-10
Inventor: Jianhua Tao , Cong Cai , Bin Liu , Mingyue Niu
IPC: A61B5/16 , A61B5/00 , G06F18/25 , G06N3/044 , G06N3/045 , G06N3/048 , G06N3/08 , G06T7/00 , G06V10/80 , G06V20/40 , G10L25/30 , G10L25/57 , G10L25/63 , G10L25/66
CPC classification number: A61B5/165 , A61B5/4803 , A61B5/7275 , G06F18/253 , G06N3/08 , G06T7/0012 , G06V20/46 , G06V20/49 , G10L25/30 , G10L25/57 , G10L25/63 , G10L25/66 , G06T2207/10016
Abstract: Disclosed is an automatic depression detection method using audio and video, including: acquiring original data containing two modalities, a long-term audio file and a long-term video file, from an audio-video file; dividing the long-term audio file into several audio segments, and dividing the long-term video file into a plurality of video segments; inputting each audio segment/each video segment into an audio feature extraction network/a video feature extraction network to obtain in-depth audio features/in-depth video features; calculating the in-depth audio features and the in-depth video features by using a multi-head attention mechanism so as to obtain attention audio features and attention video features; aggregating the attention audio features and the attention video features into audio-video features; and inputting the audio-video features into a decision network to predict the depression level of an individual in the audio-video file.
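A minimal PyTorch sketch of the cross-modal attention and decision steps, assuming 64-dimensional in-depth segment features and a shared multi-head attention module for both directions; separate modules and other sizes would be equally plausible readings.

```python
import torch
import torch.nn as nn

class AVDepressionNet(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        # One attention module is shared across directions here for brevity.
        self.mha = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.decision = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                      nn.Linear(dim, 1))  # decision network

    def forward(self, audio, video):              # (B, Na, D) and (B, Nv, D)
        att_a, _ = self.mha(audio, video, video)  # audio attends to video
        att_v, _ = self.mha(video, audio, audio)  # video attends to audio
        agg = torch.cat([att_a.mean(1), att_v.mean(1)], -1)  # aggregation
        return self.decision(agg).squeeze(-1)     # predicted depression level

a = torch.randn(2, 10, 64)  # in-depth features of 10 audio segments
v = torch.randn(2, 8, 64)   # in-depth features of 8 video segments
score = AVDepressionNet()(a, v)
```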
-
10.
Publication (Announcement) No.: US11227161B1
Publication (Announcement) Date: 2022-01-18
Application No.: US17471485
Filing Date: 2021-09-10
Inventor: Jianhua Tao , Yu He , Bin Liu , Licai Sun
Abstract: A physiological signal prediction method includes: collecting a video file, the video file containing long-term videos whose contents include facial data of a single person and ground-truth physiological signal data; segmenting a single long-term video into multiple short-term video clips; extracting, from each frame of image in each of the short-term video clips, features of the regions of interest used for identifying physiological signals, so as to form single-frame region-of-interest features; splicing, for each of the short-term video clips, the region-of-interest features of all fixed frames corresponding to the short-term video clip into multi-frame region-of-interest features, and converting the multi-frame region-of-interest features into a spatio-temporal graph; and inputting the spatio-temporal graph into a deep learning model for training, and using the trained deep learning model to predict physiological signal parameters.
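The PyTorch sketch below interprets the spatio-temporal graph as a region-by-time map of average ROI colors and feeds it to a small convolutional regressor; the ROI boxes, clip size, and network are illustrative assumptions, not the patented design.

```python
import torch
import torch.nn as nn

def spatiotemporal_map(clip, rois):
    """Average each region of interest per frame: an (R, T, 3) map standing
    in for the spatio-temporal graph built from multi-frame ROI features."""
    # clip: (T, H, W, 3) frames; rois: list of (top, left, height, width)
    rows = [torch.stack([clip[f, y:y + h, x:x + w].mean(dim=(0, 1))
                         for f in range(clip.shape[0])])
            for (y, x, h, w) in rois]
    return torch.stack(rows)

clip = torch.rand(75, 72, 72, 3)  # 3 s of face video at 25 fps (toy data)
rois = [(10, 10, 20, 20), (10, 42, 20, 20), (40, 26, 20, 20)]  # e.g. cheeks, chin
stmap = spatiotemporal_map(clip, rois).permute(2, 0, 1)[None]  # (1, 3, R, T)

model = nn.Sequential(nn.Conv2d(3, 16, (3, 5), padding=(1, 2)), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                      nn.Linear(16, 1))  # regress e.g. heart rate
bpm = model(stmap)
```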