Multimodal dimensional emotion recognition method

    Publication No.: US11281945B1

    Publication date: 2022-03-22

    Application No.: US17468994

    Filing date: 2021-09-08

    Abstract: A multimodal dimensional emotion recognition method includes: acquiring a frame-level audio feature, a frame-level video feature, and a frame-level text feature from an audio, a video, and a corresponding text of a sample to be tested; performing temporal contextual modeling on the frame-level audio feature, the frame-level video feature, and the frame-level text feature respectively by using a temporal convolutional network to obtain a contextual audio feature, a contextual video feature, and a contextual text feature; performing weighted fusion on these three features by using a gated attention mechanism to obtain a multimodal feature; splicing the multimodal feature and these three features together to obtain a spliced feature, and then performing further temporal contextual modeling on the spliced feature by using a temporal convolutional network to obtain a contextual spliced feature; and performing regression prediction on the contextual spliced feature to obtain a final dimensional emotion prediction result.
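
The fusion and splicing steps in this abstract can be sketched for a single frame as follows. This is a minimal illustration, not the patented implementation: it assumes the three contextual features share one dimension, and `gate_weights` is a hypothetical learned vector that scores each modality by a dot product.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(audio, video, text, gate_weights):
    """Fuse three same-length feature vectors for one frame.

    gate_weights is a hypothetical learned vector (an assumption here);
    each modality's gate score is its dot product with that vector.
    """
    modalities = [audio, video, text]
    scores = [sum(w * x for w, x in zip(gate_weights, feat)) for feat in modalities]
    weights = softmax(scores)
    # Weighted sum of the three modalities gives the multimodal feature.
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, modalities))
        for i in range(len(audio))
    ]
    # Splice the multimodal feature with the three original features.
    spliced = fused + list(audio) + list(video) + list(text)
    return weights, fused, spliced
```

With an all-zero `gate_weights` the three modalities receive equal weight, so the fused vector is their plain average. In the described method the spliced vector would then pass through a second temporal convolutional network before regression.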

    Automatic lie detection method and apparatus for interactive scenarios, device and medium

    Publication No.: US11238289B1

    Publication date: 2022-02-01

    Application No.: US17389364

    Filing date: 2021-07-30

    Abstract: An automatic lie detection method and apparatus for interactive scenarios, a device and a medium to improve the accuracy of automatic lie detection are provided. The method includes: segmenting three modalities, namely a video, an audio and a text, of a to-be-detected sample; extracting short-term features of the three modalities; integrating the short-term features of the three modalities in the to-be-detected sample to obtain long-term features of the three modalities corresponding to each dialogue; integrating the long-term features of the three modalities by a self-attention mechanism to obtain a multi-modal feature of each dialogue; integrating the multi-modal feature of each dialogue with interactive information by a graph neural network to obtain a multi-modal feature integrated with the interactive information; and predicting a lie level of each dialogue according to the multi-modal feature integrated with the interactive information.
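
The graph step that mixes interactive information into each dialogue's feature can be illustrated with one round of message passing. This is only a sketch under stated assumptions: the abstract does not specify the graph layer, so mean aggregation with self-loops is used as a stand-in, and the `edges` list (which dialogues interact) is hypothetical.

```python
def message_passing(features, edges):
    """One round of mean-aggregation message passing.

    features: one feature vector per dialogue node.
    edges: undirected (i, j) pairs linking interacting dialogues
           (a hypothetical graph layout, not taken from the patent).
    """
    n = len(features)
    # Every node keeps its own feature via a self-loop.
    neighbors = {i: [i] for i in range(n)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    dim = len(features[0])
    return [
        [sum(features[k][c] for k in neighbors[i]) / len(neighbors[i])
         for c in range(dim)]
        for i in range(n)
    ]
```

For three consecutive dialogues linked in a chain, each updated feature becomes the average of the dialogue itself and its direct neighbours, which is the simplest way to let context from the conversation flow into each node.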

    Semantic sentiment analysis method fusing in-depth features and time sequence models

    Publication No.: US11194972B1

    Publication date: 2021-12-07

    Application No.: US17464421

    Filing date: 2021-09-01

    Abstract: Disclosed is a semantic sentiment analysis method fusing in-depth features and time sequence models, including: converting a text into a uniformly formatted matrix of word vectors; extracting local semantic emotional text features and contextual semantic emotional text features from the matrix of word vectors; weighting the local semantic emotional text features and the contextual semantic emotional text features by using an attention mechanism to generate fused semantic emotional text features; connecting the local semantic emotional text features, the contextual semantic emotional text features and the fused semantic emotional text features to generate global semantic emotional text features; and performing final text emotional semantic analysis and recognition by using a softmax classifier and taking the global semantic emotional text features as input.
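
The first step, converting a text into a uniformly formatted matrix of word vectors, is commonly done by truncating or zero-padding to a fixed length. The sketch below assumes whitespace tokenisation and a toy `embeddings` dictionary; both are illustrative choices, not details from the patent.

```python
def text_to_matrix(text, embeddings, max_len, dim):
    """Return a max_len x dim matrix of word vectors for one text.

    embeddings: hypothetical dict mapping a lowercase token to a vector.
    Unknown tokens and padding positions are filled with zero vectors.
    """
    tokens = text.lower().split()[:max_len]  # truncate long texts
    zero = [0.0] * dim
    rows = [list(embeddings.get(tok, zero)) for tok in tokens]
    # Pad short texts so every text yields the same matrix shape.
    rows += [list(zero) for _ in range(max_len - len(rows))]
    return rows
```

Because every text maps to the same shape, the later feature extractors can process batches of texts without special handling for length.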

    Automatic depression detection method and device, and equipment

    Publication No.: US11266338B1

    Publication date: 2022-03-08

    Application No.: US17389381

    Filing date: 2021-07-30

    Abstract: An automatic depression detection method includes the following steps: inputting audio and video files, wherein the audio and video files contain original data in both audio and video modalities; conducting segmentation and feature extraction on the audio and video files to obtain a plurality of audio segment-level features and video segment-level features; combining the segment-level features into an audio-level feature and a video-level feature respectively by utilizing a feature evolution pooling objective function; conducting attentional computation on the segment-level features to obtain a video attention audio feature and an audio attention video feature; splicing the audio-level feature, the video-level feature, the video attention audio feature and the audio attention video feature to form a multimodal spatio-temporal representation; and inputting the multimodal spatio-temporal representation into support vector regression to predict the depression level of individuals in the input audio and video files.
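
The cross-modal attention and splicing steps can be sketched as below. Two simplifications are assumed and should not be read as the patented method: mean pooling stands in for the feature evolution pooling objective, and audio and video segment features are taken to share one dimension so that a dot product between them is defined.

```python
import math


def mean_pool(segments):
    """Average per-segment vectors into one vector (stand-in pooling)."""
    n = len(segments)
    return [sum(col) / n for col in zip(*segments)]


def attend(query, segments):
    """Dot-product attention: weight segments by similarity to the query."""
    scores = [sum(q * s for q, s in zip(query, seg)) for seg in segments]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    dim = len(segments[0])
    return [sum(e / total * seg[i] for e, seg in zip(exps, segments))
            for i in range(dim)]


def build_representation(audio_segments, video_segments):
    """Splice pooled and cross-attended features into one vector."""
    audio_pooled = mean_pool(audio_segments)
    video_pooled = mean_pool(video_segments)
    # The video summary queries the audio segments, and vice versa.
    video_attn_audio = attend(video_pooled, audio_segments)
    audio_attn_video = attend(audio_pooled, video_segments)
    return audio_pooled + video_pooled + video_attn_audio + audio_attn_video
```

The resulting vector is what a regressor such as support vector regression would take as input to predict the depression level.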

    Multi-modal lie detection method and apparatus, and device

    Publication No.: US11244119B1

    Publication date: 2022-02-08

    Application No.: US17389383

    Filing date: 2021-07-30

    Abstract: A multi-modal lie detection method and apparatus, and a device to improve the accuracy of automatic lie detection are provided. The multi-modal lie detection method includes inputting original data of three modalities, namely a to-be-detected audio, a to-be-detected video and a to-be-detected text; performing feature extraction on the input contents to obtain deep features of the three modalities; explicitly depicting first-order, second-order and third-order interactive relationships of the deep features of the three modalities to obtain an integrated multi-modal feature of each word; performing context modeling on the integrated multi-modal feature of each word to obtain a final feature of each word; and pooling the final feature of each word to obtain global features, and then obtaining a lie classification result by a fully-connected layer.
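
One well-known way to expose first-, second- and third-order interactions explicitly is to append a constant 1 to each modality vector and take the three-way outer product. The sketch below uses that construction as an illustration; the abstract does not state that the patent uses it.

```python
def interaction_features(audio, video, text):
    """Flattened three-way outer product of [1, audio], [1, video], [1, text].

    The appended 1 keeps the lower-order terms: products involving two 1s
    are first-order, one 1 gives second-order, and none gives third-order.
    """
    a = [1.0] + list(audio)
    v = [1.0] + list(video)
    t = [1.0] + list(text)
    return [x * y * z for x in a for y in v for z in t]
```

For scalar features 2, 3 and 5 the output contains the constant 1, the single features (2, 3, 5), the pairwise products (6, 10, 15) and the triple product 30. The size grows as the product of the three dimensions plus one each, so practical systems usually project the features to a small dimension first.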

    Expression recognition method under natural scene

    Publication No.: US11216652B1

    Publication date: 2022-01-04

    Application No.: US17470135

    Filing date: 2021-09-09

    Abstract: An expression recognition method under a natural scene comprises: converting an input video into a video frame sequence at a specified frame rate, and performing facial expression labeling on the video frame sequence to obtain a video frame labeled sequence; removing the impact of natural light, non-face areas, and the impact of head posture on facial expression from the video frame labeled sequence to obtain an expression video frame sequence; augmenting the expression video frame sequence to obtain a video preprocessed frame sequence; from the video preprocessed frame sequence, extracting HOG features that characterize facial appearance and shape features, extracting second-order features that describe a face creasing degree, and extracting facial pixel-level deep neural network features by using a deep neural network; then, performing vector fusion on these three features to obtain facial feature fusion vectors for training; and inputting the facial feature fusion vectors into a support vector machine for expression classification.
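
The core of a HOG feature is a histogram of gradient orientations weighted by gradient magnitude. The sketch below computes that histogram for one grayscale cell. It is deliberately simplified: full HOG also interpolates between bins and normalises over blocks of cells, and the nine-bin default is a common convention rather than a value from the patent.

```python
import math


def orientation_histogram(cell, bins=9):
    """Unsigned gradient-orientation histogram for one grayscale cell.

    cell: list of rows of pixel intensities. Border pixels are skipped
    because central differences need a neighbour on each side.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Fold the angle into [0, 180) so opposite directions match.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist
```

A cell whose intensity rises only from left to right puts all of its gradient energy into the first bin. In the described pipeline, histograms like this would be concatenated with the second-order and deep-network features before classification.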

    Physiological signal prediction method

    Publication No.: US11227161B1

    Publication date: 2022-01-18

    Application No.: US17471485

    Filing date: 2021-09-10

    Abstract: A physiological signal prediction method includes: collecting a video file, the video file containing long-term videos, and contents of the video file containing data for a face of a single person and true physiological signal data; segmenting a single long-term video into multiple short-term video clips; extracting, from each frame of image in each of the short-term video clips, features of regions of interest for identifying physiological signals so as to form region-of-interest features of a single frame; splicing, for each of the short-term video clips, the region-of-interest features of all fixed frames corresponding to the short-term video clip into region-of-interest features of a multi-frame video, and converting the region-of-interest features of the multi-frame video into a spatio-temporal graph; inputting the spatio-temporal graph into a deep learning model for training; and using the trained deep learning model to predict physiological signal parameters.
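
The segmentation and splicing steps can be sketched as follows. The choices here are assumptions made for illustration: clips do not overlap, each region's feature is its mean pixel intensity, and the spatio-temporal graph is represented as a matrix with one row per frame and one column per region.

```python
def split_into_clips(frames, clip_len):
    """Cut a long video into non-overlapping clips of clip_len frames.

    An incomplete clip at the end is dropped.
    """
    return [frames[i:i + clip_len]
            for i in range(0, len(frames) - clip_len + 1, clip_len)]


def roi_means(frame, rois):
    """Mean intensity of each region in one frame.

    frame: 2D list of pixel intensities.
    rois: hypothetical (top, left, bottom, right) boxes, end-exclusive.
    """
    feats = []
    for top, left, bottom, right in rois:
        pixels = [frame[y][x]
                  for y in range(top, bottom)
                  for x in range(left, right)]
        feats.append(sum(pixels) / len(pixels))
    return feats


def spatio_temporal_map(clip, rois):
    """Stack per-frame region features: rows are frames, columns are regions."""
    return [roi_means(frame, rois) for frame in clip]
```

Each clip yields one such matrix, which could be fed to a deep learning model as a single-channel image alongside the ground-truth signal for that clip.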
