Automatic lie detection method and apparatus for interactive scenarios, device and medium

    公开(公告)号:US11238289B1

    公开(公告)日:2022-02-01

    申请号:US17389364

    申请日:2021-07-30

    Abstract: An automatic lie detection method and apparatus for interactive scenarios, a device and a medium to improve the accuracy of automatic lie detection are provided. The method includes: segmenting three modalities, namely a video, an audio and a text, of a to-be-detected sample; extracting short-term features of the three modalities; integrating the short-term features of the three modalities in the to-be-detected sample to obtain long-term features of the three modalities corresponding to each dialogue; integrating the long-term features of the three modalities by a self-attention mechanism to obtain a multi-modal feature of the each dialogue; integrating the multi-modal feature of the each dialogue with interactive information by a graph neutral network to obtain a multi-modal feature integrated with the interactive information; and predicting a lie level of the each dialogue according to the multi-modal feature integrated with the interactive information.

    Multimodal dimensional emotion recognition method

    公开(公告)号:US11281945B1

    公开(公告)日:2022-03-22

    申请号:US17468994

    申请日:2021-09-08

    Abstract: A multimodal dimensional emotion recognition method includes: acquiring a frame-level audio feature, a frame-level video feature, and a frame-level text feature from an audio, a video, and a corresponding text of a sample to be tested; performing temporal contextual modeling on the frame-level audio feature, the frame-level video feature, and the frame-level text feature respectively by using a temporal convolutional network to obtain a contextual audio feature, a contextual video feature, and a contextual text feature; performing weighted fusion on these three features by using a gated attention mechanism to obtain a multimodal feature; splicing the multimodal feature and these three features together to obtain a spliced feature, and then performing further temporal contextual modeling on the spliced feature by using a temporal convolutional network to obtain a contextual spliced feature; and performing regression prediction on the contextual spliced feature to obtain a final dimensional emotion prediction result.

    Multi-modal lie detection method and apparatus, and device

    公开(公告)号:US11244119B1

    公开(公告)日:2022-02-08

    申请号:US17389383

    申请日:2021-07-30

    Abstract: A multi-modal lie detection method and apparatus, and a device to improve an accuracy of an automatic lie detection are provided. The multi-modal lie detection method includes inputting original data of three modalities, namely a to-be-detected audio, a to-be-detected video and a to-be-detected text; performing a feature extraction on input contents to obtain deep features of the three modalities; explicitly depicting first-order, second-order and third-order interactive relationships of the deep features of the three modalities to obtain an integrated multi-modal feature of each word; performing a context modeling on the integrated multi-modal feature of the each word to obtain a final feature of the each word; and pooling the final feature of the each word to obtain global features, and then obtaining a lie classification result by a fully-connected layer.

    Expression recognition method under natural scene

    公开(公告)号:US11216652B1

    公开(公告)日:2022-01-04

    申请号:US17470135

    申请日:2021-09-09

    Abstract: An expression recognition method under a natural scene comprises: converting an input video into a video frame sequence in terms of a specified frame rate, and performing facial expression labeling on the video frame sequence to obtain a video frame labeled sequence; removing natural light impact, non-face areas, and head posture impact elimination on facial expression from the video frame labeled sequence to obtain an expression video frame sequence; augmenting the expression video frame sequence to obtain a video preprocessed frame sequence; from the video preprocessed frame sequence, extracting HOG features that characterize facial appearance and shape features, extracting second-order features that describe a face creasing degree, and extracting facial pixel-level deep neural network features by using a deep neural network; then, performing vector fusion on these three obtain facial feature fusion vectors for training; and inputting the facial feature fusion vectors into a support vector machine for expression classification.

Patent Agency Ranking