Abstract:
A system for decentralized privacy-preserving clinical data evaluation includes a plurality of sites of a decentralized private network, each site having a local dataset and a local model, a memory device for storing program code, and at least one processor device operatively coupled to the memory device and configured to execute the program code to, for each of the local datasets, evaluate the local dataset using each of the local models to obtain one or more features related to a degree of outlierness, determine at least one outlier dataset based on the one or more features, and implement one or more actions based on the determination.
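The cross-site evaluation step above can be sketched as follows. This is an illustrative toy, not the patented method: each "model" is just a per-site mean, the outlierness feature is the mean squared error of a dataset under every site's model, and the z-score threshold is an invented flagging rule.

```python
# Minimal sketch of cross-site outlier-dataset detection, assuming each site
# holds a local dataset and a simple local model (here: a per-site mean).

def fit_local_model(dataset):
    # Toy "model": the mean of the site's data.
    return sum(dataset) / len(dataset)

def evaluate(dataset, model):
    # Feature related to degree of outlierness: mean squared error
    # of the dataset under the given model.
    return sum((x - model) ** 2 for x in dataset) / len(dataset)

def find_outlier_datasets(local_datasets, z_threshold=1.5):
    models = [fit_local_model(d) for d in local_datasets]
    # Evaluate every dataset under every local model and average the scores.
    scores = [
        sum(evaluate(d, m) for m in models) / len(models)
        for d in local_datasets
    ]
    mean = sum(scores) / len(scores)
    std = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5 or 1.0
    # Flag datasets whose averaged score is an outlier among all scores.
    return [i for i, s in enumerate(scores) if (s - mean) / std > z_threshold]

sites = [[1.0, 1.1, 0.9], [1.2, 1.0, 1.1], [0.9, 1.0, 1.0],
         [1.1, 0.9, 1.0], [9.0, 9.5, 10.0]]
print(find_outlier_datasets(sites))  # → [4]
```

The "one or more actions" of the abstract would then be taken on the flagged indices, e.g. excluding those datasets from aggregation.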
Abstract:
Techniques facilitating attention based sequential image processing are provided. A system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise an initialization component that can perform self-attention based training on a model that comprises context information associated with a sequence of images. Images of the sequence of images can be selected during the self-attention based training. The computer executable components can also comprise a localization component that can extract local information from the images selected during the self-attention based training based on the context information. In addition, the computer executable components can also comprise an integration component that can update the model based on an end-to-end integrated attention training framework comprising the context information and the local information.
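The self-attention step over a sequence of images can be illustrated in miniature. This sketch assumes one feature vector per image has already been extracted; it uses a single head with no learned projections, so the attention weights merely show which images would be attended to. It is not the patent's full end-to-end training framework.

```python
import math

def self_attention(features):
    """Scaled dot-product self-attention over per-image feature vectors.

    Returns one context vector per image plus the attention weights,
    where each row of weights sums to 1.
    """
    d = len(features[0])
    out, all_weights = [], []
    for q in features:
        # Similarity of this image's features to every image in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in features]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        all_weights.append(weights)
        # Context vector: attention-weighted mix of all image features.
        out.append([sum(w * k[j] for w, k in zip(weights, features))
                    for j in range(d)])
    return out, all_weights

context, weights = self_attention([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
# similar images attend to each other more strongly
```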
Abstract:
A pattern based audio searching method includes labeling a plurality of source audio data based on patterns to obtain audio label sequences of the source audio data; obtaining, with a processing device, an audio label sequence of target audio data; determining a matching degree between the target audio data and the source audio data according to a predetermined matching rule based on the audio label sequence of the target audio data and the audio label sequences of the source audio data; and outputting source audio data having a matching degree higher than a predetermined matching threshold as a search result.
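The label-sequence matching can be sketched as below. The abstract does not specify the matching rule; a normalized longest common subsequence is used here purely as one plausible stand-in, and the label names and 0.6 threshold are invented for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two label sequences."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[-1][-1]

def matching_degree(target, source):
    # Normalize by the longer sequence so the degree lies in [0, 1].
    return lcs_length(target, source) / max(len(target), len(source), 1)

def search(target_labels, sources, threshold=0.6):
    """Return names of source clips whose matching degree exceeds the threshold."""
    return [name for name, labels in sources.items()
            if matching_degree(target_labels, labels) >= threshold]

sources = {"clip_a": ["rise", "fall", "flat", "rise"],
           "clip_b": ["flat", "flat", "flat"]}
print(search(["rise", "fall", "flat"], sources))  # → ['clip_a']
```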
Abstract:
Techniques are described that facilitate automatically distinguishing between different expressions of a same or similar emotion. In one embodiment, a computer-implemented method is provided that comprises partitioning, by a device operatively coupled to a processor, a data set comprising facial expression data into different clusters of the facial expression data based on one or more distinguishing features respectively associated with the different clusters, wherein the facial expression data reflects facial expressions respectively expressed by people. The computer-implemented method can further comprise performing, by the device, a multi-task learning process to determine a final number of the different clusters for the data set, wherein the multi-task learning process is dependent on an output of an emotion classification model that classifies emotion types respectively associated with the facial expressions.
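The partition-then-choose-a-count idea can be shown with a toy. Real facial expression data is high dimensional and the patent selects the final cluster count via multi-task learning against an emotion classifier; here a one-dimensional k-means with a penalized within-cluster variance stands in for that criterion, purely for illustration.

```python
import random

def kmeans_1d(points, k, iters=20, seed=0):
    """Plain k-means on scalar features; returns centers and clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: abs(p - centers[i]))].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

def choose_cluster_count(points, max_k=4, penalty=0.5):
    """Pick the cluster count minimizing inertia plus a complexity penalty."""
    best_k, best_score = 1, None
    for k in range(1, max_k + 1):
        centers, clusters = kmeans_1d(points, k)
        inertia = sum((p - centers[i]) ** 2
                      for i, c in enumerate(clusters) for p in c)
        score = inertia + penalty * k
        if best_score is None or score < best_score:
            best_k, best_score = k, score
    return best_k

# Two well-separated groups of expression features → final count of 2.
print(choose_cluster_count([0.0, 0.1, 0.2, 5.0, 5.1, 5.2]))  # → 2
```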
Abstract:
In an approach for visual liveness detection, a video-audio signal related to a speaker speaking a text is obtained. The video-audio signal is split into a video signal which records images of the speaker and an audio signal which records a speech spoken by the speaker. Then a first sequence indicating visual mouth openness is obtained from the video signal, and a second sequence indicating acoustic mouth openness is obtained based on the text and the audio signal. Synchrony between the first and second sequences is measured, and the liveness of the speaker is determined based on the synchrony.
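The synchrony measurement can be sketched as below, assuming the visual and acoustic mouth-openness sequences have already been extracted and aligned to the same frame rate. Pearson correlation stands in for whatever synchrony measure the approach actually uses, and the 0.5 decision threshold is illustrative.

```python
def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb) if va and vb else 0.0

def is_live(visual_openness, acoustic_openness, threshold=0.5):
    # High synchrony between the two openness tracks → likely a live speaker.
    return pearson(visual_openness, acoustic_openness) >= threshold

# Mouth opens and closes in step with the speech → live.
print(is_live([0, 1, 2, 1, 0, 1, 2, 1],
              [0.0, 0.5, 1.0, 0.5, 0.0, 0.5, 1.0, 0.5]))  # → True
```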
Abstract:
A method and system for achieving emotional text to speech (TTS). The method includes: receiving text data; generating an emotion tag for the text data by a rhythm piece; and applying TTS to the text data according to the emotion tag, where the emotion tag is expressed as a set of emotion vectors, and where each emotion vector includes a plurality of emotion scores given based on a plurality of emotion categories. A system for the same includes: a text data receiving module; an emotion tag generating module; and a TTS module for achieving TTS, wherein the emotion tag is expressed as a set of emotion vectors, and wherein each emotion vector includes a plurality of emotion scores given based on a plurality of emotion categories.
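The emotion-tag representation (not a TTS engine) can be sketched as below. The category names and the punctuation-based scoring heuristic are invented for illustration only.

```python
# One emotion vector per rhythm piece: scores over a fixed category set.
EMOTION_CATEGORIES = ("neutral", "happy", "sad", "angry")

def make_emotion_vector(scores):
    """Normalize raw per-category scores into an emotion vector."""
    total = sum(scores.values()) or 1.0
    return {cat: scores.get(cat, 0.0) / total for cat in EMOTION_CATEGORIES}

def tag_text(rhythm_pieces):
    """Attach an emotion vector to each rhythm piece (dummy scoring here)."""
    tags = []
    for piece in rhythm_pieces:
        # Toy heuristic: exclamations read as happy, everything else neutral.
        raw = {"happy": 1.0} if "!" in piece else {"neutral": 1.0}
        tags.append((piece, make_emotion_vector(raw)))
    return tags

tags = tag_text(["Hello there.", "What a day!"])
# each piece now carries a normalized emotion vector for the TTS module
```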
Abstract:
A data processing method includes obtaining text information corresponding to a presented content, the presented content comprising a plurality of areas; performing text analysis on the text information to obtain a first keyword sequence, the first keyword sequence including area keywords associated with at least one area of the plurality of areas; obtaining speech information related to the presented content, the speech information at least comprising a current speech segment; and using a first model network to perform analysis on the current speech segment to determine the area corresponding to the current speech segment, wherein the first model network comprises the first keyword sequence.
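The area-determination step can be sketched as below, under the simplifying assumption that the "first model network" reduces to matching a segment's words against each area's keywords from the first keyword sequence; a real implementation would use a learned model, and the area names and keywords here are invented.

```python
def determine_area(speech_segment, area_keywords):
    """Return the area whose keywords overlap the segment's words the most."""
    words = set(speech_segment.lower().split())
    best_area, best_overlap = None, 0
    for area, keywords in area_keywords.items():
        overlap = len(words & {k.lower() for k in keywords})
        if overlap > best_overlap:
            best_area, best_overlap = area, overlap
    return best_area  # None if no area keyword matches

areas = {"overview": ["revenue", "growth"],
         "results": ["quarterly", "sales", "chart"]}
print(determine_area("Our quarterly sales grew strongly", areas))  # → results
```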
Abstract:
A voice based biometric authentication method, apparatus (system), and computer program product. Provided is a voice verification solution with a high accuracy rate that can prevent cheating via recording. The method includes: transmitting to the user a question prompt requiring the user to speak a voice segment and an answer to a dynamic question, the voice segment having a corresponding text dependent speaker verification model enrolled before the authentication; segmenting, in response to receiving the voice answer, the voice segment part and the dynamic question answer part out of the voice answer; and verifying boundary smoothness between the voice segment and the answer to the dynamic question within the voice answer. With this method, whether a voice answer involves cheating via recording is determined according to the degree of smoothness at a detected boundary. The apparatus and computer program product carry out the steps of the above-mentioned method.
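The boundary-smoothness idea can be illustrated with a toy: if the fixed voice segment was replayed from a recording and spliced with a live answer, the acoustic feature track tends to jump at the splice point. Comparing the jump at the detected boundary against the typical frame-to-frame change is one simple stand-in for the check; the per-frame "energy" values and the factor of 3 are invented.

```python
def boundary_is_smooth(frames, boundary, factor=3.0):
    """Check whether the feature jump at the boundary looks like a splice.

    frames:   per-frame acoustic feature values (e.g. energy)
    boundary: index of the first frame after the detected boundary
    """
    diffs = [abs(b - a) for a, b in zip(frames, frames[1:])]
    typical = sum(diffs) / len(diffs)          # typical frame-to-frame change
    jump = abs(frames[boundary] - frames[boundary - 1])
    return jump <= factor * typical            # large jump → suspected splice

# Continuous live speech across the boundary → smooth.
print(boundary_is_smooth([1.0, 1.1, 1.0, 1.05, 1.1, 1.0, 1.05], 3))  # → True
# Replayed recording spliced in at frame 3 → abrupt jump, not smooth.
print(boundary_is_smooth([1.0, 1.1, 1.0, 5.0, 5.1, 5.0, 5.1], 3))    # → False
```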