Abstract:
The present invention relates to a method of training data augmentation for end-to-end speech recognition. The method includes: combining speech augmentation data and text augmentation data; performing a dynamic augmentation process on each of the combined speech augmentation data and text augmentation data; and training the end-to-end speech recognition model using the speech augmentation data and the text augmentation data that have been subjected to the dynamic augmentation process.
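The abstract does not disclose the specific transforms, so the following Python sketch only illustrates the overall flow under assumed operations: a time-masking perturbation for the speech side, random token dropping for the text side, and a generator that combines both pools and re-augments them dynamically each time a batch is drawn. All function names and parameters are illustrative assumptions, not the patented procedure.

import random
import numpy as np

def dynamic_augment_speech(features, max_mask_len=10):
    # Assumed speech-side transform: zero out a random span of time frames
    # in a (time, frequency) feature matrix.
    feats = features.copy()
    t = feats.shape[0]
    mask_len = random.randint(0, min(max_mask_len, t))
    if mask_len > 0:
        start = random.randint(0, t - mask_len)
        feats[start:start + mask_len, :] = 0.0
    return feats

def dynamic_augment_text(tokens, drop_prob=0.1):
    # Assumed text-side transform: randomly drop tokens from a transcript.
    kept = [tok for tok in tokens if random.random() > drop_prob]
    return kept if kept else tokens

def training_batches(speech_aug_set, text_aug_set, batch_size=8):
    # Combine the speech- and text-augmentation pools, then re-augment each
    # example on the fly whenever a training batch is produced.
    combined = list(speech_aug_set) + list(text_aug_set)
    random.shuffle(combined)
    for i in range(0, len(combined), batch_size):
        yield [(dynamic_augment_speech(f), dynamic_augment_text(tk))
               for f, tk in combined[i:i + batch_size]]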
Abstract:
Provided is an apparatus for large vocabulary continuous speech recognition (LVCSR) based on a context-dependent deep neural network hidden Markov model (CD-DNN-HMM) algorithm. The apparatus may include an extractor configured to extract acoustic model-state level information corresponding to an input speech signal from a training data model set using at least one of a first feature vector based on a gammatone filterbank signal analysis algorithm and a second feature vector based on a bottleneck algorithm, and a speech recognizer configured to provide a result of recognizing the input speech signal based on the extracted acoustic model-state level information.
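As a rough illustration of how the two front-end feature streams could feed a CD-DNN-HMM acoustic model, the sketch below concatenates a per-frame gammatone-filterbank feature vector and a bottleneck feature vector (their extraction is assumed to happen elsewhere) and passes the result through a small feed-forward network that outputs tied-state posteriors. The network shape, dimensions, and function names are assumptions for illustration only.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def state_posteriors(gammatone_feats, bottleneck_feats, weights, biases):
    # Fuse the two feature vectors and run them through a feed-forward DNN
    # whose output is one posterior per context-dependent HMM state.
    x = np.concatenate([gammatone_feats, bottleneck_feats], axis=-1)
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(0.0, x @ W + b)            # hidden layers with ReLU
    return softmax(x @ weights[-1] + biases[-1])  # acoustic model-state posteriors

# Toy usage with random parameters: 40-dim gammatone + 39-dim bottleneck input.
rng = np.random.default_rng(0)
weights = [0.1 * rng.standard_normal((79, 128)), 0.1 * rng.standard_normal((128, 2000))]
biases = [np.zeros(128), np.zeros(2000)]
posteriors = state_posteriors(rng.standard_normal(40), rng.standard_normal(39), weights, biases)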
Abstract:
Provided are a speech recognition broadcasting apparatus that uses a smart remote control and a controlling method thereof, the method including: receiving a runtime resource for speech recognition from a speech recognition server; receiving a speech signal from the smart remote control; recognizing the speech signal based on the received runtime resource for speech recognition; transmitting a result of recognition of the speech signal to the smart remote control; receiving, from the smart remote control, at least one of EPG (Electronic Program Guide) search information or control information of the speech recognition broadcasting apparatus based on the result of recognition; and outputting a search screen or controlling the speech recognition broadcasting apparatus based on the EPG search information or the control information.
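The message flow in the method can be read as a simple request/response loop. The Python sketch below captures that control flow only; the network transport, ASR engine, and UI are injected as callables, and every interface name here is a hypothetical placeholder rather than the apparatus's actual API.

from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class RemoteReply:
    # What the smart remote control sends back after seeing the recognition result.
    epg_search_info: Optional[str] = None
    control_command: Optional[str] = None

class BroadcastReceiver:
    def __init__(self, fetch_runtime_resource: Callable, recognize: Callable,
                 send_to_remote: Callable, show_search_screen: Callable,
                 apply_control: Callable):
        self.runtime_resource = fetch_runtime_resource()  # from the speech recognition server
        self.recognize = recognize
        self.send_to_remote = send_to_remote
        self.show_search_screen = show_search_screen
        self.apply_control = apply_control

    def handle_speech(self, speech_signal) -> None:
        # Recognize the speech from the remote using the downloaded runtime resource.
        text = self.recognize(speech_signal, self.runtime_resource)
        # Send the result to the remote; it answers with EPG search and/or control data.
        reply: RemoteReply = self.send_to_remote(text)
        if reply.epg_search_info is not None:
            self.show_search_screen(reply.epg_search_info)
        if reply.control_command is not None:
            self.apply_control(reply.control_command)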
Abstract:
An exploration method used by an exploration apparatus in multi-agent reinforcement learning to collect training samples during the training process is provided. The exploration method includes calculating the influence of a selected action of each agent on the actions of the other agents in a current state, calculating a linear sum of the value of a utility function representing the action value of each agent and the influence on the actions of the other agents calculated for the selected action of each agent, and obtaining a sample to be used for training an action policy of each agent by probabilistically selecting between the action for which the linear sum is the maximum and a random action.
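For one agent and a discrete action set, the selection rule reads like an epsilon-greedy choice over a utility-plus-influence score. The sketch below assumes the utility values and the influence measures have already been computed (how the influence on the other agents is obtained is not specified in the abstract); the mixing weight and exploration probability are illustrative parameters.

import random
import numpy as np

def explore_action(utility, influence, epsilon=0.1, weight=0.5):
    # utility[a]:   the agent's own action-value estimate for action a.
    # influence[a]: a scalar measuring how strongly action a would influence the
    #               other agents' action choices in the current state.
    scores = np.asarray(utility) + weight * np.asarray(influence)  # linear sum
    if random.random() < epsilon:
        return random.randrange(len(scores))   # random exploratory action
    return int(np.argmax(scores))              # action with the maximal linear sum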
Abstract:
The present disclosure relates to an apparatus and method for separating voice sections from each other. Various embodiments are directed to providing such an apparatus and method, which can maximize speaker separation performance for a short voice section by dividing a short voice section having low speaker separation reliability and separating multiple speakers from one another.
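A minimal sketch of the dividing step, assuming each voice section already carries a speaker-separation reliability score: low-reliability short sections are split into halves so that each piece can be re-assigned to a speaker separately, while reliable sections pass through unchanged. The threshold, minimum length, and halving rule are assumptions; the disclosure does not fix them.

def split_low_reliability_sections(sections, reliability, threshold=0.5, min_len=0.5):
    # sections:    list of (start, end) times in seconds.
    # reliability: per-section confidence of the speaker-separation result.
    refined = []
    for (start, end), score in zip(sections, reliability):
        length = end - start
        if score < threshold and length >= 2 * min_len:
            mid = start + length / 2.0
            refined.extend([(start, mid), (mid, end)])   # re-separate each half
        else:
            refined.append((start, end))
    return refined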
Abstract:
Provided are a system and method for adaptive masking and non-directional language understanding and generation. The system for adaptive masking and non-directional language understanding and generation according to the present invention includes an encoder unit including an adaptive masking block for performing masking on training data, a language generator for restoring the masked words, and an encoder for detecting whether or not the words constituting the restored sentence are original, and a decoder unit including a generation word position detector for detecting the position of the word to be generated next, a language generator for determining a word suitable for the corresponding position, and a non-directional training data generator for decoder training.
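The adaptive masking block can be pictured as masking tokens with a rate that varies per token. The sketch below assumes a per-token importance score in [0, 1] scales a base masking rate; how the real block scores tokens is not described in the abstract, so the scoring and the scaling rule are illustrative. The returned mask marks the positions the language generator is trained to restore and the encoder to verify against the original words.

import numpy as np

def adaptive_mask(token_ids, importance, mask_id, base_rate=0.15):
    rng = np.random.default_rng()
    # Adapt the masking rate per token using the (assumed) importance score.
    probs = base_rate * (0.5 + np.asarray(importance))
    mask = rng.random(len(token_ids)) < probs
    corrupted = np.where(mask, mask_id, np.asarray(token_ids))
    return corrupted, mask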
Abstract:
Provided are a method and apparatus for estimating a user's requirement through a neural network capable of reading and writing a working memory, and for providing fashion coordination knowledge appropriate for the requirement through the neural network using a long-term memory, by using a neural network with an explicit memory in order to accurately provide the fashion coordination knowledge. The apparatus includes a language embedding unit for embedding a user's question and a previously created answer to acquire a digitized embedding vector; a fashion coordination knowledge creation unit for creating fashion coordination through the neural network having the explicit memory by using the embedding vector as an input; and a dialog creation unit for creating dialog content for configuring the fashion coordination through the neural network having the explicit memory by using the fashion coordination knowledge and the embedding vector as inputs.
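At its core, reading the explicit memory with the embedding vector can be sketched as one attention-weighted lookup. In the sketch below the memory is laid out as key and value matrices and the query is the embedding of the question and the previous answer; this layout and the single-hop read are assumptions for illustration, not the patented memory structure.

import numpy as np

def read_memory(query_vec, memory_keys, memory_values):
    # query_vec:                 embedding of the user's question and previous answer.
    # memory_keys/memory_values: rows of the stored fashion coordination knowledge.
    scores = memory_keys @ query_vec
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # attention over memory rows
    return weights @ memory_values           # knowledge read out for coordination and dialog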
Abstract:
Provided are a sentence embedding method and apparatus based on subword embedding and skip-thoughts. To integrate skip-thought sentence embedding learning with a subword embedding technique, a skip-thought sentence embedding learning method based on subword embedding and a multitask learning methodology for simultaneously learning subword embedding and skip-thought sentence embedding are provided as a way of applying intra-sentence contextual information to subword embedding during subword embedding learning. This makes it possible to apply a sentence embedding approach in a bag-of-words form to agglutinative languages such as Korean. Also, because skip-thought sentence embedding learning is integrated with the subword embedding technique, intra-sentence contextual information can be used during subword embedding learning. The proposed model minimizes the additional training parameters introduced for sentence embedding, so that most training results are accumulated in the subword embedding parameters.
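The bag-of-words coupling between subword embedding and sentence embedding can be sketched as follows: each word is broken into character n-gram subwords, and the sentence embedding is simply the sum of their vectors, so a sentence-level (skip-thought style) loss pushes gradients directly into the shared subword table. The n-gram range, dimensionality, and data layout are assumptions for illustration.

import numpy as np

def char_ngrams(word, n_min=2, n_max=4):
    # Subword units as character n-grams with word-boundary markers.
    marked = f"<{word}>"
    return [marked[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(marked) - n + 1)]

def sentence_embedding(sentence, subword_vecs, dim=100):
    # Bag-of-subwords sentence embedding: sum the vectors of all subword units
    # of all words; subword_vecs maps an n-gram string to its embedding vector.
    vec = np.zeros(dim)
    for word in sentence.split():
        for sub in char_ngrams(word):
            vec += subword_vecs.get(sub, np.zeros(dim))
    return vec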
Abstract:
The present invention relates to a multi-modality system for recommending multiple items using an interaction and a method of operating the same. The multi-modality system includes an interaction data preprocessing module that preprocesses an interaction data set and converts the preprocessed interaction data set into interaction training data; an item data preprocessing module that preprocesses item information data and converts the preprocessed item information data into item training data; and a learning module that includes a neural network model that is trained using the interaction training data and the item training data and outputs a result including a set of recommended items using a conversation context with a user as input.
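Once the neural network model has produced a conversation-context embedding and item embeddings from the two preprocessed training sets, recommendation reduces to ranking items by similarity. The sketch below shows only that final ranking step; the embedding model itself and all names here are assumptions.

import numpy as np

def recommend(context_vec, item_vecs, item_ids, top_k=5):
    # Score every item embedding against the conversation-context embedding
    # and return the identifiers of the top_k highest-scoring items.
    scores = item_vecs @ context_vec
    order = np.argsort(-scores)[:top_k]
    return [item_ids[i] for i in order]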
Abstract:
A concept-based few-shot learning method is disclosed. The method includes estimating a task embedding corresponding to a task to be executed from support data, which is a small amount of learning data; calculating a slot probability of a concept memory necessary for the task based on the task embedding; extracting features of query data, which is test data, and of the support data; comparing local features of the extracted features with the slots of the concept memory to extract a concept, and generating synthesis features that have maximum similarity to the extracted features through the slots of the concept memory; and calculating a task execution result from the synthesis features and the extracted concept by applying the slot probability as a weight.
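The concept extraction and feature synthesis steps can be pictured as an attention read over the concept memory, with the task-conditioned slot probability acting as a weight. The sketch below assumes the local features, concept slots, and slot probabilities are given as arrays; the weighting scheme and the hard concept assignment are illustrative choices, not the method's exact formulation.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def concept_readout(local_feats, concept_slots, slot_prob):
    # local_feats:   (n, d) local features of a query or support example.
    # concept_slots: (k, d) rows of the concept memory.
    # slot_prob:     (k,)   task-conditioned probability of each slot being needed.
    attn = softmax(local_feats @ concept_slots.T * slot_prob, axis=-1)
    synthesized = attn @ concept_slots      # features rebuilt from the memory slots
    concepts = attn.argmax(axis=-1)         # best-matching slot (concept) per feature
    return synthesized, concepts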