MIXUP DATA AUGMENTATION FOR KNOWLEDGE DISTILLATION FRAMEWORK

    Publication No.: US20220188643A1

    Publication date: 2022-06-16

    Application No.: US17119592

    Filing date: 2020-12-11

    Inventor: Takashi Fukuda

    IPC classification: G06N3/08 G06N3/04 G10L15/02

    Abstract: A method of training a student neural network is provided. The method includes feeding a data set including a plurality of input vectors into a teacher neural network to generate a plurality of output values, and converting two of the plurality of output values from the teacher neural network for two corresponding input vectors into two corresponding soft labels. The method further includes combining the two corresponding input vectors to form a synthesized data vector, and forming a masked soft label vector from the two corresponding soft labels. The method further includes feeding the synthesized data vector into the student neural network, using the masked soft label vector to determine an error for modifying weights of the student neural network, and modifying the weights of the student neural network.
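    The steps in this abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not say how the masked soft label is formed, so the masking rule below (keep an entry of the first soft label only where it exceeds the second) is an assumption, as are the mixing weight lam, the toy vectors, and all function names.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw teacher outputs into a soft label (a probability vector)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mix_inputs(x_a, x_b, lam):
    """Combine two input vectors into one synthesized data vector."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]


def masked_soft_label(soft_a, soft_b):
    """ASSUMED rule: keep soft_a[i] where it exceeds soft_b[i], else 0."""
    return [a if a > b else 0.0 for a, b in zip(soft_a, soft_b)]


def cross_entropy(target, student_probs, eps=1e-12):
    """Error between the masked soft label and the student's output."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, student_probs))


# Hypothetical toy data: two input vectors and the teacher outputs for them.
x_a, x_b = [1.0, 0.0, 2.0], [0.0, 4.0, 2.0]
teacher_out_a, teacher_out_b = [2.0, 1.0, 0.0], [0.0, 3.0, 1.0]

soft_a, soft_b = softmax(teacher_out_a), softmax(teacher_out_b)
x_mix = mix_inputs(x_a, x_b, lam=0.75)      # synthesized data vector
target = masked_soft_label(soft_a, soft_b)  # masked soft label vector

# Hypothetical student output for x_mix; a real student would be a network.
student_probs = softmax([0.5, 0.2, 0.1])
loss = cross_entropy(target, student_probs)  # error used to update weights
```

    In a real training loop the loss would be backpropagated through the student network. That step is omitted here because the abstract does not specify the optimizer or the network architecture.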

    PROCESSING OF SPEECH SIGNAL
    Patent application

    Publication No.: US20190080684A1

    Publication date: 2019-03-14

    Application No.: US15704426

    Filing date: 2017-09-14

    Abstract: A computer-implemented method for processing a speech signal includes: identifying speech segments in an input speech signal; calculating an upper variance and a lower variance, the upper variance being a variance of upper spectra larger than a criterion among speech spectra corresponding to frames in the speech segments, the lower variance being a variance of lower spectra smaller than a criterion among the speech spectra corresponding to the frames in the speech segments; determining whether the input speech signal is a special input speech signal using a difference between the upper variance and the lower variance; and performing speech recognition of the input speech signal which has been determined to be the special input speech signal, using a special acoustic model for the special input speech signal.
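    The upper/lower variance test can be sketched as follows. This is a rough illustration only: the abstract does not define the criterion, the form of the difference, or the decision threshold, so using the mean as the criterion, taking the absolute difference, and the threshold values are all assumptions, and the function names are hypothetical. The input is taken to be one spectral value per speech frame.

```python
from statistics import mean, pvariance


def upper_lower_variance(values, criterion=None):
    """Split per-frame spectral values at a criterion; return both variances."""
    if criterion is None:
        criterion = mean(values)  # ASSUMPTION: the criterion is the mean
    upper = [v for v in values if v > criterion]
    lower = [v for v in values if v < criterion]
    # pvariance needs at least one data point; treat an empty side as 0.
    upper_var = pvariance(upper) if upper else 0.0
    lower_var = pvariance(lower) if lower else 0.0
    return upper_var, lower_var


def is_special_input(values, threshold):
    """ASSUMED decision rule: flag the signal when the variances differ a lot."""
    upper_var, lower_var = upper_lower_variance(values)
    return abs(upper_var - lower_var) > threshold


# Hypothetical spectral values for six frames inside the speech segments.
frames = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
variances = upper_lower_variance(frames)
```

    A signal flagged by is_special_input would then be routed to the special acoustic model, while other signals would use the default model.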

    Discriminative training of a feature-space transform

    Publication No.: US10170103B2

    Publication date: 2019-01-01

    Application No.: US15004413

    Filing date: 2016-01-22

    Inventor: Takashi Fukuda

    Abstract: A method, a system, and a computer program product are provided for discriminatively training a feature-space transform. The method includes performing feature-space discriminative training (f-DT) on an initialized feature-space transform, using manually transcribed data, to obtain a pre-stage trained feature-space transform. The method further includes performing f-DT on the pre-stage trained feature-space transform as a newly initialized feature-space transform, using automatically transcribed data, to obtain a main-stage trained feature-space transform. The method additionally includes performing f-DT on the main-stage trained feature-space transform as a newly initialized feature-space transform, using manually transcribed data, to obtain a post-stage trained feature-space transform.
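    The three-stage schedule amounts to chaining one training routine, with each stage starting from the previous stage's result. The sketch below shows only that ordering. The f_dt callable is a hypothetical placeholder for the actual f-DT procedure, which the abstract does not detail, and reusing one manually transcribed set for both the pre-stage and the post-stage is an assumption.

```python
def train_three_stage(initial_transform, f_dt, manual_data, auto_data):
    """Run pre-, main-, and post-stage training, each seeded by the last."""
    pre_stage = f_dt(initial_transform, manual_data)  # manually transcribed
    main_stage = f_dt(pre_stage, auto_data)           # automatically transcribed
    post_stage = f_dt(main_stage, manual_data)        # manually transcribed again
    return post_stage


def record_stage(transform, data):
    """Toy stand-in for f-DT: it only records which data set was used."""
    return transform + [data]


history = train_three_stage([], record_stage, "manual", "auto")
```

    With the toy stand-in, history lists the data sets in the order they were applied, which makes the manual, automatic, manual sequence easy to check.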

    GENERATION OF VOICE DATA AS DATA AUGMENTATION FOR ACOUSTIC MODEL TRAINING

    Publication No.: US20180350347A1

    Publication date: 2018-12-06

    Application No.: US15609665

    Filing date: 2017-05-31

    IPC classification: G10L15/06 G10L25/21 G10L25/18

    Abstract: A method, a computer system, and a computer program product for generating a plurality of voice data having a particular speaking style are provided. The present invention may include preparing a plurality of original voice data corresponding to at least one word or at least one phrase. The present invention may also include attenuating a low frequency component and a high frequency component in the prepared plurality of original voice data. The present invention may then include reducing power at a beginning and an end of the prepared plurality of original voice data. The present invention may further include storing a plurality of resultant voice data obtained after the attenuating and the reducing.
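    The two signal operations described here, attenuating the band edges and reducing power at the start and end, can be sketched on a list of audio samples. This is a simplified illustration, not the patented processing: the one-pole filters, the linear fade, the coefficient values, and the function names are all assumptions chosen for brevity.

```python
def low_pass(samples, alpha):
    """One-pole low-pass filter; a smaller alpha removes more high frequencies."""
    out, state = [], 0.0
    for s in samples:
        state += alpha * (s - state)
        out.append(state)
    return out


def attenuate_band_edges(samples, alpha_low=0.05, alpha_high=0.5):
    """Attenuate low frequencies (subtract a slow trend), then high ones."""
    slow = low_pass(samples, alpha_low)  # estimate of the low-frequency part
    high_passed = [s - l for s, l in zip(samples, slow)]
    return low_pass(high_passed, alpha_high)


def fade_edges(samples, fade_len):
    """Linearly reduce amplitude over the first and last fade_len samples."""
    n = len(samples)
    fade_len = min(fade_len, n // 2)
    out = list(samples)
    for i in range(fade_len):
        gain = i / fade_len
        out[i] *= gain
        out[n - 1 - i] *= gain
    return out


# Hypothetical input: a constant signal, used only to show the fade shape.
faded = fade_edges([1.0] * 6, fade_len=2)
```

    Applying attenuate_band_edges and then fade_edges to each original utterance, and storing the results, mirrors the order of operations in the abstract.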

    Extraction of target speeches
    Granted patent

    Publication No.: US09818428B2

    Publication date: 2017-11-14

    Application No.: US15440773

    Filing date: 2017-02-23

    Abstract: Methods and systems are provided for separating a target speech from a plurality of other speeches having different directions of arrival. One of the methods includes: obtaining speech signals from speech input devices disposed at predetermined distances from one another; calculating, for each of at least one pair of speech input devices, a direction of arrival of target speeches and directions of arrival of speeches other than the target speeches; calculating an aliasing metric, wherein the aliasing metric indicates which frequency band of speeches is susceptible to spatial aliasing; enhancing speech signals arriving from the direction of arrival of the target speech signals, based on the speech signals and the direction of arrival of the target speeches, to generate enhanced speech signals; reading a probability model; and inputting the enhanced speech signals and the aliasing metric to the probability model to output target speeches.
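    The abstract does not define the aliasing metric. One simple stand-in, assumed here for illustration only, marks each frequency bin whose center frequency exceeds the half-wavelength limit c / (2d) for a microphone pair with spacing d. Above that frequency the inter-microphone phase difference can wrap, so the direction of arrival becomes ambiguous. The function name, the binary flag format, and the example parameters are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def aliasing_flags(mic_spacing_m, sample_rate_hz, n_fft):
    """Return one flag per FFT bin: 1 if the bin may suffer spatial aliasing."""
    limit_hz = SPEED_OF_SOUND / (2.0 * mic_spacing_m)  # half-wavelength limit
    n_bins = n_fft // 2 + 1  # bins from 0 Hz up to the Nyquist frequency
    return [
        1 if k * sample_rate_hz / n_fft > limit_hz else 0
        for k in range(n_bins)
    ]


# Hypothetical setup: 10 cm spacing, 16 kHz sampling, 16-point FFT.
flags = aliasing_flags(mic_spacing_m=0.1, sample_rate_hz=16000, n_fft=16)
```

    With 10 cm spacing the limit is 1715 Hz, so the bins at 0 Hz and 1000 Hz are unflagged and every bin from 2000 Hz upward is flagged. A vector like this could be passed to the probability model alongside the enhanced speech signals.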