-
Publication No.: US11429676B2
Publication date: 2022-08-30
Application No.: US16657180
Filing date: 2019-10-18
Inventors: Hiroaki Kikuchi, Yuichi Suzuki, Takashi Fukuda
Abstract: A first user request is received which specifies a target document set in which a first subset of the documents has been flagged by a user. A primary flag table is created for the target document set. A first document subset matching the first user request is created. It is determined whether the number of flagged documents exceeds a first threshold. If so, a secondary flag table is created for the first document subset, and flag data corresponding to the first document subset is stored in the secondary flag table. The flag data in the secondary flag table is then merged into the primary flag table.
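The abstract leaves the table layout and the below-threshold path open. The Python sketch below is one possible reading of the procedure, assuming flag tables are plain dictionaries mapping a document ID to a flag. `create_primary_table`, `process_request`, and the direct-update fallback used at or below the threshold are hypothetical and not taken from the patent.

```python
from typing import Callable, Dict, Iterable, Set

# Hypothetical representation: document id -> True if flagged.
FlagTable = Dict[str, bool]


def create_primary_table(target_docs: Iterable[str], flagged: Set[str]) -> FlagTable:
    """Primary flag table with one entry per document in the target set."""
    return {doc: doc in flagged for doc in target_docs}


def process_request(primary: FlagTable,
                    matches: Callable[[str], bool],
                    new_flags: Dict[str, bool],
                    threshold: int) -> str:
    """Apply flag updates for the subset matching a request.

    Returns "secondary" when a secondary table was used, else "direct".
    """
    # First document subset: documents matching the user request.
    subset = [doc for doc in primary if matches(doc)]
    flagged_count = sum(1 for doc in subset if primary[doc])
    if flagged_count > threshold:
        # Secondary flag table holds flag data for the subset only.
        secondary: FlagTable = {doc: primary[doc] for doc in subset}
        for doc, flag in new_flags.items():
            if doc in secondary:
                secondary[doc] = flag
        # Merge the secondary table back into the primary table.
        primary.update(secondary)
        return "secondary"
    # Assumption: at or below the threshold, update the primary table directly.
    for doc, flag in new_flags.items():
        if doc in primary and matches(doc):
            primary[doc] = flag
    return "direct"
```

With four documents of which three are flagged, a request matching the two flagged "a" documents and a threshold of 1 takes the secondary-table path, while a request matching the "b" documents (only one flagged) updates the primary table directly.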
-
Publication No.: US20220188643A1
Publication date: 2022-06-16
Application No.: US17119592
Filing date: 2020-12-11
Inventors: Takashi Fukuda
Abstract: A method of training a student neural network is provided. The method includes feeding a data set including a plurality of input vectors into a teacher neural network to generate a plurality of output values, and converting two of the plurality of output values from the teacher neural network for two corresponding input vectors into two corresponding soft labels. The method further includes combining the two corresponding input vectors to form a synthesized data vector, and forming a masked soft label vector from the two corresponding soft labels. The method further includes feeding the synthesized data vector into the student neural network, using the masked soft label vector to determine an error for modifying weights of the student neural network, and modifying the weights of the student neural network.
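A minimal Python sketch of the data-side steps, under stated assumptions: soft labels come from a temperature-scaled softmax over teacher outputs, the synthesized vector is a mixup-style convex combination with weight `lam`, and the masking rule (keep only classes in the top-`k` of either soft label, then renormalize) is a hypothetical stand-in, since the abstract does not define the mask. Weight updates are left out; `logit_error` only computes the cross-entropy error signal that would drive them.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, used here to turn teacher outputs into soft labels."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mix_inputs(x1: Sequence[float], x2: Sequence[float], lam: float) -> List[float]:
    """Synthesized data vector as a mixup-style convex combination (assumed)."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]


def masked_soft_label(p1: Sequence[float], p2: Sequence[float],
                      lam: float, k: int) -> List[float]:
    """Hypothetical mask: keep classes in the top-k of either soft label, renormalize."""
    def top_k(p: Sequence[float]) -> set:
        return set(sorted(range(len(p)), key=lambda i: p[i], reverse=True)[:k])

    keep = top_k(p1) | top_k(p2)
    mixed = [lam * a + (1.0 - lam) * b if i in keep else 0.0
             for i, (a, b) in enumerate(zip(p1, p2))]
    total = sum(mixed)
    return [v / total for v in mixed]


def logit_error(student_logits: Sequence[float], target: Sequence[float]) -> List[float]:
    """Cross-entropy gradient w.r.t. student logits: softmax(z) - target."""
    probs = softmax(student_logits)
    return [p - t for p, t in zip(probs, target)]
```

The `softmax(z) - target` form holds because the masked target is renormalized to sum to 1.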
-
Publication No.: US11106974B2
Publication date: 2021-08-31
Application No.: US15641379
Filing date: 2017-07-05
Inventors: Takashi Fukuda, Osamu Ichikawa
Abstract: A technique for training a neural network including an input layer, one or more hidden layers, and an output layer, in which the trained neural network can be used to perform a task such as speech recognition. In the technique, a base of the neural network having at least a pre-trained hidden layer is prepared. A parameter set associated with one pre-trained hidden layer in the neural network is decomposed into a plurality of new parameter sets. The number of hidden layers in the neural network is increased by using the plurality of new parameter sets. Pre-training for the neural network is performed.
-
Publication No.: US20210117483A1
Publication date: 2021-04-22
Application No.: US16657180
Filing date: 2019-10-18
Inventors: Hiroaki Kikuchi, Yuichi Suzuki, Takashi Fukuda
Abstract: A first user request is received which specifies a target document set in which a first subset of the documents has been flagged by a user. A primary flag table is created for the target document set. A first document subset matching the first user request is created. It is determined whether the number of flagged documents exceeds a first threshold. If so, a secondary flag table is created for the first document subset, and flag data corresponding to the first document subset is stored in the secondary flag table. The flag data in the secondary flag table is then merged into the primary flag table.
-
Publication No.: US10783882B2
Publication date: 2020-09-22
Application No.: US15861037
Filing date: 2018-01-03
Inventors: Osamu Ichikawa, Gakuto Kurata, Takashi Fukuda
IPC classification: H04M1/725, G10L15/20, G10L15/02, G10L15/08, G10L15/06, G10L15/07, G10L25/51, G10L25/24, G10L25/27
Abstract: Acoustic change is detected by a method including: preparing a first Gaussian Mixture Model (GMM) trained with first audio data of first speech sound from a speaker at a first distance from an audio interface, and a second GMM generated from the first GMM using second audio data of second speech sound from the speaker at a second distance from the audio interface; calculating a first output of the first GMM and a second output of the second GMM by inputting obtained third audio data into the first GMM and the second GMM; and transmitting a notification in response to determining at least that a difference between the first output and the second output exceeds a threshold. Each Gaussian distribution of the second GMM has a mean obtained by shifting the mean of the corresponding Gaussian distribution of the first GMM by a common channel bias.
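The detection step can be sketched in Python with diagonal-covariance GMMs. Training the first GMM and estimating the channel bias are out of scope here; the bias vector, the averaging of per-frame log-likelihoods, and the signed comparison `out2 - out1 > threshold` are assumptions for illustration rather than details from the patent.

```python
import math
from typing import List, Sequence, Tuple

# One diagonal-covariance component: (weight, mean vector, variance vector).
Component = Tuple[float, List[float], List[float]]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_loglik(x: Sequence[float], gmm: List[Component]) -> float:
    """log sum_k w_k N(x; mean_k, var_k), computed with log-sum-exp."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def shift_gmm(gmm: List[Component], bias: Sequence[float]) -> List[Component]:
    """Second GMM: every component mean shifted by one common channel bias."""
    return [(w, [m + b for m, b in zip(mean, bias)], list(var))
            for w, mean, var in gmm]


def change_detected(frames: List[List[float]], gmm1: List[Component],
                    gmm2: List[Component], threshold: float) -> bool:
    """True when the shifted model explains the frames better by more than threshold."""
    out1 = sum(gmm_loglik(f, gmm1) for f in frames) / len(frames)
    out2 = sum(gmm_loglik(f, gmm2) for f in frames) / len(frames)
    return out2 - out1 > threshold
```

Because only the means move, the second model costs one bias vector rather than a full retrained GMM.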
-
Publication No.: US20200293566A1
Publication date: 2020-09-17
Application No.: US16885404
Filing date: 2020-05-28
IPC classification: G06F16/36, G06N5/00, G06F16/93, G06F16/31, G06F40/169, G06F40/216, G06F40/242
Abstract: Embodiments are directed to a system, computer program product, and method for text mining and for dynamic facet and facet value management and application to a document collection. Two or more words are extracted from a first document collection, with the extracted words being associated with an applied annotation. At least one word is selected from the extracted words and designated as a facet, and a value is selectively added to the facet. An analysis of the added value is dynamically performed, a dictionary with the annotation, facet, and values is constructed, and the dictionary is applied to the document collection. A targeted list of documents is returned from applying the dictionary to the document collection.
-
Publication No.: US20190080684A1
Publication date: 2019-03-14
Application No.: US15704426
Filing date: 2017-09-14
Abstract: A computer-implemented method for processing a speech signal includes: identifying speech segments in an input speech signal; calculating an upper variance and a lower variance, the upper variance being a variance of upper spectra larger than a criterion among speech spectra corresponding to frames in the speech segments, and the lower variance being a variance of lower spectra smaller than a criterion among the speech spectra corresponding to the frames in the speech segments; determining whether the input speech signal is a special input speech signal using a difference between the upper variance and the lower variance; and performing speech recognition of an input speech signal determined to be the special input speech signal, using a special acoustic model for the special input speech signal.
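A rough Python sketch of the variance test, assuming each frame is a list of log-power spectral values, the criterion is the global mean over all speech-frame values, and a simple energy gate (hypothetical) stands in for speech-segment detection. The decision direction (upper minus lower variance above a threshold) is also an assumption; the switch to the special acoustic model is not shown.

```python
from statistics import mean, pvariance
from typing import List, Tuple


def speech_frames(frames: List[List[float]], energy_floor: float) -> List[List[float]]:
    """Hypothetical energy gate standing in for speech-segment detection."""
    return [f for f in frames if mean(f) > energy_floor]


def upper_lower_variance(frames: List[List[float]]) -> Tuple[float, float]:
    """Variances of spectral values above and below an assumed criterion (global mean)."""
    values = [v for frame in frames for v in frame]
    criterion = mean(values)
    upper = [v for v in values if v > criterion]
    lower = [v for v in values if v < criterion]
    return (pvariance(upper) if upper else 0.0,
            pvariance(lower) if lower else 0.0)


def is_special_input(frames: List[List[float]], energy_floor: float,
                     threshold: float) -> bool:
    """Flag the signal as special when upper minus lower variance exceeds a threshold."""
    speech = speech_frames(frames, energy_floor)
    upper_var, lower_var = upper_lower_variance(speech)
    return upper_var - lower_var > threshold
```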
-
Publication No.: US10170103B2
Publication date: 2019-01-01
Application No.: US15004413
Filing date: 2016-01-22
Inventors: Takashi Fukuda
IPC classification: G10L15/00, G10L15/02, G10L15/06, G10L15/183
Abstract: A method, a system, and a computer program product are provided for discriminatively training a feature-space transform. The method includes performing feature-space discriminative training (f-DT) on an initialized feature-space transform, using manually transcribed data, to obtain a pre-stage trained feature-space transform. The method further includes performing f-DT on the pre-stage trained feature-space transform as a newly initialized feature-space transform, using automatically transcribed data, to obtain a main-stage trained feature-space transform. The method additionally includes performing f-DT on the main-stage trained feature-space transform as a newly initialized feature-space transform, using manually transcribed data, to obtain a post-stage trained feature-space transform.
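The three-stage schedule can be written as a small Python function. `f_dt` is a hypothetical placeholder for one round of feature-space discriminative training and is not implemented here; the sketch only shows how each stage's output becomes the next stage's initialization and which data set each stage uses.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")  # whatever object represents a feature-space transform


def three_stage_fdt(initial: T,
                    manual_data: Sequence[str],
                    auto_data: Sequence[str],
                    f_dt: Callable[[T, Sequence[str]], T]) -> Tuple[T, T, T]:
    """Run f-DT three times, re-initializing each stage from the previous result."""
    pre = f_dt(initial, manual_data)   # pre-stage: manually transcribed data
    main = f_dt(pre, auto_data)        # main-stage: automatically transcribed data
    post = f_dt(main, manual_data)     # post-stage: manually transcribed data again
    return pre, main, post
```

Passing a toy `f_dt` that only records which data it saw, such as `lambda t, d: t + [d[0]]` starting from `[]`, makes the manual, automatic, manual ordering easy to check.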
-
Publication No.: US20180350347A1
Publication date: 2018-12-06
Application No.: US15609665
Filing date: 2017-05-31
Inventors: Takashi Fukuda, Osamu Ichikawa, Gakuto Kurata, Masayuki Suzuki
Abstract: A method, computer system, and computer program product for generating a plurality of voice data having a particular speaking style are provided. The present invention may include preparing a plurality of original voice data corresponding to at least one word or at least one phrase. The present invention may also include attenuating a low frequency component and a high frequency component in the prepared plurality of original voice data. The present invention may then include reducing power at a beginning and an end of the prepared plurality of original voice data. The present invention may further include storing a plurality of resultant voice data obtained after the attenuating and the reducing.
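A pure-Python sketch of the two processing steps, assuming one-pole high-pass and low-pass filters for the attenuation and linear fade-in/fade-out ramps for the power reduction. The cutoffs (300 Hz and 3400 Hz) and the 50 ms fade length are illustrative choices, not values from the patent, and "storing" is reduced to keeping the results in a list.

```python
import math
from typing import List, Sequence


def high_pass(x: Sequence[float], cutoff_hz: float, sr: int) -> List[float]:
    """One-pole high-pass filter: attenuates the low-frequency component."""
    if not x:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    a = rc / (rc + 1.0 / sr)
    y = [float(x[0])] + [0.0] * (len(x) - 1)
    for n in range(1, len(x)):
        y[n] = a * (y[n - 1] + x[n] - x[n - 1])
    return y


def low_pass(x: Sequence[float], cutoff_hz: float, sr: int) -> List[float]:
    """One-pole low-pass filter: attenuates the high-frequency component."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = (1.0 / sr) / (rc + 1.0 / sr)
    y, prev = [], 0.0
    for sample in x:
        prev += alpha * (sample - prev)
        y.append(prev)
    return y


def fade_edges(x: Sequence[float], n_fade: int) -> List[float]:
    """Reduce power at the beginning and end with linear gain ramps."""
    y = list(x)
    n_fade = min(n_fade, len(y) // 2)
    for i in range(n_fade):
        gain = i / n_fade
        y[i] *= gain
        y[-1 - i] *= gain
    return y


def restyle(x: Sequence[float], sr: int, low_cut: float = 300.0,
            high_cut: float = 3400.0, fade_s: float = 0.05) -> List[float]:
    """Attenuate low and high frequencies, then fade the edges (parameters assumed)."""
    band = low_pass(high_pass(x, low_cut, sr), high_cut, sr)
    return fade_edges(band, int(fade_s * sr))
```

Applying `restyle` to each prepared utterance, for example `[restyle(u, 8000) for u in originals]`, yields the list of resultant voice data.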
-
Publication No.: US09818428B2
Publication date: 2017-11-14
Application No.: US15440773
Filing date: 2017-02-23
Inventors: Takashi Fukuda, Osamu Ichikawa
IPC classification: G10L15/00, G10L21/00, G10L21/028, G10L15/14, G10L21/0264, G10L25/21, G10L21/0216
CPC classification: G10L21/028, G10L15/14, G10L2021/02166
Abstract: Methods and systems are provided for separating a target speech from a plurality of other speeches having different directions of arrival. One of the methods includes: obtaining speech signals from speech input devices spaced at predetermined distances from one another; calculating, for each of at least one pair of speech input devices, a direction of arrival of the target speeches and directions of arrival of the other speeches; calculating an aliasing metric, which indicates which frequency bands of the speeches are susceptible to spatial aliasing; enhancing the speech signals arriving from the direction of arrival of the target speeches, based on the speech signals and that direction of arrival, to generate enhanced speech signals; reading a probability model; and inputting the enhanced speech signals and the aliasing metric to the probability model to output the target speeches.
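The abstract does not define the aliasing metric. The Python sketch below assumes the standard far-field relations for one microphone pair with spacing d: the direction of arrival follows from the time difference of arrival τ as θ = asin(cτ/d), and spatial aliasing sets in above c/(2d). The binary per-bin metric and the 343 m/s speed of sound are assumptions; the enhancement and probability-model steps are not shown.

```python
import math
from typing import List

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 degrees C)


def doa_from_tdoa(tau_s: float, spacing_m: float, c: float = SPEED_OF_SOUND) -> float:
    """Far-field model tau = d*sin(theta)/c, solved for theta in degrees."""
    s = max(-1.0, min(1.0, c * tau_s / spacing_m))  # clamp against noisy TDOA estimates
    return math.degrees(math.asin(s))


def aliasing_frequency(spacing_m: float, c: float = SPEED_OF_SOUND) -> float:
    """Spatial aliasing begins once the spacing exceeds half a wavelength."""
    return c / (2.0 * spacing_m)


def aliasing_metric(n_fft: int, sr: int, spacing_m: float,
                    c: float = SPEED_OF_SOUND) -> List[int]:
    """Hypothetical metric: 1 for FFT bins above the aliasing frequency, else 0."""
    f_alias = aliasing_frequency(spacing_m, c)
    return [1 if k * sr / n_fft > f_alias else 0 for k in range(n_fft // 2 + 1)]
```

For a 5 cm pair sampled at 16 kHz with a 512-point FFT, aliasing begins near 3.43 kHz, so bins from index 110 upward are marked as susceptible.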