-
1.
Publication No.: US20220005466A1
Publication date: 2022-01-06
Application No.: US17298368
Filing date: 2019-11-19
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventor: Takashi NAKAMURA, Tomohiro TANAKA
Abstract: A keyword is extracted robustly despite a voice recognition result including an error. A model storage unit 10 stores a keyword extraction model that accepts word vector representations of a plurality of words as an input and extracts and outputs a word vector representation of a word to be extracted as a keyword. A speech detection unit 11 detects a speech part from a voice signal. A voice recognition unit 12 executes voice recognition on the speech part of the voice signal and outputs a confusion network which is a voice recognition result. A word vector representation generating unit 13 generates a word vector representation including reliability of voice recognition with regard to each candidate word for each confusion set. A keyword extraction unit 14 inputs the word vector representation of the candidate word to the keyword extraction model in descending order of the reliability and obtains the word vector representation of the keyword.
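The input preparation described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the toy `EMBEDDINGS` table, the function name `build_model_inputs`, and the choice to append the reliability as one extra vector dimension are all hypothetical, and the keyword extraction model itself is not shown because the abstract does not specify it.

```python
from typing import Dict, List, Tuple

# Hypothetical toy embeddings; the abstract does not say how word vectors are obtained.
EMBEDDINGS: Dict[str, List[float]] = {
    "tokyo": [0.9, 0.1],
    "kyoto": [0.8, 0.2],
    "weather": [0.1, 0.7],
    "whether": [0.0, 0.3],
}

# One confusion set: candidate words with their recognition reliability.
ConfusionSet = List[Tuple[str, float]]


def build_model_inputs(confusion_network: List[ConfusionSet]) -> List[List[List[float]]]:
    """Return, per confusion set, candidate vectors in descending order of reliability.

    Each vector is the word embedding with the reliability appended
    as an extra dimension (an assumed encoding).
    """
    inputs = []
    for confusion_set in confusion_network:
        ranked = sorted(confusion_set, key=lambda cand: cand[1], reverse=True)
        inputs.append([EMBEDDINGS[word] + [rel] for word, rel in ranked])
    return inputs


network = [
    [("kyoto", 0.3), ("tokyo", 0.7)],
    [("whether", 0.4), ("weather", 0.6)],
]
model_inputs = build_model_inputs(network)
```

Each inner list would then be fed to the keyword extraction model in the order shown, most reliable candidate first.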
-
2.
Publication No.: US20210272587A1
Publication date: 2021-09-02
Application No.: US17293021
Filing date: 2019-10-31
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventor: Takashi NAKAMURA, Takaaki FUKUTOMI, Kiyoaki MATSUI
Abstract: Detection precision of a non-verbal sound is improved. An acoustic model storage unit 10A stores an acoustic model that is configured by a deep neural network with a bottleneck structure, and estimates a phoneme state from a sound feature value. A non-verbal sound model storage unit 10B stores a non-verbal sound model that estimates a posterior probability of a non-verbal sound likeliness from the sound feature value and a bottleneck feature value. A sound feature value extraction unit 11 extracts a sound feature value from an input sound signal. A bottleneck feature value estimation unit 12 inputs the sound feature value to the acoustic model and obtains an output of a bottleneck layer of the acoustic model as a bottleneck feature value. A non-verbal sound detection unit 13 inputs the sound feature value and the bottleneck feature value to the non-verbal sound model and obtains the posterior probability of the non-verbal sound likeliness output by the non-verbal sound model.
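The data flow in this abstract (sound feature, then bottleneck feature, then non-verbal posterior) can be sketched with a toy network. Everything concrete here is an assumption for illustration: the layer sizes, the weight values, the tanh and sigmoid activations, and the function names are hypothetical and are not taken from the patent.

```python
import math
from typing import List


def dense(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Plain affine layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


# Toy parameters (hypothetical). Bottleneck layer maps 3 inputs to 2 units.
W_BOTTLENECK = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
B_BOTTLENECK = [0.0, 0.1]
# Non-verbal sound model takes the sound feature (3) plus the bottleneck feature (2).
W_NONVERBAL = [[0.4, -0.3, 0.2, 0.9, -0.6]]
B_NONVERBAL = [-0.1]


def bottleneck_feature(sound_feature: List[float]) -> List[float]:
    """Output of the acoustic model's bottleneck layer for one frame."""
    return [math.tanh(v) for v in dense(sound_feature, W_BOTTLENECK, B_BOTTLENECK)]


def nonverbal_posterior(sound_feature: List[float]) -> float:
    """Posterior of non-verbal sound likeliness from both feature types."""
    joint = sound_feature + bottleneck_feature(sound_feature)
    logit = dense(joint, W_NONVERBAL, B_NONVERBAL)[0]
    return 1.0 / (1.0 + math.exp(-logit))


frame = [0.2, -0.1, 0.4]
posterior = nonverbal_posterior(frame)
```

The point of the sketch is the concatenation step: the non-verbal sound model sees the raw sound feature together with the phoneme-oriented bottleneck feature.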
-
3.
Publication No.: US20220101828A1
Publication date: 2022-03-31
Application No.: US17429737
Filing date: 2020-01-29
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventor: Takaaki FUKUTOMI, Takashi NAKAMURA, Kiyoaki MATSUI
IPC: G10L15/06, G10L25/78, G10L21/0208, G06N20/00
Abstract: A learning data acquisition device or the like, capable of acquiring learning data by superimposing noise data on clean voice data at an appropriate SN ratio, is provided. The learning data acquisition device includes a voice recognition influence degree calculation unit and a learning data acquisition unit. The voice recognition influence degree calculation unit calculates an influence degree on voice recognition accuracy caused by a change of a signal-to-noise ratio, based on a result of voice recognition on the kth noise superimposed voice data and a result of voice recognition on the k−1th noise superimposed voice data, where K is an integer of 2 or larger, k=2, 3, ..., K, and a signal-to-noise ratio of the kth noise superimposed voice data is smaller than a signal-to-noise ratio of the k−1th noise superimposed voice data, and obtains a largest signal-to-noise ratio SNRapply among signal-to-noise ratios of the k−1th noise superimposed voice data when the influence degree meets a given threshold condition. The learning data acquisition unit acquires noise superimposed voice data having a signal-to-noise ratio that is equal to or larger than the signal-to-noise ratio SNRapply, as learning data.
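The selection of SNRapply can be illustrated with a short sketch. Two points are assumptions, since the abstract does not define them: the influence degree is taken to be the drop in recognition accuracy between the k−1th and kth data, and the threshold condition is taken to be "drop is at least `threshold`". The function name and the sample numbers are hypothetical.

```python
from typing import List, Optional, Tuple


def select_snr_apply(results: List[Tuple[float, float]], threshold: float) -> Optional[float]:
    """Pick SNRapply from (snr_db, recognition_accuracy) pairs ordered by decreasing SNR.

    Returns the largest SNR of a (k-1)th item whose step to the kth item
    meets the assumed threshold condition, or None if no step does.
    """
    candidates = []
    for k in range(1, len(results)):
        prev_snr, prev_acc = results[k - 1]
        _, acc = results[k]
        influence = prev_acc - acc  # assumed influence degree: accuracy drop
        if influence >= threshold:  # assumed threshold condition
            candidates.append(prev_snr)
    return max(candidates) if candidates else None


# Made-up recognition accuracies at decreasing SNR, for illustration only.
results = [(20.0, 0.95), (15.0, 0.94), (10.0, 0.92), (5.0, 0.80), (0.0, 0.55)]
snr_apply = select_snr_apply(results, threshold=0.10)

# Keep only noise superimposed data at or above SNRapply as learning data.
learning_snrs = [snr for snr, _ in results if snr_apply is not None and snr >= snr_apply]
```

With these sample values the accuracy first drops sharply between 10 dB and 5 dB, so data at 10 dB and above is kept.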
-
4.
Publication No.: US20210035558A1
Publication date: 2021-02-04
Application No.: US16968126
Filing date: 2019-02-07
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventor: Takashi NAKAMURA, Takaaki FUKUTOMI
Abstract: Provided is technology for assessing whether uttered speech detected from input speech is speech suited to a prescribed purpose. A method comprises detecting, from input speech including speech uttered by a speaker and noise, the uttered speech corresponding to the speech uttered by the speaker, extracting an acoustic feature of the uttered speech, generating, from the uttered speech, a speech recognition result set with a recognition score, generating, from the speech recognition result set with the recognition score, a speech recognition result word vector expression set and a speech recognition result part-of-speech vector expression set, generating a target utterance estimation model, providing, using the target utterance estimation model, a probability of the uttered speech being suited to the prescribed purpose, and outputting the uttered speech and the speech recognition result set with the recognition score, the uttered speech suitable to the prescribed purpose.
-
5.
Publication No.: US20210005215A1
Publication date: 2021-01-07
Application No.: US16979393
Filing date: 2019-03-11
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventor: Takaaki FUKUTOMI, Manabu OKAMOTO, Takashi NAKAMURA, Kiyoaki MATSUI
IPC: G10L21/0208, G10L21/013, G10L15/06
Abstract: A training speech data generating apparatus includes: a voice conversion unit that converts, using fourth noise data, which is noise data based on third noise data, and speech data, the speech data so as to make the speech data clearly audible under a noise environment corresponding to the fourth noise data; and a noise superimposition unit that obtains training speech data by superimposing the third noise data and the converted speech data.
-
6.
Publication No.: US20240013798A1
Publication date: 2024-01-11
Application No.: US18036598
Filing date: 2020-11-13
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventor: Kazunori YAMADA, Ko MITSUDA, Tetsuya KINEBUCHI, Yushi AONO, Hiroko YABUSHITA, Akihiko TAKASHIMA, Takashi NAKAMURA
Abstract: A conversion device (10) includes: an evaluation unit (11) that estimates which one of subjective evaluation values obtained by quantifying easiness of transmission of a content of a voice felt by a person is to be taken from an input voice signal; and a conversion unit (12) that converts the input voice signal so as to obtain a subjective evaluation value of a predetermined value on the basis of the subjective evaluation value estimated by the evaluation unit (11).
-
7.
Publication No.: US20190244604A1
Publication date: 2019-08-08
Application No.: US16333156
Filing date: 2017-09-05
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventor: Hirokazu MASATAKI, Taichi ASAMI, Takashi NAKAMURA, Ryo MASUMURA
CPC classification number: G10L15/16, G06F17/2715, G06N3/0454, G06N3/049, G06N3/08, G06N99/00, G10L15/06, G10L15/063, G10L15/065, G10L2015/0635
Abstract: A model learning device comprises: an initial value setting part that uses a parameter of a learned first model including a neural network to set a parameter of a second model including a neural network having a same network structure as the first model; a first output probability distribution calculating part that calculates a first output probability distribution including a distribution of an output probability of each unit on an output layer, using learning features and the first model; a second output probability distribution calculating part that calculates a second output probability distribution including a distribution of an output probability of each unit on the output layer, using learning features and the second model; and a modified model update part that obtains a weighted sum of a second loss function calculated from correct information and from the second output probability distribution, and a cross entropy between the first output probability distribution and the second output probability distribution, and updates the parameter of the second model so as to reduce the weighted sum.
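The weighted sum that the second model is trained to reduce can be sketched as below. This is a minimal illustration with assumptions labeled: the second loss function is taken to be cross entropy against a one-hot correct label, the weighting is written as `(1 - alpha)` and `alpha`, and the direction of the cross entropy between the two distributions (first model as target) is assumed. The names `combined_loss` and `alpha` are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over output-layer units."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target: List[float], predicted: List[float]) -> float:
    """Cross entropy -sum(target * log(predicted)), skipping zero-target terms."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0.0)


def combined_loss(first_probs: List[float], second_probs: List[float],
                  correct_unit: int, alpha: float) -> float:
    """Weighted sum of the label loss and the first-vs-second cross entropy."""
    one_hot = [1.0 if i == correct_unit else 0.0 for i in range(len(second_probs))]
    hard_loss = cross_entropy(one_hot, second_probs)      # assumed second loss function
    soft_loss = cross_entropy(first_probs, second_probs)  # first model as target (assumed)
    return (1.0 - alpha) * hard_loss + alpha * soft_loss   # assumed weighting scheme


first_probs = softmax([2.0, 1.0, 0.1])   # learned first model (toy logits)
second_probs = softmax([1.5, 1.2, 0.3])  # second model being updated (toy logits)
loss = combined_loss(first_probs, second_probs, correct_unit=0, alpha=0.5)
```

Setting `alpha` to 0 recovers ordinary supervised training on the correct labels, while larger values pull the second model toward the first model's output distribution.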
-
8.
Publication No.: US20230005467A1
Publication date: 2023-01-05
Application No.: US17779528
Filing date: 2019-11-26
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventor: Kazunori YAMADA, Takashi NAKAMURA
Abstract: A dialogue apparatus includes a speech recognition unit (1) configured to perform speech recognition on utterance input to generate a text corresponding to the utterance, a speech waveform corresponding to the utterance, and information regarding a length of sound of the utterance; a language understanding unit (2) configured to grasp contents of the utterance by using the text corresponding to the utterance; a dialogue management unit (3) configured to determine contents of a response corresponding to the utterance by using the content of the utterance; an utterance state extraction unit (4) configured to extract a state of the utterance by using the text corresponding to the utterance, the speech waveform corresponding to the utterance, and the information regarding the length of the sound of the utterance; a response state determination unit (5) configured to determine a state of the response according to the state of the utterance; a response sentence generation unit (6) configured to generate a response sentence by using the content of the response; and a speech synthesis unit (7) configured to synthesize speech corresponding to the response sentence with the state of the response taken into account.
-
9.
Publication No.: US20210035553A1
Publication date: 2021-02-04
Application No.: US16968120
Filing date: 2019-02-06
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventor: Takashi NAKAMURA, Takaaki FUKUTOMI
Abstract: The present invention provides a device for estimating the deterioration factor of speech recognition accuracy by estimating an acoustic factor that leads to a speech recognition error. The device extracts an acoustic feature amount for each frame from an input speech, calculates a posterior probability for each acoustic event for the acoustic feature amount for each frame, corrects the posterior probability by filtering the posterior probability for each acoustic event using a time-series filter with weighting coefficients developed in the time axis, outputs a set of speech recognition results with a recognition score, outputs a feature amount for the speech recognition results for each frame, calculates and outputs a principal deterioration factor class for the speech recognition accuracy for each frame on the basis of the corrected posterior probability, the feature amount for speech recognition results for each frame, and the acoustic feature amount for each frame.
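The posterior correction step can be sketched as a weighted moving average along the time axis. This is an assumed reading of "time-series filter with weighting coefficients developed in the time axis": the filter shape, the centered window, and the renormalization at the edges of the utterance are illustrative choices, and the function name is hypothetical.

```python
from typing import List


def smooth_posteriors(posteriors: List[List[float]], weights: List[float]) -> List[List[float]]:
    """Filter per-event posteriors over time.

    posteriors[t][e] is the posterior of acoustic event e at frame t.
    weights is an odd-length filter centered on the current frame.
    At the edges, only the available frames are used and the weights
    are renormalized (an assumed edge-handling rule).
    """
    half = len(weights) // 2
    n_frames = len(posteriors)
    n_events = len(posteriors[0])
    smoothed = []
    for t in range(n_frames):
        frame = []
        for e in range(n_events):
            acc = 0.0
            norm = 0.0
            for j, w in enumerate(weights):
                s = t + j - half  # source frame index for this filter tap
                if 0 <= s < n_frames:
                    acc += w * posteriors[s][e]
                    norm += w
            frame.append(acc / norm)
        smoothed.append(frame)
    return smoothed


# Two acoustic events over three frames; the middle frame is an isolated outlier.
posteriors = [[0.9, 0.1], [0.2, 0.8], [0.9, 0.1]]
smoothed = smooth_posteriors(posteriors, weights=[0.25, 0.5, 0.25])
```

In this toy input the middle frame's posterior for the first event rises from 0.2 to about 0.55, which shows how the filter suppresses frame-level flicker before the deterioration factor class is estimated.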
-
10.
Publication No.: US20200035223A1
Publication date: 2020-01-30
Application No.: US16337081
Filing date: 2017-09-27
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventor: Taichi ASAMI, Takashi NAKAMURA
Abstract: An acoustic model learning apparatus includes a first output probability distribution calculating part that calculates a first output probability distribution including a distribution of output probabilities of respective units of an output layer using a feature amount obtained from an acoustic signal for learning and a learned first acoustic model including a neural network, and the first output probability distribution calculating part obtains the first output probability distribution using a smoothing parameter made up of a real value greater than 0 as input so that the first output probability distribution approaches a uniform distribution as the smoothing parameter is greater, and calculates the first output probability distribution by obtaining logits of respective units of an output layer using the feature amount obtained from the acoustic signal for learning and the first acoustic model and setting a value of the smoothing parameter greater in the case where an output unit number with the greatest logit value is different from a correct unit number than in the case where the output unit number with the greatest logit value matches the correct unit number.
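One way to realize a smoothing parameter with the stated property (larger value, closer to uniform) is a temperature-scaled softmax, sketched below. Using a temperature softmax is an assumption, as are the two temperature values `t_match` and `t_mismatch` and the function names; the abstract only requires that the parameter be larger when the top-scoring unit differs from the correct unit.

```python
import math
from typing import List


def smoothed_softmax(logits: List[float], temperature: float) -> List[float]:
    """Softmax of logits / temperature; larger temperature gives a flatter distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def first_output_distribution(logits: List[float], correct_unit: int,
                              t_match: float = 1.0, t_mismatch: float = 4.0) -> List[float]:
    """Use a larger smoothing parameter when the first model's top unit is wrong."""
    predicted_unit = max(range(len(logits)), key=lambda i: logits[i])
    temperature = t_match if predicted_unit == correct_unit else t_mismatch
    return smoothed_softmax(logits, temperature)


logits = [3.0, 1.0, 0.5]  # toy logits from the learned first acoustic model
matched = first_output_distribution(logits, correct_unit=0)     # top unit is correct
mismatched = first_output_distribution(logits, correct_unit=1)  # top unit is wrong
```

When the first model is wrong, its distribution is flattened, so its mistaken preference carries less weight when used as a training target.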
-