-
Publication No.: US20190220747A1
Publication date: 2019-07-18
Application No.: US16378696
Filing date: 2019-04-09
Inventors: Takashi Fukuda, Osamu Ichikawa
CPC classification: G06N3/08, G06N3/04, G06N3/0454, G06N3/084, G06N7/005, G10L15/063, G10L15/144, G10L15/16, G10L15/26, G10L25/30
Abstract: A technique for training a neural network including an input layer, one or more hidden layers and an output layer, in which the trained neural network can be used to perform a task such as speech recognition. In the technique, a base of the neural network having at least a pre-trained hidden layer is prepared. A parameter set associated with one pre-trained hidden layer in the neural network is decomposed into a plurality of new parameter sets. The number of hidden layers in the neural network is increased by using the plurality of the new parameter sets. Pre-training for the neural network is performed.
-
Publication No.: US20190130932A1
Publication date: 2019-05-02
Application No.: US15800112
Filing date: 2017-11-01
Inventors: Masayuki Suzuki, Takashi Fukuda, Toru Nagano
Abstract: A method for processing a speech signal. The method comprises obtaining a logmel feature of a speech signal. The method further includes one or more processors processing the logmel feature so that the logmel feature is normalized under a constraint that a power level of the logmel feature is kept as originally obtained. The method further includes inputting the processed logmel feature into a speech-to-text system to generate corresponding text data.
-
Publication No.: US20190065584A1
Publication date: 2019-02-28
Application No.: US15691900
Filing date: 2017-08-31
Inventors: Takashi Fukuda, Hiroaki Kikuchi
IPC classification: G06F17/30
Abstract: Natural Language Processing (NLP) is performed on a corpus using a processor and a memory to extract a set of facets corresponding to a dimension in a set of dimensions. Using a score threshold, a subset of the set of facets is selected where each facet in the set of facets has a corresponding score relative to the corpus. A subsequent query is formed by increasing a complexity of a previous query using a facet in the subset of facets. The subsequent query is executed on at least a portion of the corpus. The documents in a new result set are ranked, the new result set being in response to executing the subsequent query. An output is produced from the new result set, which includes a ranking of that subset of documents whose ranks have changed by more than a threshold rank distance from the corresponding ranks in the corpus.
-
Publication No.: US20180247641A1
Publication date: 2018-08-30
Application No.: US15441973
Filing date: 2017-02-24
IPC classification: G10L15/16, G10L15/02, G10L21/038, G10L15/06, G10L25/24
CPC classification: G10L15/16, G10L15/02, G10L15/063, G10L21/038, G10L25/24, G10L2015/025
Abstract: A computer-implemented method and an apparatus are provided. The method includes obtaining, by a processor, a frequency spectrum of an audio signal data. The method further includes extracting, by the processor, periodic indications from the frequency spectrum. The method also includes inputting, by the processor, the periodic indications and components of the frequency spectrum into a neural network. The method additionally includes estimating, by the processor, sound identification information from the neural network.
-
Publication No.: US20170278524A1
Publication date: 2017-09-28
Application No.: US15440773
Filing date: 2017-02-23
Inventors: Takashi Fukuda, Osamu Ichikawa
IPC classification: G10L21/028, G10L21/0264, G10L15/14
CPC classification: G10L21/028, G10L15/14, G10L2021/02166
Abstract: Methods and systems are provided for separating a target speech from a plurality of other speeches having different directions of arrival. One of the methods includes obtaining speech signals from speech input devices disposed apart in predetermined distances from one another, calculating a direction of arrival of target speeches and directions of arrival of other speeches other than the target speeches for each of at least one pair of speech input devices, calculating an aliasing metric, wherein the aliasing metric indicates which frequency band of speeches is susceptible to spatial aliasing, enhancing speech signals arrived from the direction of arrival of the target speech signals, based on the speech signals and the direction of arrival of the target speeches, to generate the enhanced speech signals, reading a probability model, and inputting the enhanced speech signals and the aliasing metric to the probability model to output target speeches.
-
Publication No.: US20170052758A1
Publication date: 2017-02-23
Application No.: US14829482
Filing date: 2015-08-18
Inventors: Takashi Fukuda, Osamu Ichikawa
IPC classification: G06F3/16
Abstract: A method, a system, and a computer program product detect a clipping event in audio signals. The method includes digitalizing audio signals having limited frequency bands, at a sampling frequency which is greater than two times as large as the maximum frequency component of the audio signal; and detecting a clipping event of the audio signals, based on magnitudes of spectrum in a bandwidth which is greater than or equal to the limited frequency band. The sampling frequency may be greater than or equal to three times as large as the maximum frequency component of the audio signal. The detection of a clipping event may include determining, for each frame, whether or not a sum or average of the magnitudes of spectrum at the bandwidth which is greater than or equal to the limited frequency band is larger than a predetermined threshold.
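The per-frame test described in this abstract can be illustrated with a minimal sketch. It reads "a bandwidth which is greater than or equal to the limited frequency band" as the frequencies at or above the band limit, which is an interpretation rather than something the abstract states outright. The function names, the 4000 Hz sampling rate, the 1000 Hz band limit, the 64-sample frame and the threshold of 1.0 are all hypothetical values chosen for illustration, and a naive DFT stands in for whatever spectrum analysis the patent actually uses.

```python
import cmath
import math


def high_band_magnitude_sum(frame, sample_rate, band_limit_hz):
    """Sum the DFT magnitudes of one frame from band_limit_hz up to Nyquist."""
    n = len(frame)
    total = 0.0
    for k in range(n // 2 + 1):
        if k * sample_rate / n < band_limit_hz:
            continue  # bin lies inside the band the signal is limited to
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        total += abs(coeff)
    return total


def is_clipped(frame, sample_rate, band_limit_hz, threshold):
    """Flag a frame whose out-of-band spectral magnitude exceeds the threshold."""
    return high_band_magnitude_sum(frame, sample_rate, band_limit_hz) > threshold


# Hypothetical setup: signal limited to 1000 Hz, sampled at 4 times that limit.
SAMPLE_RATE = 4000
BAND_LIMIT_HZ = 1000
FRAME_LEN = 64
THRESHOLD = 1.0  # assumed value; a real system would tune this

# A 500 Hz tone stays inside the band; hard clipping adds harmonics above it.
clean = [math.sin(2 * math.pi * 500 * i / SAMPLE_RATE) for i in range(FRAME_LEN)]
clipped = [max(-0.5, min(0.5, x)) for x in clean]

clean_flag = is_clipped(clean, SAMPLE_RATE, BAND_LIMIT_HZ, THRESHOLD)
clipped_flag = is_clipped(clipped, SAMPLE_RATE, BAND_LIMIT_HZ, THRESHOLD)
```

Sampling well above twice the band limit leaves headroom between the band limit and the Nyquist frequency, so the harmonics that clipping generates have somewhere to show up.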
-
Publication No.: US20170004823A1
Publication date: 2017-01-05
Application No.: US14755854
Filing date: 2015-06-30
Inventors: Takashi Fukuda, Osamu Ichikawa, Futoshi Iwama
IPC classification: G10L15/01, G10L15/193, G10L15/22, G10L13/027
CPC classification: G10L15/01, G10L13/08, G10L15/193
Abstract: A method, for testing words defined in a pronunciation lexicon used in an automatic speech recognition (ASR) system, is provided. The method includes: obtaining test sentences which can be accepted by a language model used in the ASR system. The test sentences cover words defined in the pronunciation lexicon. The method further includes obtaining variations of speech data corresponding to each test sentence, and obtaining a plurality of texts by recognizing the variations of speech data, or a plurality of texts generated by recognizing the variation of speech data. The method also includes constructing a word graph, using the plurality of texts, for each test sentence, where each word in the word graph corresponds to each word defined in the pronunciation lexicon; and determining whether or not all or parts of words in a test sentence are present in a path of the word graph derived from the test sentence.
-
Publication No.: US12026157B2
Publication date: 2024-07-02
Application No.: US17331719
Filing date: 2021-05-27
Inventors: Kenta Watanabe, Takahito Tashiro, Takashi Fukuda
IPC classification: G06F16/2453, G06F16/93
CPC classification: G06F16/2453, G06F16/93
Abstract: In a method for improving generation and relevancy of search results, a processor receives a search query comprising a search term. A processor generates a document group based on the search query and at least one synonym related to the search term in a synonym dictionary. The synonym dictionary may include search document attributes for base words and synonyms of the base words. A processor extracts, from the document group, an extracted document having a document attribute matching a search document attribute of the at least one synonym. A processor lists the extracted document as a search result.
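The two-step flow in this abstract (build a document group from the term and its synonyms, then keep synonym hits only when the document attribute matches) might look roughly like the sketch below. The dictionary layout, the `attribute` field, the sample documents and the choice to keep direct hits on the search term unconditionally are all assumptions made for illustration; the abstract does not specify them.

```python
# Hypothetical synonym dictionary: base word -> list of
# (synonym, search document attribute) pairs.
SYNONYM_DICTIONARY = {
    "car": [("automobile", "manual"), ("vehicle", "regulation")],
}

# Hypothetical corpus; each document carries a single attribute.
DOCUMENTS = [
    {"id": 1, "attribute": "manual", "text": "automobile maintenance guide"},
    {"id": 2, "attribute": "news", "text": "automobile sales rise"},
    {"id": 3, "attribute": "regulation", "text": "vehicle emission rules"},
    {"id": 4, "attribute": "news", "text": "car show opens"},
]


def search(term, documents, dictionary):
    """Return ids of documents matching the term or an attribute-matched synonym."""
    synonyms = dictionary.get(term, [])

    def words(doc):
        return doc["text"].split()

    # Step 1: document group = documents containing the term or any synonym.
    group = [doc for doc in documents
             if term in words(doc) or any(s in words(doc) for s, _ in synonyms)]

    # Step 2: keep a synonym hit only if the document attribute matches the
    # attribute registered for that synonym (direct term hits are kept as-is,
    # which is an assumption of this sketch).
    results = []
    for doc in group:
        if term in words(doc):
            results.append(doc["id"])
        elif any(s in words(doc) and attr == doc["attribute"]
                 for s, attr in synonyms):
            results.append(doc["id"])
    return results


hits = search("car", DOCUMENTS, SYNONYM_DICTIONARY)
```

In this toy corpus, document 2 mentions "automobile" but is filtered out because its attribute does not match the one registered for that synonym.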
-
Publication No.: US20230136842A1
Publication date: 2023-05-04
Application No.: US17518027
Filing date: 2021-11-03
Inventors: Takashi Fukuda
Abstract: A computer-implemented method for preparing training data for a speech recognition model is provided including obtaining a plurality of audio data sets, each audio data set having a different acoustic feature and sorting sentences from the plurality of audio data sets so that similar sentences from different audio data sets are positioned closely, while imposing a weak constraint on audio length, to train the speech recognition model.
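One way to picture the sorting step is the sketch below, which groups utterances into coarse duration buckets and then orders them by transcript so that the same sentence from different data sets ends up adjacent. Treating "similar" as identical text and "a weak constraint on audio length" as a one-second bucket are both assumptions of this sketch, as are the tuple layout and the sample data; the abstract does not define either notion.

```python
# Hypothetical utterances: (data set name, transcript, duration in seconds).
UTTERANCES = [
    ("clean", "hello world", 1.2),
    ("clean", "good morning everyone", 2.6),
    ("noisy", "yes", 0.6),
    ("noisy", "hello world", 1.3),
    ("clean", "yes", 0.5),
    ("noisy", "good morning everyone", 2.4),
]

BUCKET_SECONDS = 1.0  # assumed bucket width for the weak length constraint


def sort_key(utterance):
    """Order by coarse length bucket first, then by transcript, then by data set."""
    dataset, text, duration = utterance
    return (int(duration // BUCKET_SECONDS), text, dataset)


sorted_utterances = sorted(UTTERANCES, key=sort_key)
sorted_texts = [text for _, text, _ in sorted_utterances]
```

Because the length constraint is applied only at bucket granularity, two recordings of the same sentence with slightly different durations still sit next to each other, unless they happen to straddle a bucket boundary.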
-
Publication No.: US20220375484A1
Publication date: 2022-11-24
Application No.: US17326463
Filing date: 2021-05-21
Inventors: Toru Nagano, Takashi Fukuda, Masayuki Suzuki
Abstract: A method, computer system, and a computer program product for audio data augmentation are provided. Sets of audio data from different sources may be obtained. A respective normalization factor for at least two sources of the different sources may be calculated. The normalization factors from the at least two sources may be mixed to determine a mixed normalization factor. A first set of the sets may be normalized by using the mixed normalization factor and to obtain training data for training an acoustic model.
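The mixing step in this abstract can be sketched as follows, assuming that a normalization factor is a (mean, standard deviation) pair over a source's feature values and that mixing means linear interpolation with a weight. Neither assumption comes from the abstract, and the function names and sample values are hypothetical.

```python
import statistics


def normalization_factor(values):
    """Return an assumed normalization factor: (mean, population std dev)."""
    return statistics.mean(values), statistics.pstdev(values)


def mix_factors(factor_a, factor_b, weight):
    """Linearly interpolate two factors; weight=0 keeps factor_a unchanged."""
    return tuple((1.0 - weight) * a + weight * b
                 for a, b in zip(factor_a, factor_b))


def normalize(values, factor):
    """Shift and scale values using the given (mean, std dev) factor."""
    mean, std = factor
    return [(v - mean) / std for v in values]


# Hypothetical scalar features from two recording sources.
source_a = [1.0, 2.0, 3.0]
source_b = [10.0, 12.0, 14.0]

factor_a = normalization_factor(source_a)
factor_b = normalization_factor(source_b)
mixed = mix_factors(factor_a, factor_b, weight=0.5)

# Normalizing source A with the mixed factor gives data that no longer matches
# A's own statistics, which is the augmentation effect this sketch illustrates.
augmented = normalize(source_a, mixed)
```

Varying `weight` between 0 and 1 would produce a family of differently normalized copies of the first set.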