-
Publication No.: US20220335965A1
Publication date: 2022-10-20
Application No.: US17635354
Filing date: 2020-08-07
Inventors: Hiroshi SATO, Tsubasa OCHIAI, Keisuke KINOSHITA, Marc DELCROIX, Tomohiro NAKATANI, Atsunori OGAWA
IPC classification: G10L25/30, G10L19/008, G10L21/0272
Abstract: An audio signal processing apparatus (10) includes a first auxiliary feature conversion unit (12) and a second auxiliary feature conversion unit (13) that convert a plurality of signals relating to processing of an audio signal of a target speaker into a plurality of auxiliary features for the plurality of signals using a plurality of auxiliary neural networks corresponding to the plurality of signals, and an audio signal processing unit (11) that estimates information regarding an audio signal of the target speaker included in a mixed audio signal using a main neural network based on an input feature of the mixed audio signal and the plurality of auxiliary features, wherein the plurality of signals relating to processing of the audio signal of the target speaker are two or more pieces of information of different modalities.
-
Publication No.: US20220189468A1
Publication date: 2022-06-16
Application No.: US17425696
Filing date: 2020-01-16
IPC classification: G10L15/193, G10L15/22, G10L15/08
Abstract: A speech recognition unit (12) converts an input utterance sequence into a confusion network sequence constituted by a k-best of candidate words of speech recognition results; a lattice generating unit (14) generates a lattice sequence having the candidate words as internal nodes and a combination of k words among the candidate words for an identical speech as an external node, in which edges are extended between internal nodes other than internal nodes included in an identical external node, from the confusion network sequence; an integer programming problem generating unit (16) generates an integer programming problem for selecting a path that maximizes an objective function including at least a coverage score of an important word, of paths following the internal nodes with the edges extended, in the lattice sequence; and the summary generating unit generates a high-quality summary having fewer speech recognition errors and low redundancy using candidate words indicated by the internal nodes included in the path selected by solving the integer programming problem, under a constraint on the length of a summary to be generated.
-
Publication No.: US20220130406A1
Publication date: 2022-04-28
Application No.: US17437701
Filing date: 2020-02-28
Inventors: Tomohiro NAKATANI, Marc DELCROIX, Keisuke KINOSHITA, Shoko ARAKI, Yuki KUBO
IPC classification: G10L21/0232, G10L21/028, G10K11/175
Abstract: A time-variant noise spatial covariance matrix is estimated effectively. Using time-frequency-divided observation signals based on observation signals acquired by collecting acoustic signals emitted from one or a plurality of sound sources and mask information expressing the occupancy probability of a component of each of the time-frequency-divided observation signals that corresponds to each noise source, a time-independent first noise spatial covariance matrix corresponding to the time-frequency-divided observation signals and the mask information belonging to a long time interval is acquired for each noise source. Further, using the mask information of each of a plurality of different short time intervals, a mixture weight corresponding to each noise source in each short time interval is acquired. Furthermore, a time-variant third noise spatial covariance matrix is acquired, the third noise spatial covariance matrix being based on a time-variant second noise spatial covariance matrix, which corresponds to the time-frequency-divided observation signals and the mask information belonging to each short time interval and relates to noise formed by adding together all of the noise sources, and a weighted sum of the first noise spatial covariance matrices with the mixture weights of the respective short time intervals.
-
Publication No.: US20210056954A1
Publication date: 2021-02-25
Application No.: US16963837
Filing date: 2019-02-01
Abstract: A learning device (10) includes a feature extracting unit (11) that extracts features of speech from speech data for training, a probability calculating unit (12) that, on the basis of the features of speech, performs prefix searching using a speech recognition model of which a neural network is representative, and calculates a posterior probability of a recognition character string to obtain a plurality of hypothetical character strings, an error calculating unit (13) that calculates an error by word error rates of the plurality of hypothetical character strings and a correct character string for training, and obtains a parameter for the entire model that minimizes an expected value of summation of loss in the word error rates, and an updating unit (14) that updates a parameter of the model in accordance with the parameter obtained by the error calculating unit (13).
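The error calculation described in this abstract can be illustrated with a small sketch. It is not the patented implementation: the function names `word_errors` and `expected_word_error`, and the choice to renormalize the posteriors over the listed hypotheses, are assumptions made for illustration. The sketch computes the word-level edit distance between each hypothesis and the correct string, then takes its expectation under the hypothesis probabilities. The parameter update that would minimize this quantity is not shown.

```python
import math


def word_errors(hypothesis, reference):
    """Word-level Levenshtein distance between two whitespace-tokenized strings."""
    hyp, ref = hypothesis.split(), reference.split()
    # prev[j] is the distance between the hypothesis words seen so far
    # (excluding the current one) and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # extra hypothesis word
                            curr[j - 1] + 1,      # missing reference word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def expected_word_error(hypotheses, log_probs, reference):
    """Expected number of word errors over an N-best list.

    Assumption: posteriors are renormalized over the listed hypotheses.
    """
    max_lp = max(log_probs)
    weights = [math.exp(lp - max_lp) for lp in log_probs]
    total = sum(weights)
    return sum((w / total) * word_errors(h, reference)
               for h, w in zip(hypotheses, weights))


if __name__ == "__main__":
    nbest = ["the cat sat", "the cat sad", "a cat sat down"]
    scores = [math.log(0.5), math.log(0.3), math.log(0.2)]
    print(expected_word_error(nbest, scores, "the cat sat"))
```

In a training loop, this expectation would be the loss whose gradient flows back through the hypothesis probabilities; dividing each error count by the reference length would turn it into an expected word error rate.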
-
Publication No.: US20210035564A1
Publication date: 2021-02-04
Application No.: US16966096
Filing date: 2019-02-01
Abstract: A determination device includes a memory, and processing circuitry coupled to the memory and configured to accept input of a plurality of sequences provided as candidates for a solution to one given input, and determine, for two sequences of the plurality of sequences, a sequence that has a higher accuracy than the other sequence of the two sequences, using a model expressed as a neural network.
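One way to use such a pairwise determination over a whole candidate list is sketched here. The abstract does not say how the pairwise decisions are combined, so the sequential knockout in `select_best` is an assumption, and `toy_compare` with its `VOCAB` set is a hypothetical stand-in for the neural comparison model.

```python
from typing import Callable, Sequence

Comparator = Callable[[str, str], bool]


def select_best(candidates: Sequence[str], first_is_better: Comparator) -> str:
    """Keep the winner of successive pairwise comparisons (N - 1 calls)."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    best = candidates[0]
    for challenger in candidates[1:]:
        if not first_is_better(best, challenger):
            best = challenger
    return best


# Placeholder for the neural comparison model (an assumption for this sketch):
# it prefers the sequence with fewer words outside a small known vocabulary.
VOCAB = {"turn", "on", "the", "light", "lights"}


def toy_compare(a: str, b: str) -> bool:
    """Return True when sequence `a` is judged at least as accurate as `b`."""
    def unknown(s: str) -> int:
        return sum(word not in VOCAB for word in s.split())
    return unknown(a) <= unknown(b)


if __name__ == "__main__":
    nbest = ["turn on the lied", "turn on the light", "torn on the light"]
    print(select_best(nbest, toy_compare))
```

A knockout needs only N - 1 model calls, at the cost of depending on comparison order when the comparator is not transitive; a full round-robin with vote counting is the more expensive alternative.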
-
Publication No.: US20180366135A1
Publication date: 2018-12-20
Application No.: US15779926
Filing date: 2016-12-01
IPC classification: G10L21/0232, G10L21/0308
Abstract: An observation feature value vector is calculated based on observation signals recorded at different positions in a situation in which target sound sources and background noise are present in a mixed manner; masks associated with the target sound sources and a mask associated with the background noise are estimated; a spatial correlation matrix of the target sound sources that includes the background noise is calculated based on the masks associated with the observation signals and the target sound sources; a spatial correlation matrix of the background noise is calculated based on the masks associated with the observation signals and the background noise; and a spatial correlation matrix of the target sound sources is estimated based on the matrix obtained by weighting each of the spatial correlation matrices by predetermined coefficients.
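The mask-weighted matrix computation in this abstract can be sketched for a single frequency bin as follows. The function names are hypothetical, and the abstract only states that the matrices are weighted by predetermined coefficients, so the subtraction form in `weighted_difference` and its default coefficients are assumptions. Mask estimation itself is not shown; the masks are given as inputs.

```python
def masked_spatial_correlation(frames, mask):
    """Mask-weighted average of outer products x x^H for one frequency bin.

    frames: list of observation vectors (one complex value per microphone).
    mask:   list of weights in [0, 1], one per frame.
    """
    n_mics = len(frames[0])
    total = sum(mask)
    if total <= 0:
        raise ValueError("mask must have positive total weight")
    acc = [[0j] * n_mics for _ in range(n_mics)]
    for x, m in zip(frames, mask):
        for i in range(n_mics):
            for j in range(n_mics):
                acc[i][j] += m * x[i] * x[j].conjugate()
    return [[v / total for v in row] for row in acc]


def weighted_difference(r_target_plus_noise, r_noise, alpha=1.0, beta=1.0):
    """Combine the two matrices as alpha * R_(target+noise) - beta * R_noise.

    Assumption: the subtraction form and the default coefficient values
    are illustrative, not taken from the source.
    """
    return [[alpha * a - beta * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(r_target_plus_noise, r_noise)]


if __name__ == "__main__":
    # Two microphones, four frames; the first two frames are target-dominated.
    frames = [[1 + 1j, 1 - 1j], [2 + 0j, 0 + 2j],
              [0.1 + 0j, 0.1j], [0.2j, 0.1 + 0j]]
    target_mask = [1.0, 1.0, 0.0, 0.0]
    noise_mask = [0.0, 0.0, 1.0, 1.0]
    r_tn = masked_spatial_correlation(frames, target_mask)
    r_n = masked_spatial_correlation(frames, noise_mask)
    print(weighted_difference(r_tn, r_n))
```

Each matrix produced by `masked_spatial_correlation` is Hermitian by construction, which is a quick sanity check when swapping in real short-time Fourier transform frames.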
-
Publication No.: US20240144952A1
Publication date: 2024-05-02
Application No.: US18277065
Filing date: 2021-02-15
IPC classification: G10L21/0308, G10L21/0208
CPC classification: G10L21/0308, G10L21/0208
Abstract: A sound source signal is estimated with high accuracy in a noise environment. A sound source signal estimation unit (15) estimates each sound source signal using a separation matrix from an observation signal obtained by collecting a mixed acoustic signal in which a plurality of sound source signals and diffusive noise are mixed by a microphone array formed by a plurality of microphones. The separation matrix is configured to convert steering vectors from each sound source to the microphone into unit vectors and convert a spatial covariance matrix of the diffusive noise into a matrix including a diagonal matrix with a size of the number of sound sources.
-
Publication No.: US20240062771A1
Publication date: 2024-02-22
Application No.: US18269761
Filing date: 2021-01-05
Abstract: A learning device includes a conversion unit, a combination unit, an extraction unit, and an update unit. The conversion unit converts a mixed sound, of which sound sources for each component are known, into embedding vectors for each sound source using an embedding neural network. The combination unit combines the embedding vectors using a combination neural network to obtain a combined vector. The extraction unit extracts a target sound from the mixed sound and the combined vector using an extraction neural network. The update unit updates parameters of the embedding neural network such that a loss function calculated based on information regarding the sound sources for each component of the mixed sound and the target sound extracted by the extraction unit is optimized.
-
Publication No.: US20230370778A1
Publication date: 2023-11-16
Application No.: US18030981
Filing date: 2020-10-15
IPC classification: H04R5/04, H04R3/04, G10L21/0208, G10L21/0272
CPC classification: H04R5/04, H04R3/04, G10L21/0208, G10L21/0272, G10L2021/02082
Abstract: Provided is an acoustic signal enhancement device, including: a time-space covariance matrix estimation unit 2 configured to estimate a time-space covariance matrix R_f^{(n)}, P_f^{(n)} corresponding to a sound source n, using a power λ_{t,f}^{(n)} of the sound source n and an observation signal vector X_{t,f} composed of an observation signal x_{m,t,f} from a microphone m; a reverberation suppression unit 3 configured to obtain a reverberation removal filter G_f^{(n)} of the sound source n using the time-space covariance matrix R_f^{(n)}, P_f^{(n)}, and to generate a reverberation suppression signal vector Z_{t,f}^{(n)} corresponding to the observation signal x_{m,t,f} for an emphasized sound of the sound source n using the reverberation removal filter G_f^{(n)} and the observation signal vector X_{t,f}; and a sound source separation unit 4 configured to obtain an emphatic sound y_{t,f}^{(n)} of the sound source n and the power λ_{t,f}^{(n)} of the sound source n using the reverberation suppression signal vector Z_{t,f}^{(n)}.
-
Publication No.: US20220301570A1
Publication date: 2022-09-22
Application No.: US17629423
Filing date: 2019-08-21
IPC classification: G10L19/008, G10L19/02
Abstract: A sound source separation filter information estimation device (10) estimates a covariance matrix having information on a correlation between sound source spectra and information on a correlation between channels as information on sound source separation filter information for separating an individual sound source signal from a mixed acoustic signal.