ABSTRACT GENERATION DEVICE, METHOD, PROGRAM, AND RECORDING MEDIUM

    Publication No.: US20220189468A1

    Publication date: 2022-06-16

    Application No.: US17425696

    Filing date: 2020-01-16

    Abstract: A speech recognition unit (12) converts an input utterance sequence into a confusion network sequence composed of the k-best candidate words of the speech recognition results. A lattice generating unit (14) generates, from the confusion network sequence, a lattice sequence that has the candidate words as internal nodes and a combination of k words among the candidate words for an identical speech as an external node, and in which edges are extended between internal nodes other than internal nodes included in an identical external node. An integer programming problem generating unit (16) generates an integer programming problem for selecting, from among the paths that follow the internal nodes along the extended edges in the lattice sequence, a path that maximizes an objective function including at least a coverage score of important words. A summary generating unit generates a high-quality summary with fewer speech recognition errors and low redundancy, using the candidate words indicated by the internal nodes included in the path selected by solving the integer programming problem under a constraint on the length of the summary to be generated.
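    The path selection described in this abstract can be illustrated with a minimal sketch. It is not the patented formulation: exhaustive enumeration stands in for an integer programming solver, and several details are assumptions made for illustration only. These are the function name `best_summary`, the rule that an utterance is either skipped or contributes exactly one candidate per slot, and the confidence term added to the coverage score. Counting each important word once is what discourages redundant selections.

```python
from itertools import product


def best_summary(utterances, importance, max_words):
    """Pick one candidate per slot (or skip an utterance) to maximize the score.

    utterances: list of utterances; each utterance is a list of slots and each
                slot is a list of (word, confidence) candidates (the k-best).
    importance: dict mapping important words to a coverage weight.
    max_words:  length constraint on the generated summary.
    """
    # Per utterance: skip it (None) or take one candidate from every slot.
    # Two candidates from the same slot can never appear on the same path.
    options = [[None] + list(product(*slots)) for slots in utterances]

    best_score, best_words = float("-inf"), []
    for choice in product(*options):
        picked = [cand for path in choice if path is not None for cand in path]
        if len(picked) > max_words:
            continue  # violates the summary length constraint
        words = [word for word, _ in picked]
        # Coverage: every distinct important word counts once.
        coverage = sum(importance.get(word, 0.0) for word in set(words))
        # Assumed extra term: prefer candidates with high recognition confidence.
        confidence = sum(conf for _, conf in picked)
        score = coverage + confidence
        if score > best_score:
            best_score, best_words = score, words
    return best_words, best_score


utterances = [
    [[("meeting", 0.6), ("meting", 0.4)], [("tomorrow", 0.9), ("to", 0.1)]],
    [[("meeting", 0.5), ("eating", 0.5)], [("tomorrow", 0.7), ("borrow", 0.3)]],
    [[("budget", 0.8), ("budge", 0.2)], [("review", 0.7), ("revue", 0.3)]],
]
importance = {"meeting": 2.0, "tomorrow": 1.0, "budget": 2.0, "review": 1.5}
summary_words, summary_score = best_summary(utterances, importance, max_words=4)
print(summary_words)
```

    With a four-word limit, the sketch keeps the first and third utterances, because the second one only repeats important words that are already covered. Enumeration grows exponentially with the number of slots, so a real system would hand the same objective and constraints to an integer programming solver.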

    NOISE SPATIAL COVARIANCE MATRIX ESTIMATION APPARATUS, NOISE SPATIAL COVARIANCE MATRIX ESTIMATION METHOD, AND PROGRAM

    Publication No.: US20220130406A1

    Publication date: 2022-04-28

    Application No.: US17437701

    Filing date: 2020-02-28

    Abstract: A time-variant noise spatial covariance matrix is estimated effectively. The inputs are time-frequency-divided observation signals, based on observation signals acquired by collecting acoustic signals emitted from one or a plurality of sound sources, and mask information expressing the occupancy probability of the component of each time-frequency-divided observation signal that corresponds to each noise source. Using the time-frequency-divided observation signals and the mask information belonging to a long time interval, a time-independent first noise spatial covariance matrix is acquired for each noise source. Further, using the mask information of each of a plurality of different short time intervals, a mixture weight corresponding to each noise source in each short time interval is acquired. Furthermore, a time-variant third noise spatial covariance matrix is acquired based on (i) a time-variant second noise spatial covariance matrix, which corresponds to the time-frequency-divided observation signals and the mask information belonging to each short time interval and relates to noise formed by adding together all of the noise sources, and (ii) a weighted sum of the first noise spatial covariance matrices, weighted by the mixture weights of the respective short time intervals.
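    The three matrices named in this abstract can be sketched for a single frequency bin in plain Python. The abstract does not give the estimators, so everything concrete here is an assumption: the mask-weighted average used for each covariance matrix, the mask-mass ratio used for the mixture weights, and above all the convex interpolation with the hypothetical factor `alpha` that combines the second matrix with the weighted sum of the first matrices. The function names are illustrative as well.

```python
def masked_covariance(frames, mask):
    """Mask-weighted spatial covariance: sum_t m_t x_t x_t^H / sum_t m_t.

    frames: list of observation vectors (lists of complex numbers, one per microphone).
    mask:   list of occupancy probabilities, one per frame (must not sum to zero).
    """
    dim = len(frames[0])
    total = sum(mask)
    acc = [[0j] * dim for _ in range(dim)]
    for x, m in zip(frames, mask):
        for i in range(dim):
            for j in range(dim):
                acc[i][j] += m * x[i] * x[j].conjugate()
    return [[value / total for value in row] for row in acc]


def mixture_weights(masks, start, stop):
    """Assumed weights: each noise source's share of the mask mass in [start, stop)."""
    mass = [sum(mask[start:stop]) for mask in masks]
    total = sum(mass)
    return [value / total for value in mass]


def time_variant_covariance(frames, masks, start, stop, alpha=0.5):
    """Third matrix for the short interval [start, stop) of one frequency bin."""
    dim = len(frames[0])
    # First matrices: time-independent, one per noise source, long interval.
    first = [masked_covariance(frames, mask) for mask in masks]
    # Mixture weight of each noise source inside the short interval.
    weights = mixture_weights(masks, start, stop)
    # Second matrix: all noise sources added together, short interval only.
    total_mask = [sum(mask[t] for mask in masks) for t in range(start, stop)]
    second = masked_covariance(frames[start:stop], total_mask)
    # Assumed combination rule: convex interpolation between the two estimates.
    third = [[0j] * dim for _ in range(dim)]
    for i in range(dim):
        for j in range(dim):
            prior = sum(w * f[i][j] for w, f in zip(weights, first))
            third[i][j] = alpha * second[i][j] + (1 - alpha) * prior
    return third


# Two microphones, four frames, two noise sources with hard masks.
frames = [[1 + 0j, 0j], [0j, 1 + 0j], [1 + 0j, 1 + 0j], [1 + 0j, -1 + 0j]]
masks = [[1, 0, 1, 0], [0, 1, 0, 1]]
third = time_variant_covariance(frames, masks, start=0, stop=2)
```

    The long-interval matrices act as a stable description of each noise source, while the short-interval matrix follows changes over time. A practical implementation would repeat this per frequency bin and guard against intervals in which the masks sum to zero.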

    LEARNING DEVICE, LEARNING METHOD AND LEARNING PROGRAM

    Publication No.: US20210056954A1

    Publication date: 2021-02-25

    Application No.: US16963837

    Filing date: 2019-02-01

    IPC classification: G10L15/06 G10L15/02 G10L15/16

    Abstract: A learning device (10) includes: a feature extracting unit (11) that extracts speech features from speech data for training; a probability calculating unit (12) that, on the basis of the speech features, performs a prefix search using a speech recognition model represented by a neural network and calculates posterior probabilities of recognition character strings to obtain a plurality of hypothetical character strings; an error calculating unit (13) that calculates an error based on the word error rates between the plurality of hypothetical character strings and a correct character string for training, and obtains a parameter for the entire model that minimizes an expected value of the summed word-error-rate loss; and an updating unit (14) that updates a parameter of the model in accordance with the parameter obtained by the error calculating unit (13).
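    The quantity minimized by the error calculating unit can be illustrated with a short sketch that computes an expected number of word errors over a set of hypotheses. It is an illustration under assumptions, not the claimed procedure: the hypotheses and their log-probabilities are given directly instead of coming from a prefix search, the posteriors are renormalized over the hypothesis list, and the function names are hypothetical. The gradient step that would update the model parameters is not shown.

```python
import math


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                            # deletion
                           cur[j - 1] + 1,                         # insertion
                           prev[j - 1] + (ref_word != hyp_word)))  # substitution
        prev = cur
    return prev[-1]


def expected_word_errors(hypotheses, log_probs, reference):
    """Posterior-weighted average of word errors over the hypotheses."""
    ref = reference.split()
    # Renormalize the posteriors over the hypothesis list (stable softmax).
    peak = max(log_probs)
    weights = [math.exp(lp - peak) for lp in log_probs]
    norm = sum(weights)
    posteriors = [w / norm for w in weights]
    errors = [edit_distance(ref, hyp.split()) for hyp in hypotheses]
    return sum(p * e for p, e in zip(posteriors, errors))


hypotheses = ["the cat sat", "the cat sad", "a cat sat on"]
log_probs = [math.log(0.5), math.log(0.3), math.log(0.2)]
loss = expected_word_errors(hypotheses, log_probs, "the cat sat")
print(round(loss, 3))  # 0.7: 0.5 * 0 errors + 0.3 * 1 error + 0.2 * 2 errors
```

    Lowering this value shifts probability mass toward hypotheses with fewer word errors, which ties the training criterion to the evaluation metric.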

    EXTRACTION DEVICE, EXTRACTION METHOD, TRAINING DEVICE, TRAINING METHOD, AND PROGRAM

    Publication No.: US20240062771A1

    Publication date: 2024-02-22

    Application No.: US18269761

    Filing date: 2021-01-05

    IPC classification: G10L25/30 G10L25/03

    CPC classification: G10L25/30 G10L25/03

    Abstract: A learning device includes a conversion unit, a combination unit, an extraction unit, and an update unit. The conversion unit converts a mixed sound, whose sound source is known for each component, into embedding vectors for each sound source using an embedding neural network. The combination unit combines the embedding vectors using a combination neural network to obtain a combined vector. The extraction unit extracts a target sound from the mixed sound and the combined vector using an extraction neural network. The update unit updates parameters of the embedding neural network such that a loss function, calculated based on information regarding the sound sources for each component of the mixed sound and on the target sound extracted by the extraction unit, is optimized.

    ACOUSTIC SIGNAL ENHANCEMENT APPARATUS, METHOD AND PROGRAM

    Publication No.: US20230370778A1

    Publication date: 2023-11-16

    Application No.: US18030981

    Filing date: 2020-10-15

    Abstract: Provided is an acoustic signal enhancement device, including:

    a time-space covariance matrix estimation unit 2 configured to estimate a time-space covariance matrix $R_f^{(n)}$, $P_f^{(n)}$ corresponding to a sound source $n$, using a power $\lambda_{t,f}^{(n)}$ of the sound source $n$ and an observation signal vector $X_{t,f}$ composed of an observation signal $x_{m,t,f}$ from a microphone $m$;
    a reverberation suppression unit 3 configured to obtain a reverberation removal filter $G_f^{(n)}$ of the sound source $n$ using the time-space covariance matrix $R_f^{(n)}$, $P_f^{(n)}$, and to generate a reverberation suppression signal vector $Z_{t,f}^{(n)}$ corresponding to the observation signal $x_{m,t,f}$ for an emphasized sound of the sound source $n$ using the reverberation removal filter $G_f^{(n)}$ and the observation signal vector $X_{t,f}$; and
    a sound source separation unit 4 configured to obtain an emphasized sound $y_{t,f}^{(n)}$ of the sound source $n$ and the power $\lambda_{t,f}^{(n)}$ of the sound source $n$ using the reverberation suppression signal vector $Z_{t,f}^{(n)}$.
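
    The abstract does not state the formulas behind units 2 and 3. As a hedged illustration, the following follows the standard weighted prediction error (WPE) form of power-weighted dereverberation, applied per sound source. The stacked vector $\bar{X}_{t,f}$ of delayed past observation vectors is an assumption that does not appear in the abstract, and the patent's exact definitions may differ.

```latex
% Assumed: \bar{X}_{t,f} stacks past observation vectors X_{t-D,f}, X_{t-D-1,f}, \dots
\begin{align}
R_f^{(n)} &= \sum_t \frac{\bar{X}_{t,f}\,\bar{X}_{t,f}^{\mathsf{H}}}{\lambda_{t,f}^{(n)}}, &
P_f^{(n)} &= \sum_t \frac{\bar{X}_{t,f}\,X_{t,f}^{\mathsf{H}}}{\lambda_{t,f}^{(n)}}, \\
G_f^{(n)} &= \bigl(R_f^{(n)}\bigr)^{-1} P_f^{(n)}, &
Z_{t,f}^{(n)} &= X_{t,f} - \bigl(G_f^{(n)}\bigr)^{\mathsf{H}} \bar{X}_{t,f}.
\end{align}
```

    Under this reading, unit 4 takes $Z_{t,f}^{(n)}$, produces the emphasized sound $y_{t,f}^{(n)}$, and returns an updated power $\lambda_{t,f}^{(n)}$ that unit 2 can reuse, so the three units can be run in alternation.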