Speech instruction recognition method, electronic device, and non-transient computer readable storage medium

    Publication No.: US12230275B2

    Publication date: 2025-02-18

    Application No.: US17611436

    Filing date: 2021-01-06

    Inventor: Shaoxun Su

    Abstract: A speech instruction recognition method, an electronic device, and a non-transient computer readable storage medium. The speech instruction recognition method comprises: acquiring a target speech; processing the target speech to obtain a target speech vector corresponding to the target speech; performing speech recognition on the target speech to obtain a target speech text of the target speech, and processing the target speech text to obtain a target text vector corresponding to the target speech text; and inputting the target speech vector and the target text vector to a pre-trained instruction recognition model to obtain an instruction category corresponding to the target speech.

    Sound processing method
    Granted invention patent

    Publication No.: US11996115B2

    Publication date: 2024-05-28

    Application No.: US17435761

    Filing date: 2019-12-18

    Inventor: Mitsuru Sendoda

    CPC classification number: G10L25/24 G10L25/51

    Abstract: A sound processing apparatus includes a feature value extractor configured to perform a Fourier transform and then a cepstral analysis of a sound signal and to extract, as feature values of the sound signal, values including frequency components obtained by the Fourier transform of the sound signal and a value based on a result obtained by the cepstral analysis of the sound signal.
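
    The feature extraction described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not say which cepstral variant is used, so the real cepstrum (inverse transform of the log magnitude spectrum) is assumed here, and the function names, the `n_cepstral` parameter, and the `eps` floor are all hypothetical. A plain DFT from the standard library stands in for an optimized FFT.

```python
import cmath
import math


def dft(x):
    """Discrete Fourier transform of a sequence (O(n^2), for illustration)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft_real(spectrum):
    """Real part of the inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def extract_features(frame, n_cepstral=4, eps=1e-10):
    """Return frequency components followed by cepstrum-based values.

    Assumption: "cepstral analysis" is taken to mean the real cepstrum;
    the abstract itself does not specify the variant.
    """
    spectrum = dft(frame)
    half = len(frame) // 2 + 1  # non-redundant bins of a real-valued signal
    magnitudes = [abs(c) for c in spectrum[:half]]
    # eps avoids log(0) for bins with no energy
    log_magnitude = [math.log(abs(c) + eps) for c in spectrum]
    cepstrum = idft_real(log_magnitude)
    return magnitudes + cepstrum[:n_cepstral]


# Example: a 64-sample sine wave whose energy sits in frequency bin 4
frame = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
features = extract_features(frame)
```

    The returned list concatenates the magnitude spectrum with the first few cepstral coefficients, mirroring the abstract's combination of Fourier-derived and cepstrum-derived feature values.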

    Voice conversion system and training method therefor

    Publication No.: US11875775B2

    Publication date: 2024-01-16

    Application No.: US17430793

    Filing date: 2021-04-20

    CPC classification number: G10L15/063 G10L15/16 G10L25/24

    Abstract: The present disclosure proposes a speech conversion scheme for non-parallel corpus training, to get rid of dependence on parallel text and resolve a technical problem that it is difficult to achieve speech conversion under conditions that resources and equipment are limited. A voice conversion system and a training method therefor are included. Compared with the prior art, according to the embodiments of the present disclosure: a trained speaker-independent automatic speech recognition model can be used for any source speaker, that is, the speaker is independent; and bottleneck features of audio are more abstract as compared with phonetic posteriorgram features, can reflect decoupling of spoken content and timbre of the speaker, and meanwhile are not closely bound with a phoneme class, and are not in a clear one-to-one correspondence relationship. In this way, a problem of inaccurate pronunciation caused by a recognition error in ASR is relieved to some extent. Pronunciation accuracy of audio obtained by performing voice conversion by the bottleneck feature is obviously higher than that of a phonetic posteriorgram based method, and timbre is not significantly different. By means of a transfer learning mode, dependence on training corpus can be greatly reduced.
