METHOD OF PROCESSING AUDIO DATA, ELECTRONIC DEVICE AND STORAGE MEDIUM

    Publication No.: US20230087531A1

    Publication Date: 2023-03-23

    Application No.: US18071187

    Application Date: 2022-11-29

    Abstract: A method of processing audio data, an electronic device, and a storage medium are provided, which relate to the field of artificial intelligence, in particular to speech processing technology. The method includes: processing spectral data of the audio data to obtain first feature information; obtaining fundamental frequency indication information according to the first feature information, wherein the fundamental frequency indication information indicates valid audio data of the first feature information and invalid audio data of the first feature information; obtaining fundamental frequency information and spectral energy information according to the first feature information and the fundamental frequency indication information; and obtaining harmonic structure information of the audio data according to the fundamental frequency information and the spectral energy information.
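
The last step of the abstract above (deriving harmonic structure from fundamental frequency and spectral energy) can be illustrated with a minimal sketch. It is not the patented method. It assumes the fundamental frequency indication is a per-frame voiced/unvoiced flag, that harmonics are integer multiples of the fundamental frequency below the Nyquist frequency, and that frame energy is split equally across harmonics. The function name `harmonic_structure` and all parameters are hypothetical.

```python
from typing import List, Tuple


def harmonic_structure(
    f0_hz: List[float],
    energy: List[float],
    voiced: List[bool],
    n_harmonics: int = 3,
    nyquist_hz: float = 8000.0,
) -> List[List[Tuple[float, float]]]:
    """Return, per frame, a list of (harmonic frequency, amplitude) pairs.

    Assumption: frames flagged as unvoiced (invalid) carry no harmonics.
    Assumption: frame energy is shared equally by the kept harmonics.
    """
    frames = []
    for f0, frame_energy, is_voiced in zip(f0_hz, energy, voiced):
        if not is_voiced or f0 <= 0.0:
            frames.append([])  # invalid frame: no harmonic structure
            continue
        # Keep only the harmonics k * f0 that lie below the Nyquist frequency.
        freqs = [k * f0 for k in range(1, n_harmonics + 1) if k * f0 < nyquist_hz]
        amplitude = frame_energy / len(freqs) if freqs else 0.0
        frames.append([(freq, amplitude) for freq in freqs])
    return frames
```

For a voiced frame with a fundamental of 100 Hz and three harmonics, the sketch yields components at 100, 200 and 300 Hz; an unvoiced frame yields an empty list.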

    SPEECH RECOGNITION AND CODEC METHOD AND APPARATUS, ELECTRONIC DEVICE AND STORAGE MEDIUM

    Publication No.: US20230090590A1

    Publication Date: 2023-03-23

    Application No.: US17738651

    Application Date: 2022-05-06

    Abstract: The present disclosure provides speech recognition and codec methods and apparatuses, an electronic device and a storage medium, and relates to fields of artificial intelligence such as intelligent speech, deep learning and natural language processing. The speech recognition method may include: acquiring an audio feature of to-be-recognized speech; encoding the audio feature to obtain an encoding feature; truncating the encoding feature to obtain N continuous feature segments, N being a positive integer greater than one; and acquiring, for any one of the feature segments, corresponding historical feature abstraction information, encoding the feature segment in combination with the historical feature abstraction information, and decoding an encoding result to obtain a recognition result corresponding to the feature segment, wherein the historical feature abstraction information is information obtained by feature abstraction of recognized historical feature segments.
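
The segment-by-segment flow in the abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the disclosed implementation: truncation is done at a fixed segment length, and "feature abstraction" is approximated by mean pooling each already-processed segment into one vector. The names `truncate`, `summarize` and `segments_with_history` are hypothetical, and the actual encoding and decoding steps are left out.

```python
from typing import List, Tuple

Frame = List[float]


def truncate(features: List[Frame], segment_len: int) -> List[List[Frame]]:
    """Cut the encoded feature sequence into consecutive segments (assumed fixed length)."""
    return [features[i:i + segment_len] for i in range(0, len(features), segment_len)]


def summarize(segment: List[Frame]) -> Frame:
    """Assumed abstraction step: mean-pool a segment into a single vector."""
    dim = len(segment[0])
    return [sum(frame[d] for frame in segment) / len(segment) for d in range(dim)]


def segments_with_history(
    features: List[Frame], segment_len: int
) -> List[Tuple[List[Frame], List[Frame]]]:
    """Pair every segment with the summaries of all previously processed segments."""
    history: List[Frame] = []
    paired = []
    for segment in truncate(features, segment_len):
        paired.append((segment, list(history)))  # snapshot of the history so far
        history.append(summarize(segment))       # this segment becomes history
    return paired
```

Each returned pair is what a downstream encoder and decoder would consume: the current segment plus a compact summary of everything recognized before it, so earlier frames never need to be re-encoded in full.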

    SPEECH SYNTHESIS METHOD, AND ELECTRONIC DEVICE

    Publication No.: US20230005466A1

    Publication Date: 2023-01-05

    Application No.: US17820339

    Application Date: 2022-08-17

    Abstract: The disclosure provides a speech synthesis method and an electronic device. The technical solution is as follows. A text to be synthesized and speech features of a target user are obtained. First acoustic features are predicted based on the text to be synthesized and the speech features. A target template audio is obtained from a template audio library based on the text to be synthesized. Second acoustic features of the target template audio are extracted. Target acoustic features are generated by splicing the first acoustic features and the second acoustic features. Speech synthesis is performed on the text to be synthesized based on the target acoustic features and the speech features, to generate a target speech of the text to be synthesized.
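
The splicing step in the abstract above can be pictured with a small sketch. The abstract does not say how the two feature sequences are combined, so this assumes splicing along the time axis: a frame range of the template features is replaced by the predicted frames. The function name `splice_features` and the `start`/`end` indices are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def splice_features(
    template: Sequence[T], predicted: Sequence[T], start: int, end: int
) -> List[T]:
    """Replace template frames [start, end) with the predicted frames.

    Assumption: both inputs are frame sequences with the same per-frame layout.
    """
    if not 0 <= start <= end <= len(template):
        raise ValueError("start/end must describe a valid range in the template")
    return list(template[:start]) + list(predicted) + list(template[end:])
```

For example, `splice_features([1, 2, 3, 4, 5], [9, 9], 1, 3)` keeps the first template frame, inserts the two predicted frames, then continues with the template from frame index 3 onward.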

    SPEECH RECOGNITION
    Invention Application

    Publication No.: US20250078839A1

    Publication Date: 2025-03-06

    Application No.: US18819018

    Application Date: 2024-08-29

    Abstract: A speech recognition method and a method for training a deep learning model are provided. The speech recognition method includes: obtaining a first speech feature of speech to be recognized, wherein the first speech feature includes a plurality of speech segment features corresponding to a plurality of speech segments; decoding the first speech feature using a first decoder to obtain a plurality of first decoding results corresponding to a plurality of words, the first decoding results indicating a first recognition result of the words; extracting a second speech feature from the first speech feature based on first a priori information, wherein the first a priori information includes the plurality of first decoding results and the second speech feature includes first word-level audio features corresponding to the plurality of words; and decoding the second speech feature using a second decoder to obtain a plurality of second decoding results corresponding to the plurality of words, the second decoding results indicating a second recognition result of the words.
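
The extraction of word-level audio features between the two decoding passes might look like the following sketch. It assumes, beyond what the abstract states, that each first-pass result carries a word plus the frame range it covers, and that a word-level feature is the mean of the frames in that range. The name `word_level_features` and the `(word, start, end)` tuple layout are hypothetical; both decoders are omitted.

```python
from typing import List, Tuple

Frame = List[float]


def word_level_features(
    frames: List[Frame], first_pass: List[Tuple[str, int, int]]
) -> List[Tuple[str, Frame]]:
    """Pool the frames of each first-pass word span into one vector per word."""
    pooled = []
    for word, start, end in first_pass:
        span = frames[start:end]
        if not span:
            raise ValueError(f"empty frame span for word {word!r}")
        dim = len(span[0])
        mean = [sum(frame[d] for frame in span) / len(span) for d in range(dim)]
        pooled.append((word, mean))
    return pooled
```

A second decoder would then operate on one vector per word rather than on raw frames, which is what lets it revise the first-pass hypothesis word by word.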
