METHOD FOR SEMANTIC RECOGNITION, ELECTRONIC DEVICE, AND STORAGE MEDIUM

    Publication No.: US20220028376A1

    Publication date: 2022-01-27

    Application No.: US17450714

    Application date: 2021-10-13

    Abstract: The disclosure discloses a method for semantic recognition, an electronic device, and a storage medium. The detailed solution includes: obtaining a speech recognition result of a speech to be processed, in which the speech recognition result includes a newly added recognition result fragment and a historical recognition result fragment; obtaining a semantic vector of each historical object in the historical recognition result fragment, and obtaining a semantic vector of each newly added object by inputting the semantic vector of each historical object and each newly added object in the newly added recognition result fragment into a streaming semantic coding layer; and obtaining a semantic recognition result of the speech by inputting the semantic vector of each historical object and the semantic vector of each newly added object into a streaming semantic vector fusion layer and a semantic understanding multi-task layer sequentially arranged.
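The streaming step in this abstract (reuse the semantic vectors of historical objects, compute vectors only for newly added objects) can be illustrated with a minimal sketch. The class name `StreamingSemanticCache` and the `toy_encoder` function are hypothetical and stand in for the streaming semantic coding layer; the abstract does not specify how that layer is built.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical encoder signature: (historical vectors, new objects) -> new vectors.
Encoder = Callable[[Sequence[Vector], Sequence[str]], List[Vector]]


class StreamingSemanticCache:
    """Keeps historical semantic vectors so only new objects are encoded."""

    def __init__(self, encode_new: Encoder) -> None:
        self._encode_new = encode_new
        self.vectors: List[Vector] = []

    def update(self, new_objects: Sequence[str]) -> List[Vector]:
        # Historical vectors are passed in as context but never recomputed.
        new_vectors = self._encode_new(self.vectors, new_objects)
        self.vectors.extend(new_vectors)
        return list(self.vectors)


def toy_encoder(history: Sequence[Vector], new_objects: Sequence[str]) -> List[Vector]:
    # Placeholder for a real model: vector = [position in stream, token length].
    offset = len(history)
    return [[float(offset + i), float(len(obj))] for i, obj in enumerate(new_objects)]


cache = StreamingSemanticCache(toy_encoder)
cache.update(["turn", "on"])                  # first recognition fragment
all_vectors = cache.update(["the", "light"])  # newly added fragment
```

In a full pipeline, `all_vectors` would then be passed to the fusion layer and the multi-task layer named in the abstract; those are not sketched here.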

    METHOD FOR TRAINING SPEECH RECOGNITION MODEL, DEVICE AND STORAGE MEDIUM

    Publication No.: US20220310064A1

    Publication date: 2022-09-29

    Application No.: US17571805

    Application date: 2022-01-10

    Abstract: A method for training a speech recognition model, a device and a storage medium, which relate to the field of computer technologies, and particularly to the fields of speech recognition technologies, deep learning technologies, or the like, are disclosed. The method for training a speech recognition model includes: obtaining a fusion probability of each of at least one candidate text corresponding to a speech based on an acoustic decoding model and a language model; selecting a preset number of one or more candidate texts based on the fusion probability of each of the at least one candidate text, and determining a predicted text based on the preset number of one or more candidate texts; and obtaining a loss function based on the predicted text and a standard text corresponding to the speech, and training the speech recognition model based on the loss function.
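The candidate-selection steps in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented method: the fusion probability is assumed to be a log-linear combination of the acoustic and language-model scores with a hypothetical weight `lm_weight`, the predicted text is assumed to be the highest-scoring selected candidate, and a word-level edit distance stands in for the loss, which the abstract does not define.

```python
def fuse(am_logp: float, lm_logp: float, lm_weight: float = 0.3) -> float:
    # Assumed fusion rule: acoustic log-prob plus weighted language-model log-prob.
    return am_logp + lm_weight * lm_logp


def select_top_n(candidates, n: int, lm_weight: float = 0.3):
    """candidates: iterable of (text, acoustic_logprob, lm_logprob) tuples."""
    scored = [(text, fuse(am, lm, lm_weight)) for text, am, lm in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


def word_edit_distance(hyp: str, ref: str) -> int:
    # Standard Levenshtein distance over words (placeholder for the loss).
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))
    for i, hw in enumerate(h, 1):
        cur = [i]
        for j, rw in enumerate(r, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (hw != rw)))
        prev = cur
    return prev[-1]


candidates = [
    ("turn on the light", -2.0, -3.0),
    ("turn on the lights", -2.5, -1.0),
    ("turn in the light", -4.0, -6.0),
]
top = select_top_n(candidates, n=2)   # preset number of candidates = 2
predicted = top[0][0]                 # assumed: best fused candidate
loss = word_edit_distance(predicted, "turn on the lights")
```

In this toy input the language model lifts the second candidate above the first, so the fused ranking differs from the acoustic-only ranking.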

    SPEECH RECOGNITION
    Invention application

    Publication No.: US20250078839A1

    Publication date: 2025-03-06

    Application No.: US18819018

    Application date: 2024-08-29

    Abstract: A speech recognition method and a method for training a deep learning model are provided. The speech recognition method includes: obtaining a first speech feature of a speech to be recognized, which includes a plurality of speech segment features corresponding to a plurality of speech segments; decoding the first speech feature using a first decoder to obtain a plurality of first decoding results corresponding to a plurality of words, indicating a first recognition result of the words; extracting a second speech feature from the first speech feature based on first a priori information, in which the first a priori information includes the plurality of first decoding results and the second speech feature includes first word-level audio features corresponding to the plurality of words; and decoding the second speech feature using a second decoder to obtain a plurality of second decoding results corresponding to the plurality of words, indicating a second recognition result of the words.

    AUDIO RECOGNITION METHOD, METHOD OF TRAINING AUDIO RECOGNITION MODEL, AND ELECTRONIC DEVICE

    Publication No.: US20230410794A1

    Publication date: 2023-12-21

    Application No.: US18237976

    Application date: 2023-08-25

    CPC classification number: G10L15/063 G10L15/26 G10L15/02

    Abstract: An audio recognition method, a method of training an audio recognition model, and an electronic device are provided, which relate to the fields of artificial intelligence, speech recognition, deep learning, and natural language processing. The audio recognition method includes: truncating an audio feature of target audio data to obtain at least one first audio sequence feature corresponding to a predetermined duration; obtaining, according to peak information of the audio feature, peak sub-information corresponding to the first audio sequence feature; performing at least one decoding operation on the first audio sequence feature to obtain a recognition result for the first audio sequence feature, the number of times the decoding operation is performed being identical to the number of peaks corresponding to the first audio sequence feature; and obtaining target text data for the target audio data according to the recognition result for the at least one first audio sequence feature.
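The truncate-then-decode flow in this abstract can be sketched as below. The sketch assumes that peak information is a list of frame indices (for example, spike positions from an upstream model) and that each index is smaller than the number of frames; `decode_step` is a hypothetical callback standing in for one decoding operation. None of these details are given in the abstract.

```python
def truncate(frames, chunk_len):
    # Split the feature sequence into fixed-duration chunks.
    return [frames[i:i + chunk_len] for i in range(0, len(frames), chunk_len)]


def peaks_per_chunk(peak_indices, num_frames, chunk_len):
    # Map global peak positions to chunk-local positions (peak sub-information).
    num_chunks = (num_frames + chunk_len - 1) // chunk_len
    buckets = [[] for _ in range(num_chunks)]
    for p in peak_indices:
        buckets[p // chunk_len].append(p % chunk_len)
    return buckets


def recognize(frames, peak_indices, chunk_len, decode_step):
    results = []
    chunks = truncate(frames, chunk_len)
    buckets = peaks_per_chunk(peak_indices, len(frames), chunk_len)
    for chunk, peaks in zip(chunks, buckets):
        # One decoding operation per peak in this chunk.
        for peak in peaks:
            results.append(decode_step(chunk, peak))
    return results


# Toy data: each "frame" is a letter and decoding reads the frame at the peak.
frames = list("abcdefghij")
tokens = recognize(frames, [1, 5, 6, 9], chunk_len=4,
                   decode_step=lambda chunk, peak: chunk[peak])
```

Tying the number of decoding operations to the peak count means a chunk with no peaks triggers no decoding at all.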
