Code-switching speech recognition with end-to-end connectionist temporal classification model

    Publication number: US10964309B2

    Publication date: 2021-03-30

    Application number: US16410556

    Filing date: 2019-05-13

    Abstract: A code-switching (CS) connectionist temporal classification (CTC) model may be initialized from a major language CTC model by keeping the network's hidden weights and replacing the output tokens with the union of the major and secondary language output tokens. The initialized model may be trained by updating its parameters with training data from both languages, and a language identification (LID) model may also be trained with the same data. During decoding, for each of a series of audio frames, a silence output token may be emitted if silence dominates the current frame. Otherwise, the major language output token posterior vector from the CS CTC model may be multiplied by the LID major language probability to create a probability vector for the major language. A similar step is performed for the secondary language, and the system may emit the output token with the highest probability across all tokens from both languages.
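
The per-frame decoding rule described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `decode_frame`, the `<sil>` token name, the use of a 0.5 threshold on the silence posterior as the meaning of "silence dominates", and the example token names are all hypothetical. The abstract does not say how the secondary language LID probability is obtained, so it is passed in as a separate argument here.

```python
from typing import Dict

SILENCE_TOKEN = "<sil>"  # hypothetical token name, not taken from the patent


def decode_frame(
    major_posteriors: Dict[str, float],
    secondary_posteriors: Dict[str, float],
    silence_posterior: float,
    lid_major_prob: float,
    lid_secondary_prob: float,
    silence_threshold: float = 0.5,  # assumed definition of "silence dominates"
) -> str:
    """Pick one output token for a single audio frame."""
    # Step 1: if silence dominates the frame, emit the silence token.
    if silence_posterior > silence_threshold:
        return SILENCE_TOKEN

    # Step 2: weight each language's token posteriors by its LID probability.
    candidates = [
        (p * lid_major_prob, token) for token, p in major_posteriors.items()
    ]
    candidates += [
        (p * lid_secondary_prob, token)
        for token, p in secondary_posteriors.items()
    ]

    # Step 3: emit the token with the highest weighted probability
    # across both languages.
    _, best_token = max(candidates)
    return best_token


# Illustrative frame: the CTC posteriors slightly favor the secondary
# language token, but the LID model is confident the frame is major language.
token = decode_frame(
    major_posteriors={"en_hello": 0.4, "en_world": 0.1},
    secondary_posteriors={"zh_nihao": 0.5},
    silence_posterior=0.0,
    lid_major_prob=0.8,
    lid_secondary_prob=0.2,
)
print(token)
```

In this example the weighted scores are 0.32 for `en_hello`, 0.08 for `en_world`, and 0.10 for `zh_nihao`, so the LID weighting overrides the raw CTC preference for the secondary language token.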
