Electronic device for speech recognition and control method thereof

    Publication number: WO2023068552A1

    Publication date: 2023-04-27

    Application number: PCT/KR2022/013533

    Application date: 2022-09-08

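    Abstract: The electronic device includes a memory that stores a speech recognition model and first recognition information corresponding to a first user voice obtained through the speech recognition model, the speech recognition model including a first network, a second network, and a third network; and a processor that obtains a first vector by inputting voice data corresponding to a second user voice into the first network, obtains a second vector by inputting the first recognition information into the second network, which generates a vector based on first weight information, and obtains second recognition information corresponding to the second user voice by inputting the first vector and the second vector into the third network, which generates recognition information based on second weight information. At least part of the second weight information is identical to the first weight information.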
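One possible reading of the three-network arrangement is sketched below. The abstract does not name the architectures, so everything here is an assumption made for illustration: the layer sizes, the mean pooling, the `tanh` combination, and in particular the choice to realise "at least part of the second weight information is identical to the first weight information" by reusing one embedding table (`W_embed`) as the output projection of the third network.

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT_DIM, VOCAB, HIDDEN = 8, 5, 4  # hypothetical sizes, not from the abstract

# Weights of the first network (audio side).
W_audio = rng.normal(size=(FEAT_DIM, HIDDEN))

# "First weight information": an embedding table used by the second network.
W_embed = rng.normal(size=(VOCAB, HIDDEN))


def first_network(frames):
    """Map audio feature frames of shape (T, FEAT_DIM) to a first vector (HIDDEN,)."""
    return np.tanh(frames @ W_audio).mean(axis=0)


def second_network(prev_tokens):
    """Map token ids of the earlier recognition result to a second vector (HIDDEN,)."""
    return W_embed[prev_tokens].mean(axis=0)


def third_network(v1, v2):
    """Combine both vectors and score every vocabulary entry.

    The output projection reuses W_embed (transposed), so part of the
    "second weight information" is the same array as the first.
    """
    joint = np.tanh(v1 + v2)
    return joint @ W_embed.T  # (VOCAB,) logits


frames = rng.normal(size=(20, FEAT_DIM))  # stand-in for the second user voice
prev_tokens = np.array([1, 3])            # stand-in for the first recognition information
logits = third_network(first_network(frames), second_network(prev_tokens))
next_token = int(np.argmax(logits))
```

Sharing the table means the model stores one matrix instead of two, which is the usual motivation for tying input and output weights; whether the application intends exactly this form of sharing is not stated in the abstract.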

    COMBINING DTW AND HMM IN SPEAKER DEPENDENT AND INDEPENDENT MODES FOR SPEECH RECOGNITION
    2.
    Type: Invention application
    Status: Under examination (published)

    Publication number: WO0221513A8

    Publication date: 2002-06-20

    Application number: PCT/US0127625

    Application date: 2001-09-05

    Applicant: QUALCOMM INC

    CPC classification number: G10L15/32 G10L15/12 G10L15/142

    Abstract: A method and system that combines voice recognition engines (104, 108, 112, 114) and resolves differences between the results of individual voice recognition engines (104, 106, 108, 112, 114) using a mapping function. Speaker independent voice recognition engine (104) and speaker-dependent voice recognition engine (106) are combined. Hidden Markov Model (HMM) engines (108, 114) and Dynamic Time Warping (DTW) engines (104, 106, 112) are combined.

    NOISE PADDING AND NORMALIZATION IN DYNAMIC TIME WARPING
    3.
    Type: Invention application
    Status: Under examination (published)

    Publication number: WO00041167A1

    Publication date: 2000-07-13

    Application number: PCT/IL2000/000007

    Application date: 2000-01-03

    CPC classification number: G10L15/20 G10L15/12 G10L21/0216

    Abstract: Speech recognition uses a wide token builder (66), gain and noise adapter (70) and noise adapted Dynamic Time Warping (60). Wide token builder produces a padded test token expanded with at least one blank frame before and after the input test utterance. Gain and noise adapter adapts each padded reference template with noise and gain qualities producing adapted reference templates having noise frames wherever a blank frame was originally placed and noise adapted speech where speech exists. Dynamic Time Warping (DTW) is performed on the noise adapted templates.
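The padding and adaptation steps described in this abstract can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names, the use of all-zero blank frames, and the simple additive model `gain * speech + noise` for noise-adapted speech are assumptions. The adapted templates would then be passed to a DTW matcher.

```python
import numpy as np


def pad_token(frames, n_blank=1):
    """Build a "wide" token: n_blank blank (all-zero) frames before and after."""
    if n_blank < 1:
        raise ValueError("at least one blank frame is required")
    blank = np.zeros((n_blank, frames.shape[1]))
    return np.vstack([blank, frames, blank])


def adapt_template(padded_ref, n_blank, noise_frame, gain):
    """Adapt a padded reference template to the current noise and gain.

    Blank frames are replaced by the noise estimate; speech frames are
    modelled (as a simplifying assumption) as gain * speech + noise.
    """
    if n_blank < 1:
        raise ValueError("at least one blank frame is required")
    adapted = padded_ref.astype(float)  # astype returns a copy
    adapted[:n_blank] = noise_frame
    adapted[-n_blank:] = noise_frame
    adapted[n_blank:-n_blank] = gain * padded_ref[n_blank:-n_blank] + noise_frame
    return adapted


reference = np.ones((3, 2))                # 3 speech frames, 2 features each
padded = pad_token(reference, n_blank=2)   # shape (7, 2)
adapted = adapt_template(padded, 2, np.array([0.1, 0.2]), gain=0.5)
```

In this example the first and last two rows of `adapted` equal the noise frame, and the three middle rows equal `0.5 * 1 + noise`.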

    RADIOTELEPHONE VOICE CONTROL DEVICE, IN PARTICULAR FOR USE IN A MOTOR VEHICLE
    4.
    Type: Invention application
    Status: Under examination (published)

    Publication number: WO98045997A1

    Publication date: 1998-10-15

    Application number: PCT/FR1998/000687

    Application date: 1998-04-03

    Abstract: The invention concerns a device comprising: a memory containing a series of numbers and vocal prints; an acoustic transducer, for picking up a correspondent's name spoken by the user; voice recognition means, for analysing the recorded correspondent's name and transforming it into a voice print; means for selectively addressing the memory, comprising associative means, for finding in the memory a voice print information corresponding to the one supplied by the voice recognition means and, if they match, for addressing the memory on the corresponding position; and means, co-operating with the associative means, for applying to the radiotelephone circuits the addressed directory number. The voice recognition means evaluate and memorise a current sound level picked up by the transducer in the absence of a word signal; in the presence of a word signal, they subtract from the picked up signal the previously evaluated current sound level and apply on the resulting signal a DTW voice recognition algorithm with form recognition by dynamic programming adapted to the word using functions for extracting dynamic parameters, in particular a dynamic predictive algorithm with forward and/or backward and/or frequency masking.

    SEQUENCE MODELING USING IMPUTATION
    5.
    Type: Invention application

    Publication number: WO2021159103A1

    Publication date: 2021-08-12

    Application number: PCT/US2021/017131

    Application date: 2021-02-08

    Applicant: GOOGLE LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for sequence modeling. One of the methods includes receiving an input sequence having a plurality of input positions; determining a plurality of blocks of consecutive input positions; processing the input sequence using a neural network to generate a latent alignment, comprising, at each of a plurality of input time steps: receiving a partial latent alignment from a previous input time step; selecting an input position in each block, wherein the token at the selected input position of the partial latent alignment in each block is a mask token; and processing the partial latent alignment and the input sequence using the neural network to generate a new latent alignment, wherein the new latent alignment comprises, at the selected input position in each block, an output token or a blank token; and generating, using the latent alignment, an output sequence.

    DIAGNOSTIC TECHNIQUES BASED ON SPEECH-SAMPLE ALIGNMENT

    Publication number: WO2020183256A1

    Publication date: 2020-09-17

    Application number: PCT/IB2020/051016

    Application date: 2020-02-10

    Inventor: SHALLOM, Ilan D.

    Abstract: Reference-sample feature vectors that quantify acoustic features of different respective portions of at least one reference speech sample (44), which was produced by a subject (22) at a first time while a physiological state of the subject was known, are obtained. At least one test speech sample (56) that was produced by the subject at a second time, while the physiological state of the subject was unknown, is received. Test-sample feature vectors (60) that quantify the acoustic features of different respective portions (58) of the test speech sample are computed. The test-sample feature vectors are mapped to respective ones of the reference-sample feature vectors, under predefined constraints, such that a total distance between the test-sample feature vectors and the respective ones of the reference-sample feature vectors is minimized. In response to the mapping, an output indicating the physiological state of the subject at the second time is generated.
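Mapping test vectors to reference vectors so that the total distance is minimal under constraints is the kind of problem that dynamic time warping solves. The abstract does not name the algorithm or the constraints, so the sketch below is an assumption: it uses Euclidean frame distance, a monotonic path from the first to the last frame of both samples, and the three standard step directions. The function name `dtw_align` is hypothetical.

```python
import numpy as np


def dtw_align(test, ref):
    """Return (total_distance, path) for a monotonic alignment of test to ref.

    test: (n, d) test-sample feature vectors
    ref:  (m, d) reference-sample feature vectors
    path: list of (test_index, ref_index) pairs
    """
    n, m = len(test), len(ref)
    # Pairwise Euclidean distances between every test and reference frame.
    dist = np.linalg.norm(test[:, None, :] - ref[None, :, :], axis=2)

    # cost[i, j] = minimal total distance aligning test[:i] with ref[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
            cost[i, j] = dist[i - 1, j - 1] + best_prev

    # Backtrack from the end to recover the mapping.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1, j - 1], i - 1, j - 1),
            (cost[i - 1, j], i - 1, j),
            (cost[i, j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return float(cost[n, m]), path


test = np.array([[0.0], [1.0], [2.0]])
ref = np.array([[0.0], [0.0], [1.0], [2.0]])
total, path = dtw_align(test, ref)
# total == 0.0, path == [(0, 0), (0, 1), (1, 2), (2, 3)]
```

Here the first test frame is mapped to both of the first two reference frames, which is how the alignment absorbs a difference in speaking rate. A downstream decision about the subject's state would be based on `total` or on per-frame distances along `path`.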

    SYSTEM AND METHOD FOR EFFICIENT STORAGE OF VOICE RECOGNITION MODELS
    8.
    Type: Invention application
    Status: Under examination (published)

    Publication number: WO02059871A3

    Publication date: 2003-03-13

    Application number: PCT/US0200890

    Application date: 2002-01-10

    Applicant: QUALCOMM INC

    CPC classification number: G10L15/06

    Abstract: A method and system that improves voice recognition by improving storage of voice recognition (VR) templates. The improved storage means that more VR models can be stored in memory. The more VR models that are stored in memory, the more robust the VR system and therefore the more accurate the VR system. Lossy compression techniques are used to compress VR models. In one embodiment, A-law compression and A-law expansion are used to compress and expand VR models. In another embodiment, Mu-law compression and Mu-law expansion are used to compress and expand VR models. VR models are compressed during a training process and they are expanded during voice recognition.
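The Mu-law embodiment can be illustrated with the standard companding formulas. The sketch assumes model parameters already normalised to the range [-1, 1], `MU = 255`, and 8-bit codes; none of these values come from the abstract, and the function names are hypothetical.

```python
import numpy as np

MU = 255.0  # common telephony value; an assumption, not taken from the abstract


def mu_law_compress(x):
    """Compress values in [-1, 1] to 8-bit codes (lossy), e.g. during training."""
    x = np.clip(np.asarray(x, dtype=np.float64), -1.0, 1.0)
    y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)  # still in [-1, 1]
    return np.round((y + 1.0) / 2.0 * 255.0).astype(np.uint8)


def mu_law_expand(codes):
    """Expand 8-bit codes back to approximate values, e.g. during recognition."""
    y = codes.astype(np.float64) / 255.0 * 2.0 - 1.0
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU


params = np.array([-1.0, -0.5, 0.0, 0.01, 0.5, 1.0])  # stand-in for model values
codes = mu_law_compress(params)      # one byte per value
restored = mu_law_expand(codes)      # close to params, not identical
```

Each value is stored in one byte rather than a wider float, which is where the storage saving comes from. The logarithmic curve spends more code levels on small magnitudes, so the reconstruction error is smallest near zero and largest near the ends of the range.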

    NOISE PADDING AND NORMALIZATION IN DYNAMIC TIME WARPING
    9.
    Type: Invention application
    Status: Under examination (published)

    Publication number: WO0041167B1

    Publication date: 2000-10-19

    Application number: PCT/IL0000007

    Application date: 2000-01-03

    Inventor: ERELL ADORAM

    CPC classification number: G10L15/20 G10L15/12 G10L21/0216

    Abstract: Speech recognition uses a wide token builder (66), gain and noise adapter (70) and noise adapted Dynamic Time Warping (60). Wide token builder produces a padded test token expanded with at least one blank frame before and after the input test utterance. Gain and noise adapter adapts each padded reference template with noise and gain qualities producing adapted reference templates having noise frames wherever a blank frame was originally placed and noise adapted speech where speech exists. Dynamic Time Warping (DTW) is performed on the noise adapted templates.

    NOISE PADDING AND NORMALIZATION IN DYNAMIC TIME WARPING
    10.
    Type: Invention application
    Status: Under examination (published)

    Publication number: WO0041167A8

    Publication date: 2000-08-31

    Application number: PCT/IL0000007

    Application date: 2000-01-03

    Inventor: ERELL ADORAM

    CPC classification number: G10L15/20 G10L15/12 G10L21/0216

    Abstract: Speech recognition uses a wide token builder (66), gain and noise adapter (70) and noise adapted Dynamic Time Warping (60). Wide token builder produces a padded test token expanded with at least one blank frame before and after the input test utterance. Gain and noise adapter adapts each padded reference template with noise and gain qualities producing adapted reference templates having noise frames wherever a blank frame was originally placed and noise adapted speech where speech exists. Dynamic Time Warping (DTW) is performed on the noise adapted templates.
