A SYSTEM AND A METHOD FOR REPRESENTING UNRECOGNIZED WORDS IN SPEECH TO TEXT CONVERSIONS AS SYLLABLES
    1.
    Invention application
    Status: under examination (published)

    Publication No.: WO2006070373A2

    Publication date: 2006-07-06

    Application No.: PCT/IL2005/001401

    Filing date: 2005-12-29

    Inventor: SHPIGEL, Avraham

    CPC classification number: G10L15/26 G10L2015/027

    Abstract: The present invention is a novel system and method for overcoming the shortcomings of existing speech-to-text systems in the processing of unrecognized words. On encountering words that it cannot decipher, the preferred embodiment of the present invention analyzes the syllables that make up these words and translates them into the appropriate phonetic representations. The method ensures that words which were not uttered clearly are not lost or distorted in the process of transcribing the text. Additionally, it allows the use of smaller and simpler speech-to-text applications, which are suitable for mobile devices with limited storage and processing resources, since these applications may use smaller dictionaries and may be designed to identify only commonly used words. Also disclosed are several examples of possible implementations of the described system and method.
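
    The fallback described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Hypothesis` record, the confidence score, the 0.6 threshold, and the hyphen-joined syllable rendering are all assumptions made for illustration. Words the recognizer is confident about are emitted as dictionary words; the rest are emitted as their detected syllables.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """Hypothetical recognizer output for one spoken word."""
    word: str             # best dictionary match
    confidence: float     # assumed score in [0, 1]
    syllables: List[str]  # phonetic syllables detected in the audio


def transcribe(hypotheses: List[Hypothesis], threshold: float = 0.6) -> str:
    """Emit dictionary words when confident, syllables otherwise."""
    tokens = []
    for hyp in hypotheses:
        if hyp.confidence >= threshold:
            tokens.append(hyp.word)
        else:
            # Unrecognized word: keep what was heard instead of guessing.
            tokens.append("-".join(hyp.syllables))
    return " ".join(tokens)
```

    With a small dictionary, a surname the recognizer does not know would then appear in the transcript as its syllables rather than as a wrong dictionary word.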

    SOUND SAMPLE VERIFICATION FOR GENERATING SOUND DETECTION MODEL
    2.
    Invention application
    Status: under examination (published)

    Publication No.: WO2016064556A1

    Publication date: 2016-04-28

    Application No.: PCT/US2015/053665

    Filing date: 2015-10-02

    Abstract: A method for verifying at least one sound sample to be used in generating a sound detection model in an electronic device includes receiving a first sound sample; extracting a first acoustic feature from the first sound sample; receiving a second sound sample; extracting a second acoustic feature from the second sound sample; and determining whether the second acoustic feature is similar to the first acoustic feature.
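
    The verification step in this abstract can be sketched as follows. The abstract does not say which acoustic feature or similarity measure is used, so the per-segment RMS energy feature, the cosine similarity, and the 0.9 threshold below are assumptions chosen only to make the comparison concrete.

```python
import math
from typing import List, Sequence


def extract_feature(samples: Sequence[float], n_segments: int = 4) -> List[float]:
    """Hypothetical acoustic feature: RMS energy of equal-length segments."""
    if len(samples) < n_segments:
        raise ValueError("sound sample is too short")
    seg = len(samples) // n_segments
    return [
        math.sqrt(sum(x * x for x in samples[i * seg:(i + 1) * seg]) / seg)
        for i in range(n_segments)
    ]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_similar(first: Sequence[float], second: Sequence[float],
               threshold: float = 0.9) -> bool:
    """Decide whether the second sample's feature resembles the first's."""
    score = cosine_similarity(extract_feature(first), extract_feature(second))
    return score >= threshold
```

    Cosine similarity ignores overall loudness, so a quieter repetition of the same sound still passes, while a sample with a different energy contour is rejected.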

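SPEECH RECOGNITION METHOD, SPEECH RECOGNITION PROGRAM, AND SPEECH RECOGNITION DEVICE
    3.
    Invention application
    Status: under examination (published)

    Publication No.: WO2005015544A1

    Publication date: 2005-02-17

    Application No.: PCT/JP2003/010083

    Filing date: 2003-08-07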

    CPC classification number: G10L15/08 G10L2015/027

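    Abstract: In any speech recognition service where what the user will say cannot be fully predicted, for example because of ambiguous words or abbreviations, input speech is matched against the readings of recognition words in order to output the recognition word that corresponds to the input speech. To do so, candidates for the syllables that make up the input speech are extracted from a phoneme similarity sequence created for the input speech; matching data composed of at least one extracted candidate is created; the matching data is matched against the readings of the recognition words to calculate a score for each recognition word; and the recognition word corresponding to the input speech is output based on the calculated scores. As a result, when a user who does not know the recognition word exactly utters an abbreviation instead, the recognition word is presented as the recognition result even if the abbreviation is not registered as a reading of that word.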

SPEECH PROCESSING DEVICE AND METHOD, RECORDING MEDIUM, AND PROGRAM
    4.
    Invention application
    Status: under examination (published)

    Publication No.: WO2004047075A1

    Publication date: 2004-06-03

    Application No.: PCT/JP2003/014342

    Filing date: 2003-11-12

    Inventor: 小川 浩明

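    Abstract: The present invention relates to a speech processing device and method, a recording medium, and a program that reduce deletion errors and improve the speech recognition rate. As shown in FIG. 16C, in the portion corresponding to the boundary between the word 「は」 (wa) and the word <OOV>, a path 91 that does not include the syllable 「ハ」 (ha) and paths 92 and 93 that include the syllable 「ハ」 are generated; in the portion corresponding to the boundary between the word <OOV> and the word 「です」 (desu), a path 101 that does not include the syllable 「ワ」 (wa) and paths 102 and 103 that include the syllable 「ワ」 are generated. A network of words and syllables is thereby generated, which makes it possible to select the sub-word sequence on the network that best fits the input speech. The present invention can be applied to a speech recognition device.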

    METHOD AND DEVICE FOR RECOGNIZING SPEECH
    5.
    Invention application
    Status: under examination (published)

    Publication No.: WO2014076827A1

    Publication date: 2014-05-22

    Application No.: PCT/JP2012/079880

    Filing date: 2012-11-13

    Inventor: ANDO, Yoichi

    Abstract: Speech is recognized using ACF factors extracted from running autocorrelation functions (ACFs) calculated from the speech. The extracted ACF factors are W_φ(0) (the width of the ACF amplitude around the zero-delay origin), W_φ(0)max (the maximum value of W_φ(0)), τ_1 (the pitch period), φ_1 (the pitch strength), and Δφ_1/Δt (the rate of change of the pitch strength). Syllables in the speech are identified by comparing the ACF factors with templates stored in a database.
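
    Two of the factors named in this abstract, the pitch period and the pitch strength, can be illustrated on a single frame. This is a rough sketch under stated assumptions, not the method of the application: it uses one fixed frame rather than a running ACF, normalizes a biased ACF by the zero-lag energy, and takes the largest ACF value inside an assumed lag search range as the pitch peak. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normalized_acf(frame: Sequence[float], max_lag: int) -> List[float]:
    """Biased autocorrelation for lags 0..max_lag, divided by zero-lag energy."""
    energy = sum(x * x for x in frame)
    if energy == 0:
        return [0.0] * (max_lag + 1)
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag)) / energy
        for lag in range(max_lag + 1)
    ]


def pitch_factors(frame: Sequence[float], min_lag: int,
                  max_lag: int) -> Tuple[int, float]:
    """Return (pitch period in samples, pitch strength) for one frame.

    The period is the lag of the largest ACF value in [min_lag, max_lag];
    the strength is the ACF value at that lag.
    """
    acf = normalized_acf(frame, max_lag)
    period = max(range(min_lag, max_lag + 1), key=lambda lag: acf[lag])
    return period, acf[period]
```

    For a pure tone sampled at 20 samples per cycle, the returned period is 20 samples. In a full system these values would be tracked frame by frame and compared against stored syllable templates.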

    METHOD AND APPARATUS FOR PRODUCING SCRIPT DATA
    6.
    Invention application
    Status: under examination (published)

    Publication No.: WO2012064110A3

    Publication date: 2012-07-12

    Application No.: PCT/KR2011/008522

    Filing date: 2011-11-09

    Abstract: The present invention relates to a method and apparatus for producing script data with respect to audio data. The method for producing the script data includes: obtaining the whole time information of an actual sound section of the audio data; obtaining the whole syllable number information with respect to a sound section on the basis of text data; calculating unit syllable time information corresponding to one syllable on the basis of the whole time information and the whole syllable number information; obtaining prediction playback position information with respect to a corresponding sound section of the audio data on the basis of a sound section occupied by a word or paragraph for which prediction is required in the text data and the unit syllable time information; and recording a mute section, which is the closest to a prediction playback position, of mute sections of the audio data located before or after the prediction playback position as actual playback position information.
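
    The arithmetic in this abstract can be made concrete with a short sketch. It is a simplification under stated assumptions: mute sections are given as single timestamps, the duration of the mute sections is ignored when predicting a position, and all function names are hypothetical.

```python
from typing import Sequence


def unit_syllable_time(total_sound_seconds: float, total_syllables: int) -> float:
    """Average time taken by one syllable of the text."""
    return total_sound_seconds / total_syllables


def predict_position(section_start: float, syllables_before: int,
                     unit_time: float) -> float:
    """Predicted playback position of a word or paragraph."""
    return section_start + syllables_before * unit_time


def snap_to_mute(predicted: float, mute_positions: Sequence[float]) -> float:
    """Pick the mute section closest to the prediction, before or after it."""
    return min(mute_positions, key=lambda m: abs(m - predicted))
```

    For example, 60 seconds of sound covering 240 syllables gives 0.25 seconds per syllable. A paragraph preceded by 50 syllables is predicted at 12.5 seconds, and with mute sections at 3.0, 11.8, and 14.0 seconds the recorded playback position would be 11.8 seconds.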

    SYSTEM AND METHOD FOR SPEECH RECOGNITION AND TRANSCRIPTION
    7.
    Invention application
    Status: under examination (published)

    Publication No.: WO2005006307A1

    Publication date: 2005-01-20

    Application No.: PCT/US2004/000624

    Filing date: 2004-01-09

    Inventor: KY, Joshua, D.

    Abstract: The present invention, a method for speech recognition, comprises receiving a digital representation of speech, grouping the digital representation of speech into subsets, mapping each subset of the digital representation of speech into a character representation of speech (38), grouping the character representations of speech into words, determining the number of syllables in the digital representation of each word, and searching a library (44) containing words arranged according to the number of syllables and finding at least one closest match to each word.
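
    The final lookup step in this abstract, searching a library arranged by syllable count, might be sketched as below. The abstract does not define how closeness is measured, so the use of Python's standard `difflib.get_close_matches` is an assumption, as are the helper names and the premise that the syllable count of each word is already known.

```python
import difflib
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def build_library(entries: Iterable[Tuple[str, int]]) -> Dict[int, List[str]]:
    """Group dictionary words by their syllable count."""
    library: Dict[int, List[str]] = defaultdict(list)
    for word, syllable_count in entries:
        library[syllable_count].append(word)
    return library


def closest_match(library: Dict[int, List[str]], candidate: str,
                  syllable_count: int) -> Optional[str]:
    """Search only the words that have the same number of syllables."""
    bucket = library.get(syllable_count, [])
    matches = difflib.get_close_matches(candidate, bucket, n=1, cutoff=0.0)
    return matches[0] if matches else None
```

    Restricting the search to one syllable-count bucket keeps the comparison set small, which is presumably the point of arranging the library this way.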

    PORTABLE DIGITAL MOBILE COMMUNICATION APPARATUS, METHOD FOR CONTROLLING SPEECH AND SYSTEM

    Publication No.: WO2004036939A1

    Publication date: 2004-04-29

    Application No.: PCT/CN2003/000870

    Filing date: 2003-10-17

    CPC classification number: H04M1/271 G10L15/26 G10L2015/027

    Abstract: The present invention discloses a portable digital mobile communication apparatus with a voice operation system, and a method for controlling voice operation. During recognition, the feature vector sequences of speech are quantized and encoded; in the decoding operation, the observation probability of each valid speech feature code on the search path is looked up directly from a probability table. With the present invention, full-syllable speech recognition can be achieved on a mobile telephone without training, Chinese characters can be input by speech, and speech prompts can be given with full syllables. The system comprises semantic analysis, dialogue management, and language generation modules, and it can also handle complicated dialogue procedures and feed flexible prompt messages back to the user. The present invention also lets the user customize speech commands and prompt content.

    SYSTEMS AND METHODS FOR PREDICTING PRONUNCIATIONS WITH WORD STRESS
    9.
    Invention application
    Status: under examination (published)

    Publication No.: WO2017213696A1

    Publication date: 2017-12-14

    Application No.: PCT/US2016/065759

    Filing date: 2016-12-09

    Applicant: GOOGLE LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating word pronunciations. One of the methods includes determining, by one or more computers, spelling data that indicates the spelling of a word, providing the spelling data as input to a trained recurrent neural network, the trained recurrent neural network being trained to indicate characteristics of word pronunciations based at least on data indicating the spelling of words, receiving output indicating a stress pattern for pronunciation of the word generated by the trained recurrent neural network in response to providing the spelling data as input, using the output of the trained recurrent neural network to generate pronunciation data indicating the stress pattern for a pronunciation of the word, and providing, by the one or more computers, the pronunciation data to a text-to-speech system or an automatic speech recognition system.

    SYSTEM AND METHOD FOR AUTOMATED SPEECH RECOGNITION
    10.
    Invention application
    Status: under examination (published)

    Publication No.: WO2015057661A1

    Publication date: 2015-04-23

    Application No.: PCT/US2014/060420

    Filing date: 2014-10-14

    Abstract: A method of recognizing speech. The method includes the steps of obtaining an auditory signal; processing the auditory signal into a plurality of frequency components; processing the plurality of frequency components using a plurality of feature detectors, each feature detector producing a feature detector response; generating a spike for each instance in which a feature detector response identifies a characteristic auditory feature to produce a test spike pattern for the auditory signal; and comparing the test spike pattern to a plurality of predetermined spike patterns corresponding to a plurality of speech elements to determine whether the auditory signal includes one of the plurality of speech elements.
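
    The spike-pattern comparison in this abstract can be sketched roughly as follows. The abstract does not specify how detector responses become spikes or how patterns are compared, so the fixed response threshold, the Jaccard overlap score, the minimum score of 0.6, and all names below are assumptions for illustration only.

```python
from typing import Dict, Optional, Sequence, Set, Tuple

Spike = Tuple[int, int]  # (feature detector index, time frame index)


def spike_pattern(responses: Sequence[Sequence[float]],
                  threshold: float = 0.5) -> Set[Spike]:
    """Emit a spike wherever a detector response reaches the threshold."""
    return {
        (detector, frame)
        for detector, row in enumerate(responses)
        for frame, value in enumerate(row)
        if value >= threshold
    }


def jaccard(a: Set[Spike], b: Set[Spike]) -> float:
    """Overlap between two spike patterns, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def recognize(test: Set[Spike], templates: Dict[str, Set[Spike]],
              min_score: float = 0.6) -> Optional[str]:
    """Return the best-matching speech element, or None if nothing is close."""
    best = max(templates, key=lambda name: jaccard(test, templates[name]))
    return best if jaccard(test, templates[best]) >= min_score else None
```

    Here `responses[d][t]` is the response of detector `d` at frame `t`, and `templates` maps each speech element to its predetermined spike pattern.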
