Voice search device, voice search method, and non-transitory recording medium
    2.
    Granted invention patent
    Status: in force

    Publication number: US09431007B2

    Publication date: 2016-08-30

    Application number: US14597958

    Filing date: 2015-01-15

    Inventor: Hiroki Tomita

    Abstract: In a voice search device, a processor acquires a search word, converts the search word into a phoneme sequence, acquires, for each frame, an output probability of a feature quantity of a target voice signal being output from each phoneme included in the phoneme sequence, and executes relative calculation of the output probability acquired from each phoneme, based on an output probability acquired from another phoneme included in the phoneme sequence. In addition, the processor successively designates likelihood acquisition zones, acquires a likelihood indicating how likely it is that a designated likelihood acquisition zone is a zone in which voice corresponding to the search word is spoken, and identifies, from the target voice signal, an estimated zone in which the voice corresponding to the search word is estimated to be spoken, based on the acquired likelihood.

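
    The "relative calculation" step in the abstract above can be illustrated with a minimal sketch. The abstract does not say which operation is used, so subtracting the per-frame maximum log probability is an assumption made here for illustration, and the name relative_log_probs is hypothetical.

```python
def relative_log_probs(log_probs):
    """Rescale per-frame log output probabilities relative to the best phoneme.

    log_probs[t][p] is the log output probability of frame t from phoneme p
    of the search word's phoneme sequence. Subtracting the per-frame maximum
    is an assumed form of the "relative calculation"; the abstract does not
    specify the exact operation.
    """
    relative = []
    for frame in log_probs:
        best = max(frame)  # best-scoring phoneme of the sequence for this frame
        relative.append([lp - best for lp in frame])
    return relative


if __name__ == "__main__":
    scores = [[-1.0, -3.0], [-5.0, -2.0]]
    print(relative_log_probs(scores))
```

    After this rescaling, the best phoneme in every frame scores 0, so a frame in which every phoneme fits poorly no longer dominates the likelihood of a zone.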

    System and Method for Performing Dual Mode Speech Recognition
    3.
    Invention application
    Status: in force

    Publication number: US20160217788A1

    Publication date: 2016-07-28

    Application number: US15085944

    Filing date: 2016-03-30

    Applicant: SoundHound, Inc.

    Abstract: A system and method are presented for performing dual mode speech recognition, employing a local recognition module on a mobile device and a remote recognition engine on a server device. The system accepts a spoken query from a user, and both the local recognition module and the remote recognition engine perform speech recognition operations on the query, returning a transcription and confidence score, subject to a latency cutoff time. If both sources successfully transcribe the query, then the system accepts the result having the higher confidence score. If only one source succeeds, then that result is accepted. In either case, if the remote recognition engine does succeed in transcribing the query, then a client vocabulary is updated if the remote system result includes information not present in the client vocabulary.

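
    The selection rule described in this abstract can be sketched as follows. This is an illustration only: the Result type, its field names, and the word-level vocabulary update are assumptions, not details taken from the application.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Result:
    text: str          # transcription returned by a recognizer
    confidence: float  # recognizer's confidence score
    latency_s: float   # time the recognizer took to answer, in seconds


def choose_result(local: Optional[Result], remote: Optional[Result],
                  cutoff_s: float) -> Optional[Result]:
    """Accept the higher-confidence result among those that met the cutoff."""
    valid = [r for r in (local, remote)
             if r is not None and r.latency_s <= cutoff_s]
    if not valid:
        return None
    return max(valid, key=lambda r: r.confidence)


def update_vocabulary(vocab: Set[str], remote: Optional[Result],
                      cutoff_s: float) -> Set[str]:
    """Add words from a successful remote result that the client lacks."""
    if remote is None or remote.latency_s > cutoff_s:
        return set()
    new_words = set(remote.text.split()) - vocab
    vocab |= new_words
    return new_words


if __name__ == "__main__":
    local = Result("play jazz", 0.6, 0.1)
    remote = Result("play jazz fusion", 0.9, 0.8)
    vocab = {"play", "jazz"}
    print(choose_result(local, remote, cutoff_s=1.0).text)
    print(update_vocabulary(vocab, remote, cutoff_s=1.0))
```

    Note that the vocabulary update runs whenever the remote engine succeeds, even when the local result is the one accepted.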

    VOICE SEARCH DEVICE, VOICE SEARCH METHOD, AND NON-TRANSITORY RECORDING MEDIUM
    6.
    Invention application
    Status: in force

    Publication number: US20150255059A1

    Publication date: 2015-09-10

    Application number: US14604345

    Filing date: 2015-01-23

    Inventor: Hiroyasu IDE

    IPC classification: G10L15/08 G06F17/30 G10L15/02

    Abstract: A search string acquiring unit acquires a search string. A converting unit converts the search string into a phoneme sequence. A time length deriving unit derives the spoken time length of the voice corresponding to the search string. A zone designating unit designates a likelihood acquisition zone in a target voice signal. A likelihood acquiring device acquires a likelihood indicating how likely it is that the likelihood acquisition zone is a zone in which voice corresponding to the search string is spoken. A repeating unit changes the likelihood acquisition zone designated by the zone designating unit, and repeats the process of the zone designating unit and the likelihood acquiring device. An identifying unit identifies, from the target voice signal, estimated intervals for which the voice corresponding to the search string is estimated to be spoken, on the basis of the likelihoods acquired for each of the likelihood acquisition zones.

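
    The designate, score, and repeat loop in this abstract can be sketched as a sliding window over the target signal. The sketch assumes precomputed per-frame log output probabilities and a uniform alignment of phonemes inside each zone; both are simplifications for illustration, and the function names are hypothetical.

```python
def zone_log_likelihood(log_probs, start, zone_len):
    """Score one likelihood acquisition zone.

    log_probs[t][p] is the log output probability of frame t from phoneme p
    of the search string. Each phoneme is assumed to occupy an equal share
    of the zone (uniform alignment), which is a simplification.
    """
    n_phonemes = len(log_probs[0])
    total = 0.0
    for i in range(zone_len):
        phoneme = i * n_phonemes // zone_len  # uniform alignment (assumption)
        total += log_probs[start + i][phoneme]
    return total


def find_estimated_zones(log_probs, zone_len, shift=1, top_k=1):
    """Slide the zone over the signal and return the best-scoring zones."""
    scored = []
    for start in range(0, len(log_probs) - zone_len + 1, shift):
        scored.append((zone_log_likelihood(log_probs, start, zone_len), start))
    scored.sort(reverse=True)  # highest likelihood first
    return [(start, start + zone_len) for _, start in scored[:top_k]]


if __name__ == "__main__":
    # Six frames, two phonemes; the word is "spoken" in frames 1 to 4.
    lp = [[-5, -5], [-1, -5], [-1, -5], [-5, -1], [-5, -1], [-5, -5]]
    print(find_estimated_zones(lp, zone_len=4))
```

    Here zone_len plays the role of the derived spoken time length, expressed in frames.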

    Dynamic Language Model
    7.
    Invention application
    Status: in force

    Publication number: US20150254334A1

    Publication date: 2015-09-10

    Application number: US14719178

    Filing date: 2015-05-21

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for speech recognition. One of the methods includes receiving a base language model for speech recognition including a first word sequence having a base probability value; receiving a voice search query associated with a query context; determining that a customized language model is to be used when the query context satisfies one or more criteria associated with the customized language model; obtaining the customized language model, the customized language model including the first word sequence having an adjusted probability value being the base probability value adjusted according to the query context; and converting the voice search query to a text search query based on one or more probabilities, each of the probabilities corresponding to a word sequence in a group of one or more word sequences, the group including the first word sequence having the adjusted probability value.

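
    The model-selection logic in this abstract can be sketched with plain dictionaries. The data layout (a "criteria" dictionary matched against the query context and an "adjusted" dictionary of overriding probabilities) is an assumption chosen for illustration, as are the function names.

```python
def pick_model(base_lm, customized_models, context):
    """Return a customized model if the query context meets its criteria.

    base_lm maps word sequences (tuples) to probabilities. Each customized
    model is a dict with hypothetical keys "criteria" (required context
    values) and "adjusted" (probabilities that override the base values).
    """
    for model in customized_models:
        if all(context.get(k) == v for k, v in model["criteria"].items()):
            merged = dict(base_lm)
            merged.update(model["adjusted"])
            return merged
    return base_lm


def to_text_query(candidates, lm):
    """Pick the candidate word sequence with the highest model probability."""
    return max(candidates, key=lambda seq: lm.get(seq, 0.0))


if __name__ == "__main__":
    base = {("pisa", "tickets"): 0.3, ("pizza", "tickets"): 0.2}
    custom = [{"criteria": {"app": "food"},
               "adjusted": {("pizza", "tickets"): 0.6}}]
    lm = pick_model(base, custom, {"app": "food"})
    print(to_text_query(list(base), lm))
```

    With a context that matches no criteria, the base model is returned unchanged and the base probabilities decide the transcription.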

    SYSTEM AND METHOD FOR COMBINING GEOGRAPHIC METADATA IN AUTOMATIC SPEECH RECOGNITION LANGUAGE AND ACOUSTIC MODELS
    8.
    Invention application
    Status: in force

    Publication number: US20150073793A1

    Publication date: 2015-03-12

    Application number: US14541738

    Filing date: 2014-11-14

    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for a speech recognition application for directory assistance that is based on a user's spoken search query. The spoken search query is received by a portable device, and the portable device then determines its present location. Upon determining the location of the portable device, that information is incorporated into a local language model that is used to process the search query. Finally, the portable device outputs the results of the search query based on the local language model.

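
    One way to picture "incorporating location into a local language model" is linear interpolation between a general word distribution and a region-specific one. The abstract does not name a technique, so the interpolation, the region lookup table, and the function name below are all assumptions for illustration.

```python
def local_language_model(general, by_region, region, weight=0.5):
    """Blend a general unigram model with one for the device's region.

    general and each by_region[...] value map words to probabilities.
    Linear interpolation is an assumed technique, not the one claimed
    in the application.
    """
    if region not in by_region:
        return dict(general)  # no regional data: fall back to the general model
    regional = by_region[region]
    vocab = set(general) | set(regional)
    return {w: (1 - weight) * general.get(w, 0.0) + weight * regional.get(w, 0.0)
            for w in vocab}


if __name__ == "__main__":
    general = {"main": 0.5, "street": 0.5}
    by_region = {"austin": {"lamar": 1.0}}
    print(local_language_model(general, by_region, "austin"))
```

    Because both inputs sum to 1, the blended distribution also sums to 1 for any weight between 0 and 1.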

    SYSTEM AND METHOD FOR COMBINING GEOGRAPHIC METADATA IN AUTOMATIC SPEECH RECOGNITION LANGUAGE AND ACOUSTIC MODELS
    9.
    Invention application
    Status: in force

    Publication number: US20110144973A1

    Publication date: 2011-06-16

    Application number: US12638667

    Filing date: 2009-12-15

    IPC classification: G06F17/28 G10L15/06 G10L15/04

    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for a speech recognition application for directory assistance that is based on a user's spoken search query. The spoken search query is received by a portable device, and the portable device then determines its present location. Upon determining the location of the portable device, that information is incorporated into a local language model that is used to process the search query. Finally, the portable device outputs the results of the search query based on the local language model.


    Decoder for searching a path according to a signal sequence, decoding method, and computer program product

    Publication number: US10008200B2

    Publication date: 2018-06-26

    Application number: US14574895

    Filing date: 2014-12-18

    Inventor: Manabu Nagao

    IPC classification: G10L19/00 G10L15/08 G10L15/14

    Abstract: According to an embodiment, a decoder searches a finite state transducer and outputs an output symbol string corresponding to a signal that is input or corresponding to a feature sequence of a signal that is input. The decoder includes a token operating unit and a duplication eliminator. The token operating unit is configured to, every time the signal or the feature is input, propagate each of a plurality of tokens, each of which is assigned a state of the head of a path being searched, according to the finite state transducer. The duplication eliminator is configured to eliminate duplication of two or more tokens that have the same state assigned thereto and for which the respective previously passed transitions are assigned the same input symbol.
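
    The duplication eliminator in this abstract can be sketched as a merge keyed on the pair (state, input symbol of the last transition). The abstract does not say which duplicate survives; keeping the lowest-cost token is an assumption, and the Token fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Token:
    state: int                # transducer state at the head of the path
    last_input: str           # input symbol of the previously passed transition
    cost: float               # accumulated path cost (lower is better)
    outputs: Tuple[str, ...]  # output symbols collected along the path


def eliminate_duplicates(tokens: List[Token]) -> List[Token]:
    """Keep one token per (state, last input symbol) pair.

    Keeping the lowest-cost token is an assumption made for this sketch.
    """
    best = {}
    for tok in tokens:
        key = (tok.state, tok.last_input)
        if key not in best or tok.cost < best[key].cost:
            best[key] = tok
    return list(best.values())


if __name__ == "__main__":
    tokens = [
        Token(3, "a", 2.5, ("x",)),
        Token(3, "a", 1.0, ("y",)),  # same state and input symbol: duplicate
        Token(3, "b", 4.0, ("z",)),  # same state, different input symbol: kept
    ]
    for tok in eliminate_duplicates(tokens):
        print(tok)
```

    Tokens that share a state but arrived over transitions with different input symbols are not merged, which matches the condition stated in the abstract.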