Multi-state barge-in models for spoken dialog systems
    11.
    Granted patent
    Legal status: in force

    Publication number: US08612234B2

    Publication date: 2013-12-17

    Application number: US13279443

    Filing date: 2011-10-24

    Applicant: Andrej Ljolje

    Inventor: Andrej Ljolje

    CPC classification number: G10L15/22 G10L15/142 G10L15/222

    Abstract: A method is disclosed for applying a multi-state barge-in acoustic model in a spoken dialogue system. The method includes receiving an audio speech input from the user during the presentation of a prompt, accumulating the audio speech input from the user, applying a non-speech component having at least two one-state Hidden Markov Models (HMMs) to the audio speech input from the user, applying a speech component having at least five three-state HMMs to the audio speech input from the user, in which each of the five three-state HMMs represents a different phonetic category, determining whether the audio speech input is a barge-in-speech input from the user, and if the audio speech input is determined to be the barge-in-speech input from the user, terminating the presentation of the prompt.
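
The decision step in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the scalar frame-energy feature, the Gaussian parameters, the five category names, and the zero decision margin are all assumptions made for illustration. Each model is scored with a Viterbi pass over the accumulated frames, and the input counts as barge-in speech when the best speech HMM outscores the best non-speech HMM.

```python
import math

# Hypothetical (mean, variance) per HMM state over a scalar frame-energy
# feature. Real systems use multi-dimensional acoustic features.
NON_SPEECH = {  # two one-state HMMs
    "silence": [(0.05, 0.01)],
    "noise": [(0.30, 0.05)],
}
SPEECH = {  # five three-state HMMs, one per (assumed) phonetic category
    "vowel": [(2.0, 0.5), (3.0, 0.5), (2.0, 0.5)],
    "nasal": [(1.0, 0.2), (1.5, 0.2), (1.0, 0.2)],
    "fricative": [(0.8, 0.2), (1.2, 0.2), (0.8, 0.2)],
    "stop": [(0.8, 0.2), (2.5, 0.5), (1.0, 0.2)],
    "glide": [(1.5, 0.3), (2.2, 0.3), (1.5, 0.3)],
}

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi_score(frames, states, stay=0.5):
    """Best-path log-likelihood under a left-to-right HMM.

    The path starts in the first state and may end in any state.
    """
    log_stay, log_move = math.log(stay), math.log(1.0 - stay)
    scores = [float("-inf")] * len(states)
    scores[0] = log_gauss(frames[0], *states[0])
    for x in frames[1:]:
        new_scores = []
        for j, (mean, var) in enumerate(states):
            best = scores[j] + log_stay              # remain in state j
            if j > 0:                                # or advance from j - 1
                best = max(best, scores[j - 1] + log_move)
            new_scores.append(best + log_gauss(x, mean, var))
        scores = new_scores
    return max(scores)

def is_barge_in(frames, margin=0.0):
    """True when the best speech model beats the best non-speech model."""
    best_speech = max(viterbi_score(frames, s) for s in SPEECH.values())
    best_non_speech = max(viterbi_score(frames, s) for s in NON_SPEECH.values())
    return best_speech - best_non_speech > margin
```

A dialog system built along these lines would call `is_barge_in` on the frames accumulated so far and stop the prompt once it returns `True`.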

    Adapting language models with a bit mask for a subset of related words
    12.
    Granted patent
    Legal status: in force

    Publication number: US08589163B2

    Publication date: 2013-11-19

    Application number: US12631111

    Filing date: 2009-12-04

    CPC classification number: G10L15/183 G10L2015/227 G10L2015/228

    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for performing speech recognition based on a masked language model. A system configured to practice the method receives a masked language model including a plurality of words, wherein a bit mask identifies whether each of the plurality of words is allowed or disallowed with regard to an adaptation subset, receives input speech, generates a speech recognition lattice based on the received input speech using the masked language model, removes from the generated lattice words identified as disallowed by the bit mask for the adaptation subset, and recognizes the received speech based on the lattice. Alternatively, during the generation step, the system can add only words indicated as allowed by the bit mask. The bit mask can be separate from or incorporated as part of the masked language model. The system can dynamically update the adaptation subset and bit mask.
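
The lattice-pruning step can be sketched as follows. This is an illustration under assumptions, not the patented method: the lattice is a list of hypothetical `(start, end, word, score)` arcs whose node numbers are in topological order, and the bit mask is a Python integer with one bit per vocabulary index.

```python
def build_mask(vocab, allowed):
    """Set bit i when vocab[i] belongs to the adaptation subset."""
    mask = 0
    for i, word in enumerate(vocab):
        if word in allowed:
            mask |= 1 << i
    return mask

def prune_lattice(arcs, vocab_index, mask):
    """Drop every arc whose word is disallowed by the bit mask."""
    return [arc for arc in arcs if (mask >> vocab_index[arc[2]]) & 1]

def best_path(arcs, start, end):
    """Highest-scoring word sequence from start to end.

    Assumes node ids are topologically ordered (start < end on every arc).
    Returns (score, words), or None when end is unreachable.
    """
    best = {start: (0.0, [])}
    for s, e, word, score in sorted(arcs):
        if s in best:
            candidate = (best[s][0] + score, best[s][1] + [word])
            if e not in best or candidate[0] > best[e][0]:
                best[e] = candidate
    return best.get(end)

# Hypothetical vocabulary, adaptation subset, and lattice.
vocab = ["call", "cold", "mom", "mop"]
vocab_index = {word: i for i, word in enumerate(vocab)}
mask = build_mask(vocab, {"call", "mom"})
lattice = [
    (0, 1, "call", -1.0), (0, 1, "cold", -0.5),
    (1, 2, "mom", -1.2), (1, 2, "mop", -0.3),
]
pruned = prune_lattice(lattice, vocab_index, mask)
```

Updating the adaptation subset only requires rebuilding `mask`; the language model and vocabulary stay untouched, which is the appeal of keeping the mask separate.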

    System and method for pronunciation modeling
    13.
    Granted patent
    Legal status: in force

    Publication number: US08073693B2

    Publication date: 2011-12-06

    Application number: US12328407

    Filing date: 2008-12-04

    CPC classification number: G10L15/187 G10L15/183 G10L2015/025

    Abstract: Disclosed are systems, computer-implemented methods, and tangible computer-readable media for generating a pronunciation model. The method includes identifying a generic model of speech composed of phonemes, identifying a family of interchangeable phonemic alternatives for a phoneme in the generic model of speech, labeling the family of interchangeable phonemic alternatives as referring to the same phoneme, and generating a pronunciation model which substitutes each family for each respective phoneme. In one aspect, the generic model of speech is a vocal tract length normalized acoustic model. Interchangeable phonemic alternatives can represent a same phoneme for different dialectal classes. An interchangeable phonemic alternative can include a string of phonemes.
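
One way to picture the substitution of families for phonemes is the sketch below. The phoneme symbols and the families are invented for illustration; each family is keyed by the phoneme it is labeled as, and each alternative is a tuple so that it can hold a string of phonemes.

```python
from itertools import product

def expand(pronunciation, families):
    """List every variant obtained by replacing each phoneme with a member
    of its family. Phonemes without a family stand for themselves."""
    options = [families.get(p, [(p,)]) for p in pronunciation]
    return [
        [phoneme for alternative in combo for phoneme in alternative]
        for combo in product(*options)
    ]

# Hypothetical families: alternatives labeled as the same phoneme.
families = {
    "aa": [("aa",), ("ao", "w")],  # second alternative is a phoneme string
    "r": [("r",), ("ax",)],
}
variants = expand(["k", "aa", "r"], families)
```

Here `variants` holds four pronunciations of the same generic entry, one per combination of dialect-dependent alternatives.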

    Multi-state barge-in models for spoken dialog systems
    14.
    Granted patent
    Legal status: in force

    Publication number: US08046221B2

    Publication date: 2011-10-25

    Application number: US11930619

    Filing date: 2007-10-31

    Applicant: Andrej Ljolje

    Inventor: Andrej Ljolje

    CPC classification number: G10L15/22 G10L15/142 G10L15/222

    Abstract: Disclosed are systems, methods and computer-readable media for applying a multi-state barge-in acoustic model in a spoken dialogue system comprising the steps of (1) presenting a prompt to a user from the spoken dialog system, (2) receiving an audio speech input from the user during the presentation of the prompt, (3) accumulating the audio speech input from the user, (4) applying a non-speech component having at least two one-state Hidden Markov Models (HMMs) to the audio speech input from the user, (5) applying a speech component having at least five three-state HMMs to the audio speech input from the user, in which each of the five three-state HMMs represents a different phonetic category, (6) determining whether the audio speech input is a barge-in-speech input from the user, and (7) if the audio speech input is determined to be the barge-in-speech input from the user, terminating the presentation of the prompt.

    SYSTEM AND METHOD FOR RESTRICTING LARGE LANGUAGE MODELS
    15.
    Patent application
    Legal status: in force

    Publication number: US20110137653A1

    Publication date: 2011-06-09

    Application number: US12631111

    Filing date: 2009-12-04

    CPC classification number: G10L15/183 G10L2015/227 G10L2015/228

    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for performing speech recognition based on a masked language model. A system configured to practice the method receives a masked language model including a plurality of words, wherein a bit mask identifies whether each of the plurality of words is allowed or disallowed with regard to an adaptation subset, receives input speech, generates a speech recognition lattice based on the received input speech using the masked language model, removes from the generated lattice words identified as disallowed by the bit mask for the adaptation subset, and recognizes the received speech based on the lattice. Alternatively, during the generation step, the system can add only words indicated as allowed by the bit mask. The bit mask can be separate from or incorporated as part of the masked language model. The system can dynamically update the adaptation subset and bit mask.

    SYSTEM AND METHOD FOR TRAINING ADAPTATION-SPECIFIC ACOUSTIC MODELS FOR AUTOMATIC SPEECH RECOGNITION
    16.
    Patent application
    Legal status: in force

    Publication number: US20110137650A1

    Publication date: 2011-06-09

    Application number: US12633334

    Filing date: 2009-12-08

    Applicant: Andrej Ljolje

    Inventor: Andrej Ljolje

    CPC classification number: G10L15/144 G10L15/063

    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for training adaptation-specific acoustic models. A system practicing the method receives speech and generates a full size model and a reduced size model, the reduced size model starting with a single distribution for each speech sound in the received speech. The system finds speech segment boundaries in the speech using the full size model and adapts features of the speech data using the reduced size model based on the speech segment boundaries and an overall centroid for each speech sound. The system then recognizes speech using the adapted features of the speech. The model can be a Hidden Markov Model (HMM). The reduced size model can also be of a reduced complexity, such as having fewer mixture components than a model of full complexity. Adapting features of speech can include moving the features closer to an overall feature distribution center.
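
The feature-adaptation idea ("moving the features closer to an overall feature distribution center") admits several readings; the sketch below shows one simple reading and should not be taken as the patented procedure. It assumes scalar features, segment boundaries already found by the full size model, a known overall centroid per speech sound, and a hypothetical interpolation weight `alpha`.

```python
def adapt_features(frames, segments, centroids, alpha=0.5):
    """Shift each segment's frames toward the overall centroid of its sound.

    frames    -- list of scalar feature values (an assumption; real
                 features are vectors)
    segments  -- (label, start, end) tuples with end exclusive, as a
                 full size model might produce
    centroids -- overall centroid per speech-sound label
    alpha     -- hypothetical fraction of the gap to close (0 = no change)
    """
    adapted = list(frames)
    for label, start, end in segments:
        segment = frames[start:end]
        segment_mean = sum(segment) / len(segment)
        shift = alpha * (centroids[label] - segment_mean)
        for i in range(start, end):
            adapted[i] = frames[i] + shift
    return adapted
```

With `alpha=0.5`, a segment whose mean is 2.0 and whose sound has centroid 4.0 is shifted up by 1.0, so its mean lands halfway between the two.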

    SYSTEM AND METHOD FOR STANDARDIZED SPEECH RECOGNITION INFRASTRUCTURE
    17.
    Patent application
    Legal status: in force

    Publication number: US20110119059A1

    Publication date: 2011-05-19

    Application number: US12618371

    Filing date: 2009-11-13

    CPC classification number: G10L15/075 G10L15/063 G10L15/065 G10L15/07 G10L15/08

    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user, and if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and if an unsupervised speech model is available, the system retrieves the unsupervised speech model. If the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next the system recognizes the received speech from the user with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure.
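
The fallback order described in this abstract reduces to a short lookup. The sketch assumes the models are kept in plain dictionaries keyed by a hypothetical `user_id`; the storage layout and names are illustrative only.

```python
def select_model(user_id, supervised, unsupervised, generic):
    """Pick a speech model using the supervised > unsupervised > generic order."""
    if user_id in supervised:
        return supervised[user_id]
    if user_id in unsupervised:
        return unsupervised[user_id]
    return generic

# Hypothetical model stores; the values stand in for real model objects.
supervised = {"alice": "alice-supervised"}
unsupervised = {"alice": "alice-unsupervised", "bob": "bob-unsupervised"}
generic = "generic-en-US"
```

The received speech would then be recognized with whichever model `select_model` returns.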

    SYSTEM AND METHOD FOR PERSONALIZATION OF ACOUSTIC MODELS FOR AUTOMATIC SPEECH RECOGNITION
    18.
    Patent application
    Legal status: in force

    Publication number: US20110066433A1

    Publication date: 2011-03-17

    Application number: US12561005

    Filing date: 2009-09-16

    Abstract: Disclosed herein are methods, systems, and computer-readable storage media for automatic speech recognition. The method includes selecting a speaker independent model, and selecting a quantity of speaker dependent models, the quantity of speaker dependent models being based on available computing resources, the selected models including the speaker independent model and the quantity of speaker dependent models. The method also includes recognizing an utterance using each of the selected models in parallel, and selecting a dominant speech model from the selected models based on recognition accuracy using the group of selected models. The system includes a processor and modules configured to control the processor to perform the method. The computer-readable storage medium includes instructions for causing a computing device to perform the steps of the method.
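
A rough sketch of the parallel recognition and selection steps follows. Several details are assumptions: each model is a dictionary with a hypothetical `recognize` callable returning `(text, confidence)`, the confidence value stands in for recognition accuracy, and the resource budget is expressed as a maximum number of models to run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

def select_models(independent, dependent, max_parallel):
    """Keep the speaker independent model plus as many speaker dependent
    models as the (assumed) parallelism budget allows."""
    budget = max(0, max_parallel - 1)
    return [independent] + dependent[:budget]

def dominant_model(models, utterance):
    """Run every selected model on the utterance in parallel and return
    the name and transcript of the highest-confidence one."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        results = list(pool.map(lambda m: m["recognize"](utterance), models))
    best = max(range(len(models)), key=lambda i: results[i][1])
    return models[best]["name"], results[best][0]

# Hypothetical recognizers returning a fixed (text, confidence) pair.
independent = {"name": "si", "recognize": lambda u: ("call mom", 0.70)}
dependent = [
    {"name": "sd-1", "recognize": lambda u: ("call mom", 0.90)},
    {"name": "sd-2", "recognize": lambda u: ("cold mop", 0.40)},
    {"name": "sd-3", "recognize": lambda u: ("call tom", 0.95)},
]
selected = select_models(independent, dependent, max_parallel=3)
```

With a budget of three, `sd-3` is never run, so `sd-1` becomes the dominant model even though `sd-3` would have scored higher; this is the trade-off implied by tying the model count to available resources.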

    DISCRIMINATIVE TRAINING OF MULTI-STATE BARGE-IN MODELS FOR SPEECH PROCESSING
    19.
    Patent application
    Legal status: in force

    Publication number: US20090112595A1

    Publication date: 2009-04-30

    Application number: US11930656

    Filing date: 2007-10-31

    Applicant: Andrej Ljolje

    Inventor: Andrej Ljolje

    CPC classification number: G10L15/144 G10L15/063

    Abstract: Disclosed are systems and methods for training a barge-in-model for speech processing in a spoken dialogue system comprising the steps of (1) receiving an input having at least one speech segment and at least one non-speech segment, (2) establishing a restriction of recognizing only speech states during speech segments of the input and non-speech states during non-speech segments of the input, (3) generating a hypothesis lattice by allowing any sequence of speech Hidden Markov Models (HMMs) and non-speech HMMs, (4) generating a reference lattice by only allowing speech HMMs for at least one speech segment and non-speech HMMs for at least one non-speech segment, wherein different iterations of training generate at least one different reference lattice and at least one reference transcription, and (5) employing the generated reference lattice as the barge-in-model for speech processing.

    Systems and methods of providing modified media content
    20.
    Granted patent
    Legal status: in force

    Publication number: US09414010B2

    Publication date: 2016-08-09

    Application number: US13471851

    Filing date: 2012-05-15

    Applicant: Andrej Ljolje

    Inventor: Andrej Ljolje

    Abstract: A method includes receiving a command to provide media content configured to be sent to a display device for display at a particular scan rate. The media content includes audio data and video data. The method includes identifying high priority segments of the media content based on the audio data. The high priority segments are to be displayed by the display device at a presentation rate such that the high priority segments displayed at the presentation rate correspond to the media content displayed at the particular scan rate. The method also includes sending the high priority segments to the display device to provide video content and audio content of the requested media content for display.
