System and method of recording utterances using unmanaged crowds for natural language processing
    1.
    Granted invention patent
    System and method of recording utterances using unmanaged crowds for natural language processing (status: active)

    Publication No.: US09448993B1

    Publication date: 2016-09-20

    Application No.: US14846926

    Filing date: 2015-09-07

    Abstract: A system and method of recording utterances for building Named Entity Recognition (“NER”) models, which are used to build dialog systems in which a computer listens and responds to human voice dialog. Utterances to be uttered may be provided to users through their mobile devices, which may record the user uttering (e.g., verbalizing, speaking, etc.) the utterances and upload the recording to a computer for processing. The use of the user's mobile device, which is programmed with an utterance collection application (e.g., configured as a mobile app), facilitates the use of crowd-sourcing human intelligence tasking for widespread collection of utterances from a population of users. As such, obtaining large datasets for building NER models may be facilitated by the system and method disclosed herein.


    System and method for providing words or phrases to be uttered by members of a crowd and processing the utterances in crowd-sourced campaigns to facilitate speech analysis
    3.
    Granted invention patent
    System and method for providing words or phrases to be uttered by members of a crowd and processing the utterances in crowd-sourced campaigns to facilitate speech analysis (status: active)

    Publication No.: US09361887B1

    Publication date: 2016-06-07

    Application No.: US14846925

    Filing date: 2015-09-07

    IPC classification: G10L15/26 G10L15/06

    Abstract: Systems and methods of providing text related to utterances, and gathering voice data in response to the text, are provided herein. In various implementations, an identification token that identifies a first file for a voice data collection campaign, and a second file for a session script may be received from a natural language processing training device. The first file and the second file may be used to configure the mobile application to display a sequence of screens, each of the sequence of screens containing text of at least one utterance specified in the voice data collection campaign. Voice data may be received from the natural language processing training device in response to user interaction with the text of the at least one utterance. The voice data and the text may be stored in a transcription library.

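    The flow described in the abstract of US09361887B1 (a token identifying a campaign file and a session-script file, a sequence of screens built from them, and voice data stored alongside its text) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the `campaign_id:script_id` token format, the `Campaign`, `SessionScript` and `TranscriptionLibrary` classes, and the `record` callback that stands in for the device microphone.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Campaign:
    """Hypothetical 'first file': utterance id -> text to be spoken."""
    utterances: dict[str, str]


@dataclass
class SessionScript:
    """Hypothetical 'second file': ordered screens, each listing utterance ids."""
    screens: list[list[str]]


@dataclass
class TranscriptionLibrary:
    """Stores each recording together with the text that prompted it."""
    entries: list[tuple[str, bytes]] = field(default_factory=list)

    def store(self, text: str, voice_data: bytes) -> None:
        self.entries.append((text, voice_data))


def resolve_token(
    token: str,
    campaigns: dict[str, Campaign],
    scripts: dict[str, SessionScript],
) -> tuple[Campaign, SessionScript]:
    # Assumed token format "campaign_id:script_id"; the patent does not specify one.
    campaign_id, script_id = token.split(":")
    return campaigns[campaign_id], scripts[script_id]


def build_screens(campaign: Campaign, script: SessionScript) -> list[list[str]]:
    # Each screen shows the text of at least one utterance from the campaign.
    return [[campaign.utterances[uid] for uid in screen] for screen in script.screens]


def run_session(
    screens: list[list[str]],
    record: Callable[[str], bytes],
    library: TranscriptionLibrary,
) -> None:
    # 'record' is a placeholder for capturing the user's voice for a given prompt.
    for screen in screens:
        for text in screen:
            library.store(text, record(text))
```

    A session would resolve the token, build the screens, and then call `run_session` with whatever recording function the device provides; pairing each recording with its prompt text is what makes the stored data usable as transcribed training material.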