Abstract:
A system and method of recording utterances for building Named Entity Recognition ("NER") models, which are used to build dialog systems in which a computer listens and responds to human voice dialog. Utterances to be uttered may be provided to users through their mobile devices, which may record the user uttering (e.g., verbalizing, speaking, etc.) the utterances and upload the recording to a computer for processing. The use of the user's mobile device, which is programmed with an utterance collection application (e.g., configured as a mobile app), facilitates the use of crowd-sourcing human intelligence tasking for widespread collection of utterances from a population of users. As such, obtaining large datasets for building NER models may be facilitated by the system and method disclosed herein.
Abstract:
Systems and methods gathering text commands in response to a command context using a first crowdsourced are discussed herein. A command context for a natural language processing system may be identified, where the command context is associated with a command context condition to provide commands to the natural language processing system. One or more command creators associated with one or more command creation devices may be selected. A first application one the one or more command creation devices may be configured to display command creation instructions for each of the one or more command creators to provide text commands that satisfy the command context, and to display a field for capturing a user-generated text entry to satisfy the command creation condition in accordance with the command creation instructions. Systems and methods for reviewing the text commands using second and crowdsourced jobs are also presented herein.
Abstract:
Systems and methods of providing text related to utterances, and gathering voice data in response to the text are provide herein. In various implementations, an identification token that identifies a first file for a voice data collection campaign, and a second file for a session script may be received from a natural language processing training device. The first file and the second file may be used to configure the mobile application to display a sequence of screens, each of the sequence of screens containing text of at least one utterance specified in the voice data collection campaign. Voice data may be received from the natural language processing training device in response to user interaction with the text of the at least one utterance. The voice data and the text may be stored in a transcription library.
Abstract:
A system and method of tagging utterances with Named Entity Recognition ("NER") labels using unmanaged crowds is provided. The system may generate various annotation jobs in which a user, among a crowd, is asked to tag which parts of an utterance, if any, relate to various entities associated with a domain. For a given domain that is associated with a number of entities that exceeds a threshold N value, multiple batches of jobs (each batch having jobs that have a limited number of entities for tagging) may be used to tag a given utterance from that domain. This reduces the cognitive load imposed on a user, and prevents the user from having to tag more than N entities. As such, a domain with a large number of entities may be tagged efficiently by crowd participants without overloading each crowd participant with too many entities to tag.
Abstract:
Systems and methods of validating transcriptions of natural language content using crowdsourced validation jobs are provided herein. In various implementations, a transcription pair comprising natural language content and text corresponding to a transcription of the natural language content may be gathered. A first group of validation devices may be selected for reviewing the transcription pair. A first crowdsourced validation job may be created for the first group of validation devices. The first crowdsourced validation job may be provided to the first group of validation devices. A vote representing whether or not the text accurately represents the natural language content may be received from each of the first group of validation devices. A validation score may be assigned to the transcription pair based, at least in part, on the votes from each of the first group of validation devices.