摘要:
The invention relates to a system and method for integrating domain information into state transitions of a Finite State Transducer (“FST”) for natural language processing. A system may integrate semantic parsing and information retrieval from an information domain to generate an FST parser that represents the information domain. The FST parser may include a plurality of FST paths, at least one of which may be used to generate a meaning representation from a natural language input. As such, the system may perform domain-based semantic parsing of a natural language input, generating more robust meaning representations using domain information. The system may be applied to a wide range of natural language applications that use natural language input from a user such as, for example, natural language interfaces to computing systems, communication with robots in natural language, personalized digital assistants, question-answer query systems, and/or other natural language processing applications.
摘要:
A system and method of recording utterances for building Named Entity Recognition (“NER”) models, which are used to build dialog systems in which a computer listens and responds to human voice dialog. Utterances to be uttered may be provided to users through their mobile devices, which may record the user uttering (e.g., verbalizing, speaking, etc.) the utterances and upload the recording to a computer for processing. The use of the user's mobile device, which is programmed with an utterance collection application (e.g., configured as a mobile app), facilitates the use of crowd-sourcing human intelligence tasking for widespread collection of utterances from a population of users. As such, obtaining large datasets for building NER models may be facilitated by the system and method disclosed herein.
摘要:
In certain implementations, follow-up responses may be provided for prior natural language inputs of a user. As an example, a natural language input associated with a user may be received at a computer system. A determination of whether information sufficient for providing an adequate response to the natural language input is currently accessible to the computer system may be effectuated. A first response to the natural language input (that indicates that a follow-up response will be provided) may be provided based on a determination that information sufficient for providing an adequate response to the natural language input is not currently accessible. Information sufficient for providing an adequate response to the natural language input may be received. A second response to the natural language input may then be provided based on the received sufficient information.
摘要:
A system and method of tagging utterances with Named Entity Recognition (“NER”) labels using unmanaged crowds is provided. The system may generate various annotation jobs in which a user, among a crowd, is asked to tag which parts of an utterance, if any, relate to various entities associated with a domain. For a given domain that is associated with a number of entities that exceeds a threshold N value, multiple batches of jobs (each batch having jobs that have a limited number of entities for tagging) may be used to tag a given utterance from that domain. This reduces the cognitive load imposed on a user, and prevents the user from having to tag more than N entities. As such, a domain with a large number of entities may be tagged efficiently by crowd participants without overloading each crowd participant with too many entities to tag.
摘要:
The invention relates to a system and method of automatically distinguishing between computers and human based on responses to enhanced Completely Automated Public Turing test to tell Computers and Humans Apart (“e-captcha”) challenges that do not merely challenge the user to recognize skewed or stylized text. A given e-captcha challenge may be specific to a particular knowledge domain. Accordingly, e-captchas may be used not only to distinguish between computers and humans, but also determine whether a respondent has demonstrated knowledge in the particular knowledge domain. For instance, participants in crowd-sourced tasks, in which unmanaged crowds are asked to perform tasks, may be screened using an e-captcha challenge. This not only validates that a participant is a human (and not a bot, for example, attempting to game the crowd-source task), but also screens the participant based on whether they can successfully respond to the e-captcha challenge.
摘要:
A system and method of tagging utterances with Named Entity Recognition (“NER”) labels using unmanaged crowds is provided. The system may generate various annotation jobs in which a user, among a crowd, is asked to tag which parts of an utterance, if any, relate to various entities associated with a domain. For a given domain that is associated with a number of entities that exceeds a threshold N value, multiple batches of jobs (each batch having jobs that have a limited number of entities for tagging) may be used to tag a given utterance from that domain. This reduces the cognitive load imposed on a user, and prevents the user from having to tag more than N entities. As such, a domain with a large number of entities may be tagged efficiently by crowd participants without overloading each crowd participant with too many entities to tag.
摘要:
A system and method for processing multi-modal device interactions in a natural language voice services environment may be provided. In particular, one or more multi-modal device interactions may be received in a natural language voice services environment that includes one or more electronic devices. The multi-modal device interactions may include a non-voice interaction with at least one of the electronic devices or an application associated therewith, and may further include a natural language utterance relating to the non-voice interaction. Context relating to the non-voice interaction and the natural language utterance may be extracted and combined to determine an intent of the multi-modal device interaction, and a request may then be routed to one or more of the electronic devices based on the determined intent of the multi-modal device interaction.
摘要:
The system and method described herein may use various natural language models to deliver targeted advertisements and/or provide natural language processing based on advertisements. In one implementation, an advertisement associated with a product or service may be provided for presentation to a user. A natural language utterance of the user may be received. The natural language utterance may be interpreted based on the advertisement and, responsive to the existence of a pronoun in the natural language utterance, a determination of whether the pronoun refers to one or more of the product or service or a provider of the product or service may be effectuated.
摘要:
Systems and methods of providing text related to utterances, and gathering voice data in response to the text are provide herein. In various implementations, an identification token that identifies a first file for a voice data collection campaign, and a second file for a session script may be received from a natural language processing training device. The first file and the second file may be used to configure the mobile application to display a sequence of screens, each of the sequence of screens containing text of at least one utterance specified in the voice data collection campaign. Voice data may be received from the natural language processing training device in response to user interaction with the text of the at least one utterance. The voice data and the text may be stored in a transcription library.
摘要:
A system and method for processing multi-modal device interactions in a natural language voice services environment may be provided. In particular, one or more multi-modal device interactions may be received in a natural language voice services environment that includes one or more electronic devices. The multi-modal device interactions may include a non-voice interaction with at least one of the electronic devices or an application associated therewith, and may further include a natural language utterance relating to the non-voice interaction. Context relating to the non-voice interaction and the natural language utterance may be extracted and combined to determine an intent of the multi-modal device interaction, and a request may then be routed to one or more of the electronic devices based on the determined intent of the multi-modal device interaction.