Abstract:
Systems and processes for robust end-pointing of speech signals using speaker recognition are provided. In one example process, a stream of audio having a spoken user request can be received. A first likelihood that the stream of audio includes user speech can be determined. A second likelihood that the stream of audio includes user speech spoken by an authorized user can be determined. A start-point or an end-point of the spoken user request can be determined based at least in part on the first likelihood and the second likelihood.
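A minimal sketch of how such a process might combine the two likelihoods, assuming a per-frame representation with illustrative scoring values; the combination rule, threshold, and silence heuristic are assumptions for illustration, not the patented implementation:

```python
# Sketch: end-pointing a spoken request by combining a per-frame
# speech-presence likelihood with a speaker-recognition likelihood.
# Thresholds and the product combination rule are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Frame:
    speech_score: float   # first likelihood: frame contains user speech
    speaker_score: float  # second likelihood: speech is from the authorized user

def endpoint(frames: list[Frame],
             threshold: float = 0.5,
             min_silence_frames: int = 30) -> tuple[int, int] | None:
    """Return (start_index, end_index) of the spoken request, or None."""
    start = end = None
    silence_run = 0
    for i, f in enumerate(frames):
        # A product favors frames that are both speech-like AND
        # attributable to the authorized user.
        combined = f.speech_score * f.speaker_score
        if combined >= threshold:
            if start is None:
                start = i          # first confident frame -> start-point
            end = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_silence_frames:
                break              # sustained non-speech -> end-point found
    return (start, end) if start is not None else None
```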
Abstract:
Techniques for providing reminders based on social interactions between users of electronic devices are described. Social reminders can be set to trigger based on social interactions of users. For example, a user may request to be reminded to discuss a particular topic with a specific phonebook contact the next time the user encounters that contact.
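A minimal sketch of the trigger logic, assuming contact detection (e.g., via a started call or nearby-device discovery) is abstracted behind a hypothetical hook; the reminder record shape is an assumption:

```python
# Sketch: a social reminder that fires when a stored contact is next
# encountered. on_social_interaction() is a hypothetical callback invoked
# by the device whenever it detects an interaction with known contacts.

reminders = [
    {"contact": "Alice", "topic": "weekend trip", "fired": False},
]

def on_social_interaction(present_contacts: set[str]) -> list[str]:
    """Return reminder prompts for contacts the user is now interacting with."""
    prompts = []
    for r in reminders:
        if not r["fired"] and r["contact"] in present_contacts:
            r["fired"] = True  # fire once, on the next encounter
            prompts.append(f"Reminder: discuss '{r['topic']}' with {r['contact']}.")
    return prompts

# Example: the device detects an interaction with Alice.
print(on_social_interaction({"Alice"}))
```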
Abstract:
In some implementations, a user device can personalize a media stream by converting notifications into audio speech data and presenting the audio speech data at locations within the media stream that do not interrupt the user's enjoyment of the media stream. In some implementations, the user device can receive notifications from various communication services, applications installed on the user device, and/or other sources, determine information describing the notifications, and present the information to the user using the audio speech data. In some implementations, the user device can generate personalized notifications based on the media stream and/or media items selected by the user, as well as on the user's context (e.g., environment, location, activity, etc.). The personalized notifications can then be presented to the user using audio speech data at appropriate locations in the media stream.
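A minimal sketch of the queuing behavior, assuming a gap between media items counts as a non-interrupting location; text_to_speech() and play() are placeholder stand-ins for platform TTS and audio APIs, not real library calls:

```python
# Sketch: queuing spoken notifications and playing them only at
# non-interrupting points in the media stream (here, between tracks).

import queue

pending: queue.Queue = queue.Queue()

def text_to_speech(text: str) -> bytes:  # placeholder TTS
    return text.encode()

def play(audio: bytes) -> None:          # placeholder audio output
    print(f"[speaking {len(audio)} bytes]")

def on_notification(source: str, summary: str) -> None:
    # Convert the information describing the notification into speech data.
    pending.put(text_to_speech(f"New {source} notification: {summary}"))

def on_track_boundary() -> None:
    # A gap between media items does not interrupt the user's enjoyment
    # of the stream; drain any queued speech here.
    while not pending.empty():
        play(pending.get())
```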
Abstract:
A method is performed at an electronic device with one or more processors and memory storing one or more programs for execution by the one or more processors. A first speech input including at least one word is received. A first phonetic representation of the at least one word is determined, the first phonetic representation comprising a first set of phonemes selected from a speech recognition phonetic alphabet. The first set of phonemes is mapped to a second set of phonemes to generate a second phonetic representation, where the second set of phonemes is selected from a speech synthesis phonetic alphabet. The second phonetic representation is stored in association with a text string corresponding to the at least one word.
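A minimal sketch of the mapping step, assuming a simple table between the two alphabets; the alphabets and the table below are invented for illustration (real systems map between alphabets such as ARPAbet-style recognizer symbols and a synthesizer's notation):

```python
# Sketch: mapping a recognizer's phoneme alphabet to a synthesizer's
# alphabet so a learned pronunciation can be both recognized and spoken.

ASR_TO_TTS = {            # hypothetical recognition -> synthesis mapping
    "AX": "@", "IY": "i", "K": "k", "T": "t", "S": "s",
}

def map_phonemes(asr_phonemes: list[str]) -> list[str]:
    """Map a recognition-alphabet phoneme sequence to the synthesis alphabet."""
    return [ASR_TO_TTS[p] for p in asr_phonemes]

# Store the synthesis-alphabet pronunciation keyed by the word's text string.
lexicon: dict[str, list[str]] = {}
lexicon["keats"] = map_phonemes(["K", "IY", "T", "S"])
print(lexicon)  # {'keats': ['k', 'i', 't', 's']}
```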
Abstract:
Systems and processes for generating a shared pronunciation lexicon and using the shared pronunciation lexicon to interpret spoken user inputs received by a virtual assistant are provided. In one example, the process can include receiving pronunciations for words or named entities from multiple users. The pronunciations can be tagged with context tags and stored in the shared pronunciation lexicon. The shared pronunciation lexicon can then be used to interpret a spoken user input received by a user device by determining a relevant subset of the shared pronunciation lexicon based on contextual information associated with the user device and performing speech-to-text conversion on the spoken user input using the determined subset of the shared pronunciation lexicon.
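A minimal sketch of the context-tagged lexicon and subset selection, assuming a tag vocabulary (here "locale") and an exact-match filtering rule; both are assumptions, as is the example data:

```python
# Sketch: a shared lexicon of user-contributed pronunciations tagged with
# context, filtered to a relevant subset before speech-to-text conversion.

shared_lexicon = [
    {"entity": "Worcester", "phonemes": ["W", "UH", "S", "T", "ER"],
     "tags": {"locale": "en-GB"}},
    {"entity": "Thames",    "phonemes": ["T", "EH", "M", "Z"],
     "tags": {"locale": "en-GB"}},
    {"entity": "Houston",   "phonemes": ["HH", "Y", "UW", "S", "T", "AH", "N"],
     "tags": {"locale": "en-US"}},
]

def relevant_subset(device_context: dict) -> list[dict]:
    """Keep entries whose context tags are consistent with the device's context."""
    return [e for e in shared_lexicon
            if all(device_context.get(k) == v for k, v in e["tags"].items())]

# A device whose contextual information indicates a UK locale loads only
# UK-tagged pronunciations for recognition.
print([e["entity"] for e in relevant_subset({"locale": "en-GB"})])
# ['Worcester', 'Thames']
```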
Abstract:
While an electronic device with a display and a touch-sensitive surface is in a screen reader accessibility mode, the device displays an application launcher screen including a plurality of application icons. A respective application icon corresponds to a respective application stored in the device. The device detects a sequence of one or more gestures on the touch-sensitive surface that correspond to one or more characters. A respective gesture that corresponds to a respective character is a single finger gesture that moves across the touch-sensitive surface along a respective path that corresponds to the respective character. The device determines whether the detected sequence of one or more gestures corresponds to a respective application icon of the plurality of application icons, and, in response to determining that the detected sequence of one or more gestures corresponds to the respective application icon, performs a predefined operation associated with the respective application icon.
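A minimal sketch of the matching step, assuming gesture-to-character recognition has already produced a character sequence; the prefix-matching rule, the requirement of a unique match, and the icon list are illustrative assumptions:

```python
# Sketch: matching a sequence of gesture-drawn characters against
# application icon names in a screen-reader accessibility mode.

APP_ICONS = ["Calendar", "Calculator", "Camera", "Mail", "Maps"]

def match_icon(characters: list[str]) -> str | None:
    """Return the app icon uniquely selected by the character sequence so far."""
    prefix = "".join(characters).lower()
    hits = [name for name in APP_ICONS if name.lower().startswith(prefix)]
    return hits[0] if len(hits) == 1 else None  # require a unique match

seq = ["c", "a", "m"]          # characters recognized from finger gestures
icon = match_icon(seq)
if icon is not None:
    print(f"launch {icon}")    # a predefined operation, e.g. opening Camera
```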
Abstract:
Systems and processes are disclosed for virtual assistant request recognition using live usage data and data relating to future events. User requests that are received but not recognized can be used to generate candidate request templates. A count can be associated with each candidate request template and can be incremented each time a matching candidate request template is received. When a count reaches a threshold level, the corresponding candidate request template can be used to train a virtual assistant to recognize and respond to similar user requests in the future. In addition, data relating to future events can be mined to extract relevant information that can be used to populate both recognized user request templates and candidate user request templates. Populated user request templates (e.g., whole expected utterances) can then be used to recognize user requests and disambiguate user intent as future events become relevant.
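A minimal sketch of the count-and-threshold mechanism, assuming template extraction is simplified to replacing a known entity with a placeholder slot; the threshold value and the template scheme are assumptions:

```python
# Sketch: promoting unrecognized requests into training data once they
# recur often enough in live usage.

from collections import Counter

THRESHOLD = 100                 # promote after this many occurrences (assumed)
candidate_counts: Counter = Counter()
training_templates: set = set()

def on_unrecognized(utterance: str, entity: str) -> None:
    # Generalize the concrete utterance into a candidate request template.
    template = utterance.lower().replace(entity.lower(), "{entity}")
    candidate_counts[template] += 1
    if candidate_counts[template] >= THRESHOLD:
        # Enough live usage: use this template to train the assistant to
        # recognize and respond to similar requests in the future.
        training_templates.add(template)

on_unrecognized("Play the latest Giants game", "Giants")
print(candidate_counts)  # Counter({'play the latest {entity} game': 1})
```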
Abstract:
While an electronic device with a display and a touch-sensitive surface is in a screen reader accessibility mode, the device displays a character input area and a keyboard, the keyboard including a plurality of key icons. The device detects a sequence of one or more gestures on the touch-sensitive surface that correspond to one or more characters. A respective gesture of the one or more gestures that corresponds to a respective character is a single finger gesture that moves across the touch-sensitive surface along a respective path that corresponds to the respective character. The respective path traverses one or more locations on the touch-sensitive surface that correspond to one or more key icons of the plurality of key icons without activating the one or more key icons. In response to detecting the respective gesture, the device enters the corresponding respective character in the character input area of the display.
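A minimal sketch of the dispatch logic, assuming a stub classifier distinguishes a character-drawing path from ordinary key activation; both stubs are hypothetical placeholders for the device's gesture recognition:

```python
# Sketch: while a single-finger path gesture is being drawn, key icons
# under the path are deliberately not activated; the path itself is
# recognized as a character and entered in the character input area.

def handle_touch(path: list[tuple[float, float]],
                 keys_under_path: list[str]) -> str:
    if is_character_path(path):
        # Traversed key icons are ignored rather than activated.
        return recognize_character(path)
    # Otherwise fall back to normal key activation (last key touched).
    return keys_under_path[-1] if keys_under_path else ""

def is_character_path(path: list[tuple[float, float]]) -> bool:
    return len(path) > 2          # stub: a real classifier inspects the shape

def recognize_character(path: list[tuple[float, float]]) -> str:
    return "a"                    # stub handwriting recognizer
```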