Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for contextually disambiguating queries are disclosed. In an aspect, a method includes receiving an image being presented on a display of a computing device and a transcription of an utterance spoken by a user of the computing device, identifying a particular sub-image that is included in the image, and based on performing image recognition on the particular sub-image, determining one or more first labels that indicate a context of the particular sub-image. The method also includes, based on performing text recognition on a portion of the image other than the particular sub-image, determining one or more second labels that indicate the context of the particular sub-image, based on the transcription, the first labels, and the second labels, generating a search query, and providing, for output, the search query.
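To make the flow concrete, here is a minimal Python sketch of the final query-generation step. The label lists stand in for the outputs of the image-recognition and text-recognition stages, and the substitution rule (replacing an ambiguous referent like "this" with the strongest shared label) is an assumption; the abstract does not specify how the transcription and labels are combined.

```python
# Hypothetical sketch: the labels stand in for image-recognition and
# on-screen text-recognition output; the substitution rule is assumed.

AMBIGUOUS_TERMS = {"this", "that", "it", "these", "those"}

def generate_search_query(transcription: str,
                          image_labels: list[str],
                          text_labels: list[str]) -> str:
    """Rewrite ambiguous referents in the utterance using the context
    labels, preferring labels confirmed by both recognizers."""
    confirmed = [lab for lab in image_labels if lab in set(text_labels)]
    candidates = confirmed or image_labels or text_labels
    if not candidates:
        return transcription  # No context available; pass the query through.
    best_label = candidates[0]

    def rewrite(word: str) -> str:
        stripped = word.rstrip("?.,!")
        if stripped.lower() in AMBIGUOUS_TERMS:
            return best_label + word[len(stripped):]
        return word

    return " ".join(rewrite(w) for w in transcription.split())

# An image of the Eiffel Tower, with "Eiffel Tower" also in on-screen text:
print(generate_search_query("how tall is this?",
                            ["Eiffel Tower", "tower"],
                            ["Paris", "Eiffel Tower"]))
# -> "how tall is Eiffel Tower?"
```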
Abstract:
A device may determine a representation of text that includes a first linguistic term associated with a first set of speech sounds and a second linguistic term associated with a second set of speech sounds. The device may determine a plurality of joins between the first set and the second set. A given join may be indicative of concatenating a first speech sound from the first set with a second speech sound from the second set. A given local cost of the given join may correspond to a weighted sum of individual costs. A given individual cost may be weighted based on a variability of the given individual cost in the plurality of joins. The device may provide a sequence of speech sounds indicative of a pronunciation of the text based on a minimization of a sum of local costs of adjacent speech sounds in the sequence.
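One plausible reading of this scheme, sketched below in Python: each linguistic term contributes a set of candidate units, each join between adjacent units incurs sub-costs (pitch and energy mismatch are illustrative stand-ins), each sub-cost is weighted by its spread across all candidate joins, and a dynamic-programming pass picks the unit sequence minimizing the summed local costs. The specific features, the use of standard deviation as the variability measure, and the direction of the weighting are assumptions.

```python
import itertools
import statistics

def individual_costs(a: dict, b: dict) -> list[float]:
    # Illustrative sub-costs at the concatenation point: pitch mismatch
    # and energy mismatch between the joined units.
    return [abs(a["pitch"] - b["pitch"]), abs(a["energy"] - b["energy"])]

def best_sequence(unit_sets: list[list[dict]]) -> list[dict]:
    """Pick one speech unit per linguistic term, minimizing the summed
    local cost of adjacent joins (dynamic programming over unit sets)."""
    # Weight each individual cost by its variability over all joins
    # between adjacent sets (one plausible reading of the abstract).
    all_costs = [individual_costs(a, b)
                 for left, right in zip(unit_sets, unit_sets[1:])
                 for a, b in itertools.product(left, right)]
    weights = [statistics.pstdev(col) or 1.0 for col in zip(*all_costs)]

    def local_cost(a: dict, b: dict) -> float:
        return sum(w * c for w, c in zip(weights, individual_costs(a, b)))

    # Viterbi-style pass: best[i][j] is the cheapest path ending at unit j
    # of term i; back[i][j] remembers the predecessor unit in term i - 1.
    best = [[0.0] * len(unit_sets[0])]
    back = [[None] * len(unit_sets[0])]
    for left, right in zip(unit_sets, unit_sets[1:]):
        scores, preds = [], []
        for unit in right:
            costs = [best[-1][k] + local_cost(prev, unit)
                     for k, prev in enumerate(left)]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            scores.append(costs[k_min])
            preds.append(k_min)
        best.append(scores)
        back.append(preds)

    # Trace the cheapest final unit back to the start.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = []
    for i in range(len(unit_sets) - 1, -1, -1):
        path.append(unit_sets[i][j])
        if back[i][j] is not None:
            j = back[i][j]
    return path[::-1]

units = [[{"pitch": 120, "energy": 0.8}, {"pitch": 140, "energy": 0.6}],
         [{"pitch": 125, "energy": 0.7}, {"pitch": 180, "energy": 0.9}]]
print(best_sequence(units))  # -> the low-mismatch pair of units
```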
Abstract:
A computer-implemented method is described. The method includes a computing system receiving an item of digital content from a user device. The computing system generates one or more labels that indicate attributes of the item of digital content. The computing system also generates one or more conversational replies to the item of digital content based on the one or more labels that indicate attributes of the item of digital content. The method also includes the computing system selecting a conversational reply from among the one or more conversational replies and providing the conversational reply for output to the user device.
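A hypothetical sketch of the reply-selection step, in Python. The reply templates and the rule of picking the reply tied to the highest-confidence label are stand-ins; the abstract does not say how candidate replies are generated or ranked.

```python
# Hypothetical reply templates keyed by content label; a real system
# would generate replies rather than look them up.
REPLY_TEMPLATES = {
    "sunset": "What a gorgeous sunset!",
    "dog": "Cute dog! What's its name?",
    "food": "That looks delicious.",
}

def generate_replies(labels: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Map (label, confidence) pairs to candidate conversational replies."""
    return [(REPLY_TEMPLATES[label], conf)
            for label, conf in labels if label in REPLY_TEMPLATES]

def select_reply(labels: list[tuple[str, float]]) -> str | None:
    candidates = generate_replies(labels)
    if not candidates:
        return None  # Fall back to a generic acknowledgement upstream.
    # Assumed ranking rule: reply tied to the highest-confidence label.
    return max(candidates, key=lambda pair: pair[1])[0]

# e.g. labels produced for a photo shared by the user:
print(select_reply([("dog", 0.92), ("food", 0.31)]))
# -> "Cute dog! What's its name?"
```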
Abstract:
Techniques are described herein for organizing messages exchanged between users and automated assistants into distinct conversations. In various implementations, a chronological transcript of messages exchanged as part of human-to-computer dialog session(s) between a user and an automated assistant may be analyzed. Based on the analyzing, a subset of the chronological transcript of messages relating to a task performed by the user via the human-to-computer dialog session(s) may be identified. Based on content of the subset and the task, conversational metadata may be generated that causes a client computing device to provide a selectable element that conveys the task. Selection of the selectable element may cause the client computing device to present representations associated with at least one of the transcript messages related to the task.
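A minimal sketch of the grouping-and-metadata step, assuming each transcript message already carries a task identifier; a real system would infer the task by analyzing message content, as the abstract describes.

```python
from collections import defaultdict

def build_conversation_metadata(transcript: list[dict]) -> list[dict]:
    """transcript: chronological [{'task': ..., 'speaker': ..., 'text': ...}].
    Returns one metadata record per task, which a client could render as a
    selectable element that surfaces the associated messages."""
    by_task = defaultdict(list)
    for message in transcript:
        by_task[message["task"]].append(message)
    return [{"element_label": task,
             "message_count": len(messages),
             "messages": messages}
            for task, messages in by_task.items()]

transcript = [
    {"task": "book flight", "speaker": "user", "text": "Find flights to Oslo"},
    {"task": "book flight", "speaker": "assistant", "text": "Here are three options."},
    {"task": "weather", "speaker": "user", "text": "Weather there this weekend?"},
]
for element in build_conversation_metadata(transcript):
    print(element["element_label"], element["message_count"])
# -> book flight 2
# -> weather 1
```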
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating, in response to a single input operating system command that is invoked independent of a native application, a message that includes an image of a particular environment instance of the native application that was displayed when the single input operating system command was invoked, and a uniform resource identifier of the particular environment instance of the native application.
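A hypothetical sketch of the payload assembly, with the OS screenshot capture and the app's deep-link resolver stubbed out, since both are platform-specific and left unspecified by the abstract.

```python
from dataclasses import dataclass

@dataclass
class ShareMessage:
    image_png: bytes   # screenshot of the app environment instance
    uri: str           # deep link back to that environment instance

def capture_screenshot() -> bytes:
    return b"\x89PNG..."  # stand-in for the OS screenshot API

def current_environment_uri(app: str) -> str:
    return f"app://{app}/current-view"  # stand-in for the app's deep link

def handle_share_command(app: str) -> ShareMessage:
    """Invoked by a single OS-level command, independent of the app:
    bundle what was on screen with a link that reopens it."""
    return ShareMessage(image_png=capture_screenshot(),
                        uri=current_environment_uri(app))

msg = handle_share_command("example_app")
print(msg.uri, len(msg.image_png), "bytes of image data")
```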
Abstract:
The present disclosure relates to user-selected metadata related to images captured by a camera of a client device. User-selected metadata may include contextual information and/or information provided by a user when the images are captured. In various implementations, free-form input may be received at a first client device of one or more client devices operated by a user. A task request may be recognized from the free-form input, and it may be determined that the task request includes a request to store metadata related to one or more images captured by a camera of the first client device. The metadata may be selected based on content of the task request. The metadata may then be stored, e.g., in association with one or more images captured by the camera, in computer-readable media. The computer-readable media may be searchable by the metadata.
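A minimal sketch, assuming a toy task-request grammar ("remember that ...") and a substring-match search; the abstract leaves both the request recognition and the search mechanism open.

```python
METADATA_PREFIX = "remember that "

def parse_task_request(free_form_input: str) -> str | None:
    """Recognize a metadata-storage request from free-form input
    (toy grammar assumed for illustration)."""
    text = free_form_input.strip().lower()
    if text.startswith(METADATA_PREFIX):
        return text[len(METADATA_PREFIX):]
    return None  # Not a metadata-storage request.

class ImageMetadataStore:
    def __init__(self):
        self._records: list[tuple[str, str]] = []  # (image_id, metadata)

    def store(self, image_id: str, metadata: str) -> None:
        self._records.append((image_id, metadata))

    def search(self, query: str) -> list[str]:
        # The stored media is searchable by its user-selected metadata.
        q = query.lower()
        return [image_id for image_id, meta in self._records if q in meta]

store = ImageMetadataStore()
metadata = parse_task_request("Remember that this is where I parked")
if metadata is not None:
    store.store(image_id="IMG_2041.jpg", metadata=metadata)
print(store.search("parked"))  # -> ['IMG_2041.jpg']
```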
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for providing contextual information for presented media. In one aspect, a method includes storing in a buffer, on a first user device, media data as buffered media data, the buffered media data being a most recent portion of media data received at the first user device, the most recent portion inclusive of the media data received from a present time to a prior time that is fixed relative to the present time; responsive to a search operation invocation at the present time, sending the buffered media data to a search processing system that is remote from the first user device; and receiving, from the search processing system and in response to the buffered media data, contextual information regarding an entity that the search processing system identified from processing the buffered media data.
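A minimal sketch of the client-side buffering, using a fixed-length deque as the buffer. The frame rate, window length, and transport to the search processing system are assumptions.

```python
from collections import deque

FRAMES_PER_SECOND = 10
WINDOW_SECONDS = 15  # the "prior time fixed relative to the present time"

class RecentMediaBuffer:
    def __init__(self):
        # Fixed-size buffer: only the most recent window is retained.
        self._frames = deque(maxlen=FRAMES_PER_SECOND * WINDOW_SECONDS)

    def on_frame(self, frame: bytes) -> None:
        # Older frames fall off the left end automatically.
        self._frames.append(frame)

    def on_search_invoked(self, send_to_search_system) -> None:
        # Ship a snapshot of the buffered window to the remote system,
        # which returns contextual information about entities it finds.
        send_to_search_system(list(self._frames))

buffer = RecentMediaBuffer()
for i in range(500):
    buffer.on_frame(f"frame-{i}".encode())
buffer.on_search_invoked(lambda frames: print(len(frames), "frames sent"))
# -> 150 frames sent (the most recent 15 seconds at 10 fps)
```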
Abstract:
Methods, apparatus, and computer-readable media are described that relate to causing processing of sensor data to be performed in response to determining a request related to an environmental object that is likely captured by the sensor data. Some implementations further relate to determining whether the request is resolvable based on the processing of the sensor data. When it is determined that the request is not resolvable, a prompt is determined and provided as user interface output, where the prompt provides guidance on further input that will enable the request to be resolved. In those implementations, the further input (e.g., additional sensor data and/or user interface input) received in response to the prompt can then be utilized to resolve the request.
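A hypothetical sketch of the resolve-or-prompt loop. The recognizer and the guidance text are stand-ins; the abstract only requires that an unresolvable request yields a prompt guiding further input.

```python
def recognize(sensor_data: dict) -> dict:
    return sensor_data  # Stand-in for vision/sensor processing.

def handle_request(request_attribute: str, sensor_data: dict) -> dict:
    """Try to answer a request about an object captured by sensor data;
    if the needed attribute was not recognized, prompt for better input."""
    recognized = recognize(sensor_data)
    if request_attribute in recognized:
        return {"answer": recognized[request_attribute]}
    # Request not resolvable: provide guidance as user interface output.
    return {"prompt": f"I couldn't see the {request_attribute}. "
                      "Try moving the camera closer or improving the lighting."}

# First attempt fails and produces guidance; re-captured data resolves it.
print(handle_request("price tag", {"object": "wine bottle"}))
print(handle_request("price tag", {"object": "wine bottle",
                                   "price tag": "$12.99"}))
```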
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for providing contextual information to a user. In one aspect, a method includes receiving, from a user device, a query-independent request for contextual information relevant to an active resource displayed in an application environment on the user device, determining content described by the active resource in response to the query-independent request, and identifying, in response to the query-independent request, multiple resources that are relevant to the content described by the active resource. Additional actions include, for each resource of the multiple resources, determining a corresponding measure of user engagement that reflects engagement with the resource by one or more users, selecting one or more of the multiple resources based on the measures of user engagement for the multiple resources, and providing, to the user device, a user interface element that references the selected resources for display with the active resource.
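A minimal sketch of the engagement-based selection step. Click-through rate as the engagement measure is an assumption; the abstract only requires some measure reflecting user engagement with each resource.

```python
def engagement(resource: dict) -> float:
    # Assumed measure: click-through rate, guarded against division by zero.
    impressions = max(resource["impressions"], 1)
    return resource["clicks"] / impressions

def select_contextual_resources(relevant: list[dict], k: int = 3) -> list[dict]:
    """Rank the query-independently identified resources by engagement
    and keep the top k for the user interface element."""
    return sorted(relevant, key=engagement, reverse=True)[:k]

resources = [
    {"url": "https://example.com/a", "clicks": 40, "impressions": 1000},
    {"url": "https://example.com/b", "clicks": 90, "impressions": 1200},
    {"url": "https://example.com/c", "clicks": 5, "impressions": 800},
]
for r in select_contextual_resources(resources, k=2):
    print(r["url"], round(engagement(r), 3))
# -> https://example.com/b 0.075
# -> https://example.com/a 0.04
```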