Abstract:
A method to execute computer-actionable directives conveyed in human speech comprises: receiving audio data recording speech from one or more speakers; converting the audio data into a linguistic representation of the recorded speech; detecting a target corresponding to the linguistic representation; committing, to a data structure, language data associated with the detected target and based on the linguistic representation; parsing the data structure to identify one or more of the computer-actionable directives; and submitting the one or more computer-actionable directives to a computer for processing.
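A minimal Python sketch of the pipeline stages named above, with the speech-to-text step stubbed out; the target word, the naive directive parse, and the submit step are illustrative assumptions, not details taken from the abstract.

    from dataclasses import dataclass, field

    TARGET_WORD = "assistant"   # illustrative target; the abstract does not name one

    @dataclass
    class DirectiveStore:
        # Stand-in for the data structure that language data is committed to.
        entries: list = field(default_factory=list)

    def transcribe(audio_bytes):
        # Placeholder for the audio-to-linguistic-representation step (assumption).
        return "assistant turn on the hallway lights"

    def detect_target(text):
        # Treat the presence of the target word as a detected target (assumption).
        return TARGET_WORD in text.split()

    def commit(store, text):
        store.entries.append(text)

    def parse_directives(store):
        # Naive parse: everything after the target word is the candidate directive.
        directives = []
        for entry in store.entries:
            words = entry.split()
            directives.append(" ".join(words[words.index(TARGET_WORD) + 1:]))
        return directives

    def submit(directives):
        for d in directives:
            print("submitting to the computer:", d)

    store = DirectiveStore()
    text = transcribe(b"...audio data...")
    if detect_target(text):
        commit(store, text)
    submit(parse_directives(store))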
Abstract:
Methods and apparatus for capturing motion from a self-tracking device are disclosed. In embodiments, a device self-tracks motion of the device relative to a first reference frame while recording motion of a subject relative to a second reference frame, the second reference frame being a reference frame relative to the device. The subject may be a real object or, alternatively, a virtual object, and the motion of the virtual object may be recorded relative to the second reference frame by associating a position offset relative to the device with the position of the virtual object in the recorded motion. The motion of the subject relative to the first reference frame may be determined from the tracked motion of the device relative to the first reference frame and the recorded motion of the subject relative to the second reference frame.
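A small numeric sketch of the final step, assuming planar rigid poses and numpy; the specific pose and offset values are made up for illustration. The subject's position in the first (world) reference frame is obtained by composing the device's self-tracked pose with the subject's recorded, device-relative position.

    import numpy as np

    def pose_matrix(theta, tx, ty):
        # 2D rigid transform (rotation by theta, then translation) in homogeneous form.
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s, tx],
                         [s,  c, ty],
                         [0,  0,  1]])

    # Device pose in the first (world) reference frame, from self-tracking.
    T_world_device = pose_matrix(np.pi / 2, 1.0, 2.0)

    # Subject position recorded in the second (device-relative) reference frame,
    # e.g. a virtual subject placed at a fixed offset in front of the device.
    p_device = np.array([0.5, 0.0, 1.0])  # homogeneous point

    # Subject position in the world frame = device pose composed with the offset.
    p_world = T_world_device @ p_device
    print(p_world[:2])  # -> [1.0, 2.5]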
Abstract:
A computational photography system is described herein including a guidance system and a detail-enhancement system. The guidance system uses a first neural network that maps an original image provided by an image sensor to a guidance image, which represents a color-corrected and lighting-corrected version of the original image. A combination unit combines the original image and the guidance image to produce a combined image. The detail-enhancement system then uses a second neural network to map the combined image to a predicted image. The predicted image supplements the guidance provided by the first neural network by sharpening details in the original image. A training system is also described herein for training the first and second neural networks. The training system alternates the data it feeds to the second neural network, first using a guidance image as input and then using a corresponding ground-truth image.
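A rough PyTorch sketch of the two-stage dataflow and of the training alternation, assuming small convolutional stand-ins for both networks and channel-wise concatenation as the combination unit; the layer sizes, loss, and alternation schedule are illustrative assumptions rather than the described system.

    import torch
    import torch.nn as nn

    class GuidanceNet(nn.Module):
        # Stand-in for the first network: original image -> corrected guidance image.
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(16, 3, 3, padding=1))
        def forward(self, x):
            return self.net(x)

    class DetailNet(nn.Module):
        # Stand-in for the second network: combined image -> detail-sharpened prediction.
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Conv2d(6, 16, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(16, 3, 3, padding=1))
        def forward(self, x):
            return self.net(x)

    guidance_net, detail_net = GuidanceNet(), DetailNet()
    original = torch.rand(1, 3, 64, 64)
    ground_truth = torch.rand(1, 3, 64, 64)

    guidance = guidance_net(original)
    combined = torch.cat([original, guidance], dim=1)  # combination unit (assumption)
    predicted = detail_net(combined)
    print(predicted.shape)

    # Training alternation for the second network: guidance image and ground-truth
    # image take turns as the second half of the combined input.
    opt = torch.optim.Adam(detail_net.parameters(), lr=1e-3)
    for step in range(4):
        aux = guidance.detach() if step % 2 == 0 else ground_truth
        pred = detail_net(torch.cat([original, aux], dim=1))
        loss = nn.functional.l1_loss(pred, ground_truth)
        opt.zero_grad()
        loss.backward()
        opt.step()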
Abstract:
An intelligent assistant device is configured to communicate non-verbal cues. Image data indicating presence of a human is received from one or more cameras of the device. In response, one or more components of the device are actuated to non-verbally communicate the presence of the human. Data indicating context information of the human is received from one or more sensors of the device. Using at least this data, one or more contexts of the human are determined, and one or more components of the device are actuated to non-verbally communicate the one or more contexts of the human.
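A small event-loop sketch in Python of the behavior described above; the sensor readings, context labels, and actuation calls are all placeholders invented for illustration.

    def detect_presence(image_data):
        # Placeholder for camera-based human detection (assumption).
        return image_data is not None

    def infer_contexts(sensor_data):
        # Placeholder for deriving contexts, e.g. proximity, from sensor data (assumption).
        return ["person_is_close"] if sensor_data.get("distance_m", 10) < 1.5 else []

    def actuate(component, cue):
        # Placeholder for driving a light ring, display, or motor (assumption).
        print(f"{component}: {cue}")

    image_data = object()               # frame from one or more cameras
    sensor_data = {"distance_m": 1.0}   # context data from other sensors

    if detect_presence(image_data):
        actuate("light_ring", "pulse to acknowledge human presence")
        for context in infer_contexts(sensor_data):
            actuate("display", f"orient toward human ({context})")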
Abstract:
An optical system comprises a multi-spectral optical element, a switchable filter, a dual bandpass filter, and a sensor. The multi-spectral optical element receives light in at least a first spectral band and a second spectral band. The dual bandpass filter filters out wavelengths of light in a transition region of the switchable filter between the first spectral band and the second spectral band. The switchable filter filters light received from the dual bandpass filter, transmitting light in the first spectral band in a first mode and not transmitting light in the first spectral band in a second mode. The sensor is disposed at an image plane, and the multi-spectral optical element is configured to produce a modulation transfer function value that is above a predetermined threshold for each of the spectral bands.
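A toy Python model of the two filter modes and the per-band threshold check; the band names, transmission rule, modulation transfer function values, and threshold are illustrative assumptions.

    BANDS = ("visible", "infrared")

    def switchable_filter_transmits(band, mode):
        # First mode transmits the first spectral band; second mode blocks it.
        if band == "visible":
            return mode == 1
        return True  # second spectral band passes in both modes (assumption)

    mtf = {"visible": 0.42, "infrared": 0.37}   # made-up MTF values per band
    MTF_THRESHOLD = 0.3

    assert all(mtf[band] > MTF_THRESHOLD for band in BANDS)
    for mode in (1, 2):
        passed = [band for band in BANDS if switchable_filter_transmits(band, mode)]
        print(f"mode {mode}: sensor receives {passed}")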
Abstract:
Methods and systems for automatically generating training data for use in machine learning are disclosed. The methods can involve the use of environmental data derived from first and second environmental sensors for a single event. The environmental data types derived from each environmental sensor are different. The event is detected based on first environmental data derived from the first environmental sensor, and a portion of second environmental data derived from the second environmental sensor is selected to generate training data for the detected event. The resulting training data can be employed to train machine learning models.
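A compact sketch of the selection step, assuming time-stamped samples and a simple threshold detector on the first sensor's stream; the sensor streams, threshold, window size, and label are invented for illustration.

    # An event detected in the first sensor's stream is used to label a window of the
    # second sensor's stream as training data. All values are illustrative.
    audio_stream = [(t, 0.9 if t == 5 else 0.1) for t in range(10)]   # (time, level) from sensor 1
    image_stream = [(t, f"frame_{t}") for t in range(10)]             # (time, frame) from sensor 2

    EVENT_THRESHOLD = 0.5
    WINDOW = 1  # seconds of second-sensor data kept around each detected event

    training_examples = []
    for t_event, level in audio_stream:
        if level > EVENT_THRESHOLD:                    # event detected from first sensor
            window = [frame for t, frame in image_stream if abs(t - t_event) <= WINDOW]
            training_examples.append({"label": "door_knock", "frames": window})

    print(training_examples)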
Abstract:
An entity-tracking computing system receives sensor information from a plurality of different sensors. The positions of entities detected by the various sensors are resolved to an environment-relative coordinate system so that entities identified by one sensor can be tracked across the fields of detection of other sensors.
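A brief sketch of resolving sensor-local detections into a shared environment-relative coordinate system, assuming each sensor's pose in the environment is known; the poses and detections are made-up numbers.

    import math

    # Pose of each sensor in the environment frame: (x, y, heading in radians). Made up.
    SENSOR_POSES = {"camera_a": (0.0, 0.0, 0.0),
                    "camera_b": (5.0, 0.0, math.pi / 2)}

    def to_environment(sensor, local_xy):
        sx, sy, heading = SENSOR_POSES[sensor]
        lx, ly = local_xy
        ex = sx + lx * math.cos(heading) - ly * math.sin(heading)
        ey = sy + lx * math.sin(heading) + ly * math.cos(heading)
        return (round(ex, 3), round(ey, 3))

    # The same entity seen by two sensors resolves to the same environment position,
    # so it can be tracked as it moves between their fields of detection.
    print(to_environment("camera_a", (3.0, 1.0)))   # -> (3.0, 1.0)
    print(to_environment("camera_b", (1.0, 2.0)))   # -> (3.0, 1.0)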
Abstract:
Embodiments that relate to determining gaze locations are disclosed. In one embodiment, a method includes shining light along an outbound light path to the eyes of a user wearing glasses. Upon detecting the glasses, the light is dynamically polarized in a polarization pattern that switches between a random polarization phase and a single polarization phase, wherein the random polarization phase includes a first polarization along the outbound light path and a second polarization, orthogonal to the first polarization, along a reflected light path, and the single polarization phase has a single polarization. During the random polarization phase, glare reflected from the glasses is filtered out and pupil images are captured. Glint images are captured during the single polarization phase. Based on pupil characteristics and glint characteristics, gaze locations are repeatedly detected.
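A schematic Python sketch of the alternating capture loop, with the image processing reduced to placeholders; the phase schedule and the pupil/glint-to-gaze mapping are illustrative assumptions, not the described method.

    def capture_pupil_image():
        # During the random polarization phase, glare from the glasses is filtered out,
        # leaving a usable pupil image. Placeholder value.
        return {"pupil_center": (312, 240)}

    def capture_glint_image():
        # During the single polarization phase, glints from the light source are visible.
        # Placeholder value.
        return {"glints": [(318, 236), (306, 236)]}

    def estimate_gaze(pupil, glints):
        # Toy pupil-center / glint-offset estimate standing in for the gaze calculation.
        gx = sum(g[0] for g in glints) / len(glints)
        gy = sum(g[1] for g in glints) / len(glints)
        return (pupil[0] - gx, pupil[1] - gy)

    for frame in range(4):
        if frame % 2 == 0:                  # random polarization phase
            pupil = capture_pupil_image()["pupil_center"]
        else:                               # single polarization phase
            glints = capture_glint_image()["glints"]
            print("gaze offset:", estimate_gaze(pupil, glints))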
Abstract:
Global and local light detection techniques in optical sensor systems are described. In one or more implementations, a global lighting value is generated that describes a global lighting level for a plurality of optical sensors based on a plurality of inputs received from the plurality of optical sensors. An illumination map is generated that describes local lighting conditions of respective ones of the plurality of optical sensors based on the plurality of inputs received from the plurality of optical sensors. Object detection is performed using an image captured by the plurality of optical sensors, along with the global lighting value and the illumination map.
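A short sketch of deriving a global lighting value and an illumination map from per-sensor inputs; the sensor layout, the readings, and the threshold-based detection rule are illustrative assumptions.

    # Raw light level reported by each optical sensor at its grid position (made up).
    sensor_inputs = {
        (0, 0): 0.20, (0, 1): 0.22,
        (1, 0): 0.80, (1, 1): 0.21,
    }

    global_lighting = sum(sensor_inputs.values()) / len(sensor_inputs)   # global lighting value
    illumination_map = dict(sensor_inputs)                               # per-sensor local conditions

    def detect_objects(image, global_level, local_map):
        # Toy rule: a sensor reading well above the global level is treated as an object
        # at that position (the captured image itself is elided in this sketch).
        return [pos for pos, level in local_map.items()
                if level > global_level + 0.3]

    print(detect_objects(sensor_inputs, global_lighting, illumination_map))  # -> [(1, 0)]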
Abstract:
A first intelligent assistant computing device configured to receive and respond to natural language inputs provided by human users syncs to a reference clock of a wireless computer network. The first intelligent assistant computing device receives a communication sent by a second intelligent assistant computing device indicating a signal emission time at which the second intelligent assistant computing device emitted a position calibration signal. The first intelligent assistant computing device records a signal detection time at which the position calibration signal was detected. Based on 1) the difference between the signal emission time and the signal detection time, and 2) a known propagation speed of the position calibration signal, a distance between the first and second intelligent assistant computing devices is calculated.
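A worked example of the distance calculation in Python, assuming an acoustic position calibration signal and made-up timestamps on the shared reference clock.

    # Distance from time-of-flight of the position calibration signal; numbers are illustrative.
    SPEED_OF_SOUND = 343.0             # m/s, assuming an audible/ultrasonic calibration signal

    signal_emission_time = 10.000000   # seconds, reported by the second device (shared clock)
    signal_detection_time = 10.011662  # seconds, recorded by the first device

    time_of_flight = signal_detection_time - signal_emission_time
    distance_m = time_of_flight * SPEED_OF_SOUND
    print(f"{distance_m:.2f} m")       # -> 4.00 m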