Abstract:
Embodiments of the present invention recite a method and system for improving the fidelity of a dialog system. In one embodiment, a first input generated by a first user of a first system operating in a first modality is accessed. In embodiments of the present invention, the first system also generates a first output corresponding to the first input. A second input from a second user, who is engaged in a conversation with the first user, is accessed by a second system. The second input is then utilized to modify the first output of the first system.
Abstract:
A set of reference videos is indexed to a reference index in order to facilitate matching of video content. An indexing module receives a set of reference fingerprints representing a set of reference videos and identifies keys contained in the reference fingerprints. Reference identifiers for the reference videos are stored in bins of the reference index associated with the identified keys. The bins in the reference index are sub-sampled to limit the number of reference identifiers stored in a given bin.
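A minimal sketch of the indexing step described above, under stated assumptions: fingerprints are modeled as lists of hashable keys, `build_reference_index` and `max_bin_size` are hypothetical names, and uniform random sampling stands in for whatever sub-sampling policy the actual system uses, which the abstract does not specify.

```python
import random
from collections import defaultdict


def build_reference_index(reference_fingerprints, max_bin_size, seed=0):
    """Map each key to the reference identifiers whose fingerprint contains it.

    reference_fingerprints: dict of reference id -> iterable of keys (assumed layout).
    max_bin_size: cap on identifiers kept per bin (hypothetical parameter).
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for ref_id, keys in reference_fingerprints.items():
        for key in set(keys):  # store each reference at most once per bin
            bins[key].append(ref_id)

    index = {}
    for key, ref_ids in bins.items():
        if len(ref_ids) > max_bin_size:
            # Assumption: uniform random sub-sampling of an over-full bin.
            ref_ids = rng.sample(ref_ids, max_bin_size)
        index[key] = sorted(ref_ids)
    return index


fingerprints = {
    "video_a": ["k1", "k2"],
    "video_b": ["k1", "k3"],
    "video_c": ["k1"],
}
index = build_reference_index(fingerprints, max_bin_size=2)
```

Here the bin for `k1` would hold three identifiers and is cut down to two, while the bins for `k2` and `k3` are left unchanged.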
Abstract:
Digital mapping techniques are disclosed that provide visually-oriented information to the user, such as driving directions that include visual data points along the way of the driving route, thereby improving the user experience. The user may preview the route associated with the driving directions, where the preview is based on, for example, at least one of satellite images, storefront images, and heuristics and/or business listings. The visually-oriented information can be presented to the user in a textual, graphical, or verbal format, or some combination thereof.
Abstract:
The estimation of an HRTF for a given individual is accomplished by means of a coupled model, which identifies the dependencies between one or more images of readily observable characteristics of an individual and the HRTF that is applicable to that individual. Since the HRTF is highly influenced by the shape of the listener's outer ear, as well as the shape of the listener's head, images of a listener that provide this type of information are preferably applied as an input to the coupled model. In addition, dimensional measurements of the listener can be applied to the model. In return, the model provides an estimate of the HRTF for the observed characteristics of the listener.
Abstract:
A method is described that includes training a plurality of learning systems, each learning system implementing a learning function, having an input, and producing an output; initializing one or more data structures; and evaluating a target sample. Also described are methods that include initializing one or more data structures and evaluating a target sample for a best match.
Abstract:
Embodiments of the present invention recite a method and system for modifying a media stream. In one embodiment, a request is received to modify a media stream from a current display rate to a desired display rate. In response to the request, the media stream is dynamically processed to create a modified media stream which is compliant with a pre-determined frame-rate limitation and with a pre-determined bit-rate limitation.
Abstract:
A system and method detect matches between portions of video content. A matching module receives an input video fingerprint representing an input video and a set of reference fingerprints representing reference videos in a reference database. The matching module compares the reference fingerprints and the input fingerprint to generate a list of candidate segments from the reference video set. Each candidate segment comprises a time-localized portion of a reference video that potentially matches the input video. A classifier is applied to each of the candidate segments to classify the segment as a matching segment or a non-matching segment. A result is then output identifying a matching portion of a reference video from the reference video set based on the segments classified as matches.
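The two stages described above (candidate generation, then classification) can be sketched as follows. This is only an illustration under assumptions: a fingerprint is modeled as a list with one key per time step, a candidate segment is a `(reference id, time offset)` alignment, and the classifier is a simple threshold on the fraction of agreeing time steps. The names `find_candidates`, `classify`, `match`, and `min_fraction` are hypothetical and not taken from the source.

```python
from collections import Counter


def find_candidates(input_fp, reference_fps):
    """Return (ref_id, offset, hits) for every alignment sharing at least one key."""
    votes = Counter()
    for ref_id, ref_fp in reference_fps.items():
        positions = {}
        for t_ref, key in enumerate(ref_fp):
            positions.setdefault(key, []).append(t_ref)
        for t_in, key in enumerate(input_fp):
            for t_ref in positions.get(key, []):
                # The offset localizes the candidate in time within the reference.
                votes[(ref_id, t_ref - t_in)] += 1
    return [(ref_id, offset, hits) for (ref_id, offset), hits in votes.items()]


def classify(candidate, input_length, min_fraction=0.6):
    """Assumed stand-in classifier: accept if enough time steps agree."""
    _, _, hits = candidate
    return hits / input_length >= min_fraction


def match(input_fp, reference_fps):
    """Keep only the candidate segments classified as matching."""
    candidates = find_candidates(input_fp, reference_fps)
    return sorted((r, o) for (r, o, h) in candidates
                  if classify((r, o, h), len(input_fp)))


reference_fps = {
    "ref1": ["a", "b", "c", "d", "e"],
    "ref2": ["x", "b", "y", "z", "w"],
}
matches = match(["b", "c", "d"], reference_fps)
```

Both references yield a candidate at offset 1, but only `ref1` agrees on all three time steps, so it is the only segment that passes the threshold.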
Abstract:
Systems, methods, devices, and computer program products provide social and interactive applications for mass media based on real time ambient-audio and/or video identification. In some implementations, a method includes: receiving descriptors identifying ambient audio associated with a media broadcast; comparing the descriptors to one or more reference descriptors; and determining a rating for the media broadcast based at least in part on the results of the comparison.
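As a rough illustration of the compare-then-rate method above, the sketch below assumes that descriptors are small integer bit patterns, that comparison means Hamming distance, and that a broadcast's rating is the number of received descriptors that match it. None of these choices is stated in the source; `rate_broadcasts` and `max_distance` are hypothetical names.

```python
def hamming(a, b):
    """Number of differing bits between two integer descriptors."""
    return bin(a ^ b).count("1")


def rate_broadcasts(client_descriptors, reference_descriptors, max_distance=2):
    """Count, per broadcast, the client descriptors that match its reference."""
    ratings = {name: 0 for name in reference_descriptors}
    for descriptor in client_descriptors:
        # Pick the closest reference descriptor for this client.
        best = min(reference_descriptors,
                   key=lambda name: hamming(descriptor, reference_descriptors[name]))
        if hamming(descriptor, reference_descriptors[best]) <= max_distance:
            ratings[best] += 1  # assumed rating: audience count
    return ratings


references = {"channel_1": 0b10110010, "channel_2": 0b01001101}
clients = [0b10110011, 0b10110010, 0b01001111, 0b11111111]
ratings = rate_broadcasts(clients, references)
```

The last client descriptor is four bits away from both references, so it is not counted toward either broadcast.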
Abstract:
Systems, methods, devices, and computer program products provide social and interactive applications for detecting repeating content in broadcast media. In some implementations, a method includes: generating a database of audio statistics from content; generating a query from the database of audio statistics; running the query against the database of audio statistics to determine a non-identity match; and, if a non-identity match exists, identifying the content corresponding to the matched query as repeating content.
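The idea of querying a database against itself and ignoring the trivial self-match can be sketched as below. This assumes each database entry is a `(content id, statistics)` pair with hashable statistics and that a match means exact equality; a real system would presumably use an approximate comparison. `find_repeating_content` is a hypothetical name.

```python
def find_repeating_content(database):
    """Return content ids whose statistics also occur at another database position."""
    by_stats = {}
    for position, (_, stats) in enumerate(database):
        by_stats.setdefault(stats, []).append(position)

    repeating = set()
    for position, (content_id, stats) in enumerate(database):
        # Each entry acts as a query; drop the identity match (the entry itself).
        non_identity = [p for p in by_stats[stats] if p != position]
        if non_identity:
            repeating.add(content_id)
    return repeating


database = [
    ("ad_1", (3, 7, 1)),
    ("show_1", (9, 9, 2)),
    ("ad_1_rerun", (3, 7, 1)),
]
repeating = find_repeating_content(database)
```

Only the two entries that share the statistics `(3, 7, 1)` are reported; `show_1` matches nothing but itself.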