Abstract:
Video description generation using neural network training based on relevance and coherence is described. In some examples, long short-term memory with visual-semantic embedding (LSTM-E) can maximize the probability of generating the next word given previous words and visual content, and can create a visual-semantic embedding space for enforcing the relationship between the semantics of an entire sentence and the visual content. LSTM-E can include 2-D and/or 3-D deep convolutional neural networks for learning a powerful video representation, a deep recurrent neural network for generating sentences, and a joint embedding model for exploring the relationships between visual content and sentence semantics.
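As a rough illustration, the sketch below combines the two LSTM-E objectives: a coherence term (next-word likelihood given previous words and the video) and a relevance term (distance between video and sentence in the shared embedding space). The function names and the weight `lam` are illustrative assumptions, not terms from the abstract.

```python
import torch
import torch.nn.functional as F

def relevance_loss(video_emb, sent_emb):
    # Distance between the video and the sentence in the shared
    # visual-semantic embedding space (relevance term).
    return ((video_emb - sent_emb) ** 2).sum(dim=1).mean()

def coherence_loss(word_logits, target_words):
    # Negative log-likelihood of each next word given the previous
    # words and the visual content (coherence term).
    return F.cross_entropy(
        word_logits.reshape(-1, word_logits.size(-1)),
        target_words.reshape(-1),
    )

def lstm_e_loss(video_emb, sent_emb, word_logits, target_words, lam=0.5):
    # Joint objective: a weighted sum of relevance and coherence.
    # `lam` is an assumed trade-off weight for illustration.
    return lam * relevance_loss(video_emb, sent_emb) + \
           (1 - lam) * coherence_loss(word_logits, target_words)
```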
Abstract:
Techniques and constructs to facilitate suggestion of image-based search queries can provide personalized trending image search queries. The constructs may enable identification of trending image searches and further personalize those trending image search queries for an identified user based on information about the user's search history and the search histories of other users. The constructs also may select a representative image for display to the user, such that selection of the representative image will execute the search query. The representative image may be selected from a plurality of candidate images based on its burstiness.
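One way such a burstiness criterion might look in code: score each candidate image by how sharply its frequency in a recent window exceeds its long-term baseline frequency, then pick the highest-scoring candidate as the representative image. The ratio-based score and smoothing constant are assumptions for illustration, not the construct's actual formula.

```python
def burstiness(recent_counts, baseline_counts, smoothing=1.0):
    """Score each candidate image by how much its recent frequency
    exceeds its long-term baseline frequency (an illustrative choice).

    `recent_counts` / `baseline_counts` map image IDs to counts.
    """
    recent_total = sum(recent_counts.values()) or 1
    base_total = sum(baseline_counts.values()) or 1
    scores = {}
    for img, r in recent_counts.items():
        p_recent = r / recent_total
        p_base = (baseline_counts.get(img, 0) + smoothing) / (base_total + smoothing)
        scores[img] = p_recent / p_base
    return scores

def pick_representative(candidates, recent_counts, baseline_counts):
    # The most "bursty" candidate is displayed; selecting it would
    # execute the underlying trending image search query.
    scores = burstiness(recent_counts, baseline_counts)
    return max(candidates, key=lambda img: scores.get(img, 0.0))
```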
Abstract:
A layout generator generates a layout for communications media content based on an identified domain (e.g., topic). A communications media content analyzer identifies a domain associated with communications media content. A domain-based layout guide selector receives the identified domain from the communications media content analyzer and selects a domain-based layout guide based on the identified domain. The domain-based layout guide is selected from a set of domain-based layout guides stored in memory accessible by one or more processors. The set of domain-based layout guides is associated with multiple domains. A communications media content layout generator receives the selected domain-based layout guide from the domain-based layout guide selector and generates a communications media content layout incorporating at least a subset of the communications media content. The communications media content layout complies with the selected domain-based layout guide.
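A minimal sketch of the pipeline the abstract describes: a stand-in domain analyzer, a guide selector over an in-memory set of domain-based layout guides, and a generator that fills the slots the selected guide defines. The domains, slot names, and dictionary store are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class LayoutGuide:
    domain: str
    slots: list  # e.g. ["headline", "hero_image", "body"]

# Hypothetical in-memory set of domain-based layout guides.
GUIDES = {
    "travel": LayoutGuide("travel", ["hero_image", "headline", "itinerary"]),
    "recipe": LayoutGuide("recipe", ["headline", "ingredients", "steps", "photo"]),
}

def identify_domain(content: dict) -> str:
    # Stand-in for the communications media content analyzer, which
    # would classify the content's topic.
    return content.get("topic", "travel")

def generate_layout(content: dict) -> dict:
    # Guide selector: pick the guide matching the identified domain.
    guide = GUIDES.get(identify_domain(content), GUIDES["travel"])
    # Fill each slot the guide defines with matching content, so the
    # resulting layout complies with the selected guide.
    return {slot: content.get(slot) for slot in guide.slots}
```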
Abstract:
This disclosure describes techniques and architectures to morph well-trained networks to other related applications or modified networks with relatively little retraining. For example, a well-trained neural network (e.g., a parent network) may be morphed into a new neural network (e.g., a child network) so that the function of the parent network is preserved. After morphing a parent network, the child network may inherit the knowledge of its parent network and may have the potential to continue growing into a more powerful network. Such morphing and growing may occur with a relatively short training time.
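The sketch below shows one well-known function-preserving morphing operation in the Net2Net style: widening a hidden layer by replicating units and splitting their outgoing weights, so the child network computes exactly what the parent did before any further training. This is one illustrative operation, not the patent's full morphing scheme.

```python
import numpy as np

def widen_layer(W1, W2, new_width):
    """Function-preserving widening of one hidden layer.

    W1: (in_dim, width) weights into the layer.
    W2: (width, out_dim) weights out of the layer.
    Returns widened (W1', W2') computing the same function, since each
    replicated unit shares its input weights and the copies' outgoing
    weights sum to the original's.
    """
    in_dim, width = W1.shape
    assert new_width >= width
    # Keep every original unit, then replicate randomly chosen units.
    mapping = np.concatenate([np.arange(width),
                              np.random.randint(0, width, new_width - width)])
    counts = np.bincount(mapping, minlength=width)
    W1_new = W1[:, mapping]                           # copy incoming weights
    W2_new = W2[mapping, :] / counts[mapping, None]   # split outgoing weights
    return W1_new, W2_new

# Quick check that the child network preserves the parent's function.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(3, 2))
W1n, W2n = widen_layer(W1, W2, 5)
x = rng.normal(size=(1, 4))
relu = lambda z: np.maximum(z, 0)
assert np.allclose(relu(x @ W1) @ W2, relu(x @ W1n) @ W2n)
```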
Abstract:
Techniques and constructs to facilitate automatic tagging can provide improvements in image storage and searching. The constructs may enable training a deep network using tagged source images and target images. The constructs may also train a top layer of the deep network using a personal photo ontology. The constructs also may select one or more concepts from the ontology for tagging personal digital images.
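A hedged sketch of the top-layer idea: freeze a deep network pretrained on tagged source and target images, then train only a new top layer whose outputs are concepts drawn from a personal-photo ontology. The concept list and function names are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical personal-photo ontology concepts used as tags; the
# actual ontology in the constructs would be richer.
ONTOLOGY_CONCEPTS = ["family", "pet", "beach", "birthday", "food"]

def build_tagger(pretrained_backbone: nn.Module, feat_dim: int) -> nn.Module:
    # Freeze the deep network trained on tagged source/target images...
    for p in pretrained_backbone.parameters():
        p.requires_grad = False
    # ...and train only a new top layer over the ontology's concepts.
    top = nn.Linear(feat_dim, len(ONTOLOGY_CONCEPTS))
    return nn.Sequential(pretrained_backbone, top)

def tag_photo(model: nn.Module, photo_feats: torch.Tensor, k: int = 2):
    # Multi-label scores; select the k strongest ontology concepts
    # for tagging the personal digital image.
    scores = torch.sigmoid(model(photo_feats))
    topk = scores.topk(k, dim=-1).indices[0]
    return [ONTOLOGY_CONCEPTS[int(i)] for i in topk]
```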
Abstract:
Video highlight detection using pairwise deep ranking neural network training is described. In some examples, highlights in a video are discovered and then used to generate summaries of videos, such as first-person videos. A pairwise deep ranking model is employed to learn the relationship between previously identified highlight and non-highlight video segments. This relationship is encapsulated in a neural network. An example two-stream process generates highlight scores for each segment of a user's video. The obtained highlight scores are used to summarize highlights of the user's video.
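The pairwise objective can be written as a standard margin ranking loss, sketched below: a highlight segment should outscore its paired non-highlight segment by at least a margin, and the two stream scores are fused into one highlight score per segment. The margin and fusion weight are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PairwiseRankingLoss(nn.Module):
    """Hinge-style ranking loss over (highlight, non-highlight) pairs:
    a standard formulation, with an illustrative margin value."""
    def __init__(self, margin: float = 1.0):
        super().__init__()
        self.margin = margin

    def forward(self, highlight_scores, non_highlight_scores):
        # Penalize pairs where the highlight segment fails to outscore
        # the non-highlight segment by at least `margin`.
        return torch.clamp(
            self.margin - highlight_scores + non_highlight_scores, min=0
        ).mean()

def fuse_two_streams(spatial_score, temporal_score, w: float = 0.5):
    # Late fusion of the two-stream (appearance + motion) scores into
    # one highlight score per segment; `w` is an assumed weight.
    return w * spatial_score + (1 - w) * temporal_score
```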
Abstract:
Technologies pertaining to calibration of filters of an audio system are described herein. A mobile computing device is configured to compute values for respective filters, such as equalizer filters, and transmit the values to a receiver device in the audio system. The receiver device causes audio to be emitted from a speaker based upon the values for the filters.
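As a simple illustration of computing filter values, the sketch below inverts a measured per-band response toward a flat target and clips the resulting equalizer gains. The abstract does not specify the actual calibration procedure, so this is only an assumed scheme.

```python
import numpy as np

def equalizer_gains_db(measured_db, target_db=0.0, max_boost_db=6.0):
    """Per-band equalizer gains that flatten a measured response toward
    a target level (a simple inversion sketch; the device's actual
    calibration procedure is not given by the abstract).

    `measured_db`: per-band levels measured via the mobile device.
    """
    gains = target_db - np.asarray(measured_db, dtype=float)
    return np.clip(gains, -max_boost_db, max_boost_db)

# The mobile computing device would then transmit these filter values
# to the receiver device, which applies them before driving the speaker.
measured = [3.0, -2.5, 0.5, 7.0]       # dB per band (example data)
print(equalizer_gains_db(measured))    # -> [-3.   2.5 -0.5 -6. ]
```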
Abstract:
A method of acquiring a set of images useable to 3D model a physical object includes imaging the physical object with a camera, and displaying with the camera a current view of the physical object as imaged by the camera from a current perspective. The method further includes displaying with the camera a visual cue overlaying the current view and indicating perspectives from which the physical object is to be imaged to acquire the set of images.
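One plausible way to generate the cued perspectives: lay out candidate viewpoints in a ring around the object and highlight only those not yet covered by a captured view. The ring layout and angular tolerance are assumptions for illustration, not the method's specified cue.

```python
def ring_of_perspectives(n: int = 12, elevation_deg: float = 20.0):
    """Evenly spaced azimuths at a fixed elevation around the object:
    one simple choice of perspectives the visual cue could indicate."""
    return [(i * 360.0 / n, elevation_deg) for i in range(n)]

def remaining_cues(captured, candidates, tol_deg: float = 15.0):
    # Keep only perspectives not yet covered by a captured view, so the
    # overlay can indicate where the camera still needs to go.
    def covered(p):
        az, el = p
        return any(
            abs((az - caz + 180) % 360 - 180) < tol_deg
            and abs(el - cel) < tol_deg
            for caz, cel in captured
        )
    return [p for p in candidates if not covered(p)]

# Example: one view captured head-on; the overlay cues the remaining 11.
cues = remaining_cues(captured=[(0.0, 20.0)], candidates=ring_of_perspectives())
```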
Abstract:
A privacy preserving sensor apparatus is described herein. The privacy preserving sensor apparatus includes a microphone that is configured to output a signal that is indicative of audio in an environment. The privacy preserving sensor apparatus further includes feature extraction circuitry integrated in the apparatus with the microphone, the feature extraction circuitry configured to extract features from the signal output by the microphone that are usable to detect occurrence of an event in the environment, wherein the signal output by the microphone cannot be reconstructed from the features alone.
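A sketch of the kind of lossy reduction the abstract implies: collapse the microphone signal into a handful of time-averaged band energies, which can still drive event detection but discard the phase and temporal detail needed to reconstruct the audio. The specific feature choice is an assumption, not the apparatus's actual circuitry.

```python
import numpy as np

def privacy_features(audio, frame=1024, n_bands=8):
    """Reduce a microphone signal to coarse per-band energies averaged
    over the whole clip: enough to flag events such as glass breaking,
    but far too lossy to reconstruct speech from (an illustrative
    reduction; the apparatus's actual features are not specified)."""
    # Frame the signal and take magnitude spectra, discarding phase.
    spec = np.abs(np.fft.rfft(
        audio[: len(audio) // frame * frame].reshape(-1, frame), axis=1))
    # Collapse fine frequency bins into a few broad bands, then average
    # over time, discarding all temporal detail.
    bands = np.array_split(spec, n_bands, axis=1)
    return np.array([b.mean() for b in bands])

# Example: one second of synthetic audio reduced to 8 scalar features.
feats = privacy_features(np.random.default_rng(0).normal(size=16000))
```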