Abstract:
An image sensor of a camera system captures an image over an image capture interval of time, and waits a blanking interval of time before capturing an additional image. The captured image is provided to a frame controller, and is buffered until an image signal processor accesses the captured image. The image signal processor processes the accessed image over an image processing interval of time, producing a processed image. The image processing interval of time is selected to be greater than the image capture interval of time, but less than the sum of the image capture interval of time and the blanking interval of time. By reducing the image capture interval of time while maintaining the image processing interval of time, rolling shutter artifacts are beneficially reduced without increasing the processing resources or power required by the image signal processor to process the image.
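As a rough illustration of the timing constraint (a minimal sketch in Python; all interval values and names below are hypothetical, not taken from the disclosure), choosing the processing interval between the capture interval and the full frame period means a single buffered frame is always consumed before the next one arrives:

```python
# Hypothetical interval values, in milliseconds.
T_CAPTURE = 10.0   # image capture interval (shorter capture -> less rolling shutter)
T_BLANK = 23.3     # blanking interval between captures
T_PROC = 30.0      # image processing interval, chosen per the constraint

# The constraint stated in the abstract:
assert T_CAPTURE < T_PROC < T_CAPTURE + T_BLANK

def simulate(num_frames: int) -> None:
    """Show that a single-frame buffer suffices: the processor finishes
    each frame before the sensor delivers the next one."""
    frame_period = T_CAPTURE + T_BLANK       # sensor delivers one frame per period
    proc_free_at = 0.0                       # when the image signal processor goes idle
    for n in range(num_frames):
        delivered = n * frame_period + T_CAPTURE   # frame n fully captured
        start = max(delivered, proc_free_at)       # frame waits in the buffer
        proc_free_at = start + T_PROC
        assert start - delivered < frame_period, "buffer would overflow"
        print(f"frame {n}: captured by {delivered:.1f} ms, "
              f"processed {start:.1f}-{proc_free_at:.1f} ms")

simulate(5)
```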
Abstract:
A video identifier uniquely identifying a video captured by a camera is generated. The video includes video frames, optionally concurrently captured audio, and video metadata describing the video. Video data is extracted from at least two of the video's frames. By combining the extracted video data in an order specified by an identifier generation protocol, an extracted data object is generated. The extracted data object is hashed to generate the unique video identifier, which is stored in association with the video. The identifier generation protocol may indicate the portions of the video data to extract, such as video data corresponding to particular video frames and audio data corresponding to particular audio samples. The extracted data object may include a size of particular video frames, a number of audio samples in the video, or the duration of the video, for example.
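A minimal sketch of this flow, assuming a hypothetical protocol (first frame, middle frame, frame count, then duration, concatenated in that fixed order) and SHA-256 as the hash; the field order and names here are illustrative, not the protocol from the source:

```python
import hashlib
import struct

def generate_video_id(frames: list[bytes], duration_s: float) -> str:
    """Combine extracted video data in a fixed order, then hash the result."""
    extracted = [
        frames[0],                       # data from the first frame
        frames[len(frames) // 2],        # data from the middle frame
        struct.pack(">I", len(frames)),  # number of frames
        struct.pack(">d", duration_s),   # duration of the video
    ]
    data_object = b"".join(extracted)    # the "extracted data object"
    return hashlib.sha256(data_object).hexdigest()

# Usage with stand-in frame bytes:
vid = generate_video_id([b"frame0", b"frame1", b"frame2"], duration_s=12.5)
print(vid)
```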
Abstract:
Video and corresponding metadata are accessed. Events of interest within the video are identified based on the corresponding metadata, and best scenes are identified based on the identified events of interest. In one example, best scenes are identified based on the motion values associated with frames or portions of a frame of a video. Motion values are determined for each frame, and portions of the video that include the frames with the most motion are identified as best scenes. Best scenes may also be identified based on the motion profile of a video. The motion profile of a video is a measure of global or local motion within frames throughout the video. For example, best scenes are identified from portions of the video exhibiting steady global motion. A video summary can be generated including one or more of the identified best scenes.
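A minimal sketch of the motion-value approach, assuming the per-frame motion values are already computed (e.g. from frame differencing or flow magnitude) and a fixed scene length; it picks the window with the highest total motion as the best scene:

```python
def best_scene(motion: list[float], scene_len: int) -> tuple[int, int]:
    """Return (start, end) frame indices of the window with the most motion."""
    window = sum(motion[:scene_len])
    best_start, best_sum = 0, window
    for start in range(1, len(motion) - scene_len + 1):
        # Slide the window: add the entering frame, drop the leaving one.
        window += motion[start + scene_len - 1] - motion[start - 1]
        if window > best_sum:
            best_start, best_sum = start, window
    return best_start, best_start + scene_len

motion_values = [0.1, 0.2, 2.5, 3.1, 2.8, 0.3, 0.1, 1.0]
print(best_scene(motion_values, scene_len=3))   # -> (2, 5)
```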
Abstract:
A cloud video system selectively uploads a high-resolution video and instructs one or more client devices to perform distributed processing on the high-resolution video. A client device registers high-resolution videos accessed by the client device from a camera communicatively coupled to the client device. A portion of interest within a low-resolution video transcoded from the high-resolution video is selected. A task list is generated specifying the selected portion of the high-resolution video and at least one task to perform on the portion of the high-resolution video. Commands are transmitted to prompt the client device to perform the at least one task on the specified portion of the high-resolution video according to the task list. The specified portion of the high-resolution video is modified according to the task list and uploaded to the cloud. Example tasks include transcoding, applying edits, extracting metadata, and generating highlight tags.
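One way to picture the task list is as a small record naming the portion of interest and the tasks to run on it; the field names and handler set below are illustrative stand-ins, not the disclosure's schema:

```python
from dataclasses import dataclass, field

@dataclass
class TaskList:
    video_id: str
    start_s: float    # selected portion of interest, in seconds
    end_s: float
    tasks: list[str] = field(default_factory=list)  # e.g. "transcode"

def run_task_list(task_list: TaskList) -> None:
    """Client-side dispatch: perform each task on the specified portion,
    then the modified portion would be uploaded to the cloud."""
    handlers = {
        "transcode": lambda: print("transcoding portion..."),
        "extract_metadata": lambda: print("extracting metadata..."),
        "generate_highlight_tags": lambda: print("tagging highlights..."),
    }
    for task in task_list.tasks:
        handlers[task]()

run_task_list(TaskList("vid-42", 12.0, 30.0, ["transcode", "extract_metadata"]))
```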
Abstract:
A pair of cameras having an overlapping field of view is aligned based on images captured by image sensors of the pair of cameras. A pixel shift is identified between the images. Based on the identified pixel shift, a calibration is applied to one or both of the pair of cameras. To determine the pixel shift, the camera applies correlation methods including edge matching. Calibrating the pair of cameras may include adjusting a read window on an image sensor. The pixel shift can also be used to determine a time lag, which can be used to synchronize subsequent image captures.
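A minimal sketch of finding a horizontal pixel shift by correlating edge profiles, a simplification of the edge matching mentioned above; 1-D rows stand in for image strips from the overlap region:

```python
def edge_profile(row: list[float]) -> list[float]:
    """Horizontal gradient magnitude: a crude edge signal."""
    return [abs(row[i + 1] - row[i]) for i in range(len(row) - 1)]

def pixel_shift(row_a: list[float], row_b: list[float], max_shift: int) -> int:
    """Return the shift of row_b relative to row_a that best aligns the edges."""
    ea, eb = edge_profile(row_a), edge_profile(row_b)
    def score(shift: int) -> float:
        pairs = [(ea[i], eb[i + shift]) for i in range(len(ea))
                 if 0 <= i + shift < len(eb)]
        return sum(a * b for a, b in pairs)
    return max(range(-max_shift, max_shift + 1), key=score)

row1 = [0, 0, 0, 10, 10, 10, 0, 0, 0, 0]
row2 = [0, 0, 0, 0, 0, 10, 10, 10, 0, 0]  # same feature, shifted right by 2
print(pixel_shift(row1, row2, max_shift=4))   # -> 2
```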
Abstract:
A spherical content capture system captures spherical video content. A spherical video sharing platform enables users to share the captured spherical content and enables users to access spherical content shared by other users. In one embodiment, captured metadata or video/audio processing is used to identify content relevant to a particular user based on time and location information. The platform can then generate an output video from one or more shared spherical content files relevant to the user. The output video may include a non-spherical reduced field of view such as those commonly associated with conventional camera systems. In particular, relevant sub-frames having a reduced field of view may be extracted from each frame of spherical video to generate an output video that tracks a particular individual or object of interest.
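A minimal sketch of the sub-frame extraction step, assuming equirectangular frames stored as 2-D arrays and per-frame track positions already produced by the content-identification step; longitude wrap-around is ignored for simplicity:

```python
def extract_subframes(frames, track, width, height):
    """Crop a (height x width) window centered on each tracked position."""
    output = []
    for frame, (cx, cy) in zip(frames, track):
        # Clamp the window so it stays inside the frame.
        x0 = max(0, min(cx - width // 2, len(frame[0]) - width))
        y0 = max(0, min(cy - height // 2, len(frame) - height))
        output.append([row[x0:x0 + width] for row in frame[y0:y0 + height]])
    return output

# Usage with a tiny stand-in "spherical" frame (8 wide x 4 tall):
frame = [[x + 10 * y for x in range(8)] for y in range(4)]
clips = extract_subframes([frame], track=[(5, 2)], width=4, height=2)
print(clips[0])   # -> [[13, 14, 15, 16], [23, 24, 25, 26]]
```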
Abstract:
A method is described that greatly improves the efficiency and reduces the complexity of image compression when using single-sensor color imagers for video acquisition. In addition, the method makes this new image compression type compatible with existing video processing tools, improving the workflow for film and television production.
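The abstract does not spell out the method, so the following is only a plausible, hedged sketch of one common idea in this space: de-interleaving a Bayer (RGGB) mosaic into four quarter-resolution color planes, each of which a standard video codec can compress directly, without demosaicing first:

```python
def split_bayer_rggb(mosaic):
    """Split a 2-D RGGB mosaic into R, G1, G2, B planes."""
    r  = [row[0::2] for row in mosaic[0::2]]   # even rows, even cols
    g1 = [row[1::2] for row in mosaic[0::2]]   # even rows, odd cols
    g2 = [row[0::2] for row in mosaic[1::2]]   # odd rows, even cols
    b  = [row[1::2] for row in mosaic[1::2]]   # odd rows, odd cols
    return r, g1, g2, b

mosaic = [[1, 2, 1, 2],
          [3, 4, 3, 4],
          [1, 2, 1, 2],
          [3, 4, 3, 4]]
print(split_bayer_rggb(mosaic))  # each plane is 2x2
```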
Abstract:
A system and method are disclosed that enable encoding, decoding, and manipulation of digital video with substantially less processing load than would otherwise be required. In particular, one disclosed method is directed to generating a compressed video data structure that is selectively decodable to a plurality of resolutions, including the full resolution of the uncompressed stream. The desired number of data components and the content of the data components that make up the compressed video data, which determine the available video resolutions, are variable based upon the processing carried out and the resources available to decode and process the data components. During decoding, efficiency is substantially improved because only the data components necessary to generate a desired resolution are decoded. In variations, both temporal and spatial decoding are utilized to reduce frame rates and hence further reduce processor load. The system and method are particularly useful for real-time video editing applications.
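A minimal sketch of selective decodability, assuming a Haar-style decomposition (one plausible way to achieve it, not necessarily the disclosed one): the encoded data is a low-resolution average component plus a detail component, and a decoder that wants only half resolution reads the averages and skips the detail entirely:

```python
def encode(signal):
    """One level of Haar-like analysis: (averages, details)."""
    avg = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    det = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return avg, det

def decode_full(avg, det):
    """Full-resolution decode uses both data components."""
    out = []
    for a, d in zip(avg, det):
        out += [a + d, a - d]
    return out

signal = [8, 6, 2, 4, 10, 10, 0, 2]
avg, det = encode(signal)
print(avg)                    # half-resolution decode: averages alone suffice
print(decode_full(avg, det))  # full-resolution decode reconstructs the input
```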
Abstract:
Methods and apparatus for metadata-based cinematography, production effects, shot selection, and/or other content augmentation. Effective cinematography conveys storyline, emotion, excitement, etc. Unfortunately, most amateur filmmakers lack the knowledge and ability to create cinema quality media. Various aspects of the present disclosure are directed to, among other things, rendering media based on instantaneous metadata. Unlike traditional post-processing techniques that rely on human subjectivity, some of the various techniques described herein leverage the camera's actual experiential data to enable cinema-quality post-processing for the general consuming public. Instantaneous metadata-based cinematography and shot selection advisories and architectures are also described.
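A minimal sketch of a shot-selection advisory driven by instantaneous metadata; the metadata fields and scoring weights here are illustrative stand-ins, not the disclosure's actual model:

```python
def shot_score(meta: dict) -> float:
    """Score higher when the camera was steady and the subject moved fast."""
    steadiness = 1.0 / (1.0 + meta["gyro_rms"])   # less shake -> steadier
    return 0.6 * steadiness + 0.4 * min(meta["speed_mps"] / 10.0, 1.0)

shots = [
    {"name": "shot_a", "gyro_rms": 0.2, "speed_mps": 8.0},
    {"name": "shot_b", "gyro_rms": 1.5, "speed_mps": 12.0},
]
best = max(shots, key=shot_score)
print(best["name"])   # advises the steadier, fast-moving shot -> shot_a
```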