Abstract:
Embodiments are directed towards performing depth estimation within a digital camera system based on interpolation of inverse focus statistics. After an image is captured, various statistics or focus measures may be calculated using, for example, a high pass filter. Depth is estimated by interpolating the inverse of the statistics for three positions of focus for the image. The inverse of the statistics, St(n), may be 1/St(n), 1/St^2(n), or even 1/St^Z(n), where Z ≥ 1. Several approaches to interpolating the inverse values of the statistics to obtain a depth estimate are disclosed, including a general parabolic minimum approach and use of a parabolic minimum within a progressive scheme or a continuous AF scheme. The depth estimate may then be used for a variety of applications, including automatic focusing, as well as converting 2D images to 3D images.
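As a rough illustration of the parabolic-minimum interpolation, the following Python sketch fits a parabola through the inverse statistics at three focus positions and returns the vertex as the estimated best-focus position; the helper names, the choice Z = 2, and the example numbers are assumptions for illustration, not the disclosed implementation.

# Sketch: parabolic-minimum estimate from inverse focus statistics.
# Assumes three focus positions n1 < n2 < n3 with focus statistics St(n)
# (e.g., summed high-pass-filter energy); names and Z = 2 are illustrative.

def parabolic_minimum(n, inv_st):
    """Fit y = a*x^2 + b*x + c through three (n, 1/St^Z) points and
    return the x of the vertex, i.e., the estimated best-focus position."""
    (x1, x2, x3), (y1, y2, y3) = n, inv_st
    denom = (x1 - x2) * (x1 - x3) * (x2 - x3)
    a = (x3 * (y2 - y1) + x2 * (y1 - y3) + x1 * (y3 - y2)) / denom
    b = (x3**2 * (y1 - y2) + x2**2 * (y3 - y1) + x1**2 * (y2 - y3)) / denom
    return -b / (2.0 * a)  # vertex of the parabola

def estimate_depth(focus_positions, focus_stats, z=2):
    inv = [1.0 / (s ** z) for s in focus_stats]   # inverse statistics 1/St^Z(n)
    return parabolic_minimum(focus_positions, inv)

# Example: sharpest image (largest St) near position 12 -> estimate close to 12.
print(estimate_depth([10, 12, 14], [80.0, 150.0, 90.0]))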
Abstract:
Systems and techniques are described herein for adapting a pretrained machine learning model. For instance, a process can include encoding a training image into a first feature vector, the training image including a first object located at a first location; generating a second feature vector based on a set of sinusoidal functions using a set of weights; combining the first feature vector with the second feature vector to generate a combined feature vector; processing the combined feature vector using a visual language model to obtain a second location for the first object; and adjusting the set of weights based on a comparison between the first location and the second location.
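A minimal PyTorch sketch of the described adaptation loop follows; the image encoder and the "visual language model" are tiny frozen stand-ins (assumptions, not the actual pretrained models), and only the sinusoidal weights are updated, mirroring the final adjusting step.

# Minimal PyTorch sketch of the adaptation loop. The encoder and "VLM" below
# are small frozen placeholders; only the sinusoidal weights are trained.
import torch
import torch.nn as nn

DIM = 64

class SinusoidalAdapter(nn.Module):
    """Second feature vector built from a fixed sinusoidal basis and a
    learnable set of weights."""
    def __init__(self, dim, n_freqs=8):
        super().__init__()
        phases = torch.linspace(0.0, 1.0, dim)
        freqs = torch.arange(1, n_freqs + 1).float().unsqueeze(1)
        self.register_buffer("basis", torch.sin(freqs * phases))   # (n_freqs, dim), fixed
        self.weights = nn.Parameter(torch.zeros(1, n_freqs))       # the adjusted set of weights

    def forward(self):
        return self.weights @ self.basis                           # (1, dim)

# Frozen stand-ins for the pretrained image encoder and visual language model.
encoder = nn.Linear(3 * 32 * 32, DIM).requires_grad_(False)   # image -> first feature vector
vlm_head = nn.Linear(DIM, 2).requires_grad_(False)            # predicts an (x, y) object location

adapter = SinusoidalAdapter(DIM)
optim = torch.optim.Adam(adapter.parameters(), lr=1e-2)

image = torch.randn(1, 3 * 32 * 32)          # training image containing the first object
true_location = torch.tensor([[0.25, 0.75]]) # first (ground-truth) location

for step in range(100):
    img_feat = encoder(image)                 # first feature vector
    combined = img_feat + adapter()           # combine with the second feature vector
    pred_location = vlm_head(combined)        # second location obtained from the "VLM"
    loss = nn.functional.mse_loss(pred_location, true_location)  # compare the two locations
    optim.zero_grad()
    loss.backward()
    optim.step()                              # adjust the set of weights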
Abstract:
A processor-implemented method includes obtaining, with a backbone artificial neural network, an original feature map of point cloud data. The method also includes deforming the point cloud data, with a deformation artificial neural network, into a number of deformed point cloud objects based on the original feature map of point cloud data. The method further includes combining the deformed point cloud objects into a mixed point cloud. The method still further includes extracting, with the backbone artificial neural network, a mixed feature map from the mixed point cloud. The method includes extracting a number of deformed feature maps from the deformed point cloud objects. The method still further includes computing, with a contrastive module, a loss for the backbone artificial neural network and for the deformation artificial neural network based on the mixed feature map and the deformed feature maps.
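A rough PyTorch sketch of one training step along these lines is given below; the per-point MLP backbone, the per-point deformation network, and the InfoNCE-style objective are simplifying stand-ins for the backbone, deformation, and contrastive modules, not the disclosed networks.

# Rough PyTorch sketch of one training step. The backbone here is per-point,
# so the mixed feature map decomposes trivially; a real backbone would
# aggregate neighborhoods, making the mixing non-trivial.
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 64))      # per-point features
deformer = nn.Sequential(nn.Linear(64 + 3, 64), nn.ReLU(), nn.Linear(64, 3))  # per-point offsets

def pooled(points):
    """Feature map of a cloud, max-pooled into one descriptor."""
    return backbone(points).max(dim=0).values             # (64,)

objects = [torch.rand(128, 3) for _ in range(4)]           # point cloud objects

# Deform each object conditioned on the original feature map of its points.
deformed = []
for pts in objects:
    feat = pooled(pts).expand(pts.shape[0], -1)
    deformed.append(pts + deformer(torch.cat([pts, feat], dim=-1)))

mixed = torch.cat(deformed, dim=0)                          # the mixed point cloud

# Mixed feature map, re-pooled over each object's region, vs. deformed feature maps.
mixed_point_feats = backbone(mixed)
region_feats = torch.stack([f.max(dim=0).values
                            for f in mixed_point_feats.split(128, dim=0)])
deformed_feats = torch.stack([pooled(p) for p in deformed])

# Contrastive (InfoNCE-style) loss: each region of the mixed cloud should match
# its own deformed object and not the others.
logits = F.normalize(region_feats, dim=1) @ F.normalize(deformed_feats, dim=1).T / 0.07
loss = F.cross_entropy(logits, torch.arange(len(objects)))
loss.backward()                                             # trains backbone and deformer jointly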
Abstract:
Bare die package with a guard to reduce or prevent material seepage into an air cavity, and related fabrication methods. In exemplary aspects, to prevent or reduce material (e.g., an encapsulation material such as a mold material and/or a coating material) from entering or seeping into the air cavity in the active filter region of a filter, the die package includes a guard structure. The guard structure is a structure on or adjacent to the die, operable to be used in a filter, that redirects material away from, or reduces the amount of material entering, the gap between the die and the substrate. The guard structure thereby reduces or prevents the material from entering the air cavity of the die, so that such material does not affect the acoustic performance of the air cavity of the filter.
Abstract:
An Image Signal Processing (ISP) optimization framework for computer vision applications is disclosed. Tuning of the ISP is performed automatically: it is formulated as a nonlinear multi-objective optimization problem, which is then solved using an evolutionary stochastic solver. An improved ISP of the embodiments of the invention includes at least search space reduction for reducing the number of ISP configurations, remapping of the generated population to the reduced search space via mirroring, and global optimization function processing, which together allow all blocks of the ISP to be tuned at the same time, rather than tuning each ISP block separately as in the prior art. It is also shown that an ISP tuned for image quality performs worse than an ISP tuned for a specific downstream image recognition task.
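The loop below is a toy illustration of the optimization style described (joint tuning of all ISP parameters, a reduced and quantized search space, mirroring of out-of-range candidates, and an evolutionary solver); the placeholder "ISP" parameters and objective functions are assumptions, not the disclosed framework.

# Toy sketch only: all ISP block parameters are tuned jointly with a simple
# evolutionary solver; out-of-range candidates are remapped by mirroring.
import numpy as np

rng = np.random.default_rng(0)

# 6 parameters covering all ISP blocks at once (e.g., denoise, sharpen, gamma, ...).
LOW, HIGH = np.zeros(6), np.ones(6)          # reduced search space after pruning
STEP = 0.05                                  # quantization further shrinks the space

def mirror(x):
    """Reflect out-of-range genes back into [LOW, HIGH] instead of clipping."""
    x = np.where(x > HIGH, 2 * HIGH - x, x)
    x = np.where(x < LOW, 2 * LOW - x, x)
    return np.round(x / STEP) * STEP

def objectives(params):
    """Placeholder multi-objective score: (image-quality proxy, task-loss proxy)."""
    iq = np.sum((params - 0.3) ** 2)          # pretend the IQ metric is best near 0.3
    task = np.sum((params - 0.7) ** 2)        # pretend detector accuracy is best near 0.7
    return iq, task

def scalarized(params, w=0.5):
    iq, task = objectives(params)
    return w * iq + (1 - w) * task            # simple scalarization of the two objectives

# (mu + lambda) evolutionary loop over the full parameter vector.
pop = mirror(rng.uniform(LOW, HIGH, size=(16, 6)))
for gen in range(50):
    children = mirror(pop + rng.normal(0.0, 0.1, size=pop.shape))
    both = np.vstack([pop, children])
    scores = np.array([scalarized(p) for p in both])
    pop = both[np.argsort(scores)[:16]]       # keep the best 16 configurations

print("best ISP configuration:", pop[0])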
Abstract:
Certain aspects of the present disclosure provide a method for performing machine learning, comprising: determining a plurality of vertices in a neighborhood associated with a mesh including a target vertex; determining a linear transformation configured to parallel transport signals along all edges in the mesh to the target vertex; applying the linear transformation to the plurality of vertices in the neighborhood to form a combined signal at the target vertex; determining a set of basis filters; linearly combining the basis filters using a set of learned parameters to form a gauge equivariant convolution filter, wherein the gauge equivariant convolution filter is constrained to maintain gauge equivariance; applying the gauge equivariant convolution filter to the combined signal to form an intermediate output; and applying a nonlinearity to the intermediate output to form a convolution output.
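A very small numpy sketch of one such convolution at a single target vertex is shown below, under simplifying assumptions: signals are 2-D tangent vectors, parallel transport along an edge is a 2x2 rotation by a per-edge angle, and the basis filters are the identity and the 90-degree rotation (which commute with all planar rotations, so their learned combination remains gauge equivariant); the norm nonlinearity and all numbers are illustrative.

# Simplified single-vertex sketch of a gauge equivariant convolution.
import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Basis filters: 2x2 matrices commuting with SO(2) are spanned by I and J.
BASIS = [np.eye(2), rot(np.pi / 2)]

def gauge_equivariant_conv(neighbor_feats, transport_angles, learned_params):
    """neighbor_feats: (N, 2) tangent-vector signals at neighboring vertices.
    transport_angles: (N,) parallel-transport angle of each edge to the target.
    learned_params: (2,) coefficients linearly combining the basis filters."""
    # Parallel transport every neighbor signal into the target vertex's frame.
    transported = np.stack([rot(a) @ f for a, f in zip(transport_angles, neighbor_feats)])
    combined = transported.sum(axis=0)                         # combined signal at the target
    kernel = sum(w * b for w, b in zip(learned_params, BASIS))  # gauge equivariant filter
    out = kernel @ combined                                     # intermediate output
    # Norm nonlinearity (acts only on the vector length, preserving equivariance).
    n = np.linalg.norm(out) + 1e-8
    return out * (max(n - 0.1, 0.0) / n)

feats = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]])
angles = np.array([0.3, -1.2, 2.0])
print(gauge_equivariant_conv(feats, angles, learned_params=np.array([0.8, 0.2])))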
Abstract:
A computer-implemented method for contrastive object representation from temporal data using an artificial neural network (ANN) includes receiving, by the ANN, a video. The video comprises a temporal sequence of frames including images of one or more objects. The ANN generates object representations corresponding to the one or more objects based on temporal data of multiple frames of the temporal sequence of frames. The object representations are communicated to a receiver.
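One toy example of learning object representations contrastively from temporal data is sketched below in PyTorch: object crops from consecutive frames are treated as positive pairs and crops of other objects as negatives; the encoder, the pre-cropped input, and the InfoNCE loss are assumptions, not the disclosed ANN.

# Toy contrastive objective over temporal object crops (placeholder encoder).
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 128))  # per-object crop encoder

# (frames, objects, C, H, W): the same object index across frames is the same object.
video_crops = torch.rand(8, 4, 3, 16, 16)

z_t  = F.normalize(encoder(video_crops[:-1].reshape(-1, 3, 16, 16)), dim=1)  # frames 0..6
z_t1 = F.normalize(encoder(video_crops[1:].reshape(-1, 3, 16, 16)), dim=1)   # frames 1..7

# InfoNCE: each object at time t should match itself at time t+1, not other objects.
logits = z_t @ z_t1.T / 0.1
loss = F.cross_entropy(logits, torch.arange(z_t.shape[0]))
loss.backward()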
Abstract:
Aspects presented herein relate to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may divide at least one scene into a plurality of meshlets, each of the meshlets including a plurality of primitives, and each of the primitives including a plurality of vertices. The apparatus may also calculate a pair of texture coordinates for each of the plurality of vertices. Further, the apparatus may select a size of each of the plurality of meshlets in the at least one scene based on the pair of texture coordinates and based on a perspective projection of each of the plurality of meshlets. The apparatus may also calculate layout information in a meshlet atlas for each of the meshlets in the at least one scene. Moreover, the apparatus may shade each of a plurality of pixels in the meshlet atlas based on the calculated layout information.
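The short Python sketch below (illustrative only, not GPU shader code) shows two of the described steps under assumed conventions: picking a meshlet's texel resolution from its perspective-projected screen extent, and computing a simple shelf-packed layout in a meshlet atlas. The projection model, size limits, and packing scheme are assumptions.

# Illustrative sketch of meshlet sizing and atlas layout (assumed conventions).
import numpy as np

ATLAS_W = 1024

def meshlet_size(vertices_xyz, fov_scale=512.0):
    """Pick a square texel resolution from the perspective-projected extent
    of the meshlet's vertices (larger on screen -> more texels)."""
    z = np.clip(vertices_xyz[:, 2], 1e-3, None)
    proj = vertices_xyz[:, :2] / z[:, None] * fov_scale       # perspective projection
    extent = (proj.max(axis=0) - proj.min(axis=0)).max()
    return int(np.clip(2 ** np.ceil(np.log2(max(extent, 1.0))), 8, 256))

def pack_atlas(sizes):
    """Shelf-pack meshlets left to right, top to bottom; returns (x, y) layout info."""
    x = y = shelf_h = 0
    layout = []
    for s in sizes:
        if x + s > ATLAS_W:
            x, y, shelf_h = 0, y + shelf_h, 0
        layout.append((x, y))
        x, shelf_h = x + s, max(shelf_h, s)
    return layout

meshlets = [np.random.rand(64, 3) * [2, 2, 1] + [0, 0, 1] for _ in range(16)]
sizes = [meshlet_size(m) for m in meshlets]
print(list(zip(sizes, pack_atlas(sizes))))
# Each meshlet's pixels in the atlas would then be shaded using this layout information.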
Abstract:
Illustrative embodiments enable a MEMS transducer to quickly recover from acoustic overload events by quickly resetting signal processing circuitry downstream from the MEMS transducer. An acoustic overload sensor detects occurrence of an acoustic overload event and triggers a reset circuit to operate a set of switches to rapidly drain charge from a corresponding set of capacitances within the transducer, or within the signal processing circuitry, thereby resetting the signal processing circuitry more rapidly than would occur if the transducer or circuitry were allowed to recover on its own.
Abstract:
A method for recognizing long-range activities in videos includes segmenting an input video stream to generate multiple frame sets. For each of the frame sets, a frame with a highest likelihood of including one or more actions of a set of predefined actions is identified regardless of its order in the frame set. A global representation of the input stream is generated based on pooled representations of the identified frames. A long-range activity in the video stream is classified based on the global representation.
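A toy PyTorch version of this pipeline is sketched below with placeholder networks: the stream is segmented into frame sets, the highest-likelihood frame in each set is kept regardless of its position, the kept representations are pooled, and the activity is classified from the pooled vector; the encoders, set length, and class counts are assumptions.

# Toy sketch of the described pipeline with placeholder networks.
import torch
import torch.nn as nn

frame_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))
action_scorer = nn.Linear(64, 10)     # likelihoods over 10 predefined actions
activity_head = nn.Linear(64, 5)      # 5 long-range activity classes

video = torch.rand(120, 3, 32, 32)    # input video stream of 120 frames
frame_sets = video.split(20)          # segment into 6 frame sets

picked = []
for frames in frame_sets:
    feats = frame_encoder(frames)                               # (20, 64)
    scores = action_scorer(feats).softmax(dim=1).max(dim=1).values
    picked.append(feats[scores.argmax()])                       # best frame, order-agnostic

global_repr = torch.stack(picked).mean(dim=0)                   # pooled global representation
activity_logits = activity_head(global_repr)
print(activity_logits.argmax().item())                          # classified long-range activity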