Abstract:
Example embodiments disclose a method of generating a feature vector, a method of generating a histogram, a learning unit classifier, a recognition apparatus, and a detection apparatus, in which a feature point is detected from an input image based on a dominant direction analysis of a gradient distribution, and a feature vector corresponding to the detected feature point is generated.
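The dominant-direction analysis of a gradient distribution can be illustrated with a minimal sketch: accumulate gradient magnitudes into an orientation histogram over a patch, take the strongest bin as the dominant direction, and use the normalized histogram as the feature vector. The patch values, bin count, and normalization are illustrative assumptions, not the claimed method.

```python
import math

def gradient_histogram(dx, dy, bins=4):
    """Accumulate gradient magnitudes into orientation bins over [0, pi)."""
    hist = [0.0] * bins
    for gx, gy in zip(dx, dy):
        angle = math.atan2(gy, gx) % math.pi
        mag = math.hypot(gx, gy)
        hist[min(int(angle / math.pi * bins), bins - 1)] += mag
    return hist

# Toy horizontal/vertical gradients of a small patch (assumed values).
dx = [1.0, 0.9, 0.0, 1.1]
dy = [0.0, 0.1, 1.0, 0.0]

hist = gradient_histogram(dx, dy)
dominant_bin = max(range(len(hist)), key=hist.__getitem__)  # dominant direction
feature_vector = [h / sum(hist) for h in hist]              # normalized histogram
```

A feature point could then be one whose dominant bin is markedly stronger than the others, though the abstract does not specify the detection criterion.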
Abstract:
An apparatus and method for analyzing body part association. The apparatus and method may recognize at least one body part from a user image extracted from an observed image, select at least one candidate body part based on association of the at least one body part, and output a user pose skeleton related to the user image based on the selected at least one candidate body part.
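Selecting a candidate body part based on association can be sketched as scoring competing detections against an already-recognized part. The joint names, coordinates, and inverse-distance association score below are all assumptions for illustration.

```python
# Two competing "neck" candidates recognized from a user image (assumed data).
detections = {
    "head": (0.5, 0.1),
    "neck_a": (0.5, 0.3),
    "neck_b": (0.9, 0.9),
}

def association(p, q):
    """Toy association score: closer joints associate more strongly."""
    return 1.0 / (1.0 + abs(p[0] - q[0]) + abs(p[1] - q[1]))

# Select the candidate most associated with the detected head.
best_neck = max(("neck_a", "neck_b"),
                key=lambda n: association(detections["head"], detections[n]))

# Assemble (part of) the user pose skeleton from the selected candidates.
skeleton = {"head": detections["head"], "neck": detections[best_neck]}
```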
Abstract:
A method of estimating a state includes: predicting current state prediction data of a target object by using previous state estimation data of a previous image frame of an image sequence in which the target object is represented, the previous image frame preceding a current image frame; acquiring current target detection data of the target object for the current image frame of the image sequence; and determining current state estimation data of the target object of the current image frame by updating the current state prediction data by using the current target detection data and by using a detection reliability of the current target detection data.
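The predict-then-update loop above resembles a Kalman-style filter in which the detection reliability weights the update step. The following one-dimensional sketch makes that concrete; the constant motion model and the specific way reliability scales the measurement noise are assumptions, not the claimed formulation.

```python
def predict(prev_estimate, prev_variance, process_noise=1.0):
    """Predict the current state from the previous frame's estimate."""
    return prev_estimate, prev_variance + process_noise

def update(prediction, variance, detection, detection_noise, reliability):
    """Update the prediction with a detection, weighted by its reliability.

    A low reliability inflates the effective measurement noise, so the
    detection pulls the estimate less (an assumed weighting scheme).
    """
    effective_noise = detection_noise / max(reliability, 1e-6)
    gain = variance / (variance + effective_noise)
    estimate = prediction + gain * (detection - prediction)
    new_variance = (1.0 - gain) * variance
    return estimate, new_variance

# Track a 1-D position across one frame transition.
est, var = 0.0, 1.0                      # previous state estimation data
est, var = predict(est, var)             # current state prediction data
est, var = update(est, var, detection=2.0, detection_noise=0.5, reliability=0.9)
```

With high reliability the estimate moves most of the way toward the detection; as reliability drops toward zero, the update leaves the prediction nearly unchanged.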
Abstract:
An object detection method and an object detection apparatus for detecting an object based on multi-features are provided. The object detection method includes: obtaining first-sensor data from a first sensor and obtaining second-sensor data from a second sensor, wherein the first sensor is a different type of sensor than the second sensor; extracting a first feature from the first-sensor data and extracting a second feature from the second-sensor data; determining a target feature-type by inputting the first and second features to a feature-type selection model which, based thereon, predicts the target feature-type; determining a target feature to be used for object detection according to the determined target feature-type; and determining an object detection result based on the determined target feature.
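The pipeline above can be sketched end to end with toy stand-ins. The feature extractor and the "selection model" below are hand-written placeholders for learned components, and the variance-based selection rule is purely an assumption.

```python
from statistics import mean, pvariance

def extract_feature(samples):
    """Toy feature extractor: (mean, variance) of raw sensor samples."""
    return (mean(samples), pvariance(samples))

def select_feature_type(first_feature, second_feature):
    """Stand-in for the feature-type selection model: prefer the feature
    with larger variance (assumed to be more discriminative here)."""
    return "first" if first_feature[1] >= second_feature[1] else "second"

# First-sensor and second-sensor data from two different sensor types
# (assumed values, e.g. camera intensities vs. radar ranges).
first_feature = extract_feature([0.1, 0.2, 0.15])
second_feature = extract_feature([5.0, 1.0, 9.0])

target_type = select_feature_type(first_feature, second_feature)
target_feature = first_feature if target_type == "first" else second_feature
# target_feature would then feed an object detector (omitted).
```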
Abstract:
A method and apparatus with a feature-level ensemble model are provided. A method of operating an ensemble model based on feature-level consolidation includes: obtaining queries by inputting a same input data item to respective transformer models, the transformer models generating respective queries from the input data item; forming an ensemble query corresponding to the queries; and generating a predicted value of the input data item by applying the ensemble query to a prediction model that includes a transformer decoder, the prediction model inferring the predicted value from the ensemble query.
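Feature-level query consolidation can be sketched with toy query heads and a toy decoder. Element-wise averaging as the consolidation step is an assumption; the abstract does not specify how the ensemble query is formed from the per-model queries.

```python
from statistics import mean

def model_a(x):
    """Stand-in for one transformer model's query from the input item."""
    return [v * 1.0 for v in x]

def model_b(x):
    """Stand-in for a second transformer model's query from the same item."""
    return [v * 3.0 for v in x]

def ensemble_query(queries):
    """Assumed consolidation: element-wise mean of the per-model queries."""
    return [mean(vals) for vals in zip(*queries)]

def decoder(query):
    """Toy prediction head standing in for the transformer decoder."""
    return sum(query)

item = [1.0, 2.0]                         # the same input data item for all models
queries = [model_a(item), model_b(item)]  # per-model queries
q = ensemble_query(queries)
prediction = decoder(q)                   # predicted value of the input item
```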
Abstract:
A method performed by one or more processors of an electronic device includes: processing an input image and point cloud data corresponding to the input image; projecting the point cloud data to generate a first depth map and adding new depth values to the first depth map based on the input image; obtaining a second depth map by inputting the input image to a depth estimation model configured to infer depth maps from input images; and training the depth estimation model based on a loss difference between the first depth map and the second depth map.
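The training signal is a loss between the point-cloud-derived depth map and the model's predicted depth map. A minimal sketch follows; masking out pixels with no projected depth and the L1 form of the loss are assumptions for illustration.

```python
def depth_loss(first_depth, second_depth, invalid=0.0):
    """Mean absolute difference over pixels that have a projected depth."""
    diffs = [abs(a - b)
             for a, b in zip(first_depth, second_depth)
             if a != invalid]
    return sum(diffs) / len(diffs) if diffs else 0.0

# Flattened 2x2 maps (assumed values): pixel 1 has no LiDAR return (0.0)
# and is ignored by the loss.
first = [3.0, 0.0, 5.0, 2.0]   # projected (and densified) first depth map
second = [2.5, 4.0, 5.5, 2.0]  # second depth map from the estimation model

loss = depth_loss(first, second)
# The depth estimation model would be trained to reduce this loss (step omitted).
```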
Abstract:
An apparatus and method for quantizing a transformer-based target tracking model are provided. The method includes obtaining a transformer-based target tracking model including a template branch, a search branch, a stitching module, and a first transformer module, generating an optimized target tracking model by removing the stitching module from the transformer-based target tracking model and dividing the first transformer module into a second transformer module and a third transformer module, and generating a quantization model corresponding to the optimized target tracking model by quantizing the divided second transformer module independently of quantizing the divided third transformer module.
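The model surgery and per-module quantization can be illustrated with a toy dict-of-modules model. The uniform min/max quantizer and the way the transformer is split are assumptions; the point of the sketch is only that the two halves are quantized independently, each with its own scale.

```python
def quantize(weights, levels=256):
    """Uniform quantization with a scale fit to this module alone."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / (levels - 1) or 1.0
    return [round((w - lo) / scale) for w in weights], scale

# Toy target tracking model (assumed module names and weights).
model = {
    "template_branch": [0.1, 0.2],
    "search_branch": [0.3, 0.4],
    "stitching": [0.0],
    "first_transformer": [0.5, -0.5, 2.0, -2.0],
}

# Optimize: remove the stitching module and divide the first transformer
# module into a second and a third transformer module.
model.pop("stitching")
t = model.pop("first_transformer")
model["second_transformer"], model["third_transformer"] = t[:2], t[2:]

# Quantize the two modules independently: each gets its own scale.
q2, scale2 = quantize(model["second_transformer"])
q3, scale3 = quantize(model["third_transformer"])
```

Because the two modules have different weight ranges, independent quantization yields different scales, which is what quantizing them jointly would prevent.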
Abstract:
An apparatus and method for training a neural network model for classification without a teacher model are disclosed. The method includes: selecting classes from a database comprising a set of classes; generating a mean feature group comprising mean features extracted from the selected classes; receiving a batch comprising input data and extracting, by the neural network model, a feature from the input data, wherein the neural network model is to be trained according to the mean feature group; determining a first similarity between the extracted feature and a mean feature corresponding to the input data; determining a second similarity comprising a self-similarity of the mean feature; and updating a parameter of the neural network model based on the first similarity and the second similarity.
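The two similarity terms can be sketched with cosine similarity. Cosine as the similarity measure, the toy feature values, and the particular loss combination below are assumptions, not the claimed training objective.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

feature = [0.9, 0.1]       # feature extracted by the model from a batch item
mean_feature = [1.0, 0.0]  # mean feature of the item's class

first_similarity = cosine(feature, mean_feature)
second_similarity = cosine(mean_feature, mean_feature)  # self-similarity

# Assumed combination: push the feature toward its class mean until it
# matches the mean's self-similarity (the gradient step itself is omitted).
loss = second_similarity - first_similarity
```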
Abstract:
A processor-implemented method with object tracking includes: performing, using a first template, forward object tracking on first image frames in a first sequence group; determining a template candidate of a second template for second image frames in a second sequence group; performing backward object tracking on the first image frames using the template candidate; determining a confidence of the template candidate using a result of comparing a first tracking result determined by the forward object tracking performed on the first image frames and a second tracking result determined by the backward object tracking performed on the first image frames; determining the second template based on the confidence of the template candidate; and performing forward object tracking on the second image frames using the second template.
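The confidence step compares the forward tracking result with the backward tracking result produced by the template candidate over the same frames. The 1-D tracks, the inverse-distance confidence formula, and the acceptance threshold below are assumptions for illustration.

```python
def confidence(forward_track, backward_track):
    """Higher when backward tracking with the candidate template retraces
    the forward tracking result over the same frames."""
    dists = [abs(f - b) for f, b in zip(forward_track, backward_track)]
    return 1.0 / (1.0 + sum(dists) / len(dists))

# First tracking result: forward object tracking with the first template.
forward = [0.0, 1.0, 2.0, 3.0]
# Second tracking result: backward object tracking with the template candidate.
backward = [0.1, 1.0, 2.1, 3.0]

conf = confidence(forward, backward)

THRESHOLD = 0.9  # assumed acceptance threshold
use_candidate_as_second_template = conf >= THRESHOLD
```

If the candidate's confidence falls below the threshold, the method would fall back to another candidate rather than adopt it as the second template.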
Abstract:
A device and method with object recognition are provided. In one general aspect, an electronic device includes a camera sensor configured to capture a first image of a scene, the camera sensor being configured to perform at least one type of physical camera motion relative to the electronic device, the at least one type of physical camera motion including rolling, panning, tilting, or zooming the camera sensor relative to the electronic device, and a processor configured to control the camera sensor to perform a physical motion of the physical camera motion type based on detecting an object in the first image, acquire a second image captured using the camera sensor as adjusted based on the performed physical motion, and recognize the object in the second image.
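The capture, detect, adjust, and re-recognize flow can be sketched as a short control loop. The toy detector and recognizer, the image representation, and the zoom factor are all assumptions standing in for the camera hardware and recognition model.

```python
def detect(image):
    """Toy detector: returns the detected object's apparent size in frame."""
    return image["object_size"]

def recognize(image):
    """Toy recognizer: succeeds only once the object is large enough."""
    return "cup" if image["object_size"] >= 0.5 else None

# First image: object is detected but too small to recognize.
first_image = {"object_size": 0.2}
label = recognize(first_image)

if label is None and detect(first_image) > 0.0:
    # Perform a physical camera motion (here, zooming toward the object),
    # then acquire a second image with the adjusted sensor.
    zoom = 3.0
    second_image = {"object_size": first_image["object_size"] * zoom}
    label = recognize(second_image)
```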