Abstract:
A computer-implemented method executed by at least one processor for detecting tattoos on a human body is presented. The method includes inputting a plurality of images into a tattoo detector, selecting one or more images of the plurality of images including tattoos, extracting, via a feature extractor, tattoo feature vectors from the tattoos found in the one or more images of the plurality of images including tattoos, applying a deep learning tattoo matching model to determine potential matches between the tattoo feature vectors and preexisting tattoo images stored in a tattoo training database, and generating a similarity score between the tattoo feature vectors and one or more of the preexisting tattoo images stored in the tattoo training database.
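Below is a minimal sketch of the matching and scoring stage described in this abstract, assuming the detector and feature extractor have already produced a feature vector per tattoo; the function names, the cosine-similarity score, and the gallery dictionary are illustrative assumptions, not the patented implementation.

```python
# Hypothetical matching stage: rank preexisting tattoo images in a gallery
# (the tattoo training database) by similarity to a probe feature vector.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity score between a probe feature vector and a gallery vector."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def match_tattoo(probe_feature: np.ndarray,
                 gallery: dict[str, np.ndarray],
                 top_k: int = 5) -> list[tuple[str, float]]:
    """Return the top-k (image_id, similarity score) pairs from the gallery."""
    scores = [(image_id, cosine_similarity(probe_feature, feature))
              for image_id, feature in gallery.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]
```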
Abstract:
A computer-implemented method executed by at least one processor for detecting tattoos on a human body is presented. The method includes inputting a plurality of images into a tattoo detection module, selecting one or more images of the plurality of images including tattoos with at least three keypoints, the at least three keypoints having auxiliary information related to the tattoos, manually labeling tattoo locations in the plurality of images including tattoos to create labeled tattoo images, increasing a size of the labeled tattoo images identified to be below a predetermined threshold by padding a width and height of the labeled tattoo images, training a first tattoo detection deep learning model and a second tattoo detection deep learning model with the labeled tattoo images defining tattoo training data, and executing either the first tattoo detection deep learning model or the second tattoo detection deep learning model based on a performance of a general-purpose graphics processing unit.
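As an illustration of two of the steps above, the sketch below pads undersized labeled tattoo crops and selects one of the two detection models based on available GPU resources; the 64-pixel threshold, the 8 GB cutoff, and the function names are assumptions made only for this example.

```python
import numpy as np

MIN_SIDE = 64  # hypothetical predetermined size threshold (pixels)

def pad_small_image(image: np.ndarray, min_side: int = MIN_SIDE) -> np.ndarray:
    """Pad the width and height of an H x W x C image so neither side falls below min_side."""
    h, w = image.shape[:2]
    pad_h = max(0, min_side - h)
    pad_w = max(0, min_side - w)
    return np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="constant")

def select_detector(gpu_memory_gb: float, first_model, second_model):
    """Choose between the two detection models based on GPU capability."""
    return first_model if gpu_memory_gb >= 8.0 else second_model
```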
Abstract:
Methods and systems for deploying a video analytics system include determining one or more applications for a security system in an environment, including one or more constraints. Each functional module in a directed graph representation of the one or more applications is profiled to generate one or more configurations for each functional module. The nodes of each graph representation represent functional modules of the respective application, and repeated module configurations are skipped during profiling. Resource usage for each of the one or more applications is estimated using the one or more configurations of each functional module and the one or more constraints. The one or more applications are deployed in the environment.
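The sketch below illustrates the profiling-with-deduplication idea from this abstract: each functional module (a node in an application's graph) is profiled once per unique configuration, repeated configurations are skipped, and per-module profiles are summed into an application-level resource estimate. The profile_module callback and the additive resource model are assumptions for illustration only.

```python
def profile_applications(applications, profile_module):
    """applications: list of graphs, each given as a list of (module_name, config) nodes."""
    profiles = {}  # (module_name, config) -> measured resource usage
    for app in applications:
        for module_name, config in app:
            key = (module_name, config)
            if key in profiles:
                continue  # repeated module configuration: skip re-profiling
            profiles[key] = profile_module(module_name, config)
    return profiles

def estimate_app_usage(app, profiles):
    """Estimate an application's resource usage by summing its modules' profiles."""
    return sum(profiles[(name, config)] for name, config in app)
```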
Abstract:
A computer-implemented method for emulating an object recognizer includes receiving testing image data, and emulating, by employing a first object recognizer, a second object recognizer. Emulating the second object recognizer includes using the first object recognizer to perform object recognition on a testing object from the testing image data to generate data, the data including a feature representation for the testing object, and classifying the testing object based on the feature representation and a machine learning model configured to predict whether the testing object would be recognized by the second object recognizer. The method further includes triggering an action to be performed based on the classification.
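A compact sketch of the emulation idea is given below: features produced by the first (available) recognizer train a lightweight classifier that predicts whether the second recognizer would recognize a testing object, and an action is triggered from that prediction. The choice of logistic regression and the function names are assumptions, not the method claimed above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_emulator(features: np.ndarray, recognized_by_second: np.ndarray):
    """features: per-object vectors from the first recognizer;
    recognized_by_second: 1 if the second recognizer recognized the object, else 0."""
    model = LogisticRegression(max_iter=1000)
    model.fit(features, recognized_by_second)
    return model

def emulate_and_trigger(model, test_feature: np.ndarray, action):
    """Classify a testing object and trigger an action based on the classification."""
    would_recognize = bool(model.predict(test_feature.reshape(1, -1))[0])
    if would_recognize:
        action()
    return would_recognize
```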
Abstract:
A method is provided for classifying objects. The method detects objects in one or more images. The method tags each object with multiple features. Each feature describes a specific object attribute and has a range of values to assist with a determination of an overall quality of the one or more images. The method specifies a set of training examples by classifying the overall quality of at least some of the objects as being of an acceptable quality or an unacceptable quality, based on a user's domain knowledge about an application program that takes the objects as inputs. The method constructs a plurality of first-level classifiers using the set of training examples. The method constructs a second-level classifier from outputs of the first-level classifiers. The second-level classifier provides a classification, for at least some of the objects, of either the acceptable quality or the unacceptable quality.
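The sketch below shows one way the two-level scheme could be wired together: several first-level classifiers are trained on the labeled examples, and a second-level classifier is trained on their outputs. The specific estimators, and the use of predicted probabilities as second-level inputs, are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

def train_two_level(features: np.ndarray, labels: np.ndarray):
    """labels: 1 = acceptable quality, 0 = unacceptable quality."""
    first_level = [RandomForestClassifier(),
                   LogisticRegression(max_iter=1000),
                   SVC(probability=True)]
    for clf in first_level:
        clf.fit(features, labels)
    # The second-level classifier consumes the first-level outputs
    # (here, predicted probabilities of the "acceptable" class).
    meta_features = np.column_stack(
        [clf.predict_proba(features)[:, 1] for clf in first_level])
    second_level = LogisticRegression()
    second_level.fit(meta_features, labels)
    return first_level, second_level

def classify_quality(first_level, second_level, features: np.ndarray) -> np.ndarray:
    """Return 1 (acceptable) or 0 (unacceptable) for each object."""
    meta = np.column_stack(
        [clf.predict_proba(features)[:, 1] for clf in first_level])
    return second_level.predict(meta)
```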
Abstract:
Systems and methods are disclosed for speeding up a computer having a graphics processing unit (GPU) and a general-purpose graphics processing unit (GP-GPU) by decoupling a convolution process for a first matrix into a row part and a column part; expanding the row part into a second matrix; performing matrix multiplication using the second matrix and a filter matrix; and performing reduction on an output matrix.
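A small NumPy sketch of the decoupling described above is shown next: the row part is expanded into a second matrix of sliding windows, multiplied against the filter matrix, and the partial products are then reduced over the column offsets. The "valid" output size and the correlation (rather than flipped-kernel) convention are assumptions of this example.

```python
import numpy as np

def row_expand(matrix: np.ndarray, k: int) -> np.ndarray:
    """Expand each row into its k-wide sliding windows (the 'row part')."""
    h, w = matrix.shape
    out_w = w - k + 1
    # Shape (h, out_w, k): the k-wide neighborhood at every row position.
    return np.stack([matrix[:, j:j + k] for j in range(out_w)], axis=1)

def decoupled_conv2d(matrix: np.ndarray, filt: np.ndarray) -> np.ndarray:
    """2-D 'valid' correlation via row expansion, matrix multiplication, and reduction."""
    kh, kw = filt.shape
    expanded = row_expand(matrix, kw)      # second matrix built from the row part
    partial = expanded @ filt.T            # matrix multiplication with the filter matrix
    out_h = matrix.shape[0] - kh + 1
    # Reduction over the column (vertical) filter offsets gives the output matrix.
    return sum(partial[u:u + out_h, :, u] for u in range(kh))
```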
Abstract:
Aspects of the present disclosure are directed to techniques that improve the performance of streaming systems. Accordingly, we disclose efficient techniques for dynamic topology re-optimization, through the use of a feedback-driven control loop, that substantially solve a number of performance-impacting problems affecting such streaming systems. More particularly, we disclose a novel technique for network-aware tuple routing using consistent hashing that improves stream flow throughput in the presence of large run-time overhead. We also disclose methods for dynamic optimization of overlay topologies for group communication operations. To enable fast topology re-optimization with minimal system disruption, we present a lightweight, fault-tolerant protocol. All of the disclosed techniques were implemented in a real system and comprehensively validated on three real applications. We have demonstrated significant improvements in performance (20% to 200%) while overcoming various compute and network bottlenecks. We have shown that our performance improvements are robust to dynamic changes as well as complex congestion patterns. Given the importance of stream processing systems and the ubiquity of dynamic network state in cloud environments, our results represent a significant and practical solution to these problems.
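As an illustration of the network-aware tuple routing mentioned above, the sketch below routes tuples with consistent hashing over a ring of virtual nodes, so that adding or removing a downstream worker remaps only a small fraction of keys. The hash function, virtual-node count, and worker names are assumptions, not the system's actual parameters.

```python
import bisect
import hashlib

class ConsistentHashRouter:
    def __init__(self, workers, vnodes: int = 64):
        # Place several virtual nodes per worker on the hash ring.
        self._ring = sorted(
            (self._hash(f"{worker}#{i}"), worker)
            for worker in workers for i in range(vnodes))
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def route(self, tuple_key: str) -> str:
        """Return the downstream worker responsible for the given tuple key."""
        idx = bisect.bisect(self._keys, self._hash(tuple_key)) % len(self._keys)
        return self._ring[idx][1]

# Example: route stream tuples to three downstream operator instances.
router = ConsistentHashRouter(["worker-a", "worker-b", "worker-c"])
print(router.route("sensor-42"))
```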
Abstract:
Systems and methods are provided for source-to-source transformation for compiler optimization for many integrated core (MIC) coprocessors, including identifying data dependencies in candidate loops and data elements used in each iteration for arrays, profiling candidate loops to find a proper number m, wherein data transfer and computation for m iterations take an equal amount of time, and creating an outer loop outside the candidate loop, with each iteration of the outer loop executing m iterations of the candidate loop. Data streaming is performed by determining an optimum buffer size for one or more arrays and inserting code before the outer loop to create optimum-sized buffers, overlapping data transfer between central processing units (CPUs) and MICs with the computation, reusing buffers to reduce memory employed on the MICs, and reusing threads on the MICs to repeatedly launch kernels on the MICs for asynchronous data transfer.
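The sketch below shows, in schematic Python rather than generated C, the strip-mining structure described above: an outer loop whose each iteration processes m inner iterations, with the transfer of the next block intended to overlap the computation of the current one. The transfer_block and compute_block callbacks stand in for the generated offload code and are assumptions of this example.

```python
def strip_mined_offload(data, m, transfer_block, compute_block):
    """Outer loop over blocks of m iterations of the candidate loop."""
    n = len(data)
    next_block = transfer_block(data[0:min(m, n)])  # prefetch the first block
    for start in range(0, n, m):
        current = next_block
        nxt = start + m
        if nxt < n:
            # In the transformed source this transfer is asynchronous and
            # overlaps the computation below; it is shown sequentially here.
            next_block = transfer_block(data[nxt:min(nxt + m, n)])
        compute_block(current)  # m iterations of the original candidate loop
```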
Abstract:
Methods are provided. A method includes capturing a snapshot of an offload process being executed by one or more many-core processors. The offload process is in signal communication with a host process being executed by a host processor. At least the offload process is in signal communication with a monitor process. The method further includes terminating, by the monitor process, the offload process on the one or more many-core processors responsive to a communication between the monitor process and the offload process being disrupted. The snapshot includes a respective predetermined minimum set of information required to restore the offload process to the same state it had when the snapshot was taken.
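A minimal sketch of the snapshot-and-terminate behavior described above follows: the monitor records the minimum state needed to restore the offload process and terminates it when the communication channel appears disrupted. The heartbeat timeout, the pickled-dictionary snapshot format, and the signal used are illustrative assumptions.

```python
import os
import pickle
import signal
import time

def take_snapshot(state: dict, path: str) -> None:
    """Persist the minimum set of information required to restore the offload process."""
    with open(path, "wb") as f:
        pickle.dump(state, f)

def check_and_terminate(offload_pid: int, last_heartbeat: float,
                        timeout_s: float = 10.0) -> bool:
    """Terminate the offload process if communication has been disrupted."""
    if time.time() - last_heartbeat > timeout_s:
        os.kill(offload_pid, signal.SIGTERM)
        return True
    return False
```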