Abstract:
The disclosure relates to technology for generating a compressed neural network. A weight tensor is received from a neural network to be compressed and is reordered to have an inner two-dimensional (2D) shape and a 2D sparse bitmap. A layered structure representing the reordered weight tensor is generated, and the reordered weight tensor is divided into groups of coefficients (GOCs). An encoding mode is selected to generate a quantized reordered weight tensor using either a codebook or direct quantization, and a column-swapped quantized reordered weight tensor is generated. A compressed representation of the neural network is formed by encoding, and the compressed representation is transmitted to a target system for decompression.
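As a rough illustration of the reorder-and-bitmap step, the following sketch (assuming NumPy; the tensor shape, inner width, and pruning threshold are illustrative, not from the disclosure) reshapes a convolution weight tensor to an inner 2D shape and derives a 2D sparse bitmap of its nonzero coefficients:

```python
import numpy as np

def reorder_weights(w, inner_cols=4):
    """Flatten a 4D conv weight tensor (out, in, kh, kw) into a 2D matrix
    whose inner dimension is inner_cols wide; zero-pad if needed."""
    flat = w.reshape(-1)
    pad = (-flat.size) % inner_cols
    flat = np.pad(flat, (0, pad))
    return flat.reshape(-1, inner_cols)

def sparse_bitmap(w2d, eps=1e-8):
    """2D bitmap: 1 where a coefficient must be coded, 0 where it is zero."""
    return (np.abs(w2d) > eps).astype(np.uint8)

w = np.random.randn(8, 3, 3, 3)
w[np.abs(w) < 0.5] = 0.0           # pretend the network was pruned
w2d = reorder_weights(w)
bitmap = sparse_bitmap(w2d)
print(w2d.shape, bitmap.mean())    # inner 2D shape and coefficient density
```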
Abstract:
A computer-implemented method verifies image-based authentication via one or more processors performing operations including receiving raw image data corresponding to a face identified by a facial recognition system, processing the received raw image data via a deep neural network, trained on training data that includes images of both verified and fake faces, to perform a temporal facial analysis, and generating a verification signal in response to the temporal facial analysis to indicate whether the raw image data is fake.
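A minimal sketch of the verification loop, with a stand-in for the trained deep neural network (the per-frame scoring, temporal pooling rule, threshold, and `fake_model` below are hypothetical placeholders):

```python
import numpy as np

def temporal_facial_analysis(frames, model):
    """Score each frame with the (hypothetical) trained model and pool
    the per-frame liveness scores over time."""
    scores = np.array([model(f) for f in frames])
    return scores.mean()

def verify(frames, model, threshold=0.5):
    """Emit the verification signal: True if the face appears live,
    False if the raw image data looks fake."""
    return temporal_facial_analysis(frames, model) >= threshold

# stand-in for a deep network trained on verified and fake faces
fake_model = lambda frame: float(frame.std() > 0.1)
frames = [np.random.rand(64, 64) for _ in range(16)]
print(verify(frames, fake_model))
```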
Abstract:
A system and method of tracking an object and navigating an object-tracking robot include receiving tracking sensor input representing the object and an environment at multiple times; responsive to the tracking sensor input, calculating positions of the robot and the object at the multiple times; and using a computer-implemented deep reinforcement learning (DRL) network trained as a function of tracking quality rewards and robot navigation path quality rewards. The DRL network is responsive to the calculated positions of the robot and the object at the multiple times to determine possible actions specifying movement of the object-tracking robot from its current position, determine quality values (Q-values) for the possible actions, and select an action as a function of the Q-values. A method of training the DRL network is also included.
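The action-selection step could look roughly like the following sketch, where the action set, stand-in Q-network, reward weights, and epsilon-greedy exploration are illustrative assumptions rather than the disclosed design:

```python
import numpy as np

ACTIONS = ["stop", "forward", "back", "left", "right"]

def select_action(q_net, robot_pos, object_pos, epsilon=0.1):
    """Epsilon-greedy selection over the Q-values the DRL network assigns
    to each candidate movement of the tracking robot."""
    if np.random.rand() < epsilon:               # explore
        return np.random.randint(len(ACTIONS))
    q_values = q_net(np.concatenate([robot_pos, object_pos]))
    return int(np.argmax(q_values))              # exploit

def reward(track_quality, path_quality, w_track=0.7, w_path=0.3):
    """Training reward combining tracking quality and navigation-path
    quality, with illustrative weights."""
    return w_track * track_quality + w_path * path_quality

q_net = lambda state: np.random.rand(len(ACTIONS))  # stand-in for the trained net
print(ACTIONS[select_action(q_net, np.zeros(2), np.ones(2))])
```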
Abstract:
A computer-implemented method for three-dimensional (3D) reconstruction of a dynamic scene includes receiving a plurality of color image sequences from a plurality of color imaging sensors and at least one depth image sequence from at least one depth imaging sensor, where the number of color imaging sensors is larger than the number of depth imaging sensors. A plurality of calibrated color image sequences and at least one calibrated depth image sequence are generated based on the plurality of color image sequences and the at least one depth image sequence. A plurality of initial 3D patches is constructed using the plurality of calibrated color image sequences and the at least one calibrated depth image sequence. A 3D patch cloud is generated by expanding the plurality of initial 3D patches.
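A toy sketch of constructing initial 3D patches from one calibrated depth image, assuming a pinhole camera model; the intrinsics, and the jitter-based stand-in for the disclosed expansion step, are illustrative assumptions:

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift each calibrated depth pixel to a 3D point (one seed per patch)
    using the pinhole model."""
    v, u = np.indices(depth.shape)
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

def expand_patches(seeds, radius=0.01, copies=4):
    """Grow a denser 3D patch cloud by jittering each seed point; a toy
    stand-in for expanding the initial patches into their neighborhoods."""
    jitter = np.random.uniform(-radius, radius, (copies, *seeds.shape))
    return np.concatenate([seeds[None], seeds[None] + jitter]).reshape(-1, 3)

depth = np.full((4, 4), 2.0)                 # toy calibrated depth image
seeds = backproject(depth, fx=500, fy=500, cx=2, cy=2)
cloud = expand_patches(seeds)
print(cloud.shape)
```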
Abstract:
A method for fine-grained object recognition in a robotic system is disclosed that includes obtaining an image of an object from an imaging device. Based on the image, a deep category-level detection neural network is used to detect pre-defined categories of objects. A feature map is generated for each pre-defined category of object detected by the deep category-level detection neural network. Embedded features are generated, based on the feature map, using a deep instance-level detection neural network corresponding to the pre-defined category of the object, wherein each pre-defined category of object has a corresponding, distinct instance-level detection neural network. An instance level of the object is determined based on classification of the embedded features.
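A sketch of the category-then-instance routing, in which the detected category selects its own instance-level network; the embedding functions, prototypes, and stand-in detector below are hypothetical:

```python
import numpy as np

# one (hypothetical) instance-level embedding network per category
instance_nets = {
    "cup":    lambda fmap: fmap.mean(axis=(0, 1)),
    "bottle": lambda fmap: fmap.max(axis=(0, 1)),
}

def recognize(image, category_net, instance_prototypes):
    """Route the feature map for the detected category to that category's
    instance-level network, then classify the embedded features by
    nearest prototype."""
    category, fmap = category_net(image)
    embedding = instance_nets[category](fmap)
    protos = instance_prototypes[category]
    dists = {name: np.linalg.norm(embedding - p) for name, p in protos.items()}
    return category, min(dists, key=dists.get)

category_net = lambda img: ("cup", np.random.rand(8, 8, 16))  # stand-in detector
prototypes = {"cup": {"mug_a": np.zeros(16), "mug_b": np.ones(16)}}
print(recognize(np.zeros((64, 64, 3)), category_net, prototypes))
```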
Abstract:
A computing device includes a communication interface, a memory, and processing circuitry. The processing circuitry is coupled to the communication interface and to the memory and is configured to execute operational instructions stored in the memory to perform various functions. The computing device is configured to process a video frame of a video segment on a per-frame basis, based on joint human-object interactive activity (HOIA), to generate a per-frame pairwise human-object interactive (HOI) feature based on a plurality of candidate HOI pairs. The computing device is also configured to process the per-frame pairwise HOI feature to identify a valid HOI pair among the plurality of candidate HOI pairs and to track the valid HOI pair through subsequent frames of the video segment to generate a contextual spatial-temporal feature for the valid HOI pair to be used in activity detection.
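A rough sketch of forming per-frame pairwise HOI features from candidate human and object boxes and keeping the valid pairs for tracking; the feature layout, scorer, and threshold are illustrative assumptions:

```python
import numpy as np

def pairwise_hoi_feature(human_box, object_box):
    """A simple per-frame pairwise feature: both boxes plus their relative
    offset (an illustrative stand-in for a learned feature)."""
    h, o = np.asarray(human_box, float), np.asarray(object_box, float)
    return np.concatenate([h, o, o - h])

def valid_pairs(humans, objects, scorer, threshold=0.5):
    """Score every candidate human-object pair and keep the valid ones to
    be tracked through subsequent frames."""
    kept = []
    for hi, h in enumerate(humans):
        for oi, o in enumerate(objects):
            if scorer(pairwise_hoi_feature(h, o)) >= threshold:
                kept.append((hi, oi))
    return kept

# stand-in scorer: nearby boxes score high, distant ones low
scorer = lambda feat: 1.0 / (1.0 + 0.1 * np.abs(feat[-4:]).sum())
print(valid_pairs([[0, 0, 10, 20]], [[1, 1, 10, 20], [90, 90, 5, 5]], scorer))
```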
Abstract:
A method includes determining, for each reachable source-destination (S-D) node pair, a maximal number of optical channel (OCh) paths available for S-D node pair connections and, if a preselected latency threshold is specified, determining a maximal number of the available OCh paths satisfying the preselected latency threshold. The available OCh paths are reported to a Multi-Domain Service Coordinator (MDSC).
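A small sketch of the per-pair path computation, assuming NetworkX and a per-link `latency` attribute (both assumptions; the disclosure does not name a library or a link model):

```python
import itertools
import networkx as nx

def och_path_report(g, latency_threshold=None):
    """For each reachable S-D node pair, count the available OCh paths and,
    if a latency threshold is specified, the paths satisfying it."""
    report = {}
    for s, d in itertools.permutations(g.nodes, 2):
        paths = list(nx.all_simple_paths(g, s, d))
        if not paths:
            continue                      # pair is not reachable
        entry = {"available": len(paths)}
        if latency_threshold is not None:
            latency = lambda p: sum(g[u][v]["latency"] for u, v in zip(p, p[1:]))
            entry["within_threshold"] = sum(latency(p) <= latency_threshold
                                            for p in paths)
        report[(s, d)] = entry
    return report                          # e.g. to be reported to the MDSC

g = nx.Graph()
g.add_weighted_edges_from([("A", "B", 5), ("B", "C", 5), ("A", "C", 20)],
                          weight="latency")
print(och_path_report(g, latency_threshold=15))
```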
Abstract:
An encoding apparatus includes a processor configured to receive a video frame including screen content and generate a block containing an index map of colors for the screen content in the video frame. The block includes a first string of index values and a second string of the index values immediately below the first string. The processor is also configured to encode, using a single available context, a second string palette_run_type flag corresponding to the second string without referencing a first string palette_run_type flag corresponding to the first string. A transmitter operably coupled to the processor is configured to transmit the second string palette_run_type flag in a bitstream to a decoding apparatus.
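A toy sketch of the single-context idea: one adaptive binary model codes every string's palette_run_type flag, and the flag of the string above is never consulted. The probability model below is a stand-in, not the actual CABAC engine:

```python
class SingleContext:
    """One adaptive binary context shared by every palette_run_type flag;
    a toy counting model, not the real arithmetic-coding context."""
    def __init__(self):
        self.ones, self.total = 1, 2      # Laplace-smoothed counts

    def encode(self, bit):
        p_one = self.ones / self.total    # probability fed to the coder
        self.ones += bit                  # adapt after each flag
        self.total += 1
        return bit, p_one

def encode_run_type_flags(flags):
    """Encode each string's flag independently of the string above it,
    using the single available context."""
    ctx = SingleContext()
    return [ctx.encode(flag) for flag in flags]

# flags for the first string and the strings below it
print(encode_run_type_flags([0, 1, 1, 0]))
```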
Abstract:
An encoding apparatus, a decoding apparatus, and coding methods are provided. A method of decoding includes receiving, by a decoder, a bitstream from an encoder; scanning, using the decoder, the bitstream to identify a first flag corresponding to a string of index values in a block other than a last string and a second flag corresponding to the last string of index values in the block; determining, by the decoder, that a context model used to encode the first flag is the same as the context model used to encode the second flag; and generating, by the decoder, a video frame using the context model.
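A matching decoder-side sketch: the flags for the non-last strings and the last string are all decoded with one shared context, so no separate model is kept for the last string (again a toy model, not real CABAC):

```python
class SharedContext:
    """The single context model applied to every string's flag, including
    the last string of the block (toy stand-in for the CABAC context)."""
    def __init__(self):
        self.ones, self.total = 1, 2

    def decode(self, bit):
        self.ones += bit                  # identical adaptation for every flag
        self.total += 1
        return bit

def decode_block_flags(bitstream, n_strings):
    """Scan n_strings flags from the bitstream; the last string's flag
    uses the same context as the earlier ones."""
    ctx = SharedContext()
    return [ctx.decode(bitstream[i]) for i in range(n_strings)]

print(decode_block_flags([1, 0, 1], n_strings=3))
```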
Abstract:
Provided are an apparatus and a method of multi-stage image recognition. For the multi-stage image recognition, categorized object data is received from a first deep neural network. When a second deep neural network produces invalid subcategorized object data from the categorized object data, the second deep neural network is trained on subcategory customization data that relates to a non-ideal environment, and an image recognition result is generated using the second deep neural network as trained.
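A sketch of the multi-stage control flow, where the validity check, the stand-in networks, and the customization data are hypothetical placeholders:

```python
class SecondNet:
    """Stand-in subcategory network with a validity check on its output."""
    def __init__(self):
        self.tuned = False

    def predict(self, categorized):
        return ("sedan", 0.9) if self.tuned else ("unknown", 0.2)

    def is_valid(self, result):
        return result[1] >= 0.5                 # confidence-based validity

    def train(self, customization_data):
        self.tuned = True                       # fine-tune on non-ideal data

def multi_stage_recognize(image, first_net, second_net, customization_data):
    """Categorize with the first network; if the second network's
    subcategory output is invalid, train it on the subcategory
    customization data and regenerate the recognition result."""
    categorized = first_net(image)
    result = second_net.predict(categorized)
    if not second_net.is_valid(result):
        second_net.train(customization_data)
        result = second_net.predict(categorized)
    return result

first_net = lambda img: "car"                   # stand-in category network
print(multi_stage_recognize(None, first_net, SecondNet(), ["foggy", "dim"]))
```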