-
1.
Publication No.: US20220391693A1
Publication Date: 2022-12-08
Application No.: US17372367
Filing Date: 2021-07-09
Applicant: Apple Inc.
Inventor: Atila Orhon
Abstract: This application relates to the use of transformer neural networks to generate dynamic parameters for use in convolutional neural networks. In various embodiments, received image data is encoded and the encoded signal is sent to both a decoder and a transformer neural network. The decoder outputs decoded data for input into a convolutional neural network. The transformer outputs a set of dynamic parameter values for input into the convolutional neural network. The convolutional neural network may use the decoded data and the set of dynamic parameter values to output instance image data identifying a number of objects in an image. In various embodiments, the decoded data is also used to generate semantic data. The semantic data may be combined with the instance data to form panoptic image data.
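The idea of feeding generated parameters into a convolution can be illustrated with a minimal sketch. It assumes, purely for illustration, that each set of dynamic parameters is a weight vector plus a bias for a 1x1 convolution and that one set corresponds to one candidate object; the abstract does not specify the layer shapes, and the names `dynamic_pointwise_conv` and `instance_masks` are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
FeatureMap = List[List[Vector]]  # height x width x channels


def dynamic_pointwise_conv(features: FeatureMap, weights: Vector, bias: float) -> List[List[float]]:
    """Apply a 1x1 convolution whose weights are supplied at run time."""
    return [
        [sum(w * c for w, c in zip(weights, pixel)) + bias for pixel in row]
        for row in features
    ]


def instance_masks(
    features: FeatureMap,
    dynamic_params: List[Tuple[Vector, float]],
    threshold: float = 0.0,
) -> List[List[List[int]]]:
    """Produce one binary mask per parameter set (assumed: one set per object)."""
    masks = []
    for weights, bias in dynamic_params:
        logits = dynamic_pointwise_conv(features, weights, bias)
        masks.append([[1 if value > threshold else 0 for value in row] for row in logits])
    return masks


# Toy stand-in for the decoder output: a 2x2 map with 2 channels per pixel.
decoded = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
# Toy stand-in for the transformer output: one (weights, bias) pair per object.
params = [([1.0, -1.0], 0.0), ([-1.0, 1.0], 0.0)]
masks = instance_masks(decoded, params)
```

In this toy case the first parameter set selects the left column and the second selects the right column, so the same decoded features yield a different mask for each object.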
-
2.
Publication No.: US11887310B2
Publication Date: 2024-01-30
Application No.: US17078086
Filing Date: 2020-10-22
Applicant: Apple Inc.
Inventor: Vignesh Jagadeesh, Atila Orhon
IPC: G06T7/00, G06T7/11, G06T7/10, G06F17/18, G06N20/00, G06N3/047, G06V10/762, G06V10/77, G06V10/82
CPC classification number: G06T7/11, G06F17/18, G06N3/047, G06N20/00, G06T7/10, G06V10/763, G06V10/7715, G06V10/82
Abstract: A first subset of pixels of an image may be labeled with an object identifier based on user interactions with the image. Pixel data representing the pixels of the image may be passed through an embedding neural network model to generate pixel embedding vectors. A prototype embedding vector associated with the object identifier may be generated based on pixel embedding vectors corresponding to the first subset of pixels. For each pixel of a second subset of pixels of the image, a probability that the pixel should be labeled with the object identifier may be determined based on the prototype embedding vector and pixel embedding vectors corresponding to the second subset of pixels. Pixels of the second subset of pixels may be labeled with the object identifier based on the determined probabilities, and the pixels in the image may be segmented based on the pixels labeled with the object identifier.
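The prototype step can be sketched as follows. This is an illustration under stated assumptions, not the claimed method: the prototype is taken to be the mean of the labeled pixels' embeddings, the probability is a softmax over negative squared Euclidean distances, and a second "background" label is introduced so the softmax has something to compare against. The function names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def prototype(embeddings: List[Vector]) -> Vector:
    """Assumed prototype: the per-dimension mean of the labeled embeddings."""
    count = len(embeddings)
    return [sum(e[d] for e in embeddings) / count for d in range(len(embeddings[0]))]


def label_probabilities(embedding: Vector, prototypes: Dict[str, Vector]) -> Dict[str, float]:
    """Softmax over negative squared distances to each prototype (an assumption)."""
    scores = {
        label: -sum((a - b) ** 2 for a, b in zip(embedding, proto))
        for label, proto in prototypes.items()
    }
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(score - peak) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Toy embeddings for pixels the user has already labeled.
labeled = {
    "object": [[1.0, 0.0], [0.8, 0.2]],
    "background": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = {label: prototype(vectors) for label, vectors in labeled.items()}

# An unlabeled pixel whose embedding lies closer to the "object" prototype.
probs = label_probabilities([0.7, 0.3], prototypes)
predicted = max(probs, key=probs.get)
```

Pixels whose probability for the object identifier is highest would then receive that label, which is the basis for the segmentation the abstract describes.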
-
3.
Publication No.: US12033075B2
Publication Date: 2024-07-09
Application No.: US17372367
Filing Date: 2021-07-09
Applicant: Apple Inc.
Inventor: Atila Orhon
Abstract: This application relates to the use of transformer neural networks to generate dynamic parameters for use in convolutional neural networks. In various embodiments, received image data is encoded and the encoded signal is sent to both a decoder and a transformer neural network. The decoder outputs decoded data for input into a convolutional neural network. The transformer outputs a set of dynamic parameter values for input into the convolutional neural network. The convolutional neural network may use the decoded data and the set of dynamic parameter values to output instance image data identifying a number of objects in an image. In various embodiments, the decoded data is also used to generate semantic data. The semantic data may be combined with the instance data to form panoptic image data.
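The final step, combining semantic data with instance data, can be sketched as a simple merge. The representation here is an assumption made for illustration: the semantic data is a grid of class ids, the instance data is a list of binary masks, and overlaps are resolved by keeping the first instance that claims a pixel. The abstract does not say how the combination is performed, and `merge_panoptic` is a hypothetical name.

```python
from typing import List, Tuple

Grid = List[List[int]]


def merge_panoptic(semantic: Grid, instance_masks: List[Grid]) -> List[List[Tuple[int, int]]]:
    """Return a grid of (class_id, instance_id); instance_id 0 means no instance."""
    panoptic = [[(class_id, 0) for class_id in row] for row in semantic]
    for instance_id, mask in enumerate(instance_masks, start=1):
        for y, row in enumerate(mask):
            for x, covered in enumerate(row):
                # First mask to claim a pixel wins (an assumed tie-breaking rule).
                if covered and panoptic[y][x][1] == 0:
                    panoptic[y][x] = (semantic[y][x], instance_id)
    return panoptic


# Toy inputs: class 7 might be "person" and class 3 "sky" (labels are illustrative).
semantic = [
    [7, 7],
    [3, 7],
]
masks = [
    [[1, 0], [0, 0]],  # instance 1
    [[0, 1], [0, 1]],  # instance 2
]
panoptic = merge_panoptic(semantic, masks)
```

Each output pixel carries both a class and an instance id, so two objects of the same class remain distinguishable while uncovered pixels keep only their class.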
-
4.
Publication No.: US20210073589A1
Publication Date: 2021-03-11
Application No.: US16821315
Filing Date: 2020-03-17
Applicant: Apple Inc.
Inventor: Atila Orhon, Marco Zuliani, Vignesh Jagadeesh
Abstract: Training a network for image processing with temporal consistency includes obtaining un-annotated frames from a video feed. A pretrained network is applied to the first frame of a first frame set comprising a plurality of frames to obtain a first prediction, wherein the pretrained network is pretrained for a first image processing task. A current version of the pretrained network is applied to each frame of the first frame set to obtain a first prediction. A content loss term is determined, based on the first prediction and a current prediction for the frame, based on the current network. A temporal consistency loss term is also determined based on a determined consistency of pixels within each frame of the first frame set. The pretrained network may be refined based on the content loss term and the temporal consistency loss term to obtain a refined network.
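The two loss terms can be combined as in the sketch below. The concrete formulas are assumptions, since the abstract names the terms but not their definitions: the content loss is taken as the mean squared error between the pretrained network's prediction and the current network's prediction for the first frame, and the temporal consistency loss as the mean squared difference between predictions for consecutive frames. The weighting factor `temporal_weight` and all function names are hypothetical.

```python
from typing import List

Prediction = List[float]  # flattened per-pixel predictions for one frame


def mse(a: Prediction, b: Prediction) -> float:
    """Mean squared error between two equally sized predictions."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def content_loss(reference: Prediction, current: Prediction) -> float:
    """Assumed content term: keep the refined network close to the pretrained one."""
    return mse(reference, current)


def temporal_loss(frame_predictions: List[Prediction]) -> float:
    """Assumed temporal term: penalize changes between consecutive frames."""
    pairs = list(zip(frame_predictions, frame_predictions[1:]))
    return sum(mse(p, q) for p, q in pairs) / len(pairs)


def total_loss(
    reference: Prediction,
    frame_predictions: List[Prediction],
    temporal_weight: float = 1.0,
) -> float:
    """Weighted sum of the content term (first frame) and the temporal term."""
    return content_loss(reference, frame_predictions[0]) + temporal_weight * temporal_loss(frame_predictions)


# Toy values: the pretrained prediction for frame 0 and current predictions for 3 frames.
reference = [1.0, 0.0]
current = [[0.5, 0.0], [0.5, 0.5], [0.5, 0.5]]
loss = total_loss(reference, current, temporal_weight=2.0)
```

Minimizing a sum of this kind pulls the refined network toward the pretrained predictions while discouraging flicker across the frames of a set; no annotated labels are needed for either term.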
-