-
1.
Publication number: US20230079275A1
Publication date: 2023-03-16
Application number: US17985000
Filing date: 2022-11-10
Inventor: Tianyi Wu, Yu Zhu, Guodong Guo
Abstract: The present disclosure provides a method and apparatus for training a semantic segmentation model and a method and apparatus for performing a semantic segmentation on a video. The method comprises: acquiring a training sample set, wherein a training sample in the training sample set comprises at least one sample video stream and a pixel-level annotation result of the sample video stream; modeling a spatiotemporal context between video frames in the sample video stream using an initial semantic segmentation model to obtain a context representation of the sample video stream; calculating a temporal contrastive loss based on the context representation of the sample video stream and the pixel-level annotation result of the sample video stream; and updating a parameter of the initial semantic segmentation model based on the temporal contrastive loss to obtain a trained semantic segmentation model.
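Illustrative sketch: the abstract does not give the form of the temporal contrastive loss, so the following is only a generic supervised contrastive (InfoNCE-style) loss, assuming pixel embeddings pooled from several frames of one video and one class label per pixel. The function name `supervised_contrastive_loss`, the `temperature` parameter, and the pooling scheme are assumptions for illustration, not details taken from the patent.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    n = math.sqrt(dot(v, v))
    return list(v) if n == 0 else [x / n for x in v]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (assumed form, not the patent's).

    embeddings: pixel feature vectors gathered across frames.
    labels: class id of each pixel, taken from the pixel-level annotation.
    Pixels with the same label are pulled together, others pushed apart.
    """
    z = [normalize(e) for e in embeddings]
    total, count = 0.0, 0
    for i in range(len(z)):
        others = [j for j in range(len(z)) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a same-class partner contributes nothing
        logits = {j: dot(z[i], z[j]) / temperature for j in others}
        m = max(logits.values())
        # Numerically stable log-sum-exp over all non-anchor pixels.
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        count += 1
    return total / count if count else 0.0
```

In a training loop, a loss of this kind would be computed on the model's context representation and back-propagated to update the model parameters, as the abstract describes at a high level.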
-
2.
Publication number: US20220392493A1
Publication date: 2022-12-08
Application number: US17887179
Filing date: 2022-08-12
Inventor: Jimin Pi, Xin Wang, Guodong Guo
IPC: G11B27/036, G06V30/416, G10L13/08, G06F16/735
Abstract: This disclosure provides a video generation method, a video generation apparatus, an electronic device, a storage medium and a program product, and relates to the field of artificial intelligence technology, and in particular to the field of computer vision technology and deep learning technology. A specific implementation includes: obtaining document content information of a document; extracting, from the document content information, populating information for multiple scenes in a preset video template; populating the populating information for the multiple scenes into corresponding scenes in the preset video template, respectively, to obtain image information of the multiple scenes; generating audio information of the multiple scenes according to the populating information for the multiple scenes; generating a video of the document based on the image information and audio information of the multiple scenes.
-
3.
Publication number: US20230135109A1
Publication date: 2023-05-04
Application number: US18050672
Filing date: 2022-10-28
Inventor: Tianyi Wu, Sitong Wu, Guodong Guo
IPC: G06K9/62
Abstract: A method for processing a signal includes: in response to receiving an input feature map of the signal, dividing the input feature map into patches of a plurality of rows and patches of a plurality of columns, in which the input feature map represents features of the signal; selecting a row subset from the plurality of rows and a column subset from the plurality of columns, in which rows in the row subset are at least one row apart from each other, and columns in the column subset are at least one column apart from each other; and obtaining aggregated features by performing self-attention calculation on patches of the row subset and patches of the column subset.
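Illustrative sketch: one way to read the abstract is that rows and columns are picked with a stride of at least 2, so that selected rows (or columns) are never adjacent, and that self-attention is then run over the patches lying in them. The code below assumes a fixed stride, separate attention passes for the row and column subsets, and identity query/key/value projections; all function names and these choices are assumptions for illustration, not details from the patent.

```python
import math


def select_spaced(n, stride=2, offset=0):
    """Indices offset, offset+stride, ... below n; stride >= 2 skips at least one index."""
    if stride < 2:
        raise ValueError("stride must be >= 2 so selected indices are not adjacent")
    return list(range(offset, n, stride))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product attention with identity projections (a simplification)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def aggregate(grid, stride=2):
    """grid[r][c] is the feature vector of the patch at row r, column c.

    Returns attention outputs for patches in the selected rows and,
    separately, for patches in the selected columns.
    """
    rows = select_spaced(len(grid), stride)
    cols = select_spaced(len(grid[0]), stride)
    row_tokens = [grid[r][c] for r in rows for c in range(len(grid[0]))]
    col_tokens = [grid[r][c] for c in cols for r in range(len(grid))]
    return self_attention(row_tokens), self_attention(col_tokens)
```

Attending only over spaced rows and columns reduces the number of tokens per attention call compared with attending over the full patch grid, while still covering the whole extent of the feature map.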
-
4.
Publication number: US11929100B2
Publication date: 2024-03-12
Application number: US17887179
Filing date: 2022-08-12
Inventor: Jimin Pi, Xin Wang, Guodong Guo
IPC: G11B27/036, G06F16/735, G06V30/416, G10L13/08
CPC classification number: G11B27/036, G06F16/735, G06V30/416, G10L13/08
Abstract: This disclosure provides a video generation method, a video generation apparatus, an electronic device, a storage medium and a program product, and relates to the field of artificial intelligence technology, and in particular to the field of computer vision technology and deep learning technology. A specific implementation includes: obtaining document content information of a document; extracting, from the document content information, populating information for multiple scenes in a preset video template; populating the populating information for the multiple scenes into corresponding scenes in the preset video template, respectively, to obtain image information of the multiple scenes; generating audio information of the multiple scenes according to the populating information for the multiple scenes; generating a video of the document based on the image information and audio information of the multiple scenes.
-
5.
Publication number: US20230046088A1
Publication date: 2023-02-16
Application number: US17975874
Filing date: 2022-10-28
Inventor: Tianyi Wu, Yu Zhu, Guodong Guo
IPC: G06V10/77, G06V10/774
Abstract: Disclosed are a method for training a Student Network and a method for recognizing an image. The method includes: acquiring first prediction feature information of a sample image on the first granularity and second prediction feature information of the sample image on the second granularity by inputting the sample image into a Student Network, and acquiring first feature information of the sample image on the first granularity and second feature information of the sample image on the second granularity by inputting the sample image into a Teacher Network, and acquiring a target Student Network.