METHOD AND APPARATUS FOR TRAINING SEMANTIC SEGMENTATION MODEL, AND METHOD AND APPARATUS FOR PERFORMING SEMANTIC SEGMENTATION ON VIDEO

    Publication No.: US20230079275A1

    Publication date: 2023-03-16

    Application No.: US17985000

    Filing date: 2022-11-10

    Abstract: The present disclosure provides a method and apparatus for training a semantic segmentation model and a method and apparatus for performing a semantic segmentation on a video. The method comprises: acquiring a training sample set, wherein a training sample in the training sample set comprises at least one sample video stream and a pixel-level annotation result of the sample video stream; modeling a spatiotemporal context between video frames in the sample video stream using an initial semantic segmentation model to obtain a context representation of the sample video stream; calculating a temporal contrastive loss based on the context representation of the sample video stream and the pixel-level annotation result of the sample video stream; and updating a parameter of the initial semantic segmentation model based on the temporal contrastive loss to obtain a trained semantic segmentation model.
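
The abstract above does not spell out the form of the temporal contrastive loss. As a rough illustration only, the sketch below assumes a supervised InfoNCE-style loss in which a pixel embedding is pulled toward same-class pixels in other frames and pushed away from different-class pixels in those frames. The function name `temporal_contrastive_loss`, the `temperature` parameter, and the flat per-pixel input layout are all hypothetical and are not taken from the patent.

```python
import math


def cosine(u, v, eps=1e-12):
    """Cosine similarity between two feature vectors (eps guards zero norms)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def temporal_contrastive_loss(embeddings, labels, frames, temperature=0.1):
    """Assumed loss form: supervised InfoNCE across frames.

    embeddings[i]: context feature vector of pixel i
    labels[i]:     class id from the pixel-level annotation
    frames[i]:     index of the video frame that pixel i belongs to
    """
    losses = []
    n = len(embeddings)
    for i in range(n):
        # Only pixels from other frames are compared, which makes the loss temporal.
        others = [j for j in range(n) if frames[j] != frames[i]]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # no same-class pixel in another frame for this anchor
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature for j in others}
        peak = max(logits.values())
        # Numerically stable log-sum-exp over all cross-frame candidates.
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits.values()))
        losses.append(-sum(logits[j] - log_denom for j in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    print(temporal_contrastive_loss(emb, labels=[0, 1, 0, 1], frames=[0, 0, 1, 1]))
```

Under this assumed form, the loss is small when same-class pixels keep similar embeddings across frames and grows when they do not; a training loop would backpropagate it through the segmentation model in an autodiff framework rather than compute it in plain Python.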

    VIDEO GENERATION METHOD, APPARATUS, ELECTRONIC DEVICE, STORAGE MEDIUM AND PROGRAM PRODUCT

    Publication No.: US20220392493A1

    Publication date: 2022-12-08

    Application No.: US17887179

    Filing date: 2022-08-12

    Abstract: This disclosure provides a video generation method, a video generation apparatus, an electronic device, a storage medium and a program product, and relates to the field of artificial intelligence technology, and in particular to the field of computer vision technology and deep learning technology. A specific implementation includes: obtaining document content information of a document; extracting, from the document content information, populating information for multiple scenes in a preset video template; populating the populating information for the multiple scenes into corresponding scenes in the preset video template, respectively, to obtain image information of the multiple scenes; generating audio information of the multiple scenes according to the populating information for the multiple scenes; generating a video of the document based on the image information and audio information of the multiple scenes.

    METHOD FOR PROCESSING SIGNAL, ELECTRONIC DEVICE, AND STORAGE MEDIUM

    Publication No.: US20230135109A1

    Publication date: 2023-05-04

    Application No.: US18050672

    Filing date: 2022-10-28

    Abstract: A method for processing a signal includes: in response to receiving an input feature map of the signal, dividing the input feature map into patches of a plurality of rows and patches of a plurality of columns, in which the input feature map represents features of the signal; selecting a row subset from the plurality of rows and a column subset from the plurality of columns, in which rows in the row subset are at least one row apart from each other, and columns in the column subset are at least one column apart from each other; and obtaining aggregated features by performing self-attention calculation on patches of the row subset and patches of the column subset.
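
The selection and aggregation steps in the abstract above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a fixed stride of 2 to keep selected rows and columns at least one apart, gathers the patches lying in the selected rows or columns into one token set, and applies plain scaled dot-product self-attention with identity projections. The names `strided_subset`, `self_attention`, and `aggregate` are hypothetical.

```python
import math


def strided_subset(count, stride=2, offset=0):
    """Pick indices offset, offset+stride, ...; stride >= 2 leaves a gap between picks."""
    if stride < 2:
        raise ValueError("stride must be >= 2 so selected indices are not adjacent")
    return list(range(offset, count, stride))


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention; queries, keys and values are the tokens themselves."""
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    out = []
    for q in tokens:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in tokens])
        out.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    return out


def aggregate(patch_grid, stride=2):
    """patch_grid[r][c] is the feature vector of the patch at row r, column c."""
    n_rows, n_cols = len(patch_grid), len(patch_grid[0])
    rows = strided_subset(n_rows, stride)
    cols = strided_subset(n_cols, stride)
    # Assumption: patches in selected rows and in selected columns form one token set.
    selected = {(r, c) for r in rows for c in range(n_cols)}
    selected |= {(r, c) for r in range(n_rows) for c in cols}
    order = sorted(selected)
    return order, self_attention([patch_grid[r][c] for r, c in order])


if __name__ == "__main__":
    grid = [[[float(r), float(c)] for c in range(4)] for r in range(4)]
    positions, features = aggregate(grid)
    print(len(positions), features[0])
```

Attending only over the selected subset reduces the number of tokens compared with attention over every patch, which is presumably the motivation for the sparse selection; the abstract itself does not state how the row and column results are combined, so the single joint token set here is one possible reading.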

    Video generation method, apparatus, electronic device, storage medium and program product

    Publication No.: US11929100B2

    Publication date: 2024-03-12

    Application No.: US17887179

    Filing date: 2022-08-12

    CPC classification number: G11B27/036 G06F16/735 G06V30/416 G10L13/08

    Abstract: This disclosure provides a video generation method, a video generation apparatus, an electronic device, a storage medium and a program product, and relates to the field of artificial intelligence technology, and in particular to the field of computer vision technology and deep learning technology. A specific implementation includes: obtaining document content information of a document; extracting, from the document content information, populating information for multiple scenes in a preset video template; populating the populating information for the multiple scenes into corresponding scenes in the preset video template, respectively, to obtain image information of the multiple scenes; generating audio information of the multiple scenes according to the populating information for the multiple scenes; generating a video of the document based on the image information and audio information of the multiple scenes.

    METHOD FOR TRAINING STUDENT NETWORK AND METHOD FOR RECOGNIZING IMAGE

    Publication No.: US20230046088A1

    Publication date: 2023-02-16

    Application No.: US17975874

    Filing date: 2022-10-28

    Abstract: Disclosed are a method for training a Student Network and a method for recognizing an image. The method includes: acquiring first prediction feature information of a sample image on the first granularity and second prediction feature information of the sample image on the second granularity by inputting the sample image into a Student Network, and acquiring first feature information of the sample image on the first granularity and second feature information of the sample image on the second granularity by inputting the sample image into a Teacher Network, and acquiring a target Student Network.
