SKELETON-BASED ACTION RECOGNITION USING BI-DIRECTIONAL SPATIAL-TEMPORAL TRANSFORMER

    Publication No.: US20220374629A1

    Publication date: 2022-11-24

    Application No.: US17315319

    Filing date: 2021-05-09

    Abstract: A bi-directional spatial-temporal transformer neural network (BDSTT) is trained to predict original coordinates of a skeletal joint in a specific frame through relative relationships of the skeletal joint to other joints and to the state of the skeletal joint in other frames. Obtain a plurality of frames comprising coordinates of the skeletal joint and coordinates of other joints. Produce a spatially masked frame by masking the original coordinates of the skeletal joint. Provide the specific frame, the spatially masked frame, and at least one more frame to a coordinate prediction head of the BDSTT. Obtain, from the coordinate prediction head, a prediction of coordinates for the skeletal joint. Adjust parameters of the BDSTT until a mean-squared error, between the prediction of coordinates for the skeletal joint and the original coordinates of the skeletal joint, converges.
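The masking and loss steps named in this abstract can be sketched in plain Python. This is only an illustration under stated assumptions: the zero mask value, the helper names, and the `neighbor_average` predictor are hypothetical stand-ins, and the predictor is a simple temporal average, not the BDSTT coordinate prediction head.

```python
from copy import deepcopy

# Assumed mask value; the abstract does not say how masked coordinates are encoded.
MASK = (0.0, 0.0, 0.0)

def mask_joint(frames, frame_idx, joint_idx):
    """Return a copy of the frames with one joint masked in one frame."""
    masked = deepcopy(frames)
    masked[frame_idx][joint_idx] = MASK
    return masked

def mse(pred, target):
    """Mean-squared error between predicted and original coordinates."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def neighbor_average(masked, frame_idx, joint_idx):
    """Placeholder predictor (not the BDSTT): average the joint over adjacent frames."""
    neighbors = [
        masked[i][joint_idx]
        for i in (frame_idx - 1, frame_idx + 1)
        if 0 <= i < len(masked)
    ]
    return tuple(sum(axis) / len(neighbors) for axis in zip(*neighbors))

# Three frames, two joints each, (x, y, z) per joint.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)],
    [(0.0, 2.0, 0.0), (1.0, 2.0, 0.0)],
]
original = frames[1][1]
masked = mask_joint(frames, frame_idx=1, joint_idx=1)
prediction = neighbor_average(masked, frame_idx=1, joint_idx=1)
loss = mse(prediction, original)
```

In a trained model the prediction would come from the network, and its parameters would be adjusted until `loss` converges.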

    Automatic generation of presentation slides from documents

    Publication No.: US11481425B2

    Publication date: 2022-10-25

    Application No.: US17181397

    Filing date: 2021-02-22

    Abstract: Systems and methods for creating presentation slides. A slide title is received and portions of source documents relevant to the title are identified based on a dense vector information retrieval machine learning process. An abstractive summary of the portions is generated based on a long form question answering machine learning process. A first presentation slide is created with the abstractive summary and the title. The first presentation slide is presented to an operator and an input indicating one of accepting or rejecting the abstractive summary is received. Based on the input indicating rejection of the abstractive summary, the abstractive summary is removed from the presentation slide and negative training feedback for the abstractive summary is provided to at least one of the dense vector information retrieval machine learning process or the long form question answering machine learning process.
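The retrieval step in this abstract can be illustrated as ranking passages by cosine similarity to the slide title. This is a minimal sketch, assuming hand-written toy vectors in place of a learned dense encoder; `cosine`, `retrieve`, and the sample passages are hypothetical and not taken from the patent.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(title_vec, passages, k=2):
    """Return the texts of the k passages most similar to the title vector."""
    ranked = sorted(passages, key=lambda p: cosine(title_vec, p["vec"]), reverse=True)
    return [p["text"] for p in ranked[:k]]

# Toy embeddings; a real system would produce these with a trained encoder.
passages = [
    {"text": "Quarterly revenue grew 12%.", "vec": [0.9, 0.1, 0.0]},
    {"text": "The office moved to a new floor.", "vec": [0.0, 0.2, 0.9]},
    {"text": "Revenue growth came from cloud sales.", "vec": [0.8, 0.3, 0.1]},
]
title_vec = [1.0, 0.0, 0.0]  # stand-in embedding of a title such as "Revenue"
top = retrieve(title_vec, passages, k=2)
```

The retrieved passages would then be passed to the summarization process, and a rejected summary would be fed back as a negative training example.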

    UNSUPERVISED VIDEO REPRESENTATION LEARNING

    Publication No.: US20220309278A1

    Publication date: 2022-09-29

    Application No.: US17216605

    Filing date: 2021-03-29

    Abstract: Unsupervised learning for video classification. One or more features from one or more video clips are extracted using a spatial-temporal encoder. The one or more extracted features are processed, using a video instance discrimination task, to generate a classification label, the classification label indicating whether two of the video clips are from a same video. The one or more extracted features are processed, using a pair-wise speed discrimination task, to generate a comparison label, the comparison label indicating a relative playback speed between two given video clips. A search is performed in a video database for a video that is similar to a given video based on the comparison label.
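The two pretext labels described here can be derived from how clips are sampled, with no manual annotation. The sketch below is one possible reading: the clip dictionary layout, the use of frame stride as playback speed, and the -1/0/1 encoding of the comparison label are assumptions, not details from the abstract.

```python
def sample_clip(video_id, frames, start, length, stride):
    """Take `length` frames from `start`, skipping by `stride` (larger stride = faster playback)."""
    return {
        "video_id": video_id,
        "stride": stride,
        "frames": frames[start:start + length * stride:stride],
    }

def pair_labels(a, b):
    """Return (same_video, relative_speed) for a pair of clips.

    same_video: 1 if both clips come from the same video, else 0.
    relative_speed: 1 if a is faster than b, -1 if slower, 0 if equal (assumed encoding).
    """
    same_video = int(a["video_id"] == b["video_id"])
    if a["stride"] == b["stride"]:
        relative_speed = 0
    else:
        relative_speed = 1 if a["stride"] > b["stride"] else -1
    return same_video, relative_speed

video_a = list(range(32))        # frame indices stand in for real frames
video_b = list(range(100, 132))

clip_1 = sample_clip("a", video_a, start=0, length=4, stride=1)
clip_2 = sample_clip("a", video_a, start=0, length=4, stride=2)
clip_3 = sample_clip("b", video_b, start=8, length=4, stride=1)

labels_12 = pair_labels(clip_1, clip_2)
labels_13 = pair_labels(clip_1, clip_3)
```

An encoder trained to predict both labels from clip features alone has to capture appearance and motion cues.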

    Summarizing Videos Via Side Information

    Publication No.: US20220129679A1

    Publication date: 2022-04-28

    Application No.: US17081239

    Filing date: 2020-10-27

    Abstract: Machine learning-based techniques for summarizing collections of data such as image and video data leveraging side information obtained from related (e.g., video) data are provided. In one aspect, a method for video summarization includes: obtaining related videos having content related to a target video; and creating a summary of the target video using information provided by the target video and side information provided by the related videos to select portions of the target video to include in the summary. The side information can include video data, still image data, text, comments, natural language descriptions, and combinations thereof.
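One simple way to picture side information guiding segment selection is keyword weighting. The sketch below is an assumption-heavy illustration: the abstract does not specify a scoring rule, so the word-count weighting, the `keywords` field, and both function names are hypothetical.

```python
from collections import Counter

def side_info_weights(related_texts):
    """Count words across text attached to related videos (e.g. descriptions, comments)."""
    return Counter(word for text in related_texts for word in text.lower().split())

def summarize(segments, weights, k):
    """Pick the k highest-scoring segments and return their names in original order."""
    scored = [
        (sum(weights[w] for w in seg["keywords"]), index)
        for index, seg in enumerate(segments)
    ]
    best = sorted(scored, reverse=True)[:k]
    return [segments[index]["name"] for _, index in sorted(best, key=lambda s: s[1])]

# Hypothetical side information from related videos.
related_texts = ["goal celebration replay", "penalty goal save"]

# Hypothetical segments of the target video with pre-extracted keywords.
segments = [
    {"name": "intro", "keywords": {"crowd"}},
    {"name": "first_goal", "keywords": {"goal", "celebration"}},
    {"name": "halftime", "keywords": {"interview"}},
    {"name": "penalty", "keywords": {"penalty", "save"}},
]

weights = side_info_weights(related_texts)
summary = summarize(segments, weights, k=2)
```

Segments whose content recurs in the related videos score higher, which is the intuition behind using side information to choose what to keep.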

    Iterative approach for weakly-supervised action localization

    Publication No.: US11257222B2

    Publication date: 2022-02-22

    Application No.: US16292847

    Filing date: 2019-03-05

    Abstract: Embodiments of the present invention are directed to a computer-implemented method for action localization. A non-limiting example of the computer-implemented method includes receiving, by a processor, a video and segmenting, by the processor, the video into a set of video segments. The computer-implemented method classifies, by the processor, each video segment into a class and calculates, by the processor, importance scores for each video segment of a class within the set of video segments. The computer-implemented method determines, by the processor, a winning video segment of the class within the set of video segments based on the importance scores for each video segment within the class, stores, by the processor, the winning video segment from the set of video segments, and removes the winning video segment from the set of video segments.
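The select, store, and remove loop in this abstract can be sketched as follows. This is a minimal illustration, assuming segments are already classified and that a caller-supplied `score_fn` provides the importance score; the field names, the function name `localize`, and the fixed number of rounds are hypothetical.

```python
def localize(segments, target_class, score_fn, rounds):
    """Repeatedly pick the top-scoring segment of a class, store it, and remove it."""
    remaining = list(segments)
    winners = []
    for _ in range(rounds):
        candidates = [s for s in remaining if s["cls"] == target_class]
        if not candidates:
            break
        # Scores are recomputed each round against the segments still remaining.
        winner = max(candidates, key=lambda s: score_fn(s, remaining))
        winners.append(winner)
        remaining.remove(winner)
    return winners, remaining

# Hypothetical classified segments; "conf" stands in for a model-derived score.
segments = [
    {"start": 0, "cls": "jump", "conf": 0.4},
    {"start": 5, "cls": "run", "conf": 0.9},
    {"start": 10, "cls": "jump", "conf": 0.8},
    {"start": 15, "cls": "jump", "conf": 0.6},
]

winners, remaining = localize(
    segments,
    target_class="jump",
    score_fn=lambda seg, rest: seg["conf"],
    rounds=2,
)
```

Removing each winner forces later rounds to surface additional segments of the action, not only the single most discriminative one.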
