ITERATIVE APPROACH FOR WEAKLY-SUPERVISED ACTION LOCALIZATION

    Publication (Announcement) No.: US20200286243A1

    Publication (Announcement) Date: 2020-09-10

    Application No.: US16292847

    Application Date: 2019-03-05

    Abstract: Embodiments of the present invention are directed to a computer-implemented method for action localization. A non-limiting example of the computer-implemented method includes receiving, by a processor, a video and segmenting, by the processor, the video into a set of video segments. The computer-implemented method classifies, by the processor, each video segment into a class and calculates, by the processor, importance scores for each video segment of a class within the set of video segments. The computer-implemented method determines, by the processor, a winning video segment of the class within the set of video segments based on the importance scores for each video segment within the class, stores, by the processor, the winning video segment from the set of video segments, and removes the winning video segment from the set of video segments.
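The iterative loop in this abstract (score the segments of a class, pick a winner, store it, remove it, repeat) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Segment` type, the `confidence_score` function, and the stopping rule (stop when no segment of the class remains, or after `max_iters` rounds) are all hypothetical, and the segment classifier is assumed to have already run.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass(frozen=True)
class Segment:
    start: float       # segment start time, in seconds
    end: float         # segment end time, in seconds
    label: str         # class assigned by a (hypothetical) segment classifier
    confidence: float  # classifier confidence for `label`

# Hypothetical scoring hook: receives a candidate segment and the segments that
# are still in the pool, so a caller can recompute scores after each removal.
ScoreFn = Callable[[Segment, List[Segment]], float]

def confidence_score(seg: Segment, pool: List[Segment]) -> float:
    """Simplest possible importance score: the classifier confidence."""
    return seg.confidence

def iterative_localize(
    segments: List[Segment],
    target: str,
    score_fn: ScoreFn = confidence_score,
    max_iters: Optional[int] = None,
) -> List[Segment]:
    """Repeatedly select the highest-scoring segment of `target`, store it,
    and remove it from the pool until no segment of that class remains."""
    pool = list(segments)
    winners: List[Segment] = []
    while max_iters is None or len(winners) < max_iters:
        candidates = [s for s in pool if s.label == target]
        if not candidates:
            break
        # Scores are recomputed every round against the shrinking pool.
        winner = max(candidates, key=lambda s: score_fn(s, pool))
        winners.append(winner)  # store the winning segment
        pool.remove(winner)     # remove it from the set of segments
    return winners

if __name__ == "__main__":
    segs = [
        Segment(0.0, 2.0, "jump", 0.6),
        Segment(2.0, 4.0, "run", 0.8),
        Segment(4.0, 6.0, "jump", 0.9),
    ]
    print([(s.start, s.end) for s in iterative_localize(segs, "jump")])
    # prints [(4.0, 6.0), (0.0, 2.0)]
```

Passing the pool to `score_fn` on every round lets a caller recompute importance against the segments that remain, which is where an iterative approach can differ from a single one-shot ranking.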

    Neural-symbolic action transformers for video question answering

    Publication (Announcement) No.: US12175384B2

    Publication (Announcement) Date: 2024-12-24

    Application No.: US17381408

    Application Date: 2021-07-21

    Abstract: Mechanisms are provided for performing artificial intelligence-based video question answering. A video parser parses an input video data sequence to generate situation data structure(s), each situation data structure comprising data elements corresponding to entities, and first relationships between entities, identified by the video parser as present in images of the input video data sequence. First machine learning computer model(s) operate on the situation data structure(s) to predict second relationship(s) between the situation data structure(s). Second machine learning computer model(s) execute on a received input question to predict an executable program to execute to answer the received question. The program is executed on the situation data structure(s) and predicted second relationship(s). An answer to the question is output based on results of executing the program.
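As a rough illustration of the symbolic half of this pipeline, the sketch below executes a hand-written program over per-frame situation graphs stored as (subject, relationship, object) triples. Everything here is an assumption for illustration: the `Situation` type, the operation names, and the program format are hypothetical; the learned components (the video parser, the relationship predictor, and the question-to-program model) are replaced by fixed inputs; and predicted relationships between situations are approximated by frame order only.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, relationship, object)

@dataclass
class Situation:
    """Hypothetical per-frame situation graph produced by a video parser."""
    frame: int
    triples: List[Triple] = field(default_factory=list)

def execute(program: Sequence[tuple], situations: Sequence[Situation]) -> list:
    """Run a toy symbolic program over situation graphs.

    Hypothetical operations (not from the patent):
      ("filter_rel", subject, relationship)  keep triples with that subject/relationship
      ("objects",)                           project each triple to its object
      ("first",)                             keep the earliest item by frame
    """
    # State is a list of (frame, item) pairs so temporal order is preserved.
    state: list = [(s.frame, t) for s in situations for t in s.triples]
    for op, *args in program:
        if op == "filter_rel":
            subj, rel = args
            state = [(f, t) for f, t in state if t[0] == subj and t[1] == rel]
        elif op == "objects":
            state = [(f, t[2]) for f, t in state]
        elif op == "first":
            state = sorted(state, key=lambda pair: pair[0])[:1]
        else:
            raise ValueError(f"unknown operation: {op}")
    return [item for _, item in state]

if __name__ == "__main__":
    situations = [
        Situation(0, [("person", "holding", "cup")]),
        Situation(1, [("person", "picked_up", "book"), ("person", "near", "table")]),
        Situation(2, [("person", "picked_up", "phone")]),
    ]
    # A program that a question-to-program model might emit for
    # "What did the person pick up first?"
    program = [("filter_rel", "person", "picked_up"), ("objects",), ("first",)]
    print(execute(program, situations))  # prints ['book']
```

Keeping execution symbolic means the answer can be traced back to the specific triples that produced it, while the learned models only need to produce the graphs and the program.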

    GENERATING A TEST DIFFUSION MODEL
    Invention Application

    Publication (Announcement) No.: US20240420455A1

    Publication (Announcement) Date: 2024-12-19

    Application No.: US18451878

    Application Date: 2023-08-18

    Abstract: Techniques for generating a synthetic dataset of objects are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer-executable components. The system can also comprise a processor, operably coupled to the memory, that can execute the computer-executable components stored in the memory. The computer-executable components can include a defining component that can define a tractable forward process associated with a diffusion model, wherein defining the tractable forward process includes inputting noise to compromise training data, resulting in compromised training data. The computer-executable components can further include a training component that uses the compromised training data to train the diffusion model to reverse the tractable forward process, wherein the training results in a compromised diffusion model.
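The abstract does not say which noise schedule the tractable forward process uses. One common tractable choice is the Gaussian forward process of DDPM-style models, where x_t can be sampled directly from x_0 as x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps with eps drawn from a standard normal. The sketch below assumes that process with a linear beta schedule; the function names and default schedule values are illustrative rather than taken from the patent, and the denoising network itself is omitted.

```python
import math
import random
from typing import List, Sequence, Tuple

def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars: List[float] = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def forward_noise(x0: Sequence[float], t: int, alpha_bars: Sequence[float],
                  rng: random.Random) -> Tuple[List[float], List[float]]:
    """Sample x_t ~ q(x_t | x_0) in closed form; returns (x_t, eps)."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps

def training_example(x0: Sequence[float], alpha_bars: Sequence[float],
                     rng: random.Random):
    """One (model input, regression target) pair for an epsilon-predicting
    denoiser: the model sees (x_t, t) and learns to recover eps."""
    t = rng.randrange(len(alpha_bars))
    xt, eps = forward_noise(x0, t, alpha_bars, rng)
    return (xt, t), eps
```

Because x_t is available in closed form for any t, training can sample timesteps independently instead of simulating the whole noising chain; regressing the network's output for (x_t, t) onto eps is what lets it run the reverse process.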

    SELF-SUPERVISED SPEECH REPRESENTATIONS BY DISENTANGLING SPEAKERS

    Publication (Announcement) No.: US20240170007A1

    Publication (Announcement) Date: 2024-05-23

    Application No.: US18053056

    Application Date: 2022-11-07

    CPC classification number: G10L25/30 G10L21/0272

    Abstract: A method, computer system, and computer program product are presented for providing a self-supervised speech representation. In one embodiment, audio input including speech utterances is received. A teacher label generator generates a label sequence from these speech utterances. A speech representation network generates a speech representation of a partially masked version of the speech utterance. Before the partial masking, the speech utterance is passed through two random transformations that alter only speaker information. A predictor then predicts the label sequence. In one embodiment, performance is assessed based on a cross-entropy loss between the generated label sequence and the predicted label sequence.
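The abstract leaves the transformations, the masking policy, and the predictor unspecified. The sketch below covers only the training objective and assumes frame-level teacher labels (integer class IDs), span masking, and a predictor that outputs per-frame logits for each of the two speaker-perturbed views; the names `span_mask`, `masked_cross_entropy`, and `two_view_loss` are hypothetical, and averaging the two views' losses is one plausible choice, not the patent's stated rule.

```python
import math
import random
from typing import List, Sequence

def span_mask(num_frames: int, start_prob: float, span: int,
              rng: random.Random) -> List[bool]:
    """Mark `span` consecutive frames as masked from each randomly chosen start."""
    mask = [False] * num_frames
    for i in range(num_frames):
        if rng.random() < start_prob:
            for j in range(i, min(i + span, num_frames)):
                mask[j] = True
    return mask

def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one frame's logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def masked_cross_entropy(logits: Sequence[Sequence[float]],
                         teacher_labels: Sequence[int],
                         mask: Sequence[bool]) -> float:
    """Mean cross-entropy between teacher labels and predictions, over masked frames only."""
    losses = [-log_softmax(frame)[label]
              for frame, label, masked in zip(logits, teacher_labels, mask) if masked]
    return sum(losses) / len(losses) if losses else 0.0

def two_view_loss(view_logits: Sequence[Sequence[Sequence[float]]],
                  teacher_labels: Sequence[int], mask: Sequence[bool]) -> float:
    """Average the masked loss over the predictions for each speaker-perturbed view."""
    return sum(masked_cross_entropy(v, teacher_labels, mask)
               for v in view_logits) / len(view_logits)
```

Since both views share the same teacher labels while differing only in speaker characteristics, minimizing this loss pushes the representation to predict content that does not depend on who is speaking.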
