SYSTEM AND METHOD FOR CONSISTENT CONTENT CATEGORIZATION VIA CONSISTENT SELF-TRAINING

    Publication No.: US20250124258A1

    Publication date: 2025-04-17

    Application No.: US18487487

    Filing date: 2023-10-16

    Abstract: The present teaching relates to content categorization. Supervised training data and unlabeled data clusters are used to generate augmented training data. Each unlabeled data cluster includes data samples with varying features. Weakly labeled training data is created via consistent self-training based on the supervised training data and the unlabeled data clusters, with the data samples in those clusters assigned cluster labels, so that a labeled data sample in the supervised training data and a data sample in the weakly labeled training data that share the same label have varying characteristics. Augmented training data is created from the supervised and the weakly labeled training data and is used to train a robust content categorization model via machine learning.
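The abstract does not detail the self-training procedure. The Python sketch below illustrates one plausible reading under explicit assumptions: a model trained on the supervised data predicts labels for the unlabeled samples, and "consistent" is taken to mean that every member of a cluster receives the same label, chosen by majority vote over the cluster. The pseudo-labeled samples are then merged with the supervised data and the model is retrained. `train_keyword_model` is a hypothetical toy classifier, not a real content categorization model, and all function names are illustrative.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Labeled = Tuple[str, str]  # (text, category label)


def train_keyword_model(data: List[Labeled]) -> Callable[[str], str]:
    """Toy stand-in for a real classifier: votes for labels by shared tokens."""
    token_votes: Dict[str, Counter] = {}
    for text, label in data:
        for tok in text.lower().split():
            token_votes.setdefault(tok, Counter())[label] += 1
    # Most frequent training label, used when no token of the input is known.
    fallback = Counter(label for _, label in data).most_common(1)[0][0]

    def predict(text: str) -> str:
        scores: Counter = Counter()
        for tok in text.lower().split():
            scores.update(token_votes.get(tok, {}))
        return scores.most_common(1)[0][0] if scores else fallback

    return predict


def consistent_pseudo_labels(
    clusters: Dict[int, List[str]], predict: Callable[[str], str]
) -> List[Labeled]:
    """Assign every member of a cluster the cluster's majority predicted label."""
    weak: List[Labeled] = []
    for members in clusters.values():
        if not members:
            continue  # an empty cluster contributes nothing
        votes = Counter(predict(text) for text in members)
        cluster_label = votes.most_common(1)[0][0]
        weak.extend((text, cluster_label) for text in members)
    return weak


def self_train(
    supervised: List[Labeled], clusters: Dict[int, List[str]]
) -> Tuple[List[Labeled], Callable[[str], str]]:
    """One self-training round: pseudo-label clusters, then retrain on the union."""
    teacher = train_keyword_model(supervised)
    augmented = supervised + consistent_pseudo_labels(clusters, teacher)
    return augmented, train_keyword_model(augmented)


# Usage with made-up example data.
supervised = [("stock market rally", "finance"), ("team wins final match", "sports")]
clusters = {
    0: ["market stocks fall", "bank rates rise market"],
    1: ["coach praises team", "final match tonight", "stock market tickets for the final"],
}
augmented, student = self_train(supervised, clusters)
```

In this example, "stock market tickets for the final" is individually predicted as `finance` by the first model, but it inherits the `sports` label of its cluster. That is how cluster-level consistency places differently phrased samples under one label, which matches the abstract's goal of same-label samples with varying characteristics.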

    SYSTEM AND METHOD FOR CONSISTENT CONTENT CATEGORIZATION VIA GENERATIVE AI

    Publication No.: US20250124257A1

    Publication date: 2025-04-17

    Application No.: US18487460

    Filing date: 2023-10-16

    Abstract: The present teaching relates to content categorization. Supervised training data and unlabeled data clusters are used to generate augmented training data. Each unlabeled data cluster includes data samples with varying features. Weakly labeled training data is created from new data samples generated via generative augmentation based on the supervised training data and the unlabeled data clusters. Each new data sample is generated with varying characteristics and is assigned the label of its corresponding data sample in the supervised training data. Augmented training data is created from the supervised and the weakly labeled training data and is used to train a robust content categorization model via machine learning.
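The abstract leaves the generator unspecified. The Python sketch below assumes a generative model (for example, a large language model) that is prompted with one supervised seed sample plus the samples of one unlabeled cluster and returns a rewritten sample; the generated sample keeps the seed's label. `build_prompt`, `generative_augmentation`, and `toy_generate` are hypothetical names, and `toy_generate` is a trivial placeholder rather than a real model call.

```python
from typing import Callable, Dict, List, Tuple

Labeled = Tuple[str, str]  # (text, category label)


def build_prompt(seed: str, examples: List[str]) -> str:
    """Hypothetical prompt asking a generative model to restyle `seed`."""
    shown = "\n".join(f"- {e}" for e in examples)
    return (
        "Rewrite the text so it reads like the examples, keeping its topic.\n"
        f"Examples:\n{shown}\nText: {seed}\nRewrite:"
    )


def generative_augmentation(
    supervised: List[Labeled],
    clusters: Dict[int, List[str]],
    generate: Callable[[str], str],
) -> List[Labeled]:
    """Create one new sample per (seed, cluster) pair; it inherits the seed's label."""
    weak: List[Labeled] = []
    for seed, label in supervised:
        for members in clusters.values():
            if not members:
                continue  # no examples to borrow characteristics from
            weak.append((generate(build_prompt(seed, members)), label))
    return supervised + weak


def toy_generate(prompt: str) -> str:
    """Placeholder for a real generative model call; it only tags the seed text."""
    seed = prompt.rsplit("Text: ", 1)[1].split("\nRewrite:", 1)[0]
    return f"{seed} (variant)"


# Usage with made-up example data.
supervised = [("stock market rally", "finance"), ("team wins final match", "sports")]
clusters = {
    0: ["markets slump on rate fears"],
    1: ["coach hails late winner", "fans celebrate title"],
}
augmented = generative_augmentation(supervised, clusters, toy_generate)
```

Pairing every seed with every cluster is one simple scheme chosen for illustration; the abstract does not say how seeds and clusters are matched.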
