-
Publication No.: US20250124258A1
Publication Date: 2025-04-17
Application No.: US18487487
Filing Date: 2023-10-16
Applicant: YAHOO ASSETS LLC
Inventor: Ariel Raviv, Noa Avigdor-Elgrabli, Stav Yanovsky Daye, Michael Viderman, Guy Horowitz
IPC: G06N3/0455, G06N3/0895
Abstract: The present teaching relates to content categorization. Supervised training data and unlabeled data clusters are used to generate augmented training data. Each unlabeled data cluster includes data samples with varying features. Weakly labeled training data is created from the supervised training data and the unlabeled data clusters: via consistent self-training, the data samples in each cluster are assigned cluster labels, so that a labeled data sample in the supervised training data and a weakly labeled data sample with the same label have varying characteristics. The supervised and weakly labeled training data are combined into augmented training data, which is used to train a robust content categorization model via machine learning.
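The cluster-labeling step described in this abstract can be illustrated with a minimal Python sketch. It is one possible reading of "consistent self-training", not the method claimed in the application: a classifier already fitted on the supervised data votes on every sample in a cluster, and the cluster receives the majority label only when enough of its samples agree. The function names, the `predict` callable, and the `min_agreement` threshold are all hypothetical, and only a single labeling round is shown (no retraining loop).

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

LabeledSample = Tuple[str, str]  # (text, label)


def weakly_label_clusters(
    clusters: Sequence[Sequence[str]],
    predict: Callable[[str], str],
    min_agreement: float = 0.6,
) -> List[LabeledSample]:
    """Give every sample in a cluster the cluster's majority label.

    Assumption: `predict` is a model trained on the supervised data.
    Clusters whose majority share is below `min_agreement` are skipped.
    """
    weak: List[LabeledSample] = []
    for cluster in clusters:
        if not cluster:
            continue  # nothing to label in an empty cluster
        votes = Counter(predict(sample) for sample in cluster)
        label, count = votes.most_common(1)[0]
        if count / len(cluster) >= min_agreement:
            # The whole cluster shares one label, so samples with
            # differing surface features end up under the same label.
            weak.extend((sample, label) for sample in cluster)
    return weak


def build_augmented_training_data(
    supervised: Sequence[LabeledSample],
    clusters: Sequence[Sequence[str]],
    predict: Callable[[str], str],
    min_agreement: float = 0.6,
) -> List[LabeledSample]:
    """Concatenate the supervised data with the weakly labeled samples."""
    return list(supervised) + weakly_label_clusters(
        clusters, predict, min_agreement
    )
```

The returned list would then be passed to whatever training routine fits the categorization model; that step is outside this sketch.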
-
Publication No.: US20250124257A1
Publication Date: 2025-04-17
Application No.: US18487460
Filing Date: 2023-10-16
Applicant: YAHOO ASSETS LLC
Inventor: Ariel Raviv, Noa Avigdor-Elgrabli, Stav Yanovsky Daye, Michael Viderman, Guy Horowitz
IPC: G06N3/0455, G06N3/0895
Abstract: The present teaching relates to content categorization. Supervised training data and unlabeled data clusters are used to generate augmented training data. Each unlabeled data cluster includes data samples with varying features. Weakly labeled training data is created from new data samples, which are generated via generative augmentation based on the supervised training data and the unlabeled data clusters. Each new data sample has generated varying characteristics and is assigned the label of its corresponding data sample in the supervised training data. The supervised and weakly labeled training data are combined into augmented training data, which is used to train a robust content categorization model via machine learning.
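The label-inheritance step described in this abstract can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the method claimed in the application: the `generate` callable is a hypothetical stand-in for whatever generative model rewrites a labeled sample using a cluster's samples as style references, and pairing every labeled sample with every cluster is a simplification chosen here for brevity.

```python
from typing import Callable, List, Sequence, Tuple

LabeledSample = Tuple[str, str]  # (text, label)


def generate_weak_samples(
    supervised: Sequence[LabeledSample],
    clusters: Sequence[Sequence[str]],
    generate: Callable[[str, Sequence[str]], str],
) -> List[LabeledSample]:
    """Create new samples that inherit labels from supervised samples.

    Assumption: `generate(text, references)` returns a variant of `text`
    whose characteristics are drawn from the cluster's `references`.
    """
    weak: List[LabeledSample] = []
    for text, label in supervised:
        for cluster in clusters:
            if not cluster:
                continue  # no reference samples to condition on
            new_text = generate(text, cluster)
            # The new sample keeps the label of its source sample.
            weak.append((new_text, label))
    return weak


def build_generative_augmented_data(
    supervised: Sequence[LabeledSample],
    clusters: Sequence[Sequence[str]],
    generate: Callable[[str, Sequence[str]], str],
) -> List[LabeledSample]:
    """Concatenate the supervised data with the generated samples."""
    return list(supervised) + generate_weak_samples(
        supervised, clusters, generate
    )
```

In practice `generate` would wrap a generative model; a simple string-combining function is enough to exercise the data flow.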
-