-
Publication number: US20230259829A1
Publication date: 2023-08-17
Application number: US18306449
Application date: 2023-04-25
Applicant: Adobe Inc.
Inventor: Georgios Theocharous, Zheng Wen, Yasin Abbasi Yadkori, Qingyun Wu
CPC classification number: G06N20/00, G06N5/04, G06F18/2193
Abstract: Methods, systems, and non-transitory computer readable storage media are disclosed for utilizing offline models to warm start online bandit learner models. For example, the disclosed system can determine relevant offline models for an environment based on reward estimate differences between the offline models and the online model. The disclosed system can then utilize the relevant offline models (if any) to select an arm for the environment. The disclosed system can update the online model based on observed rewards for the selected arm. Additionally, the disclosed system can also use entropy reduction of arms to determine the utility of the arms in differentiating relevant and irrelevant offline models. For example, the disclosed system can select an arm based on a combination of the entropy reduction of the arm and the reward estimate for the arm and use the observed reward to update an observation history.
-
Publication number: US11669768B2
Publication date: 2023-06-06
Application number: US16584082
Application date: 2019-09-26
Applicant: Adobe Inc.
Inventor: Georgios Theocharous, Zheng Wen, Yasin Abbasi Yadkori, Qingyun Wu
CPC classification number: G06F18/2193, G06N5/04, G06N20/00
Abstract: Methods, systems, and non-transitory computer readable storage media are disclosed for utilizing offline models to warm start online bandit learner models. For example, the disclosed system can determine relevant offline models for an environment based on reward estimate differences between the offline models and the online model. The disclosed system can then utilize the relevant offline models (if any) to select an arm for the environment. The disclosed system can update the online model based on observed rewards for the selected arm. Additionally, the disclosed system can also use entropy reduction of arms to determine the utility of the arms in differentiating relevant and irrelevant offline models. For example, the disclosed system can select an arm based on a combination of the entropy reduction of the arm and the reward estimate for the arm and use the observed reward to update an observation history.
-
Publication number: US20210097350A1
Publication date: 2021-04-01
Application number: US16584082
Application date: 2019-09-26
Applicant: Adobe Inc.
Inventor: Georgios Theocharous, Zheng Wen, Yasin Abbasi Yadkori, Qingyun Wu
Abstract: Methods, systems, and non-transitory computer readable storage media are disclosed for utilizing offline models to warm start online bandit learner models. For example, the disclosed system can determine relevant offline models for an environment based on reward estimate differences between the offline models and the online model. The disclosed system can then utilize the relevant offline models (if any) to select an arm for the environment. The disclosed system can update the online model based on observed rewards for the selected arm. Additionally, the disclosed system can also use entropy reduction of arms to determine the utility of the arms in differentiating relevant and irrelevant offline models. For example, the disclosed system can select an arm based on a combination of the entropy reduction of the arm and the reward estimate for the arm and use the observed reward to update an observation history.
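Illustrative sketch (not taken from the patent text): the abstract describes a loop of filtering offline models by how far their reward estimates differ from the online model's, picking an arm with help from the surviving models, and updating the online model from the observed reward. The Python below is one simplified way such a loop could look, under several assumptions that the abstract does not state: a fixed set of arms without context, a UCB-style confidence width as the relevance threshold, and a rule that caps the online upper bound by the most optimistic relevant offline estimate. The function names `relevant_models`, `select_arm`, and `update`, and the `slack` parameter, are hypothetical. The entropy-reduction criterion mentioned in the abstract is not covered.

```python
import math


def confidence_width(n_pulls, t):
    """UCB-style width (an assumed choice); infinite when the arm is unobserved."""
    if n_pulls == 0:
        return math.inf
    return math.sqrt(2.0 * math.log(max(t, 2)) / n_pulls)


def relevant_models(offline_means, online_means, counts, t, slack=0.0):
    """Indices of offline models whose estimates stay within the online
    confidence width on every arm that has online observations."""
    keep = []
    for m, means in enumerate(offline_means):
        consistent = True
        for arm, n in enumerate(counts):
            if n == 0:
                continue  # no online evidence for this arm yet
            gap = abs(means[arm] - online_means[arm])
            if gap > confidence_width(n, t) + slack:
                consistent = False
                break
        if consistent:
            keep.append(m)
    return keep


def select_arm(offline_means, online_means, counts, t):
    """Pick the arm with the highest score. With relevant offline models, the
    online upper bound is capped by the most optimistic relevant estimate;
    with none, this falls back to plain UCB on the online model."""
    relevant = relevant_models(offline_means, online_means, counts, t)
    scores = []
    for arm, n in enumerate(counts):
        online_ucb = online_means[arm] + confidence_width(n, t)
        if relevant:
            offline_best = max(offline_means[m][arm] for m in relevant)
            scores.append(min(online_ucb, offline_best))
        else:
            scores.append(online_ucb)
    return max(range(len(counts)), key=scores.__getitem__)


def update(online_means, counts, arm, reward):
    """Incremental mean update of the online model for the pulled arm."""
    counts[arm] += 1
    online_means[arm] += (reward - online_means[arm]) / counts[arm]
```

In this sketch an offline model is never deleted; it simply stops influencing arm selection once the online estimates contradict it, and if every offline model is ruled out the learner behaves like an ordinary cold-started bandit.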
-