WARM STARTING AN ONLINE BANDIT LEARNER MODEL UTILIZING RELEVANT OFFLINE MODELS

    Publication Number: US20230259829A1

    Publication Date: 2023-08-17

    Application Number: US18306449

    Application Date: 2023-04-25

    Applicant: Adobe Inc.

    CPC classification number: G06N20/00 G06N5/04 G06F18/2193

    Abstract: Methods, systems, and non-transitory computer readable storage media are disclosed for utilizing offline models to warm start online bandit learner models. For example, the disclosed system can determine relevant offline models for an environment based on reward estimate differences between the offline models and the online model. The disclosed system can then utilize the relevant offline models (if any) to select an arm for the environment. The disclosed system can update the online model based on observed rewards for the selected arm. Additionally, the disclosed system can also use entropy reduction of arms to determine the utility of the arms in differentiating relevant and irrelevant offline models. For example, the disclosed system can select an arm based on a combination of the entropy reduction of the arm and the reward estimate for the arm and use the observed reward to update an observation history.
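
    The abstract describes screening offline models for relevance by comparing their reward estimates against the online model's, then selecting an arm with whichever models survive the screen. The sketch below illustrates that idea under simplifying assumptions (per-arm mean-reward models and a fixed relevance tolerance); the class and function names are illustrative and not the patented implementation.

```python
from dataclasses import dataclass, field

@dataclass
class ArmModel:
    """Per-arm mean-reward estimates; stands in for both the offline and the
    online bandit models in this sketch (hypothetical data structure)."""
    estimates: dict                              # arm id -> estimated mean reward
    counts: dict = field(default_factory=dict)   # arm id -> number of observations

    def predict(self, arm):
        return self.estimates.get(arm, 0.0)

    def update(self, arm, reward):
        # Incremental mean update from an observed reward.
        n = self.counts.get(arm, 0) + 1
        self.counts[arm] = n
        prev = self.estimates.get(arm, 0.0)
        self.estimates[arm] = prev + (reward - prev) / n

def relevant_offline_models(offline_models, online_model, arms, tol=0.1):
    """Keep offline models whose reward estimates stay within `tol` of the
    online model's estimate on every arm (assumed relevance criterion)."""
    return [m for m in offline_models
            if all(abs(m.predict(a) - online_model.predict(a)) <= tol for a in arms)]

def select_arm(offline_models, online_model, arms, tol=0.1):
    """Choose the arm with the highest reward estimate, averaged over the
    relevant offline models if any exist, otherwise using the online model."""
    pool = relevant_offline_models(offline_models, online_model, arms, tol) or [online_model]
    return max(arms, key=lambda a: sum(m.predict(a) for m in pool) / len(pool))

# Usage: two hypothetical offline models and a partially trained online model.
offline = [ArmModel({"A": 0.7, "B": 0.2}), ArmModel({"A": 0.1, "B": 0.9})]
online = ArmModel({"A": 0.65, "B": 0.25})
arm = select_arm(offline, online, ["A", "B"])   # only the first offline model is relevant here
```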

    Utilizing relevant offline models to warm start an online bandit learner model

    Publication Number: US11669768B2

    Publication Date: 2023-06-06

    Application Number: US16584082

    Application Date: 2019-09-26

    Applicant: Adobe Inc.

    CPC classification number: G06F18/2193 G06N5/04 G06N20/00

    Abstract: Methods, systems, and non-transitory computer readable storage media are disclosed for utilizing offline models to warm start online bandit learner models. For example, the disclosed system can determine relevant offline models for an environment based on reward estimate differences between the offline models and the online model. The disclosed system can then utilize the relevant offline models (if any) to select an arm for the environment. The disclosed system can update the online model based on observed rewards for the selected arm. Additionally, the disclosed system can also use entropy reduction of arms to determine the utility of the arms in differentiating relevant and irrelevant offline models. For example, the disclosed system can select an arm based on a combination of the entropy reduction of the arm and the reward estimate for the arm and use the observed reward to update an observation history.
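
    This abstract also covers scoring arms by how much observing their reward would reduce uncertainty about which offline models are relevant. The sketch below computes that entropy reduction for Bernoulli rewards, where each offline model supplies a per-arm success probability and a weight vector encodes the current belief over the models; `beta` and the data layout are assumptions, not details drawn from the patent.

```python
import math

def entropy(weights):
    """Shannon entropy of a normalized weight vector over offline models."""
    return -sum(w * math.log(w) for w in weights if w > 0)

def expected_entropy_reduction(arm, weights, models):
    """Expected drop in entropy over the model weights after observing a
    Bernoulli reward for `arm`; models[m][arm] is model m's estimate of
    P(reward = 1 | arm) (assumed representation)."""
    probs = [m[arm] for m in models]
    p1 = sum(w * p for w, p in zip(weights, probs))         # P(reward = 1)
    post1 = [w * p for w, p in zip(weights, probs)]         # unnormalized posterior if r = 1
    post0 = [w * (1 - p) for w, p in zip(weights, probs)]   # unnormalized posterior if r = 0

    def normalized(v):
        s = sum(v)
        return [x / s for x in v] if s > 0 else v

    expected_post = p1 * entropy(normalized(post1)) + (1 - p1) * entropy(normalized(post0))
    return entropy(weights) - expected_post

def score_arm(arm, weights, models, beta=1.0):
    """Combine the reward estimate with the arm's entropy reduction, as the
    abstract describes; `beta` is a hypothetical trade-off parameter."""
    expected_reward = sum(w * m[arm] for w, m in zip(weights, models))
    return expected_reward + beta * expected_entropy_reduction(arm, weights, models)

# Example: pick the arm that best balances reward and model discrimination.
models = [{"A": 0.8, "B": 0.4}, {"A": 0.3, "B": 0.7}]   # two offline models
weights = [0.5, 0.5]                                     # current belief over the models
best = max(models[0], key=lambda a: score_arm(a, weights, models))
```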

    UTILIZING RELEVANT OFFLINE MODELS TO WARM START AN ONLINE BANDIT LEARNER MODEL

    Publication Number: US20210097350A1

    Publication Date: 2021-04-01

    Application Number: US16584082

    Application Date: 2019-09-26

    Applicant: Adobe Inc.

    Abstract: Methods, systems, and non-transitory computer readable storage media are disclosed for utilizing offline models to warm start online bandit learner models. For example, the disclosed system can determine relevant offline models for an environment based on reward estimate differences between the offline models and the online model. The disclosed system can then utilize the relevant offline models (if any) to select an arm for the environment. The disclosed system can update the online model based on observed rewards for the selected arm. Additionally, the disclosed system can also use entropy reduction of arms to determine the utility of the arms in differentiating relevant and irrelevant offline models. For example, the disclosed system can select an arm based on a combination of the entropy reduction of the arm and the reward estimate for the arm and use the observed reward to update an observation history.
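
    The final step described in these abstracts is feeding the observed reward back into the observation history and the online model. A minimal sketch of that bookkeeping is shown below, combining a likelihood-based re-weighting of the offline models with an incremental mean update of the online model; all names and the Bernoulli-reward assumption are illustrative only.

```python
def observe(arm, reward, weights, models, history, online_estimates, online_counts):
    """Record an observed reward, re-weight the offline models by how well they
    predicted it, and update the online model's running mean for the arm."""
    history.append((arm, reward))

    # Models that assigned high probability to the observed reward gain weight,
    # which gradually separates relevant offline models from irrelevant ones.
    likes = [m[arm] if reward == 1 else 1.0 - m[arm] for m in models]
    unnorm = [w * l for w, l in zip(weights, likes)]
    total = sum(unnorm)
    weights = [u / total for u in unnorm] if total > 0 else weights

    # Incremental mean update of the online bandit model for the pulled arm.
    n = online_counts.get(arm, 0) + 1
    online_counts[arm] = n
    prev = online_estimates.get(arm, 0.0)
    online_estimates[arm] = prev + (reward - prev) / n
    return weights

# Usage: after observing reward 1 on arm "A", the first offline model
# (which predicted it more strongly) gains weight: roughly [0.73, 0.27].
weights = observe("A", 1, [0.5, 0.5],
                  [{"A": 0.8, "B": 0.4}, {"A": 0.3, "B": 0.7}],
                  history=[], online_estimates={}, online_counts={})
```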
