Patent search ap:("Google LLC") AND inv:"Kazuma Hashimoto" Page 1

1.

Published invention application
Corrective Reward Optimization for Sequential Labeling (Status: under examination, published)

Publication (announcement) number: US20240070456A1

Publication (announcement) date: 2024-02-29

Application number: US18240954

Filing date: 2023-08-31

Applicant: Google LLC

Inventor: Karthik Raman, Kazuma Hashimoto

IPC: G06N3/08

CPC classification number: G06N3/08

Abstract: Provided are systems and methods for corrective reward optimization for generative sequential labeling. In particular, example aspects of the present disclosure are directed to an effective framework for generative reward optimization of text (or other) data sequences, certain example implementations of which can be referred to as “GROOT”. Example implementations of the proposed framework work by training a generative sequential labeling model to match the decoder output distribution with that of the (possibly black-box) reward function. Using an iterative training regime, the framework can first generate prediction candidates and then correct errors in the candidate. Finally, a loss function can be used that contrasts those candidates based on their reward values (e.g., as measured by a reward function that encodes the specific objectives for a particular setting or application).
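Illustrative sketch (not taken from the patent text): the abstract describes generating candidates, correcting their errors, and contrasting candidates by reward, but it does not give a concrete loss. The Python below is one possible reading under stated assumptions. The reward (`token_accuracy`), the correction step (`correct_first_error`), the loss form (`reward_matching_loss`, a cross-entropy between a softmax over rewards and a softmax over model scores) and the `temperature` parameter are all hypothetical choices made for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def token_accuracy(candidate, gold):
    """Hypothetical reward: fraction of labels that match the gold sequence."""
    return sum(c == g for c, g in zip(candidate, gold)) / len(gold)


def correct_first_error(candidate, gold):
    """Hypothetical correction step: fix the first wrong label in a candidate."""
    fixed = list(candidate)
    for i, (c, g) in enumerate(zip(candidate, gold)):
        if c != g:
            fixed[i] = g
            break
    return fixed


def reward_matching_loss(model_scores, rewards, temperature=1.0):
    """Assumed loss: cross-entropy between the reward-derived distribution
    over candidates and the model's distribution over the same candidates."""
    target = softmax([r / temperature for r in rewards])
    predicted = softmax(model_scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


gold = ["B", "I", "O"]
candidate = ["B", "O", "O"]                      # generated candidate with one error
corrected = correct_first_error(candidate, gold)  # corrected candidate
candidates = [candidate, corrected]
rewards = [token_accuracy(c, gold) for c in candidates]

# Hypothetical model log-scores for the two candidates.
loss_prefers_worse = reward_matching_loss([-1.0, -2.0], rewards)
loss_prefers_better = reward_matching_loss([-2.0, -1.0], rewards)
print(loss_prefers_worse > loss_prefers_better)
```

Under these assumptions the loss is smaller when the model scores the higher-reward (corrected) candidate above the lower-reward one, which is the contrast the abstract refers to.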
