Publication No.: US20250013915A1
Publication Date: 2025-01-09
Application No.: US18348687
Filing Date: 2023-07-07
Applicant: Google LLC
Inventors: Hyun Jin Park, Dongseong Hwang, Chang Wan Ryu
IPC: G06N20/00
Abstract: In one example aspect, the present disclosure provides an example computer-implemented method for generating feedback signals for training a machine-learned agent model. The example method can include obtaining an output of a machine-learned agent model, the output including a next state feature generated by the machine-learned agent model based on a sequence of preceding states. The example method can include processing, using a machine-learned reward model, the output and the sequence of preceding states to generate a quality indicator indicating a quality of the next state feature in view of the preceding states. The machine-learned reward model could be trained by retrieving reference data from a reference data source and computing one or more quality indicators in view of a respective training input and output(s), and the reference data. The example method can include outputting the quality indicator to a model trainer for updating the machine-learned agent model.
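Illustrative sketch (not part of the published record): the data flow described in the abstract could be outlined roughly as follows. Every name here (toy_agent, retrieve_reference, toy_reward_model, ToyTrainer, feedback_step) is hypothetical, and cosine similarity is an assumed stand-in for the quality indicator. The abstract describes reference data being used to compute quality indicators for training the reward model; for brevity, this sketch lets a stand-in reward model consult the reference data directly instead of learning from such labels.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def toy_agent(preceding_states: Sequence[Vector]) -> List[float]:
    """Hypothetical agent: predicts the next state feature as the
    element-wise mean of the preceding states."""
    n = len(preceding_states)
    return [sum(column) / n for column in zip(*preceding_states)]

def retrieve_reference(preceding_states: Sequence[Vector],
                       reference_source: Sequence[Tuple[Vector, Vector]]) -> Vector:
    """Assumed retrieval rule: pick the reference entry whose context is
    most similar to the latest preceding state, and return its next state."""
    latest = preceding_states[-1]
    best = max(reference_source, key=lambda entry: cosine(entry[0], latest))
    return best[1]

def toy_reward_model(output: Vector,
                     preceding_states: Sequence[Vector],
                     reference_source: Sequence[Tuple[Vector, Vector]]) -> float:
    """Stand-in reward model: scores the agent output against retrieved
    reference data. A learned model would be trained on such scores."""
    reference = retrieve_reference(preceding_states, reference_source)
    return cosine(output, reference)

@dataclass
class ToyTrainer:
    """Hypothetical model trainer that only records the feedback it receives."""
    received: List[float] = field(default_factory=list)

    def update(self, quality: float) -> None:
        self.received.append(quality)

def feedback_step(preceding_states: Sequence[Vector],
                  reference_source: Sequence[Tuple[Vector, Vector]],
                  trainer: ToyTrainer) -> float:
    """One pass: obtain agent output, score it, hand the score to the trainer."""
    output = toy_agent(preceding_states)
    quality = toy_reward_model(output, preceding_states, reference_source)
    trainer.update(quality)
    return quality
```

In this sketch, feedback_step would be called once per agent output; a real trainer would presumably use the returned quality indicator to update the agent's parameters rather than just storing it.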