Reinforcement Learning with Information Retrieval Feedback

    Publication No.: US20250013915A1

    Publication Date: 2025-01-09

    Application No.: US18348687

    Filing Date: 2023-07-07

    Applicant: Google LLC

    Abstract: In one example aspect, the present disclosure provides an example computer-implemented method for generating feedback signals for training a machine-learned agent model. The example method can include obtaining an output of a machine-learned agent model, the output including a next state feature generated by the machine-learned agent model based on a sequence of preceding states. The example method can include processing, using a machine-learned reward model, the output and the sequence of preceding states to generate a quality indicator indicating a quality of the next state feature in view of the preceding states. The machine-learned reward model can be trained by retrieving reference data from a reference data source and computing one or more quality indicators in view of a respective training input and output(s), and the reference data. The example method can include outputting the quality indicator to a model trainer for updating the machine-learned agent model.
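
    Illustrative sketch (not part of the patent text): the data flow described in the abstract can be approximated in a few lines of Python. Everything named here is hypothetical: `retrieve`, `quality_indicator`, `emit_feedback`, and the `Feedback` record are invented for illustration, and the token-overlap score is only a toy stand-in for the machine-learned reward model, which the abstract does not specify in detail. The sketch assumes states are plain strings and the reference data source is a small in-memory list of passages.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokenizer (an assumption made for this sketch)."""
    return set(text.lower().split())


def retrieve(query: str, corpus: Sequence[str], k: int = 1) -> List[str]:
    """Hypothetical retriever: rank reference passages by token overlap with the query."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return list(ranked[:k])


def quality_indicator(preceding_states: Sequence[str], next_state: str,
                      corpus: Sequence[str]) -> float:
    """Toy stand-in for the reward model.

    Returns the fraction of next-state tokens that also appear in the
    retrieved reference data. A real reward model would be learned.
    """
    query = " ".join(list(preceding_states) + [next_state])
    supported: Set[str] = set()
    for passage in retrieve(query, corpus):
        supported |= tokenize(passage)
    tokens = tokenize(next_state)
    if not tokens:
        return 0.0
    return len(tokens & supported) / len(tokens)


@dataclass
class Feedback:
    """Record handed to the model trainer (hypothetical structure)."""
    preceding_states: List[str]
    next_state: str
    quality: float


def emit_feedback(preceding_states: Sequence[str], next_state: str,
                  corpus: Sequence[str],
                  trainer_hook: Callable[[Feedback], None]) -> Feedback:
    """Score the agent output and pass the quality indicator to the trainer."""
    score = quality_indicator(preceding_states, next_state, corpus)
    feedback = Feedback(list(preceding_states), next_state, score)
    trainer_hook(feedback)
    return feedback


if __name__ == "__main__":
    corpus = ["paris is the capital of france",
              "berlin is the capital of germany"]
    history = ["what is the capital of france"]
    received: List[Feedback] = []
    # The trainer is modeled as a callback that simply collects feedback.
    emit_feedback(history, "paris is the capital", corpus, received.append)
    emit_feedback(history, "london is the capital", corpus, received.append)
    for item in received:
        print(item.next_state, item.quality)
```

    In this toy setup the output supported by the retrieved passage scores higher than the unsupported one, which is the kind of signal a trainer could use to update the agent model. How the actual reward model is trained and how the trainer consumes the indicator are left open here, as the abstract does not fix them.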
