METHOD AND APPARATUS FOR SYNCHRONIZING ACTIONS OF LEARNING DEVICES BETWEEN SIMULATED WORLD AND REAL WORLD

    Publication No.: US20230342665A1

    Publication Date: 2023-10-26

    Application No.: US18091520

    Filing Date: 2022-12-30

    IPC Classification: G06N20/00 B25J9/16

    CPC Classification: G06N20/00 B25J9/1605

    Abstract: Provided is a method and apparatus for synchronizing actions of robots between a simulated world and a real world. The method may include: determining whether a learning device of the simulated world and a learning device of the real world have reached a target state after one unit time; when both learning devices have reached the target state, determining a first delay time, which is the time taken by the learning device of the simulated world to reach the target state, and a second delay time, which is the time taken by the learning device of the real world to reach the target state; and performing a correction between a state of the learning device of the simulated world and a state of the learning device of the real world based on the first delay time and the second delay time.
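
The three steps in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the correction is computed, so the rule used here (the device that reaches the target state first holds for the difference between the two delay times) is purely an assumption, and the names `plan_correction`, `hold_sim`, and `hold_real` are hypothetical.

```python
from typing import Optional


def plan_correction(sim_delay: Optional[float],
                    real_delay: Optional[float]) -> Optional[dict]:
    """Sketch of one synchronization check after a unit time has elapsed.

    sim_delay / real_delay: time each learning device took to reach the
    target state, or None if it has not reached it yet.
    Returns None when either device has not reached the target state.
    """
    # Step 1: determine whether both devices reached the target state.
    if sim_delay is None or real_delay is None:
        return None

    # Step 2: first delay time (simulated world), second delay time (real world).
    first_delay = sim_delay
    second_delay = real_delay

    # Step 3 (ASSUMED rule, not from the abstract): the earlier device
    # holds its state for the gap so both worlds line up again.
    gap = second_delay - first_delay
    return {
        "first_delay": first_delay,
        "second_delay": second_delay,
        "hold_sim": max(gap, 0.0),    # simulation waits if it was faster
        "hold_real": max(-gap, 0.0),  # real device waits if it was faster
    }


if __name__ == "__main__":
    print(plan_correction(0.25, 0.75))
    print(plan_correction(None, 0.5))
```

In this sketch a simulated device that finishes earlier than its real counterpart is told to hold for the remaining gap, while a missing delay on either side postpones the correction.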

    METHOD AND APPARATUS FOR REINFORCEMENT MACHINE LEARNING

    Publication No.: US20210019644A1

    Publication Date: 2021-01-21

    Application No.: US16929975

    Filing Date: 2020-07-15

    IPC Classification: G06N7/00 G06N20/00

    Abstract: A method and an apparatus for exclusive reinforcement learning are provided. The method comprises: collecting information on states of an environment through a communication interface and performing a statistical analysis on the states using the collected information; determining, based on the results of the statistical analysis, a first state value of a first state among the states in a training phase and a second state value of a second state among the states in an inference phase; performing reinforcement learning by using one reinforcement learning unit, selected according to the first state value, from a plurality of reinforcement learning units that perform reinforcement learning from different perspectives; and selecting one of the actions determined by the plurality of reinforcement learning units based on the second state value and applying the selected action to the environment.
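
The flow described in the abstract can be sketched roughly as follows. Everything specific here is an assumption made for illustration: the statistical analysis is reduced to a mean and standard deviation over scalar states, the state value is 1 for states more than one standard deviation from the mean and 0 otherwise, and each "reinforcement learning unit" is a hypothetical stand-in that only tracks the average reward per action. The names `Unit`, `ExclusiveAgent`, and `state_value` do not come from the source.

```python
import statistics


class Unit:
    """Hypothetical stand-in for one reinforcement learning unit:
    keeps a running mean reward per action and acts greedily."""

    def __init__(self, n_actions: int):
        self.totals = [0.0] * n_actions
        self.counts = [0] * n_actions

    def learn(self, action: int, reward: float) -> None:
        self.totals[action] += reward
        self.counts[action] += 1

    def act(self) -> int:
        means = [t / c if c else 0.0
                 for t, c in zip(self.totals, self.counts)]
        return max(range(len(means)), key=means.__getitem__)


def state_value(state: float, mean: float, std: float) -> int:
    # ASSUMED rule: 1 for states far from the mean, 0 for typical states.
    return 1 if abs(state - mean) > std else 0


class ExclusiveAgent:
    def __init__(self, observed_states, n_actions: int):
        # Statistical analysis of the collected states.
        self.mean = statistics.fmean(observed_states)
        self.std = statistics.pstdev(observed_states)
        # Two units, one per state value ("different perspectives").
        self.units = [Unit(n_actions), Unit(n_actions)]

    def train(self, state: float, action: int, reward: float) -> None:
        # Training phase: only the unit matching the state value learns.
        v = state_value(state, self.mean, self.std)
        self.units[v].learn(action, reward)

    def infer(self, state: float) -> int:
        # Inference phase: every unit proposes an action; the state
        # value selects which proposal is applied to the environment.
        proposals = [unit.act() for unit in self.units]
        return proposals[state_value(state, self.mean, self.std)]


if __name__ == "__main__":
    agent = ExclusiveAgent([0.0, 1.0, 2.0, 3.0, 4.0], n_actions=2)
    agent.train(2.0, action=0, reward=1.0)  # typical state -> unit 0
    agent.train(4.0, action=1, reward=1.0)  # outlying state -> unit 1
    print(agent.infer(2.0), agent.infer(4.0))
```

The point of the sketch is the routing: exactly one unit is updated per training sample, and at inference the same state-value rule picks among the actions proposed by all units.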