ITERATIVE GENERATION OF ADVERSARIAL SCENARIOS

    Publication No.: WO2020052583A1

    Publication Date: 2020-03-19

    Application No.: PCT/CN2019/105334

    Filing Date: 2019-09-11

    IPC Classification: G05D1/02

    Abstract: A method and apparatus for generating adversarial scenarios and training an autonomous driving agent for an autonomous vehicle, using one or more sets of parameters, each set of parameters defining a respective driving scenario. A new set of parameters is generated by changing one or more parameters of one of the sets of parameters to define a new driving scenario, and the performance of the autonomous driving agent is evaluated on the new driving scenario. The generating and evaluating are repeated until the autonomous driving agent fails to satisfy a predefined performance threshold for the new driving scenario. Each instance of changing the one or more parameters is based on a previously evaluated performance of the autonomous driving agent. The autonomous driving agent is then trained, using at least one set of parameters including the new set, to update its learned policy.
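
    The abstract amounts to a search loop: perturb the scenario parameters, re-evaluate the agent, and stop once performance drops below the threshold. A minimal Python sketch of that loop follows; evaluate_agent, the perturbation scheme, and all other names are illustrative assumptions, not identifiers from the patent.

```python
import random

def find_adversarial_scenario(agent, base_params, evaluate_agent,
                              threshold=0.9, step=0.05, max_iters=1000):
    """Iteratively perturb scenario parameters until the agent's evaluated
    performance falls below the threshold (illustrative sketch only)."""
    params = dict(base_params)
    score = evaluate_agent(agent, params)
    for _ in range(max_iters):
        # Change one parameter; scale the perturbation by the prior
        # evaluated performance, per the abstract's feedback loop.
        key = random.choice(list(params))
        params[key] += random.uniform(-1.0, 1.0) * step * score
        score = evaluate_agent(agent, params)
        if score < threshold:      # agent fails on the new scenario
            return params          # adversarial parameter set found
    return None
```

    The returned parameter set would then be added to the training data so the agent's learned policy is updated on the scenario it failed.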

    SYSTEM AND METHOD FOR MANAGING FLEXIBLE CONTROL OF VEHICLES BY DIVERSE AGENTS IN AUTONOMOUS DRIVING SIMULATION

    Publication No.: WO2022022206A1

    Publication Date: 2022-02-03

    Application No.: PCT/CN2021/103339

    Filing Date: 2021-06-30

    IPC Classification: G05D1/02

    Abstract: Method and system for controlling the behavior of an object. The behavior of the object is controlled during a first time period by a first agent that applies a first behavior policy to map observations about the object and the environment in the first time period to a corresponding control action. Control is transitioned from the first agent to a second agent during a transition period following the first time period. The behavior of the object during a second time period following the transition period is controlled by a second agent that applies a second behavior policy to map observations about the object and the environment in the second time period to a corresponding control action that is applied to the object. During the transition period, the first agent applies the first behavior policy to control the object, while the second agent applies the second behavior policy to map observations about the object and the environment to corresponding control actions that are not applied to the object.
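
    A minimal sketch of the three-phase handover described above, assuming hypothetical agent and environment interfaces: during the transition period the first agent's actions are applied to the object while the second agent computes, but does not apply, its own actions.

```python
def run_episode(obj, env, agent1, agent2, t1_end, transition_end, t2_end):
    """Three-phase control: agent1 only, transition (agent2 shadowing),
    then agent2 only. All interfaces are illustrative assumptions."""
    for t in range(t2_end):
        obs = env.observe(obj)
        if t < t1_end:
            obj.apply(agent1.act(obs))     # first time period: policy 1 controls
        elif t < transition_end:
            obj.apply(agent1.act(obs))     # policy 1 still controls the object
            _ = agent2.act(obs)            # policy 2 computed but not applied
        else:
            obj.apply(agent2.act(obs))     # second time period: policy 2 controls
```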

    METHODS AND SYSTEMS FOR SUPPORT POLICY LEARNING

    Publication No.: WO2021227536A1

    Publication Date: 2021-11-18

    Application No.: PCT/CN2020/141884

    Filing Date: 2020-12-31

    IPC Classification: G06K9/62

    Abstract: Methods and systems are described for support policy learning in an agent of a robot. A general value function (GVF) is learned for a main policy, where the GVF represents the future performance of the agent executing the main policy for a given state of the environment. A master policy selects an action based on the predicted accumulated success value received from the GVF. When the predicted accumulated success value is acceptable, the action selected by the master policy is execution of the main policy. When the predicted accumulated success value is not acceptable, the master policy causes a support policy to be learned. The support policy generates a support action which, when performed, causes the robot to transition to a new state where the predicted accumulated success value is acceptable.
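
    The selection logic reduces to a threshold test on the GVF's prediction. A minimal Python sketch, with all names (gvf, main_policy, support_policy, threshold) as illustrative assumptions:

```python
def master_policy_step(state, gvf, main_policy, support_policy, threshold):
    """Execute the main policy when the GVF's predicted accumulated success
    value is acceptable; otherwise fall back to the support policy."""
    predicted_success = gvf(state)             # predicted accumulated success
    if predicted_success >= threshold:         # acceptable value
        return main_policy(state)              # master policy selects main policy
    # Support action intended to move the robot toward a state where the
    # predicted accumulated success value becomes acceptable.
    return support_policy(state)
```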

    METHODS AND SYSTEMS FOR UPDATING PARAMETERS OF A PARAMETERIZED OPTIMIZATION ALGORITHM IN FEDERATED LEARNING

    Publication No.: WO2023061500A1

    Publication Date: 2023-04-20

    Application No.: PCT/CN2022/125538

    Filing Date: 2022-10-15

    IPC Classification: G06N3/04

    Abstract: Methods and systems for federated learning using a parameterized optimization algorithm are described. A central server receives, from each of a plurality of user devices, a proximal map and feedback representing the current state of that user device. The server computes an update to the optimization parameters of a parameterized optimization algorithm using the received feedback. A model update is then computed for each user device, using the received proximal maps and the parameterized optimization algorithm with the updated optimization parameters. Each model update is transmitted to the respective user device for updating the respective model.
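
    One server round, as described, could look like the following Python sketch; the client/server interfaces and method names are hypothetical assumptions, not the patent's API.

```python
def server_round(server, clients):
    """One federated round: collect proximal maps and device-state feedback,
    update the parameterized optimizer, then send per-device model updates."""
    prox_maps, feedback = {}, []
    for c in clients:
        pm, fb = c.send_proximal_map_and_feedback()   # uplink from user device
        prox_maps[c.id] = pm
        feedback.append(fb)
    # Update the optimization parameters using the received feedback.
    server.optimizer.update_parameters(feedback)
    for c in clients:
        # Compute each device's model update with the updated optimizer.
        update = server.optimizer.compute_model_update(prox_maps[c.id])
        c.receive_model_update(update)                # downlink to user device
```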

    MACHINE-LEARNING BASED SYSTEM FOR PATH AND/OR MOTION PLANNING AND METHOD OF TRAINING

    Publication No.: WO2021174770A1

    Publication Date: 2021-09-10

    Application No.: PCT/CN2020/110389

    Filing Date: 2020-08-21

    IPC Classification: B60W10/06

    Abstract: A system and method for path and/or motion planning, and for training such a system, are described. In one aspect, the method comprises generating a sequence of predicted occupancy grid maps (OGMs) for T−T₁ time steps, based on a sequence of OGMs for time steps 0 to T₁, a reference map of the environment in which an autonomous vehicle is operating, and a trajectory. A cost volume is generated for the sequence of predicted OGMs. The cost volume comprises a plurality of cost maps, one for each of the T−T₁ time steps. Each cost map corresponds to a predicted OGM in the sequence of predicted OGMs and has the same dimensions as the corresponding predicted OGM. Each cost map comprises a plurality of cells, where each cell represents the cost of the corresponding cell in the predicted OGM being occupied, in accordance with a policy defined by a policy function.
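
    The cost-volume construction maps each predicted OGM through a policy-defined cost function, yielding a stack of equally sized cost maps. A minimal NumPy sketch, with an illustrative cost function (the patent does not specify one):

```python
import numpy as np

def build_cost_volume(predicted_ogms, policy_cost):
    """Stack one cost map per predicted OGM. predicted_ogms has shape
    (T - T1, H, W); the result has the same shape, each cell holding the
    policy-defined cost of that cell being occupied."""
    return np.stack([policy_cost(ogm) for ogm in predicted_ogms])

# Purely illustrative usage: 5 predicted 64x64 OGMs, with a cost
# proportional to the occupancy probability of each cell.
cost_volume = build_cost_volume(
    np.random.rand(5, 64, 64),
    lambda ogm: ogm * 10.0,
)
```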

    METHOD AND APPARATUS FOR AN OPTICAL TRANSCEIVER

    Publication No.: WO2021007706A1

    Publication Date: 2021-01-21

    Application No.: PCT/CN2019/095790

    Filing Date: 2019-07-12

    IPC Classification: H04B3/04

    Abstract: The disclosed systems, structures, and methods are directed to an optical transceiver employing: a first optical time domain reflectometer (OTDR) module configured to generate a first OTDR signal and a second OTDR signal, the second OTDR signal being a delayed version of the first OTDR signal; a first optical supervisory channel (OSC) transmitter configured to generate a first OSC signal and a second OSC signal, the second OSC signal being a delayed version of the first OSC signal; a first wavelength division multiplexer (WDM) configured to transmit the first OSC signal interleaved with the first OTDR signal on a first optical fiber; and a second WDM configured to transmit the second OSC signal interleaved with the second OTDR signal on a second optical fiber.
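
    A structural sketch of the described layout, in Python, assuming a hypothetical signal model: each fiber carries one OSC signal interleaved with one OTDR signal, and the second fiber's signals are delayed copies of the first fiber's.

```python
from dataclasses import dataclass

@dataclass
class OpticalSignal:
    kind: str           # "OSC" or "OTDR"
    delay: float = 0.0  # seconds; the second copies are delayed versions

def transceiver_outputs(delta: float):
    """Model the two-fiber layout: each WDM interleaves one OSC signal with
    one OTDR signal; fiber 2 carries delayed copies of the fiber-1 signals."""
    return {
        "fiber1": (OpticalSignal("OSC"), OpticalSignal("OTDR")),
        "fiber2": (OpticalSignal("OSC", delta), OpticalSignal("OTDR", delta)),
    }
```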

    DATA TRANSMISSION METHOD AND APPARATUS IN WLAN

    Publication No.: WO2017121404A1

    Publication Date: 2017-07-20

    Application No.: PCT/CN2017/071258

    Filing Date: 2017-01-16

    IPC Classification: H04W28/18

    CPC Classification: H04W74/006 H04W74/0816

    Abstract: A method for indicating TXOP duration in a wireless communication system, comprising: generating, by a TXOP holder, a physical layer protocol data unit (PPDU), wherein the High Efficiency Signal field A (HE-SIGA) in the PPDU carries a TXOP duration field used to indicate to other stations the remaining time during which the station will use the channel, the TXOP duration field including a first part that indicates the granularity used and a second part that indicates the TXOP duration in units of the granularity indicated by the first part, so that different granularities can be used to indicate different TXOP durations in the system; and sending the generated PPDU.
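
    The two-part field could be packed as a granularity selector plus a value expressed in units of that granularity. The Python sketch below assumes a 7-bit field (1 granularity bit, 6 value bits) and granularities of 8 µs and 128 µs; these widths and values are illustrative assumptions, not taken from the abstract.

```python
# Assumed example granularities in microseconds (illustrative only).
GRANULARITIES_US = (8, 128)

def encode_txop_duration(remaining_us: int) -> int:
    """Pick the finest granularity that still fits, then pack the field:
    first part (bit 6) = granularity, second part (bits 0-5) = duration."""
    for g_bit, g in enumerate(GRANULARITIES_US):
        value = remaining_us // g
        if value < 64:                    # fits in the 6 value bits
            return (g_bit << 6) | value
    return (1 << 6) | 63                  # saturate at the maximum duration

def decode_txop_duration(field: int) -> int:
    granularity = GRANULARITIES_US[(field >> 6) & 0x1]  # first part
    return (field & 0x3F) * granularity   # second part, in that granularity
```

    For example, 1000 µs does not fit at 8 µs granularity (125 ≥ 64), so it is encoded at 128 µs granularity and decodes back to 896 µs.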


    METHOD AND SYSTEM FOR TRAINING REINFORCEMENT LEARNING AGENT USING ADVERSARIAL SAMPLING

    Publication No.: WO2021004435A1

    Publication Date: 2021-01-14

    Application No.: PCT/CN2020/100473

    Filing Date: 2020-07-06

    IPC Classification: G06N3/08

    Abstract: Methods and systems for training an RL agent (108) for autonomous operation of a vehicle (100) are described. The RL agent (108) is trained using uniformly sampled training samples to learn a policy. After the RL agent (108) has achieved a predetermined performance goal, data is collected comprising sequences of sampled states and, for each sequence, the agent parameters and an indication of whether the RL agent (108) failed on that sequence. A failure predictor (126) is trained, using samples from the collected data, to predict the probability of failure of the RL agent (108) for a given sequence of states. Sequences of states are collected by simulating interaction of the vehicle (100) with the environment. Based on the probability of failure output by the failure predictor (126), a sequence of states is selected, and the RL agent (108) is further trained on the selected sequence of states.
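
    The adversarial-sampling step reduces to scoring candidate rollouts with the failure predictor and training on the most failure-prone one. A minimal Python sketch, with simulator, rollout, and train_on as hypothetical names:

```python
def adversarial_training_step(agent, failure_predictor, simulator,
                              n_candidates=32):
    """Simulate candidate state sequences, select the one with the highest
    predicted failure probability, and train the agent further on it."""
    candidates = [simulator.rollout(agent) for _ in range(n_candidates)]
    failure_probs = [failure_predictor(seq) for seq in candidates]
    hardest = candidates[failure_probs.index(max(failure_probs))]
    agent.train_on(hardest)               # further training on the hard case
    return hardest
```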