-
1.
Publication No.: WO2020052583A1
Publication Date: 2020-03-19
Application No.: PCT/CN2019/105334
Filing Date: 2019-09-11
Applicant(s): HUAWEI TECHNOLOGIES CO., LTD.; THE ROYAL INSTITUTION FOR THE ADVANCEMENT OF LEARNING/MCGILL UNIVERSITY
Inventor(s): SHKURTI, Florian; DUDEK, Gregory; ABEYSIRIGOONAWARDENA, Yasasa; AMIRLOO ABOLFATHI, Elmira; LUO, Jun
IPC Class: G05D1/02
Abstract: A method and apparatus for generating adversarial scenarios and training an autonomous driving agent for an autonomous vehicle, using one or more sets of parameters, each set of parameters defining a respective driving scenario. A new set of parameters is generated by changing one or more parameters of one of the sets of parameters to define a new driving scenario, and the performance of the autonomous driving agent is evaluated on the new driving scenario. The generating and evaluating are repeated until the autonomous driving agent fails to satisfy a predefined performance threshold for the new driving scenario. Each instance of changing the one or more parameters is based on a prior evaluated performance of the autonomous driving agent. The autonomous driving agent is trained to update its learned policy using at least one set of parameters, including the new set of parameters.
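As a rough illustration of the loop this abstract describes, the sketch below perturbs scenario parameters, conditioned on the previously evaluated performance, until the agent's score falls below a threshold. The `evaluate` and `perturb` callables and the `cut_in_gap` parameter are hypothetical placeholders, not elements of the patent.

```python
import random

def generate_adversarial_scenario(evaluate, perturb, base_params, threshold, max_iters=100):
    """Perturb scenario parameters, guided by prior performance, until the agent fails."""
    params = dict(base_params)
    prev_score = evaluate(params)
    for _ in range(max_iters):
        # Each change to the parameters is based on the prior evaluated performance.
        params = perturb(params, prev_score)
        score = evaluate(params)
        if score < threshold:          # agent fails to satisfy the performance threshold
            return params, score
        prev_score = score
    return params, prev_score

# Toy usage with stand-in evaluate/perturb functions (not from the patent).
if __name__ == "__main__":
    evaluate = lambda p: max(0.0, 1.0 - abs(p["cut_in_gap"] - 5.0) / 10.0)
    perturb = lambda p, s: {"cut_in_gap": p["cut_in_gap"] + random.uniform(-1.0, 1.0)}
    failing_params, score = generate_adversarial_scenario(evaluate, perturb, {"cut_in_gap": 6.0}, 0.7)
    print(failing_params, score)
```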
-
2.
Publication No.: WO2022022206A1
Publication Date: 2022-02-03
Application No.: PCT/CN2021/103339
Filing Date: 2021-06-30
Inventor(s): LUO, Jun; VILLELLA, Julian; ROHANI, Mohsen; RUSU, David; ALBAN, Montgomery; BANIJAMALI, Seyed Ershad
IPC Class: G05D1/02
Abstract: Method and system for controlling the behavior of an object. The behavior of the object is controlled during a first time period by a first agent that applies a first behavior policy to map observations about the object and the environment in the first time period to corresponding control actions. Control is transitioned from the first agent to a second agent during a transition period following the first time period. The behavior of the object during a second time period following the transition period is controlled by the second agent, which applies a second behavior policy to map observations about the object and the environment in the second time period to corresponding control actions that are applied to the object. During the transition period, the first agent applies the first behavior policy to control the object, while the second agent applies the second behavior policy to map observations about the object and the environment to corresponding control actions that are not applied to the object.
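A minimal sketch of this handover scheme, assuming simple stand-in callables (`observe`, `apply_action`, `first_policy`, `second_policy`) that are not names from the patent: during the transition period the second agent runs in "shadow" mode, computing but not applying its actions.

```python
from typing import Any, Callable, Dict

Observation = Dict[str, Any]
Action = Any

def run_with_handover(observe: Callable[[], Observation],
                      apply_action: Callable[[Action], None],
                      first_policy: Callable[[Observation], Action],
                      second_policy: Callable[[Observation], Action],
                      t_first: int, t_transition: int, t_second: int) -> None:
    # First time period: the first behavior policy alone controls the object.
    for _ in range(t_first):
        apply_action(first_policy(observe()))
    # Transition period: both policies map observations to control actions,
    # but only the first policy's action is applied to the object.
    for _ in range(t_transition):
        obs = observe()
        apply_action(first_policy(obs))
        _shadow_action = second_policy(obs)  # computed, deliberately not applied
    # Second time period: control has been handed over to the second policy.
    for _ in range(t_second):
        apply_action(second_policy(observe()))
```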
-
3.
Publication No.: WO2021227536A1
Publication Date: 2021-11-18
Application No.: PCT/CN2020/141884
Filing Date: 2020-12-31
Inventor(s): GRAVES, Daniel Mark; JIN, Jun; LUO, Jun
IPC Class: G06K9/62
Abstract: Methods and systems are described for support policy learning in an agent of a robot. A general value function (GVF) is learned for a main policy, where the GVF represents future performance of the agent executing the main policy for a given state of the environment. A master policy selects an action based on the predicted accumulated success value received from the general value function. When the predicted accumulated success value is acceptable, the action selected by the master policy is execution of the main policy. When the predicted accumulated success value is not acceptable, the action selected by the master policy causes a support policy to be learned. The support policy generates a support action which, when performed, causes the robot to transition to a new state where the predicted accumulated success value is acceptable.
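The selection rule in this abstract can be pictured with the short sketch below; `gvf`, `main_policy`, `support_policy` and the `acceptable` value are hypothetical stand-ins, not elements of the patent.

```python
def master_policy(state, gvf, main_policy, support_policy, acceptable=0.8):
    """Select an action for the current state based on the GVF's prediction."""
    predicted_success = gvf(state)      # predicted accumulated success of the main policy
    if predicted_success >= acceptable:
        return main_policy(state)       # acceptable: execute the main policy
    # Not acceptable: take a support action intended to move the robot toward
    # a state where the main policy is predicted to succeed.
    return support_policy(state)
```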
-
4.
Publication No.: WO2023061500A1
Publication Date: 2023-04-20
Application No.: PCT/CN2022/125538
Filing Date: 2022-10-15
Inventor(s): SHALOUDEGI, Kiarash; YU, Yaoliang; LUO, Jun
IPC Class: G06N3/04
Abstract: Methods and systems for federated learning using a parameterized optimization algorithm are described. A central server receives, from each of a plurality of user devices, a proximal map and feedback representing the current state of that user device. The server computes an update to the optimization parameters of a parameterized optimization algorithm using the received feedback. Model updates are then computed for each user device using the received proximal maps and the parameterized optimization algorithm with the updated optimization parameters. Each model update is transmitted to the respective user device for updating its local model.
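A loose sketch of the server-side data flow this abstract describes (receive feedback, update optimizer parameters, apply each device's proximal map). The averaged-step update rule and all names below are illustrative assumptions, not the patent's actual algorithm.

```python
import numpy as np

def server_round(proximal_maps, feedback, opt_params, lr=0.1):
    """One aggregation round on the central server.

    proximal_maps: list of callables, one per user device, mapping optimizer
                   parameters to that device's model update (its proximal map).
    feedback:      list of vectors describing each device's current state.
    opt_params:    current parameters of the parameterized optimization algorithm.
    """
    # Update the optimization parameters from the devices' feedback
    # (a simple averaged gradient-style step, purely for illustration).
    opt_params = opt_params - lr * np.mean(np.stack(feedback), axis=0)
    # Compute a model update for each device using its proximal map and the
    # updated optimization parameters; each update is sent back to its device.
    updates = [prox(opt_params) for prox in proximal_maps]
    return opt_params, updates
```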
-
5.
Publication No.: WO2023041022A1
Publication Date: 2023-03-23
Application No.: PCT/CN2022/119254
Filing Date: 2022-09-16
Inventor(s): HAIGH, Cameron Goeffrey Watmough; ZHANG, Zichen; HASSANPOUR, Negar; JAVED, Khurram; LUO, Jun; FU, Yingying
Abstract: Systems and methods for computer-assisted design of an inductor are described. Target specifications for an inductor are received. An inductor design is generated segment by segment, using a reinforcement learning agent to generate segment parameters for each added segment. The reinforcement learning agent implements a policy that is learned using a reward computed from the performance of the generated inductor design relative to the target specifications. The generated inductor design is output as a candidate inductor design after determining that it satisfies a predefined performance threshold.
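The segment-by-segment loop can be sketched as below. The `agent` interface, the `evaluate` scoring function and the stopping logic are hypothetical stand-ins; a real implementation would use a trained RL policy and an electromagnetic field solver.

```python
def design_inductor(agent, evaluate, target_spec, perf_threshold, max_segments=64):
    """Grow an inductor design one segment at a time with an RL agent."""
    design = []                                            # list of segment parameter dicts
    for _ in range(max_segments):
        segment = agent.propose_segment(design, target_spec)  # policy action: next segment
        design.append(segment)
        score = evaluate(design, target_spec)              # performance vs. target specifications
        agent.observe_reward(score)                        # reward used to refine the policy
        if score >= perf_threshold:
            return design                                  # output as candidate inductor design
    return design
```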
-
6.
Publication No.: WO2022087751A1
Publication Date: 2022-05-05
Application No.: PCT/CA2021/051539
Filing Date: 2021-11-01
Applicant(s): MALEKMOHAMMADI, Saber; YAU, Tiffany Yee Kay; RASOULI, Amir; ROHANI, Mohsen; LUO, Jun; HUAWEI TECHNOLOGIES CO., LTD.
Abstract: The present disclosure relates to methods and systems for spatiotemporal graph modelling of road users in observed frames of an environment in which an autonomous vehicle operates (i.e., a traffic scene), clustering of the road users into categories, and providing the spatiotemporal graph to a trained graph convolutional neural network (GNN) to predict a future pedestrian action. The future pedestrian action is one of: the pedestrian will cross the road, or the pedestrian will not cross the road. The spatiotemporal graph provides a richer representation of the observed frames (i.e., the traffic scene).
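A minimal sketch of the data flow described here, assuming a simple frame format and a pre-trained `gnn` callable; the node features, edge construction and threshold are illustrative choices, not the patent's graph definition.

```python
import numpy as np

def build_spatiotemporal_graph(frames):
    """frames: list of {road_user_id: (x, y, category)} dicts, one per observed frame."""
    nodes, edges, index = [], [], {}
    for t, frame in enumerate(frames):
        for uid, (x, y, category) in frame.items():
            index[(t, uid)] = len(nodes)
            nodes.append([x, y, float(category), float(t)])
            # Temporal edge linking the same road user across consecutive frames.
            if (t - 1, uid) in index:
                edges.append((index[(t - 1, uid)], index[(t, uid)]))
        # Spatial edges between road users that co-occur in the same frame.
        ids = [index[(t, uid)] for uid in frame]
        edges += [(a, b) for a in ids for b in ids if a < b]
    return np.array(nodes), edges

def predict_crossing(gnn, frames):
    """Return True if the trained graph network predicts the pedestrian will cross."""
    nodes, edges = build_spatiotemporal_graph(frames)
    return gnn(nodes, edges) > 0.5
```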
-
7.
Publication No.: WO2021174770A1
Publication Date: 2021-09-10
Application No.: PCT/CN2020/110389
Filing Date: 2020-08-21
IPC Class: B60W10/06
Abstract: A system and method for path and/or motion planning, and for training such a system, are described. In one aspect, the method comprises generating a sequence of predicted occupancy grid maps (OGMs) for T-T1 time steps based on a sequence of OGMs for 0-T1 time steps, a reference map of an environment in which an autonomous vehicle is operating, and a trajectory. A cost volume is generated for the sequence of predicted OGMs. The cost volume comprises a plurality of cost maps for the T-T1 time steps. Each cost map corresponds to a predicted OGM in the sequence of predicted OGMs and has the same dimensions as the corresponding predicted OGM. Each cost map comprises a plurality of cells, and each cell represents the cost of the corresponding cell in the predicted OGM being occupied, in accordance with a policy defined by a policy function.
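The cost-volume construction can be sketched as follows, assuming a learned `predict_ogms` model and a policy-defined `cell_cost` function; both are hypothetical placeholders rather than components named in the patent.

```python
import numpy as np

def build_cost_volume(observed_ogms, reference_map, trajectory, predict_ogms, cell_cost):
    """Build a cost volume (one cost map per predicted OGM) for motion planning."""
    # Predict future OGMs from the observed OGMs, the reference map and a trajectory.
    predicted = predict_ogms(observed_ogms, reference_map, trajectory)   # shape (T, H, W)
    cost_volume = np.zeros_like(predicted, dtype=float)
    for t, ogm in enumerate(predicted):
        # Each cost map has the same H x W dimensions as its predicted OGM; each
        # cell holds the policy-defined cost of that cell being occupied.
        cost_volume[t] = cell_cost(ogm, t)
    return cost_volume
```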
-
8.
Publication No.: WO2021007706A1
Publication Date: 2021-01-21
Application No.: PCT/CN2019/095790
Filing Date: 2019-07-12
Inventor(s): JIANG, Zhiping; LUO, Jun; SUN, Haitao
IPC Class: H04B3/04
Abstract: The disclosed systems, structures, and methods are directed to an optical transceiver employing: a first optical time domain reflectometer (OTDR) module configured to generate a first OTDR signal and a second OTDR signal, the second OTDR signal being a delayed version of the first OTDR signal; a first optical supervisory channel (OSC) transmitter configured to generate a first OSC signal and a second OSC signal, the second OSC signal being a delayed version of the first OSC signal; a first wavelength division multiplexer (WDM) configured to transmit the first OSC signal interleaved with the first OTDR signal on a first optical fiber; and a second WDM configured to transmit the second OSC signal interleaved with the second OTDR signal on a second optical fiber.
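Purely as an illustration of the signal relationships in this abstract (delayed copies, one OSC/OTDR pair per fiber), here is a small numeric sketch. It models only sample-level delays; the optical hardware and the wavelength-domain multiplexing are abstracted away as a simple pairing, which is an assumption for readability.

```python
import numpy as np

def delayed(signal, delay_samples):
    """Return a copy of the signal delayed by the given number of samples."""
    return np.concatenate([np.zeros(delay_samples), signal])[: len(signal)]

def transceiver_outputs(otdr1, osc1, delay_samples):
    # The second OTDR and OSC signals are delayed versions of the first ones.
    otdr2 = delayed(otdr1, delay_samples)
    osc2 = delayed(osc1, delay_samples)
    # Each WDM combines one OSC signal with one OTDR signal onto one fiber;
    # the wavelength-domain multiplexing is abstracted here as a simple pairing.
    fiber1 = (osc1, otdr1)
    fiber2 = (osc2, otdr2)
    return fiber1, fiber2
```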
-
9.
Publication No.: WO2017121404A1
Publication Date: 2017-07-20
Application No.: PCT/CN2017/071258
Filing Date: 2017-01-16
Inventor(s): ZHANG, Jiayin; MA, Chixiang; LUO, Jun
IPC Class: H04W28/18
CPC Class: H04W74/006; H04W74/0816
Abstract: A method for indicating TXOP duration in a wireless communication system, comprising: generating, by a TXOP holder, a physical layer protocol data unit (PPDU), wherein the High Efficiency Signal field A (HE-SIGA) in the PPDU carries a TXOP duration field used to indicate, to other stations, the remaining time for which the station will use the channel; wherein the TXOP duration field includes a first part indicating the granularity used and a second part indicating the TXOP duration in units of the granularity indicated by the first part, so that different granularities can be used to indicate different TXOP durations in the system; and sending the generated PPDU.
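The two-part field can be illustrated with a small bit-packing sketch. The bit widths and the 8 us / 128 us granularity values below are assumptions chosen for the example, not values taken from the patent or from the 802.11 standard text.

```python
GRANULARITIES_US = [8, 128]   # assumed example granularities, signalled by the first part

def encode_txop_field(remaining_us: int, duration_bits: int = 6) -> int:
    """Pack remaining channel time into a (granularity index, duration) field."""
    max_units = (1 << duration_bits) - 1
    for idx, gran in enumerate(GRANULARITIES_US):
        units = remaining_us // gran
        if units <= max_units:
            # First part (granularity index) above the second part (duration units).
            return (idx << duration_bits) | units
    return ((len(GRANULARITIES_US) - 1) << duration_bits) | max_units

def decode_txop_field(field: int, duration_bits: int = 6) -> int:
    """Recover the indicated TXOP duration in microseconds."""
    idx = field >> duration_bits
    units = field & ((1 << duration_bits) - 1)
    return units * GRANULARITIES_US[idx]

# Example: 2000 us does not fit at 8 us granularity (250 > 63), so the coarser
# 128 us granularity is used and the duration is quantised to 1920 us.
assert decode_txop_field(encode_txop_field(2000)) == 1920
```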
-
10.
Publication No.: WO2021004435A1
Publication Date: 2021-01-14
Application No.: PCT/CN2020/100473
Filing Date: 2020-07-06
IPC Class: G06N3/08
Abstract: Methods and systems for training an RL agent (108) for autonomous operation of a vehicle (100) are described. The RL agent (108) is first trained, using uniformly sampled training samples, to learn a policy. After the RL agent (108) has achieved a predetermined performance goal, data is collected comprising sequences of sampled states and, for each sequence, the agent parameters and an indication of whether the RL agent (108) failed on that sequence. A failure predictor (126) is trained, using samples from the collected data, to predict a probability of failure of the RL agent (108) for a given sequence of states. Sequences of states are then collected by simulating interaction of the vehicle (100) with the environment, and a sequence of states is selected based on the probability of failure output by the failure predictor (126). The RL agent (108) is further trained based on the selected sequence of states.
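The curriculum step this abstract describes (prioritising simulated sequences by predicted failure probability) can be sketched as below; `simulate_sequence`, `failure_predictor` and `train_on_sequence` are hypothetical stand-ins, not the patent's components.

```python
def select_training_sequences(simulate_sequence, failure_predictor,
                              n_candidates=100, n_selected=10):
    """Simulate candidate state sequences and keep those most likely to cause failure."""
    candidates = [simulate_sequence() for _ in range(n_candidates)]
    # Rank candidate sequences by the failure predictor's probability of agent failure.
    ranked = sorted(candidates, key=failure_predictor, reverse=True)
    return ranked[:n_selected]

def further_train(agent, sequences, train_on_sequence):
    """Continue training the RL agent on the selected likely-failure sequences."""
    for seq in sequences:
        train_on_sequence(agent, seq)
```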
-