Reinforcement learning algorithm applied to an obstacle-avoidance system for non-line-tracking intelligent cars

Invention publication

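CN105139072A Reinforcement learning algorithm applied to an obstacle-avoidance system for non-line-tracking intelligent cars (Status: Invalid - Withdrawn)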


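Patent title: Reinforcement learning algorithm applied to an obstacle-avoidance system for non-line-tracking intelligent cars
Patent title (English, as published): Reinforcement learning algorithm applied to non-tracking intelligent trolley barrier-avoiding system
Application number: CN201510570592.5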

Filing date: 2015-09-09
Publication (announcement) number: CN105139072A

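Publication (announcement) date: 2015-12-09
Inventors: Wang Fowei (王佛伟), Shen Bo (沈波), Wang Dong (王栋), Zhang Sijing (张似晶), Tan Hailong (谭海龙)
Applicant: Donghua University (东华大学)
Applicant address: No. 2999 North Renmin Road, Songjiang District, Shanghai
Patentee: Donghua University
Current patentee: Donghua University
Current patentee address: No. 2999 North Renmin Road, Songjiang District, Shanghai
Agency: Shanghai Shenhui Patent Agency Co., Ltd. (上海申汇专利代理有限公司)
Agent: Weng Ruoying (翁若莹)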
Main classification: G06N3/08
IPC classification: G06N3/08

Abstract:

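The invention discloses a reinforcement learning algorithm that includes a new Q-learning algorithm, implemented in the following steps. The collected data are fed into a BP neural network, and the inputs and outputs of every unit in the hidden layer and the output layer are computed for state t. The maximum output value m in state t is computed, and this output is used to judge whether a collision with an obstacle has occurred. If a collision has occurred, the thresholds of all units and all connection weights of the BP neural network are recorded. Otherwise, data are collected at time t+1 and normalized, the inputs and outputs of every hidden-layer and output-layer unit are computed for state t+1, the expected output value for state t is computed, and the thresholds of the output-layer and hidden-layer units are adjusted. The algorithm then checks whether the error is below a given threshold or the number of learning iterations exceeds a given value. If neither condition holds, learning is repeated; otherwise the thresholds of all units and all connection weights are recorded and learning ends. The invention offers good real-time performance, is fast, and supports later relearning.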

Abstract (English, as published):

The invention discloses a reinforcement learning algorithm that includes a new Q-learning algorithm. The new Q-learning algorithm comprises the following steps: inputting the collected data into a BP neural network and calculating the input and output of each unit of the hidden layer and the output layer in state t; calculating the maximum output value m in state t and, based on this output, judging whether a collision with an obstacle has occurred; if a collision has occurred, recording each unit threshold and each connection weight of the BP neural network; otherwise, collecting data at time t+1 and normalizing them, calculating the input and output of each unit of the hidden layer and the output layer in state t+1, calculating the expected output value for state t, and adjusting the thresholds of the units of the output layer and the hidden layer; then judging whether the error is smaller than a given threshold or the number of learning iterations is larger than a given value; if neither condition is satisfied, learning again; otherwise, recording the threshold of each unit and each connection weight and finishing learning. The reinforcement learning algorithm provided by the invention has good real-time performance, is fast, and allows relearning at a later stage.
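
The steps in the abstract can be sketched in Python roughly as follows. This is an illustrative sketch under stated assumptions, not the patented implementation. The abstract does not give the network size, the reward signal, the learning rate, or the formula for the expected output of state t, so the sketch assumes the standard Q-learning target (reward plus a discount factor times the maximum output in state t+1), a linear output layer, and thresholds added as bias terms. All names here (BPNetwork, ToyCorridor, learn, normalize) are hypothetical, and ToyCorridor is a one-sensor toy stand-in for the car and its environment.

```python
import math
import random


def sigmoid(x):
    x = max(-60.0, min(60.0, x))  # clamp so math.exp stays in range
    return 1.0 / (1.0 + math.exp(-x))


def normalize(readings, max_range):
    """Scale raw range readings into [0, 1]."""
    return [min(max(r / max_range, 0.0), 1.0) for r in readings]


class BPNetwork:
    """One hidden layer; each output is read as the Q-value of one action."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        u = lambda: rng.uniform(-0.5, 0.5)
        self.w1 = [[u() for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [u() for _ in range(n_hidden)]  # hidden-unit thresholds
        self.w2 = [[u() for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [u() for _ in range(n_out)]     # output-unit thresholds

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        q = [sum(w * hj for w, hj in zip(row, h)) + b  # linear output (assumption)
             for row, b in zip(self.w2, self.b2)]
        return h, q

    def train_action(self, x, action, target, lr=0.1):
        """One gradient step on the chosen action's output; returns the loss."""
        h, q = self.forward(x)
        err = target - q[action]
        # Hidden deltas are computed from the weights before they are updated.
        d_hid = [err * self.w2[action][j] * h[j] * (1.0 - h[j])
                 for j in range(len(h))]
        for j in range(len(h)):
            self.w2[action][j] += lr * err * h[j]
        self.b2[action] += lr * err
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] += lr * d_hid[j] * x[i]
            self.b1[j] += lr * d_hid[j]
        return 0.5 * err * err

    def snapshot(self):
        """Record every unit threshold and connection weight."""
        return {"w1": [r[:] for r in self.w1], "b1": self.b1[:],
                "w2": [r[:] for r in self.w2], "b2": self.b2[:]}


class ToyCorridor:
    """Hypothetical environment: one range reading to a wall ahead."""
    MAX_RANGE = 5.0

    def reset(self):
        self.dist = self.MAX_RANGE
        return [self.dist]

    def step(self, action):
        # Action 0 drives one unit forward; action 1 turns away from the wall.
        self.dist = self.dist - 1.0 if action == 0 else self.MAX_RANGE
        collided = self.dist <= 0.0
        reward = -1.0 if collided else self.dist / self.MAX_RANGE  # assumed reward
        return [self.dist], reward, collided


def learn(net, env, gamma=0.9, tol=1e-4, max_steps=500):
    """Follow the abstract's loop; returns (recorded parameters, steps, reason)."""
    state = normalize(env.reset(), env.MAX_RANGE)
    for step in range(1, max_steps + 1):
        _, q = net.forward(state)
        action = max(range(len(q)), key=q.__getitem__)  # unit with max output m
        raw_next, reward, collided = env.step(action)
        if collided:  # collision: record thresholds and weights
            return net.snapshot(), step, "collision"
        next_state = normalize(raw_next, env.MAX_RANGE)
        _, q_next = net.forward(next_state)
        target = reward + gamma * max(q_next)  # expected output for state t
        loss = net.train_action(state, action, target)
        if loss < tol:  # error below the given threshold
            return net.snapshot(), step, "converged"
        state = next_state
    return net.snapshot(), max_steps, "step_limit"  # learning count exceeded
```

A call such as `params, steps, reason = learn(BPNetwork(1, 4, 2), ToyCorridor())` returns the recorded thresholds and weights together with the reason learning stopped. Because the recorded parameters are plain lists, they could be loaded back into a network later, which is one way to read the abstract's claim about relearning at a later stage.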


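IPC classification:

G	Physics
G06	Computing; calculating or counting
G06N	Computer systems based on specific computational models
G06N3/00	Computer systems based on biological models
G06N3/02	. Using neural network models
G06N3/08	.. Learning methods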