Abstract:
An information processing device includes a learning section configured to learn a state transition probability model, using an action performed by an agent capable of performing actions and an observed value observed by the agent when it has performed the action. The model is defined by a state transition probability, for each action, of a state making a transition due to the action performed by the agent, and by an observation probability of a predetermined observed value being observed from the state.
Abstract:
An HMM (Hidden Markov Model) learning device includes a learning unit and a storage unit. The learning unit learns a state transition probability as a function of the actions that an agent can execute, with HMM learning performed based on the actions that the agent has executed and on time series information made up of an observation signal. The storage unit stores the learning results of the learning unit as internal model data including a state-transition probability table and an observation probability table. The learning unit calculates frequency variables used to estimate the HMM state-transition probabilities and the HMM observation probabilities. The storage unit holds the frequency variables corresponding to each of the state-transition probabilities and each of the observation probabilities. The learning unit uses the frequency variables held by the storage unit to perform learning, and estimates the state-transition probability and the observation probability based on the frequency variables.
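The role of the frequency variables can be illustrated with a minimal Python sketch. It is not the patented method: it assumes the hidden state sequence is already known, so the frequency variables are plain counts rather than the expected counts an HMM estimation procedure would accumulate. The class name `FrequencyTables`, its methods, and the small `prior` pseudo-count are all hypothetical.

```python
class FrequencyTables:
    """Holds frequency variables and derives probability tables from them.

    Assumption: states are given (hard assignment); a full HMM learner
    would add expected counts instead of 1.0.
    """

    def __init__(self, n_states, n_actions, n_obs, prior=1e-3):
        # trans_freq[a][i][j]: how often action a led from state i to state j
        self.trans_freq = [[[prior] * n_states for _ in range(n_states)]
                           for _ in range(n_actions)]
        # obs_freq[i][o]: how often observation o was seen in state i
        self.obs_freq = [[prior] * n_obs for _ in range(n_states)]

    def update(self, states, actions, observations):
        """Add one episode; actions[t] is the action taken in states[t]."""
        for t, (s, o) in enumerate(zip(states, observations)):
            self.obs_freq[s][o] += 1.0
            if t + 1 < len(states):
                self.trans_freq[actions[t]][s][states[t + 1]] += 1.0

    @staticmethod
    def _normalize(row):
        total = sum(row)
        return [v / total for v in row]

    def transition_prob(self):
        """State-transition probability table, one matrix per action."""
        return [[self._normalize(row) for row in per_action]
                for per_action in self.trans_freq]

    def observation_prob(self):
        """Observation probability table, one row per state."""
        return [self._normalize(row) for row in self.obs_freq]
```

Because the counts are stored rather than only the normalized probabilities, further episodes can be added with `update` and the tables re-derived without revisiting earlier data.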
Abstract:
An information processing device includes a calculating unit and a determining unit. The calculating unit is configured to calculate a current-state series candidate, that is, a state series by which an agent capable of actions reaches the current state, based on a learned state transition probability model. The model is stipulated by a state transition probability that a state will transition according to each action performed by the agent, and by an observation probability that a predetermined observation value will be observed from the state; it is learned using an action performed by the agent and an observation value observed at the agent when the agent performs the action. The determining unit is configured to determine the action to be performed next by the agent, using the current-state series candidate in accordance with a predetermined strategy.
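One way to obtain a state series that reaches the current state is a Viterbi-style search over action-conditioned transitions. The Python sketch below is an illustration under stated assumptions, not the device's actual procedure: the function name `most_likely_state_series`, the table layouts `trans[a][i][j]` and `obs_prob[i][o]`, and the choice of Viterbi decoding itself are all assumptions.

```python
import math


def _log(p):
    # Log-probability, with zero probability mapped to negative infinity.
    return math.log(p) if p > 0 else float("-inf")


def most_likely_state_series(init, trans, obs_prob, actions, observations):
    """Return the most likely state series given actions and observations.

    init[i]        : probability of starting in state i
    trans[a][i][j] : probability of moving from i to j under action a
    obs_prob[i][o] : probability of observing o in state i
    actions[t]     : action taken between observations[t] and observations[t + 1]
    """
    n = len(init)
    score = [_log(init[i]) + _log(obs_prob[i][observations[0]]) for i in range(n)]
    back = []  # back[t - 1][j]: best predecessor of state j at time t
    for t in range(1, len(observations)):
        a = actions[t - 1]
        new_score, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + _log(trans[a][i][j]))
            new_score.append(score[best_i] + _log(trans[a][best_i][j])
                             + _log(obs_prob[j][observations[t]]))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Trace back from the best final (current) state.
    state = max(range(n), key=lambda i: score[i])
    series = [state]
    for pointers in reversed(back):
        state = pointers[state]
        series.append(state)
    series.reverse()
    return series
```

The last element of the returned series is the estimated current state, which a separate action-selection step could then use.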
Abstract:
A data processing device includes a state value calculation unit which calculates, for each state of the state transition model, a state value that is larger for a state with a high transition probability; an action value calculation unit which calculates, for each state of the state transition model and each action that the agent can perform, an action value that increases as the transition probability increases; a target state setting unit which sets, as a target state to be reached by actions performed by the agent, a state with large unevenness in the action value among the states of the state transition model; and an action selection unit which selects an action of the agent so as to move toward the target state.
Abstract:
A learning system is provided that includes network storage means for storing a network including a plurality of nodes, each of which holds dynamics, and learning means for updating the dynamics of the network in a self-organizing manner on the basis of measured time-series data.
Abstract:
A learning apparatus includes a storage unit configured to store a network formed by a plurality of nodes each holding dynamics; a learning unit configured to learn the dynamics of the network in a self-organizing manner on the basis of observed time-series data; a winner-node determiner configured to determine a winner node, the winner node being a node having dynamics that best match the time-series data; and a weight determiner configured to determine learning weights for the dynamics held by the individual nodes according to distances of the individual nodes from the winner node. The learning unit is configured to learn the dynamics of the network in a self-organizing manner by degrees corresponding to the learning weights.
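The winner-node and learning-weight steps can be sketched in Python as follows. This is a simplified illustration: it assumes the nodes sit on a one-dimensional grid, that each node's fit to the time-series data is already summarized as a prediction error, and that the weight falls off as a Gaussian of grid distance. The function names and the `sigma` parameter are hypothetical, and the abstract does not specify the weighting function.

```python
import math


def winner_node(prediction_errors):
    """Index of the node whose dynamics best match the data (lowest error)."""
    return min(range(len(prediction_errors)), key=lambda i: prediction_errors[i])


def learning_weights(n_nodes, winner, sigma=1.0):
    """Learning weight per node, decaying with grid distance from the winner.

    Assumption: Gaussian neighborhood on a 1-D grid of node indices.
    """
    return [math.exp(-((i - winner) ** 2) / (2.0 * sigma ** 2))
            for i in range(n_nodes)]
```

Each node would then update its dynamics by a degree proportional to its weight, so the winner learns the most and distant nodes are barely affected.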
Abstract:
A behavior control system and a behavior control method for a robot apparatus are disclosed. The behavior control system and the behavior control method include a function of adaptively switching, in response to the situation, between a behavior selection standard that takes into account the robot's own state, as required of an autonomous robot, and a behavior selection standard that takes into account the state of a counterpart. A behavior selection control system in a robot apparatus includes a situation-dependent behavior layer (SBL), capable of selecting a particular behavior from plural behaviors and outputting the selected behavior, and an AL calculating unit 120 for calculating the AL (activation level), which indicates the priority of execution of the behaviors, for behavior selection. The AL calculating unit 120 includes a self AL calculating unit 122 and a counterpart AL calculating unit 124 for calculating the self AL and the counterpart AL, and an AL integrating unit 125 for summing the self AL and the counterpart AL, weighted by a parameter that determines whether emphasis is placed on the self state or on the counterpart state, to output a final AL. The counterpart is a subject of interaction of the robot apparatus. The self AL and the counterpart AL indicate the priority of execution of the behavior with the self and with the counterpart as a reference, respectively.
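The AL integration step can be illustrated with a short Python sketch. It assumes the weighting is a convex combination controlled by a single parameter in [0, 1]; the abstract only states that the two ALs are summed with weighting, so this form, along with the names `integrate_al`, `select_behavior`, and `emphasis`, is an assumption.

```python
def integrate_al(self_al, counterpart_al, emphasis):
    """Weighted sum of the self AL and the counterpart AL.

    Assumption: emphasis = 1.0 considers only the robot's own state,
    emphasis = 0.0 considers only the counterpart's state.
    """
    if not 0.0 <= emphasis <= 1.0:
        raise ValueError("emphasis must lie in [0, 1]")
    return emphasis * self_al + (1.0 - emphasis) * counterpart_al


def select_behavior(candidates, emphasis):
    """Pick the behavior with the highest integrated AL.

    candidates maps a behavior name to a (self_al, counterpart_al) pair.
    """
    return max(candidates,
               key=lambda name: integrate_al(*candidates[name], emphasis))
```

With the same candidate behaviors, changing only `emphasis` can change which behavior is selected, which is the adaptive switching the abstract describes.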
Abstract:
An information processing apparatus includes a storage unit configured to store a node holding dynamics; an input-weight-coefficient adjuster configured to adjust input-weight coefficients on a dimension-by-dimension basis, the input-weight coefficients being weight coefficients for individual dimensions of input data input to input units of the node, the input data being observed time-series data having a plurality of dimensions; and an output-weight-coefficient adjuster configured to adjust output-weight coefficients on a dimension-by-dimension basis, the output-weight coefficients being weight coefficients for individual dimensions of output data having a plurality of dimensions and output from output units of the node.