With Deep Q-Learning we can program AI agents that operate in environments with discrete action spaces.

A discrete action space consists of a finite set of well-defined actions, e.g. the AI agent can move either left or right.
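
As a minimal sketch of what such a discrete action space could look like in code, the snippet below models the two moves as an enumeration. The names `Action`, `STEP_SIZE` and `step` are hypothetical and chosen only for illustration; the fixed step size is an assumption.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical discrete action space with exactly two actions."""
    LEFT = -1
    RIGHT = 1


STEP_SIZE = 1.0  # assumed fixed distance covered by one action


def step(position: float, action: Action) -> float:
    """Return the new position after applying one discrete action."""
    return position + action.value * STEP_SIZE


# The whole action space can be enumerated: it has exactly two elements.
actions = list(Action)

# Move right twice, then left once, starting from position 0.
position = 0.0
for chosen in (Action.RIGHT, Action.RIGHT, Action.LEFT):
    position = step(position, chosen)
```

Because the set of actions is finite, the agent can compare all of them one by one, which is what Deep Q-Learning relies on.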

The movement in each direction happens with a certain velocity. If the agent could also determine the velocity, we would have a continuous action space with an infinite number of possible actions (movements with different velocities). This case will be considered in the future.

In the last article, I introduced the concept of the action-value function Q(s,a), defined as the expected return for taking action a in state s.

Q(s,a) tells the agent the value or quality of a possible action a in a particular state s.

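Higher quality means a better action with regard to the given objective. If we carry out the expectation operator E in the definition of Q(s,a), we obtain a form of the action-value function that deals with probabilities. Our goal in Deep Q-Learning is to solve for the action-value function Q(s,a).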
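
For reference, the two forms of the action-value function involving the expectation operator E can be written as follows. This is the standard textbook notation and not necessarily the exact form used in the previous article; the symbols π (policy), γ (discount factor), G_t (discounted return) and p (transition probabilities) are assumptions introduced here.

```latex
% Q as an expectation over the discounted return G_t
Q^{\pi}(s,a) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s,\ A_t = a \right],
\qquad
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}

% The same quantity with the expectation carried out:
% a recursive sum over transition probabilities
Q^{\pi}(s,a) = \sum_{s',\,r} p(s', r \mid s, a)
  \left[ r + \gamma \sum_{a'} \pi(a' \mid s')\, Q^{\pi}(s', a') \right]
```

The second form makes both difficulties visible: Q appears on both sides (recursion), and the sum requires the transition probabilities p.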

Why do we want this? Knowing Q(s,a) would enable the agent to determine the quality of any possible action in any given state, so the agent could behave accordingly by choosing the action with the highest quality.
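
To illustrate how an agent could act once Q(s,a) is known, here is a small sketch that picks the highest-valued action in a state. The table `q_table` and its numbers are made up for illustration, and `greedy_action` is a hypothetical helper, not part of any library.

```python
# Hypothetical action values: one row per state, one column per action
# (column 0 = move left, column 1 = move right). The numbers are invented.
q_table = [
    [0.2, 0.8],  # state 0: moving right looks better
    [0.5, 0.1],  # state 1: moving left looks better
    [0.0, 0.0],  # state 2: both actions look equally good
]


def greedy_action(q_values: list) -> int:
    """Return the index of the action with the highest Q-value.

    On ties, the action with the lowest index is returned.
    """
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Behaving "accordingly" means choosing the best-rated action in each state.
policy = [greedy_action(row) for row in q_table]
```

With these invented values the agent would move right in state 0 and left in state 1.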

However, since this equation is recursive and also involves probabilities, using it directly is not practical. Instead, we must use the so-called Temporal Difference (TD) learning algorithm to solve for Q(s,a) iteratively.
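
The iterative idea can be sketched with a tabular Q-learning update, one common TD method. This is a simplified illustration under stated assumptions, not the deep variant: the table stands in for the neural network, and the function `td_update`, the learning rate `alpha`, the discount factor `gamma` and the tiny three-state environment are all hypothetical.

```python
def td_update(q_table, state, action, reward, next_state, done,
              alpha=0.1, gamma=0.9):
    """Apply one Q-learning (TD) update in place and return the TD error."""
    # Bootstrap from the best action in the next state,
    # unless the episode has ended there.
    best_next = 0.0 if done else max(q_table[next_state])
    td_target = reward + gamma * best_next
    td_error = td_target - q_table[state][action]
    # Move the current estimate a small step towards the target.
    q_table[state][action] += alpha * td_error
    return td_error


# Assumed toy environment: states 0 -> 1 -> 2, where state 2 is terminal.
# Action 1 means "move right"; reaching state 2 yields a reward of 1.
q_table = [[0.0, 0.0] for _ in range(3)]

for _ in range(50):
    td_update(q_table, state=1, action=1, reward=1.0, next_state=2, done=True)
    td_update(q_table, state=0, action=1, reward=0.0, next_state=1, done=False)
```

Each update only needs one observed transition, so neither the recursion nor the transition probabilities have to be handled explicitly. Repeating the updates moves the estimates towards the values the recursive equation would give.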

