11-MDP-QLearning

MDP 
Markov Property 
Transition Probability
State Transition Matrix 
Markov Process - Formal 
Markov Reward Process 
Markov Reward Process - Formal 
Return 
Return - Formal 
Value Function 
Value Function - MRP 
Markov Decision Process
Markov Decision Process - Formal
Policy
State-Value Function
Action-Value Function
Optimal State-Value Function
Optimal Action-Value Function
Optimal Policy
Bellman Optimality Equation
Q-Learning - Formal
Learning Rate
Epsilon-greedy Action Selection 
Boltzmann Action Selection
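
The last four topics in this outline (the Q-learning update, the learning rate, epsilon-greedy selection and Boltzmann selection) can be illustrated together in a minimal tabular sketch. Everything specific here is an assumption made for illustration and is not taken from the outline: the five-state chain environment in `step`, the function names, and the hyperparameter values (`alpha` as the learning rate, `gamma` as the discount factor, `epsilon`, `temperature`). The update used is the standard one, Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties between equally good actions at random.
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


def boltzmann(q_values, temperature, rng):
    """Sample an action with probability proportional to exp(Q / temperature)."""
    m = max(q_values)  # subtract the max so exp() cannot overflow
    weights = [math.exp((q - m) / temperature) for q in q_values]
    return rng.choices(range(len(q_values)), weights=weights)[0]


def step(state, action, n_states):
    """Hypothetical chain MDP: action 1 moves right, action 0 moves left.

    Entering the last state yields reward 1 and ends the episode.
    """
    if action == 1:
        nxt = min(state + 1, n_states - 1)
    else:
        nxt = max(state - 1, 0)
    done = nxt == n_states - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(n_states=5, n_actions=2, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration on the chain above."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = epsilon_greedy(Q[s], epsilon, rng)
            s2, r, done = step(s, a, n_states)
            # Terminal transitions have no bootstrapped future value.
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])  # alpha is the learning rate
            s = s2
    return Q


if __name__ == "__main__":
    for state, row in enumerate(q_learning()):
        print(state, [round(q, 3) for q in row])
```

To try Boltzmann exploration instead, the call to `epsilon_greedy` inside the loop could be swapped for `boltzmann(Q[s], temperature, rng)`. A high temperature makes the choice close to uniform, and a low temperature makes it close to greedy.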