Reward-Conditioned Diffusion Policy

This is our project for the Introduction to Robot Learning course: a Reward-Conditioned Diffusion Policy. We learn policies with a diffusion model that uses attention-based concatenation, and we condition the model through classifier-free guidance. We evaluate on the maze environments from D4RL, conditioning the policy on waypoints.
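
As a rough illustration of the classifier-free guidance step, the sketch below shows how conditional and unconditional noise predictions could be blended at sampling time, and how the condition could be randomly dropped during training. It is a minimal NumPy sketch under stated assumptions, not the project's implementation: the names `cfg_noise`, `drop_condition`, `guidance_scale`, `drop_prob`, and `null_value` are hypothetical, the waypoint is assumed to be a 2D vector per sample, and the blending convention (unconditional prediction plus a scaled difference) is one common choice among several.

```python
import numpy as np


def cfg_noise(eps_cond, eps_uncond, guidance_scale):
    """Blend conditional and unconditional noise predictions.

    Assumed convention: a scale of 0 gives the unconditional prediction,
    1 gives the conditional prediction, and values above 1 extrapolate
    further toward the condition.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


def drop_condition(cond, drop_prob, rng, null_value=0.0):
    """Replace each sample's condition with a null value with probability drop_prob.

    Training on a mix of kept and dropped conditions lets a single network
    produce both the conditional and the unconditional prediction.
    Returns the masked conditions and the boolean keep mask.
    """
    keep = rng.random(cond.shape[0]) >= drop_prob
    masked = np.where(keep[:, None], cond, null_value)
    return masked, keep


# Hypothetical usage: a batch of 4 waypoint conditions (x, y).
rng = np.random.default_rng(0)
waypoints = np.array([[1.0, 2.0], [3.0, 1.0], [0.5, 0.5], [2.0, 2.0]])
masked_waypoints, keep_mask = drop_condition(waypoints, drop_prob=0.1, rng=rng)

# Stand-ins for the two network outputs at one denoising step.
eps_c = np.ones((4, 2))
eps_u = np.zeros((4, 2))
guided = cfg_noise(eps_c, eps_u, guidance_scale=2.0)
```

The guidance scale trades off how strongly samples follow the waypoint against sample diversity; a suitable value would have to be tuned for the maze tasks.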