The virtual course "Sample-Based Learning Methods" on Coursera offers video classes and takes approximately 22 hours to complete. Explore its essential features, and click the orange button for detailed information on the Coursera e-learning platform.
In this course, you will learn about several algorithms that can learn near-optimal policies through trial-and-error interaction with the environment, using only the agent's own experience. Learning from actual experience is remarkable because it requires no prior knowledge of the environment's dynamics, yet can still achieve optimal behavior. We will cover intuitively simple but powerful Monte Carlo methods, and temporal-difference learning methods, including Q-learning. We will end this course by investigating how we can get the best of both worlds: algorithms that combine model-based planning (similar to dynamic programming) with temporal-difference updates to radically speed up learning. By the end of this course you will be able to:
- Understand temporal-difference learning and Monte Carlo as two strategies for estimating value functions from sampled experience
- Understand the importance of exploration when sampled experience is used instead of dynamic programming sweeps within a model
- Understand the connections between Monte Carlo, dynamic programming, and TD
- Implement and apply the TD algorithm to estimate value functions
- Implement and apply Expected Sarsa and Q-learning (two TD methods for control)
- Understand the difference between on-policy and off-policy control
- Understand planning with simulated experience (as opposed to classical planning strategies)
- Implement a model-based approach to RL, called Dyna, which uses simulated experience
- Conduct an empirical study to see the improvements in sample efficiency when using Dyna
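To make the idea of estimating value functions from sampled experience concrete, here is a minimal sketch (not course material) of TD(0) prediction on a simple five-state random walk, a standard illustrative environment. All names and parameter values (`alpha`, `episodes`, the chain layout) are illustrative assumptions, not taken from the course.

```python
import random

def td0_random_walk(episodes=5000, alpha=0.1, gamma=1.0, seed=0):
    """TD(0) value prediction on a 5-state random walk.

    States 1..5 are non-terminal; 0 and 6 are terminal.
    The agent moves left or right with equal probability and
    receives reward +1 only on reaching state 6. Under this
    random policy the true value of state s is s / 6.
    """
    rng = random.Random(seed)
    V = [0.0] * 7  # V[0] and V[6] stay 0 (terminal states)
    for _ in range(episodes):
        s = 3  # every episode starts in the middle state
        while s not in (0, 6):
            s2 = s + rng.choice((-1, 1))
            r = 1.0 if s2 == 6 else 0.0
            # TD(0) update: move V(s) toward the bootstrapped
            # target r + gamma * V(s') using only this one sample
            V[s] += alpha * (r + gamma * V[s2] - V[s])
            s = s2
    return V
```

Note that, unlike a dynamic programming sweep, this update touches only states the agent actually visits and never consults the transition probabilities, which is exactly the "learning from the agent's own experience" the course description emphasizes.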
Prepare from home with some of the most prestigious universities in the world.
The quality of Coursera's courses is backed by its instructors, who often hold doctorates or serve as deans.
More than 85% of Coursera students report career benefits, such as promotions or salary increases.
Millions of students around the world are meeting their personal and professional goals with Coursera.
Coursera offers courses from over 200 leading universities and companies to deliver online learning around the world. With a Coursera Plus subscription, you get unlimited access to over 90% of all courses, and the most popular professional certificates and specializations on Coursera.
Courses span data science, business, and personal development. You can enroll in multiple courses at once, earn unlimited certificates, and learn in-demand job skills to start, grow, or even change your career.
DISCOVER HOW TO GET THE MOST OUT OF COURSERA, AND SAVE UP TO USD $500 WITH AN ANNUAL SUBSCRIPTION TO COURSERA PLUS*
*You save up to USD $500 over 12 months when you switch from paying USD $59 for a monthly subscription to an annual subscription with the promotion. The regular annual subscription is USD $399; with the promotion you pay only USD $299. Find out everything by clicking the yellow button.