Can RL Agents Think? How Do They Understand Their Environment?
Understanding Model-Based vs Model-Free Reinforcement Learning
Do you remember the feeling of exploring a new city?
I recently traveled to one.
Something about me: I like to stay prepared. New, unknown roads set off an alarm in my head like a security system.
So naturally, I did my homework: stay booked, restaurants shortlisted, maps downloaded. I even skimmed through tourist reviews about places to avoid. I had practically built a mental model of the city.
This approach might resonate with some. For others, it might sound overly cautious.
Over the years, I’ve met people who don’t believe in doing any of the things I just mentioned.
That's your YOLO Gang!
The ones who just throw a few clothes in a backpack and leave. No plans, no research. Their attitude?
“I’ll deal with whatever is thrown at me.”
They wander into random streets, pick restaurants on a whim, and rely on their taste buds (or sometimes their stomach) to tell them if the decision was good.
Both styles work.
If you’re wondering why I brought up that analogy, it’s because RL agents work in a similar way.
Just like the travellers, you have agents that plan ahead, build a mental map of the environment, and try to make smart moves based on predictions.
And you have agents that just wing it, learning as they go through trial and error.
The planners fall under what's called Model-Based Reinforcement Learning, while the wingers belong to the Model-Free category.
In this article, we'll explore how agents make decisions as they interact with their environment, through the lens of Model-Based and Model-Free Reinforcement Learning.
I have written an article on the basics of Reinforcement Learning. Do check it out if you haven’t yet!
What is Model-Free Reinforcement Learning?
In Model-Free RL, the agent isn’t given any information about the environment it’s going to interact with. It learns and makes decisions purely through trial and error.
This means the agent doesn't think ahead or try to predict what state an action might lead to. Instead, it learns from the outcome, and the next time it encounters a similar situation, it either goes for it or avoids it based on what happened before. Note that the agent doesn't build any kind of mental model of the environment.
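To make this concrete, here is a minimal sketch of tabular Q-learning, a classic model-free method. The corridor environment, reward, and hyperparameters below are illustrative assumptions, not something from the article: the agent only keeps a table of "how good did this action turn out here" and never models what state comes next.

```python
import random

# Toy corridor: states 0..4 in a line, start at 0, goal at state 4.
# Actions: 0 = move left, 1 = move right. The environment, reward, and
# hyperparameters are illustrative assumptions.
N_STATES, GOAL = 5, 4
ACTIONS = [0, 1]

def step(state, action):
    """The real environment. The agent never sees these rules directly."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    random.seed(seed)
    # The Q-table stores how good each action turned out in each state.
    # No transition model, no prediction of the next state.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if random.random() < epsilon:    # occasionally explore
                action = random.choice(ACTIONS)
            else:                            # otherwise exploit, random tie-break
                best = max(q[state])
                action = random.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Learn purely from the observed outcome (trial and error)
            q[state][action] += alpha * (reward + gamma * max(q[nxt]) - q[state][action])
            state = nxt
    return q

q = train()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
print(policy)  # should prefer action 1 (right) in every non-goal state
```

Nothing in the loop predicts the future; the agent just nudges its stored estimate toward whatever actually happened, which is the essence of model-free learning.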
What is Model-Based Reinforcement Learning?
In Model-Based RL, the agent is given some rules or information about the environment, so it does not have to go in completely blind. Using this information, the agent creates a simulation or a kind of mental map of the environment it is going to interact with.
For example, in training self-driving cars, the car acts as the agent. It is provided with basic data, like traffic rules — stop when the light is red, go when it is green. The agent builds a mental map or a simulated version of the road. This helps save a lot of processing effort, because the car does not need to learn things from scratch when the rules are already known.
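When the rules are known up front, the agent can plan by simulating them instead of acting in the real world. Here is a minimal sketch using value iteration, a standard planning method, on a made-up one-dimensional "road" (the layout, rewards, and discount factor are all illustrative assumptions):

```python
# Planning with a known model: the agent is handed the rules, so it can
# compute good decisions by simulating them, with no real-world trial runs.
# The toy "road" below and its rewards are illustrative assumptions.
N = 5          # positions 0..4; reaching position 4 yields reward +1
GAMMA = 0.9    # discount factor
ACTIONS = ["stay", "go"]

def model(state, action):
    """The known dynamics given to the agent: (next_state, reward)."""
    nxt = min(N - 1, state + 1) if action == "go" else state
    reward = 1.0 if nxt == N - 1 and state != N - 1 else 0.0
    return nxt, reward

# Value iteration: sweep over imagined outcomes until the values settle.
values = [0.0] * N
for _ in range(50):
    for s in range(N - 1):               # position 4 is terminal
        values[s] = max(r + GAMMA * values[nxt]
                        for nxt, r in (model(s, a) for a in ACTIONS))

def q_value(s, a):
    nxt, r = model(s, a)
    return r + GAMMA * values[nxt]

policy = [max(ACTIONS, key=lambda a: q_value(s, a)) for s in range(N - 1)]
print(policy)  # should choose "go" at every position before the end
```

Notice that the agent never calls a real environment at all: every update happens inside its own copy of the rules, which is exactly the processing effort the article says the self-driving car saves.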
To understand better, consider the following.
Let’s say there’s a new restaurant that just opened next to your house. You have no idea what it’s like — you didn’t read any reviews. You just decided to go try it out. But unfortunately, it didn’t go well. You didn’t like it.
Now, if your thought process was similar to a model-free agent, you would simply accept the outcome as: xyz = not visiting again. And that’s it. It wouldn’t matter if the reason you didn’t enjoy it was temporary, or if you came across a different branch of the same restaurant somewhere else. You’d just decide it’s a bad choice, because you didn’t create any mental map of what actually went wrong.
On the other hand, if you were anything like a model-based RL agent, you’d dig a little deeper. You’d try to figure out what exactly you didn’t like. Was it the ambience? Was it the food? Was the waiter rude? Let’s say it was the rude waiter. The next time you visit, you might ask for a different waiter — because now you have a mental map of what caused the discomfort the last time. And with that one variable changed, you’re still open to going to the same restaurant again.
From the previous example, it might seem like model-based RL is always the better way to go. But hear me out. Both approaches have their own use cases.
Let’s say you need to train a robot. To keep it simple, imagine you are just training the robot's arm. Now think about this. How do you feed the robot detailed information like when it needs to be gentle while picking up a glass, or when it should apply more force to lift something heavy? How do you tell it that some objects might be slippery and need a firmer grip?
Trying to make it understand everything like air resistance, object shape, softness, slipperiness, and then turning all that into a clear set of rules can get really complicated.
This is a classic case where model-free RL works better. Instead of trying to hard-code all the rules of physics, which would get very complex, it is often easier to let the robot learn through trial and error.
Over time, it figures out how much force is needed, what slips, what breaks, and so on, just by interacting with the environment and adjusting based on the outcomes.
Model Based + Model Free RL
While both model-free and model-based methods have their own benefits and use cases, they are often combined in the real world to get the best of both.
For instance, consider the example we discussed earlier of training a self-driving car. We talked about how basic road rules are provided to it in advance.
But what you cannot feed in as data is the intuitive road sense that humans develop. And humans have learned that through trial and error and experience.
For example, only through trial and error can the agent learn about rash drivers, slow drivers, or how different people behave on the road. It has to adjust its actions accordingly to avoid accidents. It also has to deal with jaywalkers in the real environment.
Now, most of this learning happens in a simulation. But do not confuse this simulation with the mental map created by model-based reinforcement learning. The simulation here is a simulated environment that, to the agent, acts as the real environment, where it can learn through trial and error without causing any physical damage or accidents.
No one wants accidents to happen in real life, so all the trial and error takes place in a safe virtual space. Snapshots of different human driving styles can be simulated, but they cannot be written down as a strict set of rules. So the self-driving car does all its learning and testing inside simulation software before ever hitting the real road.
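One classic way to combine the two approaches is Dyna-Q: the agent still makes ordinary model-free updates from real experience, but it also records what it has seen into a learned model and replays imagined transitions from that model between real steps. Here is a minimal sketch on a made-up corridor task; the environment, reward, and hyperparameters are illustrative assumptions:

```python
import random

# Dyna-Q: model-free Q-updates from real experience, plus extra "planning"
# updates replayed from a model the agent learns as it goes.
# Environment and hyperparameters are illustrative assumptions.
N_STATES, GOAL = 5, 4
ACTIONS = [0, 1]  # 0 = left, 1 = right

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def dyna_q(episodes=200, planning_steps=10, alpha=0.1, gamma=0.9,
           epsilon=0.1, seed=1):
    random.seed(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    model = {}  # learned model: (state, action) -> (next_state, reward)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                best = max(q[state])
                action = random.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Model-free part: learn from the real outcome
            q[state][action] += alpha * (reward + gamma * max(q[nxt]) - q[state][action])
            # Model-based part: remember the transition, then replay from memory
            model[(state, action)] = (nxt, reward)
            for _ in range(planning_steps):
                (s, a), (s2, r) = random.choice(list(model.items()))
                q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            state = nxt
    return q

q = dyna_q()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
print(policy)  # should prefer moving right toward the goal
```

The replayed transitions play the same role as the car's simulator: extra practice that costs nothing in the real world, layered on top of plain trial-and-error learning.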
We’ve barely scratched the surface — there’s a lot more coming up. I’ve tried my best to keep things simple and relatable while walking through the basics of reinforcement learning. Hope it helped! Feel free to drop a comment if there’s anything you’d like me to improve or any topics you want me to cover next.
Until then, happy reading!