The Total Depravity of the No Free Lunch Theorem

Many Christians believe that we can only do good by the grace of God. In its most extreme form, this theory of “total depravity” suggests that we are literally incapable of choosing the good, choosing to follow God, or even believing in Him without His direct supernatural aid, offered as a free gift.

Total depravity is false, but it contains an important truth about why we need God’s help not to screw up.

In artificial intelligence, we model smart things like people as “intelligent agents”. An agent, broadly stated, is something that exists in a larger environment, observing situations, taking actions, and receiving rewards – a bit like the entities navigating through Markov decision processes last time.

But agents are a broader concept, not strictly tied to the Markov property: anything that makes decisions about actions in a larger environment can be an agent. The line between agent and environment can be clear, as with humans contained within our skins; or it might be fuzzy, like a control system for a factory.
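The observe, act, and receive-reward loop described above can be sketched in a few lines of Python. This is only a toy illustration: the corridor world, the `always_right_agent` policy, and every name below are hypothetical and not taken from any real library.

```python
class CorridorEnvironment:
    """Hypothetical toy world: positions 0..4, with reward at position 4."""

    def __init__(self):
        self.position = 0

    def observe(self):
        # The agent sees only its own position.
        return self.position

    def step(self, action):
        # action is -1 (left) or +1 (right); walls clamp the position.
        self.position = max(0, min(4, self.position + action))
        return 1.0 if self.position == 4 else 0.0


def always_right_agent(observation):
    # The simplest possible policy: ignore the observation and head right.
    return +1


def run_episode(agent, env, steps=6):
    """Run the observe -> act -> reward loop and return the total reward."""
    total_reward = 0.0
    for _ in range(steps):
        observation = env.observe()
        action = agent(observation)
        total_reward += env.step(action)
    return total_reward


print(run_episode(always_right_agent, CorridorEnvironment()))
```

Anything that can be dropped into the `agent` slot counts as an agent in this sense, whether it is a fixed rule like this one or something that learns.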

While the idea of “intelligence” is fuzzy, one of the things that makes an agent smart is rational behavior – making the right choices. Another thing that makes an agent smart is learning – improving your behavior in the future based on the experiences that you’ve had in the past.

The field I work in, deep reinforcement learning, focuses on building learning agents that improve their rationality based on their experiences, generally within a partially-observable Markov decision process in which it’s reasonably clear what counts as rational, even if the agent can’t clearly see the whole world.

This “partial observability” is one real-world limitation that virtually all agents in Creation share. Robot sensors have a limited range, the factory controller doesn’t have a sensor on all its circuits, and we can’t see behind our own heads (hey, there’s a creepy man standing behind you right now – don’t look!)

Partial observability means we need to make the best decisions we can based on the information that is available to us. We look both ways at a crosswalk to try to reduce our uncertainty, waiting if a car is coming, and we call out “corner” in a restaurant kitchen to try to reduce the uncertainty of others.

Obviously, if you don’t know which door holds the lady or the tiger, it’s hard to pick. But even if an agent had perfect knowledge of the current state of the world around it – not that current state is well-defined in general relativity / quantum mechanics, but never mind – making perfectly correct decisions is impossible.

Well, not necessarily impossible: a perfectly omniscient agent could make perfectly optimal decisions, because it would know the true value of each action, not just its immediate reward. But without that kind of revelation of information from the future, we can only learn from our past experiences.

And that’s where the no free lunch theorem comes in: there is no guaranteed way to learn correctly.

Imagine a simple decision problem: to turn left or right on a forking path in a garden. (Perhaps only one of those directions leads to the “straight and narrow” – sorry, this is a Lenten series, gotta bring that in somewheres.) At each fork in the road, there are twice as many potential paths as there were before.

A path that forks at each stage is like that problem where you double the number of pennies you give someone each day for a whole month. It starts with small change – first day a penny; the second, two, the third, four, and so on – but on the last day of a 31-day month, you’re shelling out over ten million bucks – about a billion pennies.
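The penny arithmetic is easy to check. Here is a minimal sketch, assuming a 31-day month and one penny on the first day; the function name `pennies_on_day` is just an illustrative choice.

```python
def pennies_on_day(day: int) -> int:
    """Pennies handed over on a given day, doubling daily (day 1 = one penny)."""
    return 2 ** (day - 1)


LAST_DAY = 31  # assumption: a 31-day month
pennies = pennies_on_day(LAST_DAY)
dollars = pennies / 100
print(f"Day {LAST_DAY}: {pennies:,} pennies = ${dollars:,.2f}")
```

Thirty doublings take one penny to 2^30 pennies, which is a little over a billion.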

In this garden of forking paths, there are a billion possible destinations. But in the mind of an agent trying to learn what to do, the problem is even harder: there are also a billion intermediate steps, and at each point, the agent must make a decision, with two possible choices.

If you had perfect knowledge and tried to write down a guidebook, it would have a billion entries, with a recommended decision at each point. But if you don’t have perfect knowledge, if you’re a learning agent, then your best option is to go into the garden and fill out that guidebook yourself.

This is almost inconceivably hard. If you imagine a library with every possible guidebook, one in which each book differed from every other by at least one decision out of those billions, then there are two to the power of a billion possible books – that’s a number with roughly three hundred million digits.
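You don't have to write out two to the power of a billion to count its digits: the number of decimal digits in 2^n is floor(n · log10(2)) + 1. A minimal sketch of that calculation follows; the helper name `digits_in_power_of_two` is made up for this example.

```python
import math


def digits_in_power_of_two(exponent: int) -> int:
    """Number of decimal digits in 2**exponent, without computing the power."""
    return math.floor(exponent * math.log10(2)) + 1


# Sanity check against exact integer arithmetic for small exponents.
for n in (1, 10, 30, 100):
    assert digits_in_power_of_two(n) == len(str(2 ** n))

print(f"{digits_in_power_of_two(10**9):,} digits")
```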

The only way to fill out the guidebook correctly is to visit all billion possible paths. If you can’t do that, then at some point, you’re going to need to guess the entries for the parts of the garden that you haven’t visited. And then it gets tricky, because there are two to the power of a billion possible gardens.

If you’re in a garden where the straight and narrow can be approximated by alternating left and right to stay near the middle, you might guess that outer entries of the table should turn inward, the far left turning right, and the far right turning left. But for all you know, more reward can be found further out.

The no free lunch theorem says that there is no principled way to fill in parts of the book you haven’t seen. At best, you can assume that parts of the garden you’ve seen are similar to the ones you haven’t, but if you could be in literally any possible garden, then those assumptions will inevitably fail.
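The flavor of this result can be seen in a toy version small enough to enumerate. The sketch below is an illustration in the spirit of the theorem, not Wolpert's formal statement: it assumes a guidebook with only six entries, a learner that has seen the first three, and three made-up guessing strategies. Averaged over every possible garden, each strategy gets exactly half of the unseen entries right.

```python
from fractions import Fraction
from itertools import product

N_ENTRIES = 6  # decision points in a toy guidebook (hypothetical size)
N_SEEN = 3     # entries the learner has actually visited
N_UNSEEN = N_ENTRIES - N_SEEN


def always_left(seen):
    # Guess "left" (0) for everything unseen.
    return [0] * N_UNSEEN


def copy_majority(seen):
    # Guess whatever was correct most often in the part already seen.
    guess = 1 if sum(seen) * 2 > len(seen) else 0
    return [guess] * N_UNSEEN


def alternate(seen):
    # Keep alternating left/right, starting from the last seen entry.
    guesses, last = [], seen[-1]
    for _ in range(N_UNSEEN):
        last = 1 - last
        guesses.append(last)
    return guesses


def average_unseen_accuracy(learner):
    """Average accuracy on unseen entries over all 2**N_ENTRIES gardens."""
    gardens = list(product([0, 1], repeat=N_ENTRIES))
    total = Fraction(0)
    for garden in gardens:
        seen, unseen = garden[:N_SEEN], garden[N_SEEN:]
        guesses = learner(seen)
        correct = sum(g == u for g, u in zip(guesses, unseen))
        total += Fraction(correct, N_UNSEEN)
    return total / len(gardens)


for learner in (always_left, copy_majority, alternate):
    print(learner.__name__, average_unseen_accuracy(learner))
```

No strategy beats coin-flipping here because, for any fixed set of seen entries, the unseen entries range over every combination equally often. A learner only does better than chance when the set of gardens it might be in is restricted in a way that matches its assumptions.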

What does this all mean for free will versus total depravity?

Well, first off, if you are an intelligent agent, then you can sample actions from your action space. The actions you can take aren’t good or evil, they’re decisions in your brain and actions of your body. Some of those actions can, by chance, be good ones; God has not so ordered the world to exclude the good.

And if you do good works and see that they are good, why, then, you could learn to do them again. There’s nothing preventing this; again, God has not so ordered the world to exclude the good. But there’s no guarantee that you’re going to learn the right lessons, and there lies the problem.

In deep reinforcement learning, we see this problem writ large. I teach robots the size of people how to navigate buildings meant for people, and while you’d think that would be simple, we often observe robot control policies that have completed thousands of successful runs suddenly run straight into a wall.

Deep learning systems do not generalize the way human beings would. While a human who learns to drive without hitting things in their hometown will often be able to transfer this skill when they go off to college, a robot moving to a new environment may expose strange “pathologies” in its behavior.

This is the meaning of “my thoughts are not your thoughts, neither are your ways my ways” in Scripture: even if a human being honestly chooses to believe in God, sincerely tries to do good, and accidentally gets it right, there is no guarantee that what they’ve learned from that experience will transfer.

In fact, it’s more likely to not transfer. Sins of pride, self-righteousness, scrupulousness, and intolerance lead us astray as much as temptations to indulge in things that are “lawful but not expedient”. We can turn to Scripture, to church Tradition, or to our own Reason to try to improve, but we’ll likely screw up.

This is why God’s grace is so important. God is actively and spiritually trying to help us come to believe, know and love Him, and hopes that this love will prompt us to do the right thing, bringing the Kingdom of Heaven into being here on this Earth.

But across a broad spectrum of possible universes, it’s mathematically impossible for us to always get it right even if we’re trying really hard – literally the only way that we could actually be consistently good is to have perfectly omniscient knowledge of the entire future of the Universe – to actually be God.

We can’t be God. The position is taken. We don’t know what He knows, and we are going to screw it up. Fortunately, He’s ordered the universe so it’s possible to get it right, He’s sent His Son as an example of how to get it right, and His Spirit acts in the world to give us the grace we need to actually get it right.

-the Centaur

Pictured: David Wolpert, who discovered one of the depressingly many No Free Lunch theorems.
