Press "Enter" to skip to content

Posts tagged as “Intelligence”

Announcing Logical Robotics

centaur 0

So, I'm proud to announce my next venture: Logical Robotics, a robot intelligence firm focused on making learning robots work better for people. My research agenda is to combine the latest advances of deep learning with the rich history of classical artificial intelligence, using human-robot interaction research and my years of experience working on products and benchmarking to help robots make a positive impact.

Recent advances in large language model planning, combined with deep learning of robotic skills, have enabled almost magical developments in explainable artificial intelligence, where it is now possible to ask robots to do things in plain language and for the robots to write their own programs to accomplish those goals, building on deep learned skills but reporting results back in plain language. But applying these technologies to real problems will require a deep understanding of both robot performance benchmarks to refine those skills and human psychological studies to evaluate how these systems benefit human users, particularly in the areas of social robotics where robots work in crowds of people.

Logical Robotics will begin accepting new clients in May, after my obligations to my previous employer have come to a close (and I have taken a break after 17 years of work at the Search Engine That Starts With a G). In the meantime, I am available to answer general questions about what we'll be doing; if you're interested, please feel free to drop me a line at via centaur at logicalrobotics.com or take a look at our website.

-the Centaur

phewww ….

centaur 0

... finally, a chance to catch a break.

It's been a difficult few weeks due to "the Kerfluffle" which I hope to blog about shortly (those on my LinkedIn have seen it already) but equally as much from a Stanford extension class I was taking on Deep Reinforcement Learning (XCS234 - speaking as an expert in this area seeking to keep my skills sharp, I can highly recommend it: I definitely learned some things, and according to the graphs, so did my programs).

Finally, that's over, and I have a moment to breathe.

And maybe start blogging again.

-the Centaur

Pictured: A mocha from Red Rock Cafe, excellent as always, and a learning curve from one of my programs from class (details suppressed since we're not supposed to share the assignments).

Once again, I’m running deep learning on a Macbook …

centaur 0

... and the trick to getting it working was, as usual, "working just a little bit harder than you want to". Shortly after my last post, I got REINFORCE, a classic reinforcement learning algorithm, successfully training on my local machine, with apparent learning for all three environments in the assignment (though whether my solution is able to reach the expected final level of performance or not is still an open question).

-the Centaur

Sales for Dakota, Jeremiah and Nicole!

centaur 0

So a number of sales are running this month featuring stories about Dakota, Jeremiah or Nicole.

First off, Dakota: on February 16-28 of this month, BLOOD ROCK will be back on sale for $0.99 wherever fine eBooks are sold - for example, Amazon and Barnes and Noble. BLOOD ROCK continues the story of Dakota and her weretiger daughter Cinnamon, facing their greatest challenge yet: getting Cinnamon into a good middle school. Oh, and there may also be magic graffiti, deadly fires, cranky werewolves, magical police investigators, and vampire romance. You should check it out.

Second, Jeremiah: the LATER series anthologies are on sale at DriveThruFiction through the 11th. Each features pairs of stories set in Jeremiah Willstone's time at Liberation Academy. We're adapting these for audio, so I feel this is a good way to get into this universe early!

Finally, Nicole. Everyone's favorite killer computer's debut story, SIBLING RIVALRY, is also on sale at DriveThruFiction: https://www.drivethrufiction.com/product/372242/Sibling-Rivalry .

Also, a few of my friends who are published through Thinking Ink Press also have books on sale at DriveThruFiction:

The DriveThruFiction sales ends February 11, so strike while the iron's hot!

-the Centaur

P.S. Also, while I'm under the hood, let me point out that the new WordPress editor continues not just to suck, but to get worse over time, as it was markedly harder to write a blog post on the Library of Dresan than the Jeremiah Willstone and Dakota Frost websites, both on older versions of WordPress. It's harder to edit text, insert links, navigate texts, add lists, or add tags, and the tags no longer show the relative category sizes. "Modern looking" is not a substitute for usability, and never will be. End of rant.

The Embodied AI Workshop is Tomorrow, Sunday, June 20th!

centaur 0
embodied AI workshop

What happens when deep learning hits the real world? Find out at the Embodied AI Workshop this Sunday, June 20th! We’ll have 8 speakers, 3 live Q&A sessions with questions on Slack, and 10 embodied AI challenges. Our speakers will include:

  • Motivation for Embodied AI Research
    • Hyowon Gweon, Stanford
  • Embodied Navigation
    • Peter Anderson, Google
    • Aleksandra Faust, Google
  • Robotics
    • Anca Dragan, UC Berkeley
    • Chelsea Finn, Stanford / Google
    • Akshara Rai, Facebook AI Research
  • Sim-2-Real Transfer
    • Sanja Fidler, University of Toronto, NVIDIA
      Konstantinos Bousmalis, Google

You can find us if you’re signed up to #cvpr2021, through our webpage embodied-ai.org or at the livestream on YouTube.

Come check it out!

-the Centaur

The Embodied AI Workshop at CVPR 2021

centaur 0
embodied AI workshop

Hail, fellow adventurers: to prove I do something more than just draw and write, I'd like to send out a reminder of the Second Embodied AI Workshop at the CVPR 2021 computer vision conference. In the last ten years, artificial intelligence has made great advances in recognizing objects, understanding the basics of speech and language, and recommending things to people. But interacting with the real world presents harder problems: noisy sensors, unreliable actuators, incomplete models of our robots, building good simulators, learning over sequences of decisions, transferring what we've learned in simulation to real robots, or learning on the robots themselves.

interactive vs social navigation

The Embodied AI Workshop brings together many researchers and organizations interested in these problems, and also hosts nine challenges which test point, object, interactive and social navigation, as well as object manipulation, vision, language, auditory perception, mapping, and more. These challenges enable researchers to test their approaches on standardized benchmarks, so the community can more easily compare what we're doing. I'm most involved as an advisor to the Stanford / Google iGibson Interactive / Social Navigation Challenge, which forces robots to maneuver around people and clutter to solve navigation problems. You can read more about the iGibson Challenge at their website or on the Google AI Blog.

the iGibson social navigation environment

Most importantly, the Embodied AI Workshop has a call for papers, with a deadline of TODAY.

Call for Papers

We invite high-quality 2-page extended abstracts in relevant areas, such as:

  •  Simulation Environments
  •  Visual Navigation
  •  Rearrangement
  •  Embodied Question Answering
  •  Simulation-to-Real Transfer
  •  Embodied Vision & Language

Accepted papers will be presented as posters. These papers will be made publicly available in a non-archival format, allowing future submission to archival journals or conferences.

The submission deadline is May 14th (Anywhere on Earth). Papers should be no longer than 2 pages (excluding references) and styled in the CVPR format. Paper submissions are now open.

I assume anyone submitting to this already has their paper well underway, but this is your reminder to git'r done.

-the Centaur

Wrong, Wrong, Wrong

centaur 0
chipman, pruitt, bolton tl;dr: Opponents of things should never be appointed to oversee them. So President Biden has nominated David Chipman to lead the ATF - and he was wrong to do so. It's not that Chipman isn't qualified to lead the ATF - he's a 25-year ATF veteran. It's that Chipman is explicitly disqualified to lead an agency that oversees firearms - because he's a gun control advocate. It's not that he can't be trusted to make good decisions: he can be trusted to make bad ones. Previous presidents have made the same mistake. To lead the EPA, Donald Trump nominated Scott Pruitt, a noted environmental skeptic who had sued the EPA. To serve as ambassador to the UN, George Bush nominated John Bolton, a noted United Nations skeptic who said that it does not really exist. Political cards on the table: I voted for Joe Biden, and I'm happy with him. And while I'm a gun rights proponent - if the Second Amendment didn't exist, we'd need to invent it - I recognize both the need for and constitutional legitimacy of gun legislation, which shouldn't be set in stone as our society evolves. But intellectual and moral integrity demands that if I call an opponent out for their misbehavior, that I also call out allies for the same behavior. Calling out misbehavior only on one side is worse than hypocritical: it undermines trust in the political system, and encourages further distorted value judgments. And humans are great at distorting value judgments when emotions are involved. From the most basic arguments all the way up to the most complex adjudication of fact and law, our moods and emotions affect whether we judge something to be true or false. In a way, we should expect this: researchers like Antonio Damasio have shown that rational decision making breaks down in people whose emotions are impaired, because the value judgments provided by our emotions are necessary for making mental decisions. But a functioning emotional system can also lead us astray: emotions can impair our judgments. Studies show we're more likely to screw up simple if-then syllogisms if they're emotionally charged. Even judges, trained to be impartial, are more likely to make mistakes with legal arguments on "hot" political topics. Heightened emotion distorts perceptions, leads us to attribute our feelings to arbitrary targets we come across, and reduces self-control - precisely what you don't want to have in someone who needs to make impartial decisions about something, and precisely what you do have in the person of a political activist. Now, I'm not questioning Chipman or Bolton's integrity (Pruitt's lack of integrity is well documented, down to his sound-proof booth), or Chipman or Bolton or Pruitt's patriotism, or their expertise. But all three of them are interested enough in the areas they later oversaw to have gone into them as opponents. In our public life, there is politics, and there is civics, and the two should not mix. Politics literally means deciding how to allocate scarce resources, and it is right and expected for us to dive in rough and tumble to ask for what we want - a participatory political system grants moral authority to a government. But government's purpose is to bring the use of force under rational control, and more broadly, to allocate resources correctly when policy has been made. Inevitably, decisions will need to be made on matters of fact at an agency - and a political partisan can be trusted to screw them up even if they're trying not to. When a partisan appoints a opponent of something to oversee it, the person that they've appointed will, very likely, whether they want to or not, "lean their hand on the till" to make things come out for their own partisan ends - meaning they will, sooner or later, fail in their civic duty to make an honest decision. If you're passionate about something, you might feel that it's all right to put a partisan in charge of it,  because then you'll get what you want. But that's evil, on two grounds - first of all, because you are subverting the political process to get a result through the back door that you can't through the front. But more importantly, impartial decisions will need to be made - and by putting a partisan in charge, you're explicitly hoping for them to make a wrong decision to help implement your political desires. Tyrants, bigots and the corrupt throughout history have employed the same tactic. Stop doing it. Regardless of our political desires, we need to step back and decouple our understanding of people into (at least) two parts: their politics, and their competence. If their political orientation isn't a direct conflict of interest for to the matter at hand, their basic competence is the primary qualification for doing the job. I was happy when Trump picked Bolton as National Security Advisor: whether I agree with their politics or not, Bolton had the experience to do the job and the attitude towards the job to do it right. Bush should never have appointed Bolton to the UN: even when he made the right decisions, we couldn't trust them. I might not have agreed with Scott Pruitt politically, but as a lawyer and state Senator, he was well qualified to be Attorney General of Oklahoma. It was morally wrong for Donald Trump to appoint a climate change denier to lead the EPA, and, predictably, that led to Pruitt lying about climate issues. I thank David Chipman for his service at the ATF, and would approve of his nomination to another agency. But the moment that he joined a political movement against guns, he disqualified himself from overseeing gun law enforcement, and if confirmed, he will inevitably make some serious mistakes. -the Centaur Pictured: Chipman, Pruitt, Bolton

Pascal’s Wager and Purchasing Parsley

centaur 1
pascal headshot

Hang out with philosophers or theologians long enough, you're likely to run into "Pascal's Wager": the Blaise Pascal's idea that you should believe in God, because if He exists, betting on Him wins you everything and betting against Him loses you everything, whereas if He doesn't, you lose nothing.

Right off the bat, we can see this original version of the wager is an intellectually dishonest argument: you don't "lose nothing" if you choose to believe that God exists and He doesn't. At best, you're being credulous; at worst, if you're being cynical about your belief, you're sacrificing your intellectual integrity.

Pascal backs off from all or nothing a bit as he's trying to dig himself out of the hole, claiming that he's comparing infinite gains of eternity in heaven against finite losses you can experience here on Earth. Some may have sincere trouble in believing, but he argues they should try to convince themselves.

Now, let's be fair to Pascal here: if you read his original text, he wasn't actually trying to convince atheists to believe per se, but instead, trying to show that the world is too uncertain for logical proofs of the existence of God, but we're probably better off acting like God exists, in case it moves us to faith.

Unfortunately, Pascal died before he could fully explain himself: the wager appears to be the introduction of a book on the value of faith that he never finished. But, like a philosophical zombie, the argument has continued its life, hollowed out from its original intent, eating brains in every new generation.

Let's slay this zombie, shall we?

Pascal's wager first appears to be an exercise in game theory: a mathematical formalism for analyzing the best choices in games. In this case, you are playing a game against the Cosmos. Your move is to believe, or not, and the Cosmos's "move" is whether God exists, or not.

[Now, the theologically savvy among you might feel like pointing out that God created Creation, and is not a part of it - which is why I used Carl Sagan's more inclusive formulation of the Cosmos as "all that is, was, and ever shall be," and I'm going to run you off with a broom if you argue about what "is" means].

This leads to a simple table: your choice of belief times the existence of God. If He is, and you choose to believe: payout plus infinity; choose not to believe: payout minus infinity. If He is not, whether you choose to believe or not, the payout is zero, or at least finite. Pick the cell with the highest value.

The emotional force of this argument is strong - for the believer - for, in decision theory, we should weigh the probability of one cell against the other, and intuitively, unless we judge the possibility of God to be literally zero, the infinite payout of the God-exists column dominates finite payouts of God-doesn't.

Mathematically, that's, um, specious at best - it looks true, but it's not a valid decision-theoretic argument. First off, Pascal put infinity in the God column specifically to outweigh any possible finite payout, but technically, we can't multiply infinite quantities by finite quantities this way.

Now, when it comes down to the question of whether infinities are actually real, or just a bad metaphor that leads people astray, I'm firmly ready to go to infinity - and beyond! But, technically mathematically, most of the time "infinity" is just a stand in for "this process can go on indefinitely without a limit."

As soon as you admit that the payout of Heaven might be finite for the purposes of modeling, then the probability assigned to the "God exists" column can be set so low that the "God doesn't" column becomes attractive. But that gets us no further than Pascal and his strict (zero-probability) unbelievers.

To me, the key flaw in Pascal's wager is what physicist E. T. Jaynes called the "mind projection fallacy": assuming that the constructs you're using in your mental models exist in reality. That's how Pascal can even put the wager to someone in the first place: he sets up the board and says "you must wager".

But the gameboard Pascal sets up doesn't exist in reality, and there's no reason for someone else to model the problem the same way. A student of religion might add columns for different views of God: Jesus who saves, Zeus who's a jerk, the Great Electron, which doesn't judge, but just is, whoa whoa.

Equally well, a student of epistemology might add many columns for belief: strict disbelief, partial belief, certain belief; an evangelical might add columns for "the hope so's" and "the know so's". Even the probabilities of columns are up for grabs. We've got a matrix of confusing possibilities.

This flaw in the wager, like the flaws in much science and folk psychology about belief, is that we do not reason about facts provided by others according to the models in the other's head: we reason about the claims that others make about facts, which we internalize based on own beliefs - and trust of the other.

Even in the simplest form, moment you start counting the columns of the wager as beliefs, the infinities disappear: there's only a claim of infinite goods in heaven, and a claim of infinite punishment in hell - and a claim that the alternative yields you only finite rewards.

And those claims are mixed in with everything else we know. As a mathematical exercise, the self-contained four-cell version of the wager has a maximum payout in the "believe in a God who exists" cell; as something that corresponds to reality, the cells of the wager start to leak.

Mathematics is an abstraction of reality - an act of creative human imagination to create repeatable forms of reasoning. I'm on the side that there is an actual reality behind this repeatability of mathematics, or it would not work; but applying mathematics to any particular problem must leave out certain details.

This is leads to the law of leaky abstractions: the notion that, no matter how good the abstraction, sooner or later it is going to fail to model the world. Forget game theory, decision matrices, and probabilities: even something as simple as the mathematical concept of number can break down.

One of the reasons I haven't published my tabbouleh recipe is that it's hard to quantify the ingredients - two bunches of parsley, four bunches of scallions, six tomatoes, two cups of fine bulgur, the juice of a lemon, etc - but since tomatoes are of different sizes, that "six" is a messy number.

But at least tomatoes come in integral quantities. Parsley comes in bunches, which are not just of different sizes; they're composed of individual stems, picked from different plants, which have different degrees of growth, freshness and wilt. Parsley needs to be cleaned and picked to use in tabbouleh.

Sometimes, you need to buy three bunches of parsley in order to end up with two. That's the law of leaky abstractions for you: you have to purchase parsley in integral units of bunches, but the bunches themselves don't correspond to the quantities that you can actually use in your recipe.

Picking beliefs for use in our minds is far more complicated than assembling a heritage Lebanese salad. There are thousands of potential facts affecting any given problem, more intertwined than the branching leaves of those leafy greens; but like them, some are fresh and edible, others black and wilted.

This was the actual point of Pascal's argument, the one he hoped to expound on his unfinished book. But the wager, because it's a mathematical abstraction - because it's repeatable reasoning - has lived on, a zombie argument which purports to give a rational reason why you should believe in God.

Ultimately, we need to carefully winnow through information that we get from others before incorporating it into our beliefs; there is no royal road to convincing anyone of anything, much less God. As for belief in God, many Christians think that must ultimately come not from reason, but from grace.

Fortunately, God gives that gift of belief for free, if we want it.

-the Centaur

Pictured: Blaise Pascal.

The Total Depravity of the No Free Lunch Theorem

centaur 0
wolpert headshot

Many Christians believe that we can only do good by the grace of God. In its most extreme form, this theory of "total depravity" suggests that we are literally incapable of choosing the good, choosing to follow God, or to even believe in Him without His direct supernatural aid, offered as a free gift.

Total depravity is false, but it contains an important truth about why we need God's help not to screw up.

In artificial intelligence, we model smart things like people as "intelligent agents". An agent, broadly stated, is something that exists in a larger environment, observing situations, taking actions, and receiving rewards - a bit like the entities navigating through Markov decision processes last time.

But agents are a broader concept, not strictly tied to the Markov property: anything that makes decisions about actions in a larger environment can be an agent. The line between agent and environment can be clear, as with humans contained within our skins; or it might be fuzzy, like a control system for a factory.

While the idea of "intelligence" is fuzzy, one of the things that makes an agent smart is rational behavior - making the right choices. Another thing that makes an agent smart is learning - improving your behavior in the future based on the experiences that you've had in the past.

The field I work in, deep reinforcement learning, focuses on building learning agents that improve their rationality based on their experiences, generally within a partially-observable Markov decision process in which it's reasonably clear what counts as rational, even if the agent can't clearly see the whole world.

This "partial observability" is one real-world limitation that virtually all agents in Creation share. Robot sensors have a limited range, the factory controller doesn't have a sensor on all its circuits, and we can't see behind our own heads (hey, there's a creepy man standing behind you right now - don't look!)

Partial observability means we need to make the best decisions we can based on the information that is available to us. We look both ways at a crosswalk to try to reduce our uncertainty, waiting if a car is coming, and we call out "corner" in a restaurant kitchen to try to reduce the uncertainty of others.

Obviously, if you don't know which door holds the lady or the tiger, it's hard to pick. But even if an agent had perfect knowledge of the current state of the world around it - not that current state is well-defined in general relativity / quantum mechanics, but nevermind - making perfectly correct decisions is impossible.

Well, not necessarily impossible: a perfectly omniscient agent could make perfectly optimal decisions, because it would know the true value of each action, not just its immediate reward. But without that kind of revelation of information from the future, we can only learn from our past experiences.

And that's where the no free lunch theorem comes in: there is no guaranteed way to learn correctly.

Imagine a simple decision problem: to turn left or right on a forking path in a garden. (Perhaps only one of those directions leads to the "straight and narrow" - sorry, this is a Lenten series, gotta bring that in somewheres). At each fork in the road, there are two more potential paths than there were before.

A path that forks at each stage is like that problem where you double the number of pennies you give someone each day for a whole month. It starts with small change - first day a penny; the second, two, the third, four, and so on - but last day of the month, you're shelling out ten million bucks - a billion pennies.

In this garden of forking paths, there are a billion possible destinations. But in the mind of an agent trying to learn what to do, the problem is even harder: there are also a billion intermediate steps, and at each point, the agent must make a decision, with two possible choices.

If you perfect knowledge and tried to write down a guidebook, it would have a billion entries, with a recommended decision at each point. But if you don't have perfect knowledge, if you're a learning agent, then your best option is to go into the garden and fill out that guidebook yourself.

This is almost inconceivably hard. If you imagine a library with every possible guidebook, one in which each book differed from every other by at least one decision out of those billions, then there are two to the power of a billion possible books - that's a number with roughly three hundred million digits.

The only way to fill out the guidebook correctly is to visit all billion possible paths. If you can't do that, then at some point, you're going to need to guess the entries for the parts of the garden that you haven't visited. And then it gets tricky, because there are two to the power of a billion possible gardens.

If you're in a garden where the straight and narrow can be approximated by alternating left and right to stay near the middle, you might guess that outer entries of the table should turn inward, the far left turning right, and the far right turning left. But for all you know, more reward can be found further out.

The no free lunch theorem says that there is no principled way to fill in parts of the book you haven't seen. At best, you can assume that parts of the garden you've seen are similar to the ones you haven't, but if you could be in literally any possible garden, then those assumptions will inevitably fail.

What does this all mean for free will versus total depravity?

Well, first off, if you are an intelligent agent, then you can sample actions from your action space. The actions you can take aren't good or evil, they're decisions in your brain and actions of your body. Some of those actions can, by chance, be good ones; God has not so ordered the world to exclude the good.

And if you do good works and see that they are good, why, then, you could learn to do them again. There's nothing preventing this; again, God has not so ordered the world to exclude the good. But there's no guarantee that you're going to learn the right lessons, and there lies the problem.

In deep reinforcement learning, we see this problem writ large. I teach robots the size of people how to navigate buildings meant for people, and while you think that would be simple, we often observe robot control policies that have completed thousands of successful runs suddenly run straight into a wall.

Deep learning systems do not generalize the way human beings would. While a human that learns to drive without hitting things in their hometown will often be able to transfer this skill when they go off for college, a robot moving to a new environment may expose strange "pathologies" in its behavior.

This is the meaning of "my thoughts are not your thoughts, neither are your ways my ways" in Scripture: even if a human being honestly chooses to believe in God, sincerely tries to do good, and accidentally gets it right, there is no guarantee that what they've learned from that experience will transfer.

In fact, it's more likely to not transfer. Sins of pride, self-righteousness, scrupulousness, and intolerance lead us astray as much as temptations to indulge in things that are "lawful but not expedient". We can turn to Scripture, to church Tradition, or to our own Reason to try to improve, but we'll likely screw up.

This is why God's grace is so important. God is actively and spiritually trying to help us come to believe, know and love him, and hopes that this love will prompt us to do the right thing, bringing the Kingdom of Heaven into being here on this Earth.

But across a broad spectrum of possible universes, it's mathematically impossible for us to always get it right even if we're trying really hard - literally the only way that we could actually be consistently good is to have perfectly omniscient knowledge of the entire future of the Universe - to actually be God.

We can't be God. The position is taken. We don't know what He knows, and we are going to screw it up. Fortunately He's ordered the universe so it's possible to get it right, He's sent his Son as an example of how to get it right, and His Spirit acts in the world to give us the grace we need to actually get it right.

-the Centaur

Pictured: David Wolpert, who discovered one of the depressingly many No Free Lunch theorems.

Original Sin and Markov Decision Processes

centaur 0
markov headshot

Original Sin is the idea that all humans are irretrievably flawed by one bad decision made by Adam in the Garden of Eden. One bite of that apple (well, it wasn't an apple, but nevermind), broke Creation in the Fall, corrupted everyone's souls from birth, leading to the requirement of baptism to liberate us.

But the Fall didn't happen. The universe is not broken, but is unimaginably old and vast. The evolution of humans on the earth is one story out of myriads. The cosmology of the early Hebrews recorded in Genesis is myth - myth in the Catholic sense, a story, not necessarily true, designed to teach a lesson.

What lessons does Genesis teach, then?

Well, first off, that God created the universe; that it is well designed for life; that humanity is an important part of that creation; and that humans are vulnerable to temptation. Forget the Fall: the story of the serpent shows that humans out of the box can make shortsighted decisions that go horribly wrong.

But what's the cause of this tendency to sin, if it isn't a result of one bad choice in the Fall? The answer is surprisingly deep: it's a fundamental flaw in the decision making process, a mathematical consequence of how we make decisions in a world where things change as a result of our choices.

Artificial intelligence researchers often model how we make choices using Markov decision processes - the idea that we can model the world as a sequence of states - I'm at my desk, or in the kitchen, without a soda - in which we can take actions - like getting a Coke Zero from the fridge - and get rewards.

Ah, refreshing.

Markov decision processes are a simplification of the real world. They assume time steps discretely, that states and actions are drawn from known sets, and the reward is a number. Most important is the Markov property: the idea that history doesn't matter: only the current state dictates the result of an action.

Despite these simplifications, Markov decision processes expose many of the challenges of learning to act in the world. Attempts to make MDP more realistic - assuming time is continuous, or states are only partially observable, or multidimensional rewards - only make the problem more challenging, not less.

Hm, I've finished that soda. It was refreshing. Time for another?

Good performance at MDPs is hard because we can only observe our current state: you can't be at two places or two times at once. The graph of states of an MDP is not a map of locations you can survey, but a set of possible moments in time which we may or may not reach as a result of our choices.

In an earlier essay, I described navigating this graph like trying to traverse a minefield, but it's worse, since there's no way to survey the landscape. The best you can do is to enumerate the possible actions in your current state and model what might happen, like waving a metal detector over the ground.

Should I get a Cherry Coke Zero, or a regular?

This kind of local decision making is sometimes called reactive, because we're just reacting to what's right in front of us, and it's also called greedy, because we're choosing the best actions out of the information available in the current state, despite what might come two or three steps later.

If you took the wrong path in a minefield, even if you don't get blown up, you might go down a bad path, forcing you to backtrack ... or wandering into the crosshairs of the badguys hiding in a nearby bunker. A sequence of locally good actions can lead us to a globally suboptimal outcome.

Excuse me for a moment. After drinking all those sodas, I need a bio break.

That's the problem of local decision making: if you exist in a just very slightly complicated world - say, one where the locally optimal action of having a cool fizzy soda can lead to a bad outcome three steps later like bathroom breaks and a sleepless night - then those local choices can lead you astray.

The most extreme example is a Christian one. Imagine you have two choices: a narrow barren road versus a lovely garden path. Medieval Christian writers loved to show that the primrose path led straight to the everlasting bonfire, whereas the straight and narrow led to Paradise.

Or, back to the Garden of Eden, where eating the apple gave immediate knowledge and long-term punishment, and not eating it would have kept them in good grace with God. This is a simple two-stage, two-choice Markov decision process, in which the locally optimal action leads to a worse reward.

The solution to this problem is to not use a locally greedy policy operating over the reward given by each action, but to instead model the long-term reward of sequences of actions over the entire space, and to develop a global decision policy which takes in account the true ultimate value of each action.

Global decision policies sometimes mean delaying gratification. To succeed at life, we often need to do the things which are difficult right now, like skipping dessert, in favor of getting more reward later, like seeing the numbers on your scale going back down to their pre-Covid numbers.

Global decision policies also resemble moral rules. Whether based on revelation from God, as discussed in an earlier essay, or based on the thinking of moral philosophers, or just the accumulated knowledge of a culture, our moral rules provide us a global decision policy that helps us avoid bad consequences.

The flaw in humanity which inspired Original Sin and is documented in the Book of Genesis is simply this: we're finite beings that exist in a single point in time and can't see the long-term outcome of our choices. To make good decisions, we must develop global policies which go beyond what we see.

Or, for a Christian, we must trust God to give us moral standards to guide us towards the good.

-the Centaur

Pictured: Andrey Markov.