I<tab-complete> welcome our new robot overlords.

Hoisted from a recent email exchange with my friend Gordon Shippey:

Re: Whassap?
Sounds like a plan.
(That was an actual GMail suggested response. Grumble-grumble AI takeover.)

I<tab-complete> welcome our new robot overlords.

I am constantly amazed by the new autocomplete. While, anecdotally, autocorrect and spell checking are getting worse and worse (I blame the nearly-universal phenomenon of U-shaped development, where a system trying to learn new generalizations gets worse before it gets better), I have written near-complete emails to friends and colleagues with Gmail’s suggested responses, and when I’m writing texts to my wife, it knows our shorthand!

One way of doing this back in the day was Markov chain text models, which learn predictions of what patterns are likely to follow each other; so if I write “love you too boo boo” to my wife enough times, the model can predict that “boo boo” will follow “love you too” and offer it as a completion. More modern systems use recurrent neural networks to learn richer sets of features, with stateful information carried down the chain, enabling them to capture subtler relationships and get better results, as described in the great article “The Unreasonable Effectiveness of Recurrent Neural Networks”.
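As a toy illustration (nothing like Gmail’s actual system, which is far more sophisticated), a word-level Markov chain completer fits in a few lines of Python:

```python
from collections import defaultdict

def build_chain(text, order=3):
    """Learn which word tends to follow each run of `order` words."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def complete(chain, prefix, n=2, order=3):
    """Greedily extend a prefix with the most common continuation."""
    out = list(prefix)
    for _ in range(n):
        followers = chain.get(tuple(out[-order:]))
        if not followers:
            break
        # a real system would sample; here we just take the most frequent
        out.append(max(set(followers), key=followers.count))
    return " ".join(out)

# "love you too boo boo", written enough times, trains the completion
chain = build_chain("love you too boo boo " * 5)
print(complete(chain, ["love", "you", "too"]))  # love you too boo boo
```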

-the<tab-complete> Centaur


PRM-RL Won a Best Paper Award at ICRA!

So, this happened! Our team’s paper on “PRM-RL” – a way to teach robots to navigate their worlds which combines human-designed algorithms that use roadmaps with deep-learned algorithms to control the robot itself – won a best paper award at the ICRA robotics conference!

I talked a little bit about how PRM-RL works in the post “Learning to Drive … by Learning Where You Can Drive”, so I won’t go over the whole spiel here. The basic idea is that we’ve gotten good at teaching robots to control themselves using a technique called deep reinforcement learning (the RL in PRM-RL), which trains them in simulation, but it’s hard to extend this approach to long-range navigation problems in the real world. We overcome this barrier with a more traditional robotic approach, probabilistic roadmaps (the PRM in PRM-RL), which build maps of where the robot can drive using point-to-point connections; we combine these maps with the robot simulator and, boom, we have a map of where the robot thinks it can successfully drive.

We were cited not just for this technique, but for testing it extensively in simulation and on two different kinds of robots. I want to thank everyone on the team – especially Aleksandra Faust for her background in PRMs and for taking point on the idea (and doing all the quadrotor work with Lydia Tapia), Oscar Ramirez and Marek Fiser for their work on our reinforcement learning framework and simulator, Kenneth Oslund for his heroic last-minute push to collect the indoor robot navigation data, and our manager James for his guidance, contributions to the paper, and support of our navigation work.

Woohoo! Thanks again everyone!

-the Centaur

Why I’m Solving Puzzles Right Now

When I was a kid (well, a teenager) I’d read puzzle books for pure enjoyment. I’d gotten started with Martin Gardner’s mathematical recreation books, but the ones I really liked were Raymond Smullyan’s books of logic puzzles. I’d go to Wendy’s on my lunch break at Francis Produce, with a little notepad and a book, and chew my way through a few puzzles. I’ll admit I often skipped ahead if they got too hard, but I did my best most of the time.

I read more of these as an adult, moving back to the Martin Gardner books. But sometime, about twenty-five years ago (when I was in the thick of grad school) my reading needs completely overwhelmed my reading ability. I’d always carried huge stacks of books home from the library, never finishing all of them, frequently paying late fees, but there was one book in particular – The Emotions by Nico Frijda – which I finished but never followed up on.

Over the intervening years, I did finish books, but read most of them scattershot, picking up what I needed for my creative writing or scientific research. Eventually I started using the tiny little notetabs you see in some books to mark the stuff that I’d written, a “levels of processing” trick to ensure that I was mindfully reading what I wrote.

A few years ago, I admitted that wasn’t enough, and consciously began trying to read ahead of what I needed for work. I chewed through C++ manuals and planning books and was always rewarded a few months later when I’d already read what I needed to solve my problems. I began focusing on fewer books in depth, finishing more books than I had in years.

Even that wasn’t enough, and I began – at last – the re-reading project I’d hoped to do with The Emotions. Recently I did that with Dedekind’s Essays on the Theory of Numbers, and now I’m doing it with the Deep Learning book. But some of that math is frickin’ beyond where I am now, man. Maybe one day I’ll get it, but sometimes I’ve spent weeks tackling a problem I just couldn’t get.

Enter puzzles. As it turns out, it’s really useful for a scientist to also be a science fiction writer who writes stories about a teenaged mathematical genius! I’ve had to simulate Cinnamon Frost’s staggering intellect for the purpose of writing the Dakota Frost stories, but the further I go, the more I want her to be doing real math. How did I get into math? Puzzles!

So I gave her puzzles. And I decided to return to my old puzzle books, some of the ones I got later but never fully finished, and to give them the deep reading treatment. It’s going much slower than I like – I find myself falling victim to the “rule of threes” (you can do a third of what you want to do, often in three times as much time as you expect) – but then I noticed something interesting.

Some of Smullyan’s books in particular are thinly disguised math books. In some parts, they’re even the same math I have to tackle in my own work. But unlike the other books, these problems are designed to be solved, rather than a reflection of some chunk of reality which may be stubborn; and unlike the other books, these have solutions along with each problem.

So, I’ve been solving puzzles … with careful note of how I have been failing to solve puzzles. I’ve hinted at this before, but understanding how you, personally, usually fail is a powerful technique for debugging your own stuck points. I get sloppy, I drop terms from equations, I misunderstand conditions, I overcomplicate solutions, I grind against problems where I should ask for help, I rabbithole on analytical exploration, and I always underestimate the time it will take for me to make the most basic progress.

Know your weaknesses. Then you can work those weak mental muscles, or work around them to build complementary strengths – the way Richard Feynman would always check over an equation when he was done, looking for those places where he had flipped a sign.

Back to work!

-the Centaur

Pictured: my “stack” at a typical lunch. I’ll usually get to one out of three of the things I bring for myself to do. Never can predict which one though.

Nailed It (Sorta)

Here’s what was in the rabbit hole from last time (I had been almost there):

I had way too much data to exploit, so I started to think about culling it out, using the length of the “mumbers” to cut off all the items too big to care about. That led to the key missing insight: my method of mapping mumbers mapped the first digit of each item to the same position – that is, 9, 90, 900, 9000 all had the same angle, just further out. This distance was already a logarithm of the number, but once I dropped my resistance to taking the logarithm twice…

… then I could create a transition plot function which worked for almost any mumber in the sets of mumbers I was playing with …
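For the curious, here’s one plausible version of that mapping in Python – the actual code was in Mathematica and isn’t shown here, so the constants are guesses – with the first digit fixing the angle and a log-of-a-log radius keeping even enormous mumbers on the chart:

```python
import math

def mumber_point(n):
    """Map a mumber to polar coordinates (angle, radius).
    9, 90, 900, 9000 share an angle (same first digit) but sit
    further out; the double logarithm tames very large mumbers."""
    first_digit = int(str(n)[0])
    angle = 2 * math.pi * first_digit / 10
    radius = math.log10(1 + math.log10(n))
    return angle, radius

# 9, 90, and 900 line up on the same ray, at slowly growing radii
print([round(mumber_point(n)[1], 3) for n in (9, 90, 900)])
```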

Then I could easily visualize the small set of transitions – “mumbers” with 3 digits – that yielded the graph above; for reference these are:

The actual samples I wanted to play with were larger, like this up to 4 digits:

This yields a still visible graph:

And this, while it doesn’t let me visualize the whole space that I wanted, does provide the insight I wanted. The “mumbers” up to 10000 do indeed “produce” most of the space of the smaller “mumbers” (not surprising, as the “mumber” rule 2XYZ produces XYZ, and 52XY produces XYXY … meaning most numbers in the first 10,000 will be produced by one in that first set). But this shows that sequences of 52 rule transitions on the left produce a few very, very large mumbers – probably because 552552 produces 552552552552 which produces 552552552552552552552552552552552552 which quickly zooms away to the “mumberOverflow” value at the top of my chart.
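In code, those production rules can be sketched like so – a leading 2 erases itself, and a leading 5 doubles whatever the rest produces, which is one reading consistent with 2XYZ → XYZ, 52XY → XYXY, and 552552 → 552552552552:

```python
def produce(s):
    """Mumber production: '2X' produces X, and if X produces Y
    then '5X' produces YY (so '52XY' produces XYXY)."""
    if s.startswith("2") and len(s) > 1:
        return s[1:]
    if s.startswith("5") and len(s) > 1:
        rest = produce(s[1:])
        return rest + rest if rest is not None else None
    return None  # no rule applies

print(produce("2123"))    # 123
print(produce("552552"))  # 552552552552
```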

And now the next lesson: finishing up this insight, which more or less closes out what I wanted to explore here, took 45 minutes. I had 15 allotted to do various computer tasks before leaving Aqui, and I’m already 30 minutes over that … which suggests again that you should be careful going down rabbit holes; unlike leprechaun trails, there isn’t likely to be a pot of gold down there, and who knows how far down it can go?

-the Centaur

P.S. I am not suggesting this time spent was not worthwhile; I’m just trying to understand the opportunity cost of various problem-solving strategies so I can become more efficient.

Don’t Fall Into Rabbit Holes

SO! There I was, trying to solve the mysteries of the universe, learn about deep learning, and teach myself enough puzzle logic to create credible puzzles for the Cinnamon Frost books, and I find myself debugging the fine details of a visualization system I’ve developed in Mathematica to analyze the distribution of problems in an odd middle chapter of Raymond Smullyan’s The Lady or the Tiger.

I meant well! Really I did. I was going to write a post about how finding a solution is just a little bit harder than you normally think, and how insight sometimes comes after letting things sit.

But the tools I was creating didn’t do what I wanted, so I went deeper and deeper down the rabbit hole trying to visualize them.

The short answer seems to be that there’s no “there” there and that further pursuit of this sub-problem will take me further and further away from the real problem: writing great puzzles!

I learned a lot – about numbers, about how things could combinatorially explode, about Ulam Spirals and how to code them algorithmically. I even learned something about how I, particularly, fail in these cases.

But it didn’t provide the insights I wanted. Feynman warned about this: he called it “the computer disease”, worrying about the formatting of the printout so much you forget about the answer you’re trying to produce, and it can strike anyone in my line of work.

Back to that work.

-the Centaur

Learning to Drive … by Learning Where You Can Drive

I often say “I teach robots to learn,” but what does that mean, exactly? Well, now that one of the projects I’ve worked on has been announced – and I mean not just on arXiv, the public-access scientific repository where all the hottest reinforcement learning papers are shared, but actually accepted into the ICRA 2018 conference – I can tell you all about it!

When I’m not roaming the corridors hammering infrastructure bugs, I’m trying to teach robots to roam those corridors – a problem we call robot navigation. Our team’s latest idea combines “traditional planning,” where the robot tries to navigate based on an explicit model of its surroundings, with “reinforcement learning,” where the robot learns from feedback on its performance.

For those not in the know, “traditional” robotic planners use structures like graphs to plan routes, much in the same way that a GPS uses a roadmap. One of the more popular methods for long-range planning is probabilistic roadmaps, which build a long-range graph by picking random points and attempting to connect them with a simpler “local planner” that knows how to navigate shorter distances. It’s a little like how you learn to drive in your neighborhood – starting from landmarks you know, you navigate to nearby points, gradually building up a map in your head of what connects to what.
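In skeleton form, roadmap construction looks something like this (a toy sketch, not our actual system; `is_free` and `local_planner` stand in for the hard parts):

```python
import math
import random

def build_prm(n_samples, connect_radius, is_free, local_planner):
    """Toy probabilistic roadmap: sample random collision-free points,
    then keep an edge between nearby pairs only if the local planner
    says it can drive between them."""
    nodes = []
    while len(nodes) < n_samples:
        point = (random.uniform(0, 10), random.uniform(0, 10))
        if is_free(point):
            nodes.append(point)
    edges = []
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if math.dist(a, b) <= connect_radius and local_planner(a, b):
                edges.append((a, b))
    return nodes, edges
```

Long-range routes then come from graph search over those edges; all the difficulty hides inside `local_planner`.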

But for that to work, you have to know how to drive, and that’s where the local planner comes in. Building a local planner is simple in theory – you can write one for a toy world in a few dozen lines of code – but difficult in practice, and making one that works on a real robot is quite the challenge. These software systems are called “navigation stacks” and can contain dozens of components – and in my experience they’re hard to get working and even when you do, they’re often brittle, requiring many engineer-months to transfer to new domains or even just to new buildings.

People are much more flexible, learning from their mistakes, and the science of making robots learn from their mistakes is reinforcement learning, in which an agent learns a policy for choosing actions by simply trying them, favoring actions that lead to success and suppressing ones that lead to failure. Our team built a deep reinforcement learning approach to local planning, using a state-of-the-art algorithm called DDPG (Deep Deterministic Policy Gradients) pioneered by DeepMind to learn a navigation system that could successfully travel several meters in office-like environments.

But there’s a further wrinkle: the so-called “reality gap”. By necessity, the local planner used by a probabilistic roadmap is simulated – attempting to connect points on a map. That simulated local planner isn’t identical to the real-world navigation stack running on the robot, so sometimes the robot thinks it can go somewhere on a map which it can’t navigate safely in the real world. This can have disastrous consequences – causing robots to tumble down stairs, or, worse, when people follow their GPSes too closely without looking where they’re going, causing cars to tumble off the end of a bridge.

Our approach, PRM-RL, directly combats the reality gap by combining probabilistic roadmaps with deep reinforcement learning. By necessity, reinforcement learning navigation systems are trained in simulation and tested in the real world. PRM-RL uses a deep reinforcement learning system as both the probabilistic roadmap’s local planner and the robot’s navigation system. Because links are added to the roadmap only if the reinforcement learning local controller can traverse them, the agent has a better chance of attempting to execute its plans in the real world.
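The heart of that edge test can be sketched like so (a toy stand-in: `policy` and `simulate_step` abstract away the learned agent and the simulator):

```python
import math

def rl_can_traverse(policy, simulate_step, start, goal,
                    max_steps=500, tol=0.3):
    """Add a roadmap edge only if the learned policy, rolled out in
    simulation, actually reaches the goal from the start point."""
    state = start
    for _ in range(max_steps):
        action = policy(state, goal)
        state = simulate_step(state, action)
        if math.dist(state, goal) <= tol:
            return True
    return False  # the controller couldn't make this connection
```

Because every edge has passed this check, a plan over the roadmap is, by construction, a chain of connections the controller has already demonstrated in simulation.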

In simulation, our agent could traverse hundreds of meters using the PRM-RL approach, doing much better than the “straight-line” local planner which was our default alternative. While I didn’t happen to have in my back pocket a hundred-meter-wide building instrumented with a mocap rig for our experiments, we were able to test a real robot on a smaller rig and showed that it worked well (no pictures, but you can see the map and the actual trajectories below; while the robot’s behavior wasn’t as good as we hoped, we traced that to a networking issue that was adding a delay to commands sent to the robot, not to our code itself; we’ll fix this in a subsequent round).

This work includes both our group working on office robot navigation – including Aleksandra Faust, Oscar Ramirez, Marek Fiser, Kenneth Oslund, me, and James Davidson – and Aleksandra’s collaborator Lydia Tapia, with whom she worked on the aerial navigation also reported in the paper. Until the ICRA version comes out, you can find the preliminary version on arXiv:

PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning

We present PRM-RL, a hierarchical method for long-range navigation task completion that combines sampling-based path planning with reinforcement learning (RL) agents. The RL agents learn short-range, point-to-point navigation policies that capture robot dynamics and task constraints without knowledge of the large-scale topology, while the sampling-based planners provide an approximate map of the space of possible configurations of the robot from which collision-free trajectories feasible for the RL agents can be identified. The same RL agents are used to control the robot under the direction of the planning, enabling long-range navigation. We use Probabilistic Roadmaps (PRMs) for the sampling-based planner. The RL agents are constructed using feature-based and deep neural net policies in continuous state and action spaces. We evaluate PRM-RL on two navigation tasks with non-trivial robot dynamics: end-to-end differential drive indoor navigation in office environments, and aerial cargo delivery in urban environments with load displacement constraints. These evaluations included both simulated environments and on-robot tests. Our results show improvement in navigation task completion over both RL agents on their own and traditional sampling-based planners. In the indoor navigation task, PRM-RL successfully completes trajectories up to 215 meters long under noisy sensor conditions, and the aerial cargo delivery completes flights over 1000 meters without violating the task constraints in an environment 63 million times larger than used in training.


So, when I say “I teach robots to learn” … that’s what I do.

-the Centaur

Welcome to the Future


Welcome to the future, ladies and gentlemen. Here in the future, the obscure television shows of my childhood rate an entire section in the local bookstore, which combines books, games, music, movies, and even vinyl records with a coffeehouse and restaurant.


Here in the future, the heretofore unknown secrets of my discipline, artificial intelligence, are now conveniently compiled in compelling textbooks that you can peruse at your leisure over a cup of coffee.


Here in the future, genre television shows play on the monitors of my favorite bar / restaurant, and the servers and I have meaningful conversations about the impact of robotics on the future of labor.


And here in the future, Monty Python has taken over the world.

Perhaps that explains 2016.

-the Centaur

The Two Fear Channels


Hoisted from a recent email thread with the estimable Jim Davies:

“You wrote to me once that the brain has two fear channels, cognitive and reactive. Do you have a citation I can look at for an introduction to that idea?”

So I didn’t have a citation off the top of my head, though I do now – LeDoux’s 1998 book The Emotional Brain – but I did remember what I told Jim: that we have two fear channels, one fast, one slow. The fast one is primarily sensory, reactive, and can learn bad associations which are difficult to unlearn, as in PTSD (post-traumatic stress disorder); the slow one is more cognitive, deliberative, and has intellectual fear responses.

It turns out that it ain’t that simple, but I was almost right. Spoiling the lead a bit: there are two conditioned fear channels, the fast “low road” and the slow “high road”, and they do function more or less as I described. The low road has quick reactions to stimuli, a direct hotline from sensory processing in your thalamus to the amygdala, which is a clearinghouse for emotional information; the high road involves the sensory cortex and confirms the quick reaction of the low road. The low road’s implicated in PTSD, though PTSD seems to involve broader brain damage brought on by traumatic events.

Where that needs tweaking is that there’s also a third fear channel, the instructed or cognitive fear channel. This allows us to become scared if we’re told that there’s a tiger behind a door, even if we haven’t seen the fearsome beast. This one relies on an interaction between the hippocampus and the amygdala; if your hippocampus is damaged, you will likely not remember what you’re told, whereas if your amygdala is damaged, you may react appropriately to instruction, but you might not feel the appropriate emotional response to your situation (which could lead you to make poor choices).

So, anyway, that’s the gist. But, in the spirit of Check Your Work, let me show my work from my conversation with Jim.

Ok, I have an answer for you (description based on [Gazzaniga et al 2002], though I found similar information in [Lewis et al 2010]).

There are two fear channels: one involving fast sensory processing and one involving slower perceptual information. Based on the work of LeDoux [1996] these are sometimes called the “low road” (quick and dirty connection of the thalamus to the amygdala, a crude signal that a stimulus resembles a conditioned stimulus) and the “high road” (thalamus to sensory cortex to amygdala, a more refined signal which is more reliable); both of these channels help humans learn implicit conditioned fear responses to stimuli.

This “low road” and “high road” concept is what my understanding of PTSD was based on: that individuals acquire a fast low-road response to stimuli that they cannot readily suppress. I don’t have a reference for you, but I’ve heard it many times (and it’s memorably portrayed in Born on the Fourth of July, when veterans in a parade react to firecrackers with flinches, and later the protagonist has the same reaction after his own experience). A little research seems to indicate that PTSD may actually involve events traumatic enough to damage the amygdala or hippocampus or both, but likely involving other brain areas as well ([Bremner 2006], [Chen et al 2012]).

There are a couple more wrinkles. Even patients with amygdala damage have unconditioned fear responses; conditioned responses seem to involve the amygdala [Phelps et al 1998]. Instructed fear (warning a subject about a loud noise that will follow a flashing light, for example) seems to involve the hippocampus as well, though patients with amygdala damage don’t show fear responses even though they may behave appropriately when instructed (e.g., not showing a galvanic skin response even though they flinch [Phelps et al 2001]). This amygdala response can influence storage of emotional memories [Ferry et al 1999]. Furthermore, there’s evidence the amygdala is even involved in perceptual processing of emotional expression [Dolan and Morris 2000].

So to sum, the primary reference that I was talking about was the “low road” (fast connection from thalamus to amygdala, implicated in fast conditioned fear responses and PTSD, though PTSD may involve trauma-induced damage to more brain areas) and “high road” (slow reliable connection from thalamus to sensory cortex to amygdala, implicated in conditioned fear responses), but there’s also a “sensory” path (conditioned fear response via the thalamus to the amygdala, with or without the sensory cortex involvement) vs “cognitive” path (instructed fear response via the hippocampus, which functions but shows reduced emotional impact in case of amygdala damage).

Hope this helps!

Bremner, J. D. (2006). Traumatic stress: effects on the brain. Dialogues in clinical neuroscience, 8(4), 445.

Chen, Y., Fu, K., Feng, C., Tang, L., Zhang, J., Huan, Y., … & Ma, C. (2012). Different regional gray matter loss in recent onset PTSD and non PTSD after a single prolonged trauma exposure. PLoS One, 7(11), e48298.

Dolan, R. J., & Morris, J. S. (2000). The functional anatomy of innate and acquired fear: Perspectives from neuroimaging. Cognitive neuroscience of emotion, 225-241.

Ferry, B., Roozendaal, B., & McGaugh, J. L. (1999). Basolateral amygdala noradrenergic influences on memory storage are mediated by an interaction between β-and α1-adrenoceptors. The Journal of Neuroscience, 19(12), 5119-5123.

Gazzaniga, M.S., Ivry, R.B., & Mangun, G.R. (2002) Cognitive Neuroscience – The Biology of the Mind (2e) W. W. Norton & Company.

LeDoux, J. (1998). The emotional brain: The mysterious underpinnings of emotional life. Simon and Schuster.

Lewis, M., Haviland-Jones, J. M., & Barrett, L. F. (Eds.). (2010). Handbook of Emotions. Guilford Press.

Phelps, E. A., LaBar, K. S., Anderson, A. K., O’Connor, K. J., Fulbright, R. K., & Spencer, D. D. (1998). Specifying the contributions of the human amygdala to emotional memory: A case study. Neurocase, 4(6), 527-540.

Phelps, E. A., O’Connor, K. J., Gatenby, J. C., Gore, J. C., Grillon, C., & Davis, M. (2001). Activation of the left amygdala to a cognitive representation of fear. Nature neuroscience, 4(4), 437-441.

-the Centaur

Pictured: a few of the books I looked at to answer Jim’s question.

“Sibling Rivalry” returning to print


Wow. After nearly 21 years, my first published short story, “Sibling Rivalry”, is returning to print. It began as an experiment to try out an idea I wanted to use for a longer novel, ALGORITHMIC MURDER, but I quickly found I’d caught a live wire with “Sibling Rivalry”, which became my first sale, to The Leading Edge magazine, back in 1995.

“Sibling Rivalry” was borne of frustrations I had as a graduate student in artificial intelligence (AI) watching shows like Star Trek, in which Captain Kirk talks a computer to death. No one talks anyone to death outside of a Hannibal Lecter movie or a bad comic book, much less in real life, and there’s no reason to believe feeding a paradox to an AI will make it explode.

But there are ways to beat one, depending on how they’re constructed – and the more you know about them, the more potential routes there are for attack. That doesn’t mean you’ll win, of course, but … if you want to know, you’ll have to wait for the story to come out.

“Sibling Rivalry” will be the second book in Thinking Ink Press’s Snapbook line, with another awesome cover by my wife Sandi Billingsley, interior design by Betsy Miller and comments by my friends Jim Davies and Kenny Moorman, the latter of whom uses “Sibling Rivalry” to teach AI in his college courses. Wow! I’m honored.

Our preview release will be at the Beyond the Fence launch party next week, with a full release to follow.

Watch this space, fellow adventurers!

-the Centaur

All the Transitions of Tic-Tac-Toe, Redux

What was supposed to be a quick exercise to help me visualize a reinforcement learning problem has turned into a much larger project, one which I’m reluctantly calling a temporary halt to: a visualization of all the states of Tic-Tac-Toe.

What I found is that it’s surprisingly hard to make this work: all the states want to pile on top of each other, and there are a few subtleties to representing it correctly. To make it work, I had to separately represent board positions – the typical X’s and O’s used in play – from game states, such as Start, X Wins, O Wins, and Stalemate.
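As a sanity check on the scale of the problem, the reachable board positions (ignoring the separate game-state nodes) can be enumerated with a short breadth-first search – sketched here in Python, though my visualization lives in Mathematica; play stops as soon as someone wins:

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(b):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for i, j, k in LINES:
        if b[i] != " " and b[i] == b[j] == b[k]:
            return b[i]
    return None

def successors(b):
    """Legal next boards; a won or full board is terminal."""
    if winner(b) or " " not in b:
        return []
    mover = "X" if b.count("X") == b.count("O") else "O"
    return [b[:i] + mover + b[i + 1:] for i, c in enumerate(b) if c == " "]

# breadth-first search from the empty board
seen = {" " * 9}
frontier = [" " * 9]
while frontier:
    next_frontier = []
    for board in frontier:
        for s in successors(board):
            if s not in seen:
                seen.add(s)
                next_frontier.append(s)
    frontier = next_frontier
print(len(seen))  # 5478 distinct reachable positions
```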

The Mathematica for this is gnarly and a total hack; it could probably be made more efficient to process all 17,000+ transitions of the game, and I definitely need to think of a way to make each state appear in its own, non-overlapping position. But that will require more thought than my crude jitter function above, each render takes far too long to iterate quickly, and I have a novel to finish. I don’t want to get stuck in a grind against a game known for its stalemates.

Ugh. You can see the jumble there; it’s hard to see which transitions lead to X’s or O’s victory and which lead to stalemate. I have ideas on how to fix this, but I want my novel done more and first, dag nab it. So let me give you all the transitions of Tic-Tac-Toe in their full glory (22.8 MB). I could say more about this problem – or I can say what I have, call it victory, and move on.

On to the novel. It’s going well.

-the Centaur