Press "Enter" to skip to content

Posts published in “Research”

Posts and pages about my scientific research and the scientific topics I’m interested in.

do, or do not. there is no blog


One reason blogging suffers for me is that I always prioritize doing over blogging. That sounds cool and all, but it's actually just another excuse. There's always something more important than doing your laundry ... until you run out of underwear. Blogging has no such hard failure mode, so it's even easier to fall out of the habit. But the reality is, just like laundry, if you set aside a little time for it, you can stay ahead - and you'll feel much healthier and more comfortable if you do.

-the Centaur

Pictured: "Now That's A Steak Burger", a 1-pound monster from Willard Hicks, where I took a break from my million other tasks to catch up on Plans and the Structure of Behavior, the book that introduced idea of the test-operate-test-exit (TOTE) loop as a means for organizing behavior, a device I'm finding useful as I delve into the new field of large language model planning.

What is “Understanding”?

When I was growing up - or at least when I was a young graduate student in a Schankian research lab - we were all focused on understanding: what did it mean, scientifically speaking, for a person to understand something, and could that be recreated on a computer? We all sort of knew it was what we'd call nowadays an ill-posed problem, but we had a good operational definition, or at least an operational counterexample: if a computer read a story and could not answer the questions that a typical human being could answer about that story, it didn't understand it at all.

But there are at least two ways to define a word. What I'll call a practical definition is what a semanticist might call the denotation of a word: a narrow definition, one which you might find in a dictionary, which clearly specifies the meaning of the concept, like a bachelor being an unmarried man. What I'll call a philosophical definition - the connotations of a word - is the vast web of meanings around the core concept, the source of the fine sense of unrightness that one gets from describing Pope Francis as a bachelor, the nuances of meaning embedded in words that Socrates spent his time pulling out of people, before they went and killed him for being annoying.

It's those connotations of "understanding" that made all us Schankians very leery of saying our computer programs fully "understood" anything, even as we were pursuing computer understanding as our primary research goal. I care a lot about understanding - deep understanding - because, frankly, I cannot effectively do my job of teaching robots to learn if I do not deeply understand robots, learning, computers, the machinery surrounding them, and the problem I want to solve; when I do not understand all of these things, I stumble in the dark, I make mistakes, and I end up sad.

And it's in pursuing a deeper understanding about deep learning that I got a deeper insight into deep understanding. I was "deep reading" the Deep Learning book (a practice in which I read, or re-read, a book I've read, working out all the equations in advance before reading the derivations), in particular section 5.8.1 on Principal Components Analysis, and the authors made the same comment I'd just seen in the Hands-On Machine Learning book: "the mean of the samples must be zero prior to applying PCA." Wait, what? Why? I mean, thank you for telling me, I'll be sure to do that, but, like ... why?

I didn't follow up on that question right away, because the authors also tossed off an offhand comment like, "Var[x] = XᵀX / (m - 1) is the unbiased sample covariance matrix associated with a sample x," and I'm like, what the hell, where did that come from? I had recently read the section on variance and covariance but had no idea why this would be associated with the transpose of the design matrix X multiplied by X itself. (In case you're new to machine learning: if x stands for an example input to a problem, say a list of the pixels of an image represented as a column of numbers, then the design matrix X is all the examples you have, but with each example listed as a row. Perfectly not confusing? Great!)

So, since I didn't understand why Var[x] = XᵀX / (m - 1), I set out to prove it myself. (Carpenters say, measure twice, cut once, but they'd better have a heck of a lot of measuring and cutting under their belts - more so, they'd better know when to cut and measure before they start working on your back porch, or you and they will have a bad time. Same with trying to teach robots to learn: it's more than just practice; if you don't know why something works, it will come back to bite you, sooner or later, so dig in until you get it.)

And I quickly found that the "covariance matrix of a variable x" was a thing, and quickly started to intuit that the matrix multiplication would produce it. This is what I'd call surface-level understanding: going forward from the definitions to obvious conclusions. I knew the definition of matrix multiplication, and I'd just re-read the definition of covariance matrices, so I could see these would fit together. But as I dug into the problem, it struck me: true understanding is more than just going forward from what you know. "The brain does much more than just recollect; it inter-compares, it synthesizes, it analyzes, it generates abstractions" - thank you, Carl Sagan. But this kind of understanding is a vast, ill-posed problem - meaning, a problem without a unique and unambiguous solution.

But as I continued to dig through the problem, reading through the sections I'd just read on "sample estimators," I had a revelation. (Another aside: "sample estimators" use the data you have to predict data you don't, like estimating the height of males in North America from a random sample of guys across the country; "unbiased estimators" may be wrong, but their errors are grouped around the true value.) The formula for the unbiased sample estimator of the variance doesn't actually look quite like the matrix transpose formula - but it depends on the unbiased estimator of the sample mean. Suddenly, I felt that I understood why PCA data had to have a mean of 0: not by driving forward from known facts and connecting their inevitable conclusions, but by driving backwards from known facts to hypothesize a connection which I could then explore and check. I even briefly wrote a draft of the ideas behind this essay - then set out to prove what I thought I'd seen. Setting the mean of the samples to zero made the sample mean drop out of the sample variance - and then the matrix multiplication formula dropped out. Then I knew I understood why PCA data had to have a mean of 0 - and how to rework PCA to deal with data which had a nonzero mean.

This I'd call deep understanding: reasoning backwards from what we know to provide reasons for why things are the way they are. A recent book on science I read said that some regularities, like the length of the day, may be predictive, but other regularities, like the tides, cry out for explanation. And once you understand Newton's laws of motion and gravitation, the mystery of the tides is readily solved - the answer falls out of inertia, angular momentum, and gravitational gradients. With apologies to Larry Niven: of course a species that understands gravity will be able to predict tides.

The brain does do more than just remember and predict to guide our next actions: it builds structures that help us understand the world on a deeper level, teasing out rules and regularities that help us not just plan, but strategize. Detective Benoit Blanc from the movie Knives Out claimed to "anticipate the terminus of gravity's rainbow" to help him solve crimes: realizing how gravity makes projectiles arc, using that to understand why the trajectory must be the observed parabola, and strolling to the target. So I'd argue that true understanding is not just forward-deriving inferences from known rules, but also backward-deriving causes that can explain behavior.
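Before going on: here's a quick numerical check of that PCA result from above - a minimal NumPy sketch of my own devising (not from either book), showing that the XᵀX form matches the unbiased sample covariance exactly when the sample mean is zero:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, size=(1000, 3))  # design matrix: one example per row, nonzero mean
m = X.shape[0]

# Uncentered, X^T X / (m - 1) does NOT match the unbiased sample covariance...
print(np.allclose(X.T @ X / (m - 1), np.cov(X, rowvar=False)))    # False

# ...but once the sample mean is subtracted (mean of the samples = 0), it does.
Xc = X - X.mean(axis=0)
print(np.allclose(Xc.T @ Xc / (m - 1), np.cov(X, rowvar=False)))  # True
```

(np.cov subtracts the sample mean internally - exactly the centering step the books tell you to do by hand before applying PCA.)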
And this means computing the inverse of whatever forward prediction matrix you have, which is a more difficult and challenging problem, because that matrix may not have a well-defined inverse. So true understanding is indeed a deep and interesting problem!

But even if we teach our computers to understand this way ... I suspect that this won't exhaust what we need to understand about understanding. For example: the dictionary definitions I've looked up don't mention it, but the idea of seeking a root cause seems embedded in the word "under-standing" itself ... which makes me suspect that the other half of the word, standing, might hint at the stability, the reliability, of the inferences we need to be able to make to truly understand anything.

I don't think we've reached that level of understanding of understanding yet.

-the Centaur

Pictured: Me working on a problem in a bookstore. Probably not this one.

Robots in Montreal

A cool hotel in old Montreal.

"Robots in Montreal," eh? Sounds like the title of a Steven Moffat Doctor Who episode. But it's really ICRA 2019 - the IEEE Conference on Robotics and Automation, and, yes, there are quite a few robots!

Boston Dynamics quadruped robot with arm and another quadruped.

My team presented our work on evolutionary learning of rewards for deep reinforcement learning, AutoRL, on Monday. In an hour or so, I'll be giving a keynote on "Systematizing Robot Navigation with AutoRL":

Keynote: Dr. Anthony Francis
Systematizing Robot Navigation with AutoRL: Evolving Better Policies with Better Evaluation

Abstract: Rigorous scientific evaluation of robot control methods helps the field progress towards better solutions, but deploying methods on robots requires its own kind of rigor. A systematic approach to deployment can do more than just make robots safer, more reliable, and more debuggable; with appropriate machine learning support, it can also improve robot control algorithms themselves. In this talk, we describe our evolutionary reward learning framework AutoRL and our evaluation framework for navigation tasks, and show how improving evaluation of navigation systems can measurably improve the performance of both our evolutionary learner and the navigation policies that it produces. We hope that this starts a conversation about how robotic deployment and scientific advancement can become better mutually reinforcing partners.
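As an aside, here's a minimal sketch of what "evolving better policies" can mean mechanically - my own illustrative simplification of an evolutionary reward-tuning outer loop, not the actual AutoRL system; `train_policy` and `evaluate_success` are hypothetical stand-ins for inner-loop RL training and navigation evaluation:

```python
import random

def evolve_reward_weights(train_policy, evaluate_success, n_dims=4,
                          population=8, generations=10, sigma=0.1):
    """Outer-loop search over reward weights: train a policy for each
    candidate reward, score it on the actual task (not on the reward!),
    and keep mutating the best candidate found so far."""
    best = [random.random() for _ in range(n_dims)]
    best_score = evaluate_success(train_policy(best))
    for _ in range(generations):
        for _ in range(population):
            candidate = [w + random.gauss(0.0, sigma) for w in best]
            score = evaluate_success(train_policy(candidate))
            if score > best_score:
                best, best_score = candidate, score
    return best, best_score
```

The point of the talk's pairing is visible even in this toy: the quality of `evaluate_success` directly bounds how good the evolved rewards can get.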

Bio: Dr. Anthony G. Francis, Jr. is a Senior Software Engineer at Google Brain Robotics specializing in reinforcement learning for robot navigation. Previously, he worked on emotional long-term memory for robot pets at Georgia Tech's PEPE robot pet project, on models of human memory for information retrieval at Enkia Corporation, and on large-scale metadata search and 3D object visualization at Google. He earned his B.S. (1991), M.S. (1996) and Ph.D. (2000) in Computer Science from Georgia Tech, along with a Certificate in Cognitive Science (1999). He and his colleagues won the ICRA 2018 Best Paper Award for Service Robotics for their paper "PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning". He's the author of over a dozen peer-reviewed publications and is an inventor on over a half-dozen patents. He's published over a dozen short stories and four novels, including the EPIC eBook Award-winning Frost Moon; his popular writing on robotics includes articles in the books Star Trek Psychology and Westworld Psychology, as well as a Google AI blog article titled Maybe your computer just needs a hug. He lives in San Jose with his wife and cats, but his heart will always belong in Atlanta. You can find out more about his writing at his website.

Looks like I'm on in 15 minutes! Wish me luck.

-the Centaur

 

Information Hygiene


Our world is big. Big, and complicated, filled with many more things than any one person can know. We rely on each other to find out things beyond our individual capacities and to share them so we can succeed as a species: there's water over the next hill, hard red berries are poisonous, and the man in the trading village called Honest Sam is not to be trusted.

To survive, we must constantly take in information, just as we must eat to live. But just like eating, consuming information indiscriminately can make us sick. Even when we eat good food, we must clean our teeth and go to the bathroom - and bad food should be avoided. In the same way, we have to digest information to make it useful, we need to discard information that's no longer relevant, and we need to avoid misinformation so we don't pick up false beliefs. We need habits of information hygiene.

Whenever you listen to someone, you absorb some of their thought process and make it your own. You can't help it: that's the purpose of language, and that's what understanding someone means. The downside is that your brain is a mess of different overlapping modules all working together, and not all of them can distinguish between what's logically true and false. This means learning about the beliefs of someone you violently disagree with can make you start to believe them, even if you consciously think they're wrong. One acquaintance of mine started studying a religion with the intent of exposing it. He thought it was a cult, and his opinion about that never changed. But at one point, he found himself starting to believe what he read, even though, then and now, he found their beliefs logically ridiculous.

This doesn't mean we need to shut out information from people we disagree with - but it does mean we can't uncritically accept information from people we agree with. You are the easiest person for yourself to fool: we have a cognitive flaw called confirmation bias which makes us more willing to accept information that confirms our prior beliefs than information that denies them. Another flaw called cognitive dissonance makes us want to actively resolve conflicts between our beliefs and new information, leading to a rush of relief when they are reconciled; combined with confirmation bias, this means people's beliefs can actually be strengthened by contradictory information.

So, as an exercise in information hygiene for those involved in one of those charged political conversations that dominate our modern landscape, try this. Take one piece of information that you've gotten from a trusted source, and ask yourself: how might this be wrong? Take one piece of information from an untrusted source, and ask yourself, how might this be right? Then take it one step further: research those chinks in your armor, or those sparks of light in your opponent's darkness, and see if you can find evidence pro or con. Try to keep an open mind: no-one's asking you to actually change your mind, just to see if you can tell whether the situation is actually as black and white as you thought.

-the Centaur

Pictured: the book pile, containing some books I'm reading to answer a skeptical friend's questions, and other books for my own interest.

Learning to Drive … by Learning Where You Can Drive

I often say "I teach robots to learn," but what does that mean, exactly? Well, now that one of the projects I've worked on has been announced - and I mean not just on arXiv, the public-access scientific repository where all the hottest reinforcement learning papers are shared, but actually accepted into the ICRA 2018 conference - I can tell you all about it!

When I'm not roaming the corridors hammering infrastructure bugs, I'm trying to teach robots to roam those corridors - a problem we call robot navigation. Our team's latest idea combines "traditional planning," where the robot tries to navigate based on an explicit model of its surroundings, with "reinforcement learning," where the robot learns from feedback on its performance.

For those not in the know, "traditional" robotic planners use structures like graphs to plan routes, much in the same way that a GPS uses a roadmap. One of the more popular methods for long-range planning is probabilistic roadmaps, which build a long-range graph by picking random points and attempting to connect them with a simpler "local planner" that knows how to navigate shorter distances (I'll sketch this idea in toy code below). It's a little like how you learn to drive in your neighborhood - starting from landmarks you know, you navigate to nearby points, gradually building up a map in your head of what connects to what.

But for that to work, you have to know how to drive, and that's where the local planner comes in. Building a local planner is simple in theory - you can write one for a toy world in a few dozen lines of code - but difficult in practice, and making one that works on a real robot is quite the challenge. These software systems are called "navigation stacks" and can contain dozens of components - and in my experience they're hard to get working, and even when you do, they're often brittle, requiring many engineer-months to transfer to new domains or even just to new buildings.

People are much more flexible, learning from their mistakes, and the science of making robots learn from their mistakes is reinforcement learning, in which an agent learns a policy for choosing actions by simply trying them, favoring actions that lead to success and suppressing ones that lead to failure. Our team built a deep reinforcement learning approach to local planning, using a state-of-the-art algorithm called DDPG (Deep Deterministic Policy Gradients), pioneered by DeepMind, to learn a navigation system that could successfully travel several meters in office-like environments.

But there's a further wrinkle: the so-called "reality gap." By necessity, the local planner used by a probabilistic roadmap is simulated - it just attempts to connect points on a map. That simulated local planner isn't identical to the real-world navigation stack running on the robot, so sometimes the robot thinks it can go somewhere on a map which it can't navigate safely in the real world. This can have disastrous consequences - causing robots to tumble down stairs, or, worse, when people follow their GPSes too closely without looking where they're going, causing cars to tumble off the end of a bridge.

Our approach, PRM-RL, directly combats the reality gap by combining probabilistic roadmaps with deep reinforcement learning. By necessity, reinforcement learning navigation systems are trained in simulation and tested in the real world. PRM-RL uses a deep reinforcement learning system as both the probabilistic roadmap's local planner and the robot's navigation system.
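Here's that toy sketch of roadmap-building - entirely my own illustration to make the idea concrete, not our actual code; `sample_free` and `can_traverse` are hypothetical stand-ins for collision-free sampling and the local planner:

```python
import math

def build_prm(sample_free, can_traverse, n_nodes=200, connect_radius=3.0):
    """Toy probabilistic roadmap: sample random collision-free points,
    then try to connect nearby pairs using a local planner."""
    nodes = [sample_free() for _ in range(n_nodes)]
    edges = []
    for i, p in enumerate(nodes):
        for j in range(i + 1, n_nodes):
            q = nodes[j]
            # Only ask the local planner about pairs within its range.
            if math.dist(p, q) <= connect_radius and can_traverse(p, q):
                edges.append((i, j))
    return nodes, edges
```

A real implementation would then run a graph search (A*, say) over these edges to answer long-range queries; the interesting question is what to plug in for `can_traverse`.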
Because links are added to the roadmap only if the reinforcement learning local controller can traverse them, the agent has a better chance of successfully executing its plans in the real world (I'll sketch this link-adding criterion in code after the abstract below). In simulation, our agent could traverse hundreds of meters using the PRM-RL approach, doing much better than the "straight-line" local planner which was our default alternative.

While I didn't happen to have in my back pocket a hundred-meter-wide building instrumented with a mocap rig for our experiments, we were able to test a real robot on a smaller rig and showed that it worked well (no pictures, but you can see the map and the actual trajectories below; while the robot's behavior wasn't as good as we hoped, we debugged that to a networking issue that was adding a delay to commands sent to the robot, not to a problem in our code itself; we'll fix this in a subsequent round).

This work includes both our group working on office robot navigation - including Aleksandra Faust, Oscar Ramirez, Marek Fiser, Kenneth Oslund, me, and James Davidson - and Aleksandra's collaborator Lydia Tapia, with whom she worked on the aerial navigation also reported in the paper. Until the ICRA version comes out, you can find the preliminary version on arXiv:

https://arxiv.org/abs/1710.03937 PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning

We present PRM-RL, a hierarchical method for long-range navigation task completion that combines sampling-based path planning with reinforcement learning (RL) agents. The RL agents learn short-range, point-to-point navigation policies that capture robot dynamics and task constraints without knowledge of the large-scale topology, while the sampling-based planners provide an approximate map of the space of possible configurations of the robot from which collision-free trajectories feasible for the RL agents can be identified. The same RL agents are used to control the robot under the direction of the planning, enabling long-range navigation. We use the Probabilistic Roadmaps (PRMs) for the sampling-based planner. The RL agents are constructed using feature-based and deep neural net policies in continuous state and action spaces. We evaluate PRM-RL on two navigation tasks with non-trivial robot dynamics: end-to-end differential drive indoor navigation in office environments, and aerial cargo delivery in urban environments with load displacement constraints. These evaluations included both simulated environments and on-robot tests. Our results show improvement in navigation task completion over both RL agents on their own and traditional sampling-based planners. In the indoor navigation task, PRM-RL successfully completes up to 215 meters long trajectories under noisy sensor conditions, and the aerial cargo delivery completes flights over 1000 meters without violating the task constraints in an environment 63 million times larger than used in training.
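And here's a minimal sketch of that link-adding criterion, in the same toy style as the roadmap builder above - again my own simplification, with `policy` and `simulate_rollout` as hypothetical stand-ins for the trained agent and a simulated traversal attempt (the paper's actual acceptance rule has more detail than this):

```python
def rl_can_traverse(policy, simulate_rollout, start, goal,
                    trials=20, required_success_rate=0.9):
    """PRM-RL's twist: accept a roadmap edge only if the same RL policy
    that will drive the real robot reaches `goal` from `start` in
    simulation often enough."""
    successes = sum(simulate_rollout(policy, start, goal)
                    for _ in range(trials))
    return successes / trials >= required_success_rate
```

Plugged into the toy `build_prm` above as its `can_traverse` argument - say, `lambda p, q: rl_can_traverse(policy, simulate_rollout, p, q)` - this is what keeps the roadmap's edges honest about what the robot can actually do.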
So, when I say "I teach robots to learn" ... that's what I do.

-the Centaur

My Daily Dragon Interview in Two Words: “Just Write!”

So at Dragon Con I had a reading this year. Yeah, looks like this is the last year I get to bring all my books - too many, too heavy! I read the two flash fiction pieces in Jagged Fragments, "If Looks Could Kill" and "The Secret of the T-Rex's Arms", as well as the first chapter of Jeremiah Willstone and the Clockwork Time Machine, a bit of my and Jim Davies' essay on the psychology of Star Trek's artificial intelligences, and even a bit of my very first published story, "Sibling Rivalry". I also gave the presentation I was supposed to give at the SAM Talks before I realized I was double booked; that was "Risk Getting Worse". But that wasn't recorded, so, oh dang, you'll have to either go to my Amazon page to get my books, or wait until we get "Risk Getting Worse" recorded. My interview with Nancy Northcott for the Daily Dragon, "Robots, Computers, and Magic", however, IS online, so I can share it with you all. Even more, I want to share what I think is the most important part of my interview:
DD: Do you have any one bit of advice for aspiring writers?

AF: Write. Just write. Don’t worry about perfection, or getting published, or even about pleasing anyone else: just write. Write to the end of what you start, and only then worry about what to do with it. In fact, don’t even worry about finishing everything—don’t be afraid to try anything. Artists know they need to fill a sketchbook before sitting down to create a masterwork, but writers sometimes get trapped trying to polish their first inspiration into a final product. Don’t get trapped on the first hill! Whip out your notebook and write. Write morning pages. Write diary at the end of the day. Write a thousand starts to stories, and if one takes flight, run with it with all the abandon you have in you. Accept all writing, especially your own. Just write. Write.
That's it. To read more, check out the interview here, or see all my Daily Dragon mentions at Dragon Con here, or check out my interviewer Nancy Northcott's site here. Onward!

-the Centaur

The Centaur at Clockwork Alchemy



This Memorial Day Weekend, I’ll be appearing at the Clockwork Alchemy steampunk convention! I’m on a whole passel of panels this year, including the following (all in the Monterey room near the Author’s Alley, as far as I know):

Friday, May 26
4PM: NaNoWriMo - Beat the Clock! [Panelist]

Saturday, May 27
12NOON: Working with Editors [Panelist]
1PM: The Science of Airships [Presenter]
5PM: Verisimilitude in Fiction [Panelist]

Sunday, May 28
10AM: Applied Plotonium [Panelist]
12NOON: Organizing an Anthology [Panelist]
1PM: Instill Caring in Readers [Panelist]
2PM: Overcoming Writer's Block [Presenter]

Monday, May 29
11AM: Past, Present, Future - Other! [Moderator]

Of course, if you don’t want to hear me yap, there are all sorts of other reasons to be there. Many great authors will be in attendance in the Author’s Alley:


There’s a great dealer’s room and a wonderful art show filled with steampunk maker art:


For yet another year, we'll be co-hosted with Fanime Con, so there will be buses back and forth and fans of both anime and steampunk in attendance:


As usual, I will have all my latest releases, including Jeremiah Willstone and the Clockwork Time Machine, the steampunk novel I have like been promising you all like for ever!


In addition to my fine books, there will also be new titles from Thinking Ink Press, including the steampunk anthologies TWELVE HOURS LATER, THIRTY DAYS LATER, and SOME TIME LATER!


I think I have about as much fun at Clockwork Alchemy as I do at Dragon Con, and that’s saying something. So I hope you come join us, fellow adventurers, in celebrating all things steampunk!


-the Centaur

Viiictory the Fifteenth



Once again, I’ve completed the challenge of writing 50,000 words in a month as part of the National Novel Writing Month challenges - this time, the July 2016 Camp Nanowrimo, and the next 50,000 words of Dakota Frost #5, PHANTOM SILVER!



This is the reason that I've been so far behind on posting on my blog - I was simultaneously working on four projects: edits on THE CLOCKWORK TIME MACHINE, writing PHANTOM SILVER, doing publishing work for Thinking Ink Press, and doing my part at work-work to help bring about the robot apocalypse (it's busy work, let me tell you). So busy that I didn't even blog about successfully getting TCTM back to the editor. Add to that a much-needed old-friends recharge trip to Tahoe kicking off the month, and I ended up more behind than I've ever been … or at least, the farthest behind I've ever been and still won:


What did I learn this time? Well, I can write over 9,000 words a day, though the text often contains more outline than story; I will frequently stop and do GMC (Goal, Motivation, Conflict) breakdowns of all the characters in the scene and just leave them in the document as paragraphs of italicized notes, because Nano - I can take it out later, it's word count now now now! That's how you get five times a normal word count in a day, or 500+ times the least productive day on which I actually wrote something.


Also, I get really really really sloppy - normally I wordsmith what I write as I write, even in Nano - but that’s when I have the luxury of writing 1000-2000 words a day. When I have to write 9000, I write things like "I want someoent bo elive this whnen ai Mideone” and just keep going, knowing that I can correct the text later to “I want someone to believe this when I am done,” and, more importantly, can use the idea behind that text to craft a better scene on the next draft (in this case, Dakota’s cameraman Ron is filming a bizarre event in which someone’s life is at stake, and when challenged by a bystander he challenges back, saying that he doesn’t have any useful role to fill, but he can at least document what’s happening so they’ll all be believed later).


The other thing is, what I am starting to call The Process actually seems to work. I put characters in situations. I think through how they would react, using Goal Motivation Conflict to pull out what they want, why they want it, and why they can’t get it (a method recommended by my editor Debra Dixon in her GMC book). But the critical part of my Process is, when I have to go write something that I don’t know, I look it up - in a lot of detail. Yes, Virginia, even when I was writing 9,000+ words a day, I still went on Wikipedia - and I don’t regret it. Why? Because when I’m spewing around trying to make characters react like they’re in a play, the characters are just emoting, and the beats, no matter how well motivated, could get replaced by something else.


But when it strikes me that the place my characters are about to visit looks like a basilica, I can do more than just write "basilica." I can ask myself why I chose that word. I can look up the word "basilica" on Apple's Dictionary app. I can drill through to famous basilicas like the Basilica of Saint Peter. I can think about how this place will be different from that, and start pulling out telling details. I can start to craft a space, to create staging, to create an environment that my characters can react to. Because emotions aren't just inside us, or between us; they're for something, for navigating this complex world with other humans at our side. If a group of people argues, no matter how charged, it's just a soap opera. Put them in their own Germanic/Appalachian heritage family kitchen in the Dark Corner of South Carolina, or on the meditation path near an onsen run continuously by the same family for 42 generations, and the same argument can have a completely different ambiance - and completely different reactions.

The text I wrote with my characters reacting to the past plot, or even with GMC, will likely need a lot of tweaking: the point was to get them to a particular emotional, conceptual, or plot space. The text I wrote with the characters reacting to things that were real, even if it needs tweaking, often crackles off the page, even in very rough form. It's material I won't want to lose - more importantly, material I wouldn't have produced if I hadn't pushed myself to do National Novel Writing Month.

Up next, finishing a few notes and ideas - the book is very close to done - and then diving into contracts for Thinking Ink Press, and reinforcement learning policy gradients for the robot apocalypse, all while waiting for the shoe to drop on TCTM. Keep your fingers crossed that the book is indeed on its way out!

-the Centaur