Press "Enter" to skip to content

Posts published in “Startuppery”

Never Give Up


So, 2019. What a mess. More on that later; as for me, I've had neither the time nor even the capability to blog for a while. But one thing I've noticed, at least for me: the point at which I want to give up is usually just before the point where I could have a big breakthrough.

For example: Scrivener.

I had just about given up on Scrivener, an otherwise great program for writers that helps with organizing notes, writing screenplays, and even drafting comic book scripts. I'd become used to Google Docs and its keyboard shortcuts for hierarchical bulleted lists, not entirely different from my prior life using hierarchical notebook programs like GoldenSection Notes. But Scrivener's keyboard shortcuts were all different, and the menus didn't seem to support what I needed, so I had started trying alternatives. Then I gave one more shot at going through the manual, which had earlier gotten me nothing.

At first this looked like a lost cause: Scrivener depends on Mac OS X's text widgets, which implement a nonstandard text interface (fanboys, shut up, sit down: you're overruled; case in point, Home and End. I rest my case), and worse, it depends on the OS even for keyboard shortcuts, which require the exact menu item. And the menu item for list bullets actually was, literally, a bullet - which normally isn't a text character you can access in most programs. But as it turns out, in Scrivener, you can. I was able to insert a bullet, find the bullet character, and even create a keyboard shortcut for it. And it did what it was supposed to!

Soon I found the other menu items I needed to fill out the interface I'd come to know and love in Google Docs for increasing and decreasing list indentation on the fly while organizing a list.

Eventually I was able to recreate the whole interface, and I was so happy that I wrote a list describing it in the middle of the deep learning Scrivener notebook I had been working on when I hit the snag that sent me down this rabbit hole (namely, wanting to create a bullet list).

Writing this paragraph itself required figuring out how to insert symbols for control characters in Mac OS X, but whatever: a solution was possible, even ready to be found, just when I was ready to give up.

I've found the same thing with so many problems recently: stuck photo uploads on Google Photos, configuration problems in various publishing programs, even an issue with the math for a paper submission at work.

I suspect this is everywhere. It's a known thing in mathematics that when you feel close to a solution you may still be far from it; but I often find that the solution lies just past the point where you want to give up.

I've written before about a related phenomenon I call "working a little bit harder than you want to," but this is slightly different: it's the idea that your judgment that you've exhausted your options is just that, a judgment.

It may be true.

But try looking just a bit harder for that answer.

-the Centaur

Pictured: a photo of the Greenville airport over Christmas, which finally uploaded today when I went back through the archives of Google Photos on my phone and manually stopped a stuck upload from December 19th.

Learning to Drive … by Learning Where You Can Drive


I often say "I teach robots to learn," but what does that mean, exactly? Well, now that one of the projects I've worked on has been announced - and I mean not just on arXiv, the public-access scientific repository where all the hottest reinforcement learning papers are shared, but actually accepted into the ICRA 2018 conference - I can tell you all about it!

When I'm not roaming the corridors hammering infrastructure bugs, I'm trying to teach robots to roam those corridors - a problem we call robot navigation. Our team's latest idea combines "traditional planning," where the robot tries to navigate based on an explicit model of its surroundings, with "reinforcement learning," where the robot learns from feedback on its performance.

For those not in the know, "traditional" robotic planners use structures like graphs to plan routes, much in the same way that a GPS uses a roadmap. One of the more popular methods for long-range planning is the probabilistic roadmap, which builds a long-range graph by picking random points and attempting to connect them with a simpler "local planner" that knows how to navigate shorter distances. It's a little like how you learn to drive in your neighborhood - starting from landmarks you know, you navigate to nearby points, gradually building up a map in your head of what connects to what.
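In rough sketch form - this is an illustration of the idea, not our actual implementation - the roadmap-building loop looks like the following, where sample_free_point and local_planner are stand-ins for a collision-free sampler and the short-range navigator discussed below:

    def distance(a, b):
        """Euclidean distance between two 2D points."""
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

    def build_prm(sample_free_point, local_planner,
                  num_samples=500, max_connect_dist=5.0):
        """Build a probabilistic roadmap: sample random collision-free
        points, then try to connect nearby pairs with a local planner."""
        nodes = [sample_free_point() for _ in range(num_samples)]
        edges = {i: [] for i in range(num_samples)}
        for i in range(num_samples):
            for j in range(i + 1, num_samples):
                if distance(nodes[i], nodes[j]) <= max_connect_dist:
                    # The local planner decides whether this short hop is drivable.
                    if local_planner(nodes[i], nodes[j]):
                        edges[i].append(j)
                        edges[j].append(i)
        return nodes, edges

Once the graph is built, answering a long-range query is just standard graph search (say, A* or Dijkstra) over those edges.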

But for that to work, you have to know how to drive, and that's where the local planner comes in. Building a local planner is simple in theory - you can write one for a toy world in a few dozen lines of code - but difficult in practice, and making one that works on a real robot is quite the challenge. These software systems are called "navigation stacks" and can contain dozens of components - and in my experience they're hard to get working, and even when you do, they're often brittle, requiring many engineer-months to transfer to new domains or even just to new buildings.
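For the toy-world case, here's the kind of thing I mean - a minimal sketch, assuming a 2D occupancy grid where True marks an obstacle cell - of the simplest possible local planner, one that just checks whether the straight line between two points stays in free space:

    def make_straight_line_planner(occupancy_grid, step=0.1):
        """Return a toy local planner: succeed only if the straight line
        from start to goal crosses no obstacle cells in the grid."""
        def local_planner(start, goal):
            dx, dy = goal[0] - start[0], goal[1] - start[1]
            length = (dx * dx + dy * dy) ** 0.5
            steps = max(1, int(length / step))
            for k in range(steps + 1):
                x = start[0] + dx * k / steps
                y = start[1] + dy * k / steps
                if occupancy_grid[int(y)][int(x)]:  # assumes points stay in bounds
                    return False
            return True
        return local_planner

Plug that into build_prm above and you have a working (toy) roadmap planner; the gap between this and something that drives a real robot down a real corridor is exactly the hard part.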

People are much more flexible, learning from their mistakes, and the science of making robots learn from their mistakes is reinforcement learning, in which an agent learns a policy for choosing actions simply by trying them, favoring actions that lead to success and suppressing ones that lead to failure. Our team built a deep reinforcement learning approach to local planning, using a state-of-the-art algorithm called DDPG (Deep Deterministic Policy Gradients), pioneered by DeepMind, to learn a navigation system that could successfully travel several meters in office-like environments.
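The details of the DDPG training setup are in the paper; purely as a hedged sketch of what "learning from feedback" means here, the training loop has roughly this shape - the env and agent objects, their methods, and the reward values below are illustrative stand-ins, not our actual code:

    def train_local_planner(env, agent, episodes=10000, max_steps=200):
        """Generic episodic RL loop for point-to-point navigation.
        env simulates a robot with a random start/goal pair; agent is
        an off-policy learner such as DDPG."""
        for _ in range(episodes):
            obs = env.reset()
            for _ in range(max_steps):
                action = agent.act(obs)        # e.g. continuous (linear, angular) velocity
                next_obs, done, info = env.step(action)
                if info["reached_goal"]:
                    reward = 1.0               # success: arrived within the goal radius
                elif info["collided"]:
                    reward = -1.0              # failure: hit an obstacle
                else:
                    reward = -0.01             # small step cost to discourage wandering
                agent.remember(obs, action, reward, next_obs, done)
                agent.update()                 # one actor-critic gradient step
                obs = next_obs
                if done:
                    break

Over many episodes, actions that tend to end in the +1 outcome get reinforced, and actions that end in collisions get suppressed - that's the whole trick.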

But there's a further wrinkle: the so-called "reality gap". By necessity, the local planner used by a probabilistic roadmap is simulated - it's attempting to connect points on a map. That simulated local planner isn't identical to the real-world navigation stack running on the robot, so sometimes the robot thinks it can go somewhere on the map that it can't navigate safely in the real world. This can have disastrous consequences: it can cause robots to tumble down stairs or, when people follow their GPSes too closely without looking where they're going, cause cars to drive off the end of a bridge.

Our approach, PRM-RL, directly combats the reality gap by combining probabilistic roadmaps with deep reinforcement learning. By necessity, reinforcement learning navigation systems are trained in simulation and tested in the real world. PRM-RL uses a deep reinforcement learning system as both the probabilistic roadmap's local planner and the robot's navigation system. Because links are added to the roadmap only if the reinforcement learning local controller can traverse them, the agent has a better chance of actually executing its plans in the real world.
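In sketch form, the key change from the toy roadmap builder above is that the connection test is the learned policy itself, rolled out in simulation; the rollout function, trial count, and success threshold here are hypothetical illustrations, not the paper's exact parameters:

    def rl_local_planner(policy, simulate_rollout, attempts=20, success_threshold=0.9):
        """Gate roadmap edges on the learned controller: connect two points
        only if the RL policy reliably drives between them in simulation."""
        def local_planner(start, goal):
            successes = sum(simulate_rollout(policy, start, goal)
                            for _ in range(attempts))
            return successes / attempts >= success_threshold
        return local_planner

    # Reuse the same roadmap builder, so the map contains only links
    # the robot's actual controller has demonstrated it can execute:
    # nodes, edges = build_prm(sample_free_point,
    #                          rl_local_planner(policy, simulate_rollout))

Because the same policy both builds the map and drives the robot, a plan over that map is, by construction, a sequence of hops the controller already knows how to make.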

In simulation, our agent could traverse hundreds of meters using the PRM-RL approach, doing much better than the "straight-line" local planner that was our default alternative. While I didn't happen to have in my back pocket a hundred-meter-wide building instrumented with a mocap rig for our experiments, we were able to test a real robot on a smaller rig and showed that it worked well (no pictures, but you can see the map and the actual trajectories below; while the robot's behavior wasn't as good as we hoped, we traced that to a networking issue adding a delay to commands sent to the robot, not to our code itself; we'll fix this in a subsequent round).

This work includes both our group working on office robot navigation - including Aleksandra Faust, Oscar Ramirez, Marek Fiser, Kenneth Oslund, me, and James Davidson - and Aleksandra's collaborator Lydia Tapia, with whom she worked on the aerial navigation also reported in the paper. Until the ICRA version comes out, you can find the preliminary version on arXiv:

https://arxiv.org/abs/1710.03937
PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning

We present PRM-RL, a hierarchical method for long-range navigation task completion that combines sampling-based path planning with reinforcement learning (RL) agents. The RL agents learn short-range, point-to-point navigation policies that capture robot dynamics and task constraints without knowledge of the large-scale topology, while the sampling-based planners provide an approximate map of the space of possible configurations of the robot from which collision-free trajectories feasible for the RL agents can be identified. The same RL agents are used to control the robot under the direction of the planning, enabling long-range navigation. We use the Probabilistic Roadmaps (PRMs) for the sampling-based planner. The RL agents are constructed using feature-based and deep neural net policies in continuous state and action spaces. We evaluate PRM-RL on two navigation tasks with non-trivial robot dynamics: end-to-end differential drive indoor navigation in office environments, and aerial cargo delivery in urban environments with load displacement constraints. These evaluations included both simulated environments and on-robot tests. Our results show improvement in navigation task completion over both RL agents on their own and traditional sampling-based planners. In the indoor navigation task, PRM-RL successfully completes up to 215 meters long trajectories under noisy sensor conditions, and the aerial cargo delivery completes flights over 1000 meters without violating the task constraints in an environment 63 million times larger than used in training.


So, when I say "I teach robots to learn" ... that's what I do.

-the Centaur

The Yearly Reboot


So one of the things I like to do each year, as part of my traditional visit to family over the holidays, is to drop in on a Panera Bread, pull out my notebook, review my plans for the previous year, and make plans for the new one.

As of the 7th of January, I still haven't done this.

Shit happened last year. Good shit, such as really getting serious about teaching robots to learn; bad shit, such as serious illnesses in our family's pets; and ugly shit which I'm not going to talk about until the final contracts are signed and everyone agrees everything is hunky-dory. And much of this went down just before the holidays, and once the holidays started, I cared a lot more about spending time with family and friends than sitting by myself in a Panera. (In all fairness, the holidays were easier when I lived in Atlanta and came up to see family many times a year, as opposed to only occasionally.)

But I can recommend trying to do a yearly review. One year I decided to list what I wanted to do - in the immediate future, in the coming year, in the coming five years, and in my life - and the next year, almost by chance, I sat down in the same Panera to review it. That practice served me well for more than a decade, and I find that even trying to do it helps me feel more focused and refreshed.

And so that's precisely what I tried to do yesterday. I didn't accomplish it - I still haven't managed to "clear the thickets" of my TODO lists to get to the actual yearly plan, and I miss being able to take a whole afternoon at Panera doing this - but I did the next best thing, sitting myself down to a nice "reboot" dinner and treating myself to a showing of Star Wars: The Last Jedi.

As someone said (a reference I read recently but have been unable to find), the very act of doing something daily centers the mind.

Here's to that.

-Anthony

Everything was on fire until earlier today


Not literally; we were far south of the literal fires, which just barely missed the homes of our friends. But so many other things have been going wrong that it felt like things were on fire ... so no posts for a while, sorry.

But tonight, I got to the last chapter of Dakota Frost #6, SPIRITUAL GOLD.

I will likely finish this chapter Saturday.

That makes today a good day.

Time for some cake.

-the Centaur

Pictured: a cat break with Loki. Not how things look right now, but how I feel.