Let me completely up front about my motivation for writing this post: recently, I came across a paper which was similar to the work in my PhD thesis, but applied to a different area. The paper didn’t cite my work – in fact, its survey of related work in the area seemed to indicate that no prior work along the lines of mine existed – and when I alerted the authors to the omission, they informed me they’d cited all relevant work, and claimed “my obscure dissertation probably wasn’t relevant.” Clearly, I haven’t done a good enough job articulating or promoting my work, so I thought I should take a moment to explain what I did for my doctoral dissertation.
My research improved computer memory by modeling it after human memory. People remember different things in different contexts based on how different pieces of information are connected to one another. Even a word as simple as ‘ford’ can call different things to mind depending on whether you’ve bought a popular brand of car, watched the credits of an Indiana Jones movie, or tried to cross the shallow part of a river. Based on that human phenomenon, I built a memory retrieval engine that used context to remember relevant things more quickly.
My approach was based on a technique I called context directed spreading activation, which I argued was an advance over so-called “traditional” spreading activation. Spreading activation is a technique for finding information in a kind of computer memory called semantic networks, which model relationships in the human mind. A semantic network represents knowledge as a graph, with concepts as nodes and relationships between concepts as links, and traditional spreading activation finds information in that network by starting with a set of “query” nodes and propagating “activation” out on the links, like current in an electric circuit. The current that hits each node in the network determines how highly ranked the node is for a query. (If you understand circuits and spreading activation, and this description caused you to catch on fire, my apologies. I’ll be more precise in future blogposts. Roll with it).
The problem is, as semantic networks grow large, there’s a heck of a lot of activation to propagate. My approach, context directed spreading activation (CDSA), cuts this cost dramatically by making activation propagate over fewer types of links. In CDSA, each link has a type, each type has a node, and activation propagates only over links whose nodes are active (to a very rough first approximation, although in my evaluations I tested about every variant of this under the sun). Propagating over active links isn’t just cheaper than spreading activation over every link; it’s smarter: the same “query” nodes can activate different parts of the network, depending on which “context” nodes are active. So, if you design your network right, Harrison Ford is never going to occur to you if you’ve been thinking about cars.
I was a typical graduate student, and I thought my approach was so good, it was good for everything—so I built an entire cognitive architecture around the idea. (Cognitive architectures are general reasoning systems, normally built by teams of researchers, and building even a small one is part of the reason my PhD thesis took ten years, but I digress.) My cognitive architecture was called context sensitive asynchronous memory (CSAM), and it automatically collected context while the system was thinking, fed it into the context-directed spreading activation system, and incorporated dynamically remembered information into its ongoing thought processes using patch programs called integration mechanisms.
CSAM wasn’t just an idea: I built it out into a computer program called Nicole, and even published a workshop paper on it in 1997 called “Can Your Architecture Do This? A Proposal for Impasse-Driven Asynchronous Memory Retrieval and Integration.” But to get a PhD in artificial intelligence, you need more than a clever idea you’ve written up in a paper or implemented in a computer program. You need to use the program you’ve written to answer a scientific question. You need to show that your system works in the domains you claim it works in, that it can solve the problems that you claim it can solve, and that it’s better than other approaches, if other approaches exist.
So I tested Nicole on computer planning systems and showed that integration mechanisms worked. Then I and a colleague tested Nicole on a natural language understanding program and showed that memory retrieval worked. But the most important part was showing that CDSA, the heart of the theory, didn’t just work, but was better than the alternatives. I did a detailed analysis of the theory of CDSA and showed it was better than traditional spreading activation in several ways—but that rightly wasn’t enough for my committee. They wanted an example. There were alternatives to my approach, and they wanted to see that my approach was better than the alternatives for real problems.
So I turned Nicole into an information retrieval system called IRIA—the Information Retrieval Intelligent Assistant. By this time, the dot-com boom was in full swing, and my thesis advisor invited me and another graduate student to join him starting a company called Enkia. We tried many different concepts to start with, but the further we went, the more IRIA seemed to have legs. We showed she could recommend useful information to people while browsing the Internet. We showed several people could use her at the same time and get useful feedback. And critically, we showed that by using context-directed spreading activation, IRIA could retrieve better information faster than traditional spreading activation approaches.
The first publication on IRIA came out in 2000, shortly before I got my PhD thesis, and at the company things were going gangbusters. We found customers for the idea, my more experienced colleagues and I turned the IRIA program from a typical graduate student mess into a more disciplined and efficient system called the Enkion, a process we documented in a paper in early 2001. We even launched a search site called Search Orbit—and then the whole dot-com disaster happened, and the company essentially imploded. Actually, that’s not fair: the company continued for many years after I left—but I essentially imploded, and if you want to know more about that, read “Approaching 33, as Seen from 44.”
Regardless, the upshot is that I didn’t follow up on my thesis work after I finished my PhD. That happens to a lot of PhD students, but for me in particular I felt that it would have been betraying the trust of my colleagues to go publish a sequence of papers on the innards of a program they were trying to use to run their business. Eventually, they moved on to new software, but by that time, so had I.
Fast forward to 2012, and while researching an unrelated problem for The Search Engine That Starts With A G, I came across the 2006 paper “Recommending in context: A spreading activation model that is independent of the type of recommender system and its contents” by Alexander Kovács and Haruki Ueno. At Enkia, we’d thought of doing recommender systems on top of the Enkion, and had even started to build a prototype for Emory University, but the idea never took off and we never generated any publications, so at first, I was pleased to see someone doing spreading activation work in recommender systems.
Then I was unnerved to see that this approach also involved spreading activation, over a typed network, with nodes representing the types of links, and activation in the type nodes changing the way activation propagated over the links. Then I was unsettled to see that my work, which is based on a similar idea and predates their publication by almost a decade, was not cited in the paper. Then I was actually disturbed when I read: “The details of spreading activation networks in the literature differ considerably. However, they’re all equal with respect to how they handle context … context nodes do not modulate links at all…” If you were to take that at face value, the work that I did over ten years of my life—work which produced four papers, a PhD thesis, and at one point helped employ thirty people—did not exist.
Now, I was also surprised by some spooky similarities between their systems and mine—their system is built on a context-directed spreading activation model, mine is a context-directed spreading activation model, theirs is called CASAN, mine is embedded in a system called CSAM—but as far as I can see there’s NO evidence that their work was derivative of mine. As Chris Atkinson said to a friend of mine (paraphrased): “The great beam of intelligence is more like a shotgun: good ideas land on lots of people all over the world—not just on you.”
In fact, I’d argue that their work is a real advance to the field. Their model is similar, not identical, and their mathematical formalism uses more contemporary matrix algebra, making the relationship to related approaches like Page Rank more clear (see Google Page Rank and Beyond). Plus, they apparently got their approach to work on recommender systems, which we did not; IRIA did more straight up recommendation of information in traditional information retrieval, which is a similar but not identical problem.
So Kovács and Ueno’s “Recommending in Context” paper is a great paper and you should read it if you’re into this kind of stuff. But, to set the record straight, and maybe to be a little bit petty, there are a number of spreading activation systems that do use context to modulate links in the network … most notably mine.
Pictured: a tiny chunk of the WordNet online dictionary, which I’m using as a proxy of a semantic network. Data processing by me in Python, graph representation by the GraphViz suite’s dot program, and postprocessing by me in Adobe Photoshop.