Neural Networks and Genetic Algorithms

The all-new “old-school” approach to artificial intelligence:
Imitating the machinery of mind and life

Neural networks and genetic algorithms tackle problems we don’t know how to solve by using feedback to eliminate error. Neural network behavior emerges from the interaction of many simple units, whereas genetic algorithms evolve complex individual structures; both, however, use an empirical approach, finding solutions by making many small changes to processing structures and validating the solutions against the desired output.

Imitating Mind and Life

Topics
Imitating Mind and Life
What are Neural Networks?
What are Neural Networks Good For?
Inspiration: Biological neural systems
Implementation: graph structures of simple processing nodes
Important Types of Neural Networks
Issues in neural networks:
Intellectual History of Neural Networks
Anatomy of Neural Networks
Processing units
State of Activation
Output function
Pattern of Connectivity
Propagation Rule
Activation Rule
Learning Rules
Strengths and Weaknesses of Neural Networks
Strengths of Neural Networks
Limits of Neural Networks
Challenges in Constructing Neural Networks
Genetic Algorithms
Adaptive Systems
The Genetic Algorithm
Implementations of Genetic Algorithms
Hybrid Approaches
Resources
Class Texts
Further Reading
Online
General AI
Neural Networks
Genetic Algorithms

Assignments
	Essay 1: Critique an AI Movie
	Artificial Intelligence: Ch. 3, 4, 5
	Machines Who Think: Ch. 4
	Dive Into Python: Ch. 3
Brains are made of millions of tiny neurons working together in networks, so why can’t we use many simple components networked together as a basis for intelligence?  For that matter, the structure of brains evolved over time, so why can’t we evolve the structures of our artificial intelligences? As it turns out, we can.

Neural networks and genetic algorithms are two of AI’s best tools for tackling problems that we don’t know how to solve, but for which we do know what success looks like.  Each uses techniques inspired by, but not necessarily completely faithful to, natural processes. Each uses feedback from or regularities in the environment to make changes to processing structures, changes that are evaluated empirically by comparing their outputs to desired results.  And this evidence-based processing has enabled each approach to solve problems that are difficult or intractable for traditional knowledge-based or analytical methods.

What are Neural Networks?

Roughly speaking, neural networks are information processors composed of a network of many simple units obeying simple rules that depend only on their own state and the states of their immediate neighbors.  Individually, each of these units is ignorant of the overall problem, but collectively the pattern of their connectivity enables intelligent behavior to emerge as patterns of activation in individual nodes in the network.  If the patterns of activation produced are not what is desired, the pattern of connectivity that generated them can be updated to correct the problem.

What are Neural Networks Good For?

·       Pattern Matching — nonlinear function approximators

·         “Traditional” neural networks

·       Unsupervised Learning — detecting regularities

·         Harmony theory

·       Knowledge Discovery — creating new structures

·         Self-organizing maps

·       Constraint Satisfaction — associative memories

·         Hopfield Networks

Inspiration: Biological neural systems

·       Composed of many tree-like cells called neurons

·         Dendrites — input sites

·         Cell body — central body

·         Axons — transmit signals through potential shifts

·         Synapses — where axons and dendrites exchange chemical signals

·       Structured to receive and process information on the world and act upon it

·         Sensor neurons receive input from the outside world

·         Central neurons process information internally

·         Effector neurons affect muscles or other bodily systems

·       Hebbian learning was an early model of how neural systems learn

Implementation: graph structures of simple processing nodes

·       Node types: threshold, sigmoids, winner-take-all

·         Node input as a vector

·         Combination functions: summation, weighted

·         Thresholds, bias and output functions

·         Inhibitory and excitatory connections

·       Connectivity: single layer vs multi-layer, feedforward vs. recurrent vs. fully connected

·         Input representations

·         Hidden layers

·         Output representations

·         Recurrent connections

·       Learning: a variety of schemes, more or less Hebbian in character

Important Types of Neural Networks

·       Individual Neurons: Adalines or TLUs (Threshold Logic Units)

·       Perceptrons: Single-layer neural networks without feedback

·       Neural Networks: Hidden Layer Feedforward Networks with Backpropagation

·       Constraint-satisfying systems: fully connected networks

·       Self-Organizing Maps
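The simplest of these, the individual Threshold Logic Unit, can be sketched in a few lines of Python. The weights below are hand-picked, purely as an assumed example, so that this single unit computes logical AND:

```python
def tlu(inputs, weights, threshold):
    """Threshold Logic Unit: fire (1) if the weighted sum of inputs
    exceeds the threshold, otherwise stay quiet (0)."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net > threshold else 0

# Hand-picked weights that make this single unit compute logical AND:
for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((a, b), tlu([a, b], weights=[1, 1], threshold=1.5))
# Only (1, 1) pushes the weighted sum (2) past the threshold (1.5).
```

Note that no single choice of weights and threshold for one such unit can compute XOR; that limitation is exactly the one discussed under the Perceptron Limit Theorem below.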

Issues in neural networks:

·       Justification: Why should we expect neural networks to work at all?

·       Processing: How do the nodes work together to solve problems?

·       Learning: How can the nodes be corrected if the system makes a mistake?

·       Applicability: What kinds of problems can neural networks solve?

·       Limitations: Where does the neural network model break down?

Intellectual History of Neural Networks

True, human brains are networks of neurons. But why should we feel justified building our intelligent machines on the same model?  What kinds of guarantees can we make about our work?  Last time we saw the following chain of justifications behind symbolic artificial intelligence:

 

·       The Formalization of Mathematics: Work over the 1800s to early 1900s to formalize mathematics and logic culminated in Russell and Whitehead’s Principia Mathematica, which showed that virtually every kind of thought conceivable could be written in a formal mathematical notation, and in Kurt Gödel’s incompleteness theorem, which showed that while there are limits to formalization, they lie far beyond the kinds of reasoning needed for virtually all practical problems in science, math and engineering.  This provides the justification for using formal mathematics at all: we’ve shown it can do a lot, and its limits, while interesting, are rarely relevant to what we want to do.

·       The Formalization of Computation: Work through the first half of the 1900s resulted in the Church-Turing thesis, which states, informally but persuasively, that virtually any kind of reasoning or computation can be performed by an extremely general kind of machine called a Turing Machine, which in turn can be formally shown to be equivalent to a wide variety of computer and programming language implementations. This provides the justification for using computers and programming languages: we’ve shown that they can do just about anything formal mathematics can, automatically.

·       The Formalization of Intelligence: Work through the mid-century resulted in the information processing model (or symbol processing model) of intelligence, which culminated in the Physical Symbol System Hypothesis.  The PSSH posits that a certain class of physical systems, computationally equivalent to a Turing machine, can interact with their environments in ways that strongly resemble representing the world with symbols and interpreting patterns of symbols to guide reasoning and action.  This provides the first-order justification for artificial intelligence: we’ve shown that we can build systems that can perform virtually any reasoning that mathematicians can devise, and that can guide their actions in the world based on what they can imagine.

 

We’ve shown that we can. Now the big question — how do we actually build such a system? 

 

Symbolic artificial intelligence proponents had (and still have) their own recipe to do so based on varying amounts of formal math, programming wizardry, domain analysis, and sheer cussedness.  However, this approach has been shown to have a variety of difficulties: “strong” methods that can solve any problem are too slow, “weak” methods that exploit knowledge are too difficult to instruct, and both kinds of methods fail in the face of unclear specifications and error-filled data.

Neural networks address some of the issues of symbolic artificial intelligence: they distribute processing over a network that can be evaluated at a fixed cost, they are robust in the face of erroneous inputs, and they can be trained on direct evidence rather than by human instruction.  Most importantly, just as formal mathematics, the Church-Turing thesis, and the Physical Symbol System Hypothesis outline the strengths and limits of traditional AI, neural networks are buttressed by a series of results which indicate that they can perform any task that traditional AI can:

 

·       The McCulloch-Pitts Equivalence Theorem: Using a network of simple neurons that summed their excitatory and inhibitory inputs and fired only if the sum exceeded a threshold, McCulloch and Pitts established that neural networks can compute anything that Turing machines can.

·       Von Neumann Error-Correction Theorem: By adding redundant connections to McCulloch-Pitts networks, Von Neumann showed that neural networks could function even if individual elements failed.  Almost all modern neural networks incorporate this kind of redundancy.

·       The Perceptron Convergence Procedure: Rosenblatt showed that a simple two-layer network of nodes with adjustable weights and thresholds, called a perceptron, could reliably use feedback to learn weights that recognize complex patterns, such as Boolean functions and optical patterns.

·       The Perceptron Limit Theorem: Marvin Minsky and Seymour Papert showed that perceptrons could only learn linearly separable functions, limiting them from learning complex functions like XOR (exclusive OR) as well as a variety of complex optical patterns.

·       Backpropagation via the Delta Rule: Rumelhart, Hinton, and Williams generalized the Widrow-Hoff rule for adapting electronic switching circuits into the backpropagation procedure, which can train multilayer networks to learn non-linearly separable functions.

·       Hopfield Networks and Boltzmann Machines: John Hopfield showed that certain kinds of neural network could act like an associative memory — retrieving complete patterns when presented with only fragments — by minimizing the “potential” stored in the activation of the network.

 

These results, and others like them, have left neural networks in the same position as symbolic artificial intelligence: potentially capable of performing virtually every task, potentially capable of learning a wide variety of behaviors — and lacking a definite procedure that would enable us to construct a network that can solve an arbitrarily complicated task.
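Two of these results can be demonstrated directly. The sketch below implements a minimal version of the perceptron learning rule in Python: trained on logical OR (which is linearly separable) the weights converge, while on XOR (which is not) training never settles, just as the Perceptron Limit Theorem predicts. The learning rate and epoch count are arbitrary illustrative choices:

```python
def train_perceptron(samples, rate=0.1, epochs=100):
    """Perceptron learning: nudge weights toward the target whenever
    the unit's thresholded output is wrong."""
    w = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            out = 1 if w[0] * x1 + w[1] * x2 + bias > 0 else 0
            delta = target - out
            if delta:
                errors += 1
                w[0] += rate * delta * x1
                w[1] += rate * delta * x2
                bias += rate * delta
        if errors == 0:          # converged: every sample classified correctly
            return w, bias, True
    return w, bias, False        # ran out of epochs without converging

OR  = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
print(train_perceptron(OR)[2])   # linearly separable: converges (True)
print(train_perceptron(XOR)[2])  # not linearly separable: never converges (False)
```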

As a side note, while derivations are interesting, I quote the following section of the PDP book:

 

“Note that some of the following sections are written in italics. These sections constitute informal derivations of the claims made in the surrounding text and can be omitted by the reader who finds the derivations tedious.”

Rumelhart, Hinton & Williams (1987) Learning Internal Representations by Error Propagation.

So the focus of our work will be on the actual rules themselves, not how they are derived.

Anatomy of Neural Networks

Processing units

·        Simple elements that make up a larger network, representing:

·         Actions

·         Concepts

·         Features

·         Subfeatures

·         Things Understood Only By The Machine

·        Types of Processing Units:

·         TLU: Threshold Logic Units (also known as Perceptrons and Adalines)

·         TISA: Test, Inhibit, Squelch, Act

·         Full Boolean

·         “Sigmoid” units

·         “Winner Take All” units

·         Finite State Automata

·         Arbitrary Computational Units

State of Activation

·        State of the network at a given time

·        Types of Activation

·         No state

·         Boolean state

·         Discrete numeric state

·         Floating point state

·         Vector state – multiple kinds of activation

Output function

·        Translates activation into output

·         Can be deterministic or stochastic

·        Types of Output Function

·         Identity function: output = activation

·         Threshold: output = if activation > threshold then 1 else 0

·         Ramp: 0, then linear increase, then 1

·         Sigmoid: the logistic curve: output = 1 / (1 + e^(−a))

·         Hyperbolic tangent activation function (speeds up training): output = tanh(a)

·         Sets of outputs

·         TISA: Test, Inhibit, Squelch, Act: weighted and, weighted or

·         WTA: Winner Take All corresponds to set of inputs
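The first few output functions above are one-liners; here is a sketch in Python (the lower and upper bounds of the ramp are assumed parameters):

```python
import math

def identity(a):
    return a                                # output = activation

def threshold(a, theta=0.0):
    return 1 if a > theta else 0            # step function

def ramp(a, lo=0.0, hi=1.0):
    """0 below lo, linear rise between lo and hi, 1 above hi."""
    if a <= lo:
        return 0.0
    if a >= hi:
        return 1.0
    return (a - lo) / (hi - lo)

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))       # logistic curve: 1 / (1 + e^-a)

print(sigmoid(0.0))   # 0.5: the logistic curve crosses its midpoint at a = 0
```

The sigmoid is the workhorse here because it is smooth and differentiable everywhere, which is what backpropagation requires; the hyperbolic tangent (math.tanh) is a drop-in alternative with outputs in (−1, 1).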

Pattern of Connectivity

·        Input, output, and hidden units

·        Feedforward vs Recurrent vs Bidirectional networks

·        Top-down vs bottom-up

·        Fan in and fan out

·        Represented as a weight matrix in simple cases

·        Types of connectivity

·         Single node

·         Single layer (perceptron)

·         Dual layer

·         3+ layers (standard neural network): a feedforward hidden-layer neural network with backpropagation

·         recurrent (feeds backwards)

·         bi-directional (constraint satisfaction)

Propagation Rule

·        Determined in the case of simple feedforward network structures

·        Recurrent networks and bi-directional networks require more complex updating

·        Net input vector for networks with more than one kind of activation

·         Each kind of activation in the network can be summed

·         Some networks use separate inhibition and excitation connections
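For a simple feedforward layer, the propagation rule is just a matrix-vector product: the net input of each target node is the weighted sum of its source nodes’ outputs. A minimal sketch with plain Python lists (the weight values are arbitrary example numbers):

```python
def propagate(weights, outputs):
    """Net input vector for one layer: net[i] = sum_j weights[i][j] * outputs[j].
    Row i holds the incoming connection weights of target node i;
    negative weights act as inhibitory connections."""
    return [sum(w * o for w, o in zip(row, outputs)) for row in weights]

# Two target nodes fed by three source nodes (arbitrary example weights):
W = [[0.5, -1.0, 0.25],    # node 0: one inhibitory (negative) connection
     [1.0,  1.0, 1.0]]     # node 1: purely excitatory connections
print(propagate(W, [1.0, 1.0, 2.0]))  # [0.0, 4.0]
```

Networks that keep inhibition and excitation separate would simply carry two such weight matrices and combine the two sums in the activation rule.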

Activation Rule

·        Often simple if the earlier information is worked out

·        May be identity function or a threshhold

·        Deterministic and Stochastic

·        Activation saturation

·        Quasilinear functions

Learning Rules

·       Add, delete, modify connections

·       Add, delete, modify nodes

·       Rules for Updating

·         Hebbian learning – if one node receives an input from another node and both are highly active, the weight between them should be strengthened

·         Change in connection weight between a source node and a target node:
f(target node activation, feedback) * g(source node output, connection weight)

·         Full version: ∆w_ij = k · f(a_i, t_i) · g(o_j, w_ij)

·         Simple version: ∆w_ij = k · a_i · o_j

·         Widrow-Hoff or Delta Rule: ∆w_ij = k · (t_i − a_i) · o_j

·         Grossberg Rule: ∆w_ij = k · a_i · (o_j − w_ij)

·         Hopfield Networks and Boltzmann Machines
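The named rules above differ only in the choice of f and g; a sketch of the three special cases in Python, where k is the learning rate and the values in the sample call are arbitrary:

```python
def hebbian(k, a_i, o_j):
    """Simple Hebbian rule: strengthen the connection when both the
    target's activation a_i and the source's output o_j are high."""
    return k * a_i * o_j

def delta_rule(k, t_i, a_i, o_j):
    """Widrow-Hoff / delta rule: scale the update by the error (t_i - a_i)
    between the target's desired and actual activation."""
    return k * (t_i - a_i) * o_j

def grossberg(k, a_i, o_j, w_ij):
    """Grossberg rule: move the weight toward the source's output,
    gated by the target's activation."""
    return k * a_i * (o_j - w_ij)

# If the target should be at 1.0 but sits at 0.25, the delta rule
# pushes the weight up in proportion to the source's output:
print(delta_rule(k=0.1, t_i=1.0, a_i=0.25, o_j=1.0))
```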

Renewed question – how do we decide on the topology and weights of a neural network?

Strengths and Weaknesses of Neural Networks

Strengths of Neural Networks

·       Learning systems

·       Noise resistant

·       Failure tolerant

·       Parallelizable

·       Neurologically Plausible

·       Good at tasks difficult for symbolic artificial intelligence:

·         Vision

·         Associative Memory

Limits of Neural Networks

·       Training can be slow

·       Learned knowledge is opaque

·       Output can be unpredictable

·       Difficult to debug

·       Computationally intensive

·       Poor at tasks done well by symbolic artificial intelligence

·         Manipulating symbolic structures

·         Symbolic reference and binding

 

Neural networks often seem complementary to symbolic systems: each provides features the other lacks. Researchers in several areas are trying to develop combined architectures with the best features of both. While most combined architectures have higher-level symbolic structures, however, not all of them include components that resemble neural networks. The designers of SOAR, for example, argue that the brain’s fast parallel architecture is used for memory retrieval, and choose to implement retrieval more simply and directly as a symbolic process in SOAR.

Challenges in Constructing Neural Networks

Some of the most important limits of neural networks are in their design and construction.

·       Network topology — not all networks can work

·         Underfitting

·         Overfitting

·         Misfitting

·       Initial weights — not all weight ranges will work

·         Local minima

·         Training time

Solving these problems can be nontrivial.  This is where genetic algorithms come in.

Genetic Algorithms

Adaptive Systems

·       Inspired by biological evolution

·       Based on the work of John Holland and the “genetic algorithm”

·       Have shown fantastic success at problems where a fitness function can be found

·       Can be applied to hardware as well as software for great results

The Genetic Algorithm

·       Designed for search of a vast space of possible solutions to a problem

·       Maintain a large population of candidate solutions

·       Each individual is represented by a “chromosome” that can be modified

·       A fitness function determines an individual’s success at solving the target problem

·       Evolution takes place over multiple phases:

·         Initialization: Create the initial population (random or from known good solutions)

·         Evolution:

·         Evaluate the solutions against the fitness function

·         Select the next generation (best, random, or other sampling)

·         Generate the next generation of the population

·         Replication

·         Crossover

·         Mutation

·         Rinse and repeat until a “sufficiently good” solution is found, or you’re out of time
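The phases above can be sketched as a short Python loop. This toy example evolves bit strings toward an all-ones target, using “number of one-bits” as the fitness function; the population size, mutation rate, and tournament selection scheme are arbitrary illustrative choices:

```python
import random

random.seed(0)         # fixed seed so the run is repeatable
TARGET_LEN = 16

def fitness(chrom):
    return sum(chrom)                      # count of 1-bits: all ones is optimal

def select(pop):
    """Tournament selection: pick the fitter of two random individuals."""
    return max(random.sample(pop, 2), key=fitness)

def crossover(a, b):
    cut = random.randrange(1, TARGET_LEN)  # single-point crossover
    return a[:cut] + b[cut:]

def mutate(chrom, rate=0.05):
    return [bit ^ 1 if random.random() < rate else bit for bit in chrom]

# Initialization: a random population of candidate chromosomes.
pop = [[random.randint(0, 1) for _ in range(TARGET_LEN)] for _ in range(30)]

# Evolution: evaluate, select, and breed until good enough or out of time.
for generation in range(200):
    if max(fitness(c) for c in pop) == TARGET_LEN:
        break
    pop = [mutate(crossover(select(pop), select(pop))) for _ in range(30)]

print(generation, max(fitness(c) for c in pop))
```

The same loop structure carries over to real problems; only the chromosome encoding and the fitness function change.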

Implementations of Genetic Algorithms

·       Genetic algorithms: generic term for evolving data structures

·       Genetic programming: evolve computer programs

·       Classifier systems: a reactive system that maintains a population of candidate reflexes and updates them through interaction with the world

Hybrid Approaches

Just as neural networks answer the question of how to make systems learn, genetic algorithms can answer the question of how to design those systems in the first place.  Genetic algorithms have proven repeatedly to be a good tool for designing components of systems where we lack the knowledge to generate an analytical solution: the topology of a neural network, the setting of its initial weights, the learning scheme the network uses, the parameters of a reactive controller, and so on.

Resources

Class Texts

·        Artificial Intelligence: Chapters 3, 4, and 5

·        Machines Who Think: Chapter 4

·        Dive Into Python: Chapter 3

Further Reading

Hogan, James P. (1997). Mind Matters: Exploring the World of Artificial Intelligence. Del Rey/Ballantine.

Jones, M. Tim. (2003). AI Application Programming. Charles River Media.

Luger, G.F & Stubblefield, W.A. (1989). Artificial intelligence and the design of expert systems. Benjamin Cummings.

Russell, S., & Norvig, P. (2003). Artificial intelligence: A modern approach (2nd ed.). Prentice Hall.

Rumelhart, D.E., Hinton, G.E. & Williams, R.J. (1987) Learning Internal Representations by Error Propagation. In Rumelhart, D.E. & McClelland, J. L. (Eds). Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, Massachusetts: Bradford/MIT Press.

Rumelhart, D.E. & McClelland, J. L. (1986). Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, Massachusetts: Bradford/MIT Press.

Singh, Jagjit. (1966). Great Ideas in Information Theory, Language and Cybernetics. Dover.

Online

General AI

·       AI FAQ: http://www-2.cs.cmu.edu/Groups/AI/html/faqs/ai/top.html

·       AI HOWTO: http://zhar.net/gnu-linux/howto/html/ai.html#toc2

·       Math for AI: http://www.generation5.org/content/2001/beginner00.asp

Neural Networks

·       Neural Networks

·         Neural Networks: http://www-106.ibm.com/developerworks/library/l-neural/

·         A Python Neural Network Library: http://arctrix.com/nas/python/bpnn.py

·         Neural Python: http://starship.python.net/crew/seehof/NeuralPython.html

·         Backpropagation: http://www.generation5.org/content/2002/bp.asp

·       Perceptrons:

·         http://ei.cs.vt.edu/~history/Perceptrons.Estebon.html

·         http://www.generation5.org/content/1999/perceptron.asp

Genetic Algorithms

·       http://pygp.sourceforge.net/

·       http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/199121

·       http://www.scipy.org/