CS 371: Introduction to Artificial Intelligence
Neural Networks

Machine Learning
Learning is such an important part of what we consider "intelligence" that it appears in one common definition:
intelligence: the ability to learn or understand or to deal with new or trying situations. (Webster's)
Intelligent agents make mistakes, but one might argue that they don't make the same mistakes perpetually.

What do agents learn?
If-then decision structures (decision trees)
Function approximation (neural networks)
Action (control) policy (reinforcement learning)
etc. → lots of things!
Learning agents have at least one adaptive component of their architecture.

Connectionism
Connectionism – intelligence bottom up
Small, simple components
Connected together in a large network
Give rise to complex (intelligent) behaviors.
Can complex behavior be learned from a simple process?
Our brief foray into artificial neural networks (ANNs) will limit itself to a few simple goals:

Goals
Understand
the perceptron, the basic unit of ANN computation (as the transistor is to circuits),
the perceptron learning rule,
the class of functions a perceptron can represent,
multilayer feed-forward networks,
the back-propagation learning algorithm, and
the momentum variant of back-propagation.
Experience the strengths/weaknesses of multilayer feed-forward methods through experimentation.

The Neuron

The Neuron (cont.)
Basic unit of brain computation
Dendrites
many, local to the cell body
sense input
Axon
one, reaching ~1 cm from the cell body
transmits output
Axons connect to dendrites through synapses

The Neuron (cont.)
Complex electrochemical process:
Synapses release chemicals…
Chemicals increase dendrite electrical potential…
When potential reaches a threshold, …
An electrical pulse (action potential) goes down axon to synapses, so …
Synapses release chemicals…

Computer & Brain – a comparison
Computers have a much faster clock speed.
Brains are much, much more parallel → more unit updates per second than a computer.
Brains are more adaptive → they grow into their tasks.
Brains exhibit graceful degradation: a gradual rather than sharp drop-off in performance as conditions worsen.

Motivation for Neural Network
Brain has many desirable characteristics that most computers lack
plasticity, self-adaptivity
massive parallelism
graceful degradation
What would be a simple computation unit from which to build a "computer brain"?

The Neural Network Unit
weighted sum of inputs: in_i = sum_j(W_j,i × a_j)
output from activation function: a_i = g(in_i)
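A minimal Python sketch of this unit computation (the function names, weights, and inputs are illustrative, not from the slides):

    # Sketch of a single unit: weighted sum of inputs, then an activation function g.
    def step0(x):                      # a simple 0/1 step activation at threshold 0
        return 1 if x >= 0 else 0

    def unit_output(weights, inputs, g):
        # in_i = sum_j(W_j,i * a_j);  a_i = g(in_i)
        in_i = sum(w * a for w, a in zip(weights, inputs))
        return g(in_i)

    # Illustrative 2-input unit (weights and inputs are made up):
    print(unit_output([0.5, -0.3], [1.0, 2.0], step0))   # step0(0.5*1.0 - 0.3*2.0) = step0(-0.1) = 0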

Activation Functions
(a) step function with threshold t (the threshold can be replaced by an extra input weight W_0,i = t on a fixed input a_0 = -1)
(c) sigmoid(x) = 1/(1 + e^(-x))
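A quick hedged check, with made-up weights, that folding the step threshold t into an extra weight on a fixed input a_0 = -1 (as noted in (a)) leaves the unit's behavior unchanged:

    # Sketch: a step unit with threshold t behaves like a threshold-0 unit
    # with an extra weight W_0 = t on a fixed input a_0 = -1.
    def step(x, t=0.0):
        return 1 if x >= t else 0

    weights = [0.7, 0.4]   # illustrative weights, not from the slides
    t = 0.5                # step threshold

    for a1, a2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        with_threshold = step(weights[0]*a1 + weights[1]*a2, t)
        with_bias = step(t*(-1) + weights[0]*a1 + weights[1]*a2, 0.0)
        assert with_threshold == with_bias
    print("threshold-as-weight check passed")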

Understanding What Units Compute
Suppose you have a 2-input unit with a step function and a fixed threshold t.
Let x, y be inputs.
What set of points on the x-y plane are at the unit's threshold? (Simplify equations.)
Answer: the line W_x,i × x + W_y,i × y = t
rewritten: y = -(W_x,i / W_y,i) × x + t / W_y,i

In-Class Exercise: Units as Logic Gates
For values 1, 0 corresponding to true, false, and a unit with a 0/1 step function, can you choose W_1,i, W_2,i, and t so as to compute:
AND?
OR?
IMPLIES?
EQUIVALENT?

In-Class Exercise: Units as Logic Gates
For values 1, 0 corresponding to true, false, and a unit with a 0/1 step function, can you choose W_1,i, W_2,i, and t so as to compute:
AND? (1, 1, 1.5)
OR? (1, 1, .5)
IMPLIES? (-1, 1, -.5)
EQUIVALENT? (NOT POSSIBLE – Why?)
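One hedged way to verify these settings is to run all four input pairs through a 0/1 step unit (Python sketch; the gate helper is made up):

    # Sketch: checking the gate weights above by enumerating all inputs (1 = true, 0 = false).
    def gate(w1, w2, t):
        return lambda a, b: 1 if w1*a + w2*b >= t else 0

    AND = gate(1, 1, 1.5)
    OR = gate(1, 1, 0.5)
    IMPLIES = gate(-1, 1, -0.5)

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "AND:", AND(a, b), "OR:", OR(a, b), "IMPLIES:", IMPLIES(a, b))
    # EQUIVALENT (true exactly when a == b) has no such weights: its true points (0,0) and (1,1)
    # cannot be separated from (0,1) and (1,0) by a single line -- see the next slide.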

Linear Separability
One 2-input unit activates for all inputs on one side of a line
3 inputs → plane,  n inputs → hyperplane

3-Input Unit and Plane of Separation

Perceptrons
A perceptron has
input units I_j
input weights W_j
step activation function step_0 (threshold at 0)
output O
O = step_0(sum_j(W_j × I_j))

Perceptron Learning Rule
Suppose one randomizes the initial weights and has a set of desired (input, output) pairs.
Iterate:
Compute O from inputs
Compute error Err = T – O from correct output T
Adjust weights: W_j ← W_j + α × I_j × Err, where α is the learning rate (sketched below).
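A minimal Python sketch of this rule, trained on OR as an assumed example (the data, learning rate, and epoch count are illustrative):

    # Sketch: perceptron learning of OR; the fixed -1 input carries the threshold weight W_0.
    def step0(x):
        return 1 if x >= 0 else 0

    examples = [((-1, 0, 0), 0), ((-1, 0, 1), 1), ((-1, 1, 0), 1), ((-1, 1, 1), 1)]

    W = [0.0, 0.0, 0.0]
    alpha = 0.1
    for _ in range(25):                                    # iterate over the training set
        for I, T in examples:
            O = step0(sum(w * i for w, i in zip(W, I)))    # compute O from inputs
            Err = T - O                                    # error from the correct output T
            W = [w + alpha * i * Err for w, i in zip(W, I)]

    print(W)   # for a linearly separable target like OR, these weights converge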

Perceptron Learning
Perceptron learning is a gradient descent search through the space of possible weights.
Each training example provides an "error surface" over the weights.  The learning rule moves the weights downhill, with the learning rate α as the step size.
For linearly separable functions there are no local minima, and learning is guaranteed to converge if the learning rate α is not too high (too large a step can overshoot).
Summary: very effective, but only for the simple class of functions a perceptron can represent.

Network Learning Algorithm

Is There Hope?
Is there any hope for learning functions that are not linearly separable?
Yes, but a perceptron network isn't enough.
One needs at least one layer of "hidden" units between the inputs and the outputs to compute such functions.
With enough hidden units, any Boolean function can be computed, and any continuous function can be approximated (example below).
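For instance, a hedged sketch with hand-picked weights (not from the slides): one hidden layer of two step units computes EQUIVALENT, which no single unit can represent.

    # Sketch: EQUIVALENT (XNOR) from one hidden layer of two step units (hand-picked weights).
    def step(x, t):
        return 1 if x >= t else 0

    def equivalent(a, b):
        h1 = step(a + b, 1.5)        # hidden unit: a AND b
        h2 = step(-a - b, -0.5)      # hidden unit: (NOT a) AND (NOT b)
        return step(h1 + h2, 0.5)    # output unit: h1 OR h2

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, equivalent(a, b))   # prints 1 exactly when a == b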

Simple Multilayer Feed-Forward Network

Multilayer Feed-Forward Network with 1 Hidden Layer

Back-Propagation
Basic idea: supply the training inputs, let the computation feed forward, compute the error against the training outputs, and propagate the error backward for weight updates.
Start with the final layer:
Update the weights into that layer according to its output error, as with the perceptron learning rule.
Assign error to the units of the previous layer in proportion to the weights.
Repeat this process backward through the layers.

Back-Propagation (cont.)
Error computation makes use of the slope of the activation function, so we need to use continuous activation functions.
The sigmoid function is typical.
sigmoid(x) = 1/(1 + e^(-x))
sigmoid'(x) = sigmoid(x)(1 – sigmoid(x))
Error term: Δ_i = Err_i × g'(in_i)

Back-Propagation (cont.)
Updates to the output units:
W_j,i ← W_j,i + α × a_j × Δ_i
Computation of error for previous-layer units:
Δ_j ← g'(in_j) × sum_i(W_j,i × Δ_i)
The process continues with the previous layer:
W_k,j ← W_k,j + α × a_k × Δ_j
Δ_k ← g'(in_k) × sum_j(W_k,j × Δ_j)
Repeat until the input layer is reached (where a_k = I_k); see the sketch below.
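Putting these updates together, here is a hedged Python sketch of back-propagation with one hidden layer; the task (EQUIVALENT), network size, learning rate, and epoch count are all assumed for illustration.

    # Sketch: back-propagation for one hidden layer, following the Delta and weight updates above.
    import math, random

    def g(x):                                   # sigmoid activation
        return 1.0 / (1.0 + math.exp(-x))

    examples = [([0, 0], 1), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]   # EQUIVALENT

    random.seed(1)
    n_hid = 3
    W_kj = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(n_hid)]   # [-1, x1, x2] -> hidden
    W_ji = [random.uniform(-1, 1) for _ in range(n_hid + 1)]                   # [-1, h1..h3] -> output
    alpha = 0.5

    for _ in range(10000):
        for x, T in examples:
            I = [-1.0, float(x[0]), float(x[1])]              # the fixed -1 input carries each threshold
            a_j = [-1.0] + [g(sum(W_kj[j][k] * I[k] for k in range(3))) for j in range(n_hid)]
            a_i = g(sum(W_ji[j] * a_j[j] for j in range(n_hid + 1)))
            # Delta_i = Err_i * g'(in_i), using g'(in) = g(in)(1 - g(in)) = a(1 - a)
            D_i = (T - a_i) * a_i * (1.0 - a_i)
            # Delta_j = g'(in_j) * W_j,i * Delta_i for each hidden unit (skip the fixed -1 "unit")
            D_j = [a_j[j] * (1.0 - a_j[j]) * W_ji[j] * D_i for j in range(1, n_hid + 1)]
            # W_j,i <- W_j,i + alpha * a_j * Delta_i ;  W_k,j <- W_k,j + alpha * a_k * Delta_j
            W_ji = [W_ji[j] + alpha * a_j[j] * D_i for j in range(n_hid + 1)]
            for j in range(n_hid):
                for k in range(3):
                    W_kj[j][k] += alpha * I[k] * D_j[j]

    for x, T in examples:
        I = [-1.0, float(x[0]), float(x[1])]
        a_j = [-1.0] + [g(sum(W_kj[j][k] * I[k] for k in range(3))) for j in range(n_hid)]
        print(x, T, round(g(sum(W_ji[j] * a_j[j] for j in range(n_hid + 1))), 2))
    # Outputs should end up close to the targets; training can occasionally stall in a local minimum.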

Momentum
When updating a weight, also add the previous update to that weight, multiplied by a momentum constant m (0.0 <= m < 1.0); a sketch follows the list.
Momentum makes it possible to carry the weights
across plateaux in the error surface,
through local minima toward the global minimum, but also
through the global minimum into a local minimum (i.e., it can have undesirable effects as well).
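A hedged sketch of a single weight update with momentum (the names and numbers are illustrative):

    # Sketch: weight update with momentum; 'prev_update' remembers the last change to this weight.
    def update_weight(w, gradient_step, prev_update, m=0.9):
        # gradient_step is the plain back-propagation change (alpha * a * Delta);
        # m is the momentum constant, 0.0 <= m < 1.0.
        change = gradient_step + m * prev_update
        return w + change, change            # new weight, plus the update to remember next time

    w, prev = 0.2, 0.0
    for step in [0.05, 0.04, 0.0, 0.0]:      # illustrative gradient steps; the last two mimic a plateau
        w, prev = update_weight(w, step, prev)
        print(round(w, 4), round(prev, 4))   # the weight keeps moving even when the gradient step is 0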

Error Surface

Text Notation