CAP6938
Neuroevolution and
Artificial Embryogeny
Neural Network Weight
Optimization
Dr. Kenneth Stanley
January 18, 2006
Review
[Diagram: a fixed network topology whose connection weight values are unknown (marked "?")]
• Remember, the values of the weights and the
topology determine the functionality
• Given a topology, how are weights optimized?
• Weights are just parameters on a structure
Two Cases
• Output targets are known
• Output targets are not known
[Diagram: two-layer network with inputs X1, X2, hidden units H1, H2, outputs out1, out2, and labeled weights (w11, w12, w21)]
Decision Boundaries
OR function:

  Input (x1, x2)   Output
     1,  1            1
     1, -1            1
    -1,  1            1
    -1, -1           -1

• OR is linearly separable
• Linearly separable problems do not
require hidden nodes (nonlinearities)

[Plot: OR patterns in the input plane; a single line (with a bias offset) separates the three + points from the one - point]
Decision Boundaries
XOR function:

  Input (x1, x2)   Output
     1,  1           -1
     1, -1            1
    -1,  1            1
    -1, -1           -1

• XOR is not linearly separable
• Requires at least one hidden node

[Plot: XOR patterns in the input plane; no single line (even with a bias offset) separates the + points from the - points]
Hebbian Learning
• Change weights based on correlation of
connected neurons
• Learning rules are local
• Simple Hebb Rule:w i ( new)  w i ( old )  x i y
• Works best when relevance of inputs to
outputs is independent
• Simple Hebb Rule grows weights unbounded
• Can be made incremental: Δw_i = η·x_i·y (sketched below)
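
A minimal sketch of the incremental rule above, assuming NumPy; the learning rate eta, the loop length, and the fixed input/output pair are illustrative choices, not values from the slides:

```python
import numpy as np

def hebb_update(w, x, y, eta=0.1):
    """Incremental Hebb rule: dw_i = eta * x_i * y.
    Each weight grows in proportion to the correlation between
    its input x_i and the output y."""
    return w + eta * x * y

# Repeatedly presenting the same correlated pair shows the unbounded
# growth noted above: the weights never stop changing.
w = np.zeros(2)
for _ in range(100):
    x = np.array([1.0, -1.0])   # presynaptic activations
    y = 1.0                     # postsynaptic activation
    w = hebb_update(w, x, y)
print(w)                        # first weight keeps rising, second keeps falling
```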
More Complex Local Learning Rules
• Hebbian Learning with a maximum magnitude W (see the sketch after this list):
– Excitatory: Δw = η1·(W − w)·x·y + η2·W·x·(y − 1.0)
– Inhibitory: Δw = η1·(W − w)·x·y + η2·(W − w)·x·(1.0 − y)
• Second terms are decay terms: forgetting
– Happens when presynaptic node does not affect
postsynaptic node
• Other rules are possible
• Videos: watch the connections change
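
A rough sketch of the excitatory rule as reconstructed above. The maximum magnitude W and the rates eta1, eta2 are assumed hyperparameters, and the signs of the decay term follow one reading of the slide, so this is an interpretation rather than a definitive implementation:

```python
import numpy as np

def excitatory_update(w, x, y, W=1.0, eta1=0.1, eta2=0.01):
    """Bounded Hebbian update for an excitatory connection.
    The growth term (W - w)*x*y saturates as w approaches the maximum
    magnitude W; the decay ("forgetting") term is strongest when the
    presynaptic node is active but the postsynaptic node is not
    (x high, y near 0) and vanishes when y = 1."""
    growth = eta1 * (W - w) * x * y
    decay = eta2 * W * x * (y - 1.0)          # <= 0 for y in [0, 1]
    return np.clip(w + growth + decay, 0.0, W)  # extra safeguard to keep w in [0, W]
```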
Perceptron Learning
• Will converge on correct weights
• Single layer learning rule (sketched below): w_i(new) = w_i(old) + α·t·x_i
• Rule is applied until boundary is learned

[Diagram: single-layer perceptron with inputs, a bias unit, and one output]
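
A small sketch of the rule above applied to the bipolar OR patterns from the decision-boundary slide, assuming NumPy, an explicit bias weight, and updates only on misclassified patterns (a common form of the perceptron rule); α = 1 is an arbitrary choice:

```python
import numpy as np

# Bipolar OR truth table from the earlier slide
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
T = np.array([1, 1, 1, -1], dtype=float)

w, b, alpha = np.zeros(2), 0.0, 1.0

# Apply w_i(new) = w_i(old) + alpha*t*x_i (and the same for the bias)
# whenever the current boundary misclassifies a pattern, repeating
# until every pattern falls on the correct side.
for _ in range(20):                        # bounded loop instead of "until learned"
    errors = 0
    for x, t in zip(X, T):
        y = 1.0 if w @ x + b >= 0 else -1.0
        if y != t:
            w += alpha * t * x
            b += alpha * t
            errors += 1
    if errors == 0:
        break

print(w, b)   # a separating line for OR; no hidden units were needed
```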
Backpropagation
• Designed for at least one hidden layer
• First, activation propagates to outputs
• Then, errors are computed and assigned
• Finally, weights are updated
• Sigmoid is a common activation function

[Diagram: x's are inputs, z's are hidden units, y's are outputs, t's are targets, v's are layer 1 weights, w's are layer 2 weights]
Backpropagation Algorithm
1) Initialize weights
2) While stopping condition is false, for each training pair:
   1) Compute outputs by forward activation
   2) Backpropagate error:
      1) For each output unit, error δ_k = (t_k − y_k)·f′(y_in_k) (target minus output times slope)
      2) Weight correction Δw_jk = α·δ_k·z_j (learning rate times error times hidden output)
      3) Send error back to hidden units
      4) Calculate error contribution for each hidden unit: δ_j = (Σ_{k=1..m} δ_k·w_jk)·f′(z_in_j)
      5) Weight correction Δv_ij = α·δ_j·x_i
   3) Adjust weights by adding weight corrections
(A code sketch of these steps follows.)
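
A compact sketch of one training-pair update from the algorithm above for the 2-2-2 network in the diagram, assuming NumPy, sigmoid activations, and no bias units (omitted for brevity); variable names mirror the slide's notation:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_step(x, t, V, W, alpha=0.5):
    """One forward/backward pass: V holds the layer-1 weights v_ij,
    W the layer-2 weights w_jk.  For the sigmoid, f'(a) = f(a)*(1 - f(a))."""
    # 1) Compute outputs by forward activation
    z = sigmoid(x @ V)                     # hidden activations
    y = sigmoid(z @ W)                     # output activations

    # 2) Output errors: delta_k = (t_k - y_k) * f'(y_in_k)
    delta_k = (t - y) * y * (1.0 - y)
    dW = alpha * np.outer(z, delta_k)      # dw_jk = alpha * delta_k * z_j

    #    Hidden errors: delta_j = (sum_k delta_k * w_jk) * f'(z_in_j)
    delta_j = (W @ delta_k) * z * (1.0 - z)
    dV = alpha * np.outer(x, delta_j)      # dv_ij = alpha * delta_j * x_i

    # 3) Adjust weights by adding the corrections
    return V + dV, W + dW
```

Looping this step over training pairs such as the XOR patterns (with bias units added back in) gradually drives the outputs toward their targets.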
Example Applications
• Anything with a set of examples and
known targets
• XOR
• Character recognition
• NETtalk: reading English aloud
• Failure prediction
• Disadvantage: can get trapped in local optima
Output Targets
Often Not Available
(Stone, Sutton, and Kuhlmann 2005)
One Approach: Value Function
Reinforcement Learning
• Divide the world into states and actions
• Assign values to states
• Gradually learn the most promising states
and actions
[Grid world: every state value starts at 0, the Goal state has value 1, and the agent begins at Start]
Learning to Navigate
[Four snapshots of the grid at T=1, T=56, T=350, and T=703: state values (e.g. 0.5, 0.9, 1) spread backward from the Goal toward Start as learning proceeds, until the whole path to the goal carries high values]
How to Update State/Action Values
• Q learning rule (sketched after this list):
Q(state, action) = R(state, action) + γ·max over all actions Q(next state, action)
• Exploration increases Q-values’ accuracy
• The best actions to take in different states
become known
• Works only in Markovian domains
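
A tabular sketch of the rule above in a hypothetical one-dimensional corridor; the state space, the two actions, and the reward of 1 for reaching the goal are illustrative assumptions, not part of the slides:

```python
import numpy as np

N_STATES, N_ACTIONS, GOAL = 6, 2, 5        # states 0..5, actions: 0 = left, 1 = right
gamma = 0.9
Q = np.zeros((N_STATES, N_ACTIONS))
rng = np.random.default_rng(0)

for _ in range(200):                        # episodes of random exploration
    s = 0
    while s != GOAL:
        a = int(rng.integers(N_ACTIONS))    # explore: pick a random action
        s_next = max(s - 1, 0) if a == 0 else min(s + 1, GOAL)
        r = 1.0 if s_next == GOAL else 0.0
        # Q(state, action) = R(state, action) + gamma * max_a' Q(next state, a')
        Q[s, a] = r + gamma * Q[s_next].max()
        s = s_next

print(Q.argmax(axis=1))   # greedy action per state: always move right toward the goal
```

Because this toy environment is deterministic, the slide's update can be applied directly; stochastic domains normally blend the new estimate in with a learning rate.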
Backprop In RL
• The state/action table can be estimated by
a neural network
• The target learned by the network is the
Q-value:
[Diagram: a neural network takes the state description and an action as input and outputs the estimated Q-value]
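
A rough sketch of the arrangement in the diagram, assuming a tiny sigmoid network whose input is the state description concatenated with the action; the helper names and layer sizes are hypothetical. The value returned by q_target is what plays the role of the backprop target t_k above:

```python
import numpy as np

def q_net(params, state, action):
    """Stand-in for the NN in the diagram: one sigmoid hidden layer mapping
    (state_description, action) to a scalar value estimate."""
    V, W = params                             # V: (len(state)+1, hidden), W: (hidden,)
    x = np.append(state, action)
    z = 1.0 / (1.0 + np.exp(-(x @ V)))
    return float(z @ W)

def q_target(params, reward, next_state, actions, gamma=0.9):
    """Backprop target for the current (state, action) pair:
    R(state, action) + gamma * max over actions of Q(next state, action)."""
    return reward + gamma * max(q_net(params, next_state, a) for a in actions)
```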
Next Week:
Evolutionary Computation
• EC does not require targets
• EC can be a kind of RL
• EC is policy search
• EC is more than RL
For 1/23: Mitchell ch.1 (pp. 1-31) and ch.2 (pp. 35-80)
Note Section 2.3 is "Evolving Neural Networks"
For 1/25: Mitchell pp. 117-38,
paper: No Free Lunch Theorems for Optimization (1996)
by David H. Wolpert, William G. Macready