Systems Learning for Complex Pattern Problems
Omid Madani
AI Center, SRI International
© 2008 SRI International
Foundations of Intelligence: Concepts (Categories)
• Intelligent systems categorize their perceptions (objects, events, relations)
• Categorization involves substantial abstraction: you rarely see the same exact
thing again…
• Categorization is necessary for intelligence
• Categories are complex: they have adaptive structure and are composed of parts, of
abstractions, …
• High intelligence (advanced animals) requires myriad categories
What are the principles behind such learning and development?
• Assumptions/Evidence: These (perceptual) categories are developed mainly in
an unsupervised manner
– It is doubtful they are all programmed in; many are not (in particular, for humans)
– Explicit teacher is absent
Example Perceptual Concepts
• In text, every word, phrase, expression: “book”, “new”, “a”, …
• Single characters are primitive concepts: “a”, “b”, …, “1”, “2”, “;”, …
• Concepts can be composed of other concepts:
– “n”+”e” = “ne”
– “new” + “york” = “new york”
• Concepts can be abstractions:
– week-day = {Monday, Tuesday, ….}
– Digits = {1,2,3,4,….}
• Area code is a concept that involves both composing and abstraction:
– Composition of 3 digits
– A digit is a grouping, i.e., the set {0, 1, 2, …, 9} (e.g. 2 is a digit)
• Other examples: phone number, address, resume page, face (in the visual domain),
etc. (a toy sketch of composition and abstraction follows)
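
To make composition and abstraction concrete, here is a toy Python sketch treating a concept as the set of strings it can stand for; the function names (primitive, compose, abstract) are illustrative only, not part of any system described here:

# Toy sketch: concepts as sets of strings. compose() concatenates concepts,
# abstract() groups alternatives. Names are illustrative assumptions.
from itertools import product

def primitive(ch):
    """A primitive concept matches exactly one character."""
    return {ch}

def compose(*concepts):
    """Composition: 'n' + 'e' = 'ne'; 'new ' + 'york' = 'new york'."""
    return {"".join(parts) for parts in product(*concepts)}

def abstract(*alternatives):
    """Abstraction/grouping: digit = {'0', '1', ..., '9'}."""
    return set().union(*alternatives)

digit = abstract(*(primitive(d) for d in "0123456789"))
area_code = compose(digit, digit, digit)   # a composition of 3 digits
assert "212" in area_code and "a1b" not in area_code
print(len(area_code))                      # 1000 three-digit strings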
Acquiring and Developing Concepts
• Higher intelligence, such as “advanced” pattern recognition/generation
(e.g. vision), may require
– Long term learning (weeks, months, years,…)
– Cumulative learning (learn these first, then these, then these,…)
– Massive Learning: Myriad inter-related categories/concepts
– Systems learning: multiple algorithms working together
– Autonomy (relatively little human involvement)
What are the learning processes?
Applications: learning to segment words in a speech stream in any language, visual
object recognition, learning to play Go/Chess.
Learning by Repeatedly Predicting in a Rich World
• In a nutshell, we seek a system that repeatedly predicts, then observes & updates,
over a stream of low-level or “hard-wired” categories (input, say text: characters, …;
or vision: edges, curves, …)
• After a while (much learning), its predictions involve higher-level categories
(bigger chunks): e.g. words, digits, phrases, phone numbers, faces, visual objects,
home pages, sites, …
[Figure: a Prediction System running the predict / observe & update loop over a raw
stream “… 0011101110000 …”, before and after much learning.]
(Prediction Games in Infinitely Rich Worlds, AAAI FSS’07)
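
A minimal sketch of the predict / observe & update game over a character stream, under simplifying assumptions (bigram statistics only); the actual FSS’07 system is far richer:

# Minimal, assumed sketch of a predict-observe-update loop over a text stream.
# Only illustrates the shape of the game, not the FSS'07 system itself.
from collections import Counter, defaultdict

stats = defaultdict(Counter)          # stats[prev][nxt] = co-occurrence count

def predict(prev):
    """Predict the likeliest next character given the previous one."""
    seen = stats[prev]
    return seen.most_common(1)[0][0] if seen else None

stream = "the cat sat on the mat"
prev, correct = None, 0
for ch in stream:                     # observe the stream one primitive at a time
    if predict(prev) == ch:
        correct += 1                  # the prediction matched the observation
    stats[prev][ch] += 1              # update local statistics
    prev = ch
print(f"correct predictions: {correct}/{len(stream)}")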
Example Category Node (processed Jane Austen’s online books)
[Figure: the node for the category “ther ” (weight 7.1) keeps local statistics:
categories appearing before it (“nei”, “and ”, “toge”, “ far”, “ bro”, “heart”,
“love ”, “by ”) with prediction weights (0.41, 0.13, 0.11, 0.10, 0.087, 0.07,
0.057, 0.052).]
(Exploring Massive Learning via a Prediction System, AAAI FSS’07)
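
A hypothetical sketch of such a node’s bookkeeping (the concrete data structures of the FSS’07 system may differ): count the categories seen just before this one and normalize the counts into prediction weights:

# Hypothetical sketch of a category node keeping local statistics;
# not the actual FSS'07 representation.
from collections import Counter

class CategoryNode:
    def __init__(self, name):
        self.name = name
        self.before = Counter()       # counts of categories seen before this one

    def observe_before(self, category):
        self.before[category] += 1    # update local statistics

    def prediction_weights(self, k=5):
        """Top-k predecessors, with counts normalized to weights."""
        total = sum(self.before.values())
        return [(c, n / total) for c, n in self.before.most_common(k)]

node = CategoryNode("ther ")
for cat, n in [("nei", 13), ("and ", 9), ("toge", 7)]:
    for _ in range(n):
        node.observe_before(cat)
print(node.prediction_weights())      # e.g. [('nei', 0.448...), ('and ', 0.310...), ...]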
Some Challenges or Features of the Task
• Lots of
– Features/predictors (input dimensionality),
– classes (output dimensionality),
– instances (episodes)
• Uncertainty in the value of features, classes, adequate
segmentation, …
– No one segments them for us! (what about written language?)
• Requires algorithms that are, above all:
– incremental, able to handle nonstationarities and uncertainty, asymptotically
convergent, and sample efficient
• Objectives and evaluation criteria?
Many-Class Learning (.. A Wiring Problem)
• The questions raised during this research:
1. Given the need to quickly classify (a given instance) into one of
myriad classes (e.g. millions), how can this be done?
2. How about space efficiency?
3. How can we efficiently learn such efficient classification systems?
[Figure: many-class learning produces a classification system that maps an
instance x ∈ ℝⁿ to one of myriad classes.]
A Solution: Index Learning
Input: a tripartite graph over features, instances, and categories.
Output: an index = a sparse weighted bipartite graph from features f_i to
categories c_j, i.e. a (sparse) matrix W of weights w_ij.
[Figure: learning converts the feature-instance-category graph into the sparse
feature-to-category index W.]
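
A minimal sketch of the index as a sparse map from features to weighted categories; the bounded additive update below is an assumed placeholder, not the actual learning rule (see omadani.net for those):

# Sketch: a sparse feature-to-category index W as a dict-of-dicts.
# The update rule is an assumed placeholder, not the real algorithm.
from collections import defaultdict

W = defaultdict(dict)                 # W[feature][category] = weight w_ij

def update(features, true_category, rate=0.1):
    """Strengthen edges from an instance's features to its true category."""
    for f in features:
        w = W[f].get(true_category, 0.0)
        W[f][true_category] = w + rate * (1.0 - w)   # stays sparse and bounded

update({"f2", "f3"}, "c4")
print({f: cats for f, cats in W.items()})   # {'f2': {'c4': 0.1}, 'f3': {'c4': 0.1}}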
Classification/Prediction (retrieval & scoring)
Given an instance x ⊇ {f2, f3}:
1. Features are “activated”
2. Edges are activated
3. Receiving classes are activated (each accumulates the weights on its incoming
active edges)
4. Classes are sorted/ranked
[Figure: active features f2 and f3 (of f1 … f4) send edge weights (0.1, 0.4, 0.3,
0.2, 0.1) to classes c1 … c5, yielding the sorted list: (c4, 0.5), (c3, 0.4),
(c5, 0.1), (c1, 0.1).]
See omadani.net for the learning algorithms.
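
A small sketch of steps 1–4; the edge weights below are assumptions chosen to be consistent with the slide’s example output:

# Retrieval & scoring: activate features, follow their edges, accumulate
# weights at the receiving classes, then rank. Edge weights are assumed
# values consistent with the slide's example.
from collections import defaultdict

W = {
    "f2": {"c1": 0.1, "c3": 0.4, "c4": 0.3},
    "f3": {"c4": 0.2, "c5": 0.1},
}

def classify(active_features):
    scores = defaultdict(float)
    for f in active_features:                 # 1. features are activated
        for c, w in W.get(f, {}).items():     # 2. edges are activated
            scores[c] += w                    # 3. classes accumulate weight
    return sorted(scores.items(), key=lambda cw: -cw[1])   # 4. sorted/ranked

print(classify({"f2", "f3"}))
# [('c4', 0.5), ('c3', 0.4), ('c5', 0.1), ('c1', 0.1)]  (tie order may vary)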
Summary
• Encouraging signs that elements of unsupervised (more
“autonomous”) long-term learning systems are developing:
– For instance, efficient many-class learning is a good possibility
– Good progress in machine learning (e.g. some evidence that hierarchical networks are
useful)
• Our work stresses large-scale and long-term learning
– A “systems” approach (compared to traditional neural network approaches): we have to
solve multiple problems and need multiple algorithms
– Many challenges:
  • Uncertainties (e.g. feature noise and label noise)
  • Nonstationarities (concepts evolve; the system evolves and develops)
  • System objective(s)?
  • Avoiding accumulation of error, local minima, and slow learning
  • Understanding the interaction between different modules (segmentation and concept
learning, etc.)
• Driven by the goal of robustly solving practical problems (rather than by
“modeling” the brain), but problems that we think intelligence in the biological
world solves.
Expedition (a 1st System)
[Figure: a window over the stream “… New Jersey in …” contains the context and the
target (the category to predict); the active categories in the context are the
predictors. In this example, the context contains one category on each side of the
target. At the next time step, the window advances along the stream.]
.. Some Time Later ..
[Figure: the window now covers “… loves New York life …”, again with predictor
categories in the context around the target (the category to predict).]
In terms of supervised learning/classification, in this learning activity (prediction games):
• The set of concepts grows over time
• Same for features/predictors (concepts ARE the predictors!)
• Instance representation (segmentation of the data stream) changes/grows over time …
(a toy sketch follows below)
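
A toy sketch of how the concept set, the predictors, and the segmentation can grow together (structure and thresholds assumed; Expedition’s actual algorithms are richer):

# Toy sketch of a growing concept set: frequently co-occurring adjacent
# categories are promoted to new composite concepts, which then change how
# the stream is segmented. The promotion threshold is an assumption.
from collections import Counter

concepts = {"New", "York", "Jersey", "in", "loves", "life"}
pair_counts = Counter()

def segment(tokens):
    """Greedily merge adjacent tokens that form a known composite concept."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and f"{tokens[i]} {tokens[i+1]}" in concepts:
            out.append(f"{tokens[i]} {tokens[i+1]}")
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def observe(tokens, promote_at=2):
    seq = segment(tokens)
    for a, b in zip(seq, seq[1:]):
        pair_counts[(a, b)] += 1
        if pair_counts[(a, b)] >= promote_at:   # seen often enough:
            concepts.add(f"{a} {b}")            # promote to a new concept

observe("loves New York life".split())
observe("in New York in".split())
print(segment("loves New York life".split()))   # ['loves', 'New York', 'life']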
On Learning a Task (or a Dilemma of AI!)
Program It!
Learn It!
Program to Learn It!
Program to Learn to Learn It! …
A View of ML: On the Source of Classes
(A Spectrum of Feedback-Driven (“Supervised”) Learning)
1. Classes human defined; explicitly assigned (a human procures the training data):
classic supervised learning. Examples: annotator/editorial label assignment
(Reuters RCV1, ODP, …), controlled image tagging, ~Mechanical Turk, explicit
personalization (news filtering, spam, …).
2. Classes human defined; implicitly assigned (by the “world”, a “natural” activity,
or by machine). Examples: the Newsgroups data set, image tagging in Flickr, users as
classes, queries as classes, predicting clicks, …
3. Classes machine defined; implicitly assigned (by the “world” or a “natural”
activity/machine). Examples: predicting a word using context in text; autonomous
learning systems (systems acquiring and developing their own concepts, prediction
games, complex sensory input streams, cumulative learning, life-long learning,
development, …).
Moving along the spectrum: more machine autonomy (less human involvement), more
noise/uncertainty, more training data, more classes, more open problems, more
interesting!
Summary
See omadani.net/publications.html