A Fast Kernel-based Multilevel Algorithm for Graph Clustering
Inderjit Dhillon, Yuqiang Guan, Brian Kulis
Dept. of Computer Sciences, University of Texas at Austin, Austin, TX 78712
[email protected], [email protected], [email protected]
ABSTRACT
Graph clustering (also called graph partitioning), the task of clustering the nodes of a graph, is an important problem in diverse data mining applications. Traditional approaches involve optimization of graph clustering objectives such as normalized cut or ratio association; spectral methods are widely used for these objectives, but they require eigenvector computation, which can be slow. Recently, graph clustering with a general cut objective has been shown to be mathematically equivalent to an appropriate weighted kernel k-means objective function. In this paper, we exploit this equivalence to develop a very fast multilevel algorithm for graph clustering. Multilevel approaches involve coarsening, initial partitioning and refinement phases, all of which may be specialized to different graph clustering objectives. Unlike existing multilevel clustering approaches, such as METIS, our algorithm does not constrain the cluster sizes to be nearly equal. Our approach gives a theoretical guarantee that the refinement step decreases the graph cut objective under consideration. Experiments show that we achieve better final objective function values as compared to a state-of-the-art spectral clustering algorithm: on a series of benchmark test graphs with up to thirty thousand nodes and one million edges, our algorithm achieves lower normalized cut values in 67% of our experiments and higher ratio association values in 100% of our experiments. Furthermore, on large graphs, our algorithm is significantly faster than spectral methods. Finally, our algorithm requires far less memory than spectral methods; we cluster a 1.2 million node movie network into 5000 clusters, which due to memory requirements cannot be done directly with spectral methods.
General Terms
Algorithms, Experimentation
Categories and Subject Descriptors
G.1.8.1 [Numerical Analysis]: Spectral Methods; H.3.3.a [Information Search and Retrieval]: Clustering; I.5.3.a [Pattern Recognition]: Clustering Algorithms
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
KDD’05, August 21–24, 2005, Chicago, Illinois, USA.
Copyright 2005 ACM 1-59593-135-X/05/0008 ...$5.00.
Keywords
Graph Clustering, Kernel Methods, Multilevel Methods, Spectral Clustering
1. INTRODUCTION
Graph clustering (also called graph partitioning) is an important problem in many domains. Circuit partitioning, VLSI design, task scheduling, bioinformatics, social network analysis and a host of other problems all rely on efficient and effective graph clustering algorithms. In many data mining applications, pairwise similarities between data objects can be modeled as a graph, and the subsequent problem of data clustering can be viewed as a graph clustering problem.
The problem of graph clustering has been studied for decades, and a number of different approaches have been proposed. Spectral methods have been widely used for graph clustering [2, 6, 9]. These algorithms use the eigenvectors of a graph affinity matrix, or a matrix derived from the affinity matrix, to partition the graph. Various spectral algorithms have been developed for a number of different objective functions, such as normalized cut [9] and ratio cut [2]. Furthermore, extensive research has been done on spectral postprocessing: going from the eigenvector matrix to a discrete clustering [4, 10].
It was recently shown that a wide class of graph clustering objectives, including ratio cut, ratio association, the Kernighan-Lin objective and normalized cut, can all be viewed as special cases of the weighted kernel k-means objective function [3, 4]. In addition to unifying several different graph clustering objectives, including a number of spectral clustering objectives, this result implies that a simple weighted kernel k-means algorithm can be used to optimize graph clustering objectives. The basic kernel k-means algorithm, however, relies heavily on effective cluster initialization to achieve good results, and can often yield poor results.
In this paper, we develop a multilevel algorithm for graph clustering that uses weighted kernel k-means as the refinement algorithm. Multilevel methods have been used extensively for graph clustering with the Kernighan-Lin objective, which attempts to minimize the cut in the graph while maintaining equal-sized clusters [1, 7, 8]. In multilevel algorithms, the graph is repeatedly coarsened level by level until only a small number of nodes are left. Then, an initial clustering on this small graph is performed. Finally, the graph is uncoarsened level by level, and at each level, the clustering from the previous level is refined using the refinement algorithm. This multilevel strategy results in a very fast and effective graph clustering algorithm. METIS [8] is perhaps the most popular multilevel algorithm, and is significantly faster than spectral methods on very large graphs when equally sized clusters are desired.

Table 1: Examples of graph clustering objectives. Note that Normalized Association = k - Normalized Cut.

  Name              | Objective Function
  Ratio Association | maximize_{V_1,...,V_k} sum_{c=1}^{k} links(V_c, V_c) / |V_c|
  Ratio Cut         | minimize_{V_1,...,V_k} sum_{c=1}^{k} links(V_c, V \ V_c) / |V_c|
  Normalized Cut    | minimize_{V_1,...,V_k} sum_{c=1}^{k} links(V_c, V \ V_c) / degree(V_c)
However, all previous multilevel algorithms, such as METIS, suffer from the serious drawback of restricting clusters to be of equal size. The algorithm presented in this paper removes this restriction: instead of a Kernighan-Lin algorithm for the refinement phase, our algorithm employs weighted kernel k-means.
We present experimental results on a number of test graphs to illustrate the advantage of our approach. We show that our algorithm is significantly faster than eigenvector computation, a necessary step for spectral clustering methods, on large graphs. We also demonstrate that our algorithm outperforms a state-of-the-art spectral clustering algorithm in terms of quality, as measured by the graph objective function values. On our benchmark test graphs of varying sizes, the final objective function values of the multilevel algorithm are better than those of the spectral algorithm in 100% of runs for ratio association, and 67% of runs for normalized cut. We also present results on data arising from real-life data mining applications. We cluster the 1.2 million node IMDB movie network into 5000 clusters. Running a spectral algorithm directly on this data set would require storage of 5000 eigenvectors, which is prohibitive.
2. GRAPH CLUSTERING
We first introduce a number of different graph clustering objectives. As we will see later, each of the following objectives may be expressed as a special case of the weighted kernel k-means objective function, which will motivate our use of weighted kernel k-means as the algorithm used in the refinement phase.
For data given as a graph, a number of different objectives have been proposed for the graph clustering problem. We are given a graph G = (V, E, A), consisting of a set of vertices V and a set of edges E such that an edge between two vertices represents their similarity. The affinity matrix A is |V| x |V|, and its entries represent the weights of the edges (an entry of A is 0 if there is no edge between the corresponding vertices). We denote links(A, B) = sum_{i in A, j in B} A_ij, that is, the sum of the weights of edges from one vertex set to the other. Let degree(A) = links(A, V), the total weight of links to all vertices from nodes in A.
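For concreteness, these two quantities can be computed directly from a dense affinity matrix; the following Python sketch is illustrative only (function names and the dense representation are our own choices, not part of the paper's implementation):

```python
import numpy as np

def links(A, S, T):
    # links(S, T) = sum of A_ij over i in S, j in T.
    return A[np.ix_(sorted(S), sorted(T))].sum()

def degree(A, S):
    # degree(S) = links(S, V): total weight of edges from S to all vertices.
    return links(A, S, range(A.shape[0]))

# Toy affinity matrix: two triangles joined by a single edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

print(links(A, {0, 1, 2}, {3, 4, 5}))  # cut between the triangles: 1.0
print(degree(A, {0, 1, 2}))            # 7.0
```

Note that the objectives in Table 1 are built entirely from these two primitives.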
[Figure 1: Overview of the multilevel algorithm (for k = 2). The input graph is repeatedly coarsened, an initial clustering is computed on the coarsest graph, and the clustering is then refined level by level to yield the final clustering.]

A list of some of the most common graph clustering objectives is presented in Table 1. The Kernighan-Lin objective, another common objective function, is the same as ratio cut except that clusters are additionally constrained to be of equal size.
For ratio association, ratio cut and normalized cut, the most common approach for optimization has been to use a spectral algorithm. Such an algorithm is derived by relaxing the above objectives so that a global solution to the relaxed objective may be found by computing eigenvectors of an appropriate matrix. Once these eigenvectors are computed, a postprocessing step derives a discrete clustering from the eigenvectors. See the references [2, 9, 10] for further details on the spectral algorithms associated with each of these objectives.
For the Kernighan-Lin objective, the most successful approach for optimizing the objective has been to use a multilevel algorithm.
3. THE MULTILEVEL APPROACH
In this section, we develop a new, general algorithm for graph clustering based on multilevel methods. The multilevel approach has been made popular by METIS [8], though a number of multilevel clustering algorithms have been studied, dating back to [1, 7]. Our algorithm differs from previous approaches in that it works for a wide class of graph clustering objectives, and all three phases of the algorithm can be specialized for each graph clustering objective. Below, we describe each phase of the algorithm.
We assume that we are given an input graph, denoted by G_0 = (V_0, E_0, A_0), as well as the number of partitions requested, k. See Figure 1 for a graphical overview of the multilevel approach.
3.1 Coarsening Phase
Given the initial graph G_0, the graph is repeatedly transformed into smaller and smaller graphs G_1, G_2, ..., G_m such that |V_0| > |V_1| > ... > |V_m|. To coarsen a graph from G_i to G_{i+1}, a number of different techniques may be used. In general, sets of nodes in G_i are combined to form supernodes in G_{i+1}. When combining a set of nodes into a supernode, the edge weight between two supernodes in G_{i+1} is taken to be the sum of the edge weights between the nodes in G_i comprising the supernodes.
Our coarsening approach works as follows: given a graph, start with all nodes unmarked. Visit each vertex in a random order. For each unmarked vertex x, if all neighbors of x have been marked, mark x and proceed to the next unmarked vertex. On the other hand, if x has an unmarked neighbor, then merge x with the unmarked vertex y that maximizes

    e(x, y)/w(x) + e(x, y)/w(y),    (1)

where e(x, y) corresponds to the edge weight between vertices x and y, and w(x) is the weight of vertex x. Then mark both x and y. Once all vertices are marked, the coarsening for this level is completed.
As we will see when discussing the connection between graph clustering and weighted kernel k-means, different graph clustering objectives induce different vertex weights. For example, in normalized cut, the weight of a vertex is its degree, and (1) reduces to the normalized cut between x and y. For ratio association, (1) simplifies to the heaviest-edge criterion of METIS, since all vertices have unit weight.
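One level of this coarsening step can be sketched as follows; this is an illustrative implementation (dense matrices, our own function names), not the paper's code:

```python
import random
import numpy as np

def coarsen(A, w):
    # One coarsening level. A: affinity matrix, w: vertex weights.
    # Merges each unmarked vertex x with the unmarked neighbor y maximizing
    # e(x,y)/w(x) + e(x,y)/w(y), per criterion (1).
    n = A.shape[0]
    marked = [False] * n
    supernode = [-1] * n          # map: fine vertex -> supernode id
    order = list(range(n))
    random.shuffle(order)         # visit vertices in random order
    next_id = 0
    for x in order:
        if marked[x]:
            continue
        nbrs = [y for y in range(n) if y != x and A[x, y] > 0 and not marked[y]]
        if not nbrs:              # all neighbors marked: x becomes a singleton
            supernode[x] = next_id
            marked[x] = True
        else:
            y = max(nbrs, key=lambda y: A[x, y] / w[x] + A[x, y] / w[y])
            supernode[x] = supernode[y] = next_id
            marked[x] = marked[y] = True
        next_id += 1
    # Coarse edge weights are sums of fine edge weights between supernodes.
    Ac, wc = np.zeros((next_id, next_id)), np.zeros(next_id)
    for i in range(n):
        wc[supernode[i]] += w[i]
        for j in range(n):
            if supernode[i] != supernode[j]:
                Ac[supernode[i], supernode[j]] += A[i, j]
    return Ac, wc, supernode
```

Repeated application of `coarsen` produces the sequence G_1, ..., G_m; the returned fine-to-supernode map is exactly what the refinement phase later needs to project a clustering back down.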
3.2 Initial Clustering Phase
Eventually, the graph is coarsened to the point where very few nodes remain. We specify a parameter indicating how small we want the coarsest graph to be; in our experiments, we stop coarsening when the graph has fewer than 20k nodes, where k is the number of desired clusters. At this point, we perform an initial partitioning by clustering the coarsest graph.
One way to obtain an initial partitioning is to use the region-growing algorithm of METIS [8]. However, this approach is inadequate for our purposes since it generates clusters of equal size. We have found the best method for initialization to be a spectral algorithm. We use the spectral algorithm of Yu and Shi [10], which we generalize to work with arbitrary weights [4]. Thus our initial clustering is "customized" for different graph clustering objectives. Since the coarsest graph is significantly smaller than the input graph, spectral methods are adequate in terms of speed. We find the spectral methods to yield the best quality of results, though they are somewhat less efficient than the region-growing algorithm. Hence, when efficiency is a concern, the region-growing algorithm may be used instead.
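As a rough stand-in for this step, a generic spectral initialization in the same spirit (this is a simplified sketch of standard spectral clustering, not the weighted Yu-Shi generalization the paper actually uses) looks like:

```python
import numpy as np

def spectral_init(A, k):
    # Embed vertices using the top-k eigenvectors of the normalized
    # affinity matrix D^(-1/2) A D^(-1/2), then run plain k-means on the
    # row-normalized embedding (farthest-point seeding for stability).
    d = np.maximum(A.sum(axis=1), 1e-12)
    Dhalf = np.diag(1.0 / np.sqrt(d))
    vals, vecs = np.linalg.eigh(Dhalf @ A @ Dhalf)
    X = vecs[:, -k:]                                   # top-k eigenvectors
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    centers = [X[0]]                                   # farthest-point seeding
    for _ in range(1, k):
        dist = np.min([np.square(X - c).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(dist))])
    centers = np.array(centers)
    for _ in range(20):                                # Lloyd iterations
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels
```

Because the coarsest graph is small, the dense eigendecomposition here is cheap, which is exactly why spectral initialization is affordable at this stage.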
3.3 Refinement Phase
The final phase of the algorithm is the refinement phase. Given a graph G_i, we form the graph G_{i-1} (G_{i-1} is the same graph used in level i-1 of the coarsening phase). We extend the clustering from G_i to G_{i-1} as follows: if a supernode is in cluster c, then all nodes of G_{i-1} formed from that supernode are in cluster c. This yields an initial clustering for the graph, which is then improved using a refinement algorithm. Note that we also run the refinement algorithm on the coarsest graph.
The algorithm terminates after the refinement algorithm is run on the original graph G_0. Since we have a good initial clustering at each level, the refinement algorithm often converges very quickly. Thus, this approach is extremely efficient.
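The projection from one level to the next is a one-line lookup; a minimal sketch (using the fine-to-supernode map produced during coarsening):

```python
def project_clustering(coarse_labels, supernode):
    # Extend a clustering of G_i (supernodes) to G_{i-1}: every fine node
    # inherits the cluster of the supernode it was merged into.
    # supernode[j] maps fine node j to its supernode id.
    return [coarse_labels[s] for s in supernode]

# Example: fine nodes 0,1 merged into supernode 0; nodes 2,3 into supernode 1.
print(project_clustering([1, 0], [0, 0, 1, 1]))  # -> [1, 1, 0, 0]
```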
At each refinement step, our algorithm uses an approach inspired by the recently-shown theoretical connection between kernel k-means and spectral clustering [4]. In [4], we proved that each of the graph clustering objectives given earlier may be expressed as a special case of the weighted kernel k-means objective with the correct choice of weights and kernel matrix. Hence, the weighted kernel k-means algorithm can be directly used to locally optimize these graph clustering objectives. Each iteration of this algorithm costs O(nz) time, where nz is the number of nonzero entries in the kernel matrix.

Algorithm 1: Refinement Algorithm.
Weighted_Kernel_kmeans(K, k, w, tmax, {pi_c^(0)}_{c=1}^k)
Input: K: kernel matrix; k: number of clusters; w: weights for each point; tmax: optional maximum number of iterations; {pi_c^(0)}_{c=1}^k: optional initial clustering.
Output: {pi_c}_{c=1}^k: final clustering of the points.
1. If no initial clustering is given, initialize the k clusters pi_1^(0), ..., pi_k^(0) randomly. Set t = 0.
2. For each row i of K and every cluster c, compute
   d(i, m_c) = K_ii - 2 * (sum_{j in pi_c^(t)} w_j K_ij) / (sum_{j in pi_c^(t)} w_j)
               + (sum_{j,l in pi_c^(t)} w_j w_l K_jl) / (sum_{j in pi_c^(t)} w_j)^2.
3. Find c*(i) = argmin_c d(i, m_c), resolving ties arbitrarily. Compute the updated clusters as pi_c^(t+1) = {i : c*(i) = c}.
4. If not converged and t < tmax, set t = t + 1 and go to Step 2; otherwise, stop and output the final clusters {pi_c^(t+1)}_{c=1}^k.

Table 2: Popular graph clustering objectives with corresponding weights and kernels, given the affinity matrix A.

  Objective         | Node Weights    | Kernel Matrix
  Ratio Association | 1 for all nodes | K = sigma*I + A
  Ratio Cut         | 1 for all nodes | K = sigma*I - D + A
  K-L Objective     | 1 for all nodes | K = sigma*I - D + A
  Normalized Cut    | degree of node  | K = sigma*D^{-1} + D^{-1} A D^{-1}
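The kernels of Table 2 are straightforward to build. The sketch below is illustrative: the paper only requires sigma to be "large enough" for positive definiteness, whereas here we compute the smallest sufficient shift via an eigenvalue bound, which is our own choice:

```python
import numpy as np

def build_kernel(A, objective, eps=1e-6):
    # Weights and kernels from Table 2, with sigma chosen just large enough
    # that K is positive definite.
    n = A.shape[0]
    d = np.maximum(A.sum(axis=1), 1e-12)
    if objective == "ratio_association":
        w, M, shift = np.ones(n), A.copy(), np.eye(n)        # K = sigma*I + A
    elif objective in ("ratio_cut", "kl"):
        w, M, shift = np.ones(n), A - np.diag(d), np.eye(n)  # K = sigma*I - D + A
    elif objective == "normalized_cut":
        Dinv = np.diag(1.0 / d)
        w, M, shift = d, Dinv @ A @ Dinv, Dinv               # K = sigma*D^-1 + D^-1 A D^-1
    else:
        raise ValueError(objective)
    # K = sigma*shift + M is positive definite iff
    # sigma > -lambda_min(shift^(-1/2) M shift^(-1/2)); shift is diagonal here.
    s = np.diag(1.0 / np.sqrt(np.diag(shift)))
    sigma = max(0.0, eps - np.linalg.eigvalsh(s @ M @ s).min())
    return sigma * shift + M, w, sigma
```

Any larger sigma also works and does not change the optimal clustering, since the shift only adds a constant to every squared distance.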
Algorithm 1 describes the weighted kernel k-means algorithm. The input is the kernel matrix K, the number of clusters k, weights for the points w, an optional maximum number of iterations, and an optional initial clustering, denoted by {pi_c^(0)}_{c=1}^k.
For the Kernighan-Lin (K-L) objective, which requires equally sized clusters, an incremental kernel k-means algorithm (in which points are swapped to improve the objective function value) can be used to locally optimize the objective.
In Table 2, we display each of the earlier graph clustering objectives, along with the choice of weights and kernel required for the weighted kernel k-means algorithm to monotonically optimize the given graph clustering objective. The matrix D is a diagonal matrix whose entries correspond to the sums of the rows of A, the graph affinity matrix. Finally, sigma is a real number chosen to be large enough that K is positive definite. As long as K is positive definite, we can theoretically guarantee that the algorithm monotonically optimizes the corresponding graph clustering objective.
Initialization can be done randomly, or by using another clustering algorithm, such as METIS or spectral clustering. In the case of our multilevel algorithm, initialization is obtained from the clustering at the previous level, or from the initial clustering in the case of the coarsest graph.

Table 3: Test graphs.

  Graph name | #nodes | #edges  | Description
  DATA       | 2851   | 15093   | finite element mesh
  3ELT       | 4720   | 13722   | finite element mesh
  UK         | 4824   | 6837    | finite element mesh
  ADD32      | 4960   | 9462    | 32-bit adder
  WHITAKER3  | 9800   | 28989   | finite element mesh
  CRACK      | 10240  | 30380   | finite element mesh
  FE_4ELT2   | 11143  | 32818   | finite element mesh
  MEMPLUS    | 17758  | 54196   | memory circuit
  BCSSTK30   | 28294  | 1007284 | stiffness matrix
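Algorithm 1 can be sketched compactly with matrix operations; this is an illustrative dense implementation of the batch step (the paper's implementation is sparse and restricted to boundary points):

```python
import numpy as np

def weighted_kernel_kmeans(K, k, w, tmax=100, init=None):
    # Batch weighted kernel k-means (a sketch of Algorithm 1).
    # d(i, m_c) = K_ii - 2*sum_{j in pi_c} w_j K_ij / s_c
    #             + sum_{j,l in pi_c} w_j w_l K_jl / s_c^2,
    # where s_c = sum_{j in pi_c} w_j.
    n = K.shape[0]
    labels = np.asarray(init) if init is not None else np.random.randint(0, k, n)
    for _ in range(tmax):
        W = np.zeros((n, k))
        W[np.arange(n), labels] = w            # weighted one-hot memberships
        s = np.maximum(W.sum(axis=0), 1e-12)   # cluster weight sums s_c
        cross = (K @ W) / s                    # sum_j w_j K_ij / s_c
        quad = np.einsum('jc,jl,lc->c', W, K, W) / s**2
        dist = np.diag(K)[:, None] - 2 * cross + quad[None, :]
        new = dist.argmin(axis=1)              # Step 3: nearest cluster
        if np.array_equal(new, labels):        # converged
            break
        labels = new
    return labels
```

With a linear kernel K = X X^T and unit weights, this reduces to ordinary k-means, which makes the code easy to sanity-check.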
The basic batch algorithm presented in [4] can get stuck in poor local optima. It has been shown [5] that local search can significantly improve results by allowing the algorithm to escape such optima. The local search algorithm considers the effect on the objective function of moving a point from one cluster to another, and it chooses a chain of moves that causes the greatest improvement in the objective function value. We incorporate this local search procedure to improve the quality of our weighted kernel k-means batch refinement scheme.
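A much-simplified version of this idea, a greedy single-point move rather than the chain moves of [5], can be sketched as follows (our own illustration, not the paper's procedure):

```python
import numpy as np

def wkkm_objective(K, w, labels, k):
    # Weighted kernel k-means objective: sum_c sum_{i in pi_c} w_i * d(i, m_c)
    # = sum_c [ sum_{i in pi_c} w_i K_ii - (sum_{j,l in pi_c} w_j w_l K_jl) / s_c ].
    total = 0.0
    for c in range(k):
        idx = np.where(labels == c)[0]
        if len(idx) == 0:
            continue
        wc = w[idx]
        Kc = K[np.ix_(idx, idx)]
        total += (wc * np.diag(Kc)).sum() - (wc[:, None] * wc[None, :] * Kc).sum() / wc.sum()
    return total

def local_search_step(K, w, labels, k):
    # Greedy single-point move: return the reassignment of one point that
    # most decreases the objective, or the input labels if none improves it.
    best_val, best_labels = wkkm_objective(K, w, labels, k), labels
    for i in range(len(labels)):
        for c in range(k):
            if c == labels[i]:
                continue
            trial = labels.copy()
            trial[i] = c
            val = wkkm_objective(K, w, trial, k)
            if val < best_val:
                best_val, best_labels = val, trial
    return best_labels
```

The chain variant of [5] strings several such moves together and accepts the best prefix of the chain, which lets it cross small objective barriers that a single move cannot.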
4. EXPERIMENTS
In this section, we present a number of experiments to show that our multilevel weighted kernel k-means algorithm outperforms spectral methods in terms of quality (normalized cut and ratio association values) as well as computation time. We also present results of clustering the 1.2 million node IMDB movie data set graph. In our experiments, we use the spectral algorithm from [10], generalized to work for both normalized cut and ratio association. This algorithm is used as the benchmark spectral algorithm in our experiments, as well as the method for clustering the coarsest graph in our multilevel implementation. To speed up computation, we considered only "boundary points" when running weighted kernel k-means; further discussion is omitted due to lack of space. All the experiments are performed on a Linux machine with a 2.4 GHz Pentium IV processor and 1 GB main memory.
Additionally, we use the following acronyms: mlkkm(0) stands for our multilevel kernel k-means algorithm with no local search, mlkkm(20) is our multilevel algorithm with a local search chain length of 20, kkm stands for kernel k-means with random initialization, NCV is short for normalized cut value, and RAV is short for ratio association value.
4.1 Clustering of Benchmark Graphs
Table 3 lists 9 test graphs^1 from various sources and different application domains. Some of them are benchmark matrices used to test METIS.
^1 Downloaded from http://staffweb.cms.gre.ac.uk/~c.walshaw/partition/

Table 4: Normalized cut values (NCV) and ratio association values (RAV) for obtaining 5000 clusters of the IMDB movie data set using our multilevel algorithm (mlkkm(0)), kernel k-means (kkm) and METIS.

        | mlkkm(0) | kkm   | METIS
  NCV   | 2308     | 4788  | 2643
  RAV   | 18526    | 1349  | 12744
On each of the benchmark graphs, we compared our multilevel algorithms, mlkkm(0) and mlkkm(20), with the spectral algorithm in terms of objective function values and computation time. We report results for both ratio association and normalized cut when 64 clusters are computed. We do not report comparisons between our multilevel algorithm and METIS; in [4], we showed that the spectral algorithm is superior to METIS in terms of ratio association and normalized cut values.
Figure 2 shows the relative performance of the spectral algorithm compared with our multilevel algorithm in terms of RAV and computation time. In the left panel, for each graph we plot the ratio of the RAV of each method to that of mlkkm(20); ratios less than 1 indicate that mlkkm(20) produces higher RAVs (and thus performs better). We can see that 100% of the RAVs produced using mlkkm(20) are higher than those obtained by the spectral algorithm. Also, we see that mlkkm(20) performs consistently better than mlkkm(0), which implies that local search improves the RAV in all cases (by up to 3%).
In the right panel of Figure 2, we plot the computation time for the three methods. It is clear that our multilevel algorithm is much faster than the spectral algorithm. For example, in the case of ADD32, the spectral algorithm is 48 times slower than mlkkm(20) and 55 times slower than mlkkm(0).
Results for the normalized cut objective appear in Figure 3. For each graph, we plot the ratio of the NCV of the other methods to that of mlkkm(20). As opposed to ratio association, values larger than 1 in Figure 3 indicate that mlkkm(20) produces lower NCVs (and thus performs better). We see that 67% of the NCVs produced using mlkkm(20) are lower than those produced using the spectral algorithm. We also see that mlkkm(20), which utilizes the local search technique, performs much better than mlkkm(0) in many cases.
In terms of computation time, our multilevel methods again outperform the spectral algorithm in all cases. For ADD32, spectral is 58 times slower than mlkkm(20) and 60 times slower than mlkkm(0).
4.2 The IMDB Movie Data Set — A Case Study
The IMDB data set^2 contains information about movies, music bands, actors, movie festivals and events. By connecting actors with the movies or events in which they participate, we form a sparse undirected graph with approximately 1.2 million nodes and 7.6 million edges.
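The construction of such an actor-movie graph can be sketched in a few lines; this is purely illustrative (a toy adjacency-dictionary representation with made-up names, not the paper's data pipeline):

```python
from collections import defaultdict

def build_actor_movie_graph(credits):
    # credits maps a movie or event to the actors appearing in it;
    # each (actor, movie) pair becomes an undirected unit-weight edge.
    adj = defaultdict(dict)
    for movie, actors in credits.items():
        for actor in actors:
            adj[movie][actor] = 1.0
            adj[actor][movie] = 1.0
    return adj

g = build_actor_movie_graph({"Movie A": ["Actor 1", "Actor 2"],
                             "Movie B": ["Actor 2"]})
print(len(g))  # 4 nodes: 2 movies + 2 actors
```

The resulting graph is bipartite, so a cluster naturally groups movies together with their casts, which is exactly the behavior observed in the clusters discussed below.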
It is impractical to run a spectral algorithm directly on this data set to produce 5000 clusters, for reasons of both time and memory.^2
^2 Downloaded from http://www.imdb.com/interfaces
[Figure 2: Quality and computation time of our multilevel methods compared with the benchmark spectral algorithm on the nine test graphs (data through bcsstk30). The left panel plots the ratio of the ratio association value of every algorithm (spectral, mlkkm(0), mlkkm(20)) to that of mlkkm(20) for 64 clusters; bars below the baseline correspond to cases where mlkkm(20) performs better. The right panel compares computation times in seconds for generating 64 clusters using the different algorithms. As an example, on the ADD32 graph, the spectral algorithm is 48 times slower than mlkkm(20) and 58 times slower than mlkkm(0).]
[Figure 3: Quality and computation time of our multilevel methods compared with the benchmark spectral algorithm. The left panel plots the ratio of the normalized cut value of every algorithm to that of mlkkm(20) for 64 clusters; bars above the baseline correspond to cases where mlkkm(20) performs better. The right panel compares computation times in seconds for generating 64 clusters. As an example, on the ADD32 graph, the spectral algorithm is 58 times slower than mlkkm(20) and 60 times slower than mlkkm(0).]
Table 5: A Selection of Movies and Actors in Cluster 633.

Movies:
  Harry Potter and the Sorcerer's Stone (2001)
  Harry Potter and the Chamber of Secrets (2002)
  Harry Potter and the Prisoner of Azkaban (2004)
  Harry Potter and the Goblet of Fire (2005)
  Harry Potter: Behind the Magic (2001 TV)
  Harry Potter und die Kammer des Schreckens: Das große RTL Special zum Film (2002 TV)

Actors:
  Daniel Radcliffe, Rupert Grint, Emma Watson, Tom Felton, Peter Best, Sean Biggerstaff, Scott Fern, Alfred Enoch, Joshua Herdman, Harry Melling, Matthew Lewis, Devon Murray, Robert Pattinson, James Phelps, Oliver Phelps, Edward Randell, Jamie Waylett, Shefali Chowdhury, Katie Leung, Bonnie Wright, Stanislav Ianevski, Jamie Yeates, Chris Rankin
[Figure 4: Graphical Plots of Cluster 633 of IMDB (Harry Potter); spy plots of the reordered affinity matrix (nz = 7606432). The right plot shows a closer view of the cluster circled in the left plot.]
Computing 5000 eigenvectors of a graph of this size would be extremely slow, and storing 5000 eigenvectors in main memory would require 24 GB, which is prohibitive. Thus our multilevel algorithm is considerably more efficient than spectral methods in terms of memory usage in addition to computation time.
Since we cannot run spectral methods on this data set, we compare our multilevel algorithm (mlkkm(0)) to METIS and to weighted kernel k-means with random initialization (kkm). It takes our multilevel algorithm approximately twenty-five minutes to cluster this graph into 5000 clusters, and the cluster sizes range from 13 to 7616 (for the normalized cut objective). Table 4 shows the normalized cut values and ratio association values for 5000 clusters generated using these three methods. We see that our multilevel algorithm is markedly superior to kkm: the NCV of kkm is twice that of mlkkm(0), and the RAV of kkm is less than one tenth of the RAV of mlkkm(0). Comparing METIS and mlkkm(0), the latter is superior, achieving a RAV that is 50% higher and an NCV that is 10% lower.
To demonstrate the quality of the results, we discuss a sampling of the produced clusters. After clustering, we rearrange the rows and columns of the affinity matrix such that rows (columns) in the same cluster are adjacent to one another. Cluster 633, circled in the right plot of Figure 4, is of size 121 and mainly contains "Harry Potter" movies and the actors in these movies. The left column of Table 5 lists movies in the cluster, where we see 3 Harry Potter movies that have already been released, 1 to be released in November 2005, and 2 documentary TV programs. The right column of Table 5 lists a selection of the actors in the cluster, where we see the major cast members of the Harry Potter movies, such as Daniel Radcliffe, Rupert Grint and Emma Watson.
Cluster 3537 contains the short series "Festival numero" (No. 1-12), shot in 1965; cluster 4400 contains the Korean movie "Chunhyang" and 19 of its cast members who acted only in this movie in the database. Other small clusters follow this pattern: most of them are about one movie or movie series and contain cast members that acted only in that movie or movie series. Popular actors, directors and well-known movie festivals are associated with more people, so generally they belong to much larger clusters.
5. CONCLUSIONS
We have presented a new multilevel algorithm for graph clustering. This algorithm has key benefits over existing graph clustering algorithms. Unlike previous multilevel algorithms such as METIS, our algorithm is general and captures a number of graph clustering objectives, freeing us from the restriction of equal-size clusters. It has advantages over spectral methods in that the multilevel approach is considerably faster and gives better final objective function values.
Using a number of benchmark test graphs, we demonstrated that, in general, our algorithm produces better ratio association and normalized cut values when using spectral initialization at the coarsest level. As the coarsest graph is significantly smaller than the input graph, spectral methods are feasible in this case, and the running time of the algorithm is still much faster than spectral clustering on the input graph. Spectral methods have attracted intense study over the last several years as a powerful method for graph clustering; our results indicate that they may not be the most powerful technique for partitioning graphs. Furthermore, given an input pairwise similarity matrix between data vectors, our approach can be viewed as a fast multilevel algorithm for optimizing the kernel k-means objective.
Our approach also consumes far less memory than spectral methods. Spectral clustering of a 1.2 million node movie network into 5000 clusters would require several gigabytes of storage. On the other hand, we were able to cluster this network in approximately twenty-five minutes without any significant memory overhead.
Acknowledgments. This research was supported by NSF grant CCF-0431257, NSF Career Award ACI-0093404, and NSF-ITR award IIS-0325116.
6. REFERENCES
[1] T. Bui and C. Jones. A heuristic for reducing fill-in in sparse matrix factorization. In 6th SIAM Conference on Parallel Processing for Scientific Computing, pages 445-452, 1993.
[2] P. Chan, M. Schlag, and J. Zien. Spectral k-way ratio cut partitioning. IEEE Trans. CAD-Integrated Circuits and Systems, 13:1088-1096, 1994.
[3] I. Dhillon, Y. Guan, and B. Kulis. Kernel k-means, spectral clustering and normalized cuts. In Proc. 10th ACM KDD Conference, 2004.
[4] I. Dhillon, Y. Guan, and B. Kulis. A unified view of kernel k-means, spectral clustering and graph cuts. Technical Report TR-04-25, University of Texas at Austin, 2004.
[5] I. S. Dhillon, Y. Guan, and J. Kogan. Iterative clustering of high dimensional text data augmented by local search. In Proceedings of the 2002 IEEE International Conference on Data Mining, pages 131-138, 2002.
[6] W. E. Donath and A. J. Hoffman. Lower bounds for the partitioning of graphs. IBM J. Res. Development, 17:422-425, 1973.
[7] B. Hendrickson and R. Leland. A multilevel algorithm for partitioning graphs. Technical Report SAND93-1301, Sandia National Laboratories, 1993.
[8] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput., 20(1):359-392, 1999.
[9] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence, 22(8):888-905, August 2000.
[10] S. X. Yu and J. Shi. Multiclass spectral clustering. In International Conference on Computer Vision, 2003.