Chameleon
Automatic Selection of Collections
Ohad Shacham
Tel Aviv University
Martin Vechev
Eran Yahav
IBM T.J. Watson Research Center
Collections

• Abstract data types
• Many implementations
  - Different space/time tradeoffs
• An unsuitable selection might lead to
  - Runtime degradation
  - Space bloat – wasted space

Set:  HashSet, LinkedSet, ArraySet, LazySet
Map:  HashMap, LinkedMap, ArrayMap, LazyMap
List: LinkedList, ArrayList, LazyList
Collection Bloat

• Collection bloat is unjustified space overhead for storing data in collections

List s = new ArrayList();
s.add(1);

[Diagram: backing array with one used slot holding 1 and nine empty slots]
Bloat for s is 9
Collection Bloat

• Collection bloat is a serious problem in practice
  - Observed to occupy 90% of the heap in real-world applications
• Hard to detect and fix
  - Accumulation: death by a thousand cuts
  - Correction: need to correlate bloat to program code
• How to pick the right implementation?
  - Minimize bloat
  - But without degrading running time
Our Vision

• Programmer declares the ADT to be used
  - Set s = new Set();
• Programmer defines what metric to optimize
  - e.g. space-time
• Runtime automatically selects implementation based on metric
  - Online: detect application usage of Set
  - Online: select appropriate implementation of Set

[Diagram: the Set ADT backed at runtime by HashSet, ArraySet, LinkedSet, …]
This Work

• Programmer defines the implementation to be used
  - Set s = new HashSet();
• Programmer defines what metric to optimize
  - Space-time product, where space = bloat
• Runtime suggests implementation based on metric
  - Online: automatically detect application usage of HashSet()
  - Online: automatically suggest alternative to HashSet()
  - Offline: programmer modifies program accordingly
    e.g. Set s = new ArraySet();
How Can We Calculate Bloat?

• Data structure bloat = Occupied Data – Used Data
• Example:

List s = new ArrayList();
s.add(1);

Bloat for s is 9
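To make the arithmetic concrete, here is a minimal sketch, assuming the backing array of a default-constructed ArrayList reaches its default capacity of 10 after the first add; the class name BloatExample and the hard-coded capacity are illustrative, not part of Chameleon:

import java.util.ArrayList;
import java.util.List;

public class BloatExample {
    public static void main(String[] args) {
        List<Integer> s = new ArrayList<>();  // backing array of 10 slots after the first add
        s.add(1);                             // only one slot is actually used

        int occupiedData = 10;                // slots allocated in the backing array (assumed capacity)
        int usedData = s.size();              // slots holding live elements
        System.out.println("Bloat = " + (occupiedData - usedData));  // prints: Bloat = 9
    }
}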
How to Detect Collection Bloat?

• Each collection maintains a field for its used data
• The language runtime can find out the actually occupied data
• Bloat = Occupied Data – Used Data

Solution: the garbage collector computes bloat online
• Reads the used-data fields from collections
• Low overhead: can work online in production
Semantic Maps

• How collections communicate information to the GC
• Include the size field and pointers to the actual data fields
• Allow trivial support of custom collections

[Diagram: per-class semantic maps – ArrayList (int size, Object[] array) and HashMap (elementCount, elementData) – which the GC follows to compute Used Data vs. Occupied Data]
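As a rough illustration of what a semantic map might record, here is a hypothetical sketch in Java; the class SemanticMap, its field names, and the registrations in the comment are illustrative and do not reflect the actual JVM-internal data structures:

final class SemanticMap {
    final Class<?> collectionClass;  // the collection type this map describes
    final String sizeField;          // field holding the used-data count, e.g. "size" or "elementCount"
    final String dataField;          // field pointing to the backing storage, e.g. "elementData"

    SemanticMap(Class<?> collectionClass, String sizeField, String dataField) {
        this.collectionClass = collectionClass;
        this.sizeField = sizeField;
        this.dataField = dataField;
    }
}

// Hypothetical registrations the GC could consult while scanning the heap:
//   new SemanticMap(java.util.ArrayList.class, "size", "elementData");
//   new SemanticMap(java.util.HashMap.class, "elementCount", "elementData");
// A custom collection is supported simply by registering its own map.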
Example: Collections Bloat in TVLA

[Chart: % Live Data (0–80%) over GC cycles 1–451, comparing data held in live collections with data actually used by those collections]
Example: Collections Bloat in TVLA

[Chart: the same series with an additional Core Collections curve; the annotated gap gives a lower bound for the bloat]
Fixing Bloat

• Must correlate all bloat statistics to a program point
• Need trace information
• Remember: do not want to degrade running time
Correlating Code and Bloat

• Aggregate bloat potential per allocation context
• Done by the garbage collector

public final class ConcreteKAryPredicate extends ConcretePredicate {
  …
  public void modify() {
    …
    values = HashMapFactory.make(this.values);        // one calling context
  }
  …
}

public class GenericBlur extends Blur {
  …
  public void blur(TVS structure) {
    …
    Map invCanonicName = HashMapFactory.make(structure.nodes().size());   // another calling context
    …
  }
}

public class HashMapFactory {
  public static Map make(int size) {
    return new HashMap(size);   // single allocation site, reached from many contexts
  }
}

[Pie chart: bloat potential by allocation context – Ctx1 40%, Ctx2 11%, Ctx3 5%, Ctx4 7%, Ctx5 5%, Ctx6 3%, Ctx7 7%, Ctx8 3%]
Trace Information

• Track collection usage in the library:
  - Distribution of operations
  - Distribution of sizes
  - Aggregated per allocation context

ctx1: Size = 7, Get = 3, Add = 9, …
ctx2: Size = 1, Contains = 100, Insert = 1, …
ctx3: Size = 103, Contains = 10041, Insert = 140, Remove = 20, …
ctxi: …
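A minimal sketch of what such a per-context record might look like, assuming the instrumented library reports each operation together with the current collection size; the class ContextTrace and its fields are illustrative, not the actual Chameleon data structures:

import java.util.HashMap;
import java.util.Map;

final class ContextTrace {
    final Map<String, Long> opCounts = new HashMap<>();  // e.g. "contains" -> 100
    int maxSize;                                          // largest observed collection size

    void record(String op, int currentSize) {
        opCounts.merge(op, 1L, Long::sum);        // update the operation distribution
        maxSize = Math.max(maxSize, currentSize); // update the size-distribution summary
    }
}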
But how to choose the new Collection?

• Rule engine: user-defined rules
  - Input: heap and trace statistics per context
  - Output: suggested collection for that context
• Rules based on trace and heap information
  - HashMap: #contains < X ∧ maxSize < Y → ArrayMap
  - HashMap: #contains < X ∧ maxSize < Y+10 ∧ %liveHeap > Z → ArrayMap

Rule Engine
  HashMap: maxSize < X → ArrayMap
  LinkedList: NoListOp → ArrayList
  HashMap: (#contains < X ∧ maxSize < Y+10 ∧ %liveHeap > Z) → ArrayMap
  …
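As an illustration, here is a minimal sketch of how the two HashMap rules above could be evaluated against per-context statistics; the class HashMapRules, its parameters, and the string result are illustrative and not the actual rule engine:

final class HashMapRules {
    // HashMap: #contains < X ∧ maxSize < Y → ArrayMap
    // HashMap: #contains < X ∧ maxSize < Y+10 ∧ %liveHeap > Z → ArrayMap
    static String suggest(long containsCount, int maxSize, double liveHeapPercent,
                          long x, int y, double z) {
        if (containsCount < x && maxSize < y) {
            return "ArrayMap";
        }
        if (containsCount < x && maxSize < y + 10 && liveHeapPercent > z) {
            return "ArrayMap";
        }
        return null;  // no rule fired: keep the original HashMap
    }
}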
Overall Picture

[Diagram: the program, together with semantic maps, feeds the Semantic Profiler; the profiler emits a per-context potential report (e.g. ctx1: Size = 7, Get = 3, Add = 9, …); the Rule Engine applies rules such as "HashMap: maxSize < X → ArrayMap" or "LinkedList: NoListOp → ArrayList" to that report and produces recommendations]
Correct Collection Bloat – Typical Usage

• Step 1: Profile for bloat without context
  - Low overhead, can run in production
  - If a problem is detected, go to step 2
  - Automatic
• Step 2: Combine heap information with trace information per context
  - Higher overhead than step 1
  - Can switch automatically from step 1 to step 2
  - Automatic; prior to Chameleon this was a manual step (very hard)
• Step 3: Suggest fixes to the user based on rules
  - Automatic
• Step 4: Programmer applies the suggested fixes
  - Manual
Chameleon on TVLA

1: HashMap: tvla...HashMapFactory:31; tvla.core.base.BaseTVS:50 – replace with ArrayMap
…
4: ArrayList: BaseHashTVSSet:112; tvla...base.BaseHashTVSSet:60 – set initial capacity

[Bar chart: % live data for the top contexts, split into used data and additional bloat potential]

Size per context:
Max | Avg   | Stddev
15  | 11.33 | 1.36
26  | 6.31  | 5.05
 7  | 4.8   | 1.17
 7  | 4.8   | 1.17
Implementation

• Built on top of IBM’s JVM
  - Modifications to the parallel mark-and-sweep GC
  - Modular changes, readily applicable to other GCs
  - Modifications to the collection libraries
• Runtime overhead
  - Detection phase: negligible
  - Correction phase: ~2x (due to the cost of obtaining context)
    Can use PCC by Bond & McKinley
Experimental Results – Memory

[Bar chart: minimal heap (%) relative to the original program for TVLA, FindBugs, PMD, Bloat, Fop, and Soot]
Experimental Results – Time

[Bar chart: runtime (%) relative to the original program for TVLA, FindBugs, PMD, Bloat, Fop, and Soot]
Related Work

• Large volume of work on SETL
  - Automatic data structure selection in SETL [Schonberg et al., POPL'79]
  - SETL representation sublanguage [Dewar et al., TOPLAS'79]
  - …
• Bloat
  - The Causes of Bloat, The Limits of Health [Mitchell and Sevitsky, OOPSLA'07]
Summary

• Collection selection is a real problem
  - Bloat
  - Runtime penalty
• Chameleon integrates trace and heap information for choosing a collection implementation
  - Based on predefined rules
• Using Chameleon, we reduced the footprint of several applications
  - Never degrading running time, often improving it
• A first step towards automatic collection selection as part of the runtime system