Вычислительный граф - Intel® Software Conference 2014

Intel® Threading Building Blocks
Вычислительный граф
Кирилл Рогожин
1
Содержание
Вычислительный граф
Intel® Threading Building Blocks (Intel TBB)
Несколько примеров
Ещё примеры
Анализ производительности
Заключение
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
2
Закон Мура
“The number of transistors
on a chip will double
approximately every
two years.”
[Gordon Moore]
Moore's Law graph, 1965
9/23/2014
Source: http://en.wikipedia.org/wiki/Moore's_law
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
3
Куда «уходят» все эти транзисторы?
Аппаратный параллелизм
Параллелизм
по ядрам и потокам
Параллелизм
по данным (SIMD)
Параллелизм
по инструкциям
9/23/2014
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
4
Куда «уходят» все эти транзисторы?
Аппаратный параллелизм
Программные модели
Параллелизм
по потокам
Параллельные циклы,
OpenMP, TBB
Параллелизм
по данным
Оптимизации компилятора,
«интрисики» и «прагмы»
Параллелизм
по инструкциям
Оптимизации компилятора
А достаточно ли этого?
9/23/2014
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
5
Программные модели
Три уровня
•
Обработка событий
•
Передача сообщений
•
«Реактивный»
• parallel_for
•
Параллелизм по данным
•
Модель задач, Fork-join
•
•
Параллелизм по данным
вектора, SIMD
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
?
Intel® TBB
Intel® Cilk™ Plus,
OpenMP
Intel® Cilk™ Plus,
OpenMP,
#pragma simd
6
Вычислительный граф
Последовательная программа
x = A();
y = B(x);
z = C(x);
D(y, z);
Параллельные циклы
Параллельные графы и циклы
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Вычислительный граф
Иерархическая параллельная структура
– Задействовать все доступные формы параллелизма
Foo()
Bar1()
9
Foo()
Bar2()
16
Bar1()
Bar1()
25
Bar2()
Bar2()
= 50 единиц
Ускорение = 50/30 = 1.67X
Bar1()
Bar1()
Foo()
Bar2()
Bar2()
Ускорение = 50/21 = 2.38X
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
8
Вычислительный граф
Похоже на последовательное программирование
Нужно определить ветвления и зависимости в алгоритме
a
b
c
c
d
e
b
g
f
d
a
f
g
e
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
9
Содержание
Вычислительный граф
Intel® Threading Building Blocks (Intel TBB)
Несколько примеров
Ещё примеры
Анализ производительности
Заключение
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
10
Intel® Threading Building Blocks (TBB)
Generic Parallel Algorithms
Efficient scalable way to exploit the power of multicore without having to start from scratch
Task scheduler
The engine that empowers parallel
algorithms that employs task-stealing to maximize
concurrency
Miscellaneous
Thread-safe timers
Threads
OS API wrappers
Concurrent Containers
Concurrent access, and a scalable alternative to
containers that are externally locked for threadsafety
TBB flow graph
Thread Local Storage
Scalable implementation of thread-local data that supports infinite
number of TLS
Synchronization Primitives
User-level and OS wrappers for mutual exclusion, ranging from atomic
operations to several flavors of mutexes and condition variables
Memory Allocation
Per-thread scalable memory manager and false-sharing free allocators
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
11
TBB Flow Graph
Graph object
Вы определяете зависимости между узлами
 Библиотека находит скрытый параллелизм
Graph node
 TBB организует вычисления
Применение
 Потоки видео, изображений, финансовых данных
Edge
 Реакция на события GUI
 Определение зависимостей в логике вычислений
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
12
Hello World
Создаём узлы и рёбра, запускаем на исполнение и ждём результата
tbb::flow::graph g;
tbb::flow::continue_node< tbb::flow::continue_msg >
h( g, []( const continue_msg & ) { std::cout << “Hello “; } );
tbb::flow::continue_node< tbb::flow::continue_msg >
w( g, []( const continue_msg & ) { std::cout << “World\n“; } );
tbb::flow::make_edge( h, w );
h.try_put(continue_msg());
g.wait_for_all();
f()
f()
h
w
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
13
Содержание
Вычислительный граф
Intel® Threading Building Blocks (Intel TBB)
Несколько примеров
Ещё примеры
Анализ производительности
Заключение
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
14
Граф зависимостей
A
B
E
C
int main() {
graph g;
broadcast_node<continue_msg> start;
continue_node<continue_msg> a( g, body("A")
continue_node<continue_msg> b( g, body("B")
continue_node<continue_msg> c( g, body("C")
continue_node<continue_msg> d( g, body("D")
continue_node<continue_msg> e( g, body("E")
make_edge(
make_edge(
make_edge(
make_edge(
make_edge(
make_edge(
start, a );
start, b );
a, c );
b, c );
c, d );
a, e );
for (int i = 0; i < 3; ++i ) {
start.try_put(continue_msg());
g.wait_for_all();
}
return 0;
D
);
);
);
);
);
struct body {
std::string my_name;
body( const char *name ) :
my_name(name) {}
void operator()(continue_msg) const{
printf("%s\n", my_name.c_str());
}
};
}
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
15
Поток данных
Sum = X2 + X3
f(x)
squarer
struct square {
int operator()(int v) {
return v*v;
}
};
struct cube {
int operator()(int v) {
return v*v*v;
}
};
summer
f(x)
f(x)
cuber
class sum {
int &my_sum;
public:
sum( int &s ) : my_sum(s) {}
int operator()( std::tuple<int, int> v ) {
my_sum += std::get<0>(v) + std::get<1>(v);
return my_sum;
}
};
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Поток данных
int main() {
int result = 0;
graph g;
broadcast_node<int> input;
function_node<int, int> squarer( g, unlimited, square() );
function_node<int, int> cuber( g, unlimited, cube() );
join_node< tuple<int, int>, queueing > j( g );
function_node<std::tuple<int, int>, int>
summer( g, serial, sum(result) );
make_edge(
make_edge(
make_edge(
make_edge(
make_edge(
input, squarer );
input, cuber );
squarer, std::get<0>( j.inputs() ) );
cuber, std::get<1>( j.inputs() ) );
j, summer );
for (int i = 1; i <= 10; ++i)
input.try_put(i);
g.wait_for_all();
printf("Final result is %d\n", result);
return 0;
f(x)
squarer
f(x)
cuber
}
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
summer
f(x)
Поток данных
result = 0
1
f(x)
squarer
summer
f(x)
f(x)
cuber
Max concurrency = 1
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Поток данных
result = 0
1
f(x)
2
squarer
summer
f(x)
f(x)
cuber
1
Max concurrency = 3
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Поток данных
2
result = 0
1
f(x)
squarer
summer
f(x)
f(x)
cuber
1
2
Max concurrency = 5
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Поток данных
2
result = 0
1
f(x)
3
squarer
summer
f(x)
f(x)
cuber
1
2
Max concurrency = 5
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Поток данных
result = 0
1
f(x)
3
squarer
4
summer
f(x)
f(x)
cuber
1
2
Max concurrency = 4
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Поток данных
3
result = 0
1
f(x)
squarer
4
summer
f(x)
f(x)
cuber
1
2
3
Max concurrency = 6
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Поток данных
3
result = 0
1
f(x)
4
squarer
f(x)
summer
f(x)
1
cuber
2
3
Max concurrency = 5
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Поток данных
3
result = 0
1
f(x)
squarer
f(x)
summer
f(x)
4
1
cuber
2
3
Max concurrency = 6
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Поток данных
3
result = 0
f(x)
summer
squarer
f(x)
1
f(x)
8
4
1
cuber
3
Max concurrency = 4
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Поток данных
result = 0
f(x)
summer
squarer
f(x)
9 1
f(x)
27
8
4
1
cuber
Max concurrency = 2
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Поток данных
result = 5
f(x)
summer
squarer
f(x)
f(x)
9
27
1
8
cuber
Max concurrency = 2
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Поток данных
result = 14
f(x)
squarer
f(x)
summer
f(x)
9
27
cuber
Max concurrency = 1
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Поток данных
result = 50
f(x)
squarer
summer
f(x)
f(x)
cuber
Max concurrency = 1
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Программные модели
Три уровня
•
Обработка событий
•
Передача сообщений
•
«Реактивный»
Intel® TBB Flow Graph
• parallel_for
•
Параллелизм по данным
•
Модель задач, Fork-join
•
•
Параллелизм по данным
вектора, SIMD
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Intel® TBB
Intel® Cilk™ Plus,
OpenMP
Intel® Cilk™ Plus,
OpenMP,
#pragma simd
31
Содержание
Вычислительный граф
Intel® Threading Building Blocks (Intel TBB)
Несколько примеров
Ещё примеры
Анализ производительности
Заключение
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
32
Обработка видео – удаление шума
Видеопоток
с шумом
f()
f(x)
Input
Видеопоток
без шума
(для теста)
f(x)
3210
Denoise
Frame seq
Save in file
Видеопоток
без шума
f()
f(x)
f(x)
Input
Generate noise
Display
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
33
Обработка сетевых пакетов – VoIP телефон
f(x)
Голосовой
трафик
f(x)
Буфер
Воспроизведение
f(x)
f()
Пакет
из сокета
f(x)
f(x)
Сортировка
пакетов
Новый вызов
SIP INVITE
f(x)
f(x)
f(x)
SIP BYE
Сигнальный
трафик
f(x)
Текущий
разговор
f(x)
…
Обновление
состояния
звонка
SIP OK
f(x)
SIP RINGING
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
34
Обработка событий GUI
f()
Event type
f(x)
f(x)
2 finger zoom
Redraw screen
f(x)
f(x)
Swipe to left
Show list of
items
f(x)
f(x)
Swipe to right
Show recent
items
f(x)
f(x)
Tap on top
Show menu
f(x)
…
Open item
…
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
35
Фронт волны
f()
f()
f()
f()
f()
f()
f()
f()
f()
f()
f()
f()
f()
f()
f()
f()
36
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Резюме
Вычислительный граф
• Задействуйте все уровни параллелизма
• Описывайте логику «естественно»
• Библиотека сама занимается потоками
Проверьте ваше приложение ещё раз – может быть
третий уровень стоит добавить?
http://www.threadingbuildingblocks.org
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Разложение Холецкого
(a) Data flow based implementation
(b) Dependence based implementation
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
38
Содержание
Вычислительный граф
Intel® Threading Building Blocks (Intel TBB)
Несколько примеров
Ещё примеры
Анализ производительности
Заключение
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
39
Профилировка производительности
Почему масштабируемость 3.5X на 16 ядрах?
Обычный профилировщик не поможет
TBB scheduler
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Flow Graph Designer
Рисуйте граф и генерируйте С++ код
Анализируйте топологию
и поведение
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
41
Построение графа
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
42
Сохранение графа
GraphML – открытый стандарт
C/C++ код для TBB Flow Graph API
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
43
Отражение топологии
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
44
Отражение топологии
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
45
Содержание
Вычислительный граф
Intel® Threading Building Blocks (Intel TBB)
Несколько примеров
Ещё примеры
Анализ производительности
Заключение
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
46
Резюме
Вычислительный граф
• Задействуйте все уровни параллелизма
• Описывайте логику «естественно»
• Библиотека сама занимается потоками
Проверьте ваше приложение ещё раз – может быть
третий уровень стоит добавить?
http://www.threadingbuildingblocks.org
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Дополнительные ресурсы
The Open-Source Community Site: www.threadingbuildingblocks.org
Flow graph blogs: https://software.intel.com/en-us/tags/17218
FGD product page: http://software.intel.com/en-us/articles/flow-graph-designer
Habrahabr
http://habrahabr.ru/company/intel/blog/228433/
http://habrahabr.ru/company/intel/blog/157735/
9/23/2014
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
48
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
49
Legal Disclaimer & Optimization Notice
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO
ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND
INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR
WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT,
COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software,
operations and functions. Any change to any of those factors may cause the results to vary. You should consult other
information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of
that product when combined with other products.
Copyright © , Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Xeon Phi, Core, VTune, and Cilk are trademarks of
Intel Corporation in the U.S. and other countries.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not
unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other
optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on
microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use
with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel
microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the
specific instruction sets covered by this notice.
Notice revision #20110804
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
50
Backup
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
52
Модель Акторов
Актор (Carl Hewitt 1973)
State
Thread
Methods
Mailbox
*Actors: Karmani and Agha, Open Systems Lab, UIUC
 A model of concurrent computation for developing parallel, distributed and mobile systems
 Extend the advantages of objects to concurrent programming by separating control (where
and when) from the logic of computation
Systems using Actors can expose large amounts of concurrency to system
implementation
Dynamic systems exhibit unstructured parallelism
 Graph can extend itself by adding new nodes and edges at anytime
Actor model can help build robust systems and extract all of the parallelism
present in a system
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
53
source_node
Functional
f()
buffer_node
continue_node
f()
queue_node
function_node
multifunction_node
f(x)
priority_queue_node
f(x)
sequencer_node
Buffering
3210
queueing join
reserving join
tag matching join
split_node
or_node*
Split / Join
broadcast_node
write_once_node
overwrite_node
limiter_node
Other
TBB Flow Graph allows you to build dependency and data flow graphs!
Intel Confidential
Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.