Intel® Threading Building Blocks Вычислительный граф Кирилл Рогожин 1 Содержание Вычислительный граф Intel® Threading Building Blocks (Intel TBB) Несколько примеров Ещё примеры Анализ производительности Заключение Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 2 Закон Мура “The number of transistors on a chip will double approximately every two years.” [Gordon Moore] Moore's Law graph, 1965 9/23/2014 Source: http://en.wikipedia.org/wiki/Moore's_law Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 3 Куда «уходят» все эти транзисторы? Аппаратный параллелизм Параллелизм по ядрам и потокам Параллелизм по данным (SIMD) Параллелизм по инструкциям 9/23/2014 Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 4 Куда «уходят» все эти транзисторы? Аппаратный параллелизм Программные модели Параллелизм по потокам Параллельные циклы, OpenMP, TBB Параллелизм по данным Оптимизации компилятора, «интрисики» и «прагмы» Параллелизм по инструкциям Оптимизации компилятора А достаточно ли этого? 9/23/2014 Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 5 Программные модели Три уровня • Обработка событий • Передача сообщений • «Реактивный» • parallel_for • Параллелизм по данным • Модель задач, Fork-join • • Параллелизм по данным вектора, SIMD Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. ? Intel® TBB Intel® Cilk™ Plus, OpenMP Intel® Cilk™ Plus, OpenMP, #pragma simd 6 Вычислительный граф Последовательная программа x = A(); y = B(x); z = C(x); D(y, z); Параллельные циклы Параллельные графы и циклы Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Вычислительный граф Иерархическая параллельная структура – Задействовать все доступные формы параллелизма Foo() Bar1() 9 Foo() Bar2() 16 Bar1() Bar1() 25 Bar2() Bar2() = 50 единиц Ускорение = 50/30 = 1.67X Bar1() Bar1() Foo() Bar2() Bar2() Ускорение = 50/21 = 2.38X Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 8 Вычислительный граф Похоже на последовательное программирование Нужно определить ветвления и зависимости в алгоритме a b c c d e b g f d a f g e Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 9 Содержание Вычислительный граф Intel® Threading Building Blocks (Intel TBB) Несколько примеров Ещё примеры Анализ производительности Заключение Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 10 Intel® Threading Building Blocks (TBB) Generic Parallel Algorithms Efficient scalable way to exploit the power of multicore without having to start from scratch Task scheduler The engine that empowers parallel algorithms that employs task-stealing to maximize concurrency Miscellaneous Thread-safe timers Threads OS API wrappers Concurrent Containers Concurrent access, and a scalable alternative to containers that are externally locked for threadsafety TBB flow graph Thread Local Storage Scalable implementation of thread-local data that supports infinite number of TLS Synchronization Primitives User-level and OS wrappers for mutual exclusion, ranging from atomic operations to several flavors of mutexes and condition variables Memory Allocation Per-thread scalable memory manager and false-sharing free allocators Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 11 TBB Flow Graph Graph object Вы определяете зависимости между узлами Библиотека находит скрытый параллелизм Graph node TBB организует вычисления Применение Потоки видео, изображений, финансовых данных Edge Реакция на события GUI Определение зависимостей в логике вычислений Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 12 Hello World Создаём узлы и рёбра, запускаем на исполнение и ждём результата tbb::flow::graph g; tbb::flow::continue_node< tbb::flow::continue_msg > h( g, []( const continue_msg & ) { std::cout << “Hello “; } ); tbb::flow::continue_node< tbb::flow::continue_msg > w( g, []( const continue_msg & ) { std::cout << “World\n“; } ); tbb::flow::make_edge( h, w ); h.try_put(continue_msg()); g.wait_for_all(); f() f() h w Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 13 Содержание Вычислительный граф Intel® Threading Building Blocks (Intel TBB) Несколько примеров Ещё примеры Анализ производительности Заключение Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 14 Граф зависимостей A B E C int main() { graph g; broadcast_node<continue_msg> start; continue_node<continue_msg> a( g, body("A") continue_node<continue_msg> b( g, body("B") continue_node<continue_msg> c( g, body("C") continue_node<continue_msg> d( g, body("D") continue_node<continue_msg> e( g, body("E") make_edge( make_edge( make_edge( make_edge( make_edge( make_edge( start, a ); start, b ); a, c ); b, c ); c, d ); a, e ); for (int i = 0; i < 3; ++i ) { start.try_put(continue_msg()); g.wait_for_all(); } return 0; D ); ); ); ); ); struct body { std::string my_name; body( const char *name ) : my_name(name) {} void operator()(continue_msg) const{ printf("%s\n", my_name.c_str()); } }; } Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 15 Поток данных Sum = X2 + X3 f(x) squarer struct square { int operator()(int v) { return v*v; } }; struct cube { int operator()(int v) { return v*v*v; } }; summer f(x) f(x) cuber class sum { int &my_sum; public: sum( int &s ) : my_sum(s) {} int operator()( std::tuple<int, int> v ) { my_sum += std::get<0>(v) + std::get<1>(v); return my_sum; } }; Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Поток данных int main() { int result = 0; graph g; broadcast_node<int> input; function_node<int, int> squarer( g, unlimited, square() ); function_node<int, int> cuber( g, unlimited, cube() ); join_node< tuple<int, int>, queueing > j( g ); function_node<std::tuple<int, int>, int> summer( g, serial, sum(result) ); make_edge( make_edge( make_edge( make_edge( make_edge( input, squarer ); input, cuber ); squarer, std::get<0>( j.inputs() ) ); cuber, std::get<1>( j.inputs() ) ); j, summer ); for (int i = 1; i <= 10; ++i) input.try_put(i); g.wait_for_all(); printf("Final result is %d\n", result); return 0; f(x) squarer f(x) cuber } Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. summer f(x) Поток данных result = 0 1 f(x) squarer summer f(x) f(x) cuber Max concurrency = 1 Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Поток данных result = 0 1 f(x) 2 squarer summer f(x) f(x) cuber 1 Max concurrency = 3 Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Поток данных 2 result = 0 1 f(x) squarer summer f(x) f(x) cuber 1 2 Max concurrency = 5 Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Поток данных 2 result = 0 1 f(x) 3 squarer summer f(x) f(x) cuber 1 2 Max concurrency = 5 Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Поток данных result = 0 1 f(x) 3 squarer 4 summer f(x) f(x) cuber 1 2 Max concurrency = 4 Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Поток данных 3 result = 0 1 f(x) squarer 4 summer f(x) f(x) cuber 1 2 3 Max concurrency = 6 Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Поток данных 3 result = 0 1 f(x) 4 squarer f(x) summer f(x) 1 cuber 2 3 Max concurrency = 5 Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Поток данных 3 result = 0 1 f(x) squarer f(x) summer f(x) 4 1 cuber 2 3 Max concurrency = 6 Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Поток данных 3 result = 0 f(x) summer squarer f(x) 1 f(x) 8 4 1 cuber 3 Max concurrency = 4 Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Поток данных result = 0 f(x) summer squarer f(x) 9 1 f(x) 27 8 4 1 cuber Max concurrency = 2 Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Поток данных result = 5 f(x) summer squarer f(x) f(x) 9 27 1 8 cuber Max concurrency = 2 Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Поток данных result = 14 f(x) squarer f(x) summer f(x) 9 27 cuber Max concurrency = 1 Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Поток данных result = 50 f(x) squarer summer f(x) f(x) cuber Max concurrency = 1 Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Программные модели Три уровня • Обработка событий • Передача сообщений • «Реактивный» Intel® TBB Flow Graph • parallel_for • Параллелизм по данным • Модель задач, Fork-join • • Параллелизм по данным вектора, SIMD Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Intel® TBB Intel® Cilk™ Plus, OpenMP Intel® Cilk™ Plus, OpenMP, #pragma simd 31 Содержание Вычислительный граф Intel® Threading Building Blocks (Intel TBB) Несколько примеров Ещё примеры Анализ производительности Заключение Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 32 Обработка видео – удаление шума Видеопоток с шумом f() f(x) Input Видеопоток без шума (для теста) f(x) 3210 Denoise Frame seq Save in file Видеопоток без шума f() f(x) f(x) Input Generate noise Display Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 33 Обработка сетевых пакетов – VoIP телефон f(x) Голосовой трафик f(x) Буфер Воспроизведение f(x) f() Пакет из сокета f(x) f(x) Сортировка пакетов Новый вызов SIP INVITE f(x) f(x) f(x) SIP BYE Сигнальный трафик f(x) Текущий разговор f(x) … Обновление состояния звонка SIP OK f(x) SIP RINGING Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 34 Обработка событий GUI f() Event type f(x) f(x) 2 finger zoom Redraw screen f(x) f(x) Swipe to left Show list of items f(x) f(x) Swipe to right Show recent items f(x) f(x) Tap on top Show menu f(x) … Open item … Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 35 Фронт волны f() f() f() f() f() f() f() f() f() f() f() f() f() f() f() f() 36 Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Резюме Вычислительный граф • Задействуйте все уровни параллелизма • Описывайте логику «естественно» • Библиотека сама занимается потоками Проверьте ваше приложение ещё раз – может быть третий уровень стоит добавить? http://www.threadingbuildingblocks.org Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Разложение Холецкого (a) Data flow based implementation (b) Dependence based implementation Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 38 Содержание Вычислительный граф Intel® Threading Building Blocks (Intel TBB) Несколько примеров Ещё примеры Анализ производительности Заключение Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 39 Профилировка производительности Почему масштабируемость 3.5X на 16 ядрах? Обычный профилировщик не поможет TBB scheduler Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Flow Graph Designer Рисуйте граф и генерируйте С++ код Анализируйте топологию и поведение Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 41 Построение графа Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 42 Сохранение графа GraphML – открытый стандарт C/C++ код для TBB Flow Graph API Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 43 Отражение топологии Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 44 Отражение топологии Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 45 Содержание Вычислительный граф Intel® Threading Building Blocks (Intel TBB) Несколько примеров Ещё примеры Анализ производительности Заключение Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 46 Резюме Вычислительный граф • Задействуйте все уровни параллелизма • Описывайте логику «естественно» • Библиотека сама занимается потоками Проверьте ваше приложение ещё раз – может быть третий уровень стоит добавить? http://www.threadingbuildingblocks.org Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Дополнительные ресурсы The Open-Source Community Site: www.threadingbuildingblocks.org Flow graph blogs: https://software.intel.com/en-us/tags/17218 FGD product page: http://software.intel.com/en-us/articles/flow-graph-designer Habrahabr http://habrahabr.ru/company/intel/blog/228433/ http://habrahabr.ru/company/intel/blog/157735/ 9/23/2014 Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 48 Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 49 Legal Disclaimer & Optimization Notice INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Copyright © , Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Xeon Phi, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries. Optimization Notice Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 50 Backup Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 52 Модель Акторов Актор (Carl Hewitt 1973) State Thread Methods Mailbox *Actors: Karmani and Agha, Open Systems Lab, UIUC A model of concurrent computation for developing parallel, distributed and mobile systems Extend the advantages of objects to concurrent programming by separating control (where and when) from the logic of computation Systems using Actors can expose large amounts of concurrency to system implementation Dynamic systems exhibit unstructured parallelism Graph can extend itself by adding new nodes and edges at anytime Actor model can help build robust systems and extract all of the parallelism present in a system Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 53 source_node Functional f() buffer_node continue_node f() queue_node function_node multifunction_node f(x) priority_queue_node f(x) sequencer_node Buffering 3210 queueing join reserving join tag matching join split_node or_node* Split / Join broadcast_node write_once_node overwrite_node limiter_node Other TBB Flow Graph allows you to build dependency and data flow graphs! Intel Confidential Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
© Copyright 2022 DropDoc