Streaming Task Parallelism
Albert Cohen, Senior Research Scientist, INRIA
Stream computing is often associated with regular, data-intensive applications, and more specifically with the family of cyclo-static data-flow models. The term also refers to bulk-synchronous data parallelism on SIMD architectures. Both interpretations are valid but incomplete: streams underlie the formal definition of Kahn process networks, a foundation for deterministic concurrent languages and systems with a solid heritage. Streaming task parallelism is a semantic framework for parallel languages and a model for task-parallel execution with first-class dependences. Parallel languages with dynamic, nested task creation and first-class streams expose more parallelism and enable application-specific throttle control. These expressiveness and resource-management capabilities address a key limitation of previous data-flow programming models. To support this class of streaming task-parallel languages, we propose a new lock-free algorithm for stalling and waking up tasks in a shared-memory, user-space scheduler, driven by changes in the state of streaming queues. The algorithm generalizes work-stealing with concurrent ring buffers, and has been proven correct against the C11 memory model. We show through experiments that it can serve as a common foundation for efficient parallel runtime systems. We also report on scalability-oriented extensions of a streaming task-parallel runtime, with multiple optimizations leveraging the explicit data flow conveyed by the programming model. Finally, we will report on recent experiments on large-scale NUMA systems with up to 24 nodes.
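To give a flavor of the building blocks the abstract refers to, the following is a minimal, illustrative sketch (not the talk's actual algorithm) of a single-producer, single-consumer lock-free ring buffer using C11 atomics — the kind of concurrent streaming queue that such a scheduler would stall and wake tasks on. All names and the capacity are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAPACITY 256  /* hypothetical; must be a power of two */

typedef struct {
    int buf[RING_CAPACITY];
    _Atomic size_t head;  /* next slot the consumer reads  */
    _Atomic size_t tail;  /* next slot the producer writes */
} ring_t;

/* Producer side: returns false when the queue is full, at which point a
 * scheduler could stall the producing task until space is available. */
static bool ring_push(ring_t *r, int v) {
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_CAPACITY)
        return false;  /* full */
    r->buf[tail % RING_CAPACITY] = v;
    /* release ordering publishes the element before the new tail index */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false when the queue is empty, at which point a
 * scheduler could stall the consuming task until data arrives. */
static bool ring_pop(ring_t *r, int *out) {
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return false;  /* empty */
    *out = r->buf[head % RING_CAPACITY];
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

The acquire/release pairing on `head` and `tail` is what a proof against the C11 memory model would reason about: the release store on `tail` synchronizes with the acquire load in `ring_pop`, so the consumer never observes an index ahead of the data it guards.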
Albert Cohen is a senior research scientist at INRIA and a part-time associate professor at École Polytechnique. He graduated from École Normale Supérieure de Lyon, and received his PhD from the University of Versailles in 1999 (awarded two national prizes). He has been a visiting scholar at the University of Illinois and an invited professor at Philips Research, both for six months. Albert Cohen works on parallelizing and optimizing compilers, parallel programming, and synchronous programming for embedded systems. He has served, or will serve, as general or program chair of major conferences, including PLDI, PPoPP, HiPEAC, CC, and the DAC embedded systems track. He has coauthored more than 130 peer-reviewed papers and has been the advisor for 22 PhD theses. Several research projects initiated by Albert Cohen resulted in effective transfer to production compilers. In particular, Albert Cohen pioneered the transfer of polyhedral compilation technology into industrial products, including GCC and later LLVM.
Datacenter efficiency -- What's next?
Ricardo Bianchini, Professor, Rutgers University, and Chief Efficiency Strategist, Microsoft Research
Over the last 10+ years, large datacenters have benefited from computer technology and physical infrastructure advances that substantially improved their efficiency. However, there is still much room for improvement, as the datacenters' computational resources are often poorly utilized and technology advances are starting to falter. In this talk, I will discuss some interesting avenues for continued efficiency improvement driven by smarter software, including greater use of parallelism and predictive workload scheduling.
Dr. Ricardo Bianchini received his PhD degree in Computer Science from the University of Rochester. He is a Professor of Computer Science at Rutgers University, but is currently on leave working as Microsoft's Chief Efficiency Strategist. His main interests include cloud computing and power/energy/thermal management of datacenters. In fact, Dr. Bianchini is a pioneer in datacenter energy management, energy-aware storage systems, energy-aware load distribution across datacenters, and leveraging renewable energy in datacenters. He has published eight award-winning papers and has received the CAREER award from the National Science Foundation. He is currently an ACM Distinguished Scientist and an IEEE Fellow.
Automatically Scalable Computation
Margo Seltzer, Herchel Smith Professor of Computer Science and Harvard College Professor
As our computational infrastructure races gracefully forward into increasingly parallel multi-core and clustered systems, our ability to easily produce software that can successfully exploit such systems continues to stumble. For years, we've fantasized about the world in which we'd write simple, sequential programs, add magic sauce, and suddenly have scalable, parallel executions. We're not there. We're not even close. I'll present a radical, potentially crazy approach to automatic scalability, combining learning, prediction, and speculation. To date, we've achieved surprisingly good speedup in limited domains, but the potential is tantalizingly enormous.
Margo Seltzer is a Herchel Smith Professor of Computer Science in the Harvard School of Engineering and Applied Sciences. Her research interests include architecture, provenance, file systems, databases, transaction processing systems, and applying technology to problems in healthcare. She is the author of several widely-used software packages including database and transaction libraries and the 4.4BSD log-structured file system. Dr. Seltzer was a founder and CTO of Sleepycat Software, the makers of Berkeley DB, and is now an Architect at Oracle Corporation. She is currently the President of the USENIX Association and a member of the Computing Research Association Board. She is a Sloan Foundation Fellow in Computer Science, an ACM Fellow, a Bunting Fellow, and was the recipient of the 1996 Radcliffe Junior Faculty Fellowship.
She is recognized as an outstanding teacher and mentor, having received the Phi Beta Kappa teaching award in 1996, the Abramson Teaching Award in 1999, and the Capers and Marion McDonald Award for Excellence in Mentoring and Advising in 2010.
Dr. Seltzer received an A.B. degree in Applied Mathematics from Harvard/Radcliffe College in 1983 and a Ph.D. in Computer Science from the University of California, Berkeley, in 1992.