Manycore and Multicore Workshop (UNC-CH Computer Science)

Invited Speakers

The list of confirmed invited speakers includes:

Jack Dongarra, University of Tennessee: An Overview of High Performance Computing and Challenges for the Future
David B. Kirk, NVIDIA: NVIDIA CUDA Software and GPU Parallel Computing Architecture
Charles E. Leiserson, MIT: Multithreaded Programming in Cilk
John Manferdelli, Microsoft: Supercomputing and Mass Market Desktops
Chuck Moore, AMD: The Role of Accelerated Computing in the Multi-core Era
David Patterson, UC-Berkeley: The Parallel Computing Landscape: A Berkeley View 2.0
Stephen S. Pawlowski, Intel Corporation: Supercomputing for the Masses
Daniel A. Reed, UNC-Chapel Hill: Multicore: Let’s Not Focus on the Present
Vivek Sarkar, Rice University: Portable Parallel Programming for Heterogeneous Multicore Computing

Panel on "Manycore/Multicore's Programmability Gap"

Speaker Abstracts

Jack Dongarra, University of Tennessee (biography)
An Overview of High Performance Computing and Challenges for the Future

Talk Slides (pdf format)

In this talk we examine how high performance computing has changed over the last 10 years and look toward the future in terms of trends. These changes have had and will continue to have a major impact on our software. A new generation of software libraries and algorithms is needed for the effective and reliable use of (wide area) dynamic, distributed and parallel environments. Some of the software and algorithm challenges have already been encountered, such as management of communication and memory hierarchies through a combination of compile-time and run-time techniques, but the increased scale of computation, depth of memory hierarchies, range of latencies, and increased run-time environment variability will make these problems much harder. We will focus on the redesign of software to fit multicore architectures.

David B. Kirk, NVIDIA (biography)
NVIDIA CUDA Software and GPU Parallel Computing Architecture

Talk Slides (pdf format)

In the past, graphics processors were special-purpose, hardwired application accelerators, suitable only for conventional rasterization-style graphics applications. Modern GPUs are now fully programmable, massively parallel floating-point processors. This talk will describe NVIDIA's massively multithreaded computing architecture and CUDA software for GPU computing. The architecture is scalable and highly parallel, and it delivers high throughput for data-intensive processing. Although not truly general-purpose processors, GPUs can now be used for a wide variety of compute-intensive applications beyond graphics.

Charles E. Leiserson, MIT (biography)
Multithreaded Programming in Cilk

Talk Slides (pdf format)

John Manferdelli, Microsoft (biography)
Supercomputing and Mass Market Desktops

Talk Slides (pdf format)

The advent of heterogeneous manycore processors on mass market desktops will propel traditional supercomputing techniques, including parallel programming, onto everyone's desktops. Traditional "supercomputing" workloads (dense and sparse linear algebra, optimization, statistical inference, image and signal analysis, simulation, compression, machine learning, semantic processing) are needed for games, entertainment, analytics and for richer user interfaces on client machines. These capabilities serve scientists, government service providers, business analysts, educators and home users. In contrast with rather more tolerant traditional supercomputing customers, these users have come to expect that their software experiences will improve at the pace of hardware improvement customary in the PC market. Meeting these expectations will require a change in the execution environment, programming tools and management of desktop machines, and much of the requisite technology has been developed in the supercomputing community. New experiences that become possible with 100-fold computing power improvements on the desktop will emerge as the relatively low expectations that commercial programmers have for processing power on the desktop change. I will discuss some of the architectural and programming model changes that are underway to support this change and some applications.

Chuck Moore, AMD (biography)
The Role of Accelerated Computing in the Multi-core Era

Talk Slides (pdf format)

David Patterson, UC-Berkeley (biography)
The Parallel Computing Landscape: A Berkeley View 2.0

Talk Slides (pdf format)

Last December we published a broad survey of the issues for the whole field concerning the multicore/manycore sea change. (See view.eecs.berkeley.edu.) This talk covers the specific research agenda that a group of us at Berkeley are going to follow.

To take a fresh approach to this longstanding problem, our research agenda will be driven by compelling applications developed by domain experts. Historically, efforts to resolve parallel computing's challenges have often been driven "bottom-up" from the hardware, with applications an afterthought. We will focus on exciting new applications that need much more computing horsepower to run well rather than on legacy programs that run well on today's computers. Our applications are in the areas of personal health, image retrieval, speech understanding, music, and browsers.

The development of parallel software is the heart of the research agenda. The task will be divided into two layers: an efficiency layer that aims at low overhead for 10 percent of the best programmers, and a productivity layer for the rest of the programming community that reuses the parallel software developed at the efficiency layer. Key to this approach is a layer of libraries and programming frameworks centered around the 13 computational bottlenecks that we identified in the Berkeley View report. We will also create a Composition and Coordination Language to make it easier to compose these components. Finally, we will rely on autotuning to map the software efficiently to a particular parallel computer. Past attempts have usually relied on a single programming abstraction and language for everyone and on parallelizing compilers.
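As a rough illustration of the autotuning idea mentioned above, the following minimal sketch times a few variants of one routine on the current machine and keeps the fastest. It is not taken from the talk or from the Berkeley View report: the `blocked_sum` kernel, the `autotune` helper, and the candidate block sizes are all hypothetical names and values chosen only for illustration.

```python
import time


def blocked_sum(data, block_size):
    """Hypothetical kernel: sum `data` in chunks of `block_size`.

    Every block size yields the same result; only the running time differs.
    """
    total = 0
    for start in range(0, len(data), block_size):
        total += sum(data[start:start + block_size])
    return total


def autotune(kernel, data, candidates, repeats=3):
    """Time `kernel` for each candidate parameter on this machine.

    Returns the fastest parameter and the best time measured for each one.
    """
    timings = {}
    for param in candidates:
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            kernel(data, param)
            best = min(best, time.perf_counter() - t0)
        timings[param] = best
    best_param = min(timings, key=timings.get)
    return best_param, timings


# Illustrative input and search space (assumed values, not from the source).
data = list(range(100_000))
candidates = [64, 1024, 16384]
best_block, timings = autotune(blocked_sum, data, candidates)
print("selected block size:", best_block)
```

The selected block size depends on the machine the search runs on, which is the point: the tuning is redone per target rather than fixed by the programmer.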

The role of the operating systems and the architecture is to support software and applications in achieving these goals, rather than the conventional approach of fixing the environment to which parallel software must comply. Examples include thin hypervisors and libraries for the operating system and hardware support for partitioning and fast barrier synchronization.

Stephen S. Pawlowski, Intel Corporation (biography)
Supercomputing for the Masses

Talk Slides (pdf format)

Daniel A. Reed, UNC-Chapel Hill (biography)
Multicore: Let’s Not Focus on the Present

Talk Slides (pdf format)

Let’s step back from our current analysis of GPUs and multicore processors and their deployment in high-performance computing systems and large data centers and think about the longer term future. Where is the technology going and what are the research implications? What architectures are appropriate for 100-way or 1000-way multicore designs? How do we develop and support software? What is the ecosystem of components in which they will operate? How do we optimize performance, power and reliability?

Vivek Sarkar, Rice University (biography)
Portable Parallel Programming for Heterogeneous Multicore Computing

Talk Slides (pdf format)

The computer industry is at a major inflection point in its hardware roadmap due to the end of a decades-long trend of exponentially increasing clock frequencies. It is widely agreed that spatial parallelism in the form of multiple power-efficient cores must be exploited to compensate for this lack of frequency scaling. Unlike previous generations of hardware evolution, this shift towards multicore and manycore computing will have a profound impact on software. In this talk, we will focus on the programming problem for tightly coupled heterogeneous processors, which have the potential to deliver order-of-magnitude improvements in performance and power efficiency relative to homogeneous multicore processors but also pose the greatest challenges for software enablement. We summarize the state of the art in programming models for current heterogeneous multicore processors such as Cell, GPGPUs, ClearSpeed, stream processors, and FPGAs, and then discuss a potential approach for developing a productive and portable programming model and runtime for future generations of multicore heterogeneous processors. This approach is being pursued in the new Habanero research project at Rice University, and builds on the X10 project, which was developed as part of the IBM PERCS project in the DARPA High Productivity Computing Systems (HPCS) program.

Panel on "Manycore/Multicore's Programmability Gap"

Today's languages are very inefficient for programming hybrid multicore systems: a partitioned software development cycle is required to program each of the system's component processors. For example, programming GPU-CPU systems requires both C/C++ and graphics shader language expertise -- two different programming styles and idioms to accomplish a distributed programming task. The Cell Broadband Engine is another example, where separate programs are written for the Cell's primary and synergistic processor elements.

Progress is being made to make multicore systems "programmable by mere mortals". SDKs, language extensions, new programming paradigms and new programming languages are all avenues to solutions that are more efficient. NVIDIA's CUDA SDK removes the need for graphics shader language expertise. Gedae's multicore language and compiler system is a good example of an entirely new language-based approach to multicore software development. Even hybrid language solutions, such as RapidMind's software platform, introduce more efficient concept-to-implementation cycle times.

Despite all of this interim progress, the purpose of this panel is to answer the following outstanding questions:

(a) How are today's multicore programming solutions evolving?

(b) What language extensions are needed to bridge the gap between today's unicore, unithread languages and tomorrow's multicore, multithread needs? Is creeping incrementalism really an evolutionary path forward?

(c) Are we approaching a programming language evolutionary inflection point -- we've been on the flat part of the "S" curve for a while, so are we approaching the sharp upward bend?