Virtualized Software Development and the Multicore Revolution
Full system simulation, or virtualization, provides a binary-compatible
instance of the target hardware, but one which operates completely
within a virtualized environment running on standard laptop or desktop
PCs. Virtual platforms break a project's dependence on physical
hardware for effective software design, implementation and test,
fostering truly concurrent hardware and software development activities
in ways that change the nature of the overall product development
process. With the multicore revolution now under way, virtual platforms
are expected to become increasingly important for software developers,
because the concurrency and timing issues of multicore systems make the
software more complex.
Since the early 1980s, embedded software developers have been taking
advantage of the regular increases in processor performance. Every year
or two, new processors shipped that increased the performance of the
system without disturbing the software structure. A design using a
500 MHz processor one year could rely on the appearance of a 1 GHz
processor the next year. This performance increase was often factored
into application and system design: the software designers planned and
implemented application sets that were too heavy for the currently
available processors, counting on a new faster processor to come out on
the market and solve their performance problems before the product got
to market. Such a strategy was and is necessary in order to stay
competitive. This comfortable state was driven by the steady progress of
the semiconductor industry, which kept packing more transistors into
smaller packages at higher clock frequencies. Processor designers used
this continuous increase of resources to create processors that ran
single programs faster and faster. This increase in speed was achieved
by increasing the clock frequency, along with architectural innovations
like more efficient processing pipelines and improved branch prediction,
and by adding resources like on-chip caches and memory controllers.
Overall, the net result was a steady increase in performance for
single-threaded programs, achieved by ever more complex processor designs.
However, in 2004 it became clear that progress in single-processor
performance was starting to slow considerably. Partly because of
transistor leakage, supply voltage can no longer decrease enough from
one process generation to the next to allow the frequency to be
increased without raising power to levels that cannot be dissipated.
Clock frequencies will thus only increase slowly
in the future. Instead, the semiconductor industry is turning to
parallelism to increase performance.
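
The link between voltage, frequency and power can be illustrated with
the standard first-order approximation that dynamic power is
proportional to switched capacitance times voltage squared times
frequency. The sketch below is only an illustration under that
assumption; the function name power_ratio and the 0.7 voltage scale
factor are hypothetical and do not come from any vendor data.

```python
def power_ratio(freq_scale: float, volt_scale: float) -> float:
    """Relative dynamic power after scaling frequency and supply voltage.

    Assumes the first-order model P ~ C * V^2 * f with the switched
    capacitance C held constant, so only the two scale factors matter.
    """
    if freq_scale <= 0 or volt_scale <= 0:
        raise ValueError("scale factors must be positive")
    return (volt_scale ** 2) * freq_scale


if __name__ == "__main__":
    # Hypothetical scenario 1: voltage still drops by 30% per generation,
    # so doubling the clock costs almost no extra power.
    print("2x clock, 0.7x voltage:", power_ratio(2.0, 0.7))

    # Hypothetical scenario 2: voltage can no longer be lowered,
    # so doubling the clock doubles the dynamic power.
    print("2x clock, 1.0x voltage:", power_ratio(2.0, 1.0))
```

Under this simplified model, once the voltage term stops shrinking,
every increase in clock frequency translates directly into more power
to dissipate.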
With multiple processor cores on the same chip, performance per chip
can increase dramatically even if performance per core improves only
slowly. However, there is a catch: the increase in "performance" is
now measured by aggregate throughput rather than by
processing speed on a single thread. The peak performance numbers assume
that an application can keep all the processor cores busy. This is
straightforward for large server-side applications such as web servers
and databases. These handle hundreds of concurrent requests and are
naturally parallel, easily taking advantage of multiple processors. The
same is not true for desktop applications or for most embedded
applications. In practice, applications that cannot take advantage of
the parallel machine often run slower on it than on a single-core chip.
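
The gap between peak throughput and what a single application sees can
be sketched with Amdahl's law, which bounds the speedup of a program by
the fraction of its work that can run in parallel. The example below is
a minimal sketch, not a performance model of any real chip: the function
names, the four-core configuration and the assumption that each core
runs at 0.7 times the speed of a fast single-core part are all
hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores when only `parallel_fraction`
    of the work can be spread across them (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_performance(parallel_fraction: float, cores: int,
                         per_core_speed: float) -> float:
    """Performance relative to a single-core chip of speed 1.0, for a
    multicore chip whose cores each run at `per_core_speed`."""
    return per_core_speed * amdahl_speedup(parallel_fraction, cores)


if __name__ == "__main__":
    # Hypothetical chip: four cores, each 0.7x as fast as the single core.
    for fraction in (0.0, 0.5, 0.95, 1.0):
        perf = relative_performance(fraction, cores=4, per_core_speed=0.7)
        print(f"parallel fraction {fraction:.2f}: {perf:.2f}x")
```

Under these assumptions, a purely serial application gets only the
speed of one of the slower cores, which is the case described above,
while a fully parallel workload approaches the chip's peak throughput.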
The reliance on parallelism in the hardware to improve overall
performance creates a problem for software developers. Applications that
have traditionally used single processors will now have to be
parallelized over multiple processors in order to take advantage of the
multiple cores and so continue to increase their performance. This is a
huge change in the embedded software landscape, and one that will put
great pressure on software designers to wring good performance out of
the new architectures.
Every high-performance processor family and instruction set is moving to
multicore designs. Freescale has announced the PowerPC 8641D, with two
G4-class cores on one chip. IBM has the PowerPC 970MP, with two G5
cores. Intel is rolling out multicore chips across its product line, and
AMD has dual-core server and desktop processors. PMC-Sierra is selling
dual-core RM9200 MIPS64 processors. Cavium has announced up to 16-way
parallel MIPS64-based processors. ARM has begun selling its ARM11
MPCore, with four cores sold as a single package. The multicore
revolution is here!