Feature Story
from Superconductors and Cryoelectronics:
The most recent article on
this topic, "NSA
Proposes $400 Million Superconducting Computer Project,"
is available in the October 2, 2006 issue of Superconductor Week.
Superconductors & Cryoelectronics in the Petaflops-Scale Computer Project
The Hybrid Technology
Multi-Threaded Architecture project led by the Jet Propulsion Laboratory has set
its sights on a supercomputer that would be two hundred and fifty times faster
than today's fastest supercomputer. RSFQ superconducting circuits and other
cryoelectronic components form the core of the computer's design.
Of all the existing and
proposed applications for superconductors our industry has considered
through the decades, none seems more unlikely, because of its perceived
complexity and expense, than a superconducting (SC) computer. SC digital
technology intended for computers has been successfully designed and
demonstrated, starting in the late 1960s at IBM (1969-1983), and later
at MITI (1981-1990), but in each case the 1GHz clock frequencies
achieved did not justify further development. However, a new SC logic
family and the US government's call for a petaflops-scale (petaflops =
10^15 floating point operations per second) supercomputer have created a
new superconducting program which may drive further commercialization of
LTS electronics. Since program managers have recently suggested that a
petaflops system is possible as soon as the year 2004, the success of
the program may now depend more on funding than it does on additional
technological advances.

Computer-rendered image of petaflops
computing facility proposed by the HTMT program.
THE HTMT PROGRAM
The vehicle for US development of petaflops computing is the Hybrid Technology
Multi-threaded (HTMT) program, sponsored by the Defense Advanced Research
Projects Agency (DARPA), NASA, and the National Security Agency (NSA).
Participants include Argonne National Lab, California Institute of Technology,
Jet Propulsion Laboratory, Notre Dame University, Princeton University,
University of Delaware, University of Rochester, Bookham Technology, HYPRES,
Tera Computer, and TRW. The $7 million two-year HTMT study has reached its end,
completing phase 2 of the overall HTMT program, and has resulted in a
preliminary study and block level design of a petaflops computing system.
The program seeks to combine innovative
technologies, including superconductor logic, an optical switching network,
holographic optical storage, and processor in memory (PIM) semiconductor
architecture, in an advanced multi-threaded model that will make optimal use of
the new devices. All of these technologies provide improvements in speed, power
consumption, and parts count. The proposed system features multiple levels of distributed
memory, including holographic, semiconductor SRAM based on PIM, DRAM, and
cryogenic memory (CRAM). The design also calls for core RSFQ (rapid single flux
quantum) superconducting processors.
SUPERCOMPUTING
The government's motivation for petaflops computing, as opposed to the
teraflops-scale systems now available based on semiconductor technology, is a
wide range of scientific, engineering, and national and international
security-related applications that are believed to require, or greatly benefit
from, this level of speed. Some of these are nuclear stockpile stewardship
(explosion simulation), fluid dynamics modeling for aerospace system
development, chemical reaction simulation for new drug design, climate modeling
for longer-term weather forecasts, and global economic modeling. The President's
Information Technology Advisory Committee (PITAC) named high speed computing as
one of four key information technologies, and specifically recommended
initiatives that would enable petaflops/petaops sustained performance on real
applications by 2010.
Two approaches to achieving petaflops in PITAC's
recommended time frame have been proposed:
- The semiconducting approach would employ massive parallelism, with
50,000 to 100,000 processors clocked at 2 to 3.5GHz. This proposal assumes
semiconductor processors will be able to reach these frequencies in 6-7
years as predicted by industry roadmaps, although such improvements are by
no means guaranteed.
- The superconductive approach takes advantage of higher speed processors,
with 4,096 processors clocked at 50-100GHz. Clock speeds in this
range could be reached with superconducting processors as early as 2004,
assuming RSFQ circuit fabrication technology can be developed in a
relatively straightforward manner.
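As a rough check on the two proposals, the throughput each one implies per processor can be worked out from the figures quoted above. The sketch below is illustrative only: the helper name and the choice of the upper end of each quoted range are assumptions, not HTMT design data.

```python
# Back-of-envelope comparison of the two petaflops proposals.
# Inputs are the round figures quoted in the article; the helper name
# and the choice of range endpoints are illustrative assumptions.

PETAFLOPS = 10**15  # floating point operations per second


def flops_per_cycle(n_processors: int, clock_hz: float) -> float:
    """Operations each processor must complete per clock cycle."""
    per_processor = PETAFLOPS / n_processors
    return per_processor / clock_hz


# Semiconductor approach: upper end of both quoted ranges.
semi = flops_per_cycle(100_000, 3.5e9)
# Superconductor approach: 4,096 processors at the top quoted clock.
sc = flops_per_cycle(4_096, 100e9)

print(f"semiconductor:  {semi:.2f} flops/cycle/processor")
print(f"superconductor: {sc:.2f} flops/cycle/processor")
```

Under these assumptions both designs need each processor to sustain between two and three floating point operations per cycle. The superconducting design reaches that point with roughly 24 times fewer processors.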
The superconductive approach has advantages in
power (2.6 MW compared to 10-15MW) and size (20x30x30 feet compared to an entire
building). The biggest problem associated with building petaflops with
semiconductor-based parallelism is not size or power consumption, but latency
resulting from increasing interconnect distance between processors as the
systems get larger and larger. Because of their low power consumption and the
ability to use fewer circuits, RSFQ ICs can be packaged in a much denser
configuration.
COOL-0 HTMT DESIGN
As part of the HTMT design phase, Professor Konstantin Likharev's group at the
State University of New York (SUNY) at Stony Brook completed the initial design
(called COOL-0) of the RSFQ-based processing elements and processing systems.
The design assumes 0.8µm line width Nb-trilayer fabrication technology would be
available to provide a clock rate somewhere between 60 and 120 GHz. Each
processing module, according to the COOL-0 design, would contain seven 2x2
centimeter chips:
- two double-cluster multi-stream unit (MSU) chips, each with 2.8 million
Josephson junctions and 24mW of power dissipation,
- one chip housing 6 floating point functional units, processor-memory
interface, and an intraprocessor network (PNET), for a total of 1.6 million
junctions and 16mW of power dissipation, and
- four cryogenic random access memory (CRAM) chips, each with 10 million
junctions and 4mW of power dissipation.
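The chip list above can be totaled to give the junction count and dissipation of one processing module. This is a minimal tally of the quoted figures; the short chip labels are shorthand introduced here, not names from the COOL-0 design.

```python
# Per-module totals for the COOL-0 design, from the chip list above.
# Each tuple is (chip count, junctions per chip, power per chip in mW).
# The dictionary keys are shorthand labels, not official chip names.
chips = {
    "MSU": (2, 2_800_000, 24),
    "FPU/PNET": (1, 1_600_000, 16),
    "CRAM": (4, 10_000_000, 4),
}

n_chips = sum(count for count, _, _ in chips.values())
junctions = sum(count * jj for count, jj, _ in chips.values())
power_mw = sum(count * mw for count, _, mw in chips.values())

print(n_chips)    # chips per processing module
print(junctions)  # Josephson junctions per processing module
print(power_mw)   # dissipation per processing module, in mW
```

With these figures a module comes to seven chips, about 47 million junctions, and 80mW of dissipation.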
These processing modules
would all be flip-chip mounted on a 20x20cm cryogenic multichip module
(CMCM). Each multichip module would house 8 of these processing modules.
Five hundred and twelve CMCMs (for a total of 4096 processors) would be
vertically mounted on an octagonal cylinder. Each CMCM would dissipate
around 1mW per cm^2, which is easily removed with a 1mm/s flow of helium
through 1.3cm gaps between the CMCMs. On the other hand, the 16 thousand
copper wires leading from each CMCM (for a total of almost 8.2 million)
to the room temperature interface generate three times this heat load.
The total power load for the processing portion of the petaflops design
is about 1kW at 4K (250W for the processor, and approximately 750W for
the interconnects leading to room temperature). This will require wall
power of about 0.3MW using existing helium recondensers with 300W/W
efficiency (20% Carnot). This wall power is on a par with a present-day
sub-teraflops supercomputer. The overall size of the RSFQ octagonal
module will be about 0.5m^3, which is a small fraction of the processing
section of present day supercomputers.
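The system-level totals in this paragraph follow from a few multiplications. The sketch below simply reproduces that arithmetic from the quoted figures; the constant names are introduced here for illustration.

```python
# Cryogenic power budget for the RSFQ core, using the article's figures.
# Constant names are illustrative; the values are those quoted above.
PROCESSOR_LOAD_W = 250      # dissipated by the RSFQ chips at 4 K
INTERCONNECT_LOAD_W = 750   # heat load from copper wires to room temperature
WALL_W_PER_COLD_W = 300     # quoted recondenser efficiency, W/W

N_CMCM = 512
MODULES_PER_CMCM = 8
WIRES_PER_CMCM = 16_000

processors = N_CMCM * MODULES_PER_CMCM
wires = N_CMCM * WIRES_PER_CMCM
cold_load_w = PROCESSOR_LOAD_W + INTERCONNECT_LOAD_W
wall_power_mw = cold_load_w * WALL_W_PER_COLD_W / 1e6

print(processors, wires, cold_load_w, wall_power_mw)
```

The results match the text: 4,096 processors, 8,192,000 wires, a 1kW load at 4K, and 0.3MW at the wall.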

Compact packaging concept for HTMT
superconductor processors achieves low processor-to-processor latency.
Each multi-chip module (MCM) carries about 50 chips. The 512 MCMs are
interconnected by 160 octagonal cryogenic printed circuit boards (PCBs).
The I/O cables connect to room temperature electronics. The total power
dissipation at 4 kelvin is 1 kW.
The lion's share of the overall power
consumption, as small as it is, is generated by copper interconnects leading
from the cryostat to room temperature. These may be greatly reduced by using HTS
interconnects between 4K and 77K stages. Researchers have conducted a
preliminary analysis of the potential reduction in power an HTS-based interface
would provide. They found that the reduction would be at least an order of
magnitude, bringing the 4-77K interface losses down to below 100W. Other
possibilities for reducing losses in this area include the use of an optical
interface, which, given current technology, would be feasible for input, but not
for output of data.
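To see what an HTS interface could mean at the wall, the quoted loads can be rerun with a smaller interconnect term. This is only a sketch under stated assumptions: the 750W copper-lead load is taken to fall by exactly a factor of ten, and the 300W/W refrigeration figure quoted for the 4K stage is assumed to apply unchanged. Neither assumption comes from the HTMT analysis itself.

```python
# Effect of replacing copper leads with HTS leads, as a rough sketch.
# Assumptions (not from the HTMT study): the 750 W copper-lead load falls
# by exactly a factor of ten, and the 300 W/W refrigeration figure still
# applies to the whole remaining load at 4 K.
PROCESSOR_LOAD_W = 250
COPPER_LEAD_LOAD_W = 750
WALL_W_PER_COLD_W = 300


def wall_power_kw(lead_load_w: float) -> float:
    """Wall power in kW for a given lead heat load at 4 K."""
    return (PROCESSOR_LOAD_W + lead_load_w) * WALL_W_PER_COLD_W / 1000


copper = wall_power_kw(COPPER_LEAD_LOAD_W)
hts = wall_power_kw(COPPER_LEAD_LOAD_W / 10)
print(f"copper leads: {copper:.1f} kW, HTS leads: {hts:.1f} kW")
```

Under these assumptions wall power drops from about 300kW to just under 100kW, at which point the processors rather than the leads dominate the load.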
RSFQ
Despite the complexity and revolutionary nature of many of the
technologies intended for implementation in HTMT, many of which may be
far harder to develop than SC RSFQ in the long run, the key enabling
development for a future petaflops computer is RSFQ logic. The
limitations of superconducting projects in the past were due to the use
of "latching" circuitry based on unshunted Josephson tunnel junctions.
The conceptual development of RSFQ using shunted junctions in 1985-1986
led to experimental demonstrations beginning in 1986. Since then
many industrial, government, and academic groups worldwide have adopted
the approach, which has led to its rapid development. US groups include
SUNY Stony Brook, University of Rochester, UC Berkeley, HYPRES, Inc.,
Westinghouse (division sold to Northrop Grumman), Conductus, and TRW. A
Japanese team, including NEC, Electrotechnical Laboratory (ETL,
Tsukuba), and some Japanese universities recently initiated a 5-year
program for LTS RSFQ. Some groups in Europe have also supported programs
in HTS RSFQ.

This 1cm x 1cm chip, demonstrated to
digitize 20 GHz signals, is representative of the current
state-of-the-art. Designed, fabricated, and demonstrated by HYPRES, it
contains two A/D circuits with on-chip memory. Each circuit has about
3,500 junctions for a chip total of about 7,000 junctions.
The work of Konstantin K. Likharev and colleagues
at SUNY Stony Brook recently included the demonstration of several circuits with
1-2 thousand Josephson junctions each, made at HYPRES using 3.0µm fabrication
technology. Stony Brook completed the first laboratory demonstration of RSFQ
with a circuit made at HYPRES, an oversampling flux-counting A/D converter.
HYPRES later delivered the first system prototype, a digital RF memory
system, to an Air Force laboratory. A number of other similar LTS RSFQ circuits and
systems have been demonstrated worldwide, as well as a few simpler HTS RSFQ
circuits. In both LTS and HTS, circuit complexity is now limited by fabrication
technology rather than design.
Complex RSFQ circuits made with HYPRES' 3.0µm
foundry have demonstrated clock speeds of 20-40GHz. The company is in the
process of upgrading to 1.5µm technology. Likharev estimates that 0.8µm
technology would allow circuit speeds of 100GHz in complex circuits, while
moving to deep sub-micron fabrication would potentially raise the bar as high as
200GHz. A demonstration at Stony Brook confirmed these scaling predictions via a
digital frequency divider made with a deep sub-micron fabrication system
developed at the university. The device operated up to 770GHz.
Current technology used for RSFQ is niobium trilayer (Nb/Al/AlOx/Nb) which must
operate in the 4-5 kelvin range. However, RSFQ has also been developed for NbN
junctions at TRW, raising the operating temperature to 10K. Likharev believes
that, while HTS RSFQ would require at least 10 years of development even with
very generous financing, it could theoretically offer even faster circuit speeds
than LTS at far higher operating temperatures.
FABRICATION ISSUES
Existing fab lines are located at HYPRES, Northrop Grumman, and TRW in the U.S.,
NEC in Japan, and PTB in Germany. Likharev and other experts have estimated that
it would require an investment on the order of $30 to $50 million to establish a
pilot line providing RSFQ technology with 0.8 micron Josephson junctions and
high integration scale (say 1 million junctions per cm^2 for logic chips and 3
million junctions per cm^2 for memory). A pilot line on this scale would be
required in phase 4 of the HTMT project. A pilot line for building a petaflops-scale
RSFQ system could cost as much as $100 million.
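The integration targets quoted for the pilot line can be compared against the COOL-0 chips described earlier, each of which measures 2x2 centimeters. The sketch below is a simple consistency check; the chip labels and the logic/memory classification are shorthand introduced here.

```python
# Junction density implied by the COOL-0 chips versus the pilot-line targets.
# Chip labels and the logic/memory split are illustrative shorthand.
CHIP_AREA_CM2 = 2 * 2  # each COOL-0 chip is 2 cm x 2 cm

chip_junctions = {
    "MSU": ("logic", 2_800_000),
    "FPU/PNET": ("logic", 1_600_000),
    "CRAM": ("memory", 10_000_000),
}
targets_per_cm2 = {"logic": 1_000_000, "memory": 3_000_000}

densities = {}
for name, (kind, junctions) in chip_junctions.items():
    densities[name] = junctions / CHIP_AREA_CM2
    within = densities[name] <= targets_per_cm2[kind]
    print(f"{name}: {densities[name]:,.0f} junctions/cm^2, within target: {within}")
```

On these figures the logic chips need 0.4 to 0.7 million junctions per cm^2 and the memory chips 2.5 million, so each falls within the corresponding pilot-line target.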
The technical outlook for building such a pilot line is very good. According to
Likharev, LTS integrated circuit manufacture is effectively a subset of
mainstream semiconductor technology, only it is simpler, involves fewer steps,
and avoids certain expensive procedures such as ion implantation. Lynn Abelson,
Manager of the superconductor IC foundry at TRW, says that moving from the company's
current 1.25 micron manufacturing technology to 0.8 micron and high integration
scale "is a natural evolution of the technology we have today. The fact that,
at 0.8 microns, we'll be several generations behind semiconductor fabrication
technology gives us unique advantages. Much of the equipment needed to put the
technology in place is already available, even on the used market." TRW's
roadmap for decreasing junction size is summarized elsewhere in this issue.
Elie Track, President of HYPRES, Inc., agrees:
"Absolutely. A 0.8 micron foundry can be built. The issues are strictly
engineering and can be solved with the right people and funding. The money
required to build an RSFQ foundry is reasonable in comparison to silicon
fabrication facilities, possibly an order of magnitude less for a foundry
providing equivalent feature sizes."
As discussed earlier, the largest scale RSFQ
chips built so far only contain a few thousand junctions, compared to the
millions required for LTS computer applications. There is still some uncertainty
as to what degree LTS engineers will be able to draw upon processes from
semiconductor fabrication to achieve the required integration scale for an
affordable sum. Of course this "affordable sum" is minuscule compared to the
billions spent on new semiconductor fabrication facilities. The situation is
probably comparable to the early days of semiconductor manufacture: the only way
to find out is to try, and the only way to try is to spend the money. Many of
those involved in HTMT say there is no reason to believe anything but money
stands in the way of fully functional highly integrated RSFQ circuits.
SCALABILITY
The targeted 0.8 micron junction size of HTMT is very conservative compared
to semiconductor feature sizes and, therefore, LTS promises additional
scalability into the deep sub-micron region. For this reason, Marc Feldman,
Professor at the University of Rochester, New York, has been charged with making
a preliminary assessment of the functionality of integrated circuits using 0.25
micron junctions. This includes outlining differences in the physics of 0.8
micron junctions versus 0.25 micron junctions and determining what research
ought to be done before scaling to smaller sizes. "We're conducting a literature
search and using computer modeling to try to understand what the available
theories predict for 0.25 micron devices, and how comfortable we are that those
theories are the right ones. People have made flip-flops with 0.25µm junctions,
but we need to know whether they're feasible in large integrated circuits,"
Feldman said.
"In my opinion, the real power of semiconductor
integrated circuitry is its scalability," Feldman added. "Twenty years ago
people had 2 micron feature size, whereas now they're in the deep sub-micron
region. If superconductor junctions had to stop at 0.8, it probably wouldn't be
worth doing. We want to know if we have a good shot at scaling to 0.25." One
promising aspect of scaling Josephson junctions to 0.25 microns is that
external shunts are no longer necessary at smaller sizes, since the internal
mechanisms of the device then provide the shunting. This will offer an additional
boost in integration scale if manufacturing gets to this level.
WHAT'S NEXT?
Phase 3 of HTMT has been given the go-ahead and will run until October 2000.
This phase will continue with conceptual design, modeling, and hardware
demonstrations as in phase 2, but on a deeper level. Whereas phase 2 modeled
physical parameters and addressed the larger architecture and functionality of
the petaflops machine, phase 3 will consist of thorough quantitative modeling of
all computer subsystems. For the RSFQ component, the Stony Brook team will model
everything from the fluxes and currents in the millions of Josephson junctions
in a processor to the processor itself, and on up to an entire CMCM. "We're
going to have very tightly interacting processors, so we need to model the CMCM
very carefully. The network that connects all the processors has already been
modeled quite accurately; we're pretty sure that part of the system is ok,"
Likharev said. Phase 3 will also include an "isomorphic simulator" to test each
facet of the HTMT system and to confirm task scheduling and data migration
throughout.
Phase 4 would consist of building a prototype to
confirm the operation of all the new advanced technologies implemented in the
system. This would allow researchers to confirm the design under real workloads
and get an idea of the real sustained performance possible, before committing to
the final architecture. Phase 5, as Likharev says, "would be the building of the
whole monster." Let's hope it doesn't take over the world.
FUTURE OF RSFQ AND SC COMPUTING BEYOND THE PETAFLOPS PROJECT
While it is not credible to propose that superconducting circuits will take over
a large fraction of the total computing market, most readers of this magazine
would like to see superconducting computers, or other RSFQ devices, take over
some portion of the world's electronics markets. However, many people believe an
HTMT, or similar project, is necessary for significant commercial success because
the opportunities in LTS electronics are not big enough to warrant
significant investment in manufacturing infrastructure. There is little doubt
that if the petaflops project is completed, its legacy to the superconductor
industry will be a manufacturing pilot line for RSFQ which could help solve the
capitalization problem. "I don't see anything else on the horizon in the
marketplace that would justify the investment needed for manufacturing
technology," Likharev stated.
While refrigeration
requirements make it almost certain that SC RSFQ will never compete with
room temperature CMOS technology in the commodity market, with some
initial investment in manufacturing infrastructure, SC RSFQ could
generate a market for applications that are well beyond CMOS performance
capabilities. Additionally, it could succeed in high end computing,
perhaps on a personal teraflops scale, where the size and power
requirements of CMOS are prohibitive for widespread commercial use.

A cross section of the physical layout of
the HTMT system showing the cryostat-contained RSFQ superconductor
electronics (liquid helium-cooled to 4 K), PIM-SRAM, data vortex
optical interconnect network, PIM-DRAM, and the HRAM.
Commercial LTS electronics companies, as well as
researchers, believe near term opportunities exist for LTS RSFQ in several
areas, including: A to D converters, D to A converters, digital SQUIDs, digital
autocorrelators, pseudo-random signal generators, applications requiring a high
level of radiation hardness, and many other possibilities. More refined
fabrication technology, in the near micron or sub-micron range, could
potentially open up markets in ultrafast digital switching, complex digital
signal processing (DSP), as well as high end computing.
Most of the above applications, as with petaflops
computing, rely on superconducting technology's position as the only available,
or at least reasonable, option for providing certain solutions. However,
Likharev has proposed using RSFQ for mainstream high-end computing as another
potential market: "If the fabrication technology were in place, I believe high
performance RSFQ computers could be successfully commercialized," Likharev said.
"The first step in the process would be to market a commercial desktop-sized 1
teraflops machine. This is about the size we expect to build for the phase 4
prototype. While the system would not cost less than a CMOS teraflops, it would
fit on your desk.
"My estimates are that, in the span of seven
years, the semiconductor guys can put about 20 gigaflops on your desk at a cost
of $20,000. In seven years, we think we can put 1 teraflops on the desk at a
cost of about $100,000. This is five times the cost for fifty times the
performance. The downside is that the high performance workstation market is
better defined as a $20,000 computer market," Likharev added. "But I believe we
could still grab a portion of that market with a ten times better price to
performance ratio." The worldwide market for high performance workstations is
currently about $20 billion; these machines now operate below 1 gigaflops.
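Likharev's comparison reduces to three ratios, which can be checked directly from the numbers in his quote. The variable names below are illustrative.

```python
# Likharev's desktop comparison, using the figures from his quote.
# Variable names are illustrative; values are the quoted estimates.
cmos = {"gflops": 20, "cost_usd": 20_000}
rsfq = {"gflops": 1_000, "cost_usd": 100_000}  # 1 teraflops

performance_ratio = rsfq["gflops"] / cmos["gflops"]
cost_ratio = rsfq["cost_usd"] / cmos["cost_usd"]
price_performance_gain = performance_ratio / cost_ratio

print(performance_ratio, cost_ratio, price_performance_gain)
```

The ratios come out to 50, 5, and 10, which matches the quoted "five times the cost for fifty times the performance" and the tenfold improvement in price to performance.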
Given that refrigeration for a desktop teraflops
sold in volume would cost $5,000 in today's money, it would not be economically
feasible to build cheaper systems, Likharev pointed out. A closed cycle 4K
system for such an application costs around $30,000 today, while at
higher volume, say 10,000 per year, the price would drop to the $5,000 per unit
level. (By the way, the recondenser system for the petaflops computer is
expected to top $1 million.)
NEAR TERM RSFQ APPLICATIONS
If commercial superconducting computers are the most optimistic potential
application for RSFQ, there are many other lower profile, but probably more
feasible, applications that companies like TRW and HYPRES could develop with the
help of a new generation of fabrication capability. TRW, which develops LTS
technology for a range of military and space applications, would certainly
benefit from HTMT. "HTMT would provide TRW with opportunities to develop
superconductive digital technology beyond its current level to support our core
business," says Abelson.
As it is, the US government supports LTS digital
technology development on a subsistence level, with the strongest level of
support coming from DoD sources such as the Office of Naval Research. Elie Track
believes the HTMT program, with its ambitious goals, can form a unique basis to
rally enough support to finance the rapid growth of the industry. In fact, as a
company solely devoted to the commercialization of superconductive digital
technology, HYPRES may be able to realize many of the applications for which the
company was founded as a result of HTMT.
"HTMT will advance the technology and enable many
corollary applications of high frequency instrumentation," Track explains.
"These include devices for military applications in digital radar and other
electronic warfare systems. Some of these devices inevitably will be needed for
the communications market," the biggest potential market for RSFQ, in Track's
opinion. "A potentially large market is in software defined radio for all
wireless communications, including cellular and satellite. The basic premise of
this technology is to make as much of a communications system digital as
possible by digitizing right after the antenna. This nearly eliminates losses
and incompatibilities from different protocols to provide seamless wireless
communication and data transmission on a global scale." Communications offers a
rapidly growing market which currently appeals to investors, unlike
instrumentation, which, "isn't viewed as a hot item," according to Track.
One of the problems in selling digital LTS
electronics to investors and would-be customers is the lack of any real product
or full-performance demonstration prototype. "We have nothing sold on the
commercial market today that creates confidence in the technology," Track said.
"We have technology demonstrations, but not full-performance systems. A fully
operational prototype in which the cryogenics is transparent to the user would
trigger a lot more interest and investment." HYPRES does market its Primary
Voltage Standard Chips and Systems, but has yet to introduce a complete turnkey
system that can be operated by a non-expert user and where the cryogenics is, as
Track says, "transparent." This application, however, requires investment in
engineering, not in a new foundry.
"A 0.8 micron foundry will allow us to build
digitizing oscilloscopes, which are required instruments everywhere devices are
developed. We could also produce 10GHz A to D chips, which would be a factor of
ten above what semiconductor technology can produce. If people start using and
trusting LTS digitizers based on LTS ADCs to test their chips, they might start
using these same LTS ADCs in their wireless base stations," Track said.
But still, even in an age of remarkable advances
in cryocooler technology, Track said that the customers' perception of long-term
cryocooler reliability is the biggest hurdle to commercial deployment. Some
industry insiders have also said cryocooler reliability is hampering the
commercialization efforts of manufacturers of HTS receiver subsystems for
cellular and PCS.
A NEW FRONTIER?
Following completion of phase 3, we will see if the arguments put forth in the
HTMT design phases are convincing enough to capture funding for the building of
a prototype LTS supercomputer. If so, the real excitement will begin and a new
frontier may open up for superconducting electronics. It's even possible that LTS
RSFQ may make its way onto the desktops of a good portion of the over 750,000
organizations that buy high end computers each year. In addition, many other
proposed applications of superconducting digital electronics will benefit from
the advances made in the program. The technical feasibility of the project is
considered very high by program managers so far. If further modeling and
hardware studies support these early assessments, then we will just have to wait
and see if the funding deities agree.