The Sun E10000 Starfire. (from www.top500.org)

Machine type RISC-based shared-memory multi-processor
Models E10000 Starfire
Operating system Solaris (Sun's Unix flavour)
Connection structure Crossbar (see remarks)
Compilers Fortran 77, Fortran 90, HPF, C, C++
Vendors information Web page http://www.sun.com/servers/10000/spec.html
Year of introduction 1997.

 

System parameters:

Model E10000
Clock cycle 2.5 ns
Theor. peak performance
Per proc. (64-bit) 800 Mflop/s
Maximal (64-bit) 51.2 Gflop/s
Main memory <= 64 GB
Memory bandwidth
No. of processors 16--64

Remarks:

The Starfire E10000 is the largest of a series of Ex000 servers, where x can be 3, 4, 5, 6, or 10. We discuss only this largest model, as Sun itself has clearly positioned it as a system for large-scale high-performance computing. The basic processor is an UltraSPARC with a 2.5 ns cycle and a theoretical peak performance of 800 Mflop/s. Up to 64 processors are connected by a 64×64 crossbar, the largest crossbar employed commercially. This crossbar, called the Gigaplane-XB, also sets the E10000 apart from the lower-end models of the Ex000 series, which use a bus interconnect between processors. The system is built up from system boards, each containing up to 4 processors, 2 level-2 caches (<= 4 MB), and 4 memory banks. The boards plug into the Gigaplane crossbar, which thus acts as a backplane. The caches are kept coherent by a "snoopy bus" protocol, i.e., each cache is aware of the (in)validation of data by continuously monitoring the data on the backplane and updating its copies accordingly.
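The snooping idea can be illustrated with a toy model. The sketch below assumes a simple write-invalidate, write-through scheme; the text does not say which variant the E10000 actually uses, and the class names `Bus` and `SnoopyCache` are hypothetical, chosen only for this illustration.

```python
class Bus:
    """Shared medium that every cache monitors (toy model, not Sun's design)."""

    def __init__(self):
        self.memory = {}   # address -> value in main memory
        self.caches = []   # all caches attached to the bus

    def broadcast_invalidate(self, addr, source):
        # Every cache except the writer sees the address and reacts to it.
        for cache in self.caches:
            if cache is not source:
                cache.snoop_invalidate(addr)


class SnoopyCache:
    """Cache that keeps its copies coherent by watching bus traffic."""

    def __init__(self, bus):
        self.lines = {}    # address -> locally cached value
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:                  # miss: fetch from memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.broadcast_invalidate(addr, source=self)
        self.lines[addr] = value
        self.bus.memory[addr] = value               # write-through, for simplicity

    def snoop_invalidate(self, addr):
        self.lines.pop(addr, None)                  # drop the stale copy, if any


bus = Bus()
cache_a, cache_b = SnoopyCache(bus), SnoopyCache(bus)
cache_a.write(0x10, 1)
print(cache_b.read(0x10))        # 1: fetched from memory on a miss
cache_a.write(0x10, 2)
print(0x10 in cache_b.lines)     # False: the copy in cache_b was invalidated
print(cache_b.read(0x10))        # 2: re-fetched after the invalidation
```

A real implementation would track per-line states and might update remote copies instead of invalidating them. The point here is only that every cache observes every write address.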

The Gigaplane crossbar connects to the processors with separate data and address lines. This design recognises that most data transfers are essentially point-to-point, while addresses often have to be broadcast to many or all processors. The effective aggregate bandwidth for data is 102.4 GB/s with a point-to-point speed of 1.6 GB/s (theoretical peak).
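The two bandwidth figures are consistent with all 64 crossbar ports carrying a point-to-point transfer at the same time (64 × 1.6 GB/s = 102.4 GB/s). That relationship is an inference from the quoted numbers, not something the text states. The sketch below makes it concrete; the greedy arbitration in `concurrent_transfers` is a hypothetical simplification, not a description of the Gigaplane-XB arbiter.

```python
POINT_TO_POINT_GB_S = 1.6   # per-transfer peak quoted in the text
PORTS = 64                  # crossbar size quoted in the text


def concurrent_transfers(requests):
    """Greedily grant transfers whose sources and destinations are all distinct."""
    used_src, used_dst, granted = set(), set(), []
    for src, dst in requests:
        if src not in used_src and dst not in used_dst:
            used_src.add(src)
            used_dst.add(dst)
            granted.append((src, dst))
    return granted


def aggregate_bandwidth(requests):
    """Aggregate data rate in GB/s for the transfers that can run at once."""
    return len(concurrent_transfers(requests)) * POINT_TO_POINT_GB_S


# Best case: every port sends to a different port, so all 64 paths are busy.
all_ports_busy = [(p, (p + 1) % PORTS) for p in range(PORTS)]
# Worst case: every sender targets port 0, so only one transfer proceeds.
hot_spot = [(p, 0) for p in range(1, PORTS)]

print(aggregate_bandwidth(all_ports_busy))
print(aggregate_bandwidth(hot_spot))
```

The first call reproduces the aggregate figure and the second the point-to-point figure. A bus, by contrast, would serialise all transfers regardless of their destinations.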

The Starfire is a typical SMP machine with provisions for shared-memory parallelism in the Fortran and C/C++ compilers by directives in the source code. At the time of writing it is not yet clear whether Sun will join the OpenMP consortium for standardising the shared-memory programming model. Of course it is possible to cluster E10000s, as has for instance been done with E6000 servers, and to use such a cluster in a DM-MIMD way (see results for E6000 in [5]).

Measured Performances: In [5] a speed of 26.45 Gflop/s is reported for a 64-processor machine in solving a linear system of order 19968. The efficiency for this problem is 83%. Results are also reported for a 4-way cluster with a clock cycle of 3 ns. This 256-processor system reached a speed of 123.9 Gflop/s in solving a linear system of order 80640. This amounts to an efficiency of 72%.
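Efficiency here is the measured speed divided by the theoretical peak. The sketch below assumes 2 floating-point results per clock cycle, a value inferred from the quoted 800 Mflop/s at a 2.5 ns cycle rather than stated in the text. The function names are illustrative. The 64-processor result is not recomputed because the clock cycle of the measured machine is not given.

```python
FLOPS_PER_CYCLE = 2  # assumption inferred from 800 Mflop/s at a 2.5 ns cycle


def peak_gflops(processors, cycle_ns, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak in Gflop/s (flops per nanosecond equals Gflop/s)."""
    return processors * flops_per_cycle / cycle_ns


def efficiency(measured_gflops, processors, cycle_ns):
    """Fraction of the theoretical peak that was actually achieved."""
    return measured_gflops / peak_gflops(processors, cycle_ns)


# Check the formula against the system parameters listed earlier.
print(peak_gflops(1, 2.5))    # 0.8 Gflop/s, i.e. 800 Mflop/s per processor
print(peak_gflops(64, 2.5))   # 51.2 Gflop/s for the full 64-processor system

# The 4-way cluster: 256 processors at a 3 ns cycle, 123.9 Gflop/s measured.
print(f"{efficiency(123.9, 256, 3.0):.1%}")   # 72.6%
```

Under these assumptions the cluster result comes out at about 72.6% of peak, in line with the roughly 72% quoted above.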