The Advanced Scientific Computer (ASC) was a supercomputer designed and manufactured by Texas Instruments (TI) between 1966 and 1973.[1] The ASC's central processing unit (CPU) supported vector processing, a performance-enhancing technique that was key to its high performance. The ASC and the Control Data Corporation STAR-100 supercomputer (which was introduced in the same year) were the first computers to feature vector processing. However, neither the ASC nor the STAR-100 fully realized the technique's potential, because vector processing was not yet well understood; it was the Cray Research Cray-1 supercomputer, announced in 1975, that would fully realize and popularize vector processing. The more successful implementation of vector processing in the Cray-1 marks the ASC and STAR-100 as first-generation vector processors, with the Cray-1 belonging to the second generation.
History
TI began as a division of Geophysical Service Incorporated (GSI), a company that performed seismic surveys for oil exploration companies. By the time of the ASC project, GSI had become a subsidiary of TI, and TI wanted to apply the latest computer technology to processing and analyzing seismic datasets. The ASC project started as the Advanced Seismic Computer. As the project developed, TI expanded its scope and replaced "Seismic" with "Scientific" in the name, which allowed the project to keep the designation ASC.
The software, including an operating system and a FORTRAN compiler, was originally written under contract by Computer Usage Company under the direction of George R. Trimble, Jr.,[2][3] but development was later taken over by TI itself. Southern Methodist University in Dallas developed an ALGOL compiler for the ASC.
Architecture
The ASC was built around a single high-speed shared memory, accessed by the CPU and eight I/O channel controllers, in an organization similar to that of Seymour Cray's groundbreaking CDC 6600. All memory access went through the memory control unit (MCU). The MCU was a two-way parallel network, 256 bits wide per channel, that could support up to eight independent processors, with a ninth channel for accessing "main memory" (referred to as "extended memory"). The MCU also acted as a cache controller, giving the eight processor ports high-speed access to a semiconductor-based memory and handling all communication with the 24-bit address space of main memory. The MCU was designed to operate asynchronously, allowing it to work at a variety of speeds and scale across a number of performance points. For instance, main memory could be built from slower but less expensive core memory, although this was not done in practice. At its fastest, the MCU could sustain transfer rates of 80 million 32-bit words per second per port, for a total of 640 million words per second across the eight ports. This was well beyond the capabilities of even the fastest memories of the era.
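The aggregate figure is simply the per-port rate multiplied by the eight processor ports. The short Python sketch below reproduces that arithmetic, assuming every port transfers at its peak rate at the same time; the conversion to bytes per second is derived here from the 32-bit word size rather than quoted from the sources, and the constant and function names are illustrative only.

```python
# Figures quoted in the text.
PORTS = 8                      # processor ports served by the MCU
WORDS_PER_SEC_PER_PORT = 80e6  # peak sustained rate per port, in 32-bit words
WORD_BYTES = 4                 # a 32-bit word occupies 4 bytes


def aggregate_words_per_sec(ports: int = PORTS,
                            per_port: float = WORDS_PER_SEC_PER_PORT) -> float:
    """Peak aggregate rate, assuming all ports run at full speed at once."""
    return ports * per_port


def aggregate_bytes_per_sec(ports: int = PORTS) -> float:
    """The same peak rate expressed in bytes per second (derived, not quoted)."""
    return aggregate_words_per_sec(ports) * WORD_BYTES


print(f"{aggregate_words_per_sec():.0f} words/s")     # 640000000 words/s
print(f"{aggregate_bytes_per_sec() / 1e9:.2f} GB/s")  # 2.56 GB/s
```

Real sustained throughput would be lower whenever ports contended for the same memory, which this peak-rate model ignores.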
The CPU had a 60 ns clock cycle (16.67 MHz clock frequency), and its logic was built from 20-gate emitter-coupled logic integrated circuits originally developed by TI for the ILLIAC IV supercomputer. The CPU had an extremely advanced architecture and organization for its era, supporting microcoded arithmetic and mathematical instructions that operated on scalars, vectors, or matrices. The vector processing facilities used a memory-to-memory architecture, in which vector operands were read from memory and the resulting vector was written back to memory. The CPU could have one, two, or four vector lanes, producing one to four vector results every cycle depending on how many lanes were installed. The vector lanes were also used for scalar instructions, and each lane could keep up to 12 scalar instructions in flight simultaneously. A four-lane CPU allowed up to 36 instructions in flight in total across the entire CPU.
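To make the memory-to-memory model concrete, the Python sketch below treats memory as a flat list: both operand vectors are read from it and the result vector is written back to it, with `lanes` elements completing per cycle. It is a simplified model under stated assumptions, not the ASC's actual microcode: the function names are hypothetical, and pipeline start-up latency and memory conflicts are ignored, so the cycle count is just the vector length divided by the lane count, rounded up.

```python
import math

CLOCK_NS = 60  # ASC CPU cycle time quoted in the text


def vector_add_mem_to_mem(memory, src_a, src_b, dst, length, lanes):
    """Add two vectors held in `memory` and store the result in `memory`.

    Operands come from memory and the result goes back to memory, with no
    vector registers in between. Returns the cycles used, assuming `lanes`
    results per cycle and ignoring start-up latency (a simplification).
    """
    if lanes not in (1, 2, 4):
        raise ValueError("the ASC offered 1, 2, or 4 vector lanes")
    for i in range(length):
        memory[dst + i] = memory[src_a + i] + memory[src_b + i]
    return math.ceil(length / lanes)


def peak_results_per_second(lanes):
    """Peak vector results per second: lanes times the clock frequency."""
    return lanes * 1e9 / CLOCK_NS


# Usage: a[0:4] + b[4:8] -> c[8:12], on a two-lane CPU.
mem = [1, 2, 3, 4, 10, 20, 30, 40, 0, 0, 0, 0]
cycles = vector_add_mem_to_mem(mem, 0, 4, 8, 4, lanes=2)
print(mem[8:], cycles)  # [11, 22, 33, 44] 2
```

Because every operand passes through memory, the sustained rate of such an instruction depends on memory bandwidth as much as on the lane count.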
The processor had forty-eight 32-bit registers, a huge number for the time. Sixteen of the registers were used for addressing, 16 for scalar operations, 8 for index offsets, and 8 for specifying the parameters of vector instructions. Data was moved between the registers and memory by load/store instructions, which could transfer from 4 to 64 bits (two registers) at a time.
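The register partition can be pictured as four banks that together hold 48 registers. The Python sketch below models that layout and a 64-bit load that fills two consecutive 32-bit registers. The bank names and the pairing convention (high word in the first register, low word in the next) are hypothetical choices for illustration; the text does not say how the ASC split a 64-bit value across two registers.

```python
from dataclasses import dataclass, field

# Bank sizes from the text: 16 + 16 + 8 + 8 = 48 registers of 32 bits each.
BANK_SIZES = {"address": 16, "scalar": 16, "index": 8, "vector_param": 8}
MASK32 = 0xFFFF_FFFF


@dataclass
class RegisterFile:
    # One list of 32-bit values per bank, all initialized to zero.
    banks: dict = field(default_factory=lambda: {
        name: [0] * size for name, size in BANK_SIZES.items()})

    def total(self) -> int:
        """Total number of registers across all banks."""
        return sum(len(regs) for regs in self.banks.values())

    def load_double(self, bank: str, reg: int, value64: int) -> None:
        """Load a 64-bit value into registers `reg` and `reg + 1` of a bank.

        Hypothetical convention: high word to `reg`, low word to `reg + 1`.
        """
        regs = self.banks[bank]
        regs[reg] = (value64 >> 32) & MASK32
        regs[reg + 1] = value64 & MASK32


rf = RegisterFile()
rf.load_double("scalar", 2, 0x1122334455667788)
print(rf.total(), hex(rf.banks["scalar"][2]), hex(rf.banks["scalar"][3]))
# 48 0x11223344 0x55667788
```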
Most vector processors tended to be limited by memory bandwidth; that is, they could process data faster than they could fetch it from memory. This remains a major problem in modern SIMD designs as well, which is why considerable effort has gone into increasing memory throughput in modern computer designs, with limited success. The ASC mitigated the problem somewhat with a lookahead unit that predicted upcoming memory accesses and loaded the data into the scalar registers, invisibly to the program, through a memory interface in the CPU called the memory buffer unit (MBU).
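The benefit of lookahead can be illustrated by counting stall cycles on a trace of memory addresses. The Python sketch below compares no lookahead against a simple next-address predictor. The predictor is an assumption made for illustration: the text says only that the lookahead unit predicted upcoming accesses, not how. A correctly predicted access is assumed to be already buffered and to cost no stall; every other access waits the full memory latency.

```python
def count_stall_cycles(trace, latency, lookahead):
    """Count cycles spent waiting on memory for a sequence of word addresses.

    Without lookahead, every access waits `latency` cycles. With lookahead,
    a hypothetical next-address predictor prefetches address + 1 after each
    access; a correctly predicted access is assumed to cost no stall.
    """
    stalls = 0
    predicted = None
    for addr in trace:
        if not (lookahead and addr == predicted):
            stalls += latency
        predicted = addr + 1
    return stalls


sequential = list(range(10))
print(count_stall_cycles(sequential, 5, lookahead=False))  # 50
print(count_stall_cycles(sequential, 5, lookahead=True))   # 5
```

Sequential vector-style access patterns benefit most; a scattered trace gains nothing from this particular predictor.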
The Peripheral Processor (PP) was a separate system dedicated entirely to running the operating system and the programs under it quickly, and to feeding data to the CPU. The PP was built from eight "virtual processors" (VPs), which were designed to handle instructions and basic integer arithmetic only. Each VP had its own program counter and registers, so the system could run eight programs at the same time, limited only by memory accesses. Keeping eight programs running allowed the system to shuffle execution of programs on the CPU according to what data was available on the memory bus at that time, minimizing the "dead time" during which the CPU had to wait for data from memory.
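The idea of several virtual processors sharing one set of execution hardware can be sketched as a barrel-style scheduler. In the Python sketch below, each VP keeps its own program counter and a register, and each cycle the next VP in rotating order that is not waiting on memory issues one instruction. The round-robin policy, the single accumulator, and the `is_memory_op` hook are all hypothetical simplifications; the text does not describe the PP's actual selection logic.

```python
from dataclasses import dataclass


@dataclass
class VirtualProcessor:
    pc: int = 0             # each VP has its own program counter...
    acc: int = 0            # ...and its own registers (one shown here)
    stalled_until: int = 0  # cycle at which a pending memory access completes


def run_barrel(vps, cycles, memory_latency, is_memory_op):
    """Share one execution unit among VPs in rotating (round-robin) order.

    Each cycle, the next VP that is not waiting on memory issues one
    instruction. `is_memory_op(vp_index, pc)` is a hypothetical hook that
    says whether that instruction accesses memory. Returns, per cycle, the
    index of the VP that issued, or None if every VP was waiting.
    """
    n = len(vps)
    start = 0
    schedule = []
    for cycle in range(cycles):
        issued = None
        for k in range(n):
            i = (start + k) % n
            vp = vps[i]
            if vp.stalled_until <= cycle:
                if is_memory_op(i, vp.pc):
                    vp.stalled_until = cycle + memory_latency
                else:
                    vp.acc += 1  # stand-in for an integer ALU operation
                vp.pc += 1
                issued = i
                start = (i + 1) % n
                break
        schedule.append(issued)
    return schedule


# One VP doing only memory operations leaves the hardware idle...
print(run_barrel([VirtualProcessor()], 6, 3, lambda i, pc: True))
# [0, None, None, 0, None, None]
# ...while eight VPs keep it busy every cycle despite the same latency.
eight = [VirtualProcessor() for _ in range(8)]
print(None in run_barrel(eight, 24, 3, lambda i, pc: True))  # False
```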
The PP also included a set of sixty-four 32-bit communications registers (CRs). The CRs stored the state required for communication between the various parts of the ASC: the CPU, VPs, and channel controllers.
The ASC instruction set included a bit-reverse instruction intended to speed up the calculation of fast Fourier transforms (FFTs). By the time the ASC was in production, better FFT algorithms had been developed that did not require this operation. TI offered a bounty to the first person to come up with a valid use for the instruction, but the bounty was never collected.
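Bit reversal matters for the classic radix-2 Cooley-Tukey FFT, whose in-place form reads its input (or writes its output) in bit-reversed index order. The Python sketch below shows the operation and the resulting permutation. It reverses a chosen number of low-order bits; the exact operand width and semantics of the ASC instruction are not given in the text, so this is a generic illustration rather than a model of that instruction.

```python
def reverse_bits(x: int, width: int) -> int:
    """Reverse the low `width` bits of x (e.g. 0b001 -> 0b100 for width 3)."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (x & 1)  # move x's lowest bit into result
        x >>= 1
    return result


def bit_reverse_permute(data):
    """Reorder a list of length 2**k into bit-reversed index order."""
    n = len(data)
    width = n.bit_length() - 1
    if n == 0 or 1 << width != n:
        raise ValueError("length must be a power of two")
    return [data[reverse_bits(i, width)] for i in range(n)]


print(bit_reverse_permute(list(range(8))))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

A hardware instruction would replace the loop in `reverse_bits` with a single operation, which is the speed-up the ASC designers were after.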
Market reception
When ASC machines first became available in the early 1970s, they outperformed almost all other machines, including the CDC STAR-100, and under certain conditions matched the performance of the one-off ILLIAC IV. However, only seven had been installed when the Cray-1 was announced in 1975. The Cray-1 dedicated almost all of its design to sustained high-speed access to memory,[clarification needed][citation needed] including over one million 64-bit words of semiconductor memory and a cycle time of 12.5 ns, roughly one-fifth that of the ASC. Although the ASC was in some ways a more expandable design, speed was what mattered most in the supercomputer market,[clarification needed] and the Cray-1 was much faster. ASC sales ended almost overnight, and although an upgraded ASC had been designed with a cycle time one-fifth that of the original, Texas Instruments decided to exit the market.
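The "roughly one-fifth" comparison follows from the two cycle times quoted in this article, 60 ns for the ASC and 12.5 ns for the Cray-1. The short Python calculation below converts each period to a clock frequency and takes their ratio. Cycle time alone does not determine throughput, since lane count and memory bandwidth also matter, so this compares clocks only.

```python
ASC_CYCLE_NS = 60.0     # ASC CPU cycle time
CRAY1_CYCLE_NS = 12.5   # Cray-1 cycle time


def mhz(cycle_ns: float) -> float:
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1_000.0 / cycle_ns


ratio = ASC_CYCLE_NS / CRAY1_CYCLE_NS
print(f"ASC: {mhz(ASC_CYCLE_NS):.2f} MHz")       # ASC: 16.67 MHz
print(f"Cray-1: {mhz(CRAY1_CYCLE_NS):.2f} MHz")  # Cray-1: 80.00 MHz
print(f"ratio: {ratio:.1f}")                     # ratio: 4.8
```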
Vector processing applications
The ASC #1 prototype was a one-pipe system and was brought up in Austin, Texas, off-site from TI's main plant to protect proprietary information. It was later upgraded to two pipes and renamed ASC #1A. It was then used by TI's GSI division for seismic data processing.
ASC #2 was leased to Shell Oil Company in the Netherlands and also used for seismic data processing.
ASC #3 was installed at the Redstone Arsenal in Huntsville, Alabama, for anti-ballistic missile interception technology development. Following the SALT treaty, the system was redeployed to the Army Corps of Engineers in Vicksburg, Mississippi, for dam stress analysis.
ASC #4 was used by NOAA at Princeton University for developing weather forecasting models.
ASC systems #5 and #6 were installed at TI's main plant in Austin and also used by GSI for seismic data processing.
ASC #7 went to the Naval Research Lab in Washington, D.C.[4] for plasma physics studies.
References
- Peter M. Kogge (1981). The Architecture of Pipelined Computers. Taylor & Francis. pp. 159–162.