PetaBox

PetaBox, also stylized Petabox, is a storage unit from Capricorn Technologies and the Internet Archive.[1][2] It was designed by the staff of the Internet Archive and C. R. Saikley to store and process one petabyte (a million gigabytes) of information.[3]

Specifications

  • Density: 1.4 petabytes/rack
  • Power consumption: 3 kW/petabyte
  • No air conditioning; excess heat is used to help heat the building

Design

Design goals of the Petabox included:[3]

  • Low power: 6 kW per rack, 60 kW for the entire storage cluster
  • High density: 100+ TB/rack
  • Local computing to process the data (800 low-end PCs)
  • Multi-OS possible, Linux standard
  • Colocation friendly
  • Shipping container friendly: able to be run in a 20' by 8' by 8' shipping container
  • Easy maintenance: one system administrator per petabyte
  • Software to automate full mirroring
  • Easy to scale
  • Inexpensive design and storage
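
The power and density goals above imply a cluster size, which a short calculation makes explicit. This is only a sketch: it assumes the 60 kW budget, the 100 TB/rack floor, and the 800 PCs all describe the same one-petabyte cluster, and the function and constant names are illustrative rather than taken from the source.

```python
# Figures copied from the design goals; everything derived is an inference.
RACK_POWER_KW = 6        # "6 kW per rack"
CLUSTER_POWER_KW = 60    # "60 kW for the entire storage cluster"
RACK_CAPACITY_TB = 100   # "100+ TB/rack", so treated as a lower bound
CLUSTER_PCS = 800        # "800 low-end PCs"


def implied_rack_count(cluster_kw: int, rack_kw: int) -> int:
    """Number of racks that fit within the cluster power budget."""
    return cluster_kw // rack_kw


racks = implied_rack_count(CLUSTER_POWER_KW, RACK_POWER_KW)
min_capacity_tb = racks * RACK_CAPACITY_TB   # lower bound on cluster capacity
pcs_per_rack = CLUSTER_PCS // racks          # assumes PCs are spread evenly

print(racks, min_capacity_tb, pcs_per_rack)  # 10 1000 80
```

Under these assumptions, ten racks at 100 TB each give 1,000 TB, which is the one petabyte the system was designed to hold.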

History

The first 100-terabyte rack became operational in Amsterdam at the Internet Archive's European arm, the Stichting Internet Archive (SIA), in June 2004. The second rack, holding 80 terabytes, became operational at the Archive's main San Francisco location that same year. The Internet Archive then spun off its Petabox production to the newly formed company Capricorn Technologies.[3]

Between 2004 and 2007, Capricorn replicated the Internet Archive's deployment of the Petabox for major academic institutions, digital preservationists, government agencies, high-performance computing (HPC) and major research sites, medical imaging providers, digital image repositories, storage outsourcing sites, and other enterprises. Its largest product used 750-gigabyte disks. In 2007, the Internet Archive data center housed approximately three petabytes of Petabox storage.

In 2010, the fourth version of the Petabox began operation. Each rack provided 480 TB of raw storage (240 disks of 2 TB each, arranged as 24 disks per 4U-high rack unit with 10 units per rack) and ran Linux.[4][5]
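
The fourth-generation capacity figure follows directly from the disk layout. As a minimal check, using only the numbers quoted above (the constant names are illustrative):

```python
# Layout of a fourth-generation Petabox rack, as quoted in the text.
DISKS_PER_UNIT = 24   # disks in each 4U rack unit
UNITS_PER_RACK = 10   # 4U units in one rack
DISK_SIZE_TB = 2      # capacity of each disk
UNIT_HEIGHT_U = 4     # height of one unit in rack units (U)

disks_per_rack = DISKS_PER_UNIT * UNITS_PER_RACK
raw_capacity_tb = disks_per_rack * DISK_SIZE_TB
rack_space_used_u = UNIT_HEIGHT_U * UNITS_PER_RACK

print(disks_per_rack, raw_capacity_tb, rack_space_used_u)  # 240 480 40
```

The ten storage units occupy 40U of rack space. The 480 TB figure is raw capacity, so usable space after any redundancy or filesystem overhead would be lower.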

As of December 2021, the Internet Archive's Petabox storage system consisted of four data centers, 745 nodes, and 28,000 spinning disks. The Wayback Machine accounted for 57 petabytes and the book, music, and video collections for another 42 petabytes, together making up 99 petabytes of "unique data"; total used storage was 212 petabytes.[3]
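
The December 2021 figures can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted above. The ratio it computes is a derived value, not one reported by the source, and the variable names are illustrative.

```python
# December 2021 figures as quoted in the text, in petabytes.
WAYBACK_PB = 57
MEDIA_COLLECTIONS_PB = 42   # books, music, and video
UNIQUE_DATA_PB = 99
TOTAL_USED_PB = 212

# The two collection figures sum to the "unique data" figure.
collections_sum_pb = WAYBACK_PB + MEDIA_COLLECTIONS_PB

# Derived ratio of total used storage to unique data (not from the source).
used_to_unique_ratio = round(TOTAL_USED_PB / UNIQUE_DATA_PB, 2)

print(collections_sum_pb == UNIQUE_DATA_PB, used_to_unique_ratio)  # True 2.14
```

Because 57 + 42 equals 99, the "unique data" figure reads as the sum of the two collections rather than a third, separate quantity. Total used storage is a little over twice the unique data.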

References

  1. ^ "Big storage on the cheap". CNET.
  2. ^ "PetaBox Product Family". Capricorn Technologies. Retrieved 2023-07-10.
  3. ^ a b c d "Internet Archive: Petabox". Internet Archive. Retrieved 2023-07-10.
  4. ^ Jeff Kaplan (27 July 2010). "The Fourth Generation Petabox". Internet Archive.
  5. ^ "eWEEK Labs Walk-Through: the Internet Archive". PCMag UK. Archived from the original on 2022-04-27. Retrieved 2021-11-09.