Beowulf system: more geoscience computing power for less money

Sept. 1, 2000
Over the past few decades, our industry has progressed from mainframes to array processors, to vector computers, and to massively parallel supercomputers. Today, many companies are moving their seismic processing to commodity supercomputers, reducing their price-perfor-mance ratio (the cost per unit computation) by about a factor of 10.

Growth in performance of largest beowulfs is shown. The unit of measurement is Mflops (millions of floating point computer operations per second). The largest such systems are approaching one million Mflops.

Click here to enlarge image

The power of today's personal computers allows Amerada Hess to perform more and better seismic imaging while spend-ing less money on computer hardware.

Over the past few decades, our industry has progressed from mainframes to array processors, to vector computers, and to massively parallel supercomputers. Today, many companies are moving their seismic processing to commodity supercomputers, reducing their price-perfor-mance ratio (the cost per unit computation) by about a factor of 10.

In 1994, NASA researchers built the first so-called beowulf, a parallel computer built entirely by connecting commodity personal computers (PCs). This system was such an instant success that within a few years much more powerful beowulfs were growing in computational power by a factor of 4,000 within five years.

While the excellent price-performance ratio of commodity hardware components (CPUs, disk, memory, network, etc.) was a necessary enabler of these systems, a functioning supercomputer requires software to run and coordinate the hardware.

Assistance from Linux

The technical IT budget was halved over five years while the seismic imaging volume has increased by a factor of 15.

Click here to enlarge image

Fortunately, the open source software comm-unity had developed the necessary elements. The Linux operating system (OS), the GNU application development environment, and software for passing messages between the nodes of the parallel system enables a beowulf to function like the much more expensive massively parallel computers marketed by SGI and IBM.

In 1998, with oil and gas prices decreasing steadily, the Amerada Hess staff was faced with a dilema: IT (information technology) budgets were being severely reduced while the corporation's steady growth was generating increasing dem-ands for advanced seismic imaging. To compound the problem, one of the two supercomputers then in use at the Houston processing center was going off lease.

It was quite clear that a new paradigm was needed to replace this system. Amerada Hess' geoscience technology and technical computing groups decided to work together to move their seismic processing onto beowulfs.

Amerada Hess has a long history of using state-of-the-art supercomputers to perform advanced seismic imaging, specifically focusing on 3D pre-stack depth migration. Their seismic processing software was already running on a number of computing platforms, each running that vendor's flavor of the Unix OS. While most PCs run a version of Microsoft's Windows OS, these operating systems are very different from Unix, and a great deal of effort would have been be required to post the seismic software to Windows.

Quick integration

The ratio of the performance of the beowulf system to the traditional parallel supercomputer are compared. Which system performs better depends on the application, but average performances are comparable and differs at most by a factor of 2.5.

Click here to enlarge image

Fortunately, Linux is a version of Unix which was developed for PCs. Consequently, a port of the bulk of the seismic software to a small test beowulf running Linux was completed in a matter of weeks. Other than incorporating code, which automatically converted between the numbers on the PCs and the Unix workstations and supercomputers, the port to the beowulf took no more effort than required for a port to a different vendor's supercomputer. In fact, as much time was spent optimizing the performance of the processing software for the new platform as in performing the port. There are many excellent open-source mathematical libraries freely available on the Internet.

Although the open-source community provides much of what is necessary to create efficient programs, several software components from the commercial arena are used on Amerada Hess's beowulf. The resulting beowulf system performed comparable to the traditional parallel supercomputer.

One of the selling points of vendors' parallel supercomputers is their reliability, a key issue. Would a computer constructed from PC hardware and using software developed by hundreds of people across the Internet run reliably for the weeks necessary to process terabyte-sized seismic surveys?

Emboldened by the stability and reliability reported by their colleagues at the national labs and by their experience with their small test system, Amerada Hess staff ordered a 32-processor beowulf even before all their code had been ported to the test system. A few months later, the new system was installed, the software port completed, and 3D pre-stack seismic images were being produced by their beowulf system.

Established support

The performance (for a fixed price) increased by 72% over the course of 13.5 months, spanning the first four beowulf acquisitions, slightly beating Moore's law prediction of 68%.

Click here to enlarge image

An additional issue with beowulf systems was what to do when problems do occur. To whom could they turn for support? An advantage of buying a supercomputer from an established vendor is on-site support.

With beowulf systems, the only resource to turn to is the community of developers and users of that software. Fortunately, this community is both very knowledgable and very helpful - the experience at Amerada Hess has been that expert advice and timely software "fixes" are freely available on the Internet.

Over the past two years, Amerada Hess staff has added to the original beowulf, growing the system to nearly 400 processors. Each acquisition has been at a better price-performance ratio, roughly in accordance with Moore's law, which states that the performance of computers doubles every 18 months. In fact, the first four acquisitions (which were of comparable power) matched Moore's law nearly perfectly.

This process of continuously upgrading the system, each time buying at the "sweet spot" in price-performance ratio of the commodity curve, has resulted in Amerada Hess increasing their computational power by a factor of more than 10 for less money than they would have spent for a traditional supercomputer. This additional power has enabled the use of more compute-intensive, more accurate seismic imaging algorithms.

While Amerada Hess was the first oil company running production code on a beowulf, the idea has been quickly adopted by seismic processing companies and other oil companies. By year-end, it is expected that most companies with their own processing software will attempt to port their code to beowulf systems.

Not to rest on their laurels, Amerada Hess has been looking at leveraging commodity computers to run their interactive programs used for geologic and geophysical interpretation.

These interactive programs are run primarily on Unix workstations,. Amerada Hess has ported their proprietary interactive applications to Linux and is examining the feasibility of running them on laptops. They are also lobbying ISVs to support their applications on Linux. A clear trend already exists in non-petroleum software industries to support Linux (Oracle). It is only a matter of time until the petroleum industry fully embraces Linux.