Software and hardware efficiencies in handling large data sets

April 1, 1999
Figure: Simple Java 3D Application Scene Graph.

"Thin-client" networks and Java technology

Oil and gas producers utilize a variety of technologies to solve core challenges in the processing, distribution, and visualization of complex data sets. However, many of these technologies and processes do not allow producers to view data sets efficiently or be as productive as possible with the information before them.

The new network computing model and Java technology provide a framework for high-performance collaborative computing, enabling virtually any type of computer to access, process, and share data. This capability allows companies to cut the costs and logistical complexities of data transfer, while improving collaboration among scientists, engineers, and management.

Performance processing

The conventional sequence for seismic processing operates as a pipeline. A block of data is read one segment at a time and is then passed through a chain of data processing routines. A final process writes the finished data back to storage devices. More complex flows are built with software loops, which is neither the most cost-effective nor the most efficient method.
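The pipeline described above can be sketched in a few lines of Java. The stage interface, the `GainStage` example, and the float-array trace format are illustrative assumptions, not part of any actual seismic package:

```java
import java.util.List;

// One stage in a seismic processing pipeline: takes a segment, transforms it.
interface Stage {
    float[] process(float[] segment);
}

// Hypothetical example stage: scales every sample by a constant gain.
class GainStage implements Stage {
    private final float gain;
    GainStage(float gain) { this.gain = gain; }
    public float[] process(float[] segment) {
        float[] out = new float[segment.length];
        for (int i = 0; i < segment.length; i++) out[i] = segment[i] * gain;
        return out;
    }
}

public class Pipeline {
    // Pass one data segment through the whole chain of stages in order.
    static float[] run(float[] segment, List<Stage> stages) {
        for (Stage s : stages) segment = s.process(segment);
        return segment;
    }

    public static void main(String[] args) {
        float[] segment = {1f, 2f, 3f};
        // Chaining two gain stages mimics the read -> process -> write flow.
        float[] result = run(segment, List.of(new GainStage(2f), new GainStage(0.5f)));
        for (float v : result) System.out.print(v + " ");
    }
}
```

Each stage sees the output of the previous one, mirroring how a block of data flows through the chain of routines before being written back out.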

Processing speed and productivity can be dramatically improved through the use of parallel processing techniques and distributed network computing. In today's "connected" world, this combination provides a portable, distributable, and parallel object framework for scientific computation.

Sun and Arco have developed an object model for parallel seismic processing, JavaSeis, as a proof-of-concept for Web-based delivery of parallelized data processing services. The model provides services for:

  • Navigating the file system
  • Displaying data sets
  • Building seismic processing flows
  • Updating parameters
  • Storing parameters
  • Creating graphical performance analyses

As the basis of the project, Sun used the Arco Seismic Benchmark Suite (ASBS). In ASBS, seismic data is managed as parallel distributed objects implemented in traditional procedural languages, C and Fortran.

New processes are developed through "inheritance by copying templates," so system changes often require extensive modifications to every copy of a few basic system templates. Parallelism is managed through application programming interfaces (APIs) defined by the services required to implement common geophysical data processing algorithms.
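Java's class inheritance addresses the copy-template problem directly: shared behavior lives in one base class, so a system change is made once rather than in every copied template. A minimal sketch, with invented class names for illustration:

```java
// The base class holds the shared "template" logic exactly once; a change
// here propagates to every subclass, unlike a change to copied templates.
abstract class SeismicProcess {
    // Common framing shared by all processes (the former template body).
    public final float[] apply(float[] trace) {
        float[] out = trace.clone();   // never mutate the caller's data
        transform(out);
        return out;
    }
    // Each concrete process overrides only the part that differs.
    protected abstract void transform(float[] trace);
}

// Hypothetical process: zeroes out a trace.
class Mute extends SeismicProcess {
    protected void transform(float[] trace) {
        for (int i = 0; i < trace.length; i++) trace[i] = 0f;
    }
}

// Hypothetical process: scales a trace by a constant factor.
class Scale extends SeismicProcess {
    private final float factor;
    Scale(float factor) { this.factor = factor; }
    protected void transform(float[] trace) {
        for (int i = 0; i < trace.length; i++) trace[i] *= factor;
    }
}

public class TemplateDemo {
    public static void main(String[] args) {
        SeismicProcess p = new Scale(2f);
        System.out.println(p.apply(new float[]{1f, 2f})[1]); // prints 4.0
    }
}
```

A fix to the shared framing in `SeismicProcess.apply` reaches `Mute`, `Scale`, and any future subclass automatically.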

The implementation of ASBS based on the new technology has proven to significantly improve the portability and maintainability of seismic computing applications, and provides a pathway to the use of network-based parallel computing for production seismic processing.

Language and networking

There are two key components to the new network-computing model: the Java programming language and the "thin client" concept. The language is inherently cross-platform because the compiler generates bytecodes. Bytecodes are architecture-neutral object files designed to transport code to multiple hardware and software platforms.

The Java interpreter can execute bytecodes directly on any system to which it has been ported.

Thus programmers write a program once, and it can run on almost any device from desktop computers, laptops, and screen phones to high-performance servers and mainframes. Any user can access the program from anywhere using any network device, and developers no longer have to port applications to each platform.

The language achieves high performance by translating bytecodes into machine code. For applications requiring large amounts of compute power, the compute-intensive segments can be rewritten in native code and interfaced with the Java environment.

The technology also supports multithreading, the capability to execute multiple concurrent sequences of instructions.
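Multithreading maps naturally onto seismic work, where independent data segments can be processed concurrently. A sketch using the standard `java.util.concurrent` executor; the per-segment sum is a stand-in for a real processing routine:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSegments {
    // Process each segment on its own worker thread and collect the results.
    static float[] processAll(float[][] segments) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Float>> futures = new ArrayList<>();
            for (float[] seg : segments) {
                futures.add(pool.submit(() -> {
                    float sum = 0f;                 // stand-in computation:
                    for (float v : seg) sum += v;   // sum the samples
                    return sum;
                }));
            }
            // Futures come back in submission order, so results line up
            // with their input segments.
            float[] results = new float[segments.length];
            for (int i = 0; i < results.length; i++) results[i] = futures.get(i).get();
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        float[][] segments = {{1f, 2f}, {3f, 4f}};
        float[] sums = processAll(segments);
        System.out.println(sums[0] + " " + sums[1]); // prints 3.0 7.0
    }
}
```

Because each segment is independent, the threads never contend for shared state, which is the property that makes seismic flows a good fit for this model.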

Thin-client computing

The language is the enabling technology behind "thin-client" network computing. Thin-client network computing involves storing or "housing" applications centrally on servers and deploying them to any client device as needed.

This eliminates the need for massive amounts of local processing power, memory, and storage on individual desktop computers. In the thin-client model, software is distributed over the network.

The computer user sees one seamless electronic environment differentiated only by user access rights. System administrators ensure security and privacy by controlling access rights. Equally important, no massive user re-training program is required to transition to the "thin-client" network-computing mode.

Another key advantage of this technology for oil and gas producers is the availability of a complete set of APIs. These simplify the process of creating, viewing, and distributing media-based applications such as 2D and 3D visualizations.

The Java 3D API is aimed at creating a single cross-platform 3D API in the new language. It is layered on top of existing low-level APIs, such as OpenGL and Direct3D, to leverage native acceleration on any given graphics platform.

The API incorporates a high-level scene graph model that allows developers to focus on the objects and scene composition, freeing the programmer from complex coding procedures to specify the scene display.
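The scene graph idea can be illustrated in plain Java; this is a schematic model, not the actual Java 3D `javax.media.j3d` classes. The developer composes a tree of group and leaf nodes, and the renderer walks the tree, so no per-frame display code is needed:

```java
import java.util.ArrayList;
import java.util.List;

// Schematic scene graph: interior Group nodes compose the scene,
// leaf Shape nodes carry geometry. Rendering is a tree traversal.
abstract class Node {
    abstract void render(StringBuilder out);
}

class Shape extends Node {
    private final String name;
    Shape(String name) { this.name = name; }
    void render(StringBuilder out) { out.append(name).append(' '); }
}

class Group extends Node {
    private final List<Node> children = new ArrayList<>();
    Group add(Node child) { children.add(child); return this; }
    void render(StringBuilder out) {
        for (Node child : children) child.render(out);
    }
}

public class SceneGraphDemo {
    public static void main(String[] args) {
        // The developer declares the scene's composition; the traversal
        // handles display, so no explicit display procedure is written.
        Group root = new Group()
                .add(new Shape("horizon"))
                .add(new Group().add(new Shape("well")).add(new Shape("fault")));
        StringBuilder out = new StringBuilder();
        root.render(out);
        System.out.println(out.toString().trim()); // prints "horizon well fault"
    }
}
```

In real Java 3D the same declarative idea applies, with the scene graph built from classes such as `BranchGroup` and rendered by the runtime rather than by a hand-written traversal.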

For the oil and gas industry, Java technology increases performance through parallelized processing, improves collaboration among diverse workgroups, and simplifies the development and rendering of 3D visualization.

Author

Jack Herzog is Director of Global Manufacturing and Energy Markets for Sun Microsystems, and is a 10-year veteran of the oil and gas industry.

Copyright 1999 Oil & Gas Journal. All Rights Reserved.