Lost data
The main problem is that every time data is moved, pathnames change, links get broken, and the data map—where particular datasets are stored and the knowledge of how to access and use them—becomes increasingly fragmented. This approach relies on so-called tribal knowledge of this map in order to be effective over time, but when employee turnover over decades is factored in, that knowledge is almost certain to be lost. When it does inevitably break down, data can no longer be reasonably located and is, in effect, gone.
Expensive seismic surveying data "going dark" is a major cost; either that data will have to be re-collected, or in the case of historic data, it may simply be irretrievably lost and unavailable to future projects and new opportunities.
Simply put, there is a disconnect between the fundamental way data needs to be accessed and the structure of the underlying hardware and storage systems. Engineers and geoscientists need to access the same seismic data today and years from now, a requirement that is at odds with technology upgrades and with storage systems whose physical structure changes and evolves over time.
Infrastructure for data longevity
Fortune 1000 companies and smaller seismic players that serve the offshore industry are expected to confront a cataclysmic change over the next 24 months. Given the exponential growth of the data and the revenues associated with it, companies will have to find a way not only to locate all of this "dark" data, but also to re-architect their computing infrastructure so that research teams can immediately access and collaborate on it just as easily 30 years from now as they can this week. Additionally, they will have to do so cost-effectively in both the short and long term, without compromising scale or performance.
For example, GE Oil & Gas is one of a number of organizations tackling the inconvenient realities of constantly shifting hardware through a virtualization-centric data architecture that creates a unified namespace for large unstructured datasets. These new approaches essentially make data accessible from a single virtual space that does not change over time, even when the datasets are highly distributed across storage media and across the globe. From the user's perspective, this eliminates the heavy dependence on tribal knowledge for data location and preserves access to datasets that would otherwise be lost in the next refresh cycle.
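The idea of a single virtual space that stays fixed while the hardware underneath changes can be illustrated with a minimal Python sketch. This is not the GE Oil & Gas implementation, whose details are not described here; the class `VirtualNamespace`, its methods, and every path and storage location below are hypothetical and chosen only for illustration.

```python
class VirtualNamespace:
    """Maps stable logical paths to physical locations that may change.

    Hypothetical illustration only: a real unified namespace would also
    handle permissions, replication, and persistence of this mapping.
    """

    def __init__(self):
        # logical path -> current physical location
        self._locations = {}

    def register(self, logical_path, physical_location):
        """Record where a dataset physically lives right now."""
        self._locations[logical_path] = physical_location

    def migrate(self, logical_path, new_physical_location):
        """Point an existing logical path at new storage after a refresh."""
        if logical_path not in self._locations:
            raise KeyError(f"unknown dataset: {logical_path}")
        self._locations[logical_path] = new_physical_location

    def resolve(self, logical_path):
        """Return the current physical location for a logical path."""
        return self._locations[logical_path]


# Hypothetical dataset and storage locations.
namespace = VirtualNamespace()
survey = "/seismic/north-sea/survey-1998"

namespace.register(survey, "nas01:/vol3/projects/ns98")
location_before = namespace.resolve(survey)

# A hardware refresh moves the data; the logical path users rely on is unchanged.
namespace.migrate(survey, "objectstore://archive/seismic/ns98")
location_after = namespace.resolve(survey)
```

In this sketch users only ever hold the logical path, so a storage migration updates one mapping entry instead of breaking every saved pathname and link.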
Solutions like these ultimately require a rethink of the modern IT infrastructure. They require oil and gas companies to collaborate with engineering teams to design a new technological approach to managing and optimizing seismic data based on the following tenets:
Mission-critical unstructured datasets. These are precious assets that need to be maintained and re-harvested over their lifespan. Business leaders and heads of engineering need to be actively involved with IT in preserving these datasets and enabling geophysicists to manipulate them at any given time. These new requirements of data access, data longevity, and data location will join the traditional storage considerations: scale, performance, and cost.
IT architecture needs to evolve. Geoscientists and data analysts have a fundamental need for ready access to data, but the IT architecture they depend on has not evolved to meet that need. Today, it is too costly in terms of money and man-hours to keep track of mission-critical seismic datasets over time. Many researchers and engineers lose 45 to 120 minutes a day managing, deleting, and finding unstructured seismic and related data.
Tribal knowledge is an ineffective long-term strategy. Over the course of decades, people who know where data is stored will leave the organization, and knowledge of its location will deteriorate over time. The amount of value organizations can derive from their unstructured seismic datasets depends on hosting the data in a solution that allows end users to find and access data easily and readily—even without tribal knowledge.
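One way to reduce reliance on tribal knowledge is to keep descriptive metadata alongside each dataset, so that users can search by what the data is rather than by where it lives. The sketch below is an assumption about how such a catalog might look, not a description of any specific product; `DatasetRecord`, `DatasetCatalog`, the field names, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Descriptive metadata for one dataset (hypothetical fields)."""
    logical_path: str
    region: str
    survey_year: int
    survey_type: str


class DatasetCatalog:
    """Searchable index of datasets keyed by metadata, not storage location."""

    def __init__(self):
        self._records = []

    def add(self, record):
        self._records.append(record)

    def find(self, **criteria):
        """Return records whose fields equal every given criterion.

        Passing a field name that DatasetRecord does not define raises
        AttributeError.
        """
        return [
            record
            for record in self._records
            if all(getattr(record, name) == value
                   for name, value in criteria.items())
        ]


# Hypothetical sample entries, for illustration only.
catalog = DatasetCatalog()
catalog.add(DatasetRecord("/seismic/north-sea/survey-1998", "North Sea", 1998, "3D"))
catalog.add(DatasetRecord("/seismic/north-sea/survey-2011", "North Sea", 2011, "4D"))
catalog.add(DatasetRecord("/seismic/gulf/survey-2005", "Gulf of Mexico", 2005, "3D"))

# A new employee can locate data without knowing who stored it or where.
north_sea = catalog.find(region="North Sea")
north_sea_3d = catalog.find(region="North Sea", survey_type="3D")
```

Because the catalog returns logical paths rather than physical ones, the search results remain valid even after the underlying storage is replaced.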