What exactly is Big Data?
As a data scientist, I am fascinated by data. With enthusiasm, I dive into the raging sea of data, the so-called Big Data, and search for pearls, the insights that are rarely found directly on the surface. What secrets are hidden in the unimagined depths of the sea of data?
Explaining my profession to other people is not always easy. I have often noticed that people interpret the term “Big Data” differently. I stick to Gartner’s definition with the 3Vs: Volume, Variety, and Velocity. Volume refers to the sheer amount of data: quantities so large that they can no longer be handled by conventional methods. In industry, data come from all levels of the automation pyramid, from the field level to the corporate management level, and also from across the entire value chain. Naturally, large quantities accumulate.
Variety: the data vary depending on the source. At the field level, the data is structured, for example measured values. But there is also unstructured data, such as texts from documentation or multimedia data like pictures, videos and sound analyses. These different types of data all contain information and need to be processed.
Velocity: the data must be acquired and processed extremely quickly in order to make decisions.
In the meantime, some more Vs have been added to the definition. Of course, you can continue that indefinitely, but there is one more V that I personally find very important: validity. Data quality becomes a problem if the collected data is incorrect or incomplete.
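To make the validity point concrete, here is a minimal sketch of a basic quality check on a series of readings. It is only an illustration under assumptions: the function name `find_quality_issues`, the record layout (timestamp, value) and the one-minute interval are hypothetical and not taken from any particular system.

```python
from datetime import datetime, timedelta


def find_quality_issues(records, expected_interval):
    """Return (missing, gaps) for (timestamp, value) records sorted by time.

    missing: timestamps whose value is None (recorded, but incomplete).
    gaps:    (previous, current) timestamp pairs that lie further apart
             than the expected sampling interval.
    """
    missing = [t for t, v in records if v is None]
    gaps = [
        (prev_t, t)
        for (prev_t, _), (t, _) in zip(records, records[1:])
        if t - prev_t > expected_interval
    ]
    return missing, gaps


# Hypothetical readings, nominally one per minute.
records = [
    (datetime(2024, 1, 1, 12, 0), 1.0),
    (datetime(2024, 1, 1, 12, 1), None),  # value was not recorded
    (datetime(2024, 1, 1, 12, 2), 1.2),
    (datetime(2024, 1, 1, 12, 5), 1.3),   # jump of three minutes
]
missing, gaps = find_quality_issues(records, timedelta(minutes=1))
print(missing)
print(gaps)
```

A check like this does not repair anything. It only makes incomplete values and undocumented gaps visible so that they can be documented or followed up.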
To what extent is Big Data used in the oil and gas industry?
Big Data is used in various areas of the oil and gas industry, such as process optimization. Using historical data, models are trained to analyze how processes run. The questions I want to explore are: What information can be extracted from the collected data? What are the consequences? How can the process be optimized? In many cases, optimized operating parameters can be determined. Processes are also examined in terms of energy optimization: how can they be controlled better to use energy more effectively? Another example is the use of robots, mainly in situations that could become dangerous for humans, e.g. to detect leaks in gas pipes. The infrastructure can be monitored by drones. Evaluations of unstructured data, such as maintenance reports or operator logs, can also be used to support decisions.
Operator logs are used, for example, to check which actions the operators actually perform. In industry, there are often so-called SOPs (standard operating procedures), which prescribe how the operator should proceed. In reality, however, things often look different. Operators rely on their experience, and what they actually do is recorded in the event and alarm data. In addition, operator logs and diaries record anything unusual that has happened in the plant. The aggregated data can be evaluated to analyze how operators actually proceed in certain situations and thus to improve the SOPs or, if necessary, automate steps later.
Companies do not fully exploit the potential of Big Data
Even though the enormous potential of Big Data has largely been recognised, in practice it is not fully exploited. Especially in industry, there is still a lot to do. What is the reason for this? This is where wishful thinking meets bureaucracy. The structures are not yet as developed as they should be for the use and analysis of Big Data. There are data silos: each department maintains its own databases. However, in order to use the full potential, one would have to merge all the data. This creates transparency, uncovers weaknesses and promotes productive initiatives. It is the only way to get the big picture.
Data quality is also a shortcoming. In practice, data gaps are not documented or timestamps are not recorded, so that data can no longer be assigned correctly. My favourite example: you have process data and laboratory data that must be assigned to a process state. Here it is of utmost importance to note the exact time of data recording. If the person who takes the sample does not write down exactly when it was taken, it is difficult or even impossible to assign it. Then the data is useless. You have to make people aware that accurate data collection and documentation is extremely important. If the measurement is scheduled for 1 pm and you are late, there is often some cheating: people do not want to admit and record the human “fault”. As a result, in case of doubt, not much more can be done with the measurement.
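The assignment problem described above can be sketched in a few lines. This is a minimal illustration, assuming sorted process timestamps and a hypothetical tolerance of five minutes. The function `assign_samples` and all values are invented for the example and do not describe a specific tool.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def assign_samples(process_times, sample_times, tolerance):
    """Match each lab sample to the nearest process timestamp.

    process_times must be sorted in ascending order. Returns a list of
    (sample_time, matched_process_time) pairs; the match is None when no
    process record lies within the tolerance.
    """
    matches = []
    for s in sample_times:
        i = bisect_left(process_times, s)
        # Only the neighbours directly before and after the sample can be nearest.
        candidates = process_times[max(i - 1, 0):i + 1]
        best = min(candidates, key=lambda p: abs(p - s), default=None)
        if best is not None and abs(best - s) > tolerance:
            best = None  # too far away: the sample cannot be assigned
        matches.append((s, best))
    return matches


# Hypothetical process records every 15 minutes from 12:00 to 13:30.
start = datetime(2024, 1, 1, 12, 0)
process_times = [start + timedelta(minutes=15 * k) for k in range(7)]

# One sample taken slightly late but honestly recorded, one far off the grid.
samples = [datetime(2024, 1, 1, 13, 2), datetime(2024, 1, 1, 13, 40)]
result = assign_samples(process_times, samples, timedelta(minutes=5))
print(result)
```

The sketch only works if the sample time is the real one. A sample taken at 13:20 but written down as 13:00 would be matched to the wrong process state without any warning, which is exactly why the recorded time matters.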
For all those who would like to delve deeper into the topic of Big Data applications: the first three sheets of the guideline VDI/VDE 3714 “Implementation and Operation of Big Data Applications” have just been published as drafts (the so-called green print), and another four sheets will follow. As a member of the guideline committee, I was actively involved in preparing the guideline, and I must say that it really does provide a good and comprehensive overview of the current findings from practice and science. It is also tailored to the issues facing the manufacturing industry. We have attached great importance to giving even newcomers to this field a good introduction and have illustrated many aspects with vivid examples from practice. It is worth taking a look.
The next big trend in Big Data
Data architecture continues to be a major trend. How is data stored? Where is the data stored? How is data integrated to break up silos and make it available for larger applications?
One important topic is IIoT: there will be more and more intelligent devices and sensors that can be used in industry alongside the control system, keyword “NOA”. Moving away from the automation pyramid, data can be integrated, visualised and evaluated. An example is the integration of sushi sensors. This is a very exciting tool with which a trend analysis can be made very easily. But the potential here also lies in linking the sensor data with other data. Our new JOIN solution can be tailored individually to applications in the field of IIoT. Here JOIN serves as an open collaboration platform for various devices, with a cloud structure and scalable data storage as well as an integrated data analysis platform and a user-friendly dashboard.
Last but not least: data analysis, such as predictive analytics. What developments lie ahead? How can you plan ahead, e.g. in the area of maintenance?
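As a rough idea of what a simple trend analysis on sensor readings can look like, here is a minimal sketch that fits a straight line by least squares and reports its slope. It is a generic illustration, not the method used by any particular sensor or platform. The function `trend_slope` and the readings are hypothetical.

```python
def trend_slope(times, values):
    """Least-squares slope of values over times (change per time unit)."""
    n = len(times)
    if n != len(values) or n < 2:
        raise ValueError("need at least two (time, value) pairs of equal length")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    # Sum of squared deviations of the time axis; zero if all times are equal.
    s_tt = sum((t - mean_t) ** 2 for t in times)
    if s_tt == 0:
        raise ValueError("all time values are identical")
    s_tv = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return s_tv / s_tt


# Hypothetical readings: elapsed hours and a measured value at each time.
hours = [0, 1, 2, 3, 4]
readings = [1.0, 1.2, 1.4, 1.6, 1.8]
slope = trend_slope(hours, readings)
print(f"trend: {slope:.2f} per hour")
```

A steadily positive slope over a longer window is the kind of signal that can then be linked with other data, for example maintenance records, before drawing conclusions.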
There is a lot to do in the wide field of Big Data. What stage do you think your company is currently at? I welcome comments here on the blog or via our social media channels.
Photo credits: sdecoret-stock.adobe.com