Big Data is a term that has lately been thrown around a lot. From retail management systems focused on staple goods stores to enterprise data management solutions for multinational organizations, everyone seems to be insistent on employing big data tools and techniques. This article focuses on demystifying the need for big data and the set of challenges it comes with.
It is safe to assume that nearly everyone in the industry today has heard the term “big data”. What exactly constitutes big data is a tad more ambiguous, so a brief summary is in order. At a glance, big data is the all-encompassing term for traditional data and the data generated beyond traditional data sources. In a plant’s context, traditional data can be split into two streams: OT data and IT data. OT data for a plant consists of alarms and events data, data historians, etc., while IT data is made up of ERP data primarily covering production, procurement, and access logs.
Big data, then, is data that both includes and goes beyond this structured, periodically stored data. For one thing, while traditional IT/OT data is stored in its own unique systems and structures, big data is “multistructured”: a big data system has the knowledge management tools needed to access data from different origins and contextualize it for analyses and reports. So a major milestone for an effective industrial big data system is the integration of IT/OT data. But it doesn’t stop there. A plant has numerous other potential data points that are either not captured in any specific system (e.g. shift logs, personnel reports, audits) or not monitored at all (e.g. machine vibrations, planning inefficiencies, environmental variables). Addressing this unmonitored data and including its impact on decisions is another important consideration for big data systems.
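To make the idea of IT/OT integration more concrete, the following is a minimal sketch of contextualizing historian readings (OT) with ERP production orders (IT) by time. Everything in it is an assumption made for illustration: the record types `HistorianSample` and `ProductionOrder`, the tag name `TT-101`, the order IDs, and the values are hypothetical, and the lookup assumes that production orders do not overlap in time. A real integration would read from the plant's actual historian and ERP interfaces.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class HistorianSample:
    """OT side: one reading from a data historian (hypothetical shape)."""
    timestamp: datetime
    tag: str
    value: float


@dataclass(frozen=True)
class ProductionOrder:
    """IT side: one ERP production order (hypothetical shape)."""
    order_id: str
    product: str
    start: datetime
    end: datetime


def contextualize(samples, orders):
    """Pair each sample with the order running at its timestamp, or None.

    Assumes orders do not overlap; an order covers start <= t < end.
    """
    ordered = sorted(orders, key=lambda o: o.start)
    starts = [o.start for o in ordered]
    result = []
    for sample in samples:
        # Index of the last order that started at or before the sample.
        i = bisect_right(starts, sample.timestamp) - 1
        running = i >= 0 and sample.timestamp < ordered[i].end
        result.append((sample, ordered[i] if running else None))
    return result


# Made-up example data: two back-to-back orders and three readings.
orders = [
    ProductionOrder("PO-1001", "Grade A",
                    datetime(2024, 1, 1, 6), datetime(2024, 1, 1, 14)),
    ProductionOrder("PO-1002", "Grade B",
                    datetime(2024, 1, 1, 14), datetime(2024, 1, 1, 22)),
]
samples = [
    HistorianSample(datetime(2024, 1, 1, 7, 30), "TT-101", 81.2),
    HistorianSample(datetime(2024, 1, 1, 15, 0), "TT-101", 95.7),
    HistorianSample(datetime(2024, 1, 1, 23, 0), "TT-101", 60.0),
]

joined = contextualize(samples, orders)
for sample, order in joined:
    label = order.order_id if order else "no active order"
    print(sample.timestamp, sample.tag, sample.value, label)
```

Once each reading carries its production context, questions such as "which product grade was running when this temperature spiked" can be answered without manually cross-referencing two systems. The third reading falls outside both orders and is deliberately left unmatched rather than guessed.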
That doesn’t mean big data is only concerned with the storage and acquisition of all data either. Big data systems need to be able to quickly address and analyze data on demand, without being affected by the scale and pace of data acquisition and querying. This is called the scalability of big data and is one of the first concerns for big data systems. Other concerns include system reliability (the ability to deliver consistent performance at all times) and decision support for real-time analyses. These analyses also include machine learning and AI, which can be enormously beneficial in picking out data anomalies, predicting the future behavior of production and equipment, improving forecasts, and providing significantly more detailed scenarios for decision support. The best part is that big data systems are designed to perform most of these analyses on real-time data, using simpler algorithms to pick out the datasets that need further analysis, regardless of the scale and speed of data ingestion.
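As an illustration of using a simpler algorithm to pick out the data that needs further analysis, here is a minimal sketch of a rolling z-score screen over a stream of readings. It is one possible screening technique, not a description of any particular product. The function name `screen`, the window size, the threshold, and the vibration values are all hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def screen(values, window=5, threshold=3.0):
    """Yield (index, value) for readings far from the recent rolling mean.

    A reading is flagged when it differs from the mean of the previous
    `window` readings by more than `threshold` standard deviations.
    Nothing is flagged until the window has filled.
    """
    recent = deque(maxlen=window)
    for i, value in enumerate(values):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            # A flat window (sigma == 0) gives no scale to compare against.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield i, value
        recent.append(value)


# Made-up vibration readings with a single spike at index 6.
vibration = [1.0, 1.1, 0.9, 1.0, 1.1, 1.0, 5.0, 1.0, 0.9, 1.1]
flagged = list(screen(vibration))
print(flagged)  # [(6, 5.0)]
```

A screen like this is cheap enough to run on every incoming reading, so only the flagged segments need to be handed to heavier models. One limitation of this simple version is that the spike itself enters the rolling window and widens the band for the next few readings, so a more robust statistic may be preferable in practice.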
Once the context for big data has been established, determining the need for it becomes a relatively simple task. While most integrators and solution providers will tell you that you really do need big data (and this claim is generally true), when and how you need it is a more nuanced matter. Big data integration isn’t something anyone can jump right into, as it requires extensive effort and commitment from the entire organization, not just the IT team implementing it. At the granular level, there are pockets of information that are either invisible to the organization or sometimes intentionally kept secret to avoid a “Hawthorne effect”. Whether these factors are taken into consideration can mean the difference between merely spending a million and actually saving millions more from that investment.
For a plant operator looking to upgrade an obsolescence-ridden DCS, integrating a big data system for a more holistic view of the plant is a task beyond both the budget of the plant’s teams and the scope of that single plant. Similarly, where the immediate need is to enhance the speed and capability of data collection using newer technology, e.g. by employing a data historian with real-time tracking, trending, and monitoring along with the ability to perform analyses, it makes little sense to employ a big data solution for such a limited scope.
The primary consideration for all such upgrades needs to be the results desired from the undertaking. Consider a scenario where the data needs to be stored locally for security or privacy reasons, and the existing data is already stored in a structured form and only needs to be analyzed and contextualized with other similar data, e.g. through an OPC server. Here, a solution based on remote analysis and visualization using dashboards and data connectors is going to be a lot more effective, both financially and in terms of the strain that implementation places on the organization, while delivering decision support similar to that of a big data implementation.
Many industries are currently at a stage where simply connecting their disparate data sources and providing analyses and insights on that data can deliver multifold boosts in productivity and efficiency. That doesn’t mean big data solutions don’t fit into the picture at all. Soon, the early adopters of big data will have enough of a competitive edge over those relying solely on integrated traditional data analyses to outperform them in nearly all markets. As they stand today, however, most industries need traditional data analyses as much as big data.
Some of the biggest challenges of big data come in planning the upgrade to it. An extensive, all-inclusive solution that scales continuously to integrate newer data sources must be designed so that future inclusions and upgrades do not affect functionality or performance. For most organizations this means switching their services to the cloud, upgrading their systems across the board for better monitoring and logging of data, and almost always increasing the human capital with the skill and capability to successfully implement big data solutions across all departments and functions. Organizations that choose on-premises solutions for security or other concerns also need to consider the significantly higher costs of maintaining in-house data servers with a dedicated system support team, and even then the scalability of such systems isn’t always as effective as that of cloud-based deployments.
Companies working with IIoT and big data solutions have a vested interest in pushing big data solutions. And while a big data solution is always going to be more beneficial in the long term, if the existing systems have gaps that can be filled with a better organized approach to traditional data management and analyses, it makes a lot more sense to implement a traditional data acquisition, trending, and monitoring solution, which would also offer a much better return on cost. If anything, a thoroughly planned traditional data analysis solution needs to be a precursor to implementing a big data solution. Only then can an organization really see what its big data systems need to be able to achieve.
Qasim Maqbool – Principal Platform Engineer Industrial Intelligence Solutions
Ahmed Habib – Marketing Manager
INTECH Process Automation