Article

Data Science in the industry: Kaggle applications and datasets

The use of Data Science, already so widespread in other areas, is still rare in industry. Get to know IHM Stefanini's profile on Kaggle and access real datasets!

Project Data

Much is said today about Data Science, Machine Learning and Artificial Intelligence. Examples of successful applications are easy to find, mainly in retail (churn prediction and recommendation systems), banking (fraud detection) and technology (search systems).

There is also an extensive number of online courses, many of them very interesting and practical, that walk through examples of how the algorithms are applied to world-famous datasets (such as the Iris dataset).

The graph below illustrates the types of algorithms most used by data scientists according to Kaggle:

With this sea of information, it is not hard to feel lost, or to feel that we are going nowhere. This is especially true when we wake up in the morning, excited to start studying some machine learning algorithm, and are then bombarded by cool posts on LinkedIn or our favorite high-tech news feed. Nothing seems to make sense anymore and we end up setting that study aside. Who has never been in this situation?

So far we have said nothing about industry, because in this sector the picture is quite different.

While other sectors offer plenty of references on Data Science applications, in industry little information is available, whether databases, success cases or implementation details. This shortage even raises the question of whether the technology can be used in the industrial environment at all.

The scarcity of references has several causes. First, there is a cultural factor: industry (especially basic industry: mining, steel, oil and gas) tends to be more conservative about adopting new technologies, partly to avoid the risk of accidents or production losses from trying something "little known or validated" in the industrial field. Second, existing cases often cannot be disclosed due to intellectual property issues.

But the fact is that the use of Data Science in industry is possible, and some companies in this field have already invested in it. In recent work for a large mining company in Brazil, we used anomaly detection and linear regression techniques to detect leaks in an ore transport pipeline. In another project, for Anglo American, we used deep learning algorithms to predict quality at one stage of the ore beneficiation process. One last example: at Nexa Resources (formerly Votorantim), we implemented an entire Big Data infrastructure to process the various data from one of its manufacturing units.
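The pipeline case above combines a regression baseline with an anomaly threshold. The following is only a minimal sketch of that general idea, not the method used in the actual project: the data is synthetic, and the variable names (inlet and outlet flow), the 0.98 transfer ratio and the 4-sigma threshold are all assumptions made for illustration.

```python
import numpy as np


def fit_baseline(inlet, outlet):
    """Fit outlet ~ slope * inlet + intercept on leak-free history.

    Returns the line parameters and the standard deviation of the
    residuals, which serves as the noise scale for the threshold.
    """
    slope, intercept = np.polyfit(inlet, outlet, deg=1)
    residuals = outlet - (slope * inlet + intercept)
    return slope, intercept, residuals.std()


def flag_leaks(inlet, outlet, slope, intercept, sigma, k=4.0):
    """Flag samples whose outlet flow is more than k * sigma below the prediction."""
    residuals = outlet - (slope * inlet + intercept)
    return residuals < -k * sigma


rng = np.random.default_rng(0)

# Synthetic leak-free history (hypothetical units and values).
inlet_train = rng.uniform(800.0, 1200.0, size=500)
outlet_train = 0.98 * inlet_train + rng.normal(0.0, 2.0, size=500)
slope, intercept, sigma = fit_baseline(inlet_train, outlet_train)

# New data: a simulated leak removes 30 flow units in the last 50 samples.
inlet_new = rng.uniform(800.0, 1200.0, size=200)
outlet_new = 0.98 * inlet_new + rng.normal(0.0, 2.0, size=200)
outlet_new[150:] -= 30.0

flags = flag_leaks(inlet_new, outlet_new, slope, intercept, sigma)
print("flagged during leak:", int(flags[150:].sum()), "of 50")
print("flagged before leak:", int(flags[:150].sum()), "of 150")
```

In a real pipeline the baseline would likely need more inputs (pressure, density, transport delay between sensors), and choosing them is exactly where process knowledge matters.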

To increase the amount of information available and raise the level of discussion, IHM Stefanini has created its profile on Kaggle, the world's largest online community of data scientists. In this profile, real industry data has been made available in a dataset that was reviewed by the Kaggle team itself. More data will be made available soon, along with the approaches used and the code developed. We hope to contribute to the growth of Data Science in the industrial environment and to provide more learning, including for ourselves, by exchanging information with data scientists around the globe.

Context and Challenges

The success of a project depends, however, on factors that go far beyond data access and programming skill. In 2017, a Gartner report estimated that 60% of analytics and big data projects fail. But the reality is even more serious: according to Gartner analyst Nick Heudecker, Gartner was "very conservative" with that estimate. The actual rate of failure, he said, would be close to 85%.

We believe that the high failure rate of analytics projects is related to a very important issue called Domain Knowledge: the specific knowledge and mastery of the processes we are dealing with. In industry, such processes are usually highly complex, and knowledge of them tends to be restricted to relatively few people. In the projects we took part in, it was precisely the domain knowledge of the teams involved that made the most difference. The experience accumulated over the years was of great value in building each solution, always in line with the needs of the client. The old maxim still holds:

          "Simple solutions to simple problems, complex solutions to complex problems."

At the end of the day, it is possible to apply Data Science to industry, regardless of the technology and programming language used (which may even mean plain linear regression in some cases). What makes the real difference is the team's knowledge of the industry's common problems.

Experts

Head of Data Science & AI

Eduardo Magalhães

January 25, 2021

published by

Head of Data Science & AI

Eduardo Magalhães
