19 Apr Machine Learning is Dead – A Hybrid AI Solution is Needed for Global Industry…
When people take a detailed tour of a real Machine Learning (ML) system for the first time, they are often disappointed. We know that we were.
Data on expensive, high-consequence failures would be particularly valuable to us, because we may be able to prevent those failures from happening, or to reduce the (high) cost of maintenance by doing it differently. The problem (Resnikoff’s conundrum) is this: we need failure data to analyse trends and to improve maintenance, but our existing maintenance schedules probably do a good job of preventing failures. This is especially true of maintenance intended to prevent high-consequence failures. The overall result is that while there may be plenty of failure data for failures that don’t matter, there is very little for the failures we actually need to prevent.
We mentioned the Resnikoff conundrum because it is directly related to the way that machine learning works. The reaction that many people with any statistical background have is something like this: “Oh, so it just produces a fancy correlation matrix?” And that really is all that most ML does: it looks for correlations in data and expresses them as a series of functions.
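To make that intuition concrete, here is a minimal sketch of the “fancy correlation matrix” in question; the column names and values are purely illustrative assumptions, not real plant data:

```python
# The "fancy correlation matrix" intuition, taken literally: correlate
# candidate inputs with the outcome. Columns and values are illustrative.
import pandas as pd

df = pd.DataFrame({
    "vibration_mm_s": [2.1, 2.3, 7.8, 2.0, 6.9],
    "bearing_temp_C": [65, 66, 91, 64, 88],
    "run_hours":      [1200, 3400, 8100, 600, 7700],
    "failed":         [0, 0, 1, 0, 1],
})

# Which inputs move together with the outcome?
print(df.corr()["failed"])
```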
Those who sell AI tools often don’t place much emphasis on how the correlations are identified; perhaps glossing over that part adds an air of mystery or magic. However, it’s very relevant for the type of data we’re dealing with and the kinds of output we would like to get.
The general method is to expose a network to a training data set. The training data contains a range of inputs of different types; in our case they could be anything: sensor readings, run hours, temperatures, weather, which shift is operating, ERP event data, absolutely anything that’s available. Some of the parameters will turn out to be irrelevant; we’re hoping that at least some of them determine the output. Crucially, each record in the training data set must also contain the output. The ML system then determines the correlation between the inputs we present and the output given (pass/fail, failing/OK, probability of failure, whatever it might be).
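As a hedged sketch of what such a training set looks like in code (the feature names, values and choice of classifier are our illustrative assumptions, not a prescription):

```python
# Minimal supervised-learning sketch: every training record pairs inputs
# (sensor readings, run hours, shift, ...) with a known output.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row: [vibration_mm_s, bearing_temp_C, run_hours, shift_id]
X_train = np.array([
    [2.1, 65.0, 1200, 0],
    [2.3, 66.0, 3400, 1],
    [7.8, 91.0, 8100, 0],   # a recorded failure
    [2.0, 64.0,  600, 2],
])
# The output must be present for every record: 0 = OK, 1 = failed.
y_train = np.array([0, 0, 1, 0])

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)          # learn input-output correlations

# Score a new, unlabelled observation: estimated probability of failure.
print(model.predict_proba([[7.5, 89.0, 7900, 1]])[:, 1])
```

The particular classifier doesn’t matter for the argument: whatever the algorithm, every training record must carry its label.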
If the learning input to the ML algorithm has to contain both inputs and known outputs, it follows that to predict failures we must have both the input data and failure records. That’s where Resnikoff comes in. We will have lots of experience of failures that don’t matter, but precious few records for the failures that do. So our learning data set will be either empty or too sparse to be useful unless we can aggregate data from thousands of users, and it would take time (months, years or more) to gather the data needed.
We can’t imagine any but the most adventurous deciding to destroy components just to harvest a useful learning data set. However, in addition to any data they can extract from clients, Original Equipment Manufacturers have the time, resources and motivation to write off equipment in a laboratory to get failure trend data where it’s economically justified. The other issue is that OEMs hold data only for their own equipment, not for the wider system to which that equipment is actually connected.
How many failure records would be needed is an interesting question. One record in the learning data set is just about never enough. For a classification that is hit-you-between-the-eyes obvious, perhaps 30-50. For subtle patterns in the presence of a lot of noise, hundreds, thousands or more. It’s no coincidence that ML and Big Data are so closely related, or that ML algorithms, which had existed for decades, really only came into their own when computer systems had the power to collect, store and process millions or billions of records.
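A small synthetic sketch (the record counts and distributions are our assumptions) shows why sparse failure data is so treacherous: with only a handful of noisy failure records, a model can post an impressive accuracy score while catching none of the failures that matter:

```python
# Sketch of the sparse-failure problem with synthetic data: 1,000 healthy
# records but only 3 failures, and a failure signature that is subtle
# relative to the noise. All numbers are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_ok   = rng.normal(loc=2.0, scale=1.0, size=(1000, 1))  # healthy vibration
X_fail = rng.normal(loc=3.0, scale=1.0, size=(3, 1))     # subtle failure shift
X = np.vstack([X_ok, X_fail])
y = np.array([0] * 1000 + [1] * 3)

model = LogisticRegression().fit(X, y)

# With so few failure examples the model tends to predict "healthy" for
# everything, yet still reports roughly 99.7% accuracy. High accuracy here
# says nothing about catching the failures we care about.
print("accuracy:", round(model.score(X, y), 3))
print("failures caught:", int(model.predict(X_fail).sum()), "of 3")
```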
The limitations of the available data are the reason we suggest a hybrid system combining two types of Artificial Intelligence (AI): Rule-Based and Machine Learning. You can see the difference by thinking about trying to teach a computer to play chess in two different ways. One way is to give the system a list of rules (start with the board like this… pawns can move like this… this is how you capture pieces…). This is the rule-based system. The other way is to say nothing about the game, but let the system watch hundreds of chess games and deduce what you can and can’t do. This is the machine learning system.
The rule-based chess learning system gets a great start: from the beginning it knows how to move pieces, and it knows all the possible moves. However, it hasn’t watched any games, so it doesn’t know how a human player would typically move.
The machine learning system is going to take a while to get going. It will learn how the individual pieces move fairly quickly, and it will probably learn “typical” moves in different situations. But it will take a very long time to encounter obscure rules like promotion, stalemate or the rules for a draw.
In our reliability environment, we have a system watching an input stream from sensors, other condition monitoring, start/stops, ERP events and more. We would like the system to tell us what maintenance needs to be done and when. So learning from scratch isn’t an option here: the system has to start at day 1 with a maintenance plan that is at least as good as whatever came before. Hence the use of a rule set from the moment the system goes live (sketched in code after the list below):
- “Change this seal every 2 years.”
- “Look out for high vibration on this bearing and change it within two weeks.”
- “Test this level switch every six months…”
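As a minimal sketch of how such a starting rule base might be represented (the field names, intervals and thresholds are illustrative assumptions, not Relmar’s actual schema):

```python
# Illustrative representation of a starting rule base as plain data.
from dataclasses import dataclass
from typing import Optional

@dataclass
class MaintenanceRule:
    asset: str
    action: str
    trigger: str                          # "calendar" or "condition"
    interval_days: Optional[int] = None   # for calendar-based rules
    condition: Optional[str] = None       # for condition-based rules
    deadline_days: Optional[int] = None   # act within N days of the condition

rule_base = [
    MaintenanceRule("pump-01 seal", "replace", "calendar",
                    interval_days=730),
    MaintenanceRule("pump-01 bearing", "replace", "condition",
                    condition="vibration > 7.1 mm/s", deadline_days=14),
    MaintenanceRule("tank-03 level switch", "function test", "calendar",
                    interval_days=182),
]
```

Representing rules as data rather than as code is what lets the ML side inspect them, and later propose changes to them.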
The rule base is pretty comprehensive, but we know it won’t be completely right, and we won’t have considered everything. That’s where the ML comes in. Instead of looking at all the data and somehow using it to construct a maintenance schedule from scratch, it looks for exceptions to the rules: things that aren’t working according to the rules, and events that weren’t expected.
Having found the exceptions, and perhaps with human authorisation, the system then starts to change the rule base. The motivation for using two types of system is to get stability and flexibility in the same package.
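Illustratively, that loop might look something like the sketch below; the event shapes, the interval-shortening heuristic and the explicit approval step are our assumptions about one way to structure it:

```python
# Sketch of the exception-handling loop: the ML side flags events the rule
# base did not expect; a human approves any rule change. All names and
# numbers are illustrative assumptions.

def find_exceptions(rule, failure_events):
    """Failures that occurred inside the rule's maintenance interval are
    exceptions: the rule should have prevented them."""
    return [e for e in failure_events
            if e["asset"] == rule["asset"]
            and e["days_since_maintenance"] < rule["interval_days"]]

def propose_update(rule, exceptions, margin=0.8):
    """Suggest a shorter interval; a human operator decides."""
    earliest = min(e["days_since_maintenance"] for e in exceptions)
    return {**rule, "interval_days": int(earliest * margin),
            "status": "pending human authorisation"}

rule = {"asset": "pump-01 seal", "action": "replace", "interval_days": 730}
events = [{"asset": "pump-01 seal", "days_since_maintenance": 410}]

exceptions = find_exceptions(rule, events)
if exceptions:
    # With these illustrative numbers: interval shortened from 730 to 328
    # days, held as a proposal until a human signs it off.
    print(propose_update(rule, exceptions))
```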
To summarise, we can see why data science and Machine Learning in isolation are of limited use for the asset management of physical assets. They cannot overcome the Resnikoff conundrum: we will have plenty of data for inconsequential failures but precious little for the critical failures that affect our corporate bottom line and our brand reputation. So the next time you are approached to adopt a predictive maintenance or data-science solution, read carefully what it says on the tin to make sure you are getting value for money, because what is really happening is a data-gathering exercise that tells you pretty much what you already know (except you will probably get a great dashboard). The question is: is it worth the cost?
With companies like Relmar, however, the rule base changes the whole game. Suddenly you have a system working to eliminate your critical failures, not just informing you of “possible” issues derived from “data insights” as many data-solution providers do, getting you ahead of the curve and giving you a real competitive advantage. Don’t settle for being told when failures may occur; insist on solutions that prescribe actions to eliminate those potential failures.
We are always open for no-obligation, exploratory chats. Contact us at contact@relmar.co.uk