How RELMAR Supports APM in Maritime – The Next Step Change
04 May
We think it is fair to say that Asset Performance Management (APM) represents the future of managing marine assets in our industry. In this article, we review the reliability methodologies that are applicable to APM and propose a process for deploying the appropriate tools at the appropriate stages, tailored to the regulated maritime and marine offshore industries.
There is a lot of work going on in this space to develop systems to anticipate failure based on health/condition. It is a common topic of discussion, which straddles ‘big data’ (people already have more data than they know what to do with) and ‘statistical analysis’. There are many sensors out there to capture data, many tools to make sense of it, and many systems to operate in real time.
To apply this effectively, though, we must have a clear understanding of what we are trying to prognose. For humans, the best tool we have is a doctor; we doubt that would be appropriate for a gas turbine or an azimuth thruster! This is where our reliability analysis and management tools are useful.
What RELMAR aims to provide to the Industry
RELMAR acts as your cost-effective mobile Asset Performance Management department: we take all the perceived complexity of Systems Engineering and its sub-discipline, Reliability Engineering, and deliver simplified solutions to the marine crew on board the asset. This can range from delivering a set of optimized maintenance schedules for ship systems, in part or for the whole vessel, to producing Reliability Block Diagrams for DP systems so that potential critical failures are dealt with proactively, before the vessel loses heading or position. We have a number of reliability tools, techniques and activities at our disposal for this. In addition, we can provide a closed-loop failure reporting, analysis and corrective action system (FRACAS), fleet- and enterprise-wide. We show what data to capture and how; we make sense of it and deliver back to you actionable insights for improved asset performance.
We do the donkey work so that you, as the Client, can spend more time doing what you do best, whether that is transporting containers globally, drilling subsea in ultra-deepwater locations or laying pipelines for a greenfield or brownfield development project.
Later in the article, we show what APM entails and the work RELMAR does to produce simplified deliverables for the offshore crew and for shore-based technical personnel.
Introduction to our Reliability Engineering Methods
Reliability engineering is a discipline that combines practical experience, maintenance, safety, physics and engineering. Observational data is combined with experience to create models in order to understand the behavior of the equipment, optimize its performance and minimize the life cycle/operational costs. It is important to note that reliability engineering is not simply statistics and it is not always quantitative. Even though quantitative analysis plays a major role in the reliability discipline, many of the available tools and methods are also process-related. It is therefore useful to separate these methods and tools into quantitative and qualitative categories.
In the quantitative category, the typical tools are:
- Life Data Analysis (a.k.a. “Distribution Analysis” or “Weibull Analysis”)
- Reliability Growth Analysis
- Accelerated Testing (a.k.a. “Life-Stress Analysis”)
- System modelling using Reliability Block Diagrams (RBDs)
- Fault Tree Analysis (FTA)
- Design of Experiments (DOE)
- Standards-based Reliability Predictions (e.g., MIL-HDBK-217)
In the qualitative category, the typical tools are:
- Failure Modes, Effects and Criticality Analysis (FMEA/FMECA)
- Reliability Centered Maintenance (RCM)
- Failure Reporting, Analysis and Corrective Action Systems (FRACAS)
- Root Cause Analysis (RCA)
In this article, we will focus on some of the reliability engineering tools that are the most applicable in asset performance management. This will include a discussion of how and when each method should be deployed in order to maximize effectiveness.
The APM Process
Understanding when, how and where to use the wide variety of available reliability engineering tools will help to achieve the reliability mission of an organization. This is becoming more and more important with the increasing complexity of systems and sophistication of the methods available for determining their reliability. With increasing complexity in all aspects of asset performance management, it becomes a necessity to have a well-defined process for integrating reliability activities. Without such a process, trying to implement all of the different reliability activities involved in asset management can become a chaotic situation in which reliability tools may be deployed too late, randomly or not at all. This can result in the waste of time and resources as well as a situation in which the organization is constantly operating in a reactive mode.
The process proposed in this article is based on the Define, Measure, Analyze, Improve and Control (DMAIC) methodology that is widely used in Six Sigma for projects aimed at improving an existing business process. It includes five phases:
- Define the problem, the voice of the customer and the project goals.
- Measure key aspects of the current process and collect relevant data.
- Analyze the data to investigate and verify cause-and-effect relationships. Seek out the root cause of the defect under investigation.
- Improve or optimize the current process based upon data analysis and standard work to create a new, future state process. Set up pilot runs to establish capability.
- Control the future state process to ensure that any deviations from target are corrected before they result in defects.
The proposed process can be used as a guide to the sequence of deploying different reliability engineering tools in order to maximize their effectiveness and to ensure high reliability. The process can be adapted and customized based on the specific industry, corporate culture and existing processes. In addition, the sequence of the activities within the APM process will vary based on the nature of the asset and the amount of information available. It is important to note that even though this process is presented in a linear sequence, in reality some activities would be performed in parallel and/or in a loop based on the knowledge gained as a project moves forward. Figure 1 shows a diagram of the proposed process. Each phase in the process is briefly introduced in the following sections.
Define
The first step of any project is to define its objectives. This phase of the process is very important because it identifies the requirements and goals that will provide a direction for all future phases and activities to be performed. All too often, projects are initiated without a clear direction and without a clear definition of the objectives. This leads to poor project execution. Therefore, it is essential for the organization to do all of the following during the “Define” phase:
- Define the asset performance/reliability objectives.
- Define requirements and goals.
- Define the scope of the analysis.
- Determine budgetary and time constraints.
- Determine personnel resources and their responsibilities.
- Plan activities and set criteria for success.
- Define the appropriate key performance indicators (KPIs) for the organization.
- Establish the KPI targets.
Measure
Prior to conducting any type of reliability analysis, it is important to collect all the data required to support the analysis objectives. It is also crucial to determine what kinds of data are available and where the information resides. The types of data available will determine which analyses can be performed so, if sufficient information is not currently available, it may be necessary to identify future steps for obtaining it. Therefore, the typical steps in the “Measure” phase are to perform a reliability gap assessment, then gather the data and select the appropriate analysis techniques.
Reliability Gap Assessment
The purpose of a reliability gap assessment is to identify the shortcomings in achieving the asset performance management objectives so that a reliability program plan can be properly developed. Many companies implement APM tasks without first understanding what drives reliability task selection. The gaps are those issues or shortcomings that, if closed or resolved, would move the company in the direction of achieving its APM targets. In addition, the available data sources can be identified during this activity. If they are inadequate, we may resort to other sources of information. During the gap assessment, answers to the following questions are sought:
- What reliability activities are currently in place?
- What personnel are currently supporting the reliability activities?
- What procedures document the current reliability and APM practices?
- How does the organization currently collect reliability data? For example, is there a CMMS (computerized maintenance management system), EAM (enterprise asset management) system, FRACAS (failure reporting, analysis and corrective action system), defect logging database, etc.?
- How are the asset reliability and performance metrics, if any, currently computed (i.e., with which methods and tools)?
- Are we able to compute all KPIs defined in the previous phase?
Gather Data
Data, and specifically failure time data, are like gold to us. Of course, on the flip side, the more failures that are available to be analyzed, the worse the condition of the asset! In any case, data represent the most important aspect in performing quantitative reliability analyses. It is therefore crucial for data to be collected and categorized appropriately. The data will be used in computing the different KPIs, as well as in performing a variety of reliability calculations. We provide the tools and know-how to capture the right data at the right time.
In addition to failure data, the repair duration is also a very important input in the reliability, availability and maintainability (RAM) model because it determines the equipment availability. Other types of data will also be necessary for a thorough RAM analysis for assets. Again, we provide the know-how, which translates into simple, uncomplicated tasks for the crew. The following lists provide a summary of the information typically used.
Minimal information required:
- Failure times/intervals.
- Repair durations.
- Failure codes/IDs (causes of failures).
- Current maintenance task types and intervals.
Additional information that would improve the analysis if available:
- Throughput (capability) of each piece of equipment.
- Repair crew availability (e.g., whether the repair is carried out on board, in port, or by OEM technicians flown out to the vessel, with the corresponding logistic delays).
- Repair costs (e.g., parts, labour, etc.).
- Spare parts availability and costs.
- Inspection policies (e.g., condition monitoring).
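As a rough illustration of how the minimal information listed above feeds the KPIs, the sketch below computes mean time between failures (MTBF), mean time to repair (MTTR) and availability from a handful of failure records. The `FailureRecord` structure, its field names and the sample values are hypothetical and chosen only for illustration; a real CMMS export will look different.

```python
from dataclasses import dataclass


@dataclass
class FailureRecord:
    """One corrective work order (hypothetical structure for illustration)."""
    equipment: str
    failure_code: str
    uptime_before_failure_h: float  # operating hours since the last restoration
    repair_duration_h: float        # hours from failure to return to service


def basic_kpis(records):
    """Return MTBF, MTTR (both in hours) and availability for the records."""
    if not records:
        raise ValueError("at least one failure record is required")
    n = len(records)
    total_uptime = sum(r.uptime_before_failure_h for r in records)
    total_downtime = sum(r.repair_duration_h for r in records)
    return {
        "mtbf_h": total_uptime / n,
        "mttr_h": total_downtime / n,
        "availability": total_uptime / (total_uptime + total_downtime),
    }


# Made-up sample data for a single piece of equipment.
records = [
    FailureRecord("azimuth thruster 1", "SEAL-LEAK", 900.0, 12.0),
    FailureRecord("azimuth thruster 1", "SEAL-LEAK", 1100.0, 8.0),
    FailureRecord("azimuth thruster 1", "BEARING", 1000.0, 10.0),
]
kpis = basic_kpis(records)
print(kpis)
```

These averages are only a starting point: they assume the records are complete and say nothing about whether the failure rate is changing over time, which is what the analysis techniques discussed later address.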
There are multiple sources of data. For example, failure time data can be obtained from maintenance records (work orders, downtime logs, etc.), from the original equipment manufacturer (OEM) reliability specs, or from published generic equipment data.
For existing equipment, historical data can also be used. There may be a great deal of historical data that has been generated over many years. It is necessary to find out where this information resides, and to determine which information can assist in meeting the organization’s analysis objectives. Again, we show how to do this efficiently.
Once the data sources have been identified, the quality and consistency of the data must be evaluated. One of the most common problems for analysis is insufficient quality of the collected data. All too often, even though records are kept, it turns out that the data are not really usable. RELMAR provides the expertise to analyse all data. The most common problems with available data include:
- No data tracking system.
- Not specifying the cause of the failure (i.e., the component, subsystem, etc. that was responsible for the downtime).
- Not having the appropriate system hierarchy in the CMMS for reliability data purposes. For example, in many maintenance management systems, the asset hierarchy is set up in a way that prevents the “roll-up” of failure frequency information from the component to the subsystem to the equipment.
- Poor implementation of the process for recording maintenance task details. For example, if maintenance task/work orders are left open after the work has been completed, and the repair duration is based on the date/time when the work order was closed, this will give a false indication of downtime.
- A CMMS or EAM system is in place but it is not capturing downtime and production loss data.
- Information is not captured regarding inspection intervals and the results of each inspection. These details can be very useful in the Risk Based Inspection (RBI) methodology.
To avoid such problems, it is imperative for the organization to implement corrective actions to ensure that good data collection processes and management are in place. RELMAR provides full guidance here.
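Several of the problems listed above can be caught with simple automated checks before any analysis starts. The following is a minimal sketch under stated assumptions: work orders are represented as plain dictionaries with hypothetical keys (`id`, `component`, `failure_code`, `opened`, `closed`), and the 14-day limit for a plausible repair is an arbitrary illustrative threshold, not a recommended value.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: repairs longer than this are flagged for review.
MAX_PLAUSIBLE_REPAIR = timedelta(days=14)


def audit_work_orders(work_orders, max_repair=MAX_PLAUSIBLE_REPAIR):
    """Return (work_order_id, issue) pairs for records that look unusable."""
    issues = []
    for wo in work_orders:
        if not wo.get("failure_code"):
            issues.append((wo["id"], "missing failure code"))
        if not wo.get("component"):
            issues.append((wo["id"], "not linked to a component in the hierarchy"))
        opened, closed = wo.get("opened"), wo.get("closed")
        if opened is None or closed is None:
            issues.append((wo["id"], "missing open or close timestamp"))
        elif closed - opened > max_repair:
            # Often a sign the order was left open after the work finished.
            issues.append((wo["id"], "repair duration implausibly long"))
    return issues


# Made-up sample records.
work_orders = [
    {"id": "WO-001", "component": "fuel pump", "failure_code": "LEAK",
     "opened": datetime(2023, 1, 1, 8), "closed": datetime(2023, 1, 1, 14)},
    {"id": "WO-002", "component": "", "failure_code": "",
     "opened": datetime(2023, 1, 5, 8), "closed": datetime(2023, 3, 1, 8)},
    {"id": "WO-003", "component": "cooler", "failure_code": "FOUL",
     "opened": datetime(2023, 2, 1, 8), "closed": None},
]
issues = audit_work_orders(work_orders)
for wo_id, issue in issues:
    print(wo_id, issue)
```

A report like this does not fix the data, but it shows where the recording process needs attention before the records are used for reliability calculations.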
Select Analysis Techniques
Finally, assuming that all the relevant information is available, the appropriate simulation and analysis techniques can be selected to estimate the system availability, downtime, production output (a.k.a. throughput), maintenance costs and other metrics of interest. RELMAR performs these analyses and delivers the results as simple reporting structures. The Client can also access the data sources and analyses through a web-based portal.
Analyze
Depending on the objectives agreed upon during the “Define” phase and the data sources/analysis techniques identified in the “Measure” phase, the next step is to execute the appropriate analysis techniques in order to optimize the performance of the asset. In the following sections, we will briefly highlight the objectives, applications and benefits of some of the most effective reliability-related methodologies that can be used in asset performance management.
Reliability-centred Maintenance (RCM)
RCM analysis provides a structured framework for analyzing the functions and potential failures of physical assets in order to develop a scheduled maintenance plan that will provide an acceptable level of operability, with an acceptable level of risk, in an efficient and cost-effective manner. RELMAR provides facilitation and full guidance. RCM can be:
- Quantitative and based on reliability analysis.
- Qualitative and following a published step-by-step methodology.
- A combination of both of the above.
A lot has been written about RCM and its benefits. A full discussion of the topic is outside the scope of this article but it is worth mentioning some of the widely accepted benefits, which include:
- Prioritizing actions based on equipment criticality (multiple criticality classifications exist).
- Reducing and ultimately eliminating chronic failures and reliability problems.
- Documenting the maintenance program and practices.
- Reducing unscheduled maintenance.
- Reducing risk.
- Documenting the reasons for current activities and for future changes.
Life Data Analysis
Life data analysis (also called distribution analysis or Weibull analysis) refers to the application of statistical methods in determining the reliability behavior of equipment based on failure time data. Life data analysis utilizes sound statistical methodologies to build probabilistic models from life data (i.e., lifetime distributions, such as Weibull, lognormal, etc.). The following graphic shows how a statistical distribution is fitted to failure data.
The probabilistic models are then utilized to compute the reliability, make predictions and determine maintenance policies and maintenance task intervals. These models should be applied at the lowest replaceable unit (LRU) level. RELMAR provides all expertise in this area and deliverables will be simple instructions. Some of the applications for this type of analysis include:
- Understanding failure patterns.
- Understanding life expectancy of components.
- Understanding repair duration patterns.
- Using these models in the RAM analysis.
- Using the results in the “Improve” phase for spare part provisions, determining optimum maintenance intervals, making design changes, etc.
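To make the idea of fitting a distribution to failure data more concrete, here is a minimal sketch of one common approach: a two-parameter Weibull fit by median rank regression, using Benard's approximation for the median ranks. It assumes complete (uncensored) failure times, which is rarely the case in practice, and the sample times are invented. Other estimation methods, such as maximum likelihood, are equally valid choices.

```python
import math


def fit_weibull_rank_regression(failure_times):
    """Estimate Weibull shape (beta) and scale (eta) from complete failure times."""
    times = sorted(failure_times)
    n = len(times)
    if n < 2 or times[0] <= 0 or times[0] == times[-1]:
        raise ValueError("need at least two distinct, positive failure times")
    # Benard's approximation of the median rank for the i-th ordered failure.
    ranks = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
    # Linearised Weibull CDF: ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta).
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(1.0 - f)) for f in ranks]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx
    intercept = y_mean - beta * x_mean
    eta = math.exp(-intercept / beta)
    return beta, eta


def weibull_reliability(t, beta, eta):
    """Probability that the item survives beyond time t."""
    return math.exp(-((t / eta) ** beta))


# Made-up failure times in operating hours for one type of LRU.
sample_times = [1200.0, 1900.0, 2500.0, 3100.0, 4000.0]
beta_hat, eta_hat = fit_weibull_rank_regression(sample_times)
print(f"beta={beta_hat:.2f}, eta={eta_hat:.0f} h")
print(f"R(1000 h)={weibull_reliability(1000.0, beta_hat, eta_hat):.3f}")
```

A shape parameter above 1 suggests wear-out behavior, where time-based preventive maintenance can pay off; a value near or below 1 suggests it will not.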
Another way to calculate reliability metrics involves a type of analysis known as degradation analysis. Many failure mechanisms can be directly linked to the degradation of part of the product. Assuming that this type of information is captured (e.g., condition based maintenance – CBM – data), degradation analysis allows us to extrapolate to an assumed failure time based on the measurements of degradation over time. This analysis essentially determines the P-F curve that is often discussed by RCM practitioners (i.e., the period from when it is possible to start to recognize a potential failure, P, until it becomes an actual failure, F). The degradation analysis results can be used to:
- Understand failure patterns.
- Understand life expectancy of components.
- Build lifetime distributions that will be used in the “Improve” phase for RAM analysis and optimizations.
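The extrapolation step in degradation analysis can be sketched as follows. This example assumes the simplest possible model, a straight-line trend fitted by least squares, and a hypothetical failure threshold; real degradation often follows exponential or power-law paths, so the model choice here is an assumption rather than a recommendation.

```python
def time_to_threshold(times, measurements, threshold):
    """Extrapolate a linear degradation trend to the time it reaches threshold."""
    n = len(times)
    if n != len(measurements) or n < 2:
        raise ValueError("need at least two (time, measurement) pairs")
    t_mean = sum(times) / n
    m_mean = sum(measurements) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (m - m_mean) for t, m in zip(times, measurements))
    if sxx == 0 or sxy == 0:
        raise ValueError("no measurable trend in the data")
    slope = sxy / sxx
    intercept = m_mean - slope * t_mean
    # Solve intercept + slope * t = threshold for t.
    return (threshold - intercept) / slope


# Made-up condition monitoring readings: bearing wear in mm at given hours.
hours = [0.0, 500.0, 1000.0, 1500.0]
wear_mm = [0.10, 0.20, 0.30, 0.40]
failure_limit_mm = 1.0  # hypothetical limit at which the part is deemed failed

projected = time_to_threshold(hours, wear_mm, failure_limit_mm)
print(f"projected failure at about {projected:.0f} operating hours")
```

Repeating this for several units gives a set of projected failure times that can then be treated like ordinary life data.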
Recurrent Event Data Analysis (RDA)
RDA is different from “traditional” life data analysis (distribution analysis) because RDA builds a model at the equipment/subsystem level rather than the component/part level. Furthermore, whereas life data analysis uses time-to-failure data (in which each failure represents an independent event), the data utilized in RDA are the cumulative operating time and the cumulative number of failure events. Therefore, while life data analysis is used to estimate the reliability of non-repairable components, RDA models are applied to data from repairable systems in order to track the behavior of the number of events over time and understand the effectiveness of repairs. The most commonly used models for analyzing recurrent event data are the non-homogeneous Poisson process (NHPP) and the general renewal process (GRP). Again, RELMAR provides all technical expertise in this area. Nothing is expected of the crew beyond STCW requirements.
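As an illustration of an NHPP model for a repairable system, the sketch below fits the power law (Crow-AMSAA) form, in which the expected cumulative number of failures by time t is lambda * t^beta. It uses the standard maximum likelihood estimates for a single system observed up to a fixed end time. The failure times are invented, and the function names are our own.

```python
import math


def fit_power_law_nhpp(failure_times, observation_end):
    """MLE of (lambda, beta) for one system observed from 0 to observation_end."""
    n = len(failure_times)
    if n == 0:
        raise ValueError("at least one failure is required")
    if any(t <= 0 or t > observation_end for t in failure_times):
        raise ValueError("failure times must lie in (0, observation_end]")
    log_sum = sum(math.log(observation_end / t) for t in failure_times)
    if log_sum == 0:
        raise ValueError("all failures at observation_end; cannot estimate beta")
    beta = n / log_sum
    lam = n / observation_end ** beta
    return lam, beta


def expected_failures(t, lam, beta):
    """Expected cumulative number of failures by time t."""
    return lam * t ** beta


# Made-up cumulative operating hours at each failure of one repairable system.
failures_h = [150.0, 600.0, 1300.0, 1900.0, 2400.0, 2750.0, 2950.0]
lam_hat, beta_hat = fit_power_law_nhpp(failures_h, observation_end=3000.0)
print(f"lambda={lam_hat:.5f}, beta={beta_hat:.2f}")
print(f"expected failures by 4000 h: {expected_failures(4000.0, lam_hat, beta_hat):.1f}")
```

A beta below 1 indicates that failures are becoming less frequent (the system is improving), a beta above 1 that they are becoming more frequent, and a beta close to 1 that the failure rate is roughly constant.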
System Modelling/RAM Analysis
A reliability, availability and maintainability (RAM) analysis typically starts from the creation of a diagram that represents the overall system/process and the corresponding major subsystems. This diagram is known as a reliability block diagram (RBD). The next step is to expand the major subsystems into sub-subsystems and keep repeating until you reach the level where reliability information is available (ideally at the LRU level). The analysis will be based on the failure and repair duration properties for the items in the diagram. The failure properties (i.e., reliability) determine the frequency of occurrence of failure of each LRU; the repair durations determine the downtime. The effect of the failure on the overall system is determined based on the configuration of the block diagram. The effect could be that the entire system fails or it could be a percent reduction in the total output (throughput) of the system.
To perform a complete RAM analysis, the following information is required:
- System diagrams/drawings.
- Failure data.
- Repair duration data.
- Process capabilities of individual machines.
- Repair costs.
- Maintenance types and intervals.
- Repair crew availability. (This should not be problematic, considering that on-board crew are available 24/7.)
- Spare parts availability and costs.
The results of such an analysis may include:
- Number of failures
- Number of spares used
- Production output
- Life cycle costs
Having the system RBD model will also help later in the “Improve” phase to perform what-if analyses and investigate the effect of any proposed changes/improvements.
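The way the block diagram configuration determines the system-level effect can be shown with a small analytical sketch. It assumes steady-state availability computed as MTBF / (MTBF + MTTR), independent failures, and full redundancy in parallel blocks (any one unit can carry the load). The equipment names and figures are made up; a full RAM study would normally use simulation to capture crew, spares and throughput effects.

```python
import math


def steady_state_availability(mtbf_h, mttr_h):
    """Long-run fraction of time a block is up."""
    return mtbf_h / (mtbf_h + mttr_h)


def series(*availabilities):
    """All blocks must be up for the system to be up."""
    return math.prod(availabilities)


def parallel(*availabilities):
    """The system is up if at least one redundant block is up."""
    return 1.0 - math.prod(1.0 - a for a in availabilities)


# Made-up example: two redundant generators feeding one switchboard.
generator = steady_state_availability(mtbf_h=900.0, mttr_h=100.0)       # 0.90
switchboard = steady_state_availability(mtbf_h=9900.0, mttr_h=100.0)    # 0.99

system = series(parallel(generator, generator), switchboard)
print(f"system availability: {system:.4f}")

# What-if: remove the redundancy and see the effect on the system.
single = series(generator, switchboard)
print(f"without redundancy:  {single:.4f}")
```

The second calculation is a simple instance of a what-if analysis: the same model is re-evaluated with a changed configuration to see how the system figure responds.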
Root Cause Analysis (RCA)
RCA is a method to logically analyze failure events, identify all the causes (physical, human and primary) and define corrective actions to prevent their recurrence. It is a critical activity in understanding failures and being able to determine corrective actions. Without a formal RCA procedure, the wrong remedies might be frequently implemented.
Improve
The main objective of an APM process is to drive improvements, thus the “Improve” phase represents the most critical step of the process. During this phase, the objective is to identify the improvements that can increase the performance of the asset and optimize it, including:
- Defining the most appropriate maintenance policy.
- Determining the optimum maintenance task intervals.
- Determining adequate spare part provisions.
- Applying design changes when necessary/feasible.
- Driving new requirements to suppliers.
- Adding cost information to the simulation in order to run a dynamic life cycle cost (LCC) analysis.
As an example, the following section provides a brief overview of one of the most commonly used reliability tools that can be employed in this phase: calculating the optimum preventive maintenance (PM) interval.
Calculating the Optimum PM Interval
We can use the following equation to find the optimum interval for a preventive maintenance action. The equation is solved for the time, t, that results in the least possible cost per unit of time (CPUT).
$$\mathrm{CPUT}(t) = \frac{C_P \cdot R(t) + C_U \cdot \left[1 - R(t)\right]}{\int_0^t R(s)\,ds}$$
where:
- R(t) = reliability at time t. This is determined by performing life data analysis on available data.
- C_P = Cost per incident for planned (preventive) maintenance.
- C_U = Cost per incident for unplanned (corrective) maintenance.
This calculation is also demonstrated graphically in the following picture.
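The calculation can also be sketched numerically. The example below assumes the standard age-replacement form of the cost per unit time, (CP * R(t) + CU * (1 - R(t))) divided by the integral of R from 0 to t, with a Weibull reliability function. The Weibull parameters, the costs and the search range are invented for illustration, and the minimum is found by a simple grid search rather than a proper optimizer.

```python
import math


def weibull_reliability(t, beta, eta):
    """Probability that the item survives beyond time t."""
    return math.exp(-((t / eta) ** beta))


def mean_cycle_length(t, beta, eta, steps=400):
    """Trapezoidal approximation of the integral of R(s) from 0 to t."""
    dt = t / steps
    total = 0.5 * (weibull_reliability(0.0, beta, eta)
                   + weibull_reliability(t, beta, eta))
    total += sum(weibull_reliability(k * dt, beta, eta) for k in range(1, steps))
    return total * dt


def cost_per_unit_time(t, beta, eta, cost_planned, cost_unplanned):
    """Expected cost per unit time when the item is replaced at age t."""
    r_t = weibull_reliability(t, beta, eta)
    expected_cost = cost_planned * r_t + cost_unplanned * (1.0 - r_t)
    return expected_cost / mean_cycle_length(t, beta, eta)


def optimum_pm_interval(beta, eta, cost_planned, cost_unplanned, t_max, grid=300):
    """Grid search for the replacement age with the lowest cost per unit time."""
    candidates = [t_max * k / grid for k in range(1, grid + 1)]
    return min(candidates, key=lambda t: cost_per_unit_time(
        t, beta, eta, cost_planned, cost_unplanned))


# Made-up inputs: wear-out behavior (beta > 1), unplanned cost 10x planned cost.
beta, eta = 2.5, 1000.0
cp, cu = 1.0, 10.0
t_opt = optimum_pm_interval(beta, eta, cp, cu, t_max=3000.0)
print(f"optimum PM interval: about {t_opt:.0f} hours")
```

An interior optimum only exists when the item wears out (beta above 1) and an unplanned failure costs more than a planned replacement; otherwise the cost keeps falling as the interval grows and running to failure is the cheaper policy.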
Control
Every time the APM process is initiated, it is imperative to execute activities that can sustain the achieved results. As such, certain activities to monitor and control the performance need to be applied during the “Control” phase, including:
- Implementing the new maintenance tasks and new intervals via the maintenance management system.
- Seeking continuous improvement (e.g., by monitoring KPI levels and defining new targets when applicable).
- Monitoring the asset’s performance using reliability growth/tracking models. For example, the Crow-AMSAA model is typically used to model the reliability performance of assets over time (e.g., month-to-month).
Another critical function in this phase is sustaining the knowledge acquired by all previous activities, as well as retaining the analyses that have led to a particular action or change. Failing to retain this knowledge can lead to “reinventing the wheel” down the road, as well as the risk of repeating past mistakes. Different activities (including analysis, action plans and decisions) should be recorded properly and stored in a location where other professionals involved in the asset’s management can access the information in the future.
In this article, we proposed a modular Asset Performance Management service for the Maritime and Marine Offshore industries. We reviewed the role of reliability engineering methodologies in asset performance management, and we proposed a flexible APM process for deploying different reliability tools and methods where they can be most effective. The proposed process is general enough to be easily adopted by Maritime and Marine Offshore and can be used in conjunction with current reliability practices. Equally, our services apply to Equipment Manufacturers who wish to provide Condition Monitoring and Prognostic Health Monitoring (PHM) to the industries mentioned. RAM analysis greatly complements PHM.
Thanks go to Wilde Analysis and ReliaSoft for their material and support in bringing Reliability Analysis and Management to the maritime industry, which will provide the step change in the maturity level of Marine Asset Management and Assurance.