Interpreting Reliability Metrics with Confidence

By Jim Renfroe, Executive Vice President (Retired),
Production Optimization, Halliburton
(First published as a JPT guest editorial, January 2006)

The intensity of today's global energy appetite requires our industry to be increasingly demanding of its technologies: they must be innovative and cost-effective, and they must perform day after day. These requirements are growing almost exponentially as we go after deeper and more-difficult-to-access reserves. As we stretch to reach new limits, we realize that reliable performance in every facet of operations is pivotal to reducing downtime, improving efficiency, and optimizing production. In short, now more than ever, reliability is a business imperative, and maximum reliability doesn't happen without intentional focus and effort. There are no random acts of reliability.

Reliability has to be designed and built into oilfield technology, equipment, and processes by service companies that have made it a strategic imperative. Once achieved, reliability lets all of us focus more sharply on maximizing financial performance without ever diverting our attention from minimizing exposure and risk.

What is reliability? Essentially, it answers the question, “How much reliance should you place on a particular product or technology?” Stated another way, it is the probability that a device will perform its intended function under known conditions for a specified time. For my team, I define it as consistently delivering the expected results, over and over. That's reliability.
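
The probability in this definition can be estimated from test data. A minimal sketch, assuming a hypothetical test in which every unit was run to failure under the same known conditions (no censored or still-running units): the estimated reliability at time t is the fraction of units still functioning at t. The function name and the failure times are illustrative, not from the source.

```python
def empirical_reliability(failure_times_hr, t_hr):
    """Estimate R(t) as the fraction of units still functioning at t_hr.

    Assumes every unit was tested to failure (no censoring).
    """
    if not failure_times_hr:
        raise ValueError("need at least one failure time")
    survivors = sum(1 for ft in failure_times_hr if ft > t_hr)
    return survivors / len(failure_times_hr)


# Hypothetical times to failure (hours) for ten identical units.
times = [150, 420, 610, 780, 900, 1150, 1300, 1720, 2100, 2600]
print(empirical_reliability(times, 1000))  # 0.5
```

Real field data usually include units that have not yet failed, which calls for censored-data methods; the simple ratio above only holds for complete run-to-failure data.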

Suppliers have a unique opportunity to improve reliability because it is influenced at every stage of product and service delivery:

  • Research, design, and testing procedures implemented during product development.
  • Process controls for continual verification and maintenance of tolerance levels during manufacturing.
  • Training for and continual evaluation of safe and efficient operating envelopes and procedures.

It is during the procurement process that operators have a unique opportunity to choose reliable technology and so improve their overall economic equation. After all, service, maintenance, and repair often account for the majority of total life-cycle costs, so factoring reliability into procurement decisions can improve overall project economics.

When reliability is a core value for suppliers seeking to better meet operators’ needs, it echoes throughout the organization, from engineering to manufacturing to operations, with everyone involved working consistently to eliminate even the smallest factors that lead to downtime and lost revenue. Reliability can be assessed through metrics such as functional details, life metrics, and MTBF/MTTF. But a word of caution: metrics are open to interpretation.

What, then, are some factors to consider when assessing metrics? How can you be more assured of translating probabilities of reliability into actualities? How can metrics represent more accurate performance indicators to better assist in economic calculations during procurement?

Functional Details: Functional details describe the boundaries of intended use, such as temperature, pressure, and load. Primary functional details are easy to quantify and qualify and appear in most engineering specifications. But are there other design parameters to consider? Suppose a design calls for a small, 5/8-inch-diameter electric motor and gear train to move a small high-pressure shuttle valve in a downhole tool for use at pressures as high as 15,000 psi and temperatures up to 149°C (300°F). A functional specification covering the environment and the load needed to shuttle the valve back and forth is easy to write. In this example, however, a proper functional specification should also note that the motor-gear train assembly will be powered to a hard stop. Without that requirement, the motor-gear train would likely not be designed to withstand the inertial effects of a hard stop.
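
One way to keep a requirement like the hard stop from being lost is to record it alongside the primary limits and check every candidate design against all of them. A minimal sketch, assuming hypothetical FunctionalSpec and DesignRating records and a hypothetical unmet_requirements check; the 15,000 psi and 149°C limits come from the example above, while the candidate's ratings are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionalSpec:
    """Hypothetical boundaries of intended use for the motor-gear train."""
    max_pressure_psi: float
    max_temp_c: float
    powered_to_hard_stop: bool  # the easily overlooked requirement


@dataclass(frozen=True)
class DesignRating:
    """Hypothetical ratings claimed for a candidate motor-gear train."""
    rated_pressure_psi: float
    rated_temp_c: float
    rated_for_hard_stop: bool


def unmet_requirements(spec, design):
    """Return the names of requirements the design does not cover."""
    gaps = []
    if design.rated_pressure_psi < spec.max_pressure_psi:
        gaps.append("pressure")
    if design.rated_temp_c < spec.max_temp_c:
        gaps.append("temperature")
    if spec.powered_to_hard_stop and not design.rated_for_hard_stop:
        gaps.append("hard-stop inertia")
    return gaps


def c_to_f(temp_c):
    """Convert Celsius to Fahrenheit."""
    return temp_c * 9 / 5 + 32


spec = FunctionalSpec(max_pressure_psi=15_000, max_temp_c=149, powered_to_hard_stop=True)
candidate = DesignRating(rated_pressure_psi=15_000, rated_temp_c=150, rated_for_hard_stop=False)
print(unmet_requirements(spec, candidate))  # ['hard-stop inertia']
print(round(c_to_f(149)))  # 300
```

The candidate passes the pressure and temperature checks, so it would look compliant against a specification that omitted the hard stop.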

Life Metrics: Life metrics define how long a device must continue to function successfully to complete the planned mission. In this case, the electric motor and gear train assembly should be specified with life metrics such as 99% reliability for 1,000 hours of service or ‘X’ cycles of maintenance-free service. Combining and documenting functional details with life metrics increases the probability of achieving the desired mission.
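
To see how much a life metric such as 99% reliability for 1,000 hours asks of a design, it can be converted into an equivalent MTBF. A minimal sketch, assuming a constant failure rate (an exponential life model, so R(t) = exp(-t / MTBF)); the source does not state this assumption, and a motor-gear train that wears out would not follow it, so the figure is illustrative only. The function names are hypothetical.

```python
import math


def required_mtbf_exponential(reliability, mission_hr):
    """MTBF needed to reach `reliability` over `mission_hr`,
    assuming a constant failure rate: R(t) = exp(-t / MTBF)."""
    if not 0 < reliability < 1:
        raise ValueError("reliability must be strictly between 0 and 1")
    return -mission_hr / math.log(reliability)


def reliability_exponential(mtbf_hr, mission_hr):
    """Probability of surviving the mission under the same assumption."""
    return math.exp(-mission_hr / mtbf_hr)


needed = required_mtbf_exponential(0.99, 1000)
print(round(needed))  # 99499
print(round(reliability_exponential(needed, 1000), 4))  # 0.99
```

Under this assumption the target implies an MTBF of roughly 99,500 hours, about 100 times the mission length, which is why a life metric tied to a specific mission time carries information that an MTBF figure alone does not.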

So, in the case of the 5/8-inch-diameter electric motor and gear train, both the functional specifications and the life metrics would depart from what might be called "commoditized indicators" of reliability. When interpreting metrics, the challenge is recognizing that specific functional requirements may not be captured by traditional measures.

Mean Time Between Failures or Mean Time To Failure: MTBF or MTTF can be misleading if used as the sole reliability measure. MTBF is the mean operating time between failures for repairable, maintainable devices; a familiar maintenance analogy is replacing the oil and filter on your car every 4,800 kilometers (3,000 miles). MTTF is used for non-repairable items that are discarded when they reach end of life, such as light bulbs. While this may seem obvious, it is important to note that both MTBF and MTTF are mean (average) values, not median values (the midpoint of the data set), of the time at which a device loses functionality. The mean and median coincide when the data follow a symmetrical distribution such as the bell curve. For data that follow asymmetrical (skewed) distributions, the mean and median fall at different points. In these distributions, fatigue-related failures may cluster, occurring at a faster rate either before or after the median time.
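
The gap between mean and median can be made concrete with the Weibull distribution, a model widely used for life data. The choice of Weibull, the shape values, and the 1,000-hour scale are illustrative assumptions, not data from the source. For a two-parameter Weibull with shape β and scale η, the mean (the MTTF) is η·Γ(1 + 1/β) and the median is η·(ln 2)^(1/β).

```python
import math


def weibull_mean(shape, scale):
    """Mean life (MTTF) of a two-parameter Weibull distribution."""
    return scale * math.gamma(1 + 1 / shape)


def weibull_median(shape, scale):
    """Time by which half of the units have failed."""
    return scale * math.log(2) ** (1 / shape)


# Hypothetical shape values, all with a 1,000-hr scale (characteristic life).
for shape in (0.7, 1.0, 3.5):
    mean = weibull_mean(shape, 1000)
    median = weibull_median(shape, 1000)
    print(f"shape={shape}: mean={mean:.0f} hr, median={median:.0f} hr")
```

With a shape of 1.0 (the exponential case) the median is only ln 2, about 69%, of the mean; with a shape below 1 the gap widens further, and near 3.5 the distribution is close to symmetrical and the two values nearly coincide.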

Another misconception about MTBF and MTTF is that either one represents a failure-free period or the time to the first failure. Because both are averages of the time to a failure event, some individual items will fail before that average and others after it. Therefore, anyone using MTBF or MTTF as the only reliability metric should understand the risk of selecting an item that falls outside the "average" band.
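
How many items fail before the average can be quantified under the common simplifying assumption of a constant failure rate (an exponential life model, assumed here rather than stated in the source). In that case the fraction failed by time t is 1 - exp(-t / MTBF), so about 63% of units fail before reaching the MTBF. The 2,000-hour MTBF below is hypothetical.

```python
import math


def fraction_failed_by(t_hr, mtbf_hr):
    """Probability a unit has failed by t_hr, assuming a constant failure rate."""
    return 1 - math.exp(-t_hr / mtbf_hr)


mtbf = 2000.0  # hypothetical MTBF in hours
print(f"{fraction_failed_by(mtbf, mtbf):.1%} fail before reaching the MTBF")  # 63.2%
print(f"{fraction_failed_by(0.1 * mtbf, mtbf):.1%} fail within 10% of the MTBF")  # 9.5%
```

Even in the first tenth of the MTBF, roughly one unit in ten is expected to fail under this model, which is why the MTBF is not a failure-free period.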

To illustrate the risk, consider an example. Figure 1 charts an evaluation of four different model numbers of a downhole tool. Unit #1 had the highest MTBF (2,609 hr), followed by unit #2 (1,898 hr), unit #4 (1,046 hr), and, last, unit #3 (610 hr). One would typically expect unit #1 to be the most reliable regardless of service time, since it has the greatest MTBF; however, unit #3 is slightly more reliable than unit #1 up to 220 hr, and unit #2 is slightly more reliable than unit #1 up to about 840 hr. This may be important if the usage life or time between maintenance activities is under 840 hr. In some instances, the reliability curves can exhibit significant shape changes that MTBF or MTTF values cannot reveal.
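
The crossing curves described for Figure 1 can be reproduced qualitatively in a short sketch. The two Weibull models below are hypothetical, chosen only to show the effect; they are not the Figure 1 data, and the Weibull life model and the labels A and B are assumptions. Model A has the higher MTBF but a shape below 1 (early-life failures); model B has a much lower MTBF but a steep wear-out shape.

```python
import math


def weibull_reliability(t_hr, shape, scale_hr):
    """Probability of surviving to t_hr under a two-parameter Weibull model."""
    return math.exp(-((t_hr / scale_hr) ** shape))


def weibull_mean(shape, scale_hr):
    """Mean life (MTBF/MTTF) of the same model."""
    return scale_hr * math.gamma(1 + 1 / shape)


# Hypothetical models (NOT the Figure 1 data), as (shape, scale in hours).
A = (0.7, 2000.0)  # early-life failures, long tail
B = (3.0, 700.0)   # few early failures, wears out sooner


def crossover_time(lo_hr, hi_hr, tol_hr=0.01):
    """Bisect for the time where the A and B reliability curves cross.
    Assumes exactly one crossing inside [lo_hr, hi_hr]."""
    def diff(t):
        return weibull_reliability(t, *A) - weibull_reliability(t, *B)

    while hi_hr - lo_hr > tol_hr:
        mid = (lo_hr + hi_hr) / 2
        if (diff(lo_hr) < 0) == (diff(mid) < 0):
            lo_hr = mid  # same sign: the crossing lies in the upper half
        else:
            hi_hr = mid
    return (lo_hr + hi_hr) / 2


print(f"MTBF A = {weibull_mean(*A):.0f} hr, MTBF B = {weibull_mean(*B):.0f} hr")
print(f"R(200 hr): A = {weibull_reliability(200, *A):.3f}, "
      f"B = {weibull_reliability(200, *B):.3f}")
print(f"B stays more reliable until about {crossover_time(1, 2000):.0f} hr")
```

With these parameters B's MTBF is roughly a quarter of A's, yet B is the more reliable choice for any mission shorter than the crossover (around 500 hr). Comparing reliability at the actual mission or maintenance interval, rather than MTBF alone, is what exposes this.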


Achieving a highly reliable system requires relentless focus on, and evaluation of, failure-free operating envelopes:

  • Identifying weaknesses in a design before a unit goes to manufacturing
  • Determining and maintaining calibrations during manufacturing to ensure secure interfaces
  • Analyzing the potential for parts obsolescence
  • Addressing the efficiency or simplicity of repair procedures
  • Developing a simple design without sacrificing sophistication
  • Instituting effective communication systems for reporting, analyzing, and taking corrective action on individual units throughout the path from design to manufacturing to operations

Reliability-in-action means reliability designed and built into every component of a device, process, or system. Properly interpreting reliability metrics can make the difference in minimizing exposure and realizing the full potential of your assets.