Calculations for Functional Safety
Quantities, Formulas and Methods

A Models for basic events

The basic events of fault trees, or the edges (transitions) of Markov models, represent either events or conditions (states). Different models are required depending on whether one-time random events, repeated random events, regular events, or conditions (states) are being described. The most important models are described below; others exist as well. The exact definition of each model may differ between tools, since there is no harmonized standard, not even a de facto one.

A.1 Model restorable

With this model, also called "Dormant Failure Model", "Repairable", or "Testable" (see footnote 24), the following failure scenarios, for example, can be modeled:

  • failures that lead directly to system failure (and are thus detected immediately),

  • failures that lead to system failure only in the presence or subsequent occurrence of other events, but which can be detected by regular tests.

This model is the standard model. It can be used to represent almost all failures of technical components (electrical/electronic, hydraulic, pneumatic, etc.), from a simple wire up to a complete control system.

If neither the detection time nor the repair time is negligible, the unavailability \(Q(t)\) is very well approximated by formula (48) from chapter 4:

\[ Q(t) = 1-\frac { e^{-\;\dfrac {\lambda \cdot (t \bmod T_{\mathrm {test}}) }{ \lambda \cdot \mathrm {MRT} + 1 }} }{ \lambda \cdot \mathrm {MRT} + 1 } \]

with the test interval \(T_{\mathrm {test}}\) and the mean repair time per failure \(\mathrm {MRT}\) (including time to replace).

The mean unavailability \(\overline {Q}\) is given by formula (44), derived in chapter 4,

\[ \overline {Q} = \frac { e^{-\lambda \cdot T_{\mathrm {test}}} - 1 } { \lambda \cdot T_{\mathrm {test}} + \lambda \cdot \mathrm {MRT} \cdot ( 1-e^{-\lambda \cdot T_{\mathrm {test}}} ) } + 1 \]

as well as by formula (45) in the case of negligible detection time, \(T_{\mathrm {test}} \rightarrow 0\):

\[ \overline {Q} = \frac { \lambda \cdot \mathrm {MRT} } { \lambda \cdot \mathrm {MRT} + 1 } \]

or formula (46) in case of negligible repair time \(\mathrm {MRT} \rightarrow 0\):

\[ \overline {Q} = \frac {e^{-\lambda \cdot T_{\mathrm {test}}}-1} {\lambda \cdot T_{\mathrm {test}} }+1 \]
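The formulas above can be checked numerically. The following is a minimal sketch, not a reference implementation: the function names and the example values (failure rate, test interval, repair time) are assumptions chosen for illustration only. It evaluates formula (48) and formula (44) and compares the latter with the two limiting cases (45) and (46).

```python
import math


def q_instant(t, lam, t_test, mrt):
    """Approximate unavailability Q(t) of a restorable element, formula (48)."""
    k = lam * mrt + 1.0
    return 1.0 - math.exp(-lam * (t % t_test) / k) / k


def q_mean(lam, t_test, mrt):
    """Mean unavailability, formula (44); requires t_test > 0."""
    decay = -math.expm1(-lam * t_test)  # 1 - exp(-lam * t_test), numerically stable
    return 1.0 - decay / (lam * t_test + lam * mrt * decay)


def q_mean_no_detection_time(lam, mrt):
    """Limit T_test -> 0, formula (45)."""
    return lam * mrt / (lam * mrt + 1.0)


def q_mean_no_repair_time(lam, t_test):
    """Limit MRT -> 0, formula (46)."""
    return 1.0 + math.expm1(-lam * t_test) / (lam * t_test)


# Hypothetical example values: 1e-5 failures per hour, yearly test, 8 h repair.
lam, t_test, mrt = 1e-5, 8760.0, 8.0
print(q_mean(lam, t_test, mrt))
print(q_mean_no_repair_time(lam, t_test))
print(q_mean_no_detection_time(lam, mrt))
```

Using `math.expm1` avoids cancellation when \(\lambda \cdot T_{\mathrm {test}}\) is very small, which is the usual situation for realistic failure rates.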

The model can often also be used if there is no defined test or inspection interval, but a failure reveals itself during operation in a non-critical situation. If there are no tests and a failure reveals itself only when another event occurs, the test interval must be set to the nominal operation time, or the model "non-restorable" must be used.

If the failure rate is not constant, a mean failure rate must be calculated using formula (26) in conjunction with (25):

\begin{equation} \lambda _{\mathrm {eff}} = \frac {1}{\mathrm {MTTF}} = \frac {1}{\int \limits _0^\infty t\cdot f(t)\,dt} = \frac {1}{\int \limits _0^\infty t \cdot h(t) \cdot \mathrm {e}^{-\int \limits _0^t \sum \limits _{i=1}^n h_i(\tau )\,d\tau }\;dt} \end{equation}
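As an illustration of this equation, the sketch below estimates \(\lambda _{\mathrm {eff}}\) for a single Weibull-distributed failure rate. The choice of a Weibull distribution, the parameter values, and all function names are assumptions made for this example. The integral of \(t \cdot f(t)\) is evaluated with a simple midpoint rule and compared with the known closed form of the Weibull mean, \(\eta \cdot \Gamma (1 + 1/\beta )\).

```python
import math


def weibull_hazard(t, shape, scale):
    """Failure rate h(t) of a Weibull distribution."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def mttf_numeric(shape, scale, steps=200_000):
    """Midpoint-rule estimate of the integral of t * f(t), truncated at 10 * scale."""
    upper = 10.0 * scale
    dt = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        # f(t) = h(t) * exp(-integral of h), which is exp(-(t/scale)**shape) here
        density = weibull_hazard(t, shape, scale) * math.exp(-((t / scale) ** shape))
        total += t * density * dt
    return total


def mttf_closed_form(shape, scale):
    """Closed form of the Weibull mean: scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)


shape, scale = 2.0, 1000.0  # hypothetical wear-out behaviour, scale in hours
lambda_eff = 1.0 / mttf_numeric(shape, scale)
print(lambda_eff, 1.0 / mttf_closed_form(shape, scale))
```

The truncation at ten times the scale parameter is adequate for shape parameters of about 1 or larger; for strongly decreasing failure rates (shape well below 1) the upper limit would have to be increased.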

The model can also be used as a description of conditions.

Footnote 24: Some tools provide separate models for Testable and Repairable. The clear separation simplifies understanding; sometimes, however, both a failure detection time greater than zero and a repair time greater than zero are needed.

A.2 Model non-restorable

This model can be used to model the following failure scenarios, for example:

  • failures that lead directly to system failure,

  • failures that lead to system failure only if other events have already occurred or are still occurring, and which are also only recognized then.

Only with this event model does it make sense to consider time-variant failure rates (e.g. Weibull distributions) in a transient calculation.

The unreliability is calculated according to the known formula (8) to be

\begin{equation} \label {eq:F_h_wdh} F(T) = 1-\mathrm {e}^{-\int \limits _0^T h(t)\,dt} \end{equation}

The average failure rate with respect to \(F(T)\) is thus given by

\begin{equation} \label {eq:h_mean_wdh} \begin{split} F(T) &= 1-\mathrm {e}^{-\int \limits _0^T h(t)\,dt} = 1-\mathrm {e}^{-\overline {h(T)} \cdot T} \\ \Rightarrow \overline {h(T)} &= \frac {1}{T} \int \limits _0^T h(t)\,dt \end {split} \end{equation}

Thus, if an unreliability \(F(T)\) is to be calculated for a non-repairable element whose failure rate is in principle time-dependent, either the unreliability must be calculated via integration using formula (75), or a constant failure rate calculated according to formula (76) must be used. It is not permitted to use the mean effective failure rate \(\lambda _{\mathrm {eff}}\) calculated according to formula (29); \(\overline {h(T)}\) must be used instead!
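The difference between the two kinds of average failure rate can be made concrete with a small numerical sketch. The Weibull failure rate, the parameter values, and the function names below are assumptions for illustration. The code computes \(F(T)\) by integration (formula (75)), via the average rate over \([0, T]\) (formula (76)), and, for comparison, via the reciprocal of the MTTF.

```python
import math


def cumulative_hazard(T, shape, scale):
    """Integral of the Weibull failure rate h(t) from 0 to T."""
    return (T / scale) ** shape


def mean_hazard(T, shape, scale):
    """Average failure rate over [0, T], formula (76)."""
    return cumulative_hazard(T, shape, scale) / T


shape, scale, T = 2.0, 1000.0, 100.0  # hypothetical values, times in hours

f_exact = 1.0 - math.exp(-cumulative_hazard(T, shape, scale))  # formula (75)
f_mean = 1.0 - math.exp(-mean_hazard(T, shape, scale) * T)     # formula (76)
lambda_mttf = 1.0 / (scale * math.gamma(1.0 + 1.0 / shape))    # 1 / MTTF
f_from_mttf = 1.0 - math.exp(-lambda_mttf * T)

print(f_exact, f_mean, f_from_mttf)
```

With these assumed values the first two results agree, while the value based on \(1/\mathrm {MTTF}\) is about ten times larger, because the operating time is short compared with the characteristic life and the failure rate is still low.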

The average unavailability over the planned operating time \(\overline {Q(T)}\) is given by

\begin{equation} \label {eq:nonrep_qmean} \overline {Q(T)} = \frac {\int \limits _0^T F(t)\,dt}{T} \end{equation}
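A minimal numerical sketch of this average is shown below. The trapezoidal integration, the function names, and the example values are assumptions for illustration. For a constant failure rate the integral has the closed form \(1 + (e^{-\lambda T} - 1)/(\lambda T)\), which has the same structure as formula (46) with \(T\) in place of \(T_{\mathrm {test}}\), so it serves as a cross-check.

```python
import math


def mean_unavailability(unreliability, T, steps=10_000):
    """Trapezoidal estimate of (1/T) * integral of F(t) from 0 to T."""
    dt = T / steps
    total = 0.5 * (unreliability(0.0) + unreliability(T))
    for i in range(1, steps):
        total += unreliability(i * dt)
    return total * dt / T


lam, T = 1e-4, 10_000.0  # hypothetical constant failure rate (1/h) and operating time (h)


def f_const(t):
    """Unreliability F(t) for a constant failure rate."""
    return -math.expm1(-lam * t)


q_mean = mean_unavailability(f_const, T)
q_closed = 1.0 + math.expm1(-lam * T) / (lam * T)  # closed form for constant lam
print(q_mean, q_closed, f_const(T))
```

Any other unreliability function, for example one derived from a Weibull failure rate, can be passed to `mean_unavailability` in the same way.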

The maximum unavailability is reached at the end of the planned operating time.

The model may obviously only be used if it is certain that the component is not defective at time \(t=0\). It is therefore not permissible to assume arbitrarily short (planned) operating times, such as a single flight or a single car trip, because it is not ensured that all components are as good as new before each of these (in contrast to a space mission).

The model can also be used as a description of conditions.

A.3 Model constant

In particular, for constant unavailabilities \(Q=\mathrm {const}\), but also for the (constant) probability that an external boundary condition is satisfied, the model "constant" is used.

The model is used almost exclusively as a description of conditions.

A.4 General recommendations on basic models

It is useful to combine multiple failure modes into one basic event. It is important that all failure modes modeled by this one basic event do not differ with respect to the modeling of the restoration, i.e., in particular, that they have the same fault detection time and, if applicable, the same repair time.

In the case of complex components (such as an electronic board or an entire control system), it is neither necessary nor useful to record each of the tens of thousands of failure modes individually as a basic event. Typically, the number of failure modes observable at the interfaces is quite manageable anyway (typical examples: binary signal high instead of low or vice versa, analog signal too high/too low, bus communication failed/life sign invalid, bus variable wrong without being recognizable as such).

Thus, as a rule, it makes sense to describe a complex assembly by two to four basic events:

  1. one basic event for failures that are detected immediately,

  2. one basic event for failures that are detected during daily tests,

  3. one basic event for failures detected during regular inspections,

  4. one basic event for failures that are never detected or detected only on demand.

All possible failures are typically assigned to one of these categories in an FMEA. The failure rate for each of these at most four basic events is the sum of the individual failure rates of all failure modes that have been assigned to the respective category in the FMEA.
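The summation per category can be sketched as follows. The FMEA rows, the category names, and the failure rates are purely hypothetical and serve only to show the bookkeeping.

```python
from collections import defaultdict

# Hypothetical FMEA excerpt: (failure mode, detection category, failure rate in 1/h).
fmea_rows = [
    ("output stuck high", "immediate", 2.0e-7),
    ("output stuck low", "immediate", 1.5e-7),
    ("watchdog inoperative", "daily_test", 5.0e-8),
    ("analog input drift", "inspection", 3.0e-8),
    ("protection diode open", "on_demand", 1.0e-8),
    ("memory cell fault", "daily_test", 4.0e-8),
]


def basic_event_rates(rows):
    """Sum the failure rates of all failure modes assigned to each category."""
    rates = defaultdict(float)
    for _mode, category, rate in rows:
        rates[category] += rate
    return dict(rates)


for category, rate in basic_event_rates(fmea_rows).items():
    print(f"{category}: {rate:.2e} 1/h")
```

Each resulting sum would then be used as the failure rate of one basic event, with the detection time and repair time that belong to its category.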