Fault tree analysis is the most common method for hazard analysis. The algorithms provided by Functional Safety Suite allow calculation of all values needed for safety or reliability analysis. In particular, suitable algorithms for calculating occurrence rates of repairable systems are implemented, so fault trees can also be used to calculate occurrence rates (PFH, failure rate, hazard rate) in accordance with [EN 61508], [EN 50126], [EN 50129], [ISO 13849], [ISO 26262-5] or similar.
Probably the most cited book on fault tree analysis is the “Fault Tree Handbook” [NUREG], published in 1981 by the US Nuclear Regulatory Commission following the Three Mile Island accident in 1979. In 2002 NASA published the “Fault tree handbook with aerospace applications” [NASA]. Although this book refers to [NUREG], its focus is different, and its very existence shows that the range of problems is too broad to be covered in one book.
The most notable difference is that in [NASA] the class of technical processes to be analyzed is a mission characterized by
whereas in [NUREG], as well as in all machinery- or transportation-related standards such as [EN 61508], [ISO 13849], [EN 50126], [ISO 26262-5], the problems can be characterized by
Both [NUREG] and [NASA] are freely available on the Internet and describe the method of fault tree analysis in detail. In addition, they provide detailed information on how to construct a fault tree correctly. Therefore this documentation focuses on the specific characteristics and the usage of Functional Safety Suite. Note that [EN 61025] does not cover repairable systems and hence is of very limited use (in particular, it doesn’t cover the calculation of a system failure rate h).
According to [NUREG] a
fault tree is a graphic representation of the various parallel and sequential combinations of faults, that will result in the occurrence of the predefined undesired event.
Fault tree analysis is typically applied at a high level. It is appropriate whenever the architecture of a safety function includes some kind of redundancy. Fault tree analysis is a deductive method, so fault trees are always developed top-down (already having the basic events in mind when starting to create a tree is one of the most common mistakes).
A basic event of a fault tree can describe the status of an element of the system (a situation or condition lasting for a while) or the occurrence of something in just a moment (a failure, or an action of the operator for instance).
Each basic event is assigned a failure or occurrence model with a specific set of parameters. Based on these parameters, the (conditional) occurrence rate h (unit 1∕h), the unconditional occurrence rate for repairable elements w (unit 1∕h), the (unconditional) failure density f (unit 1∕h), the unavailability Q, and the unreliability F can be calculated for each basic event. The occurrence rates h or w and the unavailabilities Q of the basic events are needed to calculate both occurrence rates and unavailabilities of higher level gates. The mean unavailability Q of the top event is the PFD, its mean occurrence rate h is the PFH. For many systems, the system unreliability Fsys(0,Tmission), i. e. the probability of mission failure can directly be calculated based on the unreliabilities F of the basic events. If there are conditions in the fault tree, i. e. elements that are described by their unavailability Q instead of F, the system unreliability must be calculated based on the mean occurrence rate hsys or by the time-dependent failure density fsys(t) = hsys(t) ⋅ (1 - Fsys(t)).
If used for THR apportionment (often called preliminary FTA), the values of the basic events are defined based on what seems realistic and achievable with reasonable effort. Thus, experience is necessary to perform a preliminary FTA. The parameters assigned to each basic event serve as the tolerable probability of failure on demand (TPFD) or tolerable failure rate (TPFH, TFFR) which has to be achieved by the responsible system element.
Features of Functional Safety Suite related to fault trees:
Note: The structure of a fault tree is often more important for the correctness of the derived conclusions than the actual quantitative values. It is absolutely necessary that the structure of a fault tree reflects reality and that no important events are omitted because of rules or “political” reasons. All relevant conditions (e. g. responsibilities, maintenance cycles etc.) must be known to enable the safety engineer to develop a correct fault tree. Where the relevant conditions are not known, assumptions can be used, but these must be stated explicitly. In fact, for most systems a small number of critical elements (basic events) can clearly be identified, e. g. by an importance analysis (see section 11.6.2). It makes sense to concentrate on the elements with high impact and to validate that they are modeled correctly.
Presentation related properties of the fault tree are edited in the fault tree properties panel directly, see below. Evaluation related properties are set in the fault tree evaluation properties dialog, see section 7.5. All properties of the fault tree are stored in the fault tree file (extension .ftl).
Description: A user defined description of the fault tree.
Note that if the presentation-related features don’t meet your needs, you can export all graphics in SVG format for further processing with vector graphics tools.
Horizontal offset: The margin between the window border and the left edge of the leftmost basic event. The default is 5 pixels (multiplied by the zoom factor). A larger value makes sense for trees with few basic events, to create space for the tree description (so that the description and the top event do not overlap).
Vertical offset: The margin between the window border and the upper edge of the top event. The default is 5 pixels (multiplied by the zoom factor).
Event width: Select the width of the name boxes in fault trees. The description boxes of basic events have the same width; description boxes of gates are about 20% wider.
The default is 118 pixels, which allows both occurrence rate and unavailability to be displayed in one line. If you only want to display unavailabilities or no values at all, you can enter a smaller width. If you need more space, especially in description boxes, you can enter a larger width.
Header X Position: The margin between the window border and the left side of the header. The default is 5 pixels (multiplied by the zoom factor). You can shift the header to the right by setting a higher value.
Show values: If values should not be shown, you can switch them off here.
A basic event of a fault tree consists of the reference to the generic basic event, an optional suffix, the ‘second text line’ (for qualitative fault trees only), and the background color.
The parameters in the ‘Tree Basic Event’ section belong to the specific basic event, which is part of a fault tree, and thus are stored in the fault tree file (extension .ftl).
The parameters in the ‘Generic Basic Event’ section belong to the generic basic event as selected by the field ‘GBE name’, and thus are stored in the library.
Remember: Changing parameters in the ‘Generic Basic Event’ section will change the properties of all other basic events referring to the same generic basic event too.
Package Select whether the generic basic event is in the library of the global package or of the local package.
GBE Name The identifier of the generic basic event, also serving as the name of the basic event. You can select a name (and by this the referred generic basic event) out of a list of the generic basic events belonging to the selected package.
Suffix Two possibilities exist with respect to common causes in general (also see section 7.6.1):
Thus the suffix is used to distinguish multiple basic events that refer to the same generic basic event and are not identical, but share a common cause factor β > 0.
Since dots ‘.’ are used automatically to separate the name from the suffix and also to indicate multiple parts of the suffix in prime implicants (cut-sets), they should not be used in the name or suffix (even though it would be technically possible).
For qualitative fault trees a ‘second text line’ is displayed instead of the numeric values in the name field of each event. It doesn’t get lost when changing the project type in the project properties dialog to quantitative evaluation. Since this text is specific to each basic event, it is a property of the basic event, not the underlying generic basic event. Therefore it is stored in the fault tree file.
If you want to check the fault tree according to [SiRF]-rules, the ‘second text line’ must begin with either ‘SAS’ or ‘SL’, followed by one optional space, followed by a number 0 to 4. After that arbitrary text is allowed. Also see section 7.7 and the example provided with Functional Safety Suite.
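The [SiRF] prefix rule above can be checked mechanically. The following Python sketch is only an illustration: it assumes that “a number 0 to 4” means a single digit, and the helper name `sirf_prefix` is hypothetical, not part of Functional Safety Suite.

```python
import re

# Rule from the text: 'SAS' or 'SL', one optional space, a digit 0-4,
# then arbitrary text. re.match anchors the pattern at the start of the line.
SIRF_PREFIX = re.compile(r"(SAS|SL) ?([0-4])")

def sirf_prefix(second_text_line):
    """Return (prefix, level) if the line follows the [SiRF] rule, else None."""
    match = SIRF_PREFIX.match(second_text_line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))

for line in ["SAS 3 relay output", "SL2", "SAS  2", "SIL 2", "SL 5"]:
    print(f"{line!r}: {sirf_prefix(line)}")
```

Two spaces after the prefix, a misspelled prefix, or a level outside 0 to 4 are rejected.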
The background color can be selected separately for each event.
Description A user defined description of the generic basic event and therefore identical for all basic events referring to this generic basic event.
House event, Condition event, Not developed See section 4.2.1.
The probabilistic model of the generic basic event. See section 4.3 for details.
The values needed by the model of the generic basic event. See section 4.3 for details.
The following types of gates are supported:
Events connected to a gate are called ‘inputs’. The gate to which an event serves as input is called its ‘parent’. The topmost event of a fault tree is called the top event, in the case of the highest level fault tree also the ‘top hazard’.
For evaluations, And, Priority-And, Or and Combination gates must have at least one input. If only one input is connected to a gate of one of these types, the gate has no effect; for an And or an Or gate the symbol is therefore not displayed.
Inhibit gates have either two or three inputs.
Not gates have exactly one input.
Reduced combination gates have exactly one graphically displayed input, that is internally duplicated up to 170 times.
Each input can be a basic event or another gate. The only exception is the Not gate: Since the Not-operation cannot be applied to rates or densities, but only to probabilities, a Not gate cannot deliver an occurrence rate to a higher gate. Therefore it only makes sense in the condition branch of an Inhibit gate.
The Transfer-In gate actually has one input, but it is not drawn; instead, it is stated as a textual reference to a gate in another fault tree. The referred fault tree must be a member either of the same package or of the global package.
All properties of the gate are stored in the fault tree file (extension .ftl).
Name: A user defined identifier of the gate. Every gate should have a distinct identifier, although Functional Safety Suite does not require this.
Description: A user defined description of the gate. The description of Transfer-In gates is copied from the referred event whenever you change the fields for the referred tree name or the referred event name.
AND The And gate has at least one input event. Each input can be a basic event of any kind or a gate.
Common cause contributions are considered when calculating a gate, see section 7.6.1.
PRIORITY-AND In order to be true, a Priority-And gate requires all inputs to become true in a certain sequence, starting with the leftmost input to the rightmost.
Evaluation of this kind of gate requires translation to a Markov model. This is done automatically during evaluation; however, you should be aware of it for two reasons:
INHIBIT The Inhibit gate has either two or three input events. In case of two inputs, the first is the event input, the second is the condition input. In case of three inputs, the first is the event input if the condition is true, the second is the event input if the condition is false, and the third is the condition input. In any case, the condition input is displayed beside the gate on the right.
In case of two inputs, the gate’s occurrence rate and unavailability are those of the event below the gate multiplied by the probability Q of the event connected to the condition input. Thus the Inhibit gate with two inputs is similar to an AND gate, except that the occurrence rate of the second input is ignored. Since the first input often describes a standard situation rather than a failure, it is often a basic event marked as House Event.
In case of three inputs, the gate’s occurrence rate and unavailability are those of the first event below the gate multiplied by the probability Q of the event connected to the condition input, plus those of the second event below the gate multiplied by the complementary probability 1 - Q of the event connected to the condition input. Thus the Inhibit gate with three inputs is a kind of If-Then-Else gate. It is particularly useful for modeling diagnostics: the first event describes the situation with defective diagnostics (e. g. long detection time, some additional dangerous failure modes), the second event the situation with working diagnostics (e. g. short detection time, without the failure modes that are not dangerous if detected). Note that due to the intended purpose of the gate, the two event inputs of the three-input Inhibit gate are mutually exclusive by definition, i. e. the combination of the two event inputs is not included in the list of prime implicants.
Note that using the three-input Inhibit gate makes the fault tree non-coherent, with all the drawbacks of non-coherent fault trees. Also see the examples containing Inhibit gates.
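The two Inhibit gate variants can be sketched as plain arithmetic on (occurrence rate, unavailability) pairs. This Python sketch only restates the rules above; the class and function names and the numbers in the diagnostics example are hypothetical, and `and_two` uses the usual textbook approximation for a two-input And gate, which is an assumption, not necessarily the formula used by Functional Safety Suite.

```python
from dataclasses import dataclass

@dataclass
class GateValues:
    h: float  # occurrence rate [1/h]
    q: float  # unavailability Q

def inhibit_two(event, cond_q):
    """Two-input Inhibit gate: values of the event input times Q of the condition."""
    return GateValues(event.h * cond_q, event.q * cond_q)

def inhibit_three(if_true, if_false, cond_q):
    """Three-input Inhibit gate: If-Then-Else weighted with Q and 1 - Q.
    The two event inputs are treated as mutually exclusive."""
    return GateValues(
        if_true.h * cond_q + if_false.h * (1.0 - cond_q),
        if_true.q * cond_q + if_false.q * (1.0 - cond_q),
    )

def and_two(a, b):
    """For comparison: the usual two-input And gate approximation, which also
    counts the occurrence rate of the second input (ignored by inhibit_two)."""
    return GateValues(a.h * b.q + b.h * a.q, a.q * b.q)

# Hypothetical diagnostics example: the condition is "diagnostics defective"
defective = GateValues(h=1e-6, q=1e-3)   # long detection time
working = GateValues(h=1e-8, q=1e-5)     # short detection time
print(inhibit_three(defective, working, cond_q=0.01))
```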
Since the condition is quantified by a probability only (no occurrence rate), the basic events of the branch connected to the condition input must be marked with the modifier Condition Event, see section 4.2.1.1.
When a fault tree is converted to a Markov model, an Inhibit gate leads to instantaneous transitions.
There are two special cases related to the condition input of a two-input Inhibit gate:
These rules might be useful in order to adapt the structure of a (generic) fault tree to different specific applications, i. e. to simplify the re-use of a fault tree for different projects. You can check the effect if you export the final fault tree by Export – Final Fault Tree.
NOT The Not gate has exactly one input event.
A Not gate only makes sense within the condition branch of an Inhibit gate, or if only the unavailability Q or the unreliability F(T) is calculated, since a frequency and thus an occurrence rate cannot be inverted. In fact, if you want to calculate the occurrence rate h (PFH), a Not gate outside a condition branch is a modeling error, and the model cannot be evaluated.
OR The Or gate has at least one input event. Each input can be a basic event of any kind or a gate.
COMBINATION The Combination gate has at least one input event. Each input can be a basic event of any kind or a gate.
The output of a Combination gate is true (faulty), if at least M of the inputs are true (faulty).
In fact, the Combination gate is just an abbreviation of Or-ed And gates according to M and the number of inputs. Therefore it is calculated like standard Or and And gates. You can verify this by calculating and exporting the prime implicants or by exporting the resulting fault tree by Export – Final Fault Tree. Identical events or common cause factors between basic events contained in the branches below the Combination gate are handled as if the branches formed one single fault tree.
Starting with version 4.0 of Functional Safety Suite, the number of inputs is no longer limited.
REDUCED COMBINATION The Reduced Combination gate has exactly one input. The input can be a basic event of any kind or a gate. Logically, the one graphically displayed input is handled as if n independent inputs of this kind were connected to the gate.
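The expansion of an M-out-of-N Combination gate into Or-ed And gates can be sketched with `itertools.combinations`: one And gate per subset of exactly M inputs. The function name and input labels are hypothetical; the sketch shows the logical expansion only, not how Functional Safety Suite stores the final tree.

```python
from itertools import combinations

def expand_combination_gate(inputs, m):
    """Replace an M-out-of-N Combination gate by an Or of And gates:
    one And gate (list of inputs) for each subset of exactly M inputs."""
    if not 1 <= m <= len(inputs):
        raise ValueError("M must be between 1 and the number of inputs")
    return [list(subset) for subset in combinations(inputs, m)]

print(expand_combination_gate(["A", "B", "C"], 2))
# [['A', 'B'], ['A', 'C'], ['B', 'C']]
```

The number of And gates grows as the binomial coefficient C(N, M), e.g. 10 for 3 out of 5.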
The output of a Reduced Combination gate is true (faulty), if at least m of the n (virtual) inputs are true (faulty).
In contrast to all other gates, Reduced Combination gates numerically evaluate the input event (they calculate qi and hi or Fi of the connected input). The unavailability qg and the occurrence rate hg or unreliability Fg of the gate are then calculated as functions of n, m, qi and hi or Fi and stored in a new temporary generic basic event, which is referenced by a new temporary (internal) basic event of type Link. The formulas used to calculate higher gates only consider this basic event and thus have no information about the structure below the Reduced Combination gate.
The suffix of the temporary (internal) basic event can be set to ‘#’ in order to create multiple instances of the element described by the Reduced Combination gate in higher level fault trees. This is achieved by adding a ‘#’ at the end of the name of the Reduced Combination gate, also see section 7.6.1.
TRANSFER-IN A Transfer-In gate is a reference to another event, defined by the name of the referred fault tree (which must be a member of the same local package or of the global package) and the name of an event in the referred fault tree. The referred event can be any basic event or gate in the tree, not only the top event. The referred fault tree cannot be the tree the Transfer-In gate belongs to. Regarding calculations, sub-trees referred to by Transfer-In gates are treated as if they were stated directly in the higher level tree. Therefore you can split trees wherever you want. Please note the options for handling basic event names and common cause factors as described in section 7.6.1.
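The text does not state the formulas for qg and hg. The following Python sketch shows one common way to compute them, under the assumption that the n virtual inputs are independent, identical copies with unavailability qi and occurrence rate hi (rate of failure while an input is working): qg is a binomial tail, and the gate becomes true when one of the n - m + 1 working inputs fails while exactly m - 1 are already failed. The function name is hypothetical and this is not necessarily the formula Functional Safety Suite uses.

```python
import math

def reduced_combination(m, n, q_i, h_i):
    """(q_g, h_g) of an m-out-of-n gate over n independent identical inputs."""
    if not 1 <= m <= n:
        raise ValueError("require 1 <= m <= n")
    # q_g: probability that at least m of the n inputs are failed
    q_g = sum(math.comb(n, k) * q_i**k * (1.0 - q_i) ** (n - k)
              for k in range(m, n + 1))
    # w_g: unconditional rate of the transition from m-1 to m failed inputs
    w_g = (math.comb(n, m - 1) * q_i ** (m - 1) * (1.0 - q_i) ** (n - m + 1)
           * (n - m + 1) * h_i)
    # h_g: rate conditional on the gate not already being true
    # (undefined if q_g == 1, i.e. the gate is always true)
    h_g = w_g / (1.0 - q_g)
    return q_g, h_g

print(reduced_combination(2, 3, q_i=1e-3, h_i=1e-5))
```

For m = n = 1 the gate simply reproduces its input, which is a useful sanity check.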
Note: Circular references are forbidden for obvious reasons. They are detected in the evaluation, the evaluation will be aborted and the error indicated in the status bar.
You can open the referred fault tree by double-clicking the gate.
For qualitative fault trees a ‘second text line’ is displayed instead of the numeric values in the name field. It is stored in parallel to the numeric values and therefore doesn’t get lost when changing the evaluation mode in the project properties dialog (tab Fault Trees & RBDs) and vice versa.
If you want to check the fault tree according to [SiRF]-rules, the ‘second text line’ must begin with either ‘SAS’ or ‘SL’, followed by one optional space, followed by a number 0 to 4. After that arbitrary text is allowed. Also see the example in the doc-directory.
The background color can be selected separately for each gate.
The value of interest for each safety function is either
The value of interest and several parameters related to quantitative evaluation of fault trees are set in the fault tree evaluation properties dialog, see below.
To evaluate a fault tree, select Calculate – Calculate Model Values. First the final tree is determined: this is the fault tree in which all Combination gates and Transfer-In gates have been replaced by the corresponding branches, and all Reduced Combination gates and Priority-And gates have been replaced by a link to another model. Inhibit gates may also have been eliminated, see section 7.4.2.3. Circular references are detected and indicated in the message window before the evaluation starts. The final tree can be exported to a new fault tree by Export – Final Fault Tree, e. g. in order to check whether all modules referred to by Transfer-In gates have been considered as intended, see section 11.7.9.
After that, prime implicants and/or Binary Decision Diagrams (BDDs) describing the unavailability Q, the unreliability F and/or the occurrence rates h or densities w are created. Then all lower level models connected by links and all generic basic events are evaluated, so that finally the prime implicants and/or BDDs can be quantified. Depending on your choice in the fault tree evaluation properties dialog, either only the top event or all gates are calculated.
Results of the evaluation are displayed in two places. The values of the top event are displayed in the header of the fault tree. In addition, the values of interest are displayed in each event’s symbol, see section 7.5.1.1.
Note: If the fault tree is referred to in another fault tree by a Transfer-In gate, these parameters are irrelevant, since this fault tree will just be a branch of the higher fault tree during evaluation.
Calculation Value Select which value(s) to calculate:
In steady-state evaluation, the mean occurrence number N(T) is calculated by
$$N(T) = \bar{h} \cdot T \tag{34}$$
with T being the system lifetime or mission time. If the unreliability F(T) is calculated via occurrence rate (h), it is calculated by
$$F(T) = 1 - e^{-\bar{h} \cdot T} \tag{35}$$
In transient evaluation, the occurrence number N(t) is calculated by
$$N(t) = \int_0^t h(\tau)\,d\tau \tag{36}$$
If the unreliability is calculated via occurrence rate, it is calculated by
$$F(t) = 1 - \exp\left(-\int_0^t h(\tau)\,d\tau\right) \tag{37}$$
The mean values for occurrence rate h and unavailability Q are calculated by
$$\bar{h} = \frac{1}{T}\int_0^T h(t)\,dt \tag{38}$$
and
$$\bar{Q} = \frac{1}{T}\int_0^T Q(t)\,dt \tag{39}$$
For small values of N (N ≪ 1, as is typically the case for higher level events), F ≈ N. For lower level events, h often does not have the meaning of a failure rate but describes regularly occurring events, so that N ≫ 1 often applies for a longer system lifetime, and accordingly F ≈ 1.
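A minimal Python sketch of these relations, assuming N = h·T in steady state, N(T) as the integral of h(t) over [0, T] in transient mode, F = 1 - exp(-N), and the mean values of h and Q as time averages over [0, T]. The function names and the sample Q(t) of a component tested every 150 h are hypothetical; the transient part integrates with a simple midpoint rule, not necessarily the scheme used by Functional Safety Suite.

```python
import math

def steady_state(h_mean, T):
    """Occurrence number N(T) = h * T and unreliability F(T) = 1 - exp(-N)."""
    n = h_mean * T
    return n, 1.0 - math.exp(-n)

def transient(h, q, T, dt):
    """N(T), F(T), mean h and mean Q over [0, T] for time functions h(t), Q(t)."""
    steps = round(T / dt)
    dt = T / steps
    midpoints = [(k + 0.5) * dt for k in range(steps)]
    n = sum(h(t) for t in midpoints) * dt        # integral of h(t)
    q_int = sum(q(t) for t in midpoints) * dt    # integral of Q(t)
    return n, 1.0 - math.exp(-n), n / T, q_int / T

# N = 2 gives F = 1 - exp(-2), far from F = N; for N << 1, F is close to N
print(steady_state(1e-5, 200_000.0))
print(steady_state(1e-7, 1_000.0))

# Hypothetical component: constant h, Q(t) restarting at 0 after every 150 h test
print(transient(lambda t: 1e-5,
                lambda t: 1.0 - math.exp(-1e-4 * (t % 150.0)),
                T=1_500.0, dt=0.01))
```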
Evaluation Mode Select whether the fault tree shall be evaluated in steady-state mode or in transient (time-variant) mode. In case of transient evaluation, the time interval must be set as well.
Quantitative steady-state evaluation A steady-state analysis is appropriate for all systems that are supposed to operate for many years, with certain test intervals and optionally some down-times for maintenance and repairs. Several parts of the system might be replaced or repaired during the system’s lifetime. If all failures are detected in time (either by continuous diagnosis, by periodic tests or by malfunction of the system), neither the failure rate h (PFH) nor the unavailability Q (PFD) of the system depends on its actual age; instead, both reach a pseudo-stationary state in which they oscillate around a mean value. The period of this oscillation is equal to the longest detection interval or a multiple of it. This holds even if the failure rates of some particular components depend on their specific age, provided the lifetimes of these components are shorter than the system lifetime. The value of interest for each safety function performed by such a system is either the mean unavailability on demand Q (PFD) or the mean occurrence rate h (PFH). The main related standards are [EN 61508] and the standards derived from it. Examples of such systems are machines, cars, trains, aircraft, chemical plants, power plants, etc. and their control systems.
A steady-state analysis is very fast, because all values have to be calculated only once (compared to time-variant analysis, where all values have to be calculated many times, see below).
Unfortunately, there is one major issue related to steady-state calculation of unavailabilities: The mean value of the product of two (or more) time-variant values is in general not equal to the product of the mean values:
$$\overline{Q_1(t) \cdot Q_2(t)} \neq \overline{Q_1(t)} \cdot \overline{Q_2(t)}$$
In most applications, using the mean values of the components’ unavailabilities is too optimistic. Of course you can use the maximum values of the components’ (basic events’) unavailabilities, i. e. the unavailability just before the next test, but this is quite pessimistic. Functional Safety Suite provides three options for dealing with this issue in steady-state evaluations, see section 7.5.2.1. The only way to calculate the exact values is a transient evaluation, which unfortunately needs much more computing time.
Quantitative transient evaluation A transient (or time-variant) analysis typically produces more precise results for
Also if the ‘mission failure probability’, namely the system’s unreliability F(0,Tmission) is the value of interest, a transient evaluation usually makes sense, even though many systems of this kind can also be modeled by steady-state fault trees or Markov models, using generic basic events of type non-repairable, see section 4.3.2.
Time interval: The step size for transient evaluation in hours. A smaller step size means more steps for the given system lifetime and thus a longer calculation time, which might be an issue for large systems. The step size must be less than one tenth of the shortest period of any periodic (cyclic) event you want to evaluate at discrete times. Time constants less than 10 times the step size are handled as rates. However, the step size should be even smaller in order to reduce computational errors.
If a value of a generic basic event, the suffix of a basic event or the structure or evaluation parameters of the fault tree is changed, all values that might be affected by this change are automatically marked invalid and not displayed anymore, so that no inconsistent values are displayed.
Gate Calculation Mode Usually you should calculate all gates, because the intermediate gate values typically help understanding the fault tree and the critical paths. However, in order to save calculation time, you might want to calculate the top event only.
Calculate top event only: Only the top event of the fault tree is calculated, but no lower gates. This option just saves evaluation time, the top results will be the same.
Calculate all gates: All gates of the active fault tree are calculated. Select this option if you want to analyze where the top event’s results come from.
Note: If the fault tree is referred to in another fault tree by a Transfer-In gate, these parameters are irrelevant, since this fault tree will just be a branch of the higher fault tree during evaluation.
Steady-State Evaluation Unavailability Mode The unavailability function Q(t) of a repairable component is a periodic function: It becomes Q(t) = 0 after each (complete) test at time tn = n ⋅ tcheck + t0 and increases until the next test at time tn+1 = (n + 1) ⋅ tcheck + t0.
The following examples show the unavailability as function of time for a system of two repairable (and therefore periodically tested) components. The first example considers non-redundant components (connected by an OR gate), the second considers redundant components (connected by an AND gate).
Example 1: A safety system consists of two components (both needed, not redundant) with two failure events E1 and E2.
E1 has a failure rate of λ1 = 10⁻⁴∕h and is tested every 150 h.
E2 has a failure rate of λ2 = 10⁻³∕h and is tested every 30 h.
Figure 37 shows the unavailability function Q(t) if both components are tested at t0 = 0. Due to the given test intervals, both components are also tested simultaneously at t = n ⋅ 150 h. The dotted lines are the single event unavailabilities; the solid line shows the overall unavailability, which is approximately the sum of both single event unavailabilities.
Figure 38 shows the unavailability function Q(t) if the test for E1 is executed at t1 = t0, whereas the test for E2 is executed at t2 = t0 + 15 h. Since both components are never tested at the same time, the overall unavailability never returns to 0 (at least not for t > 0).
Note that the mean value is the same, independent of the relation of the test times.
The lifetime is usually much longer than a period of Q(t), because the component is tested or maintained several times during the lifetime.
Another special case is that the component/function is used only for one mission, e. g. it is created/tested before or at the start of the mission and becomes invalid after the end of the mission. In that case the interval is identical to the mission time.
Example 2: A safety system consists of two redundant subsystems S1 and S2. S1 has a failure rate of λ1 = 10⁻⁴∕h and is tested every 150 h. S2 has a failure rate of λ2 = 10⁻³∕h and is tested every 50 h. With these values the mean unavailability of S1 is Q1 = 7.45 ⋅ 10⁻³, that of S2 is Q2 = 2.45 ⋅ 10⁻². Multiplying both values gives 1.83 ⋅ 10⁻⁴.
Figure 39 shows the unavailability function Q(t) if at t = 0 h both subsystems are tested.
The mean unavailability is Q = 2.03 ⋅ 10⁻⁴, not 1.83 ⋅ 10⁻⁴! Simple multiplication of the values is obviously not correct and not even conservative.
Figure 40 shows the unavailability function Q(t) if the test for S1 is executed at t1 = 0 h, whereas the test for S2 is executed at t2 = 15 h.
With this shift the mean unavailability is Q = 1.77 ⋅ 10⁻⁴; with a shift of t2 = 25 h it would be Q = 1.72 ⋅ 10⁻⁴.
As shown in example 2, when performing the conjunction of events (using an AND gate), the mean overall unavailability is not given by the product of the mean unavailabilities of the events. For synchronized tests without shift, the mean unavailability of the system is always higher than the product of the mean unavailabilities of its events. Also refer to [EN 61508-6], section B.2.2.
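Example 2 can be reproduced numerically. This Python sketch assumes Q(t) = 1 - exp(-λ ⋅ (time since the last test)) for each subsystem, with Q = 0 right after each test as described above, and averages over the common period of 150 h with the midpoint rule. The function names are hypothetical. The printed values should be close to those quoted in Example 2; small deviations in the last digit are possible, since the tool’s internal calculation may differ from this simple integration.

```python
import math

PERIOD = 150.0  # least common multiple of the 150 h and 50 h test intervals

def q_periodic(lam, t_check, t_test):
    """Q(t) of a periodically tested component: 0 right after each test at
    t_test + n * t_check, then 1 - exp(-lam * time since the last test)."""
    return lambda t: 1.0 - math.exp(-lam * ((t - t_test) % t_check))

def time_mean(f, period=PERIOD, steps=30_000):
    """Mean of f over one period by the midpoint rule."""
    dt = period / steps
    return sum(f((k + 0.5) * dt) for k in range(steps)) / steps

def example2(shift):
    q1 = q_periodic(1e-4, 150.0, 0.0)    # S1: tested every 150 h, at t = 0
    q2 = q_periodic(1e-3, 50.0, shift)   # S2: tested every 50 h, first at t = shift
    m1, m2 = time_mean(q1), time_mean(q2)
    mean_of_product = time_mean(lambda t: q1(t) * q2(t))  # S1 AND S2
    return m1, m2, m1 * m2, mean_of_product

for shift in (0.0, 15.0, 25.0):
    m1, m2, prod_of_means, mean_of_prod = example2(shift)
    print(f"shift {shift:4.1f} h: Q1 = {m1:.3e}, Q2 = {m2:.3e}, "
          f"product of means = {prod_of_means:.3e}, "
          f"mean of product = {mean_of_prod:.3e}")
```

Only the mean of the product depends on the shift; the individual means and their product do not.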
Optimistic In ‘optimistic’ mode, the mean unavailabilities of generic basic events are used, as calculated according to section 4. As explained above, this is usually optimistic.
Corrected In ‘corrected’ mode, mean unavailabilities of generic basic events are used as well, but combinations of unavailabilities are multiplied by a factor greater than 1 (depending on the length of the cut-set), so that the result is certainly not too optimistic (though it might be pessimistic). Unfortunately this correction cannot be applied to BDDs directly; thus, if unavailability is calculated by BDDs (see section 7.5.2.2), the BDDs need to be converted to a sum of products of events, which requires considerable calculation effort for large fault trees.
Safe In ‘safe’ mode, the maximum unavailabilities of generic basic events are used, as calculated according to section 4. Obviously this is pessimistic for most basic event types, and the longer the cut-sets, the more pessimistic is the result.
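The difference between ‘optimistic’ and ‘safe’ mode for a single cut-set can be illustrated with the two subsystems of Example 2. This Python sketch assumes Q(t) = 1 - exp(-λt) between tests, so the mean value over a test interval T is 1 - (1 - exp(-λT)) ∕ (λT) and the maximum is 1 - exp(-λT). The ‘corrected’ mode is omitted because its correction factor is not given here; the function names are hypothetical.

```python
import math

def periodic_event(lam, t_check):
    """(mean, maximum) unavailability over one test interval."""
    x = lam * t_check
    q_mean = 1.0 - (1.0 - math.exp(-x)) / x
    q_max = 1.0 - math.exp(-x)   # value just before the next test
    return q_mean, q_max

def cutset_unavailability(events, mode):
    """Product of basic event unavailabilities for one cut-set (AND gate)."""
    index = {"optimistic": 0, "safe": 1}[mode]   # mean or maximum values
    return math.prod(event[index] for event in events)

s1 = periodic_event(1e-4, 150.0)   # S1 of Example 2
s2 = periodic_event(1e-3, 50.0)    # S2 of Example 2
for mode in ("optimistic", "safe"):
    print(mode, cutset_unavailability([s1, s2], mode))
```

For synchronized tests, the exact mean quoted in Example 2 (2.03 ⋅ 10⁻⁴) lies between the two results, well above the optimistic value and far below the safe one.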
Hint: If a fault tree contains components modeled by basic events of type repairable, standby or link that are tested at the same time and only rarely (e. g. in a preventive maintenance once per month or year), so that their unavailabilities are not obviously negligible, you should perform a transient evaluation instead of a steady-state evaluation to get precise results. If a fault tree contains non-repairable events, you should always use a transient evaluation.
Calculation by BDDs Calculations based on BDDs are both accurate and very fast even for huge fault trees. The only reason not to select this option is in steady-state evaluation with unavailability mode ‘corrected’ (see section 7.5.2.1) for big fault trees.
Calculation by PIs Calculations based on PIs are more or less pessimistic because the cut-sets (or prime implicants) are not disjoint. The only reason to select this option is in steady-state evaluation with unavailability mode ‘corrected’ (see section 7.5.2.1) for big fault trees.
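Why a sum over overlapping cut-sets is pessimistic can be seen on a 2-out-of-3 structure. The Python sketch below assumes a coherent tree with independent basic events; `exact_q` evaluates the structure function exactly by Shannon decomposition, Q = qx ⋅ Q(f | x true) + (1 - qx) ⋅ Q(f | x false), which is the principle behind BDD evaluation, but it builds an unreduced decision tree without node sharing, so it is only meant for tiny examples. All names are hypothetical.

```python
import math

def exact_q(structure, q, fixed=()):
    """Exact probability that structure(states) is True, decomposing on one
    independent event at a time (a full decision tree, exponential in size)."""
    k = len(fixed)
    if k == len(q):
        return 1.0 if structure(fixed) else 0.0
    return (q[k] * exact_q(structure, q, fixed + (True,))
            + (1.0 - q[k]) * exact_q(structure, q, fixed + (False,)))

def sum_of_cutset_products(cutsets, q):
    """Sum of products over the cut-sets, ignoring that cut-sets overlap."""
    return sum(math.prod(q[i] for i in cutset) for cutset in cutsets)

def two_of_three(states):
    return sum(states) >= 2

cutsets = [(0, 1), (0, 2), (1, 2)]
q = [0.1, 0.1, 0.1]
print("exact:", exact_q(two_of_three, q))
print("sum over cut-sets:", sum_of_cutset_products(cutsets, q))
```

The sum counts the state in which all three events are true three times instead of once, which makes it larger than the exact value.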
Unreliability Algorithm Up to version 4 of Functional Safety Suite, the unreliability was calculated based on the occurrence rate h or h(t). This method is suitable for all kinds of systems, but a little conservative and quite slow. Version 5 provides an algorithm to calculate the unreliability directly. This is quite simple and fast, and in fact this is how all “traditional” fault tree tools calculate unreliability, and what is explained in all books including [EN 61025]. Unfortunately, this is wrong if the fault tree contains conditions, as demonstrated in the following example.
Example 3: A top event ‘TE’ occurs whenever a periodic event ‘H’ appears and a condition ‘Cond’ is fulfilled when ‘H’ appears. This is modeled in the fault trees shown in figure 41. Event H appears every 1000 hours with a probability of 0.1. Thus, given a system lifetime of 200 000 hours, it will appear about 20 times (of course this is not a fixed value but the expected value). In the left fault tree, the top event’s unreliability is calculated “directly”, i. e. by Fsys = FH ⋅ QCond = 0.1. Even though both FH and QCond are probabilities, they must not be multiplied, because they are different quantities. The correct result has to be calculated based on the occurrence rate by Fsys = 1 - exp(-hsys ⋅ T) with hsys = hH ⋅ QCond = 1E-5 /h, as shown on the right. With T = 200 000 h this gives Fsys = 1 - exp(-2) ≈ 0.86, far more than 0.1.
Thus, if the fault tree doesn’t contain conditions, select Direct; if it contains conditions, select Via Occurrence Rate.
For the direct calculation of the unreliability, you can select whether it shall be calculated By BDDs or By Prime Implicants (minimal cut-sets). There is in fact no reason to use prime implicants: this is much slower than using BDDs, and the result is pessimistic (with BDDs the result is exact).
Occurrence Rate Algorithm If the minimal cut-sets (or prime implicants, PIs) of a fault tree are known, the occurrence rate of the top event $h_{sys}$ can be estimated from the occurrence rates and unavailabilities of the basic events by
[Equation (40)]
The PIs are determined by a very fast algorithm using modified BDDs (in fact ternary decision diagrams, TDDs). Evaluating the PIs requires little memory but some computing time. For high unavailabilities (Q > 0.5) this estimation can be too conservative. However, since such high unavailabilities shouldn’t occur in a safety-related system, the estimation is sufficiently precise for most problems.
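For illustration, the following Python sketch evaluates the classical rare-event form of such an estimate, $h_{sys} \approx \sum_{C} \sum_{j \in C} h_j \prod_{k \in C,\, k \neq j} Q_k$, where each term is the rate at which basic event j completes cut-set C while all other events of C are already present. This form is an assumption on our side and not necessarily identical to (40); it also ignores negated literals, which prime implicants may contain.

```python
from typing import Dict, FrozenSet, Iterable


def occurrence_rate_estimate(
    cut_sets: Iterable[FrozenSet[str]], h: Dict[str, float], q: Dict[str, float]
) -> float:
    """Rare-event estimate of the top event occurrence rate from minimal cut-sets
    (independent basic events with occurrence rate h and unavailability q)."""
    h_sys = 0.0
    for cut_set in cut_sets:
        for j in cut_set:
            term = h[j]  # event j occurs ...
            for k in cut_set:
                if k != j:
                    term *= q[k]  # ... while all other events of the cut-set are present
            h_sys += term
    return h_sys


# hypothetical example: redundant pair A, B plus a single-point failure C
cut_sets = [frozenset({"A", "B"}), frozenset({"C"})]
h = {"A": 1e-5, "B": 1e-5, "C": 1e-8}
q = {"A": 1e-3, "B": 1e-3, "C": 0.0}
print(f"{occurrence_rate_estimate(cut_sets, h, q):.2e} /h")  # 3.00e-08 /h
```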
For high unavailabilities, a more precise estimate can be calculated based on the (unconditional) occurrence density $w_{sys}$ and the unavailability $Q_{sys}$:
[Equation (41)]
[Equation (42)]
This algorithm obviously needs some more calculations and is therefore slower.
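The difference between the two estimates is easiest to see for a single cut-set {A, B} of independent basic events, where both can be written down by hand. In the Python sketch below, the density-based variant uses $w_j = h_j (1 - Q_j)$, $w_{sys} = w_A Q_B + w_B Q_A$ and $h_{sys} = w_{sys} / (1 - Q_{sys})$ with $Q_{sys} = Q_A Q_B$. These formulas are our own illustration of the idea and are not copied from (41) and (42).

```python
def rate_based(h_a: float, h_b: float, q_a: float, q_b: float) -> float:
    """Rate-based estimate: one event occurs while the other one is present."""
    return h_a * q_b + h_b * q_a


def density_based(h_a: float, h_b: float, q_a: float, q_b: float) -> float:
    """Density-based estimate: unconditional densities, then conditioned on the
    cut-set not being complete already."""
    w_a, w_b = h_a * (1.0 - q_a), h_b * (1.0 - q_b)  # unconditional occurrence densities
    w_sys = w_a * q_b + w_b * q_a
    q_sys = q_a * q_b
    return w_sys / (1.0 - q_sys)


for q in (1e-3, 0.6):  # low vs. (unrealistically) high unavailability
    r, d = rate_based(1e-4, 1e-4, q, q), density_based(1e-4, 1e-4, q, q)
    print(f"Q = {q}: rate-based {r:.3e} /h, density-based {d:.3e} /h, ratio {r / d:.3f}")
```

For this symmetric case the ratio of the two estimates is exactly 1 + Q: negligible for Q = 1E-3, but a factor of 1.6 for Q = 0.6.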
Both algorithms are also conservative because the prime implicants are not disjoint. The exact calculation requires disjoint prime implicants, which have to be determined by converting the PIs back into BDDs, one BDD per literal. This is a resource-intensive operation and is therefore only possible for small to medium-sized fault trees. But once the BDDs have been created (if they can be created with the given memory resources), the numerical evaluation is very fast.
In addition to these algorithms, an algorithm not using PIs at all is implemented, i. e. the occurrence rate is calculated directly by BDDs. This is the fastest algorithm. Unfortunately, there is no formal proof of its correctness (or at least its conservativeness), so you shouldn’t rely on it before having cross-checked it with another algorithm.
All possible combinations of algorithms are available for selection. Algorithms using the occurrence rates of the basic events are always somewhat faster (by a factor of approximately 2) than their counterparts using unconditional occurrence densities.
occurrence rate by PIs via rate: The occurrence rate is calculated from the occurrence rates and unavailabilities of the basic events contained in the PIs. Requires little memory, but high computing effort for transient evaluation. The result is conservative, in particular for high unavailabilities or large fault trees.
occurrence rate by disjointed PIs via rate: The PIs are sorted by literal and then made disjoint using BDDs. The occurrence rate is then calculated from the occurrence rates and unavailabilities of the basic events. Requires much memory, but only medium computing effort for transient evaluation. The result is slightly conservative, in particular for high unavailabilities or large fault trees.
occurrence rate by BDDs via rate: The occurrence rate is calculated directly from BDDs, using the occurrence rates and unavailabilities of the basic events. No PIs are determined. Requires little memory and only small computing effort for transient evaluation. The result might not be correct for some trees (it deviates from the PI-based algorithms in both directions).
occurrence rate by PIs via density: The occurrence rate is calculated from the occurrence densities and unavailabilities of the basic events contained in the PIs. Requires little memory, but high computing effort for transient evaluation. The result is conservative, in particular for large fault trees.
occurrence rate by disjointed PIs via density: The PIs are sorted by literal and then made disjoint using BDDs. The occurrence rate is then calculated from the occurrence densities and unavailabilities of the basic events. Requires much memory, but only medium computing effort for transient evaluation. Gives the correct result.
occurrence rate by BDDs via density: The occurrence rate is calculated directly from BDDs, using the occurrence densities and unavailabilities of the basic events. No PIs are determined. Requires little memory and only small computing effort for transient evaluation.
Fault trees containing priority-AND gates are called “dynamic fault trees”, since they model a system whose structure changes as a consequence of some event(s). When the fault tree is evaluated, the branches topped by priority-AND gates are automatically converted into Markov models in the background. Independently of this, the user can convert any fault tree or branch of a fault tree into a Markov model on request via Edit – Convert Gate to Markov model.
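To illustrate what such a conversion involves, the Python sketch below builds a Markov model by hand for a priority-AND gate with two non-repairable inputs A and B (constant failure rates; the top event occurs only if A fails before B) and integrates it with a classical fourth-order Runge-Kutta scheme. It is a hypothetical example, not the conversion performed by Functional Safety Suite. For this simple case a closed-form solution exists and serves as a cross-check.

```python
import math


def pand_markov(lam_a: float, lam_b: float, t_end: float, steps: int = 2000) -> float:
    """Probability that A fails before B and both have failed by t_end, from a
    5-state Markov chain of two non-repairable components."""
    # states: 0 both ok, 1 A failed, 2 B failed first, 3 A then B (top), 4 B then A
    def deriv(p):
        return [
            -(lam_a + lam_b) * p[0],
            lam_a * p[0] - lam_b * p[1],
            lam_b * p[0] - lam_a * p[2],
            lam_b * p[1],
            lam_a * p[2],
        ]

    p = [1.0, 0.0, 0.0, 0.0, 0.0]
    dt = t_end / steps
    for _ in range(steps):  # classical RK4 step
        k1 = deriv(p)
        k2 = deriv([x + 0.5 * dt * k for x, k in zip(p, k1)])
        k3 = deriv([x + 0.5 * dt * k for x, k in zip(p, k2)])
        k4 = deriv([x + dt * k for x, k in zip(p, k3)])
        p = [x + dt / 6.0 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p[3]


def pand_closed_form(lam_a: float, lam_b: float, t: float) -> float:
    """P(T_A < T_B <= t) for independent exponentially distributed failure times."""
    return (1.0 - math.exp(-lam_b * t)) - lam_b / (lam_a + lam_b) * (
        1.0 - math.exp(-(lam_a + lam_b) * t))


lam_a, lam_b, t = 1e-4, 2e-4, 10_000.0
print(pand_markov(lam_a, lam_b, t))       # Markov model
print(pand_closed_form(lam_a, lam_b, t))  # closed form, about 0.2312
```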
A complete conversion to a Markov model takes into account that the system’s state can in fact “jump” from one (still incomplete) chain to another until a final state is reached. Please see section 9.5.2 for an example. The program can be told to consider this by selecting create complete chains. The internal conversion can become quite complex even for moderately sized fault trees. Therefore, by default only the direct Markov chains are created, which are equivalent to the minimal cut-sets. The difference is rarely visible and typically not significant.
Two common cause scenarios are to be considered:
See figure 42 for an example. Consider two components X.1 and X.2 of the same type X. A common cause factor β of 1% is assumed between components of this type due to their construction, manufacturing and environment. Function ‘A’ needs at least one of the two components X.1 and X.2, whereas function ‘B’ needs component X.1 and another component Y.
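As a rough numerical illustration of why a common cause factor matters for function ‘A’, the Python sketch below applies the widely used beta-factor model to a pair of same-type components: the unavailability q of each component is split into an independent part (1 - β)·q and a common cause part β·q that affects both components at once. The helper name and the numbers are hypothetical, and the tool’s internal treatment of common cause factors may differ.

```python
def one_of_two_unavailability(q: float, beta: float) -> float:
    """Unavailability of a function needing at least one of two same-type
    components under the beta-factor model (otherwise independent events)."""
    q_ind = (1.0 - beta) * q  # independent share of each component
    q_ccf = beta * q          # common cause share, fails both components together
    both_independent = q_ind * q_ind
    # the function fails if the common cause occurs or both independent parts occur
    return q_ccf + both_independent - q_ccf * both_independent


q = 1e-3  # hypothetical component unavailability
print(one_of_two_unavailability(q, 0.0))   # no common cause: about 1E-6
print(one_of_two_unavailability(q, 0.01))  # beta = 1 %: about 1.1E-5
```

Even with β = 1 %, the common cause part dominates the result of this example by roughly a factor of ten.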
As expected, a common cause factor of 0.01 is applied between the two components X.1 and X.2 of the same type X (see the values of ‘FKT_A_FAILS’), whereas the occurrences of X.1 in both functions are treated as one and the same event: ‘FKT_B_FAILS’ has no impact on the top event, since event X.1 in ‘FKT_B_FAILS’ always occurs whenever ‘FKT_A_FAILS’ occurs (obviously such a system architecture makes no sense). ‘FKT_B_FAILS’ is not even calculated, since component Y does not occur in the cut-sets at all and is therefore not initialized. You can check the cut-sets via Calculate – Show Minimal Cut-sets.
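The absorption that removes ‘FKT_B_FAILS’ and component Y from the result can be reproduced with a small cut-set calculation. The Python sketch below assumes, as the discussion implies, that the top event is the AND of ‘FKT_A_FAILS’ and ‘FKT_B_FAILS’; common cause factors only affect the numerical values and are not modeled. The helper functions are hypothetical and only illustrate the Boolean reduction, not the tool’s BDD-based implementation.

```python
from typing import FrozenSet, List

CutSets = List[FrozenSet[str]]


def minimize(sets: CutSets) -> CutSets:
    """Remove duplicates and every cut-set that contains another one (absorption)."""
    unique = set(sets)
    return sorted((s for s in unique if not any(o < s for o in unique)), key=sorted)


def event(name: str) -> CutSets:
    return [frozenset({name})]


def or_gate(*inputs: CutSets) -> CutSets:
    return minimize([c for cs in inputs for c in cs])


def and_gate(*inputs: CutSets) -> CutSets:
    result: CutSets = [frozenset()]
    for cs in inputs:
        result = [r | c for r in result for c in cs]  # combine every pair of cut-sets
    return minimize(result)


fkt_a_fails = and_gate(event("X.1"), event("X.2"))  # A needs at least one of X.1, X.2
fkt_b_fails = or_gate(event("X.1"), event("Y"))     # B needs both X.1 and Y
top = and_gate(fkt_a_fails, fkt_b_fails)            # assumed top event structure
print([sorted(c) for c in top])  # [['X.1', 'X.2']]
```

The cut-set {X.1, X.2, Y} is absorbed by {X.1, X.2}, so Y no longer appears in the result.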
Note that when a basic event is selected, all basic events referring to the same generic basic event are highlighted in blue as well (see figure 42). This also applies to basic events in other models, so when you switch to another model, you can directly see the related basic events.
For the same reasons as in other fields of engineering, it usually makes sense to split a complex system into several ‘modules’ or ‘units’. The same applies to fault trees; in fact, one will aim to reflect the architecture of the system in the fault trees.
See figure 43 for an example of a system consisting of four modules, characterized as follows:
Units A could be sensors, whereas B1 and B2 could be computers working with the sensor data.
The fault tree of this system is shown in figure 44.
If units A and B are more complex still, you will want to split this tree into several sub-trees. In principle, Functional Safety Suite provides two possibilities: transfer-in gates and links.
Modularization by Transfer-in gates Apart from dividing large fault trees into several pages, gates of type transfer-in are used in two cases:
The two cases are handled in a simple way in Functional Safety Suite:
However, imagine that in case 2 the sub-tree refers to some events that are not specific to each unit described by this sub-tree, but are shared by all units of this kind (and possibly by other units as well). To describe this case correctly too, the following rule applies:
Rule 1: If the suffix of a basic event ends with ‘#’, a new basic event is created internally when the cut-sets are constructed during calculation. The suffix of the new basic event is the name of the transfer-in gate plus ‘.’ plus the original suffix (excluding the ‘#’). In case of multiple levels of transfer-in gates, the names of all transfer-in gates are concatenated, separated by ‘.’ and starting with the lowest level.
By this mechanism, the overall example system can now be split as shown in figures 45 and 46.
The suffixes ‘#’ of ‘A_Fault1’ and ‘A_Fault2’ will be extended to ‘A1’ and ‘A2’. Now you might also want to model units of type B with one generic fault tree B. If you simply replaced the gates ‘B1_Fail’ and ‘B2_Fail’ by transfer-in gates, rule 1 would apply again. The suffixes ‘#’ of ‘A_Fault1’ and ‘A_Fault2’ would then be extended to ‘A1.B1_Fail’, ‘A1.B2_Fail’, ‘A2.B1_Fail’ and ‘A2.B2_Fail’, i. e. you would model independent units A1 and A2 for each of the units B1 and B2. But remember that the same units (sub-trees) A1 and A2 are needed in all units of type B. So you have to specify in tree B that you mean identical A’s in all instances of B.
Therefore the following rule applies:
Rule 2: If the name of the transfer-in gate ends with ‘%’, or if a transfer extension is stated, the names of higher-level transfer-in gates are ignored.
Given this, it is possible to split the top tree as shown in figure 47.
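Rules 1 and 2 can be summarized as a small string transformation. The Python sketch below is a hypothetical illustration, not code from Functional Safety Suite. It assumes that an empty remainder of the suffix produces no trailing ‘.’ (as table 4 suggests) and that the ‘%’ itself is dropped from the name of the gate that carries it; transfer extensions are not modeled.

```python
from typing import List


def expand_suffix(suffix: str, transfer_gates: List[str]) -> str:
    """Hypothetical illustration of rules 1 and 2.

    suffix:         suffix of the basic event as written in the sub-tree
    transfer_gates: names of the enclosing transfer-in gates, lowest level first
    """
    if not suffix.endswith("#"):
        return suffix  # shared event: the suffix is kept unchanged
    parts: List[str] = []
    for gate in transfer_gates:
        if gate.endswith("%"):
            parts.append(gate[:-1])  # assumption: '%' is not part of the new suffix
            break                    # rule 2: ignore all higher-level gate names
        parts.append(gate)           # rule 1: collect gate names, lowest level first
    rest = suffix[:-1]  # original suffix without the '#'
    if rest:
        parts.append(rest)
    return ".".join(parts)


print(expand_suffix("#", ["A1"]))              # A1
print(expand_suffix("#", ["A1", "B1_Fail"]))   # A1.B1_Fail (rule 1 only)
print(expand_suffix("#", ["A1%", "B1_Fail"]))  # A1 (rule 2)
print(expand_suffix("X1#", ["B1"]))            # B1.X1
print(expand_suffix("COM", ["A1%", "B1"]))     # COM (shared event)
```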
In any case you can check the result by having a look at the list of minimal cut-sets (see section 11.6.3). It should always be as shown in table 4.
| Cut-set |
|---|
| A_FAULT1.COM |
| A_FAULT2.COM |
| A_FAULT3 |
| B_CompX.COM |
| B_CompY.COM |
| B_CompX.B1.X1 * B_CompX.B2.X1 |
| B_CompX.B1.X2 * B_CompX.B2.X1 |
| B_CompX.B2.X1 * B_CompY.B1 |
| B_CompX.B1.X1 * B_CompX.B2.X2 |
| B_CompX.B1.X2 * B_CompX.B2.X2 |
| B_CompX.B2.X2 * B_CompY.B1 |
| B_CompX.B1.X1 * B_CompY.B2 |
| B_CompX.B1.X2 * B_CompY.B2 |
| B_CompY.B1 * B_CompY.B2 |
| A_FAULT1.A1 * A_FAULT1.A2 |
| A_FAULT1.A2 * A_FAULT2.A1 |
| A_FAULT1.A1 * A_FAULT2.A2 |
| A_FAULT2.A1 * A_FAULT2.A2 |
Modularization by Links The linking mechanism (see sections 2.4 and 4.3.6) provides another possibility to split fault trees. In the given example, it can be used as shown in figure 48. The difference from using a transfer-in gate is that the link hides all internal details of ‘IN’ from fault trees using this link.
In contrast to all other gates, reduced combination gates numerically evaluate the list of minimal cut-sets of the input event (they calculate $q_i$ and $h_i$ of the input). The unavailability $q_g$ and occurrence rate $h_g$ of the gate are then calculated as a function of n, m, $q_i$ and $h_i$ and stored in a new temporary generic basic event, which is referred to by a new temporary basic event. Minimal cut-sets of higher-level events only contain this basic event and thus do not contain lower-level information, including any common cause factor.
Since they reduce information, they are called ‘Reduced’ Combination gates. This reduction has three reasons:
The suffix of the temporary (internal) basic event can be set to ‘#’ in order to create multiple instances of the element described by the reduced combination gate in higher-level trees. This is achieved by adding a ‘#’ at the end of the name of the reduced combination gate.
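The following Python sketch shows one way in which $q_g$ and $h_g$ of a reduced combination gate could be obtained from n, m, $q_i$ and $h_i$. It assumes, which the text above does not state, that the gate is failed while at least m of n independent, identical instances of the input are failed. The occurrence density is then the probability of exactly m - 1 failed instances times the rate at which one of the remaining working instances fails. The function name and this interpretation of n and m are hypothetical.

```python
from math import comb
from typing import Tuple


def m_out_of_n(n: int, m: int, q_i: float, h_i: float) -> Tuple[float, float]:
    """Return (q_g, h_g) for a gate that is failed while at least m of n
    independent, identical inputs are failed (hypothetical interpretation)."""
    if not 1 <= m <= n:
        raise ValueError("require 1 <= m <= n")
    # unavailability: at least m of the n inputs are failed
    q_g = sum(comb(n, k) * q_i**k * (1.0 - q_i) ** (n - k) for k in range(m, n + 1))
    # occurrence density: exactly m-1 inputs failed and one of the n-m+1
    # working inputs fails, each with occurrence rate h_i
    p_m_minus_1 = comb(n, m - 1) * q_i ** (m - 1) * (1.0 - q_i) ** (n - m + 1)
    w_g = p_m_minus_1 * (n - m + 1) * h_i
    # occurrence rate: density conditioned on the gate not being failed already
    h_g = w_g / (1.0 - q_g)
    return q_g, h_g


q_g, h_g = m_out_of_n(3, 2, 1e-3, 1e-5)  # 2-out-of-3 example with hypothetical values
print(f"q_g = {q_g:.3e}, h_g = {h_g:.3e} /h")
```

As a plausibility check, n = m = 1 returns the input values unchanged, and 1-out-of-2 yields twice the input occurrence rate, as expected for an OR of two independent inputs.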
Whether fault trees are qualitative or quantitative is defined in the project properties dialog. No data is lost if the type is changed between qualitative and quantitative (steady-state or transient). Thus it is possible to use the same project and the same fault tree for both qualitative and quantitative evaluation.
Events of qualitative fault trees are sometimes classified in some way, for example by assigning a ‘safety level’ (SL) to each of them. Therefore qualitative fault trees provide a second text line, displayed in the name field below the name, intended to indicate a “safety level” or another qualitative specifier. The content of the second text line can be entered separately for each event and belongs to this event, not to the generic basic event (as the quantitative values do). This is intentional, because the “safety levels” of events of qualitative fault trees are often determined top-down, and therefore different safety levels might be assigned to the same event in different trees, or even in different branches of the same tree.
The second text line is stored together with the gate or basic event in the fault tree file (.ftl).
Since there is no mathematical rule for “calculating” the SL of a gate from the SLs of its input gates or basic events, this task has to be performed manually. Therefore, when evaluation type ‘qualitative’ is selected, the second line can be filled with arbitrary text, for example the assigned SL.
Nevertheless, there may be rules for assigning or apportioning classes or SLs. One example is the set of rules defined for the authorization of railway vehicles in Germany ([SiRF]). Functional Safety Suite includes an algorithm to check qualitative fault trees according to the [SiRF] rules.
Create a fault tree via File – New Model. Select a name and package for the new tree. A simple fault tree consisting only of the top gate is then created.
Select the top gate by clicking on it. Add a gate below by Edit – Add Gate.
Add a basic event based on a new generic basic event below the selected gate via Library – New Generic Basic Event. After you enter a name, a new generic basic event is created, and a basic event referring to this generic basic event is added below the selected gate. Note that the new generic basic event is created in the local package of the fault tree by default, but you can also choose to create it in the global package.
Add a basic event referring to an existing generic basic event below the selected gate by Edit – Add Tree Basic Event. The new basic event will refer to the latest generic basic event by default. You can select any other existing generic basic event by selecting it via its name and package in the Tree Basic Event Properties Panel, see section 7.3.
The sequence of events below a gate can be changed with ‘Shift+→’ and ‘Shift+←’.
A gate or a basic event can be deleted with the ‘Delete’ key. In case of a gate, the inputs of the deleted gate will be added to the parent gate.
A branch can be deleted by ‘Shift+Delete’.
If you want to assign a new (not yet existing) generic basic event to an existing basic event, select Library – New Generic Basic Event. The name of the basic event will change to the name of the new generic basic event, showing that it now refers to the new generic basic event.
You can copy or cut a branch by selecting its top gate and pressing ‘Ctrl+C’ or ‘Ctrl+X’. The branch can then be pasted below any gate of the same or another fault tree with ‘Ctrl+V’. Neither the names of the gates nor the suffixes of the basic events of the pasted branch are changed automatically, so it is up to you to change them according to their new meaning and relation to other events.
Changing properties of gates or basic events is done in the properties window. The only exception is changing the name of a basic event (⇔ generic basic event), for which the special command Library – Rename Generic Basic Event is provided. The properties of the generic basic event referred to by a basic event can also be edited in the library view, see section 4.1.
A fault tree that has not been saved after the latest modification is marked with an asterisk ‘*’ in its title.