Markov models are another commonly used method for hazard analysis and for reliability analysis in general. The values describing the “quality” of a system can be calculated for Markov models as well. For more details see section 9.5.
In contrast to fault trees, Markov models are created inductively (bottom-up): Starting with the up-state (indicating that the system is completely ok), the events that can occur in this state are identified, together with the resulting system states. Then, for each newly identified state, the possible events and subsequent states are identified, and so on, until no further change of the system’s state is possible (except due to restoration, see below). This final state is typically a fail state of the system, whereas the preceding states are intermediate states. In an intermediate state, the system described by the model still performs its function, but it has detected or hidden faults.
The edge between two states represents a transition from one state of the system (the source state of the edge) to another state (the target state of the edge). In standard Markov models this transition takes place due to the occurrence of an event, thus an edge is characterized by the occurrence rate of this event (compare [EN 61165]).
Fail state for a low demand function means that the function is unavailable in this state, i. e. the probability of this state contributes to the unavailability Q (PFD) of the function.
Fail state for a high demand or continuous demand mode function means that the transition rate(s) towards this state contribute to the failure rate h (PFH) of the function.
Fail state for a non-repairable function means that the probability of being in this state contributes to the unreliability F(T).
Note: If you want to calculate the unreliability F(T), there should be no restorations from any of the fail states: with such restorations the system would be restorable, and the unreliability F(T) is not relevant for restorable systems. If there are restorations nevertheless, the unreliability cannot be calculated by just summing up the probabilities of the fail states at time T. In that case, the unreliability is calculated from the failure density fsys(t) instead, which is derived from the failure rate hsys(t).
If no restoration took place, the system would after some time be in one of the final states (fail states). The sum of the probabilities of these states would then be 1, and no further events and thus no transitions could occur. All systems that shall work for a long time usually need some kind of restoration, based on preventive or corrective maintenance. In Markov models, restorations are also events, represented by edges, and thus characterized by transition rates. Even though these rates are usually called restoration rates, mathematically they describe transitions, just like those representing failures or any other event.
If an edge represents the correction of exactly one failure, this restoration edge is exactly anti-parallel to the related failure edge (the ‘forward’ part), it ‘returns’ to the source state. In fault trees both failure and restoration are defined in each basic event (if there is a restoration) — and since Functional Safety Suite aims for best equivalency between fault trees and Markov models, this principle is used for edges as well. Therefore instead of defining two independent edges, the restoration is inherent to the edge describing the failure. Thus the same generic basic events can be used for edges as for basic events of fault trees. The restoration rate μ is automatically calculated and written below the edge.
Important note: Usually inspections and tests are executed in predefined intervals. Thus in redundant architectures, all channels are checked and repaired at the same time. Thus all restorations related to these inspections and tests will take place at the same time, not independent of each other. For correct modeling, you should separate the restoration from the failure part of the event, see section 9.3.1.6.
The direction of the restoration is indicated by a small text arrow left of the restoration parameter text below the edge, see figure 53.
Conditions and Instantaneous transitions Functional Safety Suite provides some extensions to standard Markov models, named instantaneous transitions and cyclic transitions, in addition to transition rates.
An instantaneous transition is not defined by a transition rate, but by a transition probability. That means the source state is left immediately, with the probability represented by the edge, towards the target state. The sum of the probabilities of the instantaneous transitions leaving a state must be 1. Thus the probability of finding the system in the source state of such edges is 0, and the state is called a virtual state accordingly. Since the probability of a virtual state is 0, edges of another type (not instantaneous) starting in a virtual state don’t make sense and are therefore forbidden.
Instantaneous transitions are necessary to model conditions. Conditions are necessary in two scenarios:
If the edges leaving a state are determined by probabilities (namely Q(t) and A(t) = 1 - Q(t)), the edges represent instantaneous transitions. That means that this state is left immediately, with the probability represented by each edge, towards the next target state. The sum of the probabilities of the instantaneous transitions must be 1, and there must be no edges of another type starting in this state. Thus the probability of finding the system in the source state of such edges is zero, and the state is called a virtual state accordingly. Virtual states are displayed with gray text and circle.
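The two rules for virtual states can be checked mechanically. The following is only a minimal sketch: the function name `check_virtual_state`, its parameters and the tolerance are assumptions made for illustration and are not part of Functional Safety Suite.

```python
import math


def check_virtual_state(instantaneous_probs, other_edge_count=0, tol=1e-9):
    """Return True if the edges leaving a state satisfy the virtual-state rules.

    instantaneous_probs: probabilities of the instantaneous edges leaving the state
    other_edge_count:    number of non-instantaneous edges leaving the same state
    tol:                 assumed numerical tolerance for the sum check
    """
    if other_edge_count != 0:
        return False  # edges of another type must not start in a virtual state
    return math.isclose(sum(instantaneous_probs), 1.0, abs_tol=tol)


q = 0.02  # unavailability Q(t) of some element (made-up value)
print(check_virtual_state([q, 1.0 - q]))   # Q(t) and A(t) = 1 - Q(t): valid
print(check_virtual_state([q, 0.5]))       # probabilities do not sum to 1: invalid
```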
Cyclic events An event that appears periodically can be modeled by a generic basic event of type cyclic. This leads to an edge describing a cyclic transition. This edge is described by a deterministic period and by the probability that the event actually appears after each period. In contrast to source states of instantaneous transitions, the source state of a cyclic transition is not “virtual”, since its probability is not (always) 0. Thus other transitions, e. g. continuous transitions, are also allowed with this state as source, and the sum of the probabilities of cyclic transitions doesn’t need to be 1.
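The effect of a cyclic transition on the state probabilities can be sketched as follows. This is an illustration under assumptions, not the program’s algorithm: the state names, the values and both helper functions are hypothetical, and continuous transitions are left out.

```python
def apply_cyclic_transition(p, source, target, probability):
    """Shift the given fraction of the source state's probability to the target."""
    moved = p[source] * probability
    result = dict(p)
    result[source] -= moved
    result[target] += moved
    return result


def simulate_cyclic(p0, source, target, period_h, probability, step_h, t_end_h):
    """Advance time in fixed steps and fire the cyclic edge at multiples of the period."""
    p = dict(p0)
    steps = round(t_end_h / step_h)
    steps_per_period = round(period_h / step_h)
    for k in range(1, steps + 1):
        # Continuous transitions would be integrated here; omitted in this sketch.
        if k % steps_per_period == 0:
            p = apply_cyclic_transition(p, source, target, probability)
    return p


# Made-up example: an event with a period of 24 h that appears with probability 0.9.
p0 = {"Faulty": 1.0, "OK": 0.0}
p = simulate_cyclic(p0, "Faulty", "OK", period_h=24.0, probability=0.9,
                    step_h=0.5, t_end_h=48.0)
print(p)
```

Because the source state keeps a non-zero probability between two periods, it is an ordinary state, which is why other edges may start there as well.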
Presentation related properties of the Markov model are edited in the Markov model properties panel directly, see below. Evaluation related properties are set in the Markov model evaluation properties dialog, see section 9.5. All properties of the Markov model are stored in the Markov model file (extension .mdg).
Description: A user defined description of the Markov model.
Note that in case the presentation related features don’t fulfill your needs, you can export all graphics in SVG format for further processing by vector graphics tools.
Show edges without restoration part: In order to get a better overview of restorations, you can disable the display of edges without restoration. Evaluation is not affected.
Show edges without forward part: In order to get a better overview of forward edges, you can disable the display of edges only describing a restoration. Evaluation is not affected.
Show common cause edges: In order to get a basic view of the forward edges, you can disable the display of common cause edges. Evaluation is not affected.
An edge models a transition from a source state to a target state and optionally vice versa (restoration). The target state is the state the edge points to. Each edge of a Markov model consists of the reference to the generic basic event and several modifiers.
The parameters in the ‘Edge’ section belong to the specific edge, which is part of the Markov model, and thus are stored in the Markov model file (extension .mdg).
The parameters in the ‘Generic Basic Event’ section belong to the generic basic event as selected by the field ‘GBE name’, and thus are stored in the library.
Remember: Changing parameters in the ‘Generic Basic Event’ section will change the properties of all other basic events referring to the same generic basic event too.
Package Select whether the generic basic event is in the library of the global package or of the local package.
GBE Name The identifier of the generic basic event, also serving as the name of the edge. You can select a name (and by this the referred generic basic event) out of a list of the generic basic events belonging to the selected package.
Suffix A user defined identifier of the specific instance or the event described by the generic basic event. In contrast to the suffix of basic events of fault trees, the suffix of an edge has no effect on the evaluation. It only serves easier creation and better readability of the Markov model.
Selection of partial rates or probabilities Whereas in fault trees the common cause factors are considered completely internally during evaluation, in Markov models a common cause factor results in multiple edges with different occurrence rate. In order not to require several generic basic events for the same failure — one for the single failure, one for the common cause part — you can select which part of the occurrence rate to use in each edge.
Also for conditions the non-negated probability can be split into the single cause probability and the common cause probability. Of course the negation of a condition is always the complete (negated) condition probability ¬p = 1 - pcomplete = 1 - (psingle + pcommon).
Negate probability checkbox For virtual states the sum of probabilities of the leaving edges must be 1. Typically one leaving edge models the unavailability of an element or sub-system, the second leaving edge the availability A(t) = 1 - Q(t) of an element. The unavailability is often modeled by an appropriate generic basic event or a link to another model, and not given immediately. In that case the edge for the availability shall be set to the same generic basic event, but with the ‘negate probability’ checkbox set. By this the value A(t) = 1 - Q(t) will be assigned to it. An edge with negated probability is displayed in dark green color.
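As a worked example with values chosen purely for illustration, assume a condition whose single cause part is 0.018 and whose common cause part is 0.002. The negated probability then is

$$\neg p = 1 - (p_{single} + p_{common}) = 1 - (0.018 + 0.002) = 0.98$$

If the three parts are used on three edges leaving the same virtual state, their probabilities 0.018, 0.002 and 0.98 sum to 1, as required.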
No forward checkbox Depending on the maintenance and repair strategy, the detection and repair of a fault modeled by an edge doesn’t necessarily lead back to the source state of the edge. This is the case e. g. if a larger unit with multiple potential faults is exchanged if at least one error is detected, or in case of common cause failures. Also when using instantaneous transitions, the restoration path is different from the forward path. In these cases the restoration must be modeled by a separate edge. However in order not to need a separate generic basic event only for the restoration, this separate restoration edge can refer to the same generic basic event as the edge representing the failure(s), as long as all failures are restored at the same time. In that case set the no forward flag in order to indicate to the program, that only the restoration part of the generic basic event shall be used for this edge.
However, the direction of the edge is still defined by the generic basic event, thus the edge must point towards the failure’s target state. See figure 55: All states are restored directly to the OK state, either after a check every 1000 h (restoration part of edges “X” or “Y”, or separate edges “Rest_1000h”) or immediately when the hazard occurs (edge “Rest_imm”). The restoration edges “Rest_1000h” and “Rest_imm” are modelled by generic basic events of type Repairable, just as “X” and “Y”. Since their forward part is not used, the failure rates don’t matter.
No restoration checkbox This is just the opposite of the No forward flag, see there for explanation.
Continuous forward checkbox This option makes sense for edges of generic basic event type Cyclic only. There must be at most one cyclic edge leaving any state, because multiple cyclic edges could lead to ambiguity during evaluation. If there is more than one cyclic edge leaving a state, select this checkbox for those edges that shall be treated as normal transitions using their equivalent constant transition rate, see section 4.3.4.
Continuous restoration checkbox There must be at most one cyclic restoration leaving any state, because multiple cyclic restorations could lead to ambiguity during evaluation. If there is more than one cyclic restoration of a state, select this checkbox for those restorations that shall be treated as normal transitions using their equivalent constant restoration rate.
Description A user defined description of the generic basic event and therefore identical for all basic events referring to this generic basic event.
House event, Condition event, Not developed The house event and not developed modifiers have no effect in Markov models. They are for information only and cannot be changed here.
The effect and usage of the condition event modifier in Markov models is explained in sections 9.1.3.1 and 9.5. Also see section 4.2.1.
The probabilistic model of the generic basic event. See section 4.3 for details.
The values needed by the model of the generic basic event. See section 4.3 for details.
A state of a Markov model refers to a state that a system can enter physically, e. g. due to a failure, a maintenance action, an action of the operator, the mission profile, etc.
A state can be target of multiple edges and source of multiple edges. Two states can be connected by one edge only (this edge may have a forward part and a restoration part).
All properties of the state are stored in the Markov model file (extension .mdg).
Name: A user defined identifier of the state. Every state must have a different name, since the name is used to determine the sources and targets of edges.
Description: A user defined description of the state.
Start probability p0 States can be assigned a start probability p0 = p(t = 0), except for virtual states. For states being source or target of edges, the start probability is only used in transient evaluation. The sum of all start probabilities of the Markov model must be 1.
States with fixed probability States with no edges keep their start probability p0 forever. By this, it is possible to define states with a fixed probability, e. g. to model a constant basic unavailability.
Contributes to unavailability Each state can contribute to the unavailability Q(t) of the element modeled by the model. States contributing to the unavailability are marked with a violet circle. States that also contribute to the occurrence rate are marked with a red circle, see below.
Contributes to occurrence rate and unreliability Each state can contribute to the occurrence rate h(t) and unreliability F(T) of the top event modeled by the Markov model. States contributing to the occurrence rate and unreliability are marked with a red circle. They also contribute to the unavailability, since an element in this state cannot perform any function anymore.
The background color can be selected separately for each state.
The value of interest for each safety function is either the mean unavailability Q (PFD), the mean occurrence rate h (PFH), or the unreliability F(T).
The value of interest and several parameters related to quantitative evaluation of Markov models are set in the Markov model evaluation properties dialog, see below.
To calculate these values for a given Markov model, select Calculate – Calculate Model Values. The mean unavailability Q and mean occurrence rate h can be calculated by steady-state or transient evaluation, the unreliability F(T) can only be determined by transient evaluation. Some modeling features are only available for transient evaluation, allowing a more realistic description of some systems. As for fault trees, a transient evaluation is more precise but takes much more computing time.
If the Markov model refers to other models by edges of type link, these models are evaluated before. Circular references are recognized and indicated in the message window before the evaluation starts.
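How circular references between linked models can be recognized is sketched below as a depth-first search. This only illustrates the idea; the function `find_circular_reference`, the dictionary layout and the file names are assumptions, and the method actually used by the program is not described here.

```python
def find_circular_reference(links):
    """Return one circular reference as a list of model names, or None.

    links: dict mapping a model name to the list of models it links to.
    """
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(model):
        color[model] = GRAY
        path.append(model)
        for linked in links.get(model, []):
            state = color.get(linked, WHITE)
            if state == GRAY:                      # linked model is on the current path
                return path[path.index(linked):] + [linked]
            if state == WHITE:
                cycle = visit(linked)
                if cycle:
                    return cycle
        path.pop()
        color[model] = BLACK
        return None

    for model in links:
        if color.get(model, WHITE) == WHITE:
            cycle = visit(model)
            if cycle:
                return cycle
    return None


# Hypothetical model files linking to each other.
ok_links = {"system.mdg": ["channel.mdg"], "channel.mdg": []}
bad_links = {"a.mdg": ["b.mdg"], "b.mdg": ["c.mdg"], "c.mdg": ["a.mdg"]}
print(find_circular_reference(ok_links))
print(find_circular_reference(bad_links))
```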
The values of the event modeled by the Markov model are displayed in the upper left corner in the graphics tab. The probability of each state is displayed in each state’s symbol, depending on the method of evaluation (steady-state or transient), see below.
If a parameter of a state or an edge is changed, all values that might be affected by this change are automatically marked as invalid and not displayed anymore.
Calculation Value Select which value(s) to calculate. In contrast to fault trees, there are no different algorithms for the different values. Therefore, in steady-state evaluation the mean unavailability Q and the mean occurrence rate h are in fact calculated independently of your selection (calculation of the unreliability F(T) is not possible in steady-state mode). In transient evaluation mode, h(t), f(t), w(t), F(t), Q(t) and N(t) are calculated and available for display in the chart frame. Depending on the selected value, either Q alone, or Q, h and the estimated number of occurrences of the top event N(T), or the unreliability F(T) is displayed in the header.
The unavailability Q is the sum of the probabilities of all states marked as contributing to the unavailability:
$$Q = \sum_{i \,\in\, \{\text{states contributing to } Q\}} p_i \tag{43}$$
The occurrence density wj of a state j is the sum of the transition rates λij of all edges pointing to it, each multiplied by the actual probability pi of the source state i of that edge:
$$w_j = \sum_{i \to j} \lambda_{ij}\, p_i \tag{44}$$
If the edge represents an instantaneous transition, the transition rate of this edge is the sum of the transition rates to the source state of this edge, (i. e. the virtual state) multiplied by the probability of the instantaneous transition.
The occurrence density of the event modeled by the Markov model wsys is the sum of the occurrence densities of all states marked as contributing to the occurrence rate of the model (see section 9.4.2):
$$w_{sys} = \sum_{j \,\in\, \{\text{states contributing to } h\}} w_j \tag{45}$$
Note: Typically there will be no restorations leading to a state contributing to the occurrence rate of the model, but if there were any, they would not be considered in calculation of wsys, since a restoration obviously doesn’t contribute to the occurrence of an undesired state of the system.
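The three sums described above can be sketched in a few lines. The state names, probabilities and rates below are made up for illustration, and the function names are hypothetical. Only forward edges are listed, since restorations are not considered in wsys.

```python
# State probabilities (made-up values that sum to 1).
p = {"OK": 0.98, "Degraded": 0.015, "Fail": 0.005}

# Forward edges as (source state, target state, transition rate in 1/h).
edges = [
    ("OK", "Degraded", 1e-4),
    ("Degraded", "Fail", 1e-3),
    ("OK", "Fail", 1e-6),       # e.g. a common cause edge
]

contributes_to_q = {"Fail"}     # states marked as contributing to the unavailability
contributes_to_h = {"Fail"}     # states marked as contributing to the occurrence rate


def unavailability(p, states):
    """Sum of the probabilities of all states contributing to Q."""
    return sum(p[s] for s in states)


def occurrence_density(p, edges, target):
    """Sum over all edges pointing to `target` of rate times source probability."""
    return sum(rate * p[src] for src, tgt, rate in edges if tgt == target)


def system_occurrence_density(p, edges, states):
    """Sum of the occurrence densities of all states contributing to h."""
    return sum(occurrence_density(p, edges, s) for s in states)


q = unavailability(p, contributes_to_q)
w_sys = system_occurrence_density(p, edges, contributes_to_h)
print(q, w_sys)
```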
For the calculation of hsys (PFH) it is presumed that the system is always in an up-state when the (last, dangerous) failure occurs. The higher the probability of the final state(s) (indicating the unavailability of the function), the lower the occurrence density of the event modeled by the Markov model. Therefore, if divide density by availability is selected in the project properties dialog (tab Markov Models), the occurrence rate (usually a failure rate) of the Markov model is calculated by
$$h_{sys} = \frac{w_{sys}}{1 - Q} \tag{46}$$
If divide density by start state probability is selected in the project properties dialog, the occurrence rate is calculated by
$$h_{sys} = \frac{w_{sys}}{p_{start}} \tag{47}$$
with pstart being the probability of the start state, which is more conservative.
Note that the failure rate h can be calculated only if Q is not close to 1, since the difference 1 - Q cannot be calculated with sufficient accuracy for values of Q close to 1. If this happens, h is set to ‘NaN’ and not displayed. This ensures that only values with sufficient numeric accuracy are displayed. For (correctly built and modeled) safety systems this is never an issue, since Q ≪ 1 is always fulfilled. In transient evaluation mode, this must be fulfilled for each time step, i. e. not only for the final step or an average value.
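The two project options and the ‘NaN’ rule can be sketched as follows. The function name `occurrence_rate`, its parameters and in particular the threshold `q_limit` are assumptions made for illustration; the limit actually used by the program is not stated here.

```python
import math


def occurrence_rate(w_sys, q, p_start=None, q_limit=0.999):
    """Derive the occurrence rate h from the occurrence density w_sys.

    p_start=None  -> divide density by availability (1 - Q)
    p_start given -> divide density by the start state probability
    q_limit       -> assumed threshold above which h is reported as NaN
    """
    if q >= q_limit:
        return math.nan          # 1 - Q is not accurate enough any more
    denominator = (1.0 - q) if p_start is None else p_start
    return w_sys / denominator


w_sys, q, p_ok = 1.598e-5, 0.005, 0.98       # made-up values
h_availability = occurrence_rate(w_sys, q)
h_start_state = occurrence_rate(w_sys, q, p_start=p_ok)
print(h_availability, h_start_state)         # the second value is larger
print(occurrence_rate(w_sys, 1.0))           # nan
```

Since the start state probability cannot exceed the availability 1 - Q, dividing by it never yields a smaller rate, which is why that option is the more conservative one.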
Evaluation Mode Select whether the Markov model shall be evaluated in steady-state mode or in transient (time-variant) mode. In case of transient evaluation, the time interval must be set as well.
Quantitative steady-state evaluation A steady-state analysis is appropriate for all systems that are supposed to operate for many years, with certain test intervals and optionally some down-times for maintenance and repairs. Several parts of the system might be replaced or repaired during the system’s lifetime. If all failures are detected in adequate time (either by continuous diagnosis, by periodic tests or by malfunction of the system), both the failure rate h (PFH) and the unavailability Q (PFD) of the system don’t depend on its actual age, but will reach some pseudo-stationary state where both values oscillate around a mean value. The period of this oscillation is equal to the longest detection interval or a multiple of it. This is correct even if the failure rates of some particular components depend on their specific age, provided the lifetimes of these components are shorter than the system lifetime. The value of interest for each safety function performed by such a system is either the mean unavailability on demand Q (PFD), or the mean occurrence rate h (PFH). The related standards are mainly [EN 61508] and the standards derived from it. Examples of such systems are machines, cars, trains, aircraft, chemical plants, power plants, etc. and their control systems.
In steady-state evaluation cyclic edges due to periodic tests of generic basic events of type repairable will be replaced by restoration rates μ, cyclic edges due to periodically occurring events (edges referring to generic basic events of type cyclic) will be replaced by forward transition rates λ. Note that these conversions do not correctly reflect reality, but are only approximations.
In principle a Markov model is evaluated for a steady-state by solving a linear equation system, whose variables are the state probabilities and whose coefficients are the transition rates. Probabilities of instantaneous edges are multiplied with the transition rate of the preceding continuous edge.
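The linear equation system can be illustrated with a small pure-Python sketch. It assumes a model with continuous transitions only; the state names and rates are made up, and the functions `solve_linear` and `steady_state` are hypothetical helpers, not the solver used by the program. One balance equation is replaced by the condition that all state probabilities sum to 1.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]    # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                    # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def steady_state(states, rates):
    """rates: dict mapping (source, target) to a constant transition rate in 1/h."""
    n = len(states)
    idx = {s: i for i, s in enumerate(states)}
    a = [[0.0] * n for _ in range(n)]
    for (src, tgt), rate in rates.items():
        a[idx[tgt]][idx[src]] += rate    # probability flow into the target state
        a[idx[src]][idx[src]] -= rate    # probability flow out of the source state
    a[-1] = [1.0] * n                    # replace last equation by: sum of p = 1
    b = [0.0] * (n - 1) + [1.0]
    return dict(zip(states, solve_linear(a, b)))


rates = {
    ("OK", "Degraded"): 1e-4,      # failure
    ("Degraded", "Fail"): 1e-3,    # second failure
    ("Degraded", "OK"): 1 / 24,    # restoration
    ("Fail", "OK"): 1 / 8,         # restoration
}
print(steady_state(["OK", "Degraded", "Fail"], rates))
```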
The steady-state probability of each state is displayed inside the state’s symbol. Based on the state probabilities in steady-state, the mean unavailability Q and the mean occurrence rate h are calculated, see section 9.5 above.
Based on h, finally the expected occurrence number N(T) is calculated by
$$N(T) = h \cdot T \tag{48}$$
with T being the system lifetime.
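As a worked example, assuming that the expected occurrence number is the product of the mean occurrence rate and the lifetime, and using values chosen purely for illustration (h = 10⁻⁷ per hour, lifetime 20 years, i. e. T = 175 200 h):

$$N(T) = h \cdot T = 10^{-7}\,\mathrm{h^{-1}} \cdot 175\,200\,\mathrm{h} \approx 0.0175$$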
A steady-state analysis is very fast, because all values have to be calculated only once (compared to time-variant analysis, where all values have to be calculated many times, see below).
Quantitative transient evaluation A transient analysis is mandatory, if the ‘mission failure probability’, namely the system’s unreliability F(0,Tmission), is the value of interest. This is typical for non-restorable systems, that are supposed to perform their function in a pre-defined way for a pre-defined lifetime (e. g. a certain mission), such as a rocket or spacecraft. These systems are characterized by final states without restoration paths.
A transient evaluation is also necessary for time-variant systems, that shall be examined in detail. The Markov models describing those systems might include edges of type cyclic, and the restoration might need to be modeled by discrete restoration times. Also if time variant occurrence rates are used, e. g. due to links to other models or due to generic basic events of type non-restorable with increasing or decreasing failure rates, a transient analysis is recommended. In general, a transient (or time-variant) analysis produces more precise results, but is much slower.
From the mathematical point of view, the model is in principle a linear differential equation system. Thus for transient evaluation, a differential equation system must be integrated. For most practical problems the rates differ by several orders of magnitude, thus the differential equation system is typically stiff. Therefore an implicit integration algorithm is used. If rates are constant over the lifetime, the differential equation system is time-invariant and therefore doesn’t need to be created anew in each step. In any case the system is linear, so the Jacobian matrix is directly given. In case of non-constant failure rates, the Jacobian must be calculated for each step, which is quite a time-consuming operation. You should therefore avoid time-variant failure rates; once there is one, however, it makes no difference whether there are one or many.
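Why an implicit method suits stiff systems can be illustrated with the simplest implicit scheme, the backward Euler method, applied to a two-state model with constant rates. This is a sketch under assumptions: which implicit algorithm the program uses is not stated, and the function name and values are made up.

```python
def backward_euler_two_state(lam, mu, p_ok, p_fail, dt, steps):
    """Integrate p' = A p for a two-state (OK/Fail) model with backward Euler.

    A = [[-lam, mu], [lam, -mu]]; each step solves (I - dt*A) p_new = p_old.
    """
    # With constant rates the matrix (I - dt*A) is the same in every step,
    # so its entries and determinant are computed only once.
    a11, a12 = 1.0 + dt * lam, -dt * mu
    a21, a22 = -dt * lam, 1.0 + dt * mu
    det = a11 * a22 - a12 * a21
    for _ in range(steps):
        new_ok = (a22 * p_ok - a12 * p_fail) / det
        new_fail = (a11 * p_fail - a21 * p_ok) / det
        p_ok, p_fail = new_ok, new_fail
    return p_ok, p_fail


lam, mu = 1e-4, 10.0     # rates differing by five orders of magnitude (made up)
p_ok, p_fail = backward_euler_two_state(lam, mu, 1.0, 0.0, dt=1.0, steps=1000)
print(p_ok, p_fail)      # approaches the steady state p_fail = lam / (lam + mu)
```

With these rates and a step size of 1 h, an explicit Euler scheme would diverge, because dt · (λ + μ) is far greater than 2, whereas the implicit scheme stays stable for any step size.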
The differential equation system only describes the continuous transitions. Due to the extensions of Functional Safety Suite, also instantaneous transitions and cyclic (periodic) transitions occur. Therefore in each step the following is done:
The occurrence number N(t) is calculated by
![]() | (49) |
The unreliability at a given time t is typically just the sum of the probabilities of the final states marked as contributing to the unreliability:
$$F(t) = \sum_{j \,\in\, \{\text{final states contributing to } F\}} p_j(t) \tag{50}$$
If there is at least one restoration from any final state contributing to the unreliability, the unreliability is automatically calculated via the system occurrence rate hsys:
$$F(t) = 1 - \exp\left(-\int_0^t h_{sys}(\tau)\, d\tau\right) \tag{51}$$
The mean values for occurrence rate h and unavailability Q are calculated by
$$h = \frac{1}{T} \int_0^T h(t)\, dt \tag{52}$$
and
$$Q = \frac{1}{T} \int_0^T Q(t)\, dt \tag{53}$$
The state’s probabilities after the last calculation step p(Tmission) are displayed in each state’s symbol. Note that these are displayed for better understanding only, but are in general not equivalent to the system’s safety values. Therefore they are displayed in light gray.
Time interval: The step size for transient evaluation in hours. A smaller step size means more steps for the given system lifetime and thus a longer calculation time. This might be an issue for large systems. The step size must be less than one tenth of the shortest period of any periodic (cyclic) event you want to evaluate at discrete times. Time constants shorter than 10 times the step size are handled as rates. However, the step size should be even smaller in order to reduce computational errors.
Example: Given a redundant control system, consisting of two similar channels 1 and 2. Each channel has two failure modes A and B, optionally with some common cause factor. The proof test intervals are 24 h and 168 h.
Its Markov model is presented in figure 59.
With a step time of 0.5 h, the evaluation considers the tests at multiples of 24 hours and 168 hours, resulting in the h(t) and Q(t) displayed in figure 60.
With a step time of 20 h, the proof test intervals are less than 10 times the step time, so that a continuous restoration rate μ is considered instead of a cyclic transition. The resulting h(t) and Q(t) are displayed in figure 61.
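The step size rule applied to the two step times of this example can be sketched as follows. The function `classify_periods` is hypothetical, and treating a period of exactly 10 times the step size as discrete is an assumption.

```python
def classify_periods(step_size_h, periods_h):
    """Decide per period whether it is evaluated at discrete times or as a rate.

    Periods shorter than 10 times the step size are handled as rates;
    the boundary case (exactly 10 times) is assumed to be discrete.
    """
    return {
        period: ("rate" if period < 10 * step_size_h else "discrete")
        for period in periods_h
    }


proof_test_intervals = [24, 168]                      # hours, from the example
print(classify_periods(0.5, proof_test_intervals))    # both handled at discrete times
print(classify_periods(20, proof_test_intervals))     # both handled as rates
```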
You might wonder why the occurrence rate h(t) is never zero, even in figure 60. This is due to the common cause factor between both channels, modeled by direct edges from state “OK” to some failure states. If you set the common cause factor of both generic basic events to zero, the result will look completely different, compare figure 62.
Pre-processing Mode Starting with version 3.2 of Functional Safety Suite, the common cause factors between edges of Markov models can be handled automatically for most Markov models. In addition, most Markov models can be completed automatically. Automatic completion always includes automatic creation of common cause chains. In both cases, the Markov model finally used for evaluation will be created internally.
To understand the difference between “direct” chains and “complete” chains, have a look at the fault tree shown in figure 64.
The Markov model representing the minimal cut-sets of this fault tree is presented in figure 65.
In fact, event “Rep1” can occur also in states “Rep2” and “Rep3”. The complete Markov model also considering these degrees of freedom is shown in figure 66.
The completed Markov model also considers that the system’s state can “jump” from one (yet incomplete) chain to another chain, even multiple times, until a final state is reached. This option includes the internal creation of common cause chains.
However in most practical systems, the difference is negligible. Here in fact the values have been selected explicitly in order to show a difference.
Note: If the Markov model is pre-processed, no state probabilities will be shown, since only the final model is evaluated but not the model displayed in the graphics tab.
The final Markov model used for evaluation can be exported by Export – Export Final Markov Model. It is recommended to check the correct pre-processing and apply manual corrections or adaptations if necessary.
Create a Markov model by File – New Markov Diagram.
You can add states by Edit – Add State and clicking the mouse at the position you want to set the state. A unique name will be automatically assigned to the new state. At each grid position, only one state can be placed.
In order to add an edge, select the source state, then select Edit – Add Edge, then click on the target state. An edge referring to the last generic basic event in the library will be created. You can select any other existing generic basic event by selecting it via its name and package in the Edge Properties Panel, see section 9.3.
You can also cut (or copy) and paste states and edges: Select the state, choose Edit – Cut (or ‘Ctrl+X’) or Edit – Copy (or ‘Ctrl+C’), then choose Edit – Paste (or ‘Ctrl+V’), and finally click on the position where you want to paste it. For an edge, cut or copy it, then select the source state, choose Edit – Paste (or ‘Ctrl+V’), and finally click on the target state.
States are moved by ‘Shift+Cursor’, connected edges will stay connected. If a state is deleted, all connected edges will be deleted too (the generic basic event will not be deleted of course).
Multiple states and edges can be selected with the mouse, either by clicking the left mouse key together with ‘Shift’, or by pressing the left mouse key and pulling the selection rectangle around several states. All edges between selected states will be selected as well. A selection can be moved by holding the left mouse key while dragging, or it can be cut or copied and pasted in the same way as a single state or edge. All states pasted into the same or another Markov model will get a unique name, consisting of the original one, followed by a number if necessary to become unique.
Changing properties of states or edges is done in the properties window. The only exception is the change of the name of an edge (⇔ generic basic event), for which the special command Library – Rename Generic Basic Event is provided. The properties of the generic basic event referred to by an edge can be edited in the library view as well, see section 4.1.
A Markov model that has not been saved after the latest modification is marked with an asterisk ‘*’ in its title.