If you are a designer of an electronic system and you need to understand how well your system reacts to a state that can cause harm to a human being (e.g. electric vehicle battery catching fire due to overvoltage), it is useful to calculate a so-called “Performance Level”.
This guide will help you develop an understanding of:
- What are Performance Level calculations?
- Why and where are they needed?
- Prerequisite steps and calculations needed to perform Performance Level calculations.
This guide uses a battery-management system as a running example for performing and explaining performance level calculations. It first introduces the metrics used and then uses the Siemens norm SN29500 for the FIT calculations that feed the Performance Level calculations.
If you use an industrial norm such as the Siemens SN29500 standard, you will need to purchase the corresponding documentation, as it is the basis for the FIT calculations that lead to the Performance Level calculations. It is also recommended to purchase ISO 13849, which is the basis on which all performance level calculations are performed.
1.1 Performance Level
“Performance Level” is a performance metric that provides an insight into a control system’s ability to detect conditions that may lead to a catastrophic failure or that can create a hazardous situation for humans. For example, in case of battery-management systems, an example of catastrophic failure would be the batteries catching fire due to overcharging / short circuiting or the batteries getting deeply discharged and getting damaged.
In short, performance levels (PL) are applied to different safety-related aspects of a complete system. The grading is performed on a scale from “a” to “e”. PLe (Performance Level e) is the best performance level attainable and PLa is the worst. This rating gives the manufacturer important information about which parts of their system need improvement.
Before proceeding further with PL (Performance Level) calculations, we need to explain two terms: FIT (Failure In Time) and MTTF (Mean Time To Failure).
PL (Performance Level) is calculated using a multistep process that involves calculating the FIT (Failure In Time) for each component you have in the system.
FIT (Failure In Time) and MTTF (Mean Time To Failure) are two terms used to estimate how often a component fails in a specified period under specific operating conditions. FIT denotes the number of failures in 1×10^9 hours. MTTF and FIT are directly related to each other by the following relation:
MTTF = 1×10^9 / FIT (hours)
For example, an FIT of 5 corresponds to an MTTF of 2×10^8 hours. An FIT of 5 means that, on average, 5 failures are expected in 1×10^9 (1 billion) hours of operation.
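The relation between FIT and MTTF can be sketched in a few lines of Python. This is a minimal illustration of the formula above; the function names are our own and are not taken from any standard or tool.

```python
# FIT counts failures per 10^9 hours, so the conversion constant is 1e9.
HOURS_PER_FIT_WINDOW = 1e9


def fit_to_mttf_hours(fit: float) -> float:
    """Convert a failure rate in FIT to an MTTF in hours."""
    if fit <= 0:
        raise ValueError("FIT must be positive")
    return HOURS_PER_FIT_WINDOW / fit


def mttf_hours_to_fit(mttf_hours: float) -> float:
    """Convert an MTTF in hours back to a failure rate in FIT."""
    if mttf_hours <= 0:
        raise ValueError("MTTF must be positive")
    return HOURS_PER_FIT_WINDOW / mttf_hours


# The example from the text: an FIT of 5 gives an MTTF of 2 x 10^8 hours.
print(fit_to_mttf_hours(5))  # 200000000.0
```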
1.2 FIT (Failure In Time) and MTTF (Mean Time To Failure)
FIT (Failure In Time) is a measure of the reliability of a component, and it can also help you find out which component in your design is most vulnerable to faults. Components with moving mechanical parts, such as relays and switches, are usually more prone to failures and therefore have a high FIT value, indicating that they may fail more often in a given period of time.
All reliability estimates are based on a well-defined standard such as MIL-HDBK-217 or Siemens SN29500. These form the basis for the calculations performed and the FIT values obtained. A good way to get an understanding of FIT is to use MTBF (Mean Time Between Failure) calculators like Ram Commander (the MTBF calculator is a small part of it) and ItemSoft (which can import BOMs and perform batch calculations). Using these tools requires some knowledge of the standard, so it is helpful to spend some time reading the relevant documentation of the standard used.
To read more about FIT, MTTF and MTBF, you can read our blog.
1.3 Difference Between Performance Level and FIT (Failure in Time) Calculations
Performing an FIT (Failure in Time) analysis for your circuit is different from calculating performance levels. A performance level analysis covers only the components that relate to the safety (hazard prevention) aspects of the system and no other features. For example, a BJT circuit that simply turns on a status LED is not part of the performance level analysis.
1.4 Safety Functions
A safety function is a grouping of parts of a control system that detects a failure condition and manages the reaction to it. Each safety function deals with a unique condition. For example, in Lithium-Ion battery-management systems, the control system should include protection against conditions such as over-voltage, over-temperature and over-current.
These functions use the SRP/CS (Safety-Related Parts of the Control System) to define inputs, logic devices and outputs. These three are displayed in the block diagram below (ISO 13849-1).
Figure 1 (Image by: Oxeltech)
In the block diagram above we assume a Battery Management System (BMS) will have a safety function called “Over-Voltage Protection” and the Inputs, Logic, Output will be as follows:
Input: Analog-voltage-sense circuitry.
Logic: Microcontroller circuitry makes a safety decision based on the input.
Output: Relays that prevent any over-voltage scenario from harming the end user.
1.5 System Categories
ISO 13849-1 assigns the safety-related parts of a system to one of five categories (B, 1, 2, 3 and 4), each with its own architecture. The standard details the definitions and methodologies for calculating the PL and for deciding other details, such as which category your system belongs to.
The ISO 13849 documentation should be studied carefully to understand your exact requirements and which category applies to your system. In this guide, we look at “Category 2”, which has the following architecture:
Figure 2 (Image: Oxeltech)
This category has the following two additional blocks:
- TE (Test Equipment) Block
- OTE (Output of Test Equipment) Block
The TE (Test Equipment) block consists of any parts of the system that can act as a redundancy to your main input block. For example, if you are using cell voltages as the measurement for the over-voltage protection safety function, then the TE block can contain the complete battery pack voltage as a redundant measurement. The OTE (Output of Test Equipment) block can contain any additional redundant protective measures such as a secondary relay or a pyrofuse. It can also include broadcasting the malfunction to an external device.
2. How to Calculate FIT (Using SN29500)
2.1 Introduction to the basic terminology used in SN29500
Ambient Temperature: The ambient temperature refers to the temperature of the immediate surroundings of the component when it is not in operation.
Mean Ambient Temperature: The mean ambient temperature represents the average value of the ambient temperature experienced by components in similar applications. This may involve considering temperature fluctuations over time.
Reference Conditions: The reference conditions specified in this standard’s individual parts are chosen to align with most applications for the mentioned components. These conditions serve as a basis for expected values.
Operating mode: The operating mode description indicates whether the components are continuously operated or if there are breaks during their application. The following distinctions are made:
- Relatively long duration with a constant load (e.g., process controls).
- Relatively long duration with a changing load (e.g., telephone switching equipment).
- Relatively long duration with a constant minimum load and short-duration maximum loads (e.g., fire-alarm systems).
- Constant load during the operating phases (e.g., process controls).
- Changing load during the operating phases (e.g., control units in machining installations, road traffic signals).
Description of Environment: The description of the operating environment outlines the prevailing climatic and mechanical stresses that occur during the component’s operation.
2.2 Calculation Methodology / Example
The primary factor used in determining reliability for sub-assemblies and equipment units is the failure rate. Siemens AG utilizes the component failure rates specified in the standard SN29500, as a consistent foundation for reliability predictions. This standard also includes the specific conditions under which the component failure rates are applicable (known as reference conditions). Reference conditions are essential when stating failure rates or comparing values from different sources. The IEC 61709 serves as the basis for defining the reference conditions and conversion models that account for failure rates under varying stress conditions. The stress models outlined in this standard are used as the foundation for converting failure rate data from reference conditions to actual operating conditions.
For an overview, SN29500 standard uses the following method to estimate the reliability:
Figure 3 (Image: Oxeltech)
The reference FIT is the FIT value that the standard lists for a component of a particular category under its reference conditions. The operating FIT is the FIT value of the component under your actual operating conditions. The stress profile, by contrast, represents accelerated testing in which the component is placed under stress.
Overall, you need to know the reference FIT, which is then converted to the operating FIT. The result is an FIT that is appropriate for your desired operating conditions. These operating conditions should not exceed the absolute maximum ratings of the component. The standard document lists the reference conditions for all component categories.
2.2.1 Calculating FIT for a Single Component
To calculate the FIT (Failures in Time) of a component, you can follow the steps outlined below:
- Identify the specific type of component you are working with, such as a passive component, discrete semiconductor, or integrated circuit.
- Refer to the appropriate Siemens norm for that component type, such as SN29500-1, SN29500-2, SN29500-3, and so on. For example, if you are working with a passive component, you would refer to SN29500-4. Figure 4 below shows the norm number for each category:
Figure 4 (Image: SN29500 documentation)
- Study the norm document corresponding to the identified norm number. Look for the specific formulas provided for the component category you are dealing with. These formulas will help you calculate the FIT.
- Within the norm, identify the component category and note the reference FIT value mentioned for that category. This reference FIT value serves as a baseline for further calculations.
- Calculate the operating FIT (λ) based on the reference FIT (λref). To do this, refer to the formulas provided in the norm document and consult any accompanying tables or information. Consider dependencies such as temperature dependence (πT) and voltage dependence (πU), which may be listed in tables.
- Use the identified formulas and relevant dependencies to calculate the operating FIT for the specific component you are analysing.
Let’s calculate the FIT for C1901 in the following sample BOM.
Figure 5 (Image: Oxeltech)
As shown in Figure 4, the “Part” (extension) for passive components is 4, so we refer to SN29500-4. From the document, the following assumptions apply to the calculation:
Operating Voltage: For capacitors 50 % of the rated voltage, if not otherwise stated.
Mean Ambient Temperature: 40°C for all passive components
After the assumptions, you will find the table of capacitor failure rates, as shown in Figure 6.
Figure 6 (Image: SN29500 documentation)
As given in the BOM, C1901 is a ceramic capacitor with the temperature coefficient X7R. The following information can be read from the table for this component:
Reference FIT (λref) = 2
Reference Capacitor Temperature (Θ1) = 40°C
Ratio of Reference Voltage to Maximum Voltage (Uref/Umax) = 0.5
Now that you have the reference FIT, you need to find the formula to calculate the operating FIT. The formula for capacitors is given as:
Figure 7 (Image: SN29500 documentation)
To calculate λ, you need πT, πU and πQ, which are the temperature dependence, the voltage dependence and the quality factor respectively. For each of these, the standard gives a formula and, as an alternative, a lookup table that implements that formula. Once you know the values of πT, πU and πQ, multiply them with the reference FIT to get the value of λ. We will use the tables to get the values.
Figure 8 (Image: SN29500 documentation)
As shown in Figure 8, πU for the ceramic capacitor ranges from 0.2 to 7.4 depending on the ratio of the operating voltage to the maximum voltage. Taking U/Umax to be 0.2, πU is 0.3.
Figure 9 (Image: SN29500 documentation)
As shown in Figure 9, πT for the capacitor ranges from 0.41 to 16 depending on the actual capacitor temperature (Θ2). For an actual temperature of 60 °C, πT is 2.2.
Please note that in the formulae, temperature values are always used in Kelvin, so don’t forget to convert your operating temperatures respectively!
Figure 10 (Image: SN29500 documentation)
Based on the categories, the general-purpose capacitor has a πQ of 2.
Therefore, λ = λref × πU × πT × πQ = 2 × 0.3 × 2.2 × 2 = 2.64.
The value you have calculated is the operating FIT for that specific component.
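The C1901 calculation can be reproduced with a short Python sketch. It assumes, as the worked example does, that the operating FIT is the product of the reference FIT and the three π factors; the exact formula for each component category should be taken from the norm itself. The function and parameter names are illustrative.

```python
def operating_fit(lambda_ref: float, pi_t: float, pi_u: float, pi_q: float) -> float:
    """Scale a reference FIT by temperature, voltage and quality factors."""
    return lambda_ref * pi_t * pi_u * pi_q


# Values read from the tables in the worked example for C1901.
c1901_fit = operating_fit(lambda_ref=2, pi_t=2.2, pi_u=0.3, pi_q=2)
print(round(c1901_fit, 2))  # 2.64
```

Keeping the factors as named arguments makes it easy to see which table each number came from when you review the calculation later.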
2.2.2 Calculating FIT for the Whole System
Above we have calculated the FIT for a single component. Use the same methodology for all components in your system.
Once you have FIT values for all components on a single board, you can simply add the relevant FIT values to calculate the overall FIT for the board. You can also compare the individual FIT values to see which component has the worst FIT (if that is your goal).
Some of the components on your board might not be included in any safety channel, so you do not have to calculate the FIT of every component. The complete board FIT is only used when all components of the board are part of a safety channel.
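As a rough sketch, summing the FIT values and finding the weakest component could look like this in Python. Only the C1901 value comes from the worked example; the other reference designators, FIT values and the `safety_relevant` flag are hypothetical placeholders.

```python
# Hypothetical operating FIT values; only C1901 is taken from the example above.
components = [
    {"ref": "C1901", "fit": 2.64, "safety_relevant": True},
    {"ref": "K1", "fit": 40.0, "safety_relevant": True},   # relay, assumed value
    {"ref": "U3", "fit": 15.0, "safety_relevant": True},   # IC, assumed value
    {"ref": "D7", "fit": 1.0, "safety_relevant": False},   # status LED, assumed value
]

# FIT of the complete board: simple sum over every component.
board_fit = sum(c["fit"] for c in components)

# FIT of only those components that take part in a safety channel.
safety_fit = sum(c["fit"] for c in components if c["safety_relevant"])

# The component with the highest FIT is the most failure-prone one.
worst = max(components, key=lambda c: c["fit"])

print(f"board FIT: {board_fit:.2f}, safety-relevant FIT: {safety_fit:.2f}")
print(f"worst component: {worst['ref']}")
```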
In the next part, we will present the safety functions we developed for our board and how it helped to predict the performance level.
3. Performance Level
3.1 Introduction to Safety Channels and Safety Functions
Safety channels and safety functions are two distinct parts of the performance level analysis.
Safety functions are the different criteria against which a system is analysed. Examples are under-voltage protection, over-voltage protection etc. Each safety function relates to a specific feature or system state for which you want to check the performance level.
Safety channels, on the other hand, are a division of your components into different groups. This division simplifies the assignment of components to the safety functions. Each safety channel has an FIT assigned to it. Once you define your safety channels and assign components to them, you must then determine which safety channels belong to which safety function (detailed below).
3.2 Safety Channels
Safety channels are the division of your entire system into smaller segments. These segments can then be assigned to the input/logic/output blocks discussed before (1.4 Safety Functions). Each segment has an FIT value, which is the sum of the FITs of the components assigned to it. A detailed analysis of the schematics of your system is necessary to understand the components and where they fit into your safety channels.
For example, in the safety functions, the microcontroller will be part of the logic block and the microcontroller will end up being part of a safety channel. Let’s call this channel “MCU”. The “MCU” safety channel will include all components that are relevant to the functioning of the microcontroller. For example, all decoupling capacitors on its supply pins and all of its external oscillators will be part of the safety channel.
Another example is the voltage sense for a system that measures voltages using an external battery-management-system IC. The external battery-management IC will be part of the input block and can be assigned to a safety channel. This safety channel we could, for example, call “Cell Voltages Sense”.
In figure 11 below, a sample safety channels definition has been listed. These channels are defined with the characters “A to Y” based on the system we were analyzing. Each channel was named and defined by the type of components it consists of. The figure represents the channels we have defined, and the relevant channel FIT we have calculated for them.
Note: There can be external components that are not on the PCBs themselves. You can find their FIT values separately and then assign these FIT values to the respective channel these components belong to.
Figure 11 (Image: Oxeltech)
Each channel contains a specific set of components, and the FIT of each is calculated as described in the previous section. The FITs of all components in a channel are summed to get a single FIT number for the channel (number of failures in 1×10^9 (1 billion) hours). Whether a component belongs to a channel is decided by understanding how that component works. For example, in the “CAN bus (external)” channel JJ, the FIT of the CAN bus transceiver IC (through which all commands flow to the outside of the system) has to be included.
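The grouping of components into channels and the summing of their FITs can be sketched as follows. The channel names follow the examples in this section, while the reference designators and FIT values are invented for illustration.

```python
from collections import defaultdict

# (reference designator, safety channel, operating FIT); all values are illustrative.
assignments = [
    ("U1", "MCU", 20.0),                   # microcontroller
    ("C101", "MCU", 2.0),                  # decoupling capacitor on a supply pin
    ("Y1", "MCU", 5.0),                    # external oscillator
    ("U5", "Cell Voltages Sense", 30.0),   # battery-management IC
    ("R501", "Cell Voltages Sense", 0.5),
    ("K1", "Relays", 40.0),                # off-board part, FIT obtained separately
]


def channel_fits(rows):
    """Sum the component FITs per safety channel."""
    totals = defaultdict(float)
    for _ref, channel, fit in rows:
        totals[channel] += fit
    return dict(totals)


fits = channel_fits(assignments)
for channel, fit in fits.items():
    print(f"{channel}: {fit} FIT")
```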
3.3 Safety Functions
The safety functions are the key criteria by which you judge the hazardous states that your system might reach. These states are defined with the context of the system in mind. Sample safety functions are over-voltage protection, short-circuit protection, over-temperature protection etc. All of these examples concern states that a system can reach and that can be hazardous to the human user and to the surrounding environment.
Each safety function can and usually does have multiple safety channels assigned to it. The assignment of channels depends on its relevance to that safety function. For example, the “Voltage measurement cells” channel C is highly relevant to the safety function “Over-Voltage Protection”. But the “Fluid Measurement” channel N has no relevance to the same “Over-Voltage Protection” safety function.
Let’s now take a look at the following block diagram representation of how the safety channels will fit into the safety function block diagram architecture.
Figure 12 (Image Courtesy: Oxeltech)
The architecture above shows that the input is the “Cell Voltages Sense” safety channel, the logic is the “microcontroller” safety channel, and the output is the “Relays” safety channel. The output block is the device that will prevent the system from reaching a hazardous state which can cause injury or harm to the user or to the system itself.
In the diagram above, a battery pack sense channel has been added as the TE (Test Equipment) block and a pyrofuse as the OTE (Output of Test Equipment) block. The TE and OTE blocks both function as redundancies to the main input, logic, and output blocks. If the cell voltages are within the desired range but the battery pack voltage is not, the control system activates the pyrofuse to disable the system and prevent a hazardous state (the function of TE + OTE).
Using the above block diagram, we add up the FITs of the input, logic, and output blocks. We then convert this sum into the MTTF (Mean Time To Failure). This MTTF value represents the complete safety function and will be used in the next step.
Let’s assume that from the above-mentioned 28 safety channels, we made 13 safety functions. Each safety function covered more than one safety channel. The FIT of a safety function was the sum of the FITs of the safety channels it included. This was then used to calculate the MTTF using the formula MTTF = 1×10^9 / FIT. Then the MTTFD was also calculated. This MTTFD is important in the next step.
Figure 13 (Image: Oxeltech)
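The step from channel FITs to the MTTF and MTTFD of a safety function can be sketched as below. The channel FIT values are made up, and the 50 % dangerous-failure fraction is the estimate this guide uses for MTTFD; treat both as assumptions to replace with your own figures.

```python
def safety_function_metrics(channel_fits, dangerous_fraction=0.5):
    """Return (total FIT, MTTF in hours, MTTFD in hours) for one safety function."""
    total_fit = sum(channel_fits.values())
    if total_fit <= 0:
        raise ValueError("total FIT must be positive")
    mttf = 1e9 / total_fit              # MTTF = 1e9 / FIT
    mttfd = mttf / dangerous_fraction   # 50 % dangerous failures gives 2 x MTTF
    return total_fit, mttf, mttfd


# Hypothetical channels assigned to an over-voltage protection safety function.
ovp_channels = {"Cell Voltages Sense": 30.5, "MCU": 27.0, "Relays": 42.5}
total_fit, mttf, mttfd = safety_function_metrics(ovp_channels)
print(total_fit, mttf, mttfd)  # 100.0 10000000.0 20000000.0
```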
Once we have the definitions of all the safety functions and all safety channels have their FIT values calculated, we can move on to the Diagnostic Coverage part of the calculations.
3.4 Diagnostic Coverage
Diagnostic coverage (DC) describes how effectively the dangerous failures of a safety channel are detected by diagnostics. The diagnostic coverage average, or DC average, combines the DC values of all channels in a safety function into a single value, in which each channel is weighted by its dangerous failure rate (1/MTTFD).
The DC average is calculated from DC values that are assigned to each safety channel. The value for each safety channel is assigned by studying ISO 13849-1. Table E.1 in the ISO standard defines how DC values are assigned to each channel (depending on the relevance). Once DC values are assigned to each channel, we can calculate the DC average for the entire safety function.
The MTTF (Mean Time To Failure) values used in the DC average calculation are obtained using the FIT-to-MTTF conversion. MTTF is measured in hours.
For our board, we calculated the DC average for each safety function by first calculating the MTTFs and then based on the channel, we assigned DC values using the ISO-13849 document. The figure below shows the snapshot of the DC value assigned to each channel.
Figure 14 (Image Source: Oxeltech)
Once we have assigned DC values to each safety channel, we calculate a DC average value for each safety function. ISO 13849-1 (Annex E) gives the formula as:
DCavg = (DC1/MTTFD1 + DC2/MTTFD2 + ... + DCN/MTTFDN) / (1/MTTFD1 + 1/MTTFD2 + ... + 1/MTTFDN)
Inputting values into the above equation gives us the DC average for that safety function. In the example, we use the 7 DC values of the safety channels that are relevant to the safety function “Under-Voltage Protection”.
Then we input the MTTFD1 – MTTFD7 values. MTTFDn and MTTF are not the same: the “D” indicates that only dangerous failures are counted (as calculated before). MTTFD1 and DC1 must both come from the same safety channel.
MTTFD is the mean time to dangerous failure. With the estimate that 50% of the failures are dangerous, it is two times the corresponding MTTF.
Figure 15 (Image: Oxeltech)
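The DC average calculation can be sketched in Python as a weighted average in which each channel is weighted by 1/MTTFD, which is the form given in ISO 13849-1 Annex E as we understand it. The DC and MTTFD numbers below are hypothetical and do not come from the board analysed in this guide.

```python
def dc_average(channels):
    """Weighted DC average over (dc, mttfd_hours) pairs, with dc as a fraction 0..1."""
    channels = list(channels)
    if not channels:
        raise ValueError("at least one channel is required")
    # Each channel contributes in proportion to its dangerous failure rate 1/MTTFD.
    numerator = sum(dc / mttfd for dc, mttfd in channels)
    denominator = sum(1.0 / mttfd for _dc, mttfd in channels)
    return numerator / denominator


# Hypothetical example: a well-diagnosed but less reliable channel (MTTFD = 2e7 h)
# dominates a poorly diagnosed but very reliable one (MTTFD = 2e8 h).
example = [(0.99, 2e7), (0.60, 2e8)]
print(f"{dc_average(example) * 100:.1f} %")  # 95.5 %
```

Because of the weighting, the result sits much closer to 99 % than to 60 %: channels that fail more often have more influence on the average.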
Once we have a DC average value for each of the safety functions, we use the following table to check whether the DC average of each safety function is high, medium, or low. For our under-voltage protection safety function example, the DC average is 96.2%, which is a medium DC average.
Figure 16 (Image Source: ISO-13849)
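The lookup in the table can be expressed as a small function. The thresholds used here (below 60 % none, 60 % to below 90 % low, 90 % to below 99 % medium, 99 % and above high) are the ranges of ISO 13849-1 as we understand them; please verify them against your copy of the standard.

```python
def classify_dc(dc_percent: float) -> str:
    """Map a DC average in percent to its ISO 13849-1 denotation."""
    if not 0 <= dc_percent <= 100:
        raise ValueError("DC must be between 0 and 100 percent")
    if dc_percent < 60:
        return "none"
    if dc_percent < 90:
        return "low"
    if dc_percent < 99:
        return "medium"
    return "high"


# The under-voltage protection example from the text.
print(classify_dc(96.2))  # medium
```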
3.5 Analyzing Performance Level
Once the DC averages are done, the next step is to find out which performance level we obtain. To arrive at this metric, we use the DC average and the PFH (Probability of Failure per Hour) values. PFH is calculated by:
PFH = 1/ MTTFD
MTTFD = 2 * MTTF (Estimation that 50% of the failures are dangerous)
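Chaining the two formulas above with the FIT-to-MTTF conversion gives the PFH directly from the FIT of a safety function. The sketch below follows the simplified approach of this guide (PFH = 1/MTTFD with 50 % dangerous failures); the FIT of 4000 is a hypothetical input chosen for illustration.

```python
def pfh_from_fit(total_fit: float, dangerous_fraction: float = 0.5) -> float:
    """Estimate PFH (per hour) from the summed FIT of a safety function."""
    if total_fit <= 0:
        raise ValueError("FIT must be positive")
    mttf = 1e9 / total_fit              # hours
    mttfd = mttf / dangerous_fraction   # MTTFD = 2 x MTTF for a 50 % fraction
    return 1.0 / mttfd


# A hypothetical safety function with a total FIT of 4000.
print(pfh_from_fit(4000.0))  # 2e-06
```

Written out, this is PFH = FIT × 10^-9 × 0.5, so halving the FIT of a safety function halves its PFH.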
The performance level can be visualized and understood by the following images. The table below uses PFH and DC average as the deciding factor for performance levels. The category we selected for our system is 2 which narrows down things a bit.
Figure 17 (image source: ISO-13849)
The table below can also be used to read off directly which PL your system achieves. You use the PFH value you calculated earlier to find the Performance Level your safety function has achieved. For example, if the PFH you have calculated is 2×10^-6, then according to Figure 18 below, your safety function has a PLc. A PLc is a good performance level (better than PLa and PLb) and, as explained earlier, the higher the PL, the better the performance of the safety function. The image below is simply the tabular form of Figure 17.
Figure 18 (image source: ISO-13849)
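The table lookup can be sketched as a simple band search. The PFH bands below are those of ISO 13849-1 as we understand them, and the sketch looks only at the PFH value; in the standard, the category and the DC average also limit which PL can be claimed, so treat the result as a first estimate.

```python
# (performance level, lower bound inclusive, upper bound exclusive), PFH per hour.
PL_BANDS = [
    ("e", 1e-8, 1e-7),
    ("d", 1e-7, 1e-6),
    ("c", 1e-6, 3e-6),
    ("b", 3e-6, 1e-5),
    ("a", 1e-5, 1e-4),
]


def performance_level(pfh: float):
    """Return the PL letter for a PFH value, or None if it is outside all bands."""
    for level, low, high in PL_BANDS:
        if low <= pfh < high:
            return level
    return None


# The example from the text: a PFH of 2 x 10^-6 corresponds to PLc.
print(performance_level(2e-6))  # c
```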
Figure 19 below shows the calculations we performed to estimate the PL of our safety functions, using the process above. The safety function Overvoltage Protection that we looked at earlier is denoted by “OVP” and has a resulting Performance Level of “c”. As you can see, the safety function “UCFP” has a high performance level, “d”, while the safety function “BCA” has a poor performance level, “b”. By improving the components that make up this safety function, we can improve its performance level.
Figure 19 (Image source: Oxeltech)
Once you know the performance level for all of your safety functions you have completed the overall process. However, based on the performance level you get, measures can be taken to improve the performance of the overall system.
In the end we have come to different performance levels for our system. This performance level is a good metric (KPI) of how our system will behave in hazardous scenarios.
A Short Recap of Steps for PL Calculations:
Below is a recap of PL Calculation method, summarized in a few short steps:
- Calculate FIT for all components in the system.
- Define safety channels and safety functions for your system.
- Assign component FIT to each safety channel and sum for the channel FIT.
- Assign safety channels to safety functions.
- Calculate the safety function FIT by summing the channel FITs, then convert it to MTTF and MTTFD.
- Assign Diagnostic Coverage (DC) values to your safety channels.
- Calculate the Diagnostic Coverage Average and then the PFH values.
- Assess the performance level using the DCavg and PFH values.