The Zephyr State Machine Framework (SMF)
Zephyr provides a state machine framework (SMF) which can be used to implement finite state machines (FSMs) and hierarchical state machines (HSMs). SMF was added to Zephyr in v2.7.0
1.
Building A FSM
The first thing you need to do is enable the framework in your prj.conf
:
Then, you can start creating a state machine in a .c
file. First things first, make sure to import the SMF header file:
You will likely want to access data within your state machine. The recommended way to do this is to create a struct
to hold your data, with the first member being special struct smf_ctx ctx
. The SMF will pass this object to all of your state machine functions, and also access and modify the ctx
member to keep track of the state machines internal state.
We also need to define our states as an enumeration:
Now let’s create some state machine functions. The SMF is designed so that you can have up to three functions per state:
- The entry function: Called whenever the state machine enters the state.
- The run function: Called when the state machine is in this state and
smf_run_state()
is called. - The exit function: Called whenever the state machine exits the state.
Let’s create a simple FSM (no hierarchical states) with two states: s1
and s2
.
The void *data
parameter is actually a pointer to the state machine object (e.g. of type MyStateMachine *
in our example). You can safely cast this void *data
pointer to the correct type and access “member variables” of the state machine object. For example:
Transitioning Between States
You can call the smf_set_state()
function inside state run()
functions to transition to a different state. The below example makes the state machine transition from S1
to S2
when the event
member of the MyStateMachine
object is set to 1
.
We need to register all these functions using a state table. We can use the SMF_CREATE_STATE()
macro to create the state table easily. It takes 5 parameters:
The parent
parameter is optional and can be used to create a hierarchical state machine (more on this below). The initial
parameter is optional and can be used to set the initial state of the state machine.
We can then run the state machine!
Note that the example above is not very ideal since is sleeps for 1s between calls to the run()
function of whatever state it is in. This would make the state machine seem unresponsive. If you decrease the sleep time, the state machine will appear more responsive, but you’ll end up churning through CPU cycles needlessly. This would be an issue in a multi-threaded application or if low power consumption is a concern. A better approach is to build a event system alongside the state machine. When events arrive, they are handled by the state machine. Events could be created by interrupts, timers, or other threads. Because you cannot pass extra variables into state functions, you have to save the event data to the sm
object which the state functions have access to.
Below is pseudo code of the idea:
This is a placeholder for the reference: tbl-smf-prj-conf-options lists the different options you can set in your prj.conf
to configure the SMF2.
Option | Description | Default |
---|---|---|
CONFIG_SMF | y enables the state machine framework, n disables it. | n |
CONFIG_SMF_ANCESTOR_SUPPORT | y enables support for ancestor state machines, n disables it. | n |
CONFIG_SMF_INITIAL_TRANSITION | y enables support for initial transitions in HSMs, n disables it. | n |
Hierarchical State Machines
SMF supports hierarchical state machines (HSMs). HSMs are very useful in real-world applications for managing complex systems. They help manage complexity and reduce code duplication by allowing you to group states into sub-trees and have events propagate “up” to parent states until they are handled.
To create a HSM, you pretty much do the same thing as with a FSM, except you provide a non-null parent
parameter to the SMF_CREATE_STATE()
macro. For example:
The state functions (entry()
, run()
, and exit()
) are defined in the exact same way as with a FSM. This would create the state machine shown in This is a placeholder for the reference: fig-hierarchical-state-diagram.
Transitioning between states in HSMs follows these rules:
- All of the exit functions from the current state up to but not including the lowest common ancestor (LCA) state are called. The current state exit function is called first, and then in order working upwards towards the LCA (but not the LCA itself).
- All of the entry functions from (but not including) the LCA state down to the next state are called. The entry functions are called in order from the child of the LCA all the way down to the next state.
If we use our root
, S1
, S1A
, S1B
, and S2
states from the example above, the transitions would be handled as follows:
- An initial transition to
S1
would callroot_entry()
ands1_entry()
. EVT1
arriving would cause a transition fromS1
toS1A
. This is a transition to a child state and would just calls1a_entry()
.EVT2
arriving would cause a transition fromS1A
toS1B
. This is a transition to a sibling state and would calls1a_exit()
thens1b_entry()
.EVT3
arriving would cause a transition fromS1B
toS2
. This is a transition to a different branch of the state machine and would calls1b_exit()
,s1_exit()
, and thens2_entry()
.
Event Propagation
When you call smf_run_state()
with a HSM, the current states run()
function is not the only one that can be called. The SMF library will also call the run()
function of all ancestor states, unless the event is deemed handled. The event is considered handled if ANY of the following occur:
smf_set_state()
is called.smf_set_handled()
is called.
Event propagation is a powerful way of generalizing code and reducing duplication by adding handlers to parent states that will then be applied to all child states (presuming they don’t handle the event themselves).
Practical Guidelines
Use Large Comment Blocks
A large state machine in a .c
or .cpp
file can be difficult for unfamiliar developers to understand. State diagrams can help, but they require maintenance to keep them in sync with the code. Instead, I recommend adding large comment blocks before each state with the state name (including the hierarchy, which I like to write in “path” format, e.g. root/parent_state/child_state
) and a brief description of what the state does.
Do Not Use States For “Transient” Events
States should represent “states” the devices can in for a non-zero amount of time. In general, do not create states for things where you only run things on entry to that state and immediately transition away. States should always have a run function that looks for a particular event before transitioning to a different state.
Use A root State To Perform Actions That Would Apply In Any State
Most of my state machines end up needing a root
state to capture events and perform actions that would apply in any state. For example, you may have an ERROR
event in which you would want to transition to an error state, no matter what state you are in. Be aware however that this would also mean that you would re-enter the error state if the error event occurred when already in the error state. If you don’t want this to happen, make sure to add a handler for the ERROR
event in the error state and call smf_set_handled()
to prevent event propagation.
Don’t Pack Too Much Into A Single State Machine
It can be a bad idea to write all of your code in a single state machine. As the state machine grows in complexity, it can become difficult to understand and maintain.
Furthermore, many concepts when writing software are orthogonal to one another and should be managed in separate state machines. For example, consider a state machine with states LIGHT_1_ON
and LIGHT_1_OFF
. You then add a second light and add two child states, LIGHT_2_ON
, and LIGHT_2_OFF
to LIGHT_1_ON
. You realize you also need to handle the 2nd light when the first light is off, so you again and add two child states, LIGHT_2_ON
and LIGHT_2_OFF
to LIGHT_1_OFF
. You might start to see here that this is generally not a good design pattern, as the number of states will grow multiplicatively with the number of lights. Instead, you should probably just create two separate state machines, one for each light. If they need to be coordinated, you can use a event system to send events between the two state machines.
Nesting State Machines
As well as splitting up state machines, a valid design pattern to reduce code duplication and/or complexity is to nest state machines within one another. This is useful to encapsulate parts of your application which would be considered “implementation details” of another state machine.
The parent state machine can completely encapsulate the child state machine, in the sense that the user of the parent state machine has no idea the child state machine even exists! The parent state machine can create/initialize the child state machine in it’s own initialization, and the parent state machine can forward events to the child state machine
Event System
The Zephyr SMF does not provide an event system, so you will normally want to implement one yourself. This is a good thing, since they are not tricky to do and it allows flexibility for the many different ways applications can be designed.
Events can be generated in these ways:
- By an interrupt: You might have set up a GPIO pin to call a interrupt handler when the pin goes high. This can create an event and place it on the event queue.
- By a timer: Timers are a very popular way of generating events for things like flashing LEDs at a particular rate, or implementing timeouts. You can use Zephyr’s timers (which run in a different thread), and make their callback functions create events and place them on the event queue.
- By the state machines themselves: At any point in the
entry()
,run()
, orexit()
functions, you can create and post events to the event queue. This is useful for inter-state machine communication.
State machines can be placed into threads in a few different ways:
- Thread per state machine: Each state machine has a event queue that it blocks on, waiting for events. Each state machine can perform time consuming work, and it won’t hold up the other state machines.
- Single thread running all state machines (event loop): There is an event queue which the event loops blocks on, waiting for events. When a event is received, it is passed to each state machine in sequence (the order may or may not matter, depending on your application). In this case, you may only need the one thread (the one that
main()
runs in), so you have a single threaded application. - Hybrid approach: Most state machines run inside the same thread, but certain time consuming state machines are placed in separate threads.
I prefer a single thread running all state machines (event loop) approach. Any time consuming work can be offloaded into worker threads, and now you don’t have an synchronization issues between the state machines to deal with. It also keeps the state machines lightweight, and means you can more easily nest state machines or split them up to separate concerns.
The event loop (in pseudo code) looks like this:
Preventing Timer Synchronization Issues
The built in Zephyr timers call the timer expiry callbacks from the content of the system clock interrupt handler. Generally, you would post a TIMER_EXPIRED
event to the event queue (with no waiting if queue is full) and return. The event loop will then receive the event and pass it to the state machines. Because the callback is not synchronized with the event loop, you have to be mindful of synchronization issues. Be aware that after you stop a timer from a state machine function, you still may receive TIMER_EXPIRED
callbacks, and these could be already waiting on the event queue. Even if you emptied the queue first, you could get unlucky and have the timer interrupt fire just as you are calling the timer stop function.
A solutions to this is to not use Zephyr’s timers, but instead utilize the variable you can pass into a queue to wait for up to a specific maximum amount of time. Using this, you can implement timer functionality yourself. You can design a module which can be used to create multiple timers, and register timeouts in the future. When it comes around to blocking on the queue, you work out what timer would expire next, and provide that timeout to the queue. When the queue returns, you can check the return code to see if it received an event or it was due to the timeout. If it was due to the timeout, you can create a TIMER_EXPIRED
event and pass it to the state machines.
This approach makes the timers synchronous with the event loop, and fixes the problems discussed above.
Working Examples
For a full working example of the SMF that runs on Linux (the native_sim
board), see the GitHub repo gbmhunter / zephyr-examples.
Footnotes
-
Zephyr. Releases > Zephyr 2.7.0 [release notes]. Retrieved 2024-11-11, from https://docs.zephyrproject.org/latest/releases/release-notes-2.7.html. ↩
-
Zephyr (2024, Apr 21). State Machine Framework [documentation]. Retrieved 2024-11-11, from https://docs.zephyrproject.org/latest/services/smf/index.html. ↩