The Zephyr State Machine Framework (SMF)

Authors:

Published On:

Oct 2, 2024

Last Updated:

Nov 22, 2024

Zephyr provides a state machine framework (SMF) which can be used to implement finite state machines (FSMs) and hierarchical state machines (HSMs). SMF was added to Zephyr in v2.7.0¹.

A diagram of a basic hierarchical LED state machine (HSM) that you could build using the Zephyr SMF.

Building A FSM

The first thing you need to do is enable the framework in your prj.conf:

CONFIG_SMF=y

Then, you can start creating a state machine in a .c file. First things first, make sure to include the SMF header file:

#include <zephyr/smf.h>

You will likely want to access data within your state machine. The recommended way to do this is to create a struct to hold your data, with the first member being the special struct smf_ctx ctx. The SMF will pass this object to all of your state machine functions, and will also access and modify the ctx member to keep track of the state machine’s internal state.

typedef struct {
    struct smf_ctx ctx;
    // User defined data goes here
    uint32_t event;
} MyStateMachine;

We also need to define our states as an enumeration:

typedef enum {
    S1,
    S2,
} MyState;

Now let’s create some state machine functions. The SMF is designed so that you can have up to three functions per state:

The entry function: Called whenever the state machine enters the state.
The run function: Called when the state machine is in this state and smf_run_state() is called.
The exit function: Called whenever the state machine exits the state.

Let’s create a simple FSM (no hierarchical states) with two states: s1 and s2.

static void s1_entry(void *data) {
    printf("State 1 entry.\n");
}

static void s1_run(void *data) {
    printf("State 1 run.\n");
}

static void s1_exit(void *data) {
    printf("State 1 exit.\n");
}

static void s2_entry(void *data) {
    printf("State 2 entry.\n");
}

static void s2_run(void *data) {
    printf("State 2 run.\n");
}

static void s2_exit(void *data) {
    printf("State 2 exit.\n");
}

The void *data parameter is actually a pointer to the state machine object (e.g. of type MyStateMachine * in our example). You can safely cast this void *data pointer to the correct type and access “member variables” of the state machine object. For example:

static void s1_run(void *data) {
    // Cast the data pointer to the correct type
    MyStateMachine *obj = (MyStateMachine *)data;
    // Now we can access member variables
    printf("event: %u.\n", (unsigned int)obj->event);
}

Transitioning Between States

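You can call the smf_set_state() function inside a state’s run() function to transition to a different state. The example below makes the state machine transition from S1 to S2 when the event member of the MyStateMachine object is set to 1. Note that it refers to the state table l_myStateTable, which is only created further below, so in a real source file the table has to be declared before this function.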

static void s1_run(void *data) {
    MyStateMachine *obj = (MyStateMachine *)data;
    if (obj->event == 1) {
        smf_set_state(SMF_CTX(obj), &l_myStateTable[S2]);
        // We are now in state S2!
        return;
    }
}

We need to register all these functions using a state table. We can use the SMF_CREATE_STATE() macro to create the state table easily. It takes 5 parameters:

SMF_CREATE_STATE(entry_function, run_function, exit_function, parent, initial)

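The parent parameter is optional and can be used to create a hierarchical state machine (more on this below). The initial parameter is also optional. In a HSM it names the child state that is entered automatically when this state is entered (an initial transition). Pass NULL for both when building a flat FSM.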

// Populate state table
static const struct smf_state l_myStateTable[] = {
        [S1] = SMF_CREATE_STATE(s1_entry, s1_run, s1_exit, NULL, NULL),
        [S2] = SMF_CREATE_STATE(s2_entry, s2_run, s2_exit, NULL, NULL),
};

We can then run the state machine!

int main(void)
{
    // Create state machine object, zero-initialized so that sm.event
    // has a known value before the first run
    MyStateMachine sm = {0};

    // Set initial state
    smf_set_initial(SMF_CTX(&sm), &l_myStateTable[S1]);

    // Run the state machine
    while(1) {
        int32_t ret = smf_run_state(SMF_CTX(&sm));
        if (ret) {
            // State machine terminates if a non-zero value is returned
            break;
        }
        k_msleep(1000);
    }
}

Note that the example above is not ideal, since it sleeps for 1 s between calls to the run() function of whatever state it is in. This would make the state machine seem unresponsive. If you decrease the sleep time, the state machine will appear more responsive, but you’ll end up churning through CPU cycles needlessly. This would be an issue in a multi-threaded application or if low power consumption is a concern.

A better approach is to build an event system alongside the state machine. When events arrive, they are handled by the state machine. Events could be created by interrupts, timers, or other threads. Because you cannot pass extra arguments into state functions, you have to save the event data to the sm object, which the state functions have access to.

Below is pseudo code of the idea:

while(1) {
    event = eventQueue.get(); // Blocks until an event is available
    sm.event = event; // Save the event to the state machine object, this is the only way to get data "into" the state machine
    smf_run_state(SMF_CTX(&sm)); // Run the state machine
}

The table below lists the different options you can set in your prj.conf to configure the SMF².

Zephyr state machine framework (SMF) configuration options.
Option	Description	Default
`CONFIG_SMF`	`y` enables the state machine framework, `n` disables it.	`n`
`CONFIG_SMF_ANCESTOR_SUPPORT`	`y` enables support for ancestor (parent) states, which HSMs need, `n` disables it.	`n`
`CONFIG_SMF_INITIAL_TRANSITION`	`y` enables support for initial transitions in HSMs, `n` disables it.	`n`
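
As a rough guide to how these options combine, a prj.conf for the hierarchical examples later in this article might look something like the fragment below. This is only a sketch based on the option descriptions in the table above. Which options you actually need depends on whether your state table uses the parent and initial parameters.

```
CONFIG_SMF=y
# Needed as soon as any state has a non-NULL parent
CONFIG_SMF_ANCESTOR_SUPPORT=y
# Only needed if a state uses the initial parameter
CONFIG_SMF_INITIAL_TRANSITION=y
```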

Hierarchical State Machines

SMF supports hierarchical state machines (HSMs). HSMs are very useful in real-world applications for managing complex systems. They help manage complexity and reduce code duplication by allowing you to group states into sub-trees and have events propagate “up” to parent states until they are handled.

To create a HSM, you pretty much do the same thing as with a FSM, except you provide a non-null parent parameter to the SMF_CREATE_STATE() macro. For example:

// Populate state table
static const struct smf_state l_myStateTable[] = {
    [ROOT] = SMF_CREATE_STATE(root_entry, root_run, root_exit, NULL, NULL),
    [S1] = SMF_CREATE_STATE(s1_entry, s1_run, s1_exit, &l_myStateTable[ROOT], NULL),
    [S1A] = SMF_CREATE_STATE(s1a_entry, s1a_run, s1a_exit, &l_myStateTable[S1], NULL),
    [S1B] = SMF_CREATE_STATE(s1b_entry, s1b_run, s1b_exit, &l_myStateTable[S1], NULL),
    [S2] = SMF_CREATE_STATE(s2_entry, s2_run, s2_exit, &l_myStateTable[ROOT], NULL),
};

The state functions (entry(), run(), and exit()) are defined in exactly the same way as with a FSM. This would create the state machine shown in the diagram below.

A diagram of the hierarchical state machine created in the above code.

Transitioning between states in HSMs follows these rules:

All of the exit functions from the current state up to but not including the lowest common ancestor (LCA) state are called. The current state exit function is called first, and then in order working upwards towards the LCA (but not the LCA itself).
All of the entry functions from (but not including) the LCA state down to the next state are called. The entry functions are called in order from the child of the LCA all the way down to the next state.

If we use our root, S1, S1A, S1B, and S2 states from the example above, the transitions would be handled as follows:

An initial transition to S1 would call root_entry() and s1_entry().
EVT1 arriving would cause a transition from S1 to S1A. This is a transition to a child state and would just call s1a_entry().
EVT2 arriving would cause a transition from S1A to S1B. This is a transition to a sibling state and would call s1a_exit() then s1b_entry().
EVT3 arriving would cause a transition from S1B to S2. This is a transition to a different branch of the state machine and would call s1b_exit(), s1_exit(), and then s2_entry().
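
The ordering rules above can be checked with a small host-side sketch. This is plain C that only models the hierarchy with parent pointers. It does not use the SMF API, and the State struct, find_lca() and transition_calls() are hypothetical names made up for this illustration. It also does not model self-transitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a state: a name and a parent link.
 * This is NOT struct smf_state, it only models the hierarchy. */
typedef struct State {
    const char *name;
    const struct State *parent;
} State;

/* The hierarchy from the example: root -> { s1 -> { s1a, s1b }, s2 } */
const State root = { "root", NULL };
const State s1   = { "s1",   &root };
const State s1a  = { "s1a",  &s1 };
const State s1b  = { "s1b",  &s1 };
const State s2   = { "s2",   &root };

/* Returns 1 if `ancestor` is `state` itself or one of its ancestors. */
int is_self_or_ancestor(const State *ancestor, const State *state) {
    for (; state != NULL; state = state->parent) {
        if (state == ancestor) {
            return 1;
        }
    }
    return 0;
}

/* Walks up from `from` until a state that also contains `to` is found. */
const State *find_lca(const State *from, const State *to) {
    for (const State *s = from; s != NULL; s = s->parent) {
        if (is_self_or_ancestor(s, to)) {
            return s;
        }
    }
    return NULL; /* No common ancestor (e.g. the initial transition) */
}

/* Appends "<name>_<suffix>" to out, separated by a space. */
void append_call(char *out, size_t cap, const char *name, const char *suffix) {
    size_t len = strlen(out);
    snprintf(out + len, cap - len, "%s%s_%s", len > 0 ? " " : "", name, suffix);
}

/* Emits entry calls from the child of `lca` down to `target`. */
void emit_entries(const State *lca, const State *target, char *out, size_t cap) {
    if (target == NULL || target == lca) {
        return;
    }
    emit_entries(lca, target->parent, out, cap); /* Parents are entered first */
    append_call(out, cap, target->name, "entry");
}

/* Writes the sequence of exit/entry calls for a transition into out.
 * Pass from == NULL to model the initial transition. */
void transition_calls(const State *from, const State *to, char *out, size_t cap) {
    assert(out != NULL && cap > 0);
    out[0] = '\0';
    const State *lca = find_lca(from, to);
    /* Exit from the current state upwards, stopping before the LCA */
    for (const State *s = from; s != lca; s = s->parent) {
        append_call(out, cap, s->name, "exit");
    }
    emit_entries(lca, to, out, cap);
}
```

Calling transition_calls(&s1b, &s2, buf, sizeof buf) fills buf with the same call order as the EVT3 case in the list above.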

Event Propagation

When you call smf_run_state() with a HSM, the current state’s run() function is not the only one that can be called. The SMF library will also call the run() function of all ancestor states, unless the event is deemed handled. The event is considered handled if ANY of the following occur:

smf_set_state() is called.
smf_set_handled() is called.

Event propagation is a powerful way of generalizing code and reducing duplication by adding handlers to parent states that will then be applied to all child states (presuming they don’t handle the event themselves).
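
To illustrate the idea, here is a minimal host-side model of propagation. It is plain C rather than SMF code: ModelState, Machine and run_with_propagation() are hypothetical names, and the handled flag only stands in for the effect of smf_set_handled(). A transition would stop propagation in the same way, but is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { EVT_LOCAL = 1, EVT_GLOBAL = 2 };

typedef struct Machine Machine;

/* Hypothetical state model: a run function and a parent link. */
typedef struct ModelState {
    void (*run)(Machine *m);
    const struct ModelState *parent;
} ModelState;

struct Machine {
    const ModelState *current;
    int event;
    bool handled;   /* Stands in for the effect of smf_set_handled() */
    int child_runs; /* How many times the child's run function was called */
    int root_runs;  /* How many times the root's run function was called */
};

/* The child handles EVT_LOCAL itself and lets everything else bubble up. */
void child_run(Machine *m) {
    m->child_runs++;
    if (m->event == EVT_LOCAL) {
        m->handled = true;
    }
}

/* The root sees every event that no child has marked as handled. */
void root_run(Machine *m) {
    m->root_runs++;
}

const ModelState root_state  = { root_run, NULL };
const ModelState child_state = { child_run, &root_state };

/* Runs the current state, then each ancestor in turn, until one of them
 * marks the event as handled or the top of the hierarchy is reached. */
void run_with_propagation(Machine *m, int event) {
    assert(m != NULL && m->current != NULL);
    m->event = event;
    m->handled = false;
    for (const ModelState *s = m->current; s != NULL && !m->handled; s = s->parent) {
        if (s->run != NULL) {
            s->run(m);
        }
    }
}
```

With the machine in child_state, EVT_LOCAL only reaches child_run(), while EVT_GLOBAL reaches child_run() and then root_run().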

Practical Guidelines

Use Large Comment Blocks

A large state machine in a .c or .cpp file can be difficult for unfamiliar developers to understand. State diagrams can help, but they require maintenance to keep them in sync with the code. Instead, I recommend adding large comment blocks before each state with the state name (including the hierarchy, which I like to write in “path” format, e.g. root/parent_state/child_state) and a brief description of what the state does.

/* ======================================================================== */
/* root/mode_1/idle: Initial idle state. */
/* ======================================================================== */
static void idle_entry(void *data) {
    printf("idle entry.\n");
}

static void idle_run(void *data) {
    printf("idle run.\n");
}

static void idle_exit(void *data) {
    printf("idle exit.\n");
}

/* ======================================================================== */
/* root/mode_1/run: Runs when an event ID of 1 occurs. */
/* ======================================================================== */

// ...

/* ======================================================================== */
/* root/mode_2: Error state, device goes into disabled state until manual reset. */
/* ======================================================================== */

// ...

Do Not Use States For “Transient” Events

States should represent “states” the device can be in for a non-zero amount of time. In general, do not create states where you only run things on entry to that state and immediately transition away. States should always have a run function that looks for a particular event before transitioning to a different state.

Use A root State To Perform Actions That Would Apply In Any State

Most of my state machines end up needing a root state to capture events and perform actions that would apply in any state. For example, you may have an ERROR event in which you would want to transition to an error state, no matter what state you are in. Be aware however that this would also mean that you would re-enter the error state if the error event occurred when already in the error state. If you don’t want this to happen, make sure to add a handler for the ERROR event in the error state and call smf_set_handled() to prevent event propagation.

Don’t Pack Too Much Into A Single State Machine

It can be a bad idea to write all of your code in a single state machine. As the state machine grows in complexity, it can become difficult to understand and maintain.

Furthermore, many concepts when writing software are orthogonal to one another and should be managed in separate state machines. For example, consider a state machine with states LIGHT_1_ON and LIGHT_1_OFF. You then add a second light and add two child states, LIGHT_2_ON and LIGHT_2_OFF, to LIGHT_1_ON. You realize you also need to handle the second light when the first light is off, so you again add two child states, LIGHT_2_ON and LIGHT_2_OFF, to LIGHT_1_OFF. You might start to see here that this is generally not a good design pattern, as the number of states will grow multiplicatively with the number of lights. Instead, you should probably just create two separate state machines, one for each light. If they need to be coordinated, you can use an event system to send events between the two state machines.
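
To put rough numbers on that growth, the sketch below counts states for both designs. The function names are made up for this illustration, and it assumes every light has exactly two states (on and off), with each additional light nested under every state of the previous one as described above.

```c
#include <assert.h>

/* Total number of states when each light's ON/OFF pair is nested under
 * every state of the previous light: 2 + 4 + 8 + ... */
unsigned long long nested_state_count(unsigned lights) {
    assert(lights < 63); /* Keeps the doubling within 64 bits */
    unsigned long long total = 0;
    unsigned long long level = 1;
    for (unsigned i = 0; i < lights; i++) {
        level *= 2;     /* Every state on the previous level gets two children */
        total += level;
    }
    return total;
}

/* Total number of states with one independent two-state machine per light. */
unsigned long long separate_state_count(unsigned lights) {
    return 2ULL * lights;
}
```

For the two lights in the example this gives 6 nested states against 4 separate ones. For eight lights it is 510 against 16.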

Nesting State Machines

As well as splitting up state machines, a valid design pattern to reduce code duplication and/or complexity is to nest state machines within one another. This is useful to encapsulate parts of your application which would be considered “implementation details” of another state machine.

The parent state machine can completely encapsulate the child state machine, in the sense that the user of the parent state machine has no idea the child state machine even exists! The parent state machine can create/initialize the child state machine in its own initialization, and the parent state machine can forward events to the child state machine.

Event System

The Zephyr SMF does not provide an event system, so you will normally want to implement one yourself. This is a good thing, since they are not tricky to do and it allows flexibility for the many different ways applications can be designed.

Events can be generated in these ways:

By an interrupt: You might have set up a GPIO pin to call an interrupt handler when the pin goes high. The handler can create an event and place it on the event queue.
By a timer: Timers are a very popular way of generating events for things like flashing LEDs at a particular rate, or implementing timeouts. You can use Zephyr’s timers (whose expiry callbacks run in interrupt context, outside of your event loop), and make their callback functions create events and place them on the event queue.
By the state machines themselves: At any point in the entry(), run(), or exit() functions, you can create and post events to the event queue. This is useful for inter-state machine communication.

State machines can be placed into threads in a few different ways:

Thread per state machine: Each state machine has an event queue that it blocks on, waiting for events. Each state machine can perform time consuming work, and it won’t hold up the other state machines.
Single thread running all state machines (event loop): There is an event queue which the event loop blocks on, waiting for events. When an event is received, it is passed to each state machine in sequence (the order may or may not matter, depending on your application). In this case, you may only need the one thread (the one that main() runs in), so you have a single threaded application.
Hybrid approach: Most state machines run inside the same thread, but certain time consuming state machines are placed in separate threads.

I prefer the single thread running all state machines (event loop) approach. Any time consuming work can be offloaded into worker threads, and you don’t have any synchronization issues between the state machines to deal with. It also keeps the state machines lightweight, and means you can more easily nest state machines or split them up to separate concerns.

The event loop (in pseudo code) looks like this:

while (true) {
    evt = queue_get();
    dispatch_event_to_state_machines(evt);
}
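
To make the pseudo code above a little more concrete, here is a minimal host-side sketch of a queue plus dispatcher. Everything in it (Event, EventQueue, Sm, record_event(), run_event_loop_once()) is a hypothetical name for illustration, and the queue is a simple non-blocking ring buffer. In a real Zephyr application you would block on a kernel queue instead, and each handler would save the event into its state machine object and then call smf_run_state().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 8

typedef struct {
    int id;
} Event;

/* Fixed-size ring buffer of events. */
typedef struct {
    Event items[QUEUE_CAPACITY];
    size_t head;  /* Index of the oldest event */
    size_t count; /* Number of queued events */
} EventQueue;

/* Adds an event. Returns false (dropping the event) if the queue is full. */
bool queue_put(EventQueue *q, Event evt) {
    assert(q != NULL);
    if (q->count == QUEUE_CAPACITY) {
        return false;
    }
    q->items[(q->head + q->count) % QUEUE_CAPACITY] = evt;
    q->count++;
    return true;
}

/* Removes the oldest event. Returns false if the queue is empty
 * (a real event loop would block here instead). */
bool queue_get(EventQueue *q, Event *out) {
    assert(q != NULL && out != NULL);
    if (q->count == 0) {
        return false;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return true;
}

/* Hypothetical wrapper around one state machine. */
typedef struct Sm {
    void (*handle)(struct Sm *self, const Event *evt);
    int last_event;    /* ID of the most recent event seen */
    int handled_count; /* Number of events seen so far */
} Sm;

/* Example handler: just records what it was given. */
void record_event(Sm *self, const Event *evt) {
    self->last_event = evt->id;
    self->handled_count++;
}

/* Passes one event to every state machine, in array order. */
void dispatch_event_to_state_machines(Sm *const machines[], size_t n, const Event *evt) {
    for (size_t i = 0; i < n; i++) {
        machines[i]->handle(machines[i], evt);
    }
}

/* Drains the queue and returns how many events were dispatched. */
size_t run_event_loop_once(EventQueue *q, Sm *const machines[], size_t n) {
    size_t dispatched = 0;
    Event evt;
    while (queue_get(q, &evt)) {
        dispatch_event_to_state_machines(machines, n, &evt);
        dispatched++;
    }
    return dispatched;
}
```

Because every state machine is called from the same loop, no state machine ever runs concurrently with another, which is where the lack of synchronization issues comes from.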

Preventing Timer Synchronization Issues

The built-in Zephyr timers call the timer expiry callbacks from the context of the system clock interrupt handler. Generally, you would post a TIMER_EXPIRED event to the event queue (without waiting if the queue is full) and return. The event loop will then receive the event and pass it to the state machines. Because the callback is not synchronized with the event loop, you have to be mindful of synchronization issues. Be aware that after you stop a timer from a state machine function, you may still receive TIMER_EXPIRED events, as these could already be waiting on the event queue. Even if you emptied the queue first, you could get unlucky and have the timer interrupt fire just as you are calling the timer stop function.

A solution to this is to not use Zephyr’s timers, but instead use the timeout you can pass when reading from a queue, which makes the call wait for up to a specific maximum amount of time. Using this, you can implement timer functionality yourself. You can design a module which can be used to create multiple timers, and register timeouts in the future. When it comes around to blocking on the queue, you work out which timer would expire next, and provide that timeout to the queue. When the queue returns, you can check the return code to see if it received an event or if it returned due to the timeout. If it was due to the timeout, you can create a TIMER_EXPIRED event and pass it to the state machines.

This approach makes the timers synchronous with the event loop, and fixes the problems discussed above.
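
A minimal sketch of such a timer module is shown below. It is plain C with hypothetical names (TimerModule, timer_start(), timer_next_timeout_ms() and so on), it assumes one-shot timers, and it assumes the caller supplies the current time in milliseconds. In a Zephyr application the value returned by timer_next_timeout_ms() would be turned into the timeout for the queue read, for example K_MSEC(n) or K_FOREVER.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TIMERS 4
#define WAIT_FOREVER (-1)

typedef struct {
    bool active;
    int64_t deadline_ms; /* Absolute time at which the timer expires */
} SoftTimer;

typedef struct {
    SoftTimer timers[MAX_TIMERS];
} TimerModule;

/* Starts (or restarts) a one-shot timer. */
void timer_start(TimerModule *tm, size_t id, int64_t now_ms, int64_t duration_ms) {
    assert(id < MAX_TIMERS);
    tm->timers[id].active = true;
    tm->timers[id].deadline_ms = now_ms + duration_ms;
}

/* Stops a timer. No expiry can be reported for it after this returns. */
void timer_stop(TimerModule *tm, size_t id) {
    assert(id < MAX_TIMERS);
    tm->timers[id].active = false;
}

/* How long the event loop may block on the queue: the time until the
 * nearest deadline, 0 if one is already due, or WAIT_FOREVER if no
 * timer is active. */
int64_t timer_next_timeout_ms(const TimerModule *tm, int64_t now_ms) {
    int64_t best = WAIT_FOREVER;
    for (size_t i = 0; i < MAX_TIMERS; i++) {
        if (!tm->timers[i].active) {
            continue;
        }
        int64_t remaining = tm->timers[i].deadline_ms - now_ms;
        if (remaining < 0) {
            remaining = 0;
        }
        if (best == WAIT_FOREVER || remaining < best) {
            best = remaining;
        }
    }
    return best;
}

/* Called when the queue read times out. Deactivates one due timer and
 * returns its ID, or -1 if no timer is due. The caller would then create
 * a TIMER_EXPIRED event for that ID and dispatch it. */
int timer_pop_expired(TimerModule *tm, int64_t now_ms) {
    for (size_t i = 0; i < MAX_TIMERS; i++) {
        if (tm->timers[i].active && tm->timers[i].deadline_ms <= now_ms) {
            tm->timers[i].active = false;
            return (int)i;
        }
    }
    return -1;
}
```

Since expiries are only ever generated by the event loop itself, a timer stopped inside a state function cannot leave a stale TIMER_EXPIRED event behind.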

Working Examples

For a full working example of the SMF that runs on Linux (the native_sim board), see the GitHub repo gbmhunter / zephyr-examples.

¹ Zephyr. Releases > Zephyr 2.7.0 [release notes]. Retrieved 2024-11-11, from https://docs.zephyrproject.org/latest/releases/release-notes-2.7.html.
² Zephyr (2024, Apr 21). State Machine Framework [documentation]. Retrieved 2024-11-11, from https://docs.zephyrproject.org/latest/services/smf/index.html.