Rethinking State Machines

I have made a few extensions and Domain Specific Languages (DSLs) to represent state machines.

Some were frameworks in a native language, including versions using the State Chart pattern. One was in C++ integrated with an event and entity framework so that entities could participate in threaded declarative workflows assembled instead of derived.

But the ones I enjoyed most were done in Lisp. The simplest was a sexpr based notation that was read by clisp and emitted Java. It was, of course, a macro. The expression of the machine was executable and the execution was to emit Java. It generated interfaces, abstract base classes, 100% of the POJO that represented the messages, and required no fences — it was a full code generator, not a wizard (no generated code was edited by hand or saved in the version control system).

I did another recently. This time, I decided to tackle the interesting problem: state machines become unbearably big and repetitive if done following the conventional model.

So I decided to toss the rules aside.

The Problem (Using an External View)

I’m going to model a mechanical lamp. Unlike the prior lamps I’ve written about, this time it’s purely an implementation detail.

From the outside, the lamp has two states: on and off. In UML:

Initial On and Off

This lamp has two inputs, though. It has a power plug and a pull-string that flips a switch (so the user doesn’t see the switch, and flips it indirectly).

This is important: there are actually two attributes that “make up” the state of the lamp. The on and off are the external presentation, but they aren’t internal.

The attributes are:

boolean plugged_in  -- true when the plug is plugged in
boolean switched_on -- true when the mechanical switch is in the closed position so power flows through it

Of course, that leads to the following “rule” for when the lamp is lit:

plugged_in && switched_on  -- when both attributes are true the lamp is lit

The events that can happen (they have no parameters) are:

event pull_plug
event insert_plug
event flip_switch

Now, the actions are external: how they work we don’t care, but “somehow” this machine can cause external effects. They are:

action turn_on()  -- causes the light to turn on
action turn_off() -- causes the light to turn off

This is a simple system. Now, the challenge is this: when should the “turn the light on” and “turn the light off” action be performed? The state machine shows that the light has been turned on or off. The attributes indicate what the light should be showing.

Now, using a conventional state machine, let’s solve this. First, we’ll do one using the On/Off model above.

We’ll start in off, so the first step is reaching on:

Turning it on

Well … that doesn’t seem so bad. However, with that diagram so far, it’s not possible to turn on the lamp. We have no way to set is_powered or plugged_in when the other isn’t already set. So, more transitions are needed …

Setting the attributes that do NOT cause a change of state when off

Of course, that’s not looking so simple … may as well finish it. We have to turn off still …

Final Naive Machine

Yeah … that’s a hideous machine. Honestly, the code is simpler to understand than that machine is.

Why is it so bad? Was it because I modeled the machine on the external view? Partly. Most would have modeled the state machine as an internal view. Would that have made a dramatic difference? Well, let’s see …

The Problem (Using an Internal View)

Going internal means more states … in fact, twice as many. The reason is obvious with the choice of the first, starting powered off and switched off:


OK, let’s see the actual four states before we see what happens to our transitions …

Just the states

The transitions are almost trivial, but here they are to turn on the light …

Turning on the Light

And of course, the reverses …

Full Internal State Diagram

Immediately, the internal view is simpler to understand. It actually lost all the guard conditions. Of course, we’re also relying on the fact that when the event has no effect it’s ignored, which is why there’s no event for pull_plug in the starting state for instance (it would have no effect).

This trivial machine is actually wrong. In fact, by using internal view, we discard the state variables we defined earlier. The state becomes that. The machine collapses into this:

Normal Internal Diagram

We could have removed the action calls for turn_on() and turn_off() by making them entry/exit on the only on state as well.

However, even with the above, it’s more complex than the logic warrants.

Discarding the “Zero or One Transition Per Event” Rule

The actual logic I want is very simple:

In state OFF:
  on event plug_inserted: plugged_in = true
  on event flip_switch: switched_on = ! switched_on

  on event AUTO: [plugged_in and switched_on]: transition: ON

That’s just random pseudo-code but it’s readable. Handle the event, then AFTER handling the event, fire the AUTO events.

For this to work, of course, we need entry/exit events (which would have reduced the number of action references to turn_on() and turn_off() had we used them in the earlier examples, but would not have reduced the number of transitions):

State ON:
  entry: turn_on()
  exit: turn_off()

In state ON:
  on event pull_plug: plugged_in = false
                      transition: OFF
  on event flip_switch: switched_on = false
                        transition: OFF

Going from ON to OFF is normal rules. It’s the AUTO in the OFF state handler that breaks the zero or one transition rule.

In a more complex machine, it’s actually very common to have the pattern where a (single) event occurs, and there are cascading effects. This can be “faked” in a normal state machine, by generating internal events of higher priority than external events. It can be faked other ways. But in all cases, the point is that an event has a single reaction or none.

I decided to discard that in the above example. But it’s not just AUTO … it’s the whole concept. Why shouldn’t an event have many concurrent reactions so long as there’s no conflict? In other words, why do I have to control things so tightly? Why not allow my system to do the right thing without having to handle combinatorial explosion of everything explicitly?

A Multiple Firing Event Reaction

Sticking with the shockingly simple example, let’s add a non-stateful attribute. I want to count how many times I’ve been plugged in or had the switch flipped. Of course, I can add more code in the actions, but setting the state and updating counters are independent. Or, to think of it another way: an event might have multiple complex and orthogonal consequences.

To support this, I want the ability to add more logic without altering existing working behavior. This additivity is very simpler to being “open to extension but closed to modification” in OO.

I might add (NOT EDIT) the following bit of code (a NEW BLOCK without changing the prior):

In state *:
  on event plug_inserted: pluged_in_counter += 1
  on event flip_switch:  switched_on_counter += 1

I need parallel firing logic (I’ll describe that next) but that additional bit of code extends the machine without altering any of the existing machine. Of course, it fires in all states (assuming ‘*’ means any state) but … it’s much better than tracking down every transition and adding all the cases.

This machine now has plug_inserted and flip_switch having MANY “transitions.” But, honestly, it doesn’t. It has many reactions though. It actually has a set of them.

So long as the set of reactions doesn’t have conflicting transition clauses there is no reason that all reactions shouldn’t occur — in parallel. There’s that parallel concept again.

A conflict would occur if there were reactions that has “transition: OFF” and “transition: ON” in the same set of guarded event reactions. It’s not possible to simultaneously transition to multiple states. But when there is no transition, or when there are one or more that go to the same state (regardless of the non-transition cases) those do not conflict.

In this way, the gathering of the reactions allows us to validate integrity before we do any alterations to the system. First we find reactions, then we verify that there are no transitions conflicts, and only then do we fire all the side-effects and (optionally) transition.

And once we’ve handled all the reactions … we set the event to AUTO and do it all again. It is also parallel. Of course, this can lead to non-halting — which will be examined after the parallel firing.

Parallel Firing Logic

Something I learned when working in Totl inside GTS (which we pulled from the market) was the power of parallel logic. Parallel firing means that the operations are all double-buffered.

While the state is unchanged all the logic is evaluated. All the guard conditions see the same values because they only read but never update values. If there are three or 2000 various guard conditions checking variables, they would all be checked before any actions (which have side-effects by intent) are run.

This is the initial buffer in the double-buffer system. All “read and access” to all attributes and data is to the start of the cycle (which starts with the arrival of the event). It’s as if a snapshot of the state were taken.

Once all decision making is done, the cycle now checks for a conflict. This is the concern above where there are transitions to more than one new state. So long as there are no conflicts, all the actions selected (no matter how many reactions are in the execution set discovered to match by the guards and state/event sets) are carried out.

This is the second half of the double buffer system. Changes occur.

This logic is radically different from conventional program logic. In programming, it’s the responsibility of the developer to specify the sequence in which things happen. When working in a parallel firing world actions happen “concurrently” even in a single-threaded system. Formally, the order in which the actions occur is non-deterministic (or undefined).

This does require that the actions be capable of invocation in any order — but that’s better design. It “exposes” assumptions of temporal ordering, which are always implicit in sequential programming, but which are absent in parallel.

When writing in a parallel model, some things are harder. It’s very hard to say “when A occurs, do B, C, and D in order.” On the other hand, it’s very easy to define “when some special case occurs, do E and F in any order.”

Just using the limited parallel scoping in code generated from a state chart makes it impossible to specify the sequence of actions. When multiple actions are listed in a single reaction, those will go “in that order.” But the order in which reactions themselves are invoked is non-deterministic. Should there need to be precise control of sequence a parallel firing state machine adds states.

Of course, because of AUTO handling, those states can be stepped through in response to a single arriving event. This is the means by which control of sequence is retained in an additive world of parallel firings.

Which is why there’s a very real risk (alluded to earlier) of becoming non-halting.

The Infinite Loop of Agony (Non-Halting Risks)

The concept of “After a cycle, evaluate AUTO reactions” enables very powerful effects. It also puts a loop into the event handling state machinery.

The logic is a do until loop. This simple activity diagram shows the steps (and also shows the reason double-buffering doesn’t require copying in this scenario):

Event Handling LOOP

Having a reaction that always fires (such as for logging) would cause the system to loop forever.

In regular usage the guard conditions and states make such a situation unlikely, but it absolutely can happen that something always fires and the system seems to hang.

This Model Isn’t Theoretical

In fact, I needed a Lisp implementation for a WebSocket based reliable protocol. I built the state machinery described in this article in Lisp, and the names I used in my lamp examples are from it (though with underscore instead of hyphen as Lisp does).

The machine that drives the server-side of the protocol is also relying on a notion of generated events (“derived” but not in the OO sense) so is outside the scope of this article. But, here’s the lamp’s machine:

(define-state-machine machine-lamp-2 ((&optional plugged-in switched-on) off)
  (:guard is-switched-on (() switched-on))
  (:guard is-plugged-in (() plugged-in))
  (:event insert-plug ())
  (:event pull-plug ())
  (:event flip-switch ())
  ;; control the light itself (assume these are external commands that run relays)
  (:action turn-on (() (format t "Light Turns On~%")))
  (:action turn-off (() (format t "Light Turns Off~%")))
  ;; control the inner state of the input devices
  (:action plug-pulled (() (setf plugged-in nil)))
  (:action plug-inserted (() (setf plugged-in t)))
  (:action reverse-switch (() (setf switched-on (not switched-on))))
  (:state off)
  (:state on
          (:entry (turn-on))
          (:exit (turn-off)))
  (:in on
       (pull-plug (:actions (plug-pulled)) (:to off) )
       (flip-switch (:actions (reverse-switch)) (:to off)))
  ;; Respond to input events, which in off, only changes state
  (:in off (insert-plug (:actions (plug-inserted))))
  (:in off (pull-plug   (:actions (plug-pulled))))
  (:in off (flip-switch (:actions (reverse-switch))))
  (:in off (:auto (:guards (and (is-switched-on) (is-plugged-in))) (:to on))))

Of course, the syntax and naming is Lisp. Define-state-machine macro is a declarative specification of a machine that has plugged-in and switched-on as the named variables it works with.

And using the machine (creating it and running it) via the REPL with the logging turned on gives this:

GWC-STATE-MACHINE> (defvar **sm (make-machine-lamp-2))

GWC-STATE-MACHINE> (start-machine **sm)
Consume-event given: (#:|:STARTING4| NIL)
  Event processing starts in state: :STARTING4
    Matched event :STARTING4 with args NIL
  Current state: :STARTING4
  Event checking for (#:|:STARTING4| NIL)
      matched-transition (#:|:STARTING4| #:|:STARTING4| NIL NIL OFF)
  No actions for matched transition (#:|:STARTING4| #:|:STARTING4| NIL NIL OFF)
  Transitioning to state OFF
  Checking non-event transitions
    Guard  IS-SWITCHED-ON fails
  Event process complete with ending state: OFF

GWC-STATE-MACHINE> (consume-event **sm :insert-plug)
Consume-event given: :INSERT-PLUG
  Event processing starts in state: OFF
    Matched event INSERT-PLUG with args NIL
  Current state: OFF
  Event checking for (:INSERT-PLUG NIL)
      matched-transition (OFF :INSERT-PLUG NIL ((PLUG-INSERTED)) NIL)
  Performing actions in current state
    Action PLUG-INSERTED executed
  Checking non-event transitions
    Guard  IS-SWITCHED-ON fails
  Event process complete with ending state: OFF

GWC-STATE-MACHINE> (consume-event **sm :flip-switch)
Consume-event given: :FLIP-SWITCH
  Event processing starts in state: OFF
    Matched event FLIP-SWITCH with args NIL
  Current state: OFF
  Event checking for (:FLIP-SWITCH NIL)
      matched-transition (OFF :FLIP-SWITCH NIL ((REVERSE-SWITCH)) NIL)
  Performing actions in current state
    Action REVERSE-SWITCH executed
  Checking non-event transitions
    Guard  IS-SWITCHED-ON passes
    Guard  IS-PLUGGED-IN passes
      matched-transition (OFF #:|:AUTO3|
                          ((AND (AND (IS-SWITCHED-ON) (IS-PLUGGED-IN)))) NIL ON)
  No actions for matched transition (OFF #:|:AUTO3|
                                       (AND (IS-SWITCHED-ON) (IS-PLUGGED-IN))))
                                     NIL ON)
  Transitioning to state ON
  Entry actions for state ON
Light Turns On
    Action TURN-ON executed
  Checking non-event transitions
  Event process complete with ending state: ON

GWC-STATE-MACHINE> (consume-event **sm :flip-switch)
Consume-event given: :FLIP-SWITCH
  Event processing starts in state: ON
    Matched event FLIP-SWITCH with args NIL
  Current state: ON
  Event checking for (:FLIP-SWITCH NIL)
      matched-transition (ON :FLIP-SWITCH NIL ((REVERSE-SWITCH)) OFF)
  Exit actions for state ON
Light Turns Off
    Action TURN-OFF executed
  Performing transition actions for (ON :FLIP-SWITCH NIL ((REVERSE-SWITCH)) OFF)
    Action REVERSE-SWITCH executed
  Transitioning to state OFF
  Checking non-event transitions
    Guard  IS-SWITCHED-ON fails
  Event process complete with ending state: OFF

The Power of Declarative Machines

I added the blank lines after each interaction. I invoked the machine and sent the fewest events needed to get the light to come on and then go off. In the midst of that spew are the external actions “Light Turns On” and “Light Turns Off” at the left margin. The rest is logging output.

Ignoring the overwhelming detail of that run … there was no logging code in the declarative machine. The logging is all generated as part of the code-generation from the declarative specification.

No matter your DSL foundation, a declarative machine provides for standardization in ways that are very powerful. Because this machine is actually built using the violations of the normal state machine expectations, being able to detect what’s happening is even more important for the new user. The spew from my production protocol handler is overwhelming.


I found that discarding some of the state machine’s fundamental expectations made for very powerful machines with fewer states and less repetition. Becoming extensible while being closed to modification was another benefit I gained.

The lamp itself is a trivial example. It’s simple enough that coding it manually is only a few lines.

My protocol handler was a 240 line state machine. Most of the lines were comments or white-space, like any other declarative program. But, even so, it generated 1934 lines (per the macro expansion — no blank lines or comments) to support the declarative machine. I’d have written “by hand” less than 1934 lines, but I’d never have written anywhere near as low as 240 in order to implement that protocol.

I’ll later write about the machine I implemented, and notion of “derived events.”

Until then, remember: just because something is forbidden or impossible doesn’t mean you can’t or shouldn’t do it and possibly benefit greatly from it. Break those rules!

2 thoughts on “Rethinking State Machines

  1. Pingback: The Trap of Diagrams for Meetings | Limitless Knowledge Association

  2. Pingback: Starting a State Machine External Domain Specific Language (DSL) | Limitless Knowledge Association

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s