Dom Williams

Devlog #6: async activities

A previous devlog outlined the system that controls a living entity's behaviour. While the concept is still sound, the implementation became a total mess as more activity types were added.

Since then I have refactored it all to use Rust's excellent async/await support. This post covers the ergonomic API this gives us, and a future post will go into the guts of the implementation.

Past mistakes

Before getting to the juicy new design, I'll first go over what exactly made the old implementation unbearable to work with. You can skip to the next section if you don't want to dwell on bad design decisions.

So, what was wrong with the old implementation? Enough, I think, to justify a multi-month rewrite:

Fragile state machine boilerplate

Every activity had to manually implement a state machine to handle its own control flow. Despite a simple, linear sequence of steps (such as "go to this position, pick up the item, carry it here, drop it"), control flow continually jumped between on_tick and on_event, with a lot of boilerplate getting in the way.

Here's a small pseudocode example [1] to show what a headache this quickly became:

enum State {
    NotYetStarted,
    GoingToTheThing,
    DoingTheThing,
    FinishingTheThing,
    ...
}

fn on_tick():
    match self.state:
        NotYetStarted:
            request_navigation()
            self.state = GoingToTheThing
            subscribe_to_arrival_event()
            block_activity_until_event()

        DoingTheThing:
            start_doing_the_thing()
            subscribe_to_completion_event()
            ...

        ...

fn on_event(evt):
    if evt is Arrival:
        self.state = DoingTheThing
        unblock_activity()
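For a compilable picture of the same pattern, here is a minimal Rust sketch. Everything in it (`DoTheThingActivity`, the `Event` and `Tick` enums, and the `requests` log standing in for side effects such as requesting navigation) is hypothetical; it mirrors the shape of the pseudocode above, not the game's real code.

```rust
/// Hypothetical events the activity can subscribe to.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    Arrived,
    Completed,
}

/// Every step has to be spelled out by hand, including the waiting ones.
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    NotYetStarted,
    GoingToTheThing,
    DoingTheThing,
    WaitingForCompletion,
    Finished,
}

/// What on_tick asks a (hypothetical) scheduler to do next.
#[derive(Debug, PartialEq)]
enum Tick {
    BlockUntilEvent,
    Done,
}

struct DoTheThingActivity {
    state: State,
    /// Records side effects such as "request navigation" instead of performing them.
    requests: Vec<&'static str>,
}

impl DoTheThingActivity {
    fn new() -> Self {
        DoTheThingActivity { state: State::NotYetStarted, requests: Vec::new() }
    }

    fn on_tick(&mut self) -> Tick {
        match self.state {
            State::NotYetStarted => {
                self.requests.push("request navigation");
                self.state = State::GoingToTheThing;
                Tick::BlockUntilEvent
            }
            State::DoingTheThing => {
                self.requests.push("start doing the thing");
                self.state = State::WaitingForCompletion;
                Tick::BlockUntilEvent
            }
            // Nothing to do while waiting for an event.
            State::GoingToTheThing | State::WaitingForCompletion => Tick::BlockUntilEvent,
            State::Finished => Tick::Done,
        }
    }

    fn on_event(&mut self, evt: Event) {
        // The handler has to know which state expects which event.
        self.state = match (self.state, evt) {
            (State::GoingToTheThing, Event::Arrived) => State::DoingTheThing,
            (State::WaitingForCompletion, Event::Completed) => State::Finished,
            (state, _) => state, // ignore events that don't fit the current state
        };
    }
}

fn main() {
    let mut activity = DoTheThingActivity::new();
    activity.on_tick(); // requests navigation, then blocks
    activity.on_event(Event::Arrived);
    activity.on_tick(); // starts the thing, then blocks
    activity.on_event(Event::Completed);
    assert_eq!(activity.on_tick(), Tick::Done);
    println!("{:?}", activity.requests);
}
```

Even with only two real steps, the sequence is split across two functions and needs an extra waiting state, and every event has to be checked against the state that was expecting it.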

One event handler for all states

As activities progress through different states, they are subscribed to different events: the Arrived event during navigation, the PickedUp event when hauling an item, and so on. This quickly causes the single per-activity event handler to balloon in complexity, as it has to cope with every combination of event type and current state.

There's no need for a pseudocode example of this; just marvel at the gross complexity of the haul activity's event handler.

No event handling in subactivities

Subactivities were meant to enable easy sharing of common behaviour, like navigation. In practice this didn't provide as much convenience as expected, due to the flawed event handler design.

Subactivities could subscribe to events but didn't have their own handler callback. This meant the calling activities had to know the subactivities' exact implementation details so they could handle the expected events... if this isn't the definition of "leaky abstraction", I don't know what is.

Manual interruption handling

A key goal of the activity system is for entities to react intelligently to their surroundings. Interruptions therefore need to be handled as well as the success case.

Juggling all that state manually, on top of the event free-for-all that subactivities had become, massively complicated this and led to annoying bugs and edge cases.

But that's enough wallowing in self-pity, let's get onto the new design.

The way of async

I'll get into the implementation details of the custom async executor in a future post; for now I'll show its API and how it makes implementing new activities so much easier.

The wander behaviour is a simple example of a pure activity: it only uses subactivities and does no event handling of its own.

// pseudo-code
async fn wander_activity(ctx) {
    loop {
        // walk to a nearby position
        let pos = choose_wander_destination();
        ctx.go_to(pos).await?;

        // loiter for a few seconds
        ctx.update_status(Loitering);
        ctx.wait(3).await;
    }
}
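To show that this style needs nothing exotic from the language, here is a self-contained Rust sketch of the same shape. `MockContext`, its `go_to`/`wait`/`update_status` methods, `GotoError` and the bounded loop are all hypothetical stand-ins (the real activity loops forever and is driven by the game's own executor). The `block_on` function is a toy busy-polling executor with a no-op waker, which is only adequate because every future here completes immediately.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Hypothetical error type for the go_to subactivity.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum GotoError {
    Unreachable,
}

/// Hypothetical stand-in for the activity context; it only records calls.
struct MockContext {
    log: RefCell<Vec<String>>,
}

impl MockContext {
    // Subactivities are plain async methods; no Subactivity trait required.
    // This mock "arrives" immediately instead of waiting for an Arrived event.
    async fn go_to(&self, pos: (i32, i32)) -> Result<(), GotoError> {
        self.log.borrow_mut().push(format!("go_to {:?}", pos));
        Ok(())
    }

    // Records the wait instead of actually waiting `secs` seconds.
    async fn wait(&self, secs: u32) {
        self.log.borrow_mut().push(format!("wait {}", secs));
    }

    fn update_status(&self, status: &str) {
        self.log.borrow_mut().push(format!("status {}", status));
    }
}

/// Bounded version of the wander loop so that the example terminates.
async fn wander_activity(ctx: &MockContext, rounds: i32) -> Result<(), GotoError> {
    for i in 0..rounds {
        let pos = (i, 0); // stand-in for choose_wander_destination()
        ctx.go_to(pos).await?;
        ctx.update_status("Loitering");
        ctx.wait(3).await;
    }
    Ok(())
}

unsafe fn noop_clone(_: *const ()) -> RawWaker {
    RawWaker::new(std::ptr::null(), &NOOP_VTABLE)
}
unsafe fn noop(_: *const ()) {}
static NOOP_VTABLE: RawWakerVTable = RawWakerVTable::new(noop_clone, noop, noop, noop);

fn noop_waker() -> Waker {
    // SAFETY: none of the vtable functions touch the (null) data pointer.
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &NOOP_VTABLE)) }
}

/// Toy executor: polls the future in a busy loop until it completes.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    let ctx = MockContext { log: RefCell::new(Vec::new()) };
    assert_eq!(block_on(wander_activity(&ctx, 2)), Ok(()));
    for line in ctx.log.borrow().iter() {
        println!("{}", line);
    }
}
```

A real game executor would instead park the activity at each `.await` and resume it when the relevant event or timer fires, rather than spinning.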

Compare its new implementation with the old - what a difference!

Linear control flow

The control flow is very easy to follow; there's no jumping around between functions. There's also no manual state wrangling, even though there are implicitly three states (initial, going to the wander destination, and loitering there).

The beauty of async/await is that the compiler generates these states automatically, along with all the boilerplate for managing them. This includes the state of local variables, but I'm getting ahead of myself; see below.

Self-contained subactivities

The wander activity uses the go_to and wait subactivities, which behave the same as before, but look: no leakage! The activity needs no event handler of its own, because all event handling is fully contained within the subactivities. How they do this will be covered in a future post.

Another bonus is that there is no longer any Subactivity trait, as subactivities can be implemented as simple async methods.

Interruptions via RAII

Because the compiler tracks the state of local variables for us, it's really easy to handle interruptions via destructors. Let's have a look inside the go_to subactivity [2]:

// semi pseudo-code
struct GotoSubactivity<'a> {
    ctx: &'a ActivityContext, // also needed by the destructor
    complete: bool,
    ...
}

impl GotoSubactivity<'_> {
    // &mut self, because completion is recorded in self.complete
    async fn go_to(&mut self, destination) -> Result<(), GotoError> {
        // request navigation to destination, and get a unique token back [2]
        let path_token = request_navigation_to(destination);

        // await the specific arrival event for this entity (`me`)
        let goto_result = self
            .ctx
            .subscribe_to_specific_until(me, EntityEventType::Arrived, |evt| match evt {
                EntityEventPayload::Arrived(token, result) if token == path_token => Ok(result),
                _ => Err(evt), // other unhandled events
            })
            .await;

        let result = match goto_result {
            None => return Err(GotoError::Cancelled), // bail out early
            Some(Ok(_)) => Ok(()),
            Some(Err(err)) => Err(err.into()),
        };

        // track completion
        self.complete = true;

        result
    }
}

// destructor
impl Drop for GotoSubactivity<'_> {
    fn drop(&mut self) {
        if !self.complete {
            self.ctx.abort_path();
        }
    }
}

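When the wander activity invokes ctx.go_to, this instantiates a GotoSubactivity as a local variable, which contains a bool field complete to track the state. Its destructor (drop) runs whenever that local goes out of scope: on a normal return, on an early return, and also when the activity is interrupted, because dropping a suspended async function's future drops every local it is still holding.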

Within the destructor we can then cancel the navigation if it was not completed. In other behaviours, like hauling, we can do more complicated interruption handling, such as dropping the item on the ground.
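The property doing the heavy lifting here is plain Rust semantics: dropping a future that is parked at an `.await` runs the destructors of the locals it holds. The self-contained sketch below demonstrates this with a hypothetical `Nav` recorder and a `PathGuard` playing the role of `GotoSubactivity`. A never-completing `std::future::pending` stands in for an arrival event that never comes, and the "interruption" is simply dropping the future after one poll; none of this is the game's actual code.

```rust
use std::cell::Cell;
use std::future::{pending, ready, Future};
use std::pin::pin;
use std::task::{Context, RawWaker, RawWakerVTable, Waker};

/// Hypothetical navigation system that only counts aborted paths.
struct Nav {
    aborted: Cell<u32>,
}

/// RAII guard in the spirit of GotoSubactivity: aborts the path unless completed.
struct PathGuard<'a> {
    nav: &'a Nav,
    complete: bool,
}

impl Drop for PathGuard<'_> {
    fn drop(&mut self) {
        if !self.complete {
            // stand-in for ctx.abort_path()
            self.nav.aborted.set(self.nav.aborted.get() + 1);
        }
    }
}

/// `arrival` stands in for awaiting the matching Arrived event.
async fn go_to(nav: &Nav, arrival: impl Future<Output = ()>) {
    let mut guard = PathGuard { nav, complete: false };
    arrival.await;
    guard.complete = true;
    // `guard` is dropped here on success, or wherever the future itself is dropped.
}

unsafe fn noop_clone(_: *const ()) -> RawWaker {
    RawWaker::new(std::ptr::null(), &NOOP_VTABLE)
}
unsafe fn noop(_: *const ()) {}
static NOOP_VTABLE: RawWakerVTable = RawWakerVTable::new(noop_clone, noop, noop, noop);

fn noop_waker() -> Waker {
    // SAFETY: none of the vtable functions touch the (null) data pointer.
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &NOOP_VTABLE)) }
}

/// Polls go_to once and returns how many paths were aborted.
/// With `interrupt`, the arrival never comes and the suspended future is dropped.
fn run(interrupt: bool) -> u32 {
    let nav = Nav { aborted: Cell::new(0) };
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    if interrupt {
        let mut fut = pin!(go_to(&nav, pending::<()>()));
        assert!(fut.as_mut().poll(&mut cx).is_pending());
        // `fut` is dropped at the end of this block while parked at the await.
    } else {
        let mut fut = pin!(go_to(&nav, ready(())));
        assert!(fut.as_mut().poll(&mut cx).is_ready());
    }
    nav.aborted.get()
}

fn main() {
    println!("interrupted: {} path(s) aborted", run(true));
    println!("completed: {} path(s) aborted", run(false));
}
```

No explicit interruption callback is needed: whoever cancels the activity just drops its future, and cleanup falls out of ordinary scope rules.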

Activity status

Although not specific to the async runtime, the subactivity info displayed in the UI has also been improved. Previously the UI showed the current subactivity, as seen below (borrowed from devlog 4):

[GIF: Previous implementation showing the current activity and subactivity.]

This was tied to the subactivity's type, so it couldn't easily be customised to show finer-grained steps.

The API now looks like this:

pub trait Status: Display {
    fn exertion(&self) -> f32;
}

impl ActivityContext {
    fn update_status(&self, status: impl Status + 'static) { ... }
}

This lets an activity or subactivity define its own self-contained status types for the current task, which allows for more useful status information, like the following:

[GIF: As the entity collects materials for a build, there are many more status updates than previously.]
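As an illustration of how a subactivity might define its own status type against this API, here is a small sketch. Only the shape of the `Status` trait comes from the post; `HaulStatus`, its variants and exertion values, and the stand-in `ActivityContext` that stores the latest status in a `RefCell<Option<Box<dyn Status>>>` are assumptions made for the example.

```rust
use std::cell::RefCell;
use std::fmt::{self, Display};

/// Trait shape as shown above: displayable text plus an exertion level.
pub trait Status: Display {
    fn exertion(&self) -> f32;
}

/// Hypothetical status type owned by a hauling subactivity.
enum HaulStatus {
    GoingToItem,
    PickingUp(&'static str),
    Carrying(&'static str),
}

impl Display for HaulStatus {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            HaulStatus::GoingToItem => write!(f, "Going to item"),
            HaulStatus::PickingUp(item) => write!(f, "Picking up {}", item),
            HaulStatus::Carrying(item) => write!(f, "Carrying {}", item),
        }
    }
}

impl Status for HaulStatus {
    // Made-up values: carrying is harder work than walking.
    fn exertion(&self) -> f32 {
        match self {
            HaulStatus::GoingToItem => 1.0,
            HaulStatus::PickingUp(_) => 1.2,
            HaulStatus::Carrying(_) => 1.5,
        }
    }
}

/// Hypothetical minimal context that keeps the latest status for the UI.
struct ActivityContext {
    status: RefCell<Option<Box<dyn Status>>>,
}

impl ActivityContext {
    fn update_status(&self, status: impl Status + 'static) {
        let boxed: Box<dyn Status> = Box::new(status);
        *self.status.borrow_mut() = Some(boxed);
    }

    /// What a UI might render: the Display text of the current status.
    fn status_text(&self) -> String {
        self.status.borrow().as_ref().map(|s| s.to_string()).unwrap_or_default()
    }
}

fn main() {
    let ctx = ActivityContext { status: RefCell::new(None) };
    for status in [
        HaulStatus::GoingToItem,
        HaulStatus::PickingUp("stone"),
        HaulStatus::Carrying("stone"),
    ] {
        let effort = status.exertion();
        ctx.update_status(status);
        println!("{} (exertion {})", ctx.status_text(), effort);
    }
}
```

Because the context only sees `dyn Status`, each subactivity can add as many fine-grained steps as it likes without the UI code needing to know about them.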

Was the churn worth it?

This effort took multiple months of rewriting a whole lot of code [3] that was itself only a few months old (and already documented in a devlog!), which seems like the very definition of unnecessary code churn.

However, I think in this case it was warranted; the feeling of despair that hit me when I looked into adding a new "go to and build block" activity would not have led to much enthusiasm for working on the project. The only way I can keep up the motivation to work on this game for years is if the codebase is a pleasure to work on, and this refactor has contributed to exactly that!

Next up is a deep dive into the internals of the custom async runtime.

Or a detour into other interesting topics like engine integration tests, terrain generation or data-driven architecture... I've made a lot of dev progress in the last year that needs writing about!


  1. Trying to look at a real code example isn't a nice experience, but here's a link to the 500 lines of state management for the hauling activity. 

  2. The path token returned from request_navigation_to identifies our specific path request amongst potentially many. Multiple requests can exist when a new destination is requested while another is still in progress: a cancellation event is triggered for the original (Arrived(token, Err(GotoError::Cancelled))), which, unless filtered on the token, would match the arrival logic and erroneously terminate the goto early.

  3. And time spent dev'ing is time not spent writing devlogs... The rate at which I get these out is due to increase substantially.