A previous devlog outlined the system that controls a living entity's behaviour. While the concept is still sound, the implementation became a total mess as more activity types were added.
Since then I have refactored it all to use Rust's excellent async/await support. This post covers the ergonomic API this gives us, and a future post will go into the guts of the implementation.
Past mistakes
Before getting to the juicy new design, I'll first go over what exactly made the old implementation unbearable to work with. You can skip to the next section if you don't want to dwell on bad design decisions.
So, what was wrong with the old implementation? I think enough to justify a multi-month rewrite:
Fragile state machine boilerplate
Every activity had to manually implement a state machine to handle its own control flow. Despite a simple and linear flow of events (such as "go to this position, pick up the item, carry it here, drop it"), control flow would continually jump between on_tick and on_event, with a lot of boilerplate getting in the way.
Here's a small pseudocode[^1] example to show what a headache this quickly became:
- The initial state is NotYetStarted, and the activity is unblocked, so on_tick is called.
- Navigation to the destination is requested, and the state is now GoingToTheThing. We subscribe to the arrival event and block the activity, so on_tick will not be called again.
- on_event is eventually called with the arrival event, which updates the state to DoingTheThing and unblocks the activity, scheduling on_tick again.
- on_tick is called again, which kicks off whatever the "thing" is for this activity (e.g. breaking a block, or picking up an item).
enum State {
    NotYetStarted,
    GoingToTheThing,
    DoingTheThing,
    FinishingTheThing,
    ...
}

fn on_tick():
    match self.state:
        NotYetStarted:
            request_navigation()
            self.state = GoingToTheThing
            subscribe_to_arrival_event()
            block_activity_until_event()
        DoingTheThing:
            start_doing_the_thing()
            subscribe_to_completion_event()
            ...
        ...

fn on_event(evt):
    if evt is Arrival:
        self.state = DoingTheThing
        unblock_activity()
One event handler for all states
As activities progress through different states, they will be subscribed to different events - the Arrived event during navigation, the PickedUp event when hauling an item, etc. This quickly causes the single per-activity event handler to balloon in complexity, as it has to cope with every combination of event type and current state.
There's no need for a pseudocode example of this; just marvel at the gross complexity in the haul activity event handler.
No event handling in subactivities
Subactivities were meant to enable easy sharing of common behaviour, like navigation. In practice this didn't provide as much convenience as expected, due to the flawed event handler design.
Subactivities could subscribe to events but didn't have their own handler callback. This meant the calling activities had to know about their exact implementation details so they could handle the expected events... if this isn't the definition of "leaky abstraction" I don't know what is.
Manual interruption handling
A key goal of the activity system is for entities to react intelligently to their surroundings. Interruptions therefore need to be handled as well as the success case.
Juggling all the state manually while balancing the event free-for-all that was subactivities massively complicated this, which led to annoying bugs and edge cases.
But that's enough wallowing in self-pity; let's get on to the new design.
The way of async
I'll get into the implementation details of the custom async executor in a future post; for now I'll show its API and how it makes implementing new activities so much easier.
The wander behaviour is a simple example that demonstrates a pure activity that only uses subactivities without any of its own event handling:
// pseudo-code
async fn wander_activity(ctx) {
    loop {
        // walk to a nearby position
        let pos = choose_wander_destination();
        ctx.go_to(pos).await?;

        // loiter for a few seconds
        ctx.update_status(Loitering);
        ctx.wait(3).await;
    }
}
Compare its new implementation with the old - what a difference!
Linear control flow
The control flow is very easy to follow; there's no jumping around between functions. There's also no manual state wrangling, even though implicitly there are three states (initial, going to the wander destination, loitering there).
The beauty of async/await is that the compiler generates these states automatically, along with all the boilerplate for managing them. This includes the state of local variables, but I'm getting ahead of myself - see below.
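To make that concrete, here's a rough sketch of the kind of state machine the compiler writes for us. This is not the game's code: Step, Wander, NoopWaker and polls_until_ready are hypothetical names invented for illustration, the wander loop is cut down to a single iteration, and the real generated type is anonymous and more involved. The point is only that the hand-written enum and the async fn behave the same when polled.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for a subactivity such as go_to or wait:
/// pending on the first poll, ready on the second.
struct Step {
    polled: bool,
}

impl Step {
    fn new() -> Self {
        Step { polled: false }
    }
}

impl Future for Step {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.polled {
            Poll::Ready(())
        } else {
            self.polled = true;
            Poll::Pending
        }
    }
}

/// The async version: no explicit states anywhere.
async fn wander_async() -> &'static str {
    Step::new().await; // "go to the destination"
    Step::new().await; // "loiter"
    "done"
}

/// Roughly what we would otherwise write by hand: one variant per state.
enum Wander {
    Initial,
    GoingTo(Step),
    Loitering(Step),
    Done,
}

impl Future for Wander {
    type Output = &'static str;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.get_mut(); // allowed because Wander is Unpin
        loop {
            match &mut *this {
                Wander::Initial => *this = Wander::GoingTo(Step::new()),
                Wander::GoingTo(step) => match Pin::new(step).poll(cx) {
                    Poll::Ready(()) => *this = Wander::Loitering(Step::new()),
                    Poll::Pending => return Poll::Pending,
                },
                Wander::Loitering(step) => match Pin::new(step).poll(cx) {
                    Poll::Ready(()) => {
                        *this = Wander::Done;
                        return Poll::Ready("done");
                    }
                    Poll::Pending => return Poll::Pending,
                },
                Wander::Done => panic!("polled after completion"),
            }
        }
    }
}

/// Waker that does nothing; enough for polling by hand in a sketch.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future in a busy loop and reports how many polls it needed.
fn polls_until_ready<F: Future>(fut: F) -> (usize, F::Output) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(output) = fut.as_mut().poll(&mut cx) {
            return (polls, output);
        }
    }
}
```

Passing either wander_async() or Wander::Initial to polls_until_ready takes the same number of polls and yields the same output, which is the whole trick: the enum is just the bookkeeping we no longer have to write.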
Self-contained subactivities
It uses the go_to and wait subactivities, which have the same behaviour as before, but look - no leakage! The activity needs no event handler of its own, because all event handling is fully contained within the subactivities. How they do this will be covered in a future post.
Another bonus is that there is no longer any Subactivity trait, as subactivities can be implemented as simple async methods.
Interruptions via RAII
As the state of local variables is automatically tracked for us by the compiler, this makes it really easy to handle interruptions via destructors. Let's have a look inside the go_to subactivity[^2]:
// semi pseudo-code
struct GotoSubactivity { ... }

impl GotoSubactivity {
    // takes &mut self because the complete flag is updated at the end
    async fn go_to(&mut self, ctx, destination) -> Result<(), GotoError> {
        // request navigation to destination, and get a unique token back [^2]
        let path_token = request_navigation_to(destination);

        // await specific arrival event
        let goto_result = ctx
            .subscribe_to_specific_until(me, EntityEventType::Arrived, |evt| match evt {
                EntityEventPayload::Arrived(token, result) if token == path_token => Ok(result),
                _ => Err(evt), // other unhandled events
            })
            .await;

        let result = match goto_result {
            None => return Err(GotoError::Cancelled), // bail out early
            Some(Ok(_)) => Ok(()),
            Some(Err(err)) => Err(err.into()),
        };

        // track completion
        self.complete = true;
        result
    }
}

// destructor
impl Drop for GotoSubactivity {
    fn drop(&mut self) {
        // only abort the path if we never reached the end of go_to
        if !self.complete {
            self.ctx.abort_path();
        }
    }
}
When the wander activity invokes ctx.go_to, this instantiates a GotoSubactivity on the stack, which contains a bool field complete to track the state. Its destructor drop is invoked whenever the subactivity goes out of scope, both on normal completion and when the activity is interrupted.
Within the destructor we can then cancel the navigation if it was not completed. In other behaviours like hauling we can do more complicated interruption handling such as dropping the item on the ground.
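Here's a small self-contained sketch of that mechanism, using only the standard library rather than the game's executor. All the names (GotoGuard, Never, Log, NoopWaker, interrupt_demo) are hypothetical, and a shared log of strings stands in for the real navigation calls. It assumes an interruption amounts to dropping the activity's future while it is suspended at an await, which is how cancellation generally works for Rust futures.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Shared log standing in for calls into the navigation system.
type Log = Rc<RefCell<Vec<&'static str>>>;

/// Hypothetical guard playing the role of GotoSubactivity.
struct GotoGuard {
    complete: bool,
    log: Log,
}

impl Drop for GotoGuard {
    fn drop(&mut self) {
        // runs on completion and on interruption; only abort if unfinished
        if !self.complete {
            self.log.borrow_mut().push("abort_path");
        }
    }
}

/// A future that never completes: an arrival event that never comes.
struct Never;

impl Future for Never {
    type Output = ();

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }
}

async fn go_to(log: Log, arrives: bool) {
    // the guard lives inside the future's state across the await below
    let mut guard = GotoGuard { complete: false, log: log.clone() };
    log.borrow_mut().push("request_navigation");
    if !arrives {
        Never.await;
    }
    guard.complete = true;
    log.borrow_mut().push("arrived");
}

/// Waker that does nothing; enough for polling by hand in a sketch.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls go_to once, then drops the future, and returns what was logged.
fn interrupt_demo(arrives: bool) -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    let mut fut = Box::pin(go_to(log.clone(), arrives));
    let _ = fut.as_mut().poll(&mut cx);
    drop(fut); // the "interruption": the activity is simply dropped

    let entries = log.borrow().clone();
    entries
}
```

When the destination is reached, the guard is dropped with complete set and does nothing. When the future is dropped while still waiting, the same destructor runs and aborts the path, without the activity having to write any interruption branch itself.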
Activity status
Although not specific to the async runtime, there has been an improvement to the subactivity info displayed in the UI. Previously the current subactivity was displayed in the UI, as seen below (borrowed from devlog 4):
This was restricted to the type of the subactivity, so it could not easily be customised to show more fine-grained steps.
The API now looks like this:
pub trait Status: Display {
    fn exertion(&self) -> f32;
}

impl ActivityContext {
    fn update_status(&self, status: impl Status + 'static) { ... }
}
This lets an activity or subactivity define its own self-contained types for the current task, which allows for more useful status information, like the following:
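As a rough illustration of what such a type could look like in code, here's a minimal sketch built around the Status trait above. WanderStatus, its display strings and exertion values, the status_text helper and the RefCell-based storage inside ActivityContext are all assumptions made up for this example; the real context surely holds a lot more.

```rust
use std::cell::RefCell;
use std::fmt::{self, Display};

pub trait Status: Display {
    fn exertion(&self) -> f32;
}

/// Hypothetical status type owned by the wander activity.
pub enum WanderStatus {
    Walking,
    Loitering { seconds: u32 },
}

impl Display for WanderStatus {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            WanderStatus::Walking => write!(f, "Wandering aimlessly"),
            WanderStatus::Loitering { seconds } => {
                write!(f, "Loitering for {} seconds", seconds)
            }
        }
    }
}

impl Status for WanderStatus {
    fn exertion(&self) -> f32 {
        // made-up values: walking is more tiring than standing around
        match self {
            WanderStatus::Walking => 1.0,
            WanderStatus::Loitering { .. } => 0.2,
        }
    }
}

/// Cut-down context that only remembers the latest status.
pub struct ActivityContext {
    status: RefCell<Option<Box<dyn Status>>>,
}

impl ActivityContext {
    pub fn new() -> Self {
        ActivityContext { status: RefCell::new(None) }
    }

    /// Same signature as in the post: takes &self, so storage needs interior mutability.
    pub fn update_status(&self, status: impl Status + 'static) {
        *self.status.borrow_mut() = Some(Box::new(status));
    }

    /// Hypothetical helper for the UI: the text to show for the current status.
    pub fn status_text(&self) -> String {
        match &*self.status.borrow() {
            Some(status) => status.to_string(),
            None => String::from("Idle"),
        }
    }
}
```

Because the status is stored as a boxed trait object, each activity can carry whatever data it likes (here the loiter duration) and the UI only ever needs Display and exertion.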
Was the churn worth it?
This effort took multiple months of rewriting a whole lot of code[^3] that itself was only a few months old (and already documented in a devlog!), and seems to be the very definition of unnecessary code churn.
However, I think in this case it was warranted; the feeling of despair that hit me when looking to add a new "go to and build block" activity would not have led to much enthusiasm for working on the project. The only way I can keep up the motivation to work on this game for years is if the codebase is a pleasure to work on, which this refactor has contributed to!
Next up is a deep dive into the internals of the custom async runtime.
Or a detour into other interesting topics like engine integration tests, terrain generation or data-driven architecture... I've made a lot of dev progress in the last year that needs writing about!
[^1]: Trying to look at a real code example isn't a nice experience, but here's a link to the 500 lines of state management for the hauling activity.
[^2]: The path token returned from request_navigation_to serves to identify our specific path request amongst potentially many. This can happen when a new destination is requested while another is in progress; a cancellation event for the original will be triggered (Arrived(token, Err(GotoError::Cancelled))), which unless filtered on the token will match the arrival logic and erroneously terminate the goto early.
[^3]: And time spent dev'ing is time not spent writing devlogs... The rate at which I get these out is due to increase substantially.