Announcing a new operation: Workflow Update

Dan Davison

Senior Software Engineer

We’re excited to announce that Workflow Update is now Generally Available. This is a major new feature: in this blog post we’re going to look into why it’s important, see some examples of what it looks like to use it, and learn about design patterns when using Update in applications.

Introducing Workflow Update

Fundamentally, Update is about sending a message to a Workflow, waiting until the message handling is complete, and receiving a result or error in response. The Update handler in your Workflow code can do anything that normal Workflow code can do. That’s an extremely useful operation when building applications, but the existing messages, Query and Signal, did not provide it on their own. Let’s recap why that is:

  • A Query returns a result, but an Update handler can additionally wait for any amount of time or for an arbitrary condition, execute Activities and Child Workflows, and mutate Workflow state; a Query can do none of those things.
  • A Signal handler can do all of those things but can’t communicate anything back to the caller.

Since getting a result back after your Workflow has done some async work is such a common need, there were workarounds: for example, sending a Signal and then polling repeatedly with a Query, or using an Activity to get the result back to your client code by some other means. But these feel very much like workarounds, and they can be inefficient, complex, and error-prone. Contrast those with executing an Update in Python:

update_result = await workflow_handle.execute_update(MyWorkflow.my_update, update_input)

Update opens the path for many low-latency use cases (and combines well with triggering a Local Activity). Furthermore, your Workflow code can use an Update Validator to reject an invalid Update as soon as it's received, so that it doesn’t leave any trace in Workflow history.

How Update works

As we’ll see below, a common pattern is to use an Update to modify the Workflow in some way that involves executing an Activity. The client waits for the Activity to finish, and receives a result. Let’s look at what happens behind the scenes when we do this. For comparison, the sequence diagrams below also show the Signal + poll-for-result-with-Queries technique.

[Sequence diagram: Use Signal+Queries to execute an Activity]

[Sequence diagram: Use Update to execute an Activity]

Note that for clarity the Activity worker is not depicted in these diagrams.

An important point here is that Signal is asynchronous: all you get is an immediate ACK from the server. With Update, by contrast, your Worker must be online, and you won’t get any response at all until the Workflow has received and responded to the Update. So a Signal on its own will have much lower latency than an Update. However, to track progress and/or get data back from your Workflow, Update has lower latency than sending a Signal and then polling with Queries, as this timing experiment showed:

[Figure: latency of Update compared with Signal followed by Query polling]
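
To make the comparison concrete, here is a toy model in plain asyncio. It is only a sketch: `ToyWorkflow`, `signal_then_poll`, and `update_once` are hypothetical names invented for illustration, nothing here uses the Temporal SDK or server, and the sleep durations are arbitrary. It counts how many client round trips each approach needs to get the same result back.

```python
import asyncio
from typing import List, Optional, Tuple


class ToyWorkflow:
    """In-memory stand-in for a Workflow (hypothetical; not the Temporal SDK)."""

    def __init__(self) -> None:
        self.result: Optional[str] = None
        self.requests = 0  # client round trips served so far
        self._tasks: List[asyncio.Task] = []  # keep background tasks referenced

    async def _work(self, item: str) -> str:
        await asyncio.sleep(0.05)  # stands in for running an Activity
        return f"processed:{item}"

    async def _work_and_store(self, item: str) -> None:
        self.result = await self._work(item)

    async def signal(self, item: str) -> None:
        """Signal-style: acknowledge at once, do the work in the background."""
        self.requests += 1
        self._tasks.append(asyncio.create_task(self._work_and_store(item)))

    async def query(self) -> Optional[str]:
        """Query-style: report current state without blocking."""
        self.requests += 1
        return self.result

    async def update(self, item: str) -> str:
        """Update-style: respond only once the work is complete."""
        self.requests += 1
        self.result = await self._work(item)
        return self.result


async def signal_then_poll(item: str, poll_interval: float = 0.01) -> Tuple[str, int]:
    wf = ToyWorkflow()
    await wf.signal(item)
    # Keep asking until the background work has stored a result.
    while (result := await wf.query()) is None:
        await asyncio.sleep(poll_interval)
    return result, wf.requests


async def update_once(item: str) -> Tuple[str, int]:
    wf = ToyWorkflow()
    result = await wf.update(item)
    return result, wf.requests


if __name__ == "__main__":
    print(asyncio.run(signal_then_poll("a")))
    print(asyncio.run(update_once("a")))
```

In this model the Update path always takes exactly one round trip, while the Signal path takes one Signal plus however many Queries happen to be sent before the work finishes. The result also arrives up to one polling interval late.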

The sequence diagrams above give an overview of how Update works behind the scenes, but there’s much more to say about how Update was built. For example, if you use an Update Validator to reject an Update then it leaves no trace in history. But if you’re already familiar with Temporal, you’ll know that tasks dispatched to your Workers are typically backed by history events. So we had to extend Temporal internals to allow deferring the first write to history for an Update until after it’s been delivered to the Workflow.
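
The ordering described above (validate first, write to history only after acceptance) can be pictured with a small model. This is a sketch under stated assumptions rather than how the server is implemented: `ToyWorkflow`, `deliver_update`, and the `"accepted"` / `"completed"` labels are hypothetical stand-ins, and real history events carry far more detail.

```python
from typing import List


class ToyWorkflow:
    """Toy model of Update delivery (hypothetical; not the Temporal SDK)."""

    def __init__(self) -> None:
        self.history: List[str] = []  # stands in for Workflow history
        self.cart: List[str] = []

    def validate_add_item(self, item: str) -> None:
        # The validator only inspects the input; it does not change state.
        if not item:
            raise ValueError("item must be a non-empty string")

    def add_item(self, item: str) -> int:
        self.cart.append(item)
        return len(self.cart)

    def deliver_update(self, item: str) -> int:
        # 1. Validate first. If this raises, nothing below runs, so the
        #    rejected Update never reaches history.
        self.validate_add_item(item)
        # 2. Only an accepted Update gets its first history entry.
        self.history.append("accepted")
        result = self.add_item(item)
        self.history.append("completed")
        return result


if __name__ == "__main__":
    wf = ToyWorkflow()
    print(wf.deliver_update("book"), wf.history)
    try:
        wf.deliver_update("")
    except ValueError as err:
        print("rejected:", err, "| history unchanged:", wf.history)
```

The point of the model is the position of the first `history.append`: because it comes after validation, a rejected Update cannot leave anything behind.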

Workflow Update design patterns

In this section we’re going to see what basic Update usage looks like, and look at some high-level design patterns that describe real-world Update usage. To make this more concrete, here are some scenarios in which a real-world application could use Update:

| Role of Workflow | Role of Update | Pattern |
| --- | --- | --- |
| A shopping cart in an e-commerce application | Add an item to the cart, wait for availability and pricing information to be fetched from the inventory DB, and return the new cart contents and subtotal. | 1 |
| A conversation with an LLM | Send a chat message and wait for the response. | 1 |
| A generic task queue | Add an item to an in-Workflow queue data structure and receive the new queue size; enqueue a task, process it via an Activity, and receive the result or error when it's done; or dequeue a task for processing. | 0, 1, 2 |
| A distributed lock service, with multiple clients contending for the lock | Acquire the lock. | 2 |
| A multi-stage financial transaction | Wait for an intermediate result from one of the initial stages, leaving the Workflow running in the background. (Take a look at the “early return” pattern in the Public Preview feature “Update With Start” for a latency-optimized version.) | 2 |
| An AI agent | Start the agent and wait for a short while until it's ready for users to interact with it. | 2 |

Pattern 0: mutate local Workflow state and receive a result

This is the simplest form of Update: it makes use of Update’s ability to mutate Workflow state. For example, your Workflow might be maintaining a queue of some sort in a Workflow-local data structure, and you use an Update to add an item to the queue and return the new queue size. Here’s an example in Java:

The caller executes the update by calling the Update method on a Workflow stub:

MyWorkflow workflow = client.newWorkflowStub(MyWorkflow.class, workflowOptions);
...
int queueSize = workflow.addItem("my-item");

And the Workflow uses an annotation to mark the Update handler method:

@WorkflowInterface
public interface MyWorkflow {
  @UpdateMethod
  int addItem(String item);
  ...
}

public static class MyWorkflowImpl implements MyWorkflow {
  private final List<String> queue = new ArrayList<>(10);

  @Override
  public int addItem(String item) {
    queue.add(item);
    return queue.size();
  }
  ...
}

You can see examples of basic Update usage (including the optional Update validator feature) in all languages in the docs and samples:

Docs: Go | Java | PHP | Python | TypeScript | .NET

Samples: Go | Java | PHP | Python | TypeScript | .NET

Pattern 1: execute an Activity or Child Workflow and receive a result

This pattern involves sending an Update whose handler executes an Activity, mutates Workflow state in some way, and returns the result to the waiting client (the sequence diagram above is an example of this). To see a working code example, look at the “assign nodes to job” and “delete jobs” Updates in the Safe Message Handlers sample:

Workflow with Update handler: Go | Java | Python | TypeScript | .NET

Client code calling the Update: Go | Java | Python | TypeScript | .NET

Here’s how Update is used in the Python sample. The client code is straightforward; it uses the language’s native concurrency APIs to execute multiple updates concurrently:

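# Start six delete_job Updates; each call returns an awaitable.
deletion_updates = []
for i in range(6):
    deletion_updates.append(
        wf.execute_update(
            ClusterManagerWorkflow.delete_job,
            ClusterManagerDeleteJobInput(job_name=f"task-{i}"),
        )
    )
# Run them concurrently and wait until all six have completed.
await asyncio.gather(*deletion_updates)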

The Update handler in the Workflow waits until the Workflow is in a certain state and then executes an Activity. Although the client sent multiple concurrent Updates, for this Workflow allowing their handler executions to interleave would be a bug, which the Workflow avoids by using one of the standard mutexes offered by this SDK language:

# Even though it returns nothing, this is an update because the client may want to track it, for example
# to wait for nodes to be unassigned before reassigning them.
@workflow.update
async def delete_job(self, input: ClusterManagerDeleteJobInput) -> None:
    async with self.nodes_lock:
        nodes_to_unassign = [
            k for k, v in self.state.nodes.items() if v == input.job_name
        ]
        # This await would be dangerous without nodes_lock because it yields control and allows interleaving
        # with assign_nodes_to_job and perform_health_checks, which all touch self.state.nodes.
        await workflow.execute_activity(
            unassign_nodes_for_job,
            UnassignNodesForJobInput(
                nodes=nodes_to_unassign, job_name=input.job_name
            ),
            start_to_close_timeout=timedelta(seconds=10),
        )
        ...
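
To see why the lock matters, here is a minimal plain-asyncio sketch with no Temporal code involved. The `handler` and `run_two_handlers` functions are hypothetical, `asyncio.sleep(0)` stands in for awaiting an Activity, and the sketch assumes the lock is an `asyncio.Lock`. Two handlers are started concurrently, once without a lock and once with one, and the order of their steps is recorded.

```python
import asyncio
from typing import List, Optional


async def handler(name: str, log: List[str], lock: Optional[asyncio.Lock]) -> None:
    async def critical_section() -> None:
        log.append(f"{name}:start")
        # Stands in for awaiting an Activity: control is yielded here,
        # so another handler may run before this one continues.
        await asyncio.sleep(0)
        log.append(f"{name}:end")

    if lock is None:
        await critical_section()
    else:
        async with lock:
            await critical_section()


async def run_two_handlers(use_lock: bool) -> List[str]:
    log: List[str] = []
    lock = asyncio.Lock() if use_lock else None
    await asyncio.gather(handler("a", log, lock), handler("b", log, lock))
    return log


if __name__ == "__main__":
    print(asyncio.run(run_two_handlers(use_lock=False)))
    print(asyncio.run(run_two_handlers(use_lock=True)))
```

Without the lock, handler `b` starts while `a` is suspended at its await, so the two critical sections interleave. With the lock, `b` cannot enter until `a` has finished.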

Pattern 2: wait for the Workflow to reach a certain state

In this pattern, all the “real work” is done in the main Workflow run method, with the Update handler using the SDK’s “wait-for-condition” API to wait until a certain stage is reached.

A key point here is that you should write the main Workflow run method so that it will work whether or not the Client sends an Update: the Update handler is just observing, as opposed to doing. But remember: Query handlers can’t do this! They can only return a value computed from local Workflow state, without blocking.

As an example, look at the way that the Cluster Manager exposes an Update for the Client to wait until the Cluster is ready to be used:

@workflow.update
async def wait_until_cluster_started(self) -> ClusterManagerState:
    await workflow.wait_condition(lambda: self.state.cluster_started)
    return self.state

As you can see, the Update handler itself is very simple; all the real work is done in the main Workflow run method:

@workflow.run
async def run(self, input: ClusterManagerInput) -> ClusterManagerResult:
    cluster_state = await workflow.execute_activity(
        start_cluster, schedule_to_close_timeout=timedelta(seconds=10)
    )
    self.state.nodes = {k: None for k in cluster_state.node_ids}
    self.state.cluster_started = True
    workflow.logger.info("Cluster started")

    while True:
        # The cluster is now ready to accept the delete_job
        # update discussed above.
        ...

Workflow with Update handler: Go | Java | Python | TypeScript | .NET

Client code calling the Update: Go | Java | Python | TypeScript | .NET

An interesting variant of this pattern is to use Update as a broadcast, in which multiple clients all pass the same Update ID and wait on the same Update: from the Workflow’s point of view it’s a single Update, and all “subscribing” clients will be “notified” when the desired Workflow state is attained.
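
The broadcast idea can be modeled in plain asyncio. The sketch below is a toy, not the Temporal server or SDK: `ToyUpdateRegistry`, `broadcast_demo`, and the `"wait-for-start"` ID are hypothetical, and deduplication is reduced to a dictionary keyed by Update ID. It shows that several callers sharing an ID cause one handler run and all receive the same result.

```python
import asyncio
from typing import Dict, List, Tuple


class ToyUpdateRegistry:
    """Deduplicates Updates by ID in memory (hypothetical model only)."""

    def __init__(self) -> None:
        self._updates: Dict[str, asyncio.Task] = {}
        self.handler_runs = 0

    async def _handler(self, ready: asyncio.Event) -> str:
        self.handler_runs += 1
        await ready.wait()  # wait for the desired Workflow state
        return "cluster started"

    async def execute_update(self, update_id: str, ready: asyncio.Event) -> str:
        task = self._updates.get(update_id)
        if task is None:
            # First caller with this ID: actually start the handler.
            task = asyncio.create_task(self._handler(ready))
            self._updates[update_id] = task
        # Later callers with the same ID simply wait on the same task.
        return await task


async def broadcast_demo(num_clients: int = 3) -> Tuple[List[str], int]:
    registry = ToyUpdateRegistry()
    ready = asyncio.Event()
    clients = [
        asyncio.create_task(registry.execute_update("wait-for-start", ready))
        for _ in range(num_clients)
    ]
    await asyncio.sleep(0)  # let every client attach to the Update
    ready.set()  # the awaited state is reached; all clients are released
    results = await asyncio.gather(*clients)
    return list(results), registry.handler_runs


if __name__ == "__main__":
    print(asyncio.run(broadcast_demo()))
```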

Long-running Updates

In the examples we’ve looked at so far, the client has sent the Update and waited for the result in one go. But the SDKs offer a second way to invoke an Update. Using this second way, as soon as your Workflow accepts the Update, the call returns an Update handle object that you can use subsequently to fetch the result or error. Every Update has an associated Update ID that the caller can set, and any client can create an Update handle for a pending Update if they have the Update ID. Here’s a comparison between the two ways of invoking an Update:

Wait for completed Update result in one go

This is called “execute update” in SDKs other than Go and Java. In Go, you use UpdateWorkflow, passing Completed as the waitForStage, and in Java you call the Update method on the Workflow stub.

Wait for accepted Update handle and subsequently fetch the result

This is called “start update” in SDKs other than Go. In Go, you use UpdateWorkflow, passing Accepted as the waitForStage.

Behind the scenes, the SDK client will always poll repeatedly until the desired stage is reached. And as long as your Worker is online, waiting for your Update to be accepted and return a handle should be fast. But how long it takes for your Update to complete thereafter depends on what you are doing in the Update handler, and there are no constraints on Update handler code: you can design long-running Updates that take seconds, or months, to complete.
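
The difference between the two ways of invoking an Update can be sketched with a toy handle in plain asyncio. Everything named here (`ToyUpdateHandle`, `start_update`, `execute_update`, `slow_handler`) is hypothetical and only mirrors the shape of the idea; the real SDK calls and their parameters are described in the docs for your language.

```python
import asyncio
from typing import Awaitable, Callable, Tuple


class ToyUpdateHandle:
    """Handle to a possibly still-running Update (hypothetical model)."""

    def __init__(self, update_id: str, task: asyncio.Task) -> None:
        self.update_id = update_id
        self._task = task

    def done(self) -> bool:
        return self._task.done()

    async def result(self) -> int:
        return await self._task


async def start_update(
    handler: Callable[[int], Awaitable[int]], arg: int, update_id: str
) -> ToyUpdateHandle:
    # Start the handler, then return as soon as it is running. In this toy
    # model, "running" plays the role of the Update having been accepted.
    task = asyncio.ensure_future(handler(arg))
    await asyncio.sleep(0)
    return ToyUpdateHandle(update_id, task)


async def execute_update(
    handler: Callable[[int], Awaitable[int]], arg: int, update_id: str
) -> int:
    # "Wait in one go" is just start followed immediately by result().
    handle = await start_update(handler, arg, update_id)
    return await handle.result()


async def slow_handler(x: int) -> int:
    await asyncio.sleep(0.02)  # stands in for long-running handler work
    return x * 2


async def demo() -> Tuple[bool, int, int]:
    handle = await start_update(slow_handler, 21, update_id="u-1")
    pending_when_returned = not handle.done()
    later = await handle.result()
    one_go = await execute_update(slow_handler, 21, update_id="u-2")
    return pending_when_returned, later, one_go


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the model, the handle comes back while the handler is still working, and the caller decides when to wait for the result.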

Get started building with Update

Update is ready for production use. In fact, we’re using Update ourselves, both in Temporal Cloud and in the server core (e.g. here and here). If you’d like to read more about it, you could look at the introduction to message passing, the Robust Message Handlers blog post, and the Update docs for your language:

Go | Java | PHP | Python | TypeScript | .NET.

Please join us in Temporal Slack with any questions.