State-backed components
Before working with state-backed components, you should be familiar with the basic Components APIs.
StateBackedComponents are a specialized type of Dagster component designed to handle cases where your Dagster definitions depend on information from external systems or tools, rather than purely the code and configuration files in your repository.
What is the "state" in state-backed components?
Some integrations require doing non-trivial work to turn their configuration into actual definition objects. For example:
- The FivetranAccountComponent needs to fetch information about connectors and connection tables from the Fivetran API.
- The DbtProjectComponent needs to compile a manifest.json file before definitions can be built.
In these cases, the "state" is the information that is collected and used to build the definitions for the component.
State-backed components simplify managing this external state by providing a structured framework for recomputing, storing, and loading it in a consistent way. This avoids repeating expensive computations every time a Dagster process starts up.
How state-backed components work
State-backed components extend the base Component class, but break up the process of building the definitions into two steps:
- write_state_to_path(): The component does whatever work is necessary to compute the state (querying an API, running a script, etc.), and stores it to a file on local disk.
- build_defs_from_state(): The component uses the state that was computed in the previous step to build the definitions.
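The two-step split can be illustrated with a minimal, dependency-free sketch. The class below is a stand-in that only borrows the two method names; it does not subclass the real Dagster base class, and the method signatures, the fetch_external_metadata helper, and the string "definitions" it returns are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


class ExampleStateBackedComponent:
    """Stand-in that mirrors the two-step split; not the Dagster base class."""

    def fetch_external_metadata(self) -> dict:
        # Hypothetical placeholder for expensive work (API query, script, etc.).
        return {"tables": ["orders", "customers"]}

    def write_state_to_path(self, state_path: Path) -> None:
        # Step 1: compute the state and persist it to a file on local disk.
        state_path.write_text(json.dumps(self.fetch_external_metadata()))

    def build_defs_from_state(self, state_path: Path) -> list[str]:
        # Step 2: build "definitions" using only the previously stored state.
        state = json.loads(state_path.read_text())
        return [f"asset:{name}" for name in state["tables"]]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "state"
    component = ExampleStateBackedComponent()
    component.write_state_to_path(path)          # expensive, run rarely
    defs = component.build_defs_from_state(path)  # cheap, run on every load
```

The point of the split is that the second step never touches the external system, so it can run in any process that has access to the state file.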
Dagster system code controls the lifecycle of the state. It ensures that write_state_to_path() is called only at specific, limited points in time, and that the state is persisted in a location that build_defs_from_state() can access when the component is loaded in other processes.
The specifics of this process vary depending on the state management strategy you configure, but regardless of the strategy chosen, write_state_to_path() will be called at most once per code location load.
By default, when you run dagster dev or use dg CLI commands (like dg list defs), state-backed components automatically refresh their state. This provides convenience during development so you always see the latest metadata from external systems. You can disable this behavior by setting refresh_if_dev to False in your component configuration.
Choosing a state management strategy
State-backed components support three different strategies for managing state, each suited to different deployment patterns:
| Strategy | Storage Location | Best For |
|---|---|---|
| Local Filesystem | .local_defs_state/ directory | Docker/PEX deployments where state is updated during image builds |
| Versioned State Storage | Cloud storage (S3, GCS, etc.) | Deployments where you want to update state without rebuilding images |
| Code Server Snapshots | In-memory | Legacy compatibility only (not recommended) |
Local Filesystem
Best for:
- Docker-based deployments
- CI/CD pipelines that build Docker or PEX images
- Simple, predictable deployments
How it works:
- You run dg utils refresh-defs-state during your build process
- State is stored in a .local_defs_state directory in your project
- The directory is automatically .gitignored
- State becomes part of your deployment artifact (Docker or PEX image)
Directory structure:
When you refresh state, the system creates this structure:
my_project/
└── defs/
    └── .local_defs_state/
        ├── .gitignore (auto-created)
        └── <defs_state_key>/
            └── state
State files can be large and change frequently based on external system metadata. Committing them would pollute your Git history and create merge conflicts. Instead, state is refreshed during your build process and included in the deployment artifact.
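As a rough illustration of the layout above, a build script could compute where a component's state file is expected to live, for example to check that the refresh step ran before the image is packaged. The helper name local_state_path and the key "MyComponent" are hypothetical; only the directory and file names come from the structure shown above.

```python
from pathlib import Path


def local_state_path(defs_dir: Path, defs_state_key: str) -> Path:
    # Mirrors the layout: <defs_dir>/.local_defs_state/<defs_state_key>/state
    return defs_dir / ".local_defs_state" / defs_state_key / "state"


# "MyComponent" is a made-up key used only for this example.
state_file = local_state_path(Path("my_project") / "defs", "MyComponent")

# A build step might fail early if the refresh was skipped.
state_is_present = state_file.is_file()
```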
Versioned State Storage
Best for:
- Deployments where you want to update state without rebuilding Docker or PEX images
- Environments requiring state version history
How it works:
- State is stored in cloud storage (S3, GCS, etc.) with UUID version identifiers
- Multiple versions can exist simultaneously
- All runs and definitions point to a consistent version until the code location reloads
- Requires configuring a state storage backend in your Dagster instance (see Configuring versioned state storage for more information)
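The versioning behavior can be pictured with a toy in-memory model. This is a conceptual sketch only: the InMemoryVersionedStateStore class and its methods are invented for illustration and do not reflect how Dagster's storage backend is implemented or configured.

```python
import uuid


class InMemoryVersionedStateStore:
    """Toy stand-in for a cloud-backed store; not Dagster's implementation."""

    def __init__(self) -> None:
        self._versions: dict[str, dict[str, bytes]] = {}
        self._latest: dict[str, str] = {}

    def upload(self, key: str, state: bytes) -> str:
        # Each upload gets a fresh UUID; older versions are kept alongside it.
        version = str(uuid.uuid4())
        self._versions.setdefault(key, {})[version] = state
        self._latest[key] = version
        return version

    def latest_version(self, key: str) -> str:
        return self._latest[key]

    def download(self, key: str, version: str) -> bytes:
        return self._versions[key][version]


store = InMemoryVersionedStateStore()
v1 = store.upload("fivetran", b"connectors: 3")
pinned = store.latest_version("fivetran")        # code location load pins v1
v2 = store.upload("fivetran", b"connectors: 4")  # a later refresh adds v2
state_seen_by_runs = store.download("fivetran", pinned)  # still v1 until reload
```

Because the loaded code location holds on to the version it resolved at load time, a refresh that happens mid-run does not change what existing runs and definitions see.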
Benefits:
This strategy allows you to update state in production without rebuilding Docker images. For example, you could write a Dagster job that:
- Executes and updates component state
- Reloads the code location to pick up the latest state version