Today I realized that I had no idea when it was appropriate to use
hook_post_update_NAME. I have ideas, but I was not sure about the concrete reasons. My gut instinct is that
hook_update_N is for schema and other database-related changes. And, then,
hook_post_update_NAME is for configuration changes and clearing caches once the schema changes have been finished.
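To make that gut instinct concrete, here is a minimal sketch of how the split might look. Everything named here is hypothetical: the module name mymodule, the table mymodule_log, its severity column, and the mymodule.settings config object are made up for illustration. The two functions would normally live in separate files, as the comments note.

```php
<?php

/**
 * Implements hook_update_N().
 *
 * Would live in mymodule.install (hypothetical module).
 * Schema change: add a column to a hypothetical table.
 */
function mymodule_update_9001() {
  $schema = \Drupal::database()->schema();
  // Guard so the update is safe to re-run on a site that already has the column.
  if (!$schema->fieldExists('mymodule_log', 'severity')) {
    $schema->addField('mymodule_log', 'severity', [
      'type' => 'int',
      'not null' => TRUE,
      'default' => 0,
    ]);
  }
}

/**
 * Implements hook_post_update_NAME().
 *
 * Would live in mymodule.post_update.php (hypothetical module).
 * Configuration change, run once the schema changes are finished.
 */
function mymodule_post_update_set_default_severity(&$sandbox = NULL) {
  \Drupal::configFactory()
    ->getEditable('mymodule.settings')
    ->set('default_severity', 0)
    ->save();
}
```

This only illustrates the instinct described above; it is not a claim that Drupal core requires this split.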
But is that true? Does Drupal core follow this pattern? Finding examples in Drupal core was also tricky; I had to switch back to the
8.9.x branch to get a good collection of references.
- The Views module is a mixed bag. There are configuration updates and changes to View entities in both update hooks, with no real differentiation or understanding of why.
- The Workspaces module has a decent separation. It uses
hook_update_N to modify field definitions, which are schema. Then it uses
hook_post_update_NAME to perform some data manipulation... but it also manipulates the database schema here.
Maybe part of the reasoning is to handle different stages. I know I have experienced some oddities when executing various
hook_update_N in sequential order – purposely making two update hooks at once, so they run after each other. But that was mainly in Drupal 7 days.
There is the "Improve documentation for post update hooks and update hooks to clarify distinction" issue on Drupal.org, where jibran posted this as his guide:
- post-update hooks run right after update_N hooks.
- update_N hooks are used when config or DB schema needs CRUD
- post-update hooks are used to update config entities, can be used for CRUD content entities but not recommended as site config might need and update see... [other issue.]
https://www.drupal.org/node/3034742 is a very good resource to understand the recommended way to install/update/uninstall entity types/fields in the update hooks.
The other issue goes into lengthy detail on some quirks when updating configuration during update hooks: https://www.drupal.org/project/drupal/issues/2901418. The last link he refers to is the change record for when automatic entity schema updates were removed. Unfortunately, it does not clear up the decision for me. It is just "I need to do X first, and Y second. So I will use the two different hooks."
One of the more difficult problems is that the Update API documentation does not mention
hook_post_update_NAME. HINT! GREAT CONTRIBUTION AREA IF YOU ARE A DOCUMENTATION KIND OF PERSON!!! Please, steal from this post; I do not have the bandwidth to convert this into a decent guide for developer documentation.
Let's get digging
Okay, let's dig and see if we can sort this out. When you visit
/update.php, Drupal uses a different kernel to handle updates: the UpdateKernel. This
HttpKernel implementation always forces the container to be rebuilt for each request. It also defines the
NullBackend cache service and decorates the cache factory service. It ensures all cache bins are instances of
\Drupal\Core\Update\UpdateBackend, which extends the
NullBackend. It wraps the regular cache backend and prevents reads, but it will purge the wrapped cache bin on deletion.
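The "no reads, but deletions still reach the real bin" behavior is easier to see in code. This is a simplified stand-in, not Drupal's actual classes: the SimpleCache interface, ArrayCache, and UpdateTimeCache are hypothetical names, and I am assuming writes are dropped the same way reads are.

```php
<?php

// Hypothetical, stripped-down cache interface for illustration only.
interface SimpleCache {
  public function get(string $cid);
  public function set(string $cid, $data): void;
  public function delete(string $cid): void;
}

// Stand-in for the "regular" cache backend.
final class ArrayCache implements SimpleCache {
  private $items = [];

  public function get(string $cid) {
    return $this->items[$cid] ?? FALSE;
  }

  public function set(string $cid, $data): void {
    $this->items[$cid] = $data;
  }

  public function delete(string $cid): void {
    unset($this->items[$cid]);
  }
}

// Decorator: behaves like a null cache, except deletions are forwarded.
final class UpdateTimeCache implements SimpleCache {
  private $inner;

  public function __construct(SimpleCache $inner) {
    $this->inner = $inner;
  }

  public function get(string $cid) {
    // Never return cached data while updates are running.
    return FALSE;
  }

  public function set(string $cid, $data): void {
    // Assumption: writes are discarded, like a null backend.
  }

  public function delete(string $cid): void {
    // Stale data is still purged from the wrapped bin.
    $this->inner->delete($cid);
  }
}

$real = new ArrayCache();
$real->set('foo', 'stale value');
$wrapped = new UpdateTimeCache($real);

var_dump($wrapped->get('foo')); // The wrapper hides the stored value.
$wrapped->delete('foo');
var_dump($real->get('foo'));    // The wrapped bin no longer has it either.
```

The point of the decorator is that update code never sees stale cached data, while anything it invalidates is really gone once the site returns to the normal kernel.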
The \Symfony\Component\HttpKernel\HttpKernelInterface::handle method is overridden in the
UpdateKernel to have minimal handling of the request. It does a basic bootstrap and then invokes the
\Drupal\system\Controller\DbUpdateController::handle controller directly, returning its response.
This callback is where updates are processed in a batch. To me, this is where the special differentiations should surface. So far, nothing about
hook_post_update_NAME has occurred.
The magic is in
- All installed module updates (
hook_update_N) are discovered.
- Update dependencies are resolved (yes, you can say one update hook is dependent on another with
hook_update_dependencies. I used this a lot when I worked at a company that provided a SaaS on Drupal. That's a whole other topic if you are curious.)
- Each hook_update_N implementation is placed into the batch, ordered by module weight, its name alphanumerically, dependencies, and its
- Then the post-update hooks are discovered.
- If there are post-update hooks,
drupal_flush_all_caches is invoked to reset all of Drupal's caches.
- Each post-update hook is then added to the batch
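The steps above can be sketched as a plain function that assembles the batch. This is a rough model of the ordering only, not core's code: sketch_build_update_batch and the operation labels are hypothetical, and the update functions are assumed to arrive already sorted with their dependencies resolved.

```php
<?php

/**
 * Builds an ordered list of batch operations (illustrative only).
 *
 * @param string[] $updates
 *   hook_update_N function names, already ordered.
 * @param string[] $postUpdates
 *   hook_post_update_NAME function names.
 *
 * @return array
 *   A list of [operation label, function name or NULL] pairs.
 */
function sketch_build_update_batch(array $updates, array $postUpdates): array {
  $operations = [];

  // Phase one: every hook_update_N goes into the batch first.
  foreach ($updates as $function) {
    $operations[] = ['run_update', $function];
  }

  // Phase two: only when post-update hooks exist, flush all caches once
  // and then queue each post-update hook.
  if ($postUpdates) {
    $operations[] = ['flush_all_caches', NULL];
    foreach ($postUpdates as $function) {
      $operations[] = ['run_post_update', $function];
    }
  }

  return $operations;
}

$batch = sketch_build_update_batch(
  ['mymodule_update_9001'],
  ['mymodule_post_update_set_default_severity']
);

foreach ($batch as [$type, $function]) {
  echo $type, $function ? " $function" : '', PHP_EOL;
}
```

In this model, the only thing separating the two phases is the single cache flush in the middle.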
Is that it? The difference is that Drupal's caches are flushed before
hook_post_update_NAME is invoked? We already are not getting cached values back, but defining a hook_post_update_NAME, empty or with content, ensures all of Drupal's caches are flushed. (I'm not a fan of dumping all your caches in a deployment. I know it's standard practice, but I prefer preserving caches and crafting appropriate cache invalidations when needed for deployments.)
Let's check out Drush. Almost no one uses
/update.php but instead runs
drush updb or
drush deploy (which invokes
updb.) Drush allows commands to specify the kernel when executing, which can be
installer. The database update command leverages the
update kernel, which uses
\Drush\Drupal\UpdateKernel that extends the
UpdateKernel. That means the service container still has the cache factory decorated, and we are getting
NULL reads from the cache but still deletions when needed.
The logic all occurs in
It also builds a batch in a similar fashion:
- Get the list of update hooks
- Resolve the dependencies and sort their order
- Push into the batch
- Detect if there are post-updates
- If there are, and update hooks were previously executed, run a cache rebuild
- Push each post-update hook into the batch
- Execute the batch
So Drush is a little different. The cache is only cleared if
hook_update_N implementations were also processed. This breaks assumptions in Drupal core as documented here: [policy, docs] Use post updates for empty 'clear the cache' updates. An empty
hook_post_update_NAME should be enough to flush Drupal's caches without a previous
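The difference boils down to one condition. Here is a small sketch of the two rules as I have described them in this post; the function names are hypothetical and neither is copied from core or Drush.

```php
<?php

// /update.php as described in this post: any post-update hook triggers a flush.
function sketch_core_flushes_caches(bool $hasUpdates, bool $hasPostUpdates): bool {
  return $hasPostUpdates;
}

// Drush as described in this post: a rebuild also needs update hooks to have run.
function sketch_drush_rebuilds_caches(bool $hasUpdates, bool $hasPostUpdates): bool {
  return $hasUpdates && $hasPostUpdates;
}

// Compare both rules for "post-update only" and "both kinds of hook".
foreach ([[FALSE, TRUE], [TRUE, TRUE]] as [$hasUpdates, $hasPostUpdates]) {
  printf(
    "updates=%s post_updates=%s core=%s drush=%s\n",
    var_export($hasUpdates, TRUE),
    var_export($hasPostUpdates, TRUE),
    var_export(sketch_core_flushes_caches($hasUpdates, $hasPostUpdates), TRUE),
    var_export(sketch_drush_rebuilds_caches($hasUpdates, $hasPostUpdates), TRUE)
  );
}
```

The first case, a lone post-update hook, is the one where the two rules disagree.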
So what do we use?!?!
I have absolutely no idea. In the end, they're the same. It just gives us a chance to take a layered approach. Do you want to run something after all contributed projects have run their schema changes, without implementing the update dependencies hook? Go ahead, use
One thing to note: the execution order of
hook_post_update_NAME is not as deterministic as that of hook_update_N.
- hook_update_N: discovered in
\Drupal\Core\Update\UpdateHookRegistry::getAvailableUpdates and sorted by module weight and the numeric value of
- hook_post_update_NAME: discovered in
\Drupal\Core\Update\UpdateRegistry::getAvailableUpdateFunctions and then they are just sorted with the
sort function, so alphabetically. That means a module with two post-update hooks will not necessarily execute them in the order they are defined.
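A quick way to see the alphabetical ordering is to run PHP's sort on a couple of function names. The two post-update names below are hypothetical; the only assumption is the one stated above, that the discovered names are ordered with a plain sort.

```php
<?php

// Order in which a hypothetical module defines its post-update hooks.
$defined = [
  'mymodule_post_update_rebuild_index',
  'mymodule_post_update_add_defaults',
];

// A plain sort() orders the names alphabetically, ignoring definition order.
$sorted = $defined;
sort($sorted);

print_r($sorted);
// add_defaults now comes before rebuild_index, the reverse of how they were defined.
```

If one post-update hook depends on another, the names themselves have to sort in the right order.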
Well, that was hopefully an exciting ride. My little question to myself quickly escalated into this blog post.