Today I realized that I had no idea when it was appropriate to use hook_update_N or hook_post_update_NAME. I have ideas, but I was not sure about the concrete reasons. My gut instinct is that hook_update_N is for schema and other database-related changes, and that hook_post_update_NAME is for configuration changes and clearing caches once the schema changes have finished.
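To make that gut instinct concrete, here is a minimal sketch of what the split might look like. Everything named mymodule, the mymodule_log table, and the processed column are hypothetical, and this illustrates my assumption rather than a pattern Drupal core prescribes. The two functions are shown together for brevity; in a real module hook_update_N lives in mymodule.install and hook_post_update_NAME lives in mymodule.post_update.php.

```php
<?php

use Drupal\Core\Config\Entity\ConfigEntityUpdater;
use Drupal\views\ViewEntityInterface;

/**
 * Schema change: add a "processed" column to the hypothetical mymodule_log table.
 *
 * Would live in mymodule.install.
 */
function mymodule_update_9001() {
  $schema = \Drupal::database()->schema();
  // Guard so the update stays safe if the column already exists.
  if (!$schema->fieldExists('mymodule_log', 'processed')) {
    $schema->addField('mymodule_log', 'processed', [
      'type' => 'int',
      'size' => 'tiny',
      'not null' => TRUE,
      'default' => 0,
    ]);
  }
}

/**
 * Configuration change: re-save every view once schema updates have run.
 *
 * Would live in mymodule.post_update.php.
 */
function mymodule_post_update_resave_views(&$sandbox = NULL) {
  \Drupal::classResolver(ConfigEntityUpdater::class)
    ->update($sandbox, 'view', function (ViewEntityInterface $view) {
      // Returning TRUE tells the updater to save this view.
      return TRUE;
    });
}
```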
But is that true? Does Drupal core follow this pattern? Finding examples in Drupal core was also tricky; I had to switch back to the 8.9.x branch to get a good collection of references.
- The Views module is a mixed bag. There are configuration updates and changes to View entities in both update hooks, with no real differentiation or understanding of why.
- The Workspaces module has a decent separation. It uses hook_update_N to modify field definitions, which are schema. Then it uses hook_post_update_NAME to perform some data manipulation... but it also manipulates the database schema here.
Maybe part of the reasoning is to handle different stages. I know I have experienced some oddities when executing several hook_update_N implementations in sequential order – purposely writing two update hooks at once so that they run one after the other. But that was mainly in Drupal 7 days.
There is the "Improve documentation for post update hooks and update hooks to clarify distinction" issue on Drupal.org, where jibran posted this as his guide:
- post-update hooks run right after update_N hooks.
- update_N hooks are used when config or DB schema needs CRUD
- post-update hooks are used to update config entities, can be used for CRUD content entities but not recommended as site config might need and update see... [other issue.]
https://www.drupal.org/node/3034742 is a very good resource to understand the recommended way to install/update/uninstall entity types/fields in the update hooks.
The other issue is a lengthy discussion of some quirks when updating configuration during update hooks: https://www.drupal.org/project/drupal/issues/2901418. The last link he refers to is the change record for when automatic entity schema updates were removed. Unfortunately, it does not clear up the decision for me. It is just "I need to do X first, and Y second. So I will use the two different hooks."
One of the more difficult problems is that the Update API documentation does not mention hook_post_update_NAME. HINT! GREAT CONTRIBUTION AREA IF YOU ARE A DOCUMENTATION KIND OF PERSON!!! Please, steal from this post; I do not have the bandwidth to convert this into a decent guide for developer documentation.
Let's get digging
Okay, let's dig and see if we can sort this out. When you visit /update.php, Drupal uses a different kernel to handle updates: \Drupal\Core\Update\UpdateKernel. This HttpKernel implementation always forces the container to be rebuilt for each request. It also defines the NullBackend cache service and decorates the cache factory service, ensuring all cache bins are instances of \Drupal\Core\Update\UpdateBackend, which extends the NullBackend. UpdateBackend wraps the regular cache backend and prevents reads, but it will purge the wrapped cache bin on deletion.
The \Symfony\Component\HttpKernel\HttpKernelInterface::handle method is overridden in the UpdateKernel to have minimal handling of the request. It does a basic bootstrap and then invokes the \Drupal\system\Controller\DbUpdateController::handle controller directly, returning its response.
This callback is where updates are processed in a batch. To me, this is where the special differentiations should surface. So far, nothing specific to hook_update_N or hook_post_update_NAME has occurred.
The magic is in \Drupal\system\Controller\DbUpdateController::triggerBatch.
- All installed module updates (hook_update_N) are discovered.
- Update dependencies are resolved (yes, you can say one update hook is dependent on another with hook_update_dependencies. I used this a lot when I worked at a company that provided a SaaS on Drupal. That's a whole other topic if you are curious.)
- Each hook_update_N implementation is placed into the batch – ordered by module weight, its name alphanumerically, dependencies, and its N value.
- Then the post-update hooks are discovered.
- If there are post-update hooks, drupal_flush_all_caches is invoked to reset all of Drupal's caches.
- Each post-update hook is then added to the batch.
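Since hook_update_dependencies came up in the steps above, here is a minimal sketch of what an implementation can look like. The module names (mymodule, another_module) and the update numbers are hypothetical; the array shape follows the hook's documented return format as I understand it.

```php
<?php

/**
 * Implements hook_update_dependencies().
 *
 * Hypothetical example: mymodule_update_9002() must not run until
 * another_module_update_9001() has run.
 */
function mymodule_update_dependencies() {
  $dependencies = [];
  // Keyed by the dependent module and its update number; the value maps
  // each module it depends on to the update number that must run first.
  $dependencies['mymodule'][9002] = [
    'another_module' => 9001,
  ];
  return $dependencies;
}
```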
Is that it? The difference is that Drupal's caches are flushed before hook_post_update_NAME is invoked? We already are not getting cached values back, but defining a hook_post_update_NAME, empty or with content, ensures all of Drupal's caches are flushed. (I'm not a fan of dumping all your caches in a deployment. I know it's standard practice, but I prefer preserving caches and crafting appropriate cache invalidations when needed for deployments.)
Let's check out Drush. Almost no one uses /update.php but instead runs drush updb or drush deploy (which invokes updb). Drush allows commands to specify the kernel when executing, which can be drupal (standard), update, or installer. The database update command leverages the update kernel, which uses \Drush\Drupal\UpdateKernel, which extends the UpdateKernel. That means the service container still has the cache factory decorated, and we are getting NULL reads from the cache but still deletions when needed.
The logic all occurs in \Drush\Commands\core\UpdateDBCommands::updateBatch. It also builds a batch in a similar fashion:
- Get the list of update hooks
- Resolve the dependencies and sort their order
- Push into the batch
- Detect if there are post-updates
- If there are, and update hooks were also executed, run a cache rebuild
- Push each post-update hook into the batch
- Execute the batch
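To see the difference between the two flows side by side, here is a simplified model of the batch assembly. It is not the real core or Drush code: build_update_operations, its $drushStyle flag, and the function names passed in are all made up for illustration, and it only encodes the cache-flush condition as I read it from each implementation.

```php
<?php

/**
 * Simplified model of batch assembly; not the actual core or Drush code.
 *
 * @param string[] $updates
 *   Pending hook_update_N function names, assumed already sorted.
 * @param string[] $postUpdates
 *   Pending hook_post_update_NAME function names.
 * @param bool $drushStyle
 *   TRUE to flush caches only when update hooks are also pending;
 *   FALSE to flush whenever post-update hooks exist.
 *
 * @return string[]
 *   Operation labels in execution order.
 */
function build_update_operations(array $updates, array $postUpdates, bool $drushStyle): array {
  $operations = $updates;
  if ($postUpdates !== []) {
    if (!$drushStyle || $updates !== []) {
      $operations[] = 'flush all caches';
    }
    $operations = array_merge($operations, $postUpdates);
  }
  return $operations;
}

// A lone post-update hook and no pending hook_update_N.
$post = ['mymodule_post_update_rebuild'];
print_r(build_update_operations([], $post, FALSE)); // Includes the cache flush.
print_r(build_update_operations([], $post, TRUE));  // No cache flush.
```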
So Drush is a little different. The cache is only cleared if hook_update_N implementations were also processed. This breaks assumptions in Drupal core as documented here: "[policy, docs] Use post updates for empty 'clear the cache' updates". An empty hook_post_update_NAME should be enough to flush Drupal's caches without a previous hook_update_N.
So what do we use?!?!
I have absolutely no idea. In the end, they're the same. It just gives us a chance to take a layered approach. Do you want to run something after all contributed projects have run their schema changes, without dealing with the dependencies hook? Go ahead, use hook_post_update_NAME.
One thing to note: the execution order of hook_post_update_NAME is not as predictable as that of hook_update_N.
- hook_update_N: discovered in \Drupal\Core\Update\UpdateHookRegistry::getAvailableUpdates and sorted by module weight and the numeric value of N.
- hook_post_update_NAME: discovered in \Drupal\Core\Update\UpdateRegistry::getAvailableUpdateFunctions and then just sorted with the sort function, so alphabetically by function name. That means a module with two post-update hooks will not necessarily execute them in the order they are defined.
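A quick way to see the effect of that alphabetical sort, using plain PHP and hypothetical function names (nothing here calls Drupal itself):

```php
<?php

// Hypothetical post-update hooks, listed in the order they were written.
$postUpdates = [
  'mymodule_post_update_migrate_settings',
  'mymodule_post_update_add_defaults',
  'another_module_post_update_zzz',
];

// Plain sort() on the function names, as described above.
sort($postUpdates);

echo implode("\n", $postUpdates), "\n";
// Prints:
// another_module_post_update_zzz
// mymodule_post_update_add_defaults
// mymodule_post_update_migrate_settings
```

Even though migrate_settings was defined first in mymodule, add_defaults runs before it because of the name alone.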
Well, that was hopefully an exciting ride. My little question to myself quickly escalated into this blog post.