
hook_update_N or hook_post_update_NAME

Today I realized that I had no idea when it was appropriate to use hook_update_N or hook_post_update_NAME. I had hunches, but I was not sure about the concrete reasons. My gut instinct is that hook_update_N is for schema and other database-related changes, and that hook_post_update_NAME is for configuration changes and clearing caches once the schema changes have finished.
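For orientation, here is a minimal sketch of what the two hooks look like if you follow that gut instinct. The module name mymodule, the update number 9001, and the post-update name resave_settings are all hypothetical, and the bodies are placeholders rather than real schema or configuration work. In a real module, hook_update_N implementations live in mymodule.install and hook_post_update_NAME implementations live in mymodule.post_update.php.

```php
<?php

/**
 * Hypothetical hook_update_N implementation (would live in mymodule.install).
 *
 * Following the gut instinct above, schema or database changes would go here.
 */
function mymodule_update_9001(&$sandbox) {
  // Placeholder: a real update would change the schema at this point.
  return 'Schema change applied.';
}

/**
 * Hypothetical hook_post_update_NAME implementation.
 *
 * Would live in mymodule.post_update.php. NAME is a free-form machine name
 * instead of a number.
 */
function mymodule_post_update_resave_settings(&$sandbox) {
  // Placeholder: a real post-update would re-save configuration here.
  return 'Settings re-saved.';
}
```

Both hooks receive a $sandbox array by reference and may return a message string, so from the outside they look almost identical. The differences are in naming and in when they run.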

But is that true? Does Drupal core follow this pattern? Finding examples in Drupal core was also tricky; I had to switch back to the 8.9.x branch to get a good collection of references.

  • The Views module is a mixed bag. There are configuration updates and changes to View entities in both update hooks, with no clear differentiation or explanation of why.
  • The Workspaces module has a decent separation. It uses hook_update_N to modify field definitions, which are schema. Then it uses hook_post_update_NAME to perform some data manipulation... but it also manipulates the database schema here. 

Maybe part of the reasoning is to handle different stages. I know I have experienced some oddities when executing various hook_update_N in sequential order – purposely making two update hooks at once, so they run after each other. But that was mainly in Drupal 7 days.

There is an issue on Drupal.org titled "Improve documentation for post update hooks and update hooks to clarify distinction". In it, jibran posted this as his guide:

  1. post-update hooks run right after update_N hooks.
  2. update_N hooks are used when config or DB schema needs CRUD
  3. post-update hooks are used to update config entities, can be used for CRUD content entities but not recommended as site config might need an update; see... [other issue.]

https://www.drupal.org/node/3034742 is a very good resource to understand the recommended way to install/update/uninstall entity types/fields in the update hooks.

The other issue is a lengthy write-up of some quirks when updating configuration during update hooks: https://www.drupal.org/project/drupal/issues/2901418. The last link he refers to is the change record for when automatic entity schema updates were removed. Unfortunately, it does not clear up the decision for me. It boils down to "I need to do X first, and Y second. So I will use the two different hooks."

One of the more difficult problems is that the Update API documentation does not mention hook_post_update_NAME. HINT: GREAT CONTRIBUTION AREA IF YOU ARE A DOCUMENTATION KIND OF PERSON!!! Please, steal from this post; I do not have the bandwidth to convert this into a decent guide for developer documentation.

Let's get digging

Okay, let's dig and see if we can sort this out. When you visit /update.php, Drupal uses a different kernel to handle updates: \Drupal\Core\Update\UpdateKernel. This HttpKernel implementation always forces the container to be rebuilt for each request. It also defines the NullBackend cache service and decorates the cache factory service, which ensures all cache bins are instances of \Drupal\Core\Update\UpdateBackend, which extends the NullBackend. UpdateBackend wraps the regular cache backend and prevents reads, but it still purges the wrapped cache bin on deletion.
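To make the "no reads, but deletions pass through" behavior concrete, here is a toy model in plain PHP. The classes ArrayCache, NullCache, and UpdateCache are hypothetical stand-ins and not Drupal's actual classes. Treating writes as no-ops is an assumption based on the backend extending NullBackend.

```php
<?php

// Toy model only: hypothetical stand-ins, not Drupal's real cache classes.

/** A trivial in-memory cache standing in for the regular backend. */
class ArrayCache {
  private $items = [];

  public function get(string $cid) {
    // Return FALSE on a miss, mirroring how a cache miss is reported.
    return $this->items[$cid] ?? FALSE;
  }

  public function set(string $cid, $data): void {
    $this->items[$cid] = $data;
  }

  public function delete(string $cid): void {
    unset($this->items[$cid]);
  }
}

/** A cache that stores nothing and always misses. */
class NullCache {
  public function get(string $cid) {
    return FALSE;
  }

  public function set(string $cid, $data): void {
    // Intentionally does nothing.
  }

  public function delete(string $cid): void {
    // Intentionally does nothing.
  }
}

/** Always misses on read, but forwards deletions to the wrapped cache. */
class UpdateCache extends NullCache {
  private $wrapped;

  public function __construct(ArrayCache $wrapped) {
    $this->wrapped = $wrapped;
  }

  public function delete(string $cid): void {
    // Purge the stale entry from the real backend.
    $this->wrapped->delete($cid);
  }
}

$real = new ArrayCache();
$real->set('foo', 'bar');
$update = new UpdateCache($real);

$miss = $update->get('foo');          // Always a miss during updates.
$update->delete('foo');               // Still removes the real entry.
$after_delete = $real->get('foo');    // The wrapped cache no longer has it.
```

The point of the design is that update code never sees stale cached data, while any invalidation it triggers still reaches the cache the live site will use afterwards.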

The \Symfony\Component\HttpKernel\HttpKernelInterface::handle method is overridden in the UpdateKernel to have minimal handling of the request. It does a basic bootstrap and then invokes the \Drupal\system\Controller\DbUpdateController::handle controller directly, returning its response.

This callback is where updates are processed in a batch. To me, this is where the special differentiations should surface. So far, nothing about hook_update_N or hook_post_update_NAME has occurred.

The magic is in \Drupal\system\Controller\DbUpdateController::triggerBatch.

  1. All installed module updates (hook_update_N) are discovered.
  2. Update dependencies are resolved (yes, you can say one update hook is dependent on another with hook_update_dependencies. I used this a lot when I worked at a company that provided a SaaS on Drupal. That's a whole other topic if you are curious.)
  3. Each hook_update_N implementation is placed into the batch – ordered by module weight, its name alphanumerically, dependencies, and its N value.
  4. Then the post-update hooks are discovered.
  5. If there are post-update hooks, drupal_flush_all_caches is invoked to reset all of Drupal's caches. 
  6. Each post-update hook is then added to the batch
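The steps above can be sketched as a simplified model in plain PHP. The function name build_core_update_operations and the example hook names are hypothetical, and the sketch assumes the update hooks arrive already sorted and dependency-resolved. It only illustrates the ordering and is not the code in DbUpdateController.

```php
<?php

/**
 * Simplified model of the batch order described above.
 *
 * Hypothetical helper: takes already-sorted update hook names and
 * post-update hook names and returns the operations in execution order.
 */
function build_core_update_operations(array $updates, array $post_updates): array {
  $operations = [];

  // Steps 1-3: every hook_update_N goes into the batch first.
  foreach ($updates as $function) {
    $operations[] = $function;
  }

  // Steps 4-6: if any post-update hook exists, flush all caches once,
  // then queue each post-update hook.
  if (!empty($post_updates)) {
    $operations[] = 'drupal_flush_all_caches';
    foreach ($post_updates as $function) {
      $operations[] = $function;
    }
  }

  return $operations;
}

$operations = build_core_update_operations(
  ['mymodule_update_9001', 'mymodule_update_9002'],
  ['mymodule_post_update_resave_settings']
);
print implode("\n", $operations) . "\n";
```

In this model the cache flush appears only when at least one post-update hook is present, and it always sits between the last hook_update_N and the first hook_post_update_NAME.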

Is that it? Is the only difference that Drupal's caches are flushed before hook_post_update_NAME is invoked? We are already not getting cached values back, but defining a hook_post_update_NAME, empty or not, ensures all of Drupal's caches are flushed. (I'm not a fan of dumping all your caches in a deployment. I know it's standard practice, but I prefer preserving caches and crafting appropriate cache invalidations when needed for deployments.)

Let's check out Drush. Almost no one uses /update.php; most people run drush updb or drush deploy (which invokes updb) instead. Drush allows commands to specify the kernel when executing, which can be drupal (standard), update, or installer. The database update command leverages the update kernel, which uses \Drush\Drupal\UpdateKernel that extends the UpdateKernel. That means the service container still has the cache factory decorated, and we are getting NULL reads from the cache but still deletions when needed.

The logic all occurs in \Drush\Commands\core\UpdateDBCommands::updateBatch.

It also builds a batch in a similar fashion:

  1. Get the list of update hooks
  2. Resolve the dependencies and sort their order
  3. Push into the batch
  4. Detect if there are post-updates
  5. If there are, and update hooks were also added to the batch, run a cache rebuild
  6. Push each post-update hook into the batch
  7. Execute the batch

So Drush is a little different. The cache is only cleared if hook_update_N implementations were also processed. This breaks assumptions in Drupal core as documented here: [policy, docs] Use post updates for empty 'clear the cache' updates. An empty hook_post_update_NAME should be enough to flush Drupal's caches without a previous hook_update_N.
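The difference between the two runners comes down to one condition. The sketch below models it with two hypothetical functions, based only on the behavior described in this post and not copied from either code base. Drush's behavior may differ between versions.

```php
<?php

/**
 * Model of update.php: flush whenever any post-update hook is pending.
 * Hypothetical function, based on the description in this post.
 */
function core_flushes_before_post_updates(int $update_count, int $post_update_count): bool {
  return $post_update_count > 0;
}

/**
 * Model of drush updb: flush only if post-update hooks are pending AND
 * at least one hook_update_N was also queued.
 * Hypothetical function, based on the description in this post.
 */
function drush_flushes_before_post_updates(int $update_count, int $post_update_count): bool {
  return $post_update_count > 0 && $update_count > 0;
}

// The case that matters: an empty "clear the cache" post-update hook
// with no hook_update_N queued alongside it.
$core_result = core_flushes_before_post_updates(0, 1);
$drush_result = drush_flushes_before_post_updates(0, 1);

var_dump($core_result, $drush_result);
```

With zero update hooks and one post-update hook, the core model reports a flush and the Drush model does not, which is exactly the case the core policy relies on.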

So what do we use?!?!

I have absolutely no idea. In the end, they're the same. The split just gives us a chance to take a layered approach. Do you want to run something after all contributed projects have run their schema changes, without implementing hook_update_dependencies? Go ahead, use hook_post_update_NAME.

One thing to note: the execution order of hook_post_update_NAME is not as deterministic as hook_update_N.

  • hook_update_N: discovered in \Drupal\Core\Update\UpdateHookRegistry::getAvailableUpdates and sorted by module weight and the numeric value of N
  • hook_post_update_NAME: discovered in \Drupal\Core\Update\UpdateRegistry::getAvailableUpdateFunctions and then they are just sorted with the sort function, so alphabetically. That means a module with two post-update hooks will not necessarily execute them in the order they are defined.
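A quick way to see the ordering difference is to run PHP's sort on some sample names. The function names and update numbers below are hypothetical, and this only demonstrates the sorting behavior, not the registry classes themselves.

```php
<?php

// Hypothetical post-update hooks, listed in the order the developer wrote
// them in mymodule.post_update.php: migrate first, then clean up.
$post_updates = [
  'mymodule_post_update_migrate_settings',
  'mymodule_post_update_cleanup_settings',
];

// A plain sort() orders the names alphabetically, so "cleanup" now comes
// before "migrate" because "c" sorts before "m".
sort($post_updates);

// Hypothetical hook_update_N numbers, discovered in arbitrary order.
$update_numbers = [9010, 9002, 9001];

// Sorting numerically keeps the intended sequence 9001, 9002, 9010.
sort($update_numbers, SORT_NUMERIC);

print implode("\n", $post_updates) . "\n";
print implode("\n", $update_numbers) . "\n";
```

For hook_update_N the number is the order, so the author's intent survives sorting. For hook_post_update_NAME the name alone decides, whatever order the functions appear in the file.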

Well, that was hopefully an exciting ride. My little question to myself quickly escalated into this blog post.
