
Can we use concurrency to speed up streamed BigPipe responses in Drupal?


I have been reading The Pragmatic Programmer and just finished the Concurrency chapter. At the same time, I found Nick Gavalas's blog post "parallelizing" php and keeping it simple, which recounts his time at Facebook when they developed the concept of BigPipe and pagelets. In the post, he explains how they parallelized PHP in the only way possible: making multiple requests to the web server. Instead of the main thread processing each pagelet (we call them placeholders in Drupal), they would make an HTTP request to their own application to run that same rendering in a different process, then take the result of that HTTP request and stream it using the BigPipe pattern. The timing was perfect and gave me an idea.

Could we make Drupal's response time even faster by adding concurrency to streamed BigPipe responses using new PHP language features? PHP 8.1 introduced Fibers as a form of lightweight concurrency. Before I knew it, I had dived right into experimenting and trying to learn everything I could about Fibers.
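
Before going any further, here is a minimal standalone example (plain PHP 8.1+, not Drupal code) of how a Fiber behaves: the callback runs when start() is called, pauses itself with Fiber::suspend(), and continues when the caller invokes resume().

$fiber = new \Fiber(function (): void {
    echo "inside the fiber\n";
    // Pause here and hand control back to the caller.
    \Fiber::suspend();
    echo "back inside the fiber\n";
});

$fiber->start();  // Prints "inside the fiber", returns at the suspend point.
echo "back in the main thread\n";
$fiber->resume(); // Prints "back inside the fiber".
var_dump($fiber->isTerminated()); // bool(true)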

I enabled the big_pipe_test test module for my experiments and used its /big_pipe_test route. This test route has a handful of placeholders that are streamed using BigPipe. Seven test cases are added to the page, two of which throw exceptions. It is a perfect sample for running experiments against the BigPipe response process.
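
One note if you want to reproduce this: test modules are hidden from the module list by default, so enabling big_pipe_test needs an extra settings.php flag first. Assuming Drush is available, something like the following should do it.

// In settings.php: let Drupal's extension discovery find test modules.
$settings['extension_discovery_scan_tests'] = TRUE;

// Then enable the module, e.g.: drush pm:enable big_pipe_test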

How does BigPipe even work?

Without diving deep into the internals of Drupal's render pipeline and its placeholders, here are the important parts of how BigPipe works (a sketch of the streamed markup follows the list). If you are curious about the full details, check out one of Wim Leers's talks about BigPipe.

  • HtmlResponseBigPipeSubscriber subscribes to replace the normal HtmlResponse response object with a BigPipeResponse response object.
  • The BigPipeResponse response object sets the appropriate headers so the client understands it is a streamed response.
  • The BigPipeResponse response object delegates to the BigPipe render service to chunk and stream the response content.
  • BigPipe sends the response content that includes placeholders. Placeholder content is then rendered and streamed.
  • JavaScript reads the incoming streamed content and places it into the appropriate placeholder.
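
To make the streaming step concrete, here is roughly what the markup looks like (simplified; the exact attributes and payload vary by Drupal version). The initial response contains an empty placeholder, and a replacement chunk for it is streamed later:

<span data-big-pipe-placeholder-id="callback=...&args=..."></span>

<script type="application/vnd.drupal-ajax" data-big-pipe-replacement-for-placeholder-with-id="callback=...&args=...">
    [{"command":"insert","data":"<div>rendered placeholder content</div>"}]
</script>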

To minimize hacks, I tried to keep my changes within \Drupal\big_pipe\Render\BigPipe::sendContent, around the calls to \Drupal\big_pipe\Render\BigPipe::sendPlaceholders. The current code looks like the following: the pre-body (the content with placeholders) is sent, then the placeholders are rendered and streamed, and finally the post-body (</body>) is sent.

$this->sendPreBody($pre_body, $nojs_placeholders, $cumulative_assets);
$this->sendPlaceholders($placeholders, $placeholder_order, $cumulative_assets);
$this->sendPostBody($post_body);

The trick is to add concurrency to sendPlaceholders so that we can render multiple placeholders simultaneously, which also removes the need to care about their rendering order ($placeholder_order).

Using Fibers to render placeholders with BigPipe

I created a Fiber for each placeholder that needed to be rendered. There is a lot of code in sendPlaceholders, so instead of refactoring it, I called it once per placeholder, controlled by the values in $placeholder_order. I didn't know if this would work, but it was the best starting point.

Here is my adjusted code for invoking sendPlaceholders. A Fiber is constructed with a callback that is invoked when the Fiber is started. I create one Fiber per placeholder, each with a callback that invokes sendPlaceholders for just that placeholder ID.

$this->sendPreBody($pre_body, $nojs_placeholders, $cumulative_assets);

/** @var \Fiber[] $fibers */
$fibers = [];
foreach ($placeholder_order as $placeholder_id) {
    $fibers[] = new \Fiber(function () use (
        $placeholders,
        $placeholder_id,
        $cumulative_assets
    ) {
        $this->sendPlaceholders($placeholders, [$placeholder_id], $cumulative_assets);
    });
}

// The Fiber loop below runs here, so that every placeholder is streamed
// before the closing </body> tag is sent.

$this->sendPostBody($post_body);

I then loop over the Fibers, starting them, resuming any that have suspended, and removing them once they have terminated.

while (count($fibers) > 0) {
    foreach ($fibers as $key => $fiber) {
        try {
            if (!$fiber->isStarted()) {
                // Run the Fiber until it terminates or suspends itself.
                $fiber->start();
            }
            elseif ($fiber->isSuspended()) {
                $fiber->resume();
            }
            elseif ($fiber->isTerminated()) {
                // The placeholder has been rendered and streamed.
                unset($fibers[$key]);
            }
        }
        catch (\Throwable $e) {
            // A placeholder that throws is dropped from the queue.
            unset($fibers[$key]);
        }
    }
}

I did some profiling using Blackfire, and the difference was minimal, which is good: it didn't make BigPipe any slower. But I had no idea if it was truly working, so I turned on Xdebug and put breakpoints on the two unset lines. I was pleasantly surprised. I fully expected the values of $key to be sequential, as we'd need more refactoring to achieve true concurrency. But I was wrong. The first key removed was 5, and then 6: the two placeholder test cases that throw exceptions, whose Fibers were removed first.

In the screenshot below of PhpStorm with Xdebug running, you can see that $fibers is an array of eight Fiber instances. The $key value is 5, and it is the first Fiber to be removed, meaning it finished processing before any of the other Fibers.

I guess it did work!

Takeaways

I ran out of time for more thorough testing, or to produce usable code for others to try, but this seems like an extremely promising experiment.

This is going to be a minor improvement for most sites. But it could be a significant one for sites with dynamic placeholder content that pauses or waits on I/O.
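
To illustrate why, here is a standalone sketch (again plain PHP, not Drupal code; the placeholder IDs and delays are made up) of the same scheduler loop driving Fibers that suspend while they "wait" on I/O. The loop keeps making progress on whichever placeholder is ready instead of blocking on the slowest one.

// Simulated placeholders, each "waiting" on remote I/O for some duration.
$placeholders = [
    'recent_orders' => 0.3,
    'subscription_status' => 0.1,
    'invoices' => 0.2,
];

$fibers = [];
foreach ($placeholders as $id => $delay) {
    $fibers[$id] = new \Fiber(function () use ($id, $delay): void {
        $ready_at = microtime(TRUE) + $delay;
        // Suspend instead of blocking while the simulated I/O completes.
        while (microtime(TRUE) < $ready_at) {
            \Fiber::suspend();
        }
        echo "streamed placeholder: $id\n";
    });
}

while (count($fibers) > 0) {
    foreach ($fibers as $id => $fiber) {
        if (!$fiber->isStarted()) {
            $fiber->start();
        }
        elseif ($fiber->isSuspended()) {
            $fiber->resume();
        }
        if ($fiber->isTerminated()) {
            unset($fibers[$id]);
        }
    }
}

// Prints in readiness order: subscription_status, invoices, recent_orders.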

I was once on a project with a dashboard of five blocks populated with remote data from a subscription management system. We leveraged BigPipe to stream the content instead of AJAX requests and client-side rendering from a JavaScript framework: pure Drupal server-side rendering. The end-user experience was the same, and we didn't need any extra development effort. I want to build something similar and see how quickly the results stream onto the page using Fibers and BigPipe.

Recreating "parallelization" with multiple requests

I want to implement the approach from Nick Gavalas's blog post, where BigPipe would provide a controller for rendering placeholder data, and the main BigPipe request would batch multiple async HTTP requests to handle the rendering. Guzzle supports concurrent requests, so the difficulty would be in the placeholder-rendering controller that returns each placeholder's HTML to the main request thread.
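
For reference, here is a very rough sketch of that idea. The /big-pipe/render-placeholder route is hypothetical (no such controller exists today), and a real implementation would also need to forward the session cookie so placeholders render for the right user.

use GuzzleHttp\Client;
use GuzzleHttp\Promise\Utils;

$client = new Client(['base_uri' => 'https://example.com']);

// Fire one async sub-request per placeholder; they run concurrently.
$promises = [];
foreach ($placeholder_order as $placeholder_id) {
    $promises[$placeholder_id] = $client->getAsync('/big-pipe/render-placeholder', [
        'query' => ['id' => $placeholder_id],
    ]);
}

// Wait for all of them; each response body would be the placeholder's
// rendered HTML, ready to stream to the client as a BigPipe chunk.
foreach (Utils::settle($promises)->wait() as $placeholder_id => $result) {
    if ($result['state'] === 'fulfilled') {
        echo $result['value']->getBody();
    }
}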

 

Note: Links to products may be Amazon or other affiliate links, which means I will earn a commission if you click through and buy something.