
Using ReactPHP to consume data from an HTTP API


This is the first of two blog posts detailing how to build a middleware leveraging ReactPHP to consume data from an API, normalize it, and then push that data into Drupal through JSON:API. In this example we will be grabbing data from the PokéAPI. In the second blog post, we will take the data and create Pokémon nodes on a Drupal site.

The purpose of the middleware is to avoid putting this logic into the Drupal codebase. Drupal has an amazingly robust Migration API for consuming and processing data to create content. Every Drupal site I have ever worked on, since Drupal 8's beta days, has used the migration system to import content through CSVs or JSON from remote APIs. However, that means you have to be a Drupal developer to understand it, or go through the learning curve if you are not. A generic middleware means that any team can control and maintain the code. All you need to know is some PHP.

Why ReactPHP and not XYZ?

But why ReactPHP? ReactPHP provides a low-level library for event-driven applications based around its event-loop library. If you're familiar with Rust, it is like the Tokio runtime. Node.js has an event loop built into its runtime.

Consuming data from a remote API is time-consuming, especially if the process is synchronous:

  • Make a request to the API
  • Wait for the request to finish
  • Parse the response
  • Handle the response data
  • Repeat

A scraper like this sends plenty of HTTP requests and handles just as many responses. Often you enter the API at a collection of resources and then must fetch additional information about each of those resources. With ReactPHP we can process each request in a non-blocking fashion, speeding up the entire process.
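To make the difference concrete, here is a minimal sketch that uses event loop timers as a stand-in for three slow API calls. It makes no real HTTP requests: the 0.2-second delay and the FAKE_LATENCY name are assumptions for illustration, and it assumes react/event-loop has been installed with Composer (the loop is created the same way as later in this post).

```php
<?php declare(strict_types=1);

require __DIR__ . '/vendor/autoload.php';

// Hypothetical stand-in for network latency; not a real HTTP call.
const FAKE_LATENCY = 0.2;

$loop = React\EventLoop\Factory::create();
$finished = [];
$start = microtime(true);

foreach (['bulbasaur', 'ivysaur', 'venusaur'] as $name) {
    // Each timer is registered immediately; none of them blocks the others.
    $loop->addTimer(FAKE_LATENCY, function () use ($name, &$finished) {
        $finished[] = $name;
        print "Finished {$name}" . PHP_EOL;
    });
}

// Nothing has executed yet. The loop runs until all timers have fired.
$loop->run();

$elapsed = microtime(true) - $start;
// All three "requests" waited at the same time, so the total is close to
// one delay rather than the sum of three.
printf('Total: %.2f seconds' . PHP_EOL, $elapsed);
```

A synchronous version of the same work would sleep three times in a row and take roughly three times as long.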

What does the scraper do?

The scraper will:

  • Go through the paginated /pokemon collection and collect links to each Pokemon resource
  • Fetch the Pokemon resource
  • Fetch the Pokemon's species resource
  • Normalize the data (we don't want all of the raw data)
  • Write the normalized data to a JSON file

We could fetch the data and immediately send it to the Drupal API. But that would create one huge process that could fail if either of the APIs goes down. It also puts all concerns into one process. In this blog post we are dumping the normalized data into a JSON file. What if we streamed it to Apache Kafka and our data pusher streamed it from there?

Let's build it!

I was pretty impressed to see that the entire script is only about 100 lines of code, one-third of that being code to normalize the data about the Pokemon. One of the things to keep in mind when building with ReactPHP is that you will find yourself using a lot of closures for callbacks to be executed by the event loop.

The Pokemon resource on the PokeAPI returns a collection of Pokemon resource identifiers. What do I mean by that? It's an array of objects specifying the resource name and a URL for retrieving it.

{
    "count": 1050,
    "next": "https://pokeapi.co/api/v2/pokemon/?offset=3&limit=3",
    "previous": null,
    "results": [
        {
            "name": "bulbasaur",
            "url": "https://pokeapi.co/api/v2/pokemon/1/"
        },
        {
            "name": "ivysaur",
            "url": "https://pokeapi.co/api/v2/pokemon/2/"
        },
        {
            "name": "venusaur",
            "url": "https://pokeapi.co/api/v2/pokemon/3/"
        }
    ]
}

We will have to iterate through each page of the collection and fetch each Pokemon resource.
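As a quick sketch of what that means in code, the snippet below decodes the sample response shown above and pulls out the two things the scraper cares about: the URL of each Pokemon resource and whether there is another page. It is plain PHP with no ReactPHP involved, and it assumes PHP 7.3 or later for JSON_THROW_ON_ERROR.

```php
<?php declare(strict_types=1);

// Trimmed copy of the sample collection response shown above.
$json = <<<'JSON'
{
    "count": 1050,
    "next": "https://pokeapi.co/api/v2/pokemon/?offset=3&limit=3",
    "previous": null,
    "results": [
        {"name": "bulbasaur", "url": "https://pokeapi.co/api/v2/pokemon/1/"},
        {"name": "ivysaur", "url": "https://pokeapi.co/api/v2/pokemon/2/"},
        {"name": "venusaur", "url": "https://pokeapi.co/api/v2/pokemon/3/"}
    ]
}
JSON;

// Decode to objects (stdClass), which is how the scraper reads responses.
$body = json_decode($json, false, 512, JSON_THROW_ON_ERROR);

// The URL of each Pokemon resource we still need to fetch.
$urls = array_map(static function (object $result): string {
    return $result->url;
}, $body->results);

// A null "next" link means this was the last page.
$hasMore = $body->next !== null;

print implode(PHP_EOL, $urls) . PHP_EOL;
print 'More pages: ' . ($hasMore ? 'yes' : 'no') . PHP_EOL;
```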

Use a class to wrangle your closures

I first started writing this in a single file and found myself writing some ridiculous closures that I had to keep adding use statements for, to have access to the loop and HTTP client, amongst any other variables.

$loop->futureTick(static function () use ($loop, $client, $url) {
    // Performs some logic on next loop tick.
});

It is much easier to use a class. Your closures can then access the class properties and things are much more manageable.

public function tickOperation()
{
    $this->loop->futureTick(function () {
        $this->loop->addTimer(2, function () {
            $this->client->get($this->url)->then(function (ResponseInterface $response) {
                var_export((string) $response->getBody());
            });
        });
    });
}

No more trying to manage passing things around. They're just accessible in the class.

In this example, we will create a PokemonMaster class which will contain all of our event loop logic. It will enter the PokeAPI at https://pokeapi.co/api/v2/pokemon/ and go through each page until all Pokemon have been processed.

Create the project and install dependencies

Before we can do anything, we need to set up the project with Composer and get our dependencies. We will need the following packages:

  • react/event-loop: This is the main library. Our other dependencies require it, but I like to be explicit on my dependencies.
  • react/http: This library allows for asynchronous concurrent requests.
  • clue/ndjson-react: Since we're working in an asynchronous environment and processing a lot of data, we have to stream our data to the artifact file. The NDJSON format makes this a lot easier by supporting newline-delimited JSON files.

mkdir pokeapi-middleware
cd pokeapi-middleware
composer require react/event-loop react/http clue/ndjson-react

Edit the generated composer.json file to register the PSR-4 autoload namespace for our class. After saving it, run composer dump-autoload so that the autoloader picks up the new namespace.

{
    "require": {
        "react/event-loop": "^1.1",
        "react/http": "^1.1",
        "clue/ndjson-react": "^1.1"
    },
    "autoload": {
        "psr-4": {
            "PokeAPiMiddleware\\": "src"
        }
    }
}

💪 Now we can write some code.

The execution script

First, we will write the script that will be executed to scrape the API. That way you can give it a few tries along the way and do some experimentation. Because I know I am the kind of person who starts reading a blog, writes some code from the blog, and then ends up in a ton of experiments. So I want to make sure you can do the same.

Create a PHP file catch-em.php in the root of your project. It does a few things for us:

  • Loads the autoloader generated by Composer
  • Creates an event loop
  • Instantiates our PokemonMaster class
  • Runs the API scraper

It's really simple, but that is because all of our logic is in our class.

<?php declare(strict_types=1);

use PokeAPiMiddleware\PokemonMaster;

require __DIR__ . '/vendor/autoload.php';

$loop = React\EventLoop\Factory::create();

$catcher = new PokemonMaster($loop);
$catcher->catchEmAll();

$loop->run();

When using ReactPHP, your application revolves around creating an event loop and running it. You should always save calling $loop->run(); for the last line of your code. The PHP script will continue executing so long as the event loop has ticks queued for execution. If you run the loop before anything is registered, nothing will happen.
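Here is a small sketch of that behavior, assuming react/event-loop is installed with Composer. The $log array is only there for illustration. It shows that queuing a tick does not run it, that work queued from inside a tick keeps the loop alive, and that run() returns once nothing is left.

```php
<?php declare(strict_types=1);

require __DIR__ . '/vendor/autoload.php';

$loop = React\EventLoop\Factory::create();
$log = [];

// Registering a tick only queues the callback; nothing runs yet.
$loop->futureTick(function () use (&$log, $loop) {
    $log[] = 'first tick';
    // Work queued from inside a tick keeps the loop alive for another pass.
    $loop->futureTick(function () use (&$log) {
        $log[] = 'second tick';
    });
});

$log[] = 'before run';

// run() blocks here until the loop has no more work queued.
$loop->run();

$log[] = 'after run';

// Prints: before run, first tick, second tick, after run (one per line).
print implode(PHP_EOL, $log) . PHP_EOL;
```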

You'll be able to scrape the API by running the script with:

php catch-em.php

The PokemonMaster: asynchronous and concurrent data fetching

As you saw from the execution script, we need a PokemonMaster class that takes the event loop as a constructor argument and has a catchEmAll method. Create a src directory and the PokemonMaster.php file inside of it (src/PokemonMaster.php).

Let's create the base scaffolding of the code. We'll define our class and its constructor, along with our entry point method. The constructor takes a loop as its parameter and sets it as a property. We then also construct a new browser object to act as our API client.

<?php declare(strict_types=1);

namespace PokeAPiMiddleware;

use React\EventLoop\LoopInterface;
use React\Http\Browser;

final class PokemonMaster
{
    private $loop;
    private $client;

    public function __construct(LoopInterface $loop)
    {
        $this->loop = $loop;
        $this->client = new Browser($loop);
    }

    public function catchEmAll(): void
    {

    }
}

Our catchEmAll method will add our first tick to the event loop so that things get rolling once the loop runs. It kicks off a series of requests that walk through the PokeAPI's list of Pokemon. Since that list is paginated, we will put the fetching logic in its own method, fetchPokemonList. That way we can call the method for each page of results.

    private function fetchPokemonList(string $url)
    {
        $this->loop->futureTick(function () use ($url) {
            print "\033[32m[http]\033[0m Fetching {$url}" . PHP_EOL;
            $this->client
            ->get($url)
            ->then(function (ResponseInterface $response) {
                $body = \json_decode((string)$response->getBody());

                if ($body->next !== null) {
                    $this->fetchPokemonList($body->next);
                }

                foreach ($body->results as $result) {
                    // Process the results!
                }
            });
        });
    }

    public function catchEmAll(): void
    {
        $this->fetchPokemonList('https://pokeapi.co/api/v2/pokemon/');
    }

Like every API should, the PokeAPI returns next and previous links for collection resources.

{
    "count": 1050,
    "next": "https://pokeapi.co/api/v2/pokemon/?offset=40&limit=20",
    "previous": "https://pokeapi.co/api/v2/pokemon/?offset=0&limit=20",
    "results": [
    ]
}

That means we can call fetchPokemonList with the initial API URL and keep following the next links until the API tells us there is nothing else to process.
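The traversal itself does not depend on ReactPHP. As a rough sketch, here is the same "follow next until it is null" logic written as a plain synchronous loop. The $pages array is a hypothetical in-memory stand-in for the API, with made-up short URLs as keys, so the example runs without any network access.

```php
<?php declare(strict_types=1);

// Hypothetical in-memory stand-in for the paginated API, keyed by URL.
$pages = [
    '/pokemon?offset=0' => [
        'next' => '/pokemon?offset=2',
        'results' => ['bulbasaur', 'ivysaur'],
    ],
    '/pokemon?offset=2' => [
        'next' => '/pokemon?offset=4',
        'results' => ['venusaur', 'charmander'],
    ],
    '/pokemon?offset=4' => [
        'next' => null,
        'results' => ['charmeleon'],
    ],
];

$visited = [];
$collected = [];

// Start at the first page and follow "next" until the API says we are done.
$url = '/pokemon?offset=0';
while ($url !== null) {
    $visited[] = $url;
    $body = $pages[$url];

    foreach ($body['results'] as $name) {
        $collected[] = $name;
    }

    // A null "next" link ends the loop.
    $url = $body['next'];
}

print count($visited) . ' pages, ' . count($collected) . ' results' . PHP_EOL;
```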

At this point, we have not really gained much by using ReactPHP over a loop with Guzzle. But that will change once we start processing the results. If there is a next link, we add a new tick to fetch the next page. Then we loop through the results of the current page and retrieve each Pokemon resource.

To do this, we will create a fetchPokemon method. 

    private function fetchPokemonList(string $url)
    {
        $this->loop->futureTick(function () use ($url) {
            print "\033[32m[http]\033[0m Fetching {$url}" . PHP_EOL;
            $this->client
            ->get($url)
            ->then(function (ResponseInterface $response) {
                $body = \json_decode((string)$response->getBody());

                if ($body->next !== null) {
                    $this->fetchPokemonList($body->next);
                }

                foreach ($body->results as $result) {
                    // Fetch the Pokemon resource
                    $this->fetchPokemon($result);
                }
            });
        });
    }

    private function fetchPokemon(object $result)
    {
        $this->loop->futureTick(function () use ($result) {
            print "\033[32m[http]\033[0m Fetching {$result->url}" . PHP_EOL;
            $this->client->get($result->url)->then(function (ResponseInterface $response) {
                $pokemon = \json_decode((string)$response->getBody());
                // Normalize the resource to data we care about.
            });
        });
    }

We call fetchPokemon for each result. The fetchPokemon method adds a tick to the event loop to queue an HTTP request for the Pokemon resource. This is where ReactPHP's non-blocking event loop and its HTTP library's support for concurrent requests pay off:

  • The API client leverages promises, so our script is not blocked on waiting for a request to complete before other ticks are handled.
  • We are adding a new tick for the new Pokemon collection page before we process the results of the response.
  • Each result is added as its own tick to the event loop to be processed after other ticks on the loop.

This allows us to begin having concurrent requests and processing without being blocked on each individual HTTP request (it took me a few tries to understand that, and even to write it). That is one reason I left in the debugging print statements: I find it cool to watch the order of things.

The PokemonMaster: streaming normalized data to an artifact

Great! We have data. Now, what? We can write the data to a JSON file for later parsing. If we wanted to, we could have a class property that was an array of the data to be processed and write it all at the end. But that would use a lot of memory, and honestly is not as fun. We can use streams to write our data to a JSON file to use less memory and also handle our script's concurrent nature.

That's where the NDJSON format comes in handy. Instead of requiring a root array object in our JSON file, we can essentially just append new rows to the file. 
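To show what "appending rows" looks like without any library, here is a minimal sketch in plain PHP. It uses an in-memory stream as a stand-in for the output file, and the two sample rows are made up for illustration. It is not how clue/ndjson-react is implemented, only the shape of the format.

```php
<?php declare(strict_types=1);

// An in-memory stream stands in for the pokemon.json file.
$stream = fopen('php://memory', 'w+b');

$rows = [
    ['name' => 'bulbasaur', 'types' => ['grass', 'poison']],
    ['name' => 'squirtle', 'types' => ['water']],
];

// Writing: one JSON document per line, appended in whatever order rows arrive.
foreach ($rows as $row) {
    fwrite($stream, json_encode($row, JSON_UNESCAPED_SLASHES) . "\n");
}

// Reading: each line decodes on its own, so the file never has to be
// loaded and parsed as a whole.
rewind($stream);
$decoded = [];
while (($line = fgets($stream)) !== false) {
    $decoded[] = json_decode($line, true);
}
fclose($stream);

print count($decoded) . ' rows read' . PHP_EOL;
```

Because every line is a complete JSON document, concurrent callbacks can each write their own row without coordinating on an opening or closing bracket.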

In the constructor, we will create a writable stream and pass it to the NDJSON library's Encoder. We'll set the encoder as a class property so that we can access it in our closures.

final class PokemonMaster
{
    private $loop;
    private $client;
    private $encoder;

    public function __construct(LoopInterface $loop)
    {
        $this->loop = $loop;
        $this->client = new Browser($loop);

        $stream = new WritableResourceStream(fopen('pokemon.json', 'wb'), $this->loop);
        $this->encoder = new Encoder($stream, JSON_UNESCAPED_SLASHES);
    }

Hat tip to Christian Lück (the ReactPHP maintainer) for showing me the JSON_UNESCAPED_SLASHES option!

Now we can update fetchPokemon to write data to our JSON stream. Instead of saving the entire payload, we will just save the name, order, and types properties.

    private function fetchPokemon(object $result)
    {
        $this->loop->futureTick(function () use ($result) {
            print "\033[32m[http]\033[0m Fetching {$result->url}" . PHP_EOL;
            $this->client->get($result->url)->then(function (ResponseInterface $response) {
                $pokemon = \json_decode((string)$response->getBody());
                $normalized = [
                    'name' => $pokemon->name,
                    'order' => $pokemon->order,
                    'types' => array_map(static function (object $type) {
                        return $type->type->name;
                    }, $pokemon->types),
                ];
                $this->encoder->write($normalized);
            });
        });
    }
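If you want to try the normalization step on its own, the sketch below pulls the same mapping out into a function and runs it against an abbreviated payload. The normalizePokemon helper is hypothetical (it is not part of the class above), and the sample JSON only includes the fields the mapping reads.

```php
<?php declare(strict_types=1);

// Abbreviated payload: only the fields the normalizer reads are included.
$json = <<<'JSON'
{
    "name": "bulbasaur",
    "order": 1,
    "types": [
        {"slot": 1, "type": {"name": "grass"}},
        {"slot": 2, "type": {"name": "poison"}}
    ]
}
JSON;

// Hypothetical helper: the same mapping as in fetchPokemon, extracted so it
// can be exercised without the event loop or an HTTP request.
function normalizePokemon(object $pokemon): array
{
    return [
        'name' => $pokemon->name,
        'order' => $pokemon->order,
        // Each entry nests the type name under ->type->name.
        'types' => array_map(static function (object $type): string {
            return $type->type->name;
        }, $pokemon->types),
    ];
}

$normalized = normalizePokemon(json_decode($json));

// Prints: {"name":"bulbasaur","order":1,"types":["grass","poison"]}
print json_encode($normalized, JSON_UNESCAPED_SLASHES) . PHP_EOL;
```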

That's it!

The completed file

Your completed file should look like the following:

<?php declare(strict_types=1);

namespace PokeAPiMiddleware;

use Clue\React\NDJson\Encoder;
use Psr\Http\Message\ResponseInterface;
use React\EventLoop\LoopInterface;
use React\Http\Browser;
use React\Stream\WritableResourceStream;

final class PokemonMaster
{
    private $loop;
    private $client;
    private $encoder;

    public function __construct(LoopInterface $loop)
    {
        $this->loop = $loop;
        $this->client = new Browser($loop);

        $stream = new WritableResourceStream(fopen('pokemon.json', 'wb'), $this->loop);
        $this->encoder = new Encoder($stream, JSON_UNESCAPED_SLASHES);
    }

    private function fetchPokemonList(string $url)
    {
        $this->loop->futureTick(function () use ($url) {
            print "\033[32m[http]\033[0m Fetching {$url}" . PHP_EOL;
            $this->client
            ->get($url)
            ->then(function (ResponseInterface $response) {
                $body = \json_decode((string)$response->getBody());

                if ($body->next !== null) {
                    $this->fetchPokemonList($body->next);
                }

                foreach ($body->results as $result) {
                    $this->fetchPokemon($result);
                }
            });
        });
    }

    private function fetchPokemon(object $result)
    {
        $this->loop->futureTick(function () use ($result) {
            print "\033[32m[http]\033[0m Fetching {$result->url}" . PHP_EOL;
            $this->client->get($result->url)->then(function (ResponseInterface $response) {
                $pokemon = \json_decode((string)$response->getBody());
                $normalized = [
                    'name' => $pokemon->name,
                    'order' => $pokemon->order,
                    'types' => array_map(static function (object $type) {
                        return $type->type->name;
                    }, $pokemon->types),
                ];
                $this->encoder->write($normalized);
            });
        });
    }

    public function catchEmAll(): void
    {
        $this->fetchPokemonList('https://pokeapi.co/api/v2/pokemon/');
    }
}

Now we can give it a try! Run the script:

php catch-em.php

Now check your created JSON file. Your results will vary, but here is an example:

{"name": "caterpie", "order": 14, "types": ["bug"]}
{"name": "butterfree", "order": 16, "types": ["bug", "flying"]}
{"name": "charizard", "order": 7, "types": ["fire", "flying"]}
{"name": "kakuna", "order": 18, "types": ["bug", "poison"]}
{"name": "ivysaur", "order": 2, "types": ["grass", "poison"]}
{"name": "bulbasaur", "order": 1, "types": ["grass", "poison"]}
{"name": "squirtle", "order": 10, "types": ["water"]}

Notice that the order field is not sequential. The Pokemon collection resource returns its results by the order property in ascending order. The fact that we have out-of-sequence records means we had a script working asynchronously with concurrent requests!
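If whatever consumes the file needs the records back in sequence, it can sort them after reading. The following is a minimal sketch of that idea in plain PHP, using three of the example lines above as inline input instead of reading pokemon.json from disk.

```php
<?php declare(strict_types=1);

// A few lines in the arrival order from the example output above.
$ndjson = <<<'NDJSON'
{"name": "charizard", "order": 7, "types": ["fire", "flying"]}
{"name": "ivysaur", "order": 2, "types": ["grass", "poison"]}
{"name": "bulbasaur", "order": 1, "types": ["grass", "poison"]}
NDJSON;

// Decode one record per non-empty line.
$records = [];
foreach (explode("\n", $ndjson) as $line) {
    if (trim($line) !== '') {
        $records[] = json_decode($line, true);
    }
}

// Restore a predictable sequence by sorting on the "order" field.
usort($records, static function (array $a, array $b): int {
    return $a['order'] <=> $b['order'];
});

$names = array_column($records, 'name');

// Prints: bulbasaur, ivysaur, charizard
print implode(', ', $names) . PHP_EOL;
```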

But, there is more!

You may have noticed that there isn't much data. We also need to fetch the Species resource for the Pokemon to get things like its name, description, and more. This blog was already complicated and I wanted to provide a more simplified example here.

I have put code for a more robust example on GitHub:

I even kept the Git history, so you can see what it looked like as just a bunch of functions: https://github.com/mglaman/pokeapi-middleware/commit/6e29b68b1ce49d1f84…

😱

    $client
        ->get($_ENV['POKEAPI_URL'] . '/pokemon?' . http_build_query([
            'offset' => $offset,
            'limit' => RESULT_LIMIT,
        ]))
        ->then(static function (ResponseInterface $response) use ($offset, $loop, $client, $dest) {
            $body = \json_decode((string)$response->getBody());
            if ($body->next !== null) {
                addApiFetchTimer($offset + RESULT_LIMIT, $loop, $client, $dest);
            }
            foreach ($body->results as $result) {
                $loop->futureTick(static function () use ($result, $loop, $client, $dest) {
                    doFetchPokemon($result->url, $loop, $client, $dest);
                });
            }
        });