Cavalcade: WordPress Jobs at Scale

At the heart of every web application is a basic process: receive a request, return a response. With the right architecture, this process is able to serve everything from the smallest site all the way up to the very largest. Once sites start getting more complex, there’s quickly a need for two separate-yet-related abilities: scheduled tasks, and asynchronous processing of long-running tasks.

WordPress includes the ability to do both of these through a system called wp-cron. It offers scheduled and repeating tasks (just like cron), and can be used for asynchronous processing. However, it has serious issues when running at scale, like unreliability, sequential processing, and poor compatibility with multisite. Replacements require complex setup processes, don’t integrate well with WordPress, or don’t scale for real production use.

To fix these problems, we built Cavalcade, a horizontally-scalable WordPress jobs processing solution. We’ve been running Cavalcade in production for almost two years, and it’s also in use on WordPress.org, so we’re confident in its stability and capability.

Diagram showing flow of WordPress jobs to the wp_cavalcade_jobs table, then to the Cavalcade Runner, then fanning out to 4 worker instances

Limitations of wp-cron

Cavalcade originally came out of project requirements from Happytables. We needed to run long-running, scheduled tasks on every site on a massive multisite installation. These tasks could take multiple minutes (as they processed large amounts of data from external APIs), and we needed to ensure they ran every hour without fail.

It was immediately apparent that WordPress’ built-in wp-cron solution wasn’t suitable for the job: it lacks the reliability we needed, and has no support for parallel processing. Internally, the way wp-cron works is by triggering a loopback HTTP request after a page load. This then sequentially processes each “due” job.

foreach ( $crons as $timestamp => $cronhooks ) {
    foreach ( $cronhooks as $hook => $keys ) {
        foreach ( $keys as $k => $v ) {
            /**
             * Fires scheduled events.
             */
            do_action_ref_array( $hook, $v['args'] );
        }
    }
}

WordPress is fundamentally hooked to the HTTP request/response cycle, so wp-cron requires a request to the site to trigger it. This makes it unreliable for low-traffic sites, or for sites that don’t fit the regular traffic pattern (such as a highly-cached web app with push invalidation). The sequential processing is also both slow and prone to breakage: if one task fails, processing stops completely.
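The failure mode described above can be illustrated with a small plain-PHP simulation. This is only a sketch: the three jobs are hypothetical closures, and the per-job try/catch stands in for the isolation that separate worker processes would provide. It is not how wp-cron or Cavalcade is actually implemented.

```php
<?php
// Hypothetical stand-ins for three due jobs; the second one fails.
$jobs = [
	'job_a' => function () { return 'a done'; },
	'job_b' => function () { throw new RuntimeException( 'API timeout' ); },
	'job_c' => function () { return 'c done'; },
];

// Sequential model: one failure ends the whole run, much like a fatal
// error ending the single request that processes every due job.
function run_sequentially( array $jobs ): array {
	$completed = [];
	try {
		foreach ( $jobs as $name => $job ) {
			$job();
			$completed[] = $name;
		}
	} catch ( Throwable $e ) {
		// The run is over; jobs after the failing one never start.
	}
	return $completed;
}

// Isolated model: each job succeeds or fails on its own.
function run_isolated( array $jobs ): array {
	$result = [ 'completed' => [], 'failed' => [] ];
	foreach ( $jobs as $name => $job ) {
		try {
			$job();
			$result['completed'][] = $name;
		} catch ( Throwable $e ) {
			$result['failed'][ $name ] = $e->getMessage();
		}
	}
	return $result;
}

$sequential = run_sequentially( $jobs );
$isolated   = run_isolated( $jobs );
```

In the sequential model only job_a completes, while in the isolated model job_c still runs and the failure of job_b is recorded separately.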

Since wp-cron uses HTTP behind the scenes, it’s also subject to HTTP limitations on request timeouts. While it makes efforts to avoid this (such as calling ignore_user_abort()), hard limits set by the server for safety can kill the processes.

The usual workaround is to run wp-cron via the command line instead, typically by using the system cron daemon to run php wp-cron.php once a minute. However, this is still subject to sequential processing limitations, and doesn’t work with multisite, making it unusable for our needs.

Existing solutions

Before we decided to build Cavalcade, we looked into existing solutions, both in the WordPress world and the wider software community.

We already had an existing solution for this problem (born of WP Remote), called Job Agency. Job Agency included the ability to schedule jobs for immediate or later execution, but did not include capabilities for recurring jobs. In addition, it wasn’t designed to be horizontally scalable across a fleet of servers, and did not have support for multisite.

Our friends at Prospress had a similar problem with scheduled tasks for WooCommerce Subscriptions, and came up with Action Scheduler. This system introduces its own scheduling, but unfortunately it didn’t quite fit our use case: it builds atop wp-cron (and inherits some of the downsides), it includes its own API, and it stores data in custom post types rather than finely tuned custom tables.

In the wider community, we looked into solutions like Gearman, Sidekiq, Resque, and others. While these weren’t integrated with WordPress, they were viable alternatives, due to large communities around them, and strong documentation. However, these systems often required complex setup or databases, such as Redis installs.

It was clear that none of the existing solutions really suited our use cases, and integrating with them would be complex. We wanted to come up with a solution that was simple enough to reason about, had deep integration into WordPress, and would be accessible by the wider community. We wanted to ensure the solution was usable without special setup, as avoiding lock-in has long-term benefits as we evolve our infrastructure. Although none of the existing solutions was a fit, they did act as inspiration, and significantly influenced the design of Cavalcade.

Say hello to Cavalcade

Cavalcade is our solution to scaling both scheduled jobs and asynchronous tasks. It stores jobs in WordPress’ database, runs tasks using wp-cli, and integrates deeply into the existing wp_schedule_event() and wp_schedule_single_event() functions in WordPress. It runs jobs in parallel, so that long-running tasks don’t hold up anything else.

(Why “Cavalcade”? A cavalcade is a procession, or queue. It’s also a lyric in both Fur Eyes by Violent Soho, and Big Parade by The Lumineers, both of which I happened to be listening to when it came time to name the project.)

Cavalcade comes in two parts: a plugin for deep integration into WordPress, and a daemon (Cavalcade Runner) for monitoring the jobs table and launching workers.

The plugin and the runner communicate at only two points: jobs are exchanged via the wp_cavalcade_jobs table in MySQL, and the runner launches wp-cli processes to run them. Due to this decoupling, either piece can be swapped out as needed, such as for integration into an existing system, or use in non-WordPress applications. This also allows separating job processing from your application servers if desired.
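To make the table-based hand-off more concrete, here is a rough sketch of how a runner might decide which jobs to start on each poll. It works on an in-memory array rather than MySQL, and the row fields (id, hook, nextrun, status), the 'waiting' status value, and the function name pick_due_jobs() are all assumptions made for illustration, not Cavalcade’s actual schema or code.

```php
<?php
/**
 * Choose which jobs a runner could start right now (illustrative only).
 *
 * @param array $rows         Rows shaped like a jobs table (hypothetical fields).
 * @param int   $now          Current Unix timestamp.
 * @param int   $free_workers Number of idle worker slots.
 * @return array Rows to hand to workers, oldest due job first.
 */
function pick_due_jobs( array $rows, int $now, int $free_workers ): array {
	// Keep only jobs that are waiting and whose run time has arrived.
	$due = array_filter( $rows, function ( array $row ) use ( $now ) {
		return $row['status'] === 'waiting' && $row['nextrun'] <= $now;
	} );

	// Start with the jobs that have been due the longest.
	usort( $due, function ( array $a, array $b ) {
		return $a['nextrun'] <=> $b['nextrun'];
	} );

	// Never start more jobs than there are idle workers.
	return array_slice( $due, 0, max( 0, $free_workers ) );
}

$rows = [
	[ 'id' => 1, 'hook' => 'send_reports', 'nextrun' => 100, 'status' => 'waiting' ],
	[ 'id' => 2, 'hook' => 'sync_api',     'nextrun' => 50,  'status' => 'waiting' ],
	[ 'id' => 3, 'hook' => 'cleanup',      'nextrun' => 500, 'status' => 'waiting' ],
	[ 'id' => 4, 'hook' => 'sync_api',     'nextrun' => 10,  'status' => 'running' ],
	[ 'id' => 5, 'hook' => 'rebuild',      'nextrun' => 120, 'status' => 'waiting' ],
];

// With two idle workers at time 200, the two longest-overdue waiting jobs are chosen.
$picked_ids = array_column( pick_due_jobs( $rows, 200, 2 ), 'id' );
```

Because the selection depends only on the shared table, the same logic could run on any number of servers, provided each one marks a job as claimed before starting it.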

Why Cavalcade?

Cavalcade is a small, focussed utility with minimal requirements. It doesn’t require any new databases (like Redis), only requires PHP on the server, and can easily be plugged into existing workflows. Your codebase can continue using the regular wp-cron functions (like wp_schedule_event()), and developers don’t need to learn new APIs.

Since Cavalcade plugs into the existing wp-cron APIs, you can enable Cavalcade only in production, reducing local setup requirements for developers. Alternatively, we offer an official extension for Chassis.

Cavalcade is also designed to be horizontally-scalable. For high-traffic sites with multiple app servers, you can install Cavalcade Runner on every app server, and your job processing capacity will scale with your regular traffic. You can also create dedicated job servers running just the Runner, and scale this separately. Combined with AWS EC2 auto-scaling groups, you can use this to fluidly scale job processing power up when needed, including using scheduled scaling if you have large daily jobs.

Autoscaling App Servers and Job Servers

You can also expect the same extensibility with Cavalcade that you have with other WordPress projects. The plugin integrates with WordPress’ existing APIs, and hence the same hooks, while the runner includes a very similar plugin API. You can use this to add custom reporting or logging, change the number of parallel workers, swap out the database configuration, or anything else you need.

Installation

To power up your site with Cavalcade, two installation steps are required: adding the plugin, and setting up the Runner on the system.

Installing the plugin is as easy as installing any WordPress plugin; however, we recommend installing it as an mu-plugin to ensure it cannot accidentally be disabled. You’ll also want to disable the default wp-cron spawning.
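Disabling the default spawning is normally done with WordPress’ DISABLE_WP_CRON constant. A minimal sketch, assuming you set it in wp-config.php (the defined() guard is only there so the snippet is safe if the constant is already set elsewhere):

```php
<?php
// In wp-config.php, above the "That's all, stop editing!" line.
// DISABLE_WP_CRON stops WordPress from spawning its loopback cron
// request on page loads, leaving job execution to an external runner.
if ( ! defined( 'DISABLE_WP_CRON' ) ) {
	define( 'DISABLE_WP_CRON', true );
}
```

Scheduling calls such as wp_schedule_event() keep working with this constant set; it only turns off the page-load trigger.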

Running the Runner only requires the ability to run a PHP script on the command line. We wanted to make it easy to install and use for everyone, without needing to know huge amounts of system administration, or install extra database tools. We also provide Upstart and systemd service scripts for anyone who wants integration into their system. System-level or root access is not required to run the Runner, but we don’t recommend using it on systems you don’t control (such as shared hosting). (Note that the Runner also requires the pcntl extension to be enabled in PHP.)

Adapting WordPress jobs

While Cavalcade uses existing APIs in WordPress and you don’t need to update your code, it’s often worth rethinking how you’re tackling problems. Since Cavalcade runs jobs in parallel, restructuring your existing jobs lets you take advantage of this.

For example, we had a recurring job which sent email reports daily to hundreds of users. This email report needed to be generated once, then customised for each user. The original task generated the report, then looped through the users, and sent each email. This was a slow process, and a failure sending one email could potentially crash the process.

With Cavalcade, we instead have a single scheduled job that generates the report. It then loops through the users, and schedules an asynchronous task for each user. Workers can then process through these users in parallel, and singular failures are logged and reported individually without affecting the rest of the queue.

<?php

// Before
wp_schedule_event( time(), 'daily', 'send_reports' );

add_action( 'send_reports', function () {
	$report = Reports\generate_report();
	foreach ( $report->recipients as $recipient ) {
		$customised = $report->for_recipient( $recipient );
		Email\send( $customised->as_email() );
	}
});


// After
wp_schedule_event( time(), 'daily', 'send_reports' );

add_action( 'send_reports', function () {
	$report = Reports\generate_report();
	foreach ( $report->recipients as $recipient ) {
		wp_schedule_single_event( time(), 'send_single_report', [ $report, $recipient ] );
	}
});

add_action( 'send_single_report', function ( Reports\Report $report, WP_User $recipient ) {
	$customised = $report->for_recipient( $recipient );
	Email\send( $customised->as_email() );
});

We’ve provided a few examples of how to restructure your jobs to take advantage of Cavalcade’s processing. If there’s a use case you’d like to discuss, we’re happy to provide feedback on approaches on GitHub.

Roadmap

We recently added a lightweight plugin system to the Runner, allowing for better flexibility and customisation of the runner. This was based on feedback from our contributors, as well as an internal need for better reporting and analytics.

We’ve since added better reporting for high-level statistics by pushing them to CloudWatch. This allows us to monitor the state of Cavalcade jobs within the existing reporting framework we use for other system utilities.

List of WordPress jobs from Cavalcade in AWS CloudWatch

Improving logging and reporting is important, so we’re also planning on adding Slack reporting tools. Currently, while errors are logged to the Cavalcade logs table, this requires active inspection. We’re working on moving this to push-based reporting with alerts for failed jobs. We expect to release this soon.

(We tested a Lambda-based solution previously, which required complex setup, and it also ran on a schedule rather than being push-based. The introduction of the plugin system allows us to move this to live alerts, and should be much easier to set up. This Lambda function isn’t in a usable state for external use, so we’re not releasing it as open source right now, but if you really want a copy, let us know.)

Over to you

We’d love to hear from people using Cavalcade, and improve it based on their feedback. We’ve been running Cavalcade in production for just under two years, and it’s in use on other significant projects, including powering WordPress.org.

Start using Cavalcade today, and forget about your wp-cron problems.

We’d also like to thank the third-party contributors to Cavalcade and Cavalcade Runner: Brandon Kraft, Dion Hulse, Dominik Schilling, Ian Dunn, Pascal Birchler, and Till Krüss. Their efforts using and contributing to Cavalcade outside of our infrastructure are much appreciated, helping us make the software better for us and the wider community.

 

