Job queue

The job queue in MediaWiki is used to perform jobs in the background such as updating category links, back links or search indexes. For wikis with a lot of content, a single edit (such as a change to a template or a category link) can involve changes to thousands of pages, so it's not practical to commit all the changes an action makes during the page request. Instead these changes are added to the job queue so that they can be done in the background as time and resources permit.

There are a number of ways the job queue can be implemented in an installation. The default is to use the $wgJobRunRate configuration variable to have a certain number of jobs pulled off the queue and executed at the end of each request. But for wikis that don't serve much traffic, especially intranet wikis, this method isn't effective, because jobs only run when pages are being requested. Another setting, $wgRunJobsAsync, allows the jobs that run at the end of the request to launch in their own asynchronous threads so that they can continue by themselves without holding up the original page load, but this method seems to allow some jobs to be silently dropped. The runJobs.php maintenance script can be run from the crontab, but this is not very responsive and causes lag, so that, for example, red links stay red for a while after an article is created.
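
For reference, the request-based approach amounts to something like the following in LocalSettings.php (the values here are illustrative):

$wgJobRunRate = 1;       // pull one job off the queue at the end of each page request
$wgRunJobsAsync = true;  // run it in a detached request so the page load isn't held up

and the cron approach to an /etc/cron.d entry along these lines (path, user and schedule are illustrative):

*/5 * * * * www-data php /var/www/htdocs/wiki/maintenance/runJobs.php --maxjobs 100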

Redis

Recently the option has been added to use Redis, an open-source, networked, in-memory, key-value data store written in C. The general approach here is to set $wgJobRunRate to zero and delegate the whole job execution system to a background service. Here's what Wikimedia says about the use of Redis for the job queue and session storage:

We previously stored the MW job queue in MySQL. This gave us lots of useful features, like replication and indexing for duplicate removal, but it has often been hard to manage the performance implications of the high insert rate. Among its many features, Redis embeds a Lua interpreter on the server side. The new Redis job queue class provides a rich feature set superior to the MySQL job queue, mainly through several server-side Lua scripts which provide high-level job queue functions. Redis is also used to keep a hash table that tracks which job queues actually have jobs, so runners know where to look. Updates to this table are push-based, so it is always up-to-date.

The Wikimedia Foundation has been using Redis as a memcached replacement for session storage since the eqiad switchover in January 2013, because it has a replication feature which can be used to synchronise data between the two data centres. It allowed us to switch from Tampa to Ashburn without logging everyone out.

The CirrusSearch extension also recommends Redis as the job queue mechanism; the following is quoted from mediawiki-extensions-CirrusSearch:

Cirrus makes heavy use of the job queue. You can run it without any job queue customization but if you switch the job queue to Redis with checkDelay enabled then Cirrus's results will be more correct. The reason for this is that this configuration allows Cirrus to delay link counts until Elasticsearch has appropriately refreshed... Note: some MediaWiki setups have trouble running the job queue. It can be finicky. The most sure fire way to get it to work is also the slowest, set $wgRunJobsAsync to false

Installation and configuration of Redis

First the Redis service and PHP client library need to be installed, which can be done with apt as follows on Debian-like systems:

apt-get install redis-server php-redis
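
You can confirm both pieces are in place before touching MediaWiki; the PONG reply shows the server is up, and the extension check shows PHP can talk to it:

redis-cli ping                                 # should reply PONG
php -r 'var_dump(extension_loaded("redis"));'  # should print bool(true)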

The following additions to LocalSettings.php will then add Redis as a caching option, use it for the main and session caches, and hand the job queue over to it:

$wgRunJobsAsync = false;  // don't spawn asynchronous job runs at the end of web requests
$wgJobRunRate = 0;        // never run jobs during page requests; the runner service does it
$wgMainCacheType = 'redis';
$wgSessionCacheType = 'redis';
$wgObjectCaches['redis'] = [
	'class'       => 'RedisBagOStuff',
	'servers'     => ['localhost'],
];
$wgJobTypeConf['default'] = [
	'class'       => 'JobQueueRedis',
	'order'       => 'fifo',
	'redisServer' => 'localhost',
	'checkDelay'  => true,   // support delayed jobs (recommended by CirrusSearch)
	'daemonized'  => true,   // an external runner service will execute the jobs
	'redisConfig' => [],
];
$wgJobQueueAggregator = [
	'class'       => 'JobQueueAggregatorRedis',
	'redisServer' => 'localhost',
	'redisConfig' => [],
];
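
With this in place you can sanity-check that MediaWiki is actually talking to Redis. The showJobs.php maintenance script reports the queue sizes, and after an edit or two the job queue keys should start appearing in Redis itself (the key pattern below is an assumption based on the key names JobQueueRedis appears to use; adjust it if your keys look different):

php /var/www/htdocs/wiki/maintenance/showJobs.php --group   # queue sizes by job type
redis-cli --scan --pattern '*jobqueue*'                     # the queue keys themselves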

Installation and configuration of the Redis job runner service

That takes care of the MediaWiki side, but the daemonized setting above means that redisJobRunnerService is expected to be running in the background to execute jobs off the queue, and redisJobChronService to be running to aggregate the new jobs and push them onto the Redis job queue. First clone the mediawiki-services-jobrunner repo:

git clone https://github.com/wikimedia/mediawiki-services-jobrunner

Next, in the cloned repo directory, create a new jobrunner.json from the sample, set the aggregator and queue addresses to "localhost", and set the dispatcher to the path of your MediaWiki codebase's runJobs.php script. You'll want to adjust the groups and job types for your own prioritisation needs, but for now these basic changes will suffice to get things running.
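
The result should look something like the following. Treat this as a sketch based on jobrunner.sample.json rather than a verified config: the group name, runner count and job-type lists are illustrative, the dispatcher placeholders should be checked against the sample, and the sample's limits stanza (which supplies the --maxtime and --memory-limit values seen in the output below) is omitted here and best copied from jobrunner.sample.json directly.

{
	"groups": {
		"basic": {
			"runners": 2,
			"include": ["*"],
			"low-priority": ["refreshLinks", "htmlCacheUpdate"]
		}
	},
	"redis": {
		"aggregators": ["localhost"],
		"queues": ["localhost"],
		"password": ""
	},
	"dispatcher": "php /var/www/htdocs/wiki/maintenance/runJobs.php --wiki=%(db)x --type=%(type)x --maxtime=%(maxtime)x --memory-limit=%(maxmem)x --result=json"
}

With the config in place, run each of these scripts in a separate terminal window so you can see whether jobs are being queued and executed properly as you make changes in your wikis: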

./redisJobRunnerService --config-file=jobrunner.json --verbose
./redisJobChronService --config-file=jobrunner.json --verbose

If this is all working as expected you'll see output from the job runner showing spawned threads for jobs along with the associated command, job type, database and table prefix, e.g.

Spawning runner in loop 0 at slot 18 (RefreshLinks, od-main_):
	php /var/www/htdocs/wiki/maintenance/runJobs.php --wiki='od-main_' --type='RefreshLinks' --maxtime='60' --memory-limit='300M' --result=json

The documentation for this service is very sparse, so we have to rely on the jobrunner.sample.json file, the source code, and the README in the repo, which has one paragraph about the functionality:

redisJobRunnerService is an infinite "while loop" used to call MediaWiki runJobs.php and eventually attempt to process any job enqueued. A number of virtual sub-loops with their own runners and job types must be defined via parameters. These loops can set any of the job types as either low or high priority. High priority will get ~80% of the time share of the sub-loop under "busy" conditions. If there are few high priority jobs, the loops will spend much more of their time on the low priority ones.
— README

We can see from the loop on lines 22-28 of the redisJobRunnerService script that the "sub-loops" referred to are the "groups" in the config file, and that each group's "runners" parameter sets how many threads will run for it (slots are numbered from zero, so the highest slot number is the runner count minus one). In the code and debugging output, these runners are referred to as "slots". On line 24, JobRunnerPipeline::initSlot is called with two parameters, the loop (group) and the slot number (runner) within the loop. Each loop then divides the jobs amongst its runners using a two-tier prioritisation mechanism based on the job types assigned to the group.
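
In other words, the startup logic has roughly this shape (a PHP paraphrase for illustration, not the actual source; $config and $pipeline stand in for the script's own variables):

// Each config group becomes a "loop", and each loop gets one slot per runner,
// so a group with "runners": 19 produces slots 0 through 18.
foreach ( $config['groups'] as $loopId => $group ) {
	for ( $slot = 0; $slot < $group['runners']; $slot++ ) {
		$pipeline->initSlot( $loopId, $slot ); // JobRunnerPipeline::initSlot
	}
}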

See also