Job queue

From Organic Design wiki

Revision as of 11:31, 19 June 2019

The job queue in MediaWiki is used to perform jobs in the background, such as updating category links, backlinks or search indexes. For wikis with a lot of content, a single edit (such as a change to a template or a category link) can involve changes to thousands of pages, so it's not practical to commit all the changes an action makes during the page request. Instead these changes are added to the job queue so that they can be done in the background as time and resources permit.

There are a number of ways the job queue can be implemented in an installation. The default is to use the $wgJobRunRate configuration variable to have a certain number of jobs pulled off the queue and executed at the end of each request. But for wikis that don't serve much traffic, especially intranet wikis, this method isn't effective. Another setting, $wgRunJobsAsync, allows the jobs that run at the end of the request to launch in their own asynchronous threads so that they don't hold up the original page load, but this method seems to allow some jobs to be silently dropped. The runJobs.php maintenance script can be run from the crontab, but this is not very responsive and causes lag, so that for example red links stay red for a while after the article is created.
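As a rough sketch of the crontab approach, an entry like the following would process a batch of queued jobs every five minutes. The paths here are assumptions and depend on where MediaWiki is installed:

```shell
# Run up to 100 queued jobs every 5 minutes; adjust paths to your wiki root.
# Redirecting output to a log file helps diagnose jobs that fail silently.
*/5 * * * * /usr/bin/php /var/www/wiki/maintenance/runJobs.php --maxjobs 100 >> /var/log/mw-jobs.log 2>&1
```

Shortening the interval or raising --maxjobs makes the queue drain faster, but the lag between an edit and its knock-on updates can never be less than the cron interval, which is why this approach feels unresponsive.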

Redis

A more recent option is to use Redis, an open-source, networked, in-memory, key-value data store written in C. The general approach here is to set $wgJobRunRate to zero and delegate the whole job-execution system to a background service. Here's what Wikimedia says about the use of Redis for the job queue and session storage:

We previously stored the MW job queue in MySQL. This gave us lots of useful features, like replication and indexing for duplicate removal, but it has often been hard to manage the performance implications of the high insert rate. Among its many features, Redis embeds a Lua interpreter on the server side. The new Redis job queue class provides a rich feature set superior to the MySQL job queue, mainly through several server-side Lua scripts which provide high-level job queue functions. Redis is also used to keep a hash table that tracks which job queues actually have jobs, so runners know where to look. Updates to this table are push-based, so it is always up-to-date.

The Wikimedia Foundation has been using Redis as a memcached replacement for session storage since the eqiad switchover in January 2013, because it has a replication feature which can be used to synchronise data between the two data centres. It allowed us to switch from Tampa to Ashburn without logging everyone out.

CirrusSearch recommends Redis as the job queue mechanism as well; the following is quoted from mediawiki-extensions-CirrusSearch:

Cirrus makes heavy use of the job queue. You can run it without any job queue customization but if you switch the job queue to Redis with checkDelay enabled then Cirrus's results will be more correct. The reason for this is that this configuration allows Cirrus to delay link counts until Elasticsearch has appropriately refreshed... Note: some MediaWiki setups have trouble running the job queue. It can be finicky. The most sure fire way to get it to work is also the slowest, set $wgRunJobsAsync to false

Installation and configuration of Redis

First the Redis service and PHP client library need to be installed, which can be done with apt as follows on Debian-like systems:

apt-get install redis-server php-redis
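Assuming the packages installed and the service started, a quick sanity check can confirm that the server is up and the PHP extension is loaded (redis-cli ships with the redis-server package):

```shell
# A running Redis server answers PING with PONG.
redis-cli ping

# The PHP extension should appear in the list of loaded modules.
php -m | grep -i redis
```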

The following additions to LocalSettings.php will then add Redis as a caching option and use it for the main cache:

$wgRunJobsAsync = false;
$wgJobRunRate = 0;                 // never run jobs during web requests; the background service handles them
$wgMainCacheType = 'redis';
$wgSessionCacheType = 'redis';
$wgObjectCaches['redis'] = array(
	'class'       => 'RedisBagOStuff',
	'servers'     => array( 'localhost' ),
);
$wgJobTypeConf['default'] = array(
	'class'       => 'JobQueueRedis',
	'order'       => 'fifo',
	'redisServer' => 'localhost',
	'checkDelay'  => true,         // recommended by CirrusSearch (see quote above)
	'daemonized'  => true,         // jobs are executed by an external runner service
	'redisConfig' => array(),
);
$wgJobQueueAggregator = array(
	'class'       => 'JobQueueAggregatorRedis',
	'redisServer' => 'localhost',
	'redisConfig' => array(),
);

That sorts out the MediaWiki side, but the meaning of the daemonized setting above is that redisJobRunnerService is expected to be running in the background to execute jobs off the queue, and redisJobChronService to be running to aggregate new jobs and push them onto the Redis job queue. First clone the mediawiki-services-jobrunner repo.

git clone https://github.com/wikimedia/mediawiki-services-jobrunner
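The repo contains the two PHP service scripts and a sample configuration file. A minimal sketch of getting them running by hand might look like the following; the sample-config file name and the --config-file option reflect the repo at the time of writing, but should be checked against the current README, and in production the two processes would be supervised by systemd or similar rather than backgrounded from a shell:

```shell
cd mediawiki-services-jobrunner

# Copy the sample config and edit it to point at the local Redis
# instance and the wiki's job-dispatcher URL.
cp jobrunner.sample.json jobrunner.json

# Start the job executor and the aggregator/scheduler services.
php redisJobRunnerService --config-file=jobrunner.json &
php redisJobChronService --config-file=jobrunner.json &
```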

See also