Wiki daemon

The wiki daemon (wikid.pl) runs in the background of all machines running the organicdesign-server package.

Configuration

The main script of the wiki daemon is wikid.pl, but it should never be used independently of the other files in the tools repository and should always reside in the /var/www/tools directory. The configuration of the wiki daemon lives in the /var/www/tools/wikid.conf file. It must be valid Perl syntax, since it is incorporated using Perl's require function. Any functions defined or included within wikid.conf will override functions of the same name defined in the main script. A sample configuration file, wikid.conf.sample, is included with the main script.
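As a rough sketch, a minimal wikid.conf might look something like the following. The variable names are ones referred to elsewhere in this article; the values, the example URL and the overridden function body are purely illustrative.

<perl>
# Minimal illustrative wikid.conf (hypothetical values throughout)
$::wiki    = 'http://wiki.example.org/wiki/index.php';  # local wiki (example URL)
$::netuser = 'wikid';             # shared SSH user used between peers
$::netpass = 'shared-secret';     # shared secret, also used to encrypt RPC data
$::netpeer = 'peer1.example.org'; # next server in the ring

# Any sub defined here overrides the sub of the same name in wikid.pl
sub onAddNewAccount {
	# custom handling of newly created wiki accounts would go here
}

1; # a require'd file must return a true value
</perl>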

Execution framework

The wiki daemons have a job execution framework which allows ongoing jobs to run in a multiplexed way. The execution framework is persistent so that jobs will continue to execute after daemon or server restart. The work is stored in the global $::work array and the currently executing job's data is in the global $::job hash. See also the set up a new bot job procedure for detailed code examples.

There are three special functions which may be called directly by the daemon: one when an instance of the job type is first started, one which is called iteratively until the job is complete, and one after the job has completed so that any loose ends, such as open database connections or file handles, can be tied up. These three functions are init, main and stop, and their names are combined with the name of the job type when defining them, for example initFoo. Only main is mandatory; init and stop may not be necessary for some kinds of jobs.
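As a sketch, a hypothetical job type called "Foo" would define its functions as follows (the daemon calls them by name; the bodies are illustrative):

<perl>
# Skeleton of a hypothetical "Foo" job type
sub initFoo {
	# one-off set up, e.g. work out how many iterations are needed
	$$::job{'length'} = 100;   # illustrative value
}

sub mainFoo {
	# called once per iteration until the job is complete
}

sub stopFoo {
	# tie up loose ends such as open database connections or file handles
}
</perl>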

Internal functions

In addition to the three special job functions, a job script may also define a number of internal helper functions to be called by the job's other functions when needed. These should follow the naming convention of jobFooBar (the "Bar" function used by functions in the "Foo" job) to avoid possible naming conflicts with other jobs or components.
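For example, a helper belonging to the hypothetical "Foo" job above might look like this (the name and behaviour are purely illustrative):

<perl>
# "CleanTitle" helper used only by the "Foo" job's functions
sub jobFooCleanTitle {
	my $title = shift;
	$title =~ s/_/ /g;   # e.g. normalise underscores to spaces
	return $title;
}
</perl>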

The global job hash

When the daemon calls any of the three special job functions, it makes all of the data concerning that job available in the hash referred to by the $::job hashref. The job functions can store any data they like in this hash, and it remains available even if the daemon or server is restarted. Some of the keys in the job hash are used by the job execution framework as follows (the $$ is due to $::job being a reference to a hash rather than an actual hash); a short sketch using some of these keys follows the list.

  • $$::job{'id'} the daemon will set this to a GUID when a new job is started
  • $$::job{'type'} the daemon sets this when the job is started
  • $$::job{'wiki'} the daemon sets this when the job is started to the $::script value if set, or the $::wiki value if not
  • $$::job{'user'} the daemon sets this to the wiki user account through which it interacts with the wiki
  • $$::job{'start'} the daemon sets this to the UNIX time stamp when the job is started
  • $$::job{'paused'} a job's main function will not be called while its paused value is non-empty
  • $$::job{'finish'} the daemon sets this to the UNIX time stamp when the job finishes
  • $$::job{'length'} the job's init function should calculate and set this if the number of iterations can be determined
  • $$::job{'status'} this can be set by the job functions and is displayed in the WikidAdmin special page
  • $$::job{'revisions'} if the job changes wiki articles, this value should track the number of changes made
  • $$::job{'errors'} any errors occurring during execution should be logged in this value
  • $$::job{'wptr'} the daemon sets this to zero when the job starts and increments it on every execution of the job's main function. The job is automatically stopped when this value reaches the length value.
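Continuing the hypothetical "Foo" job sketched earlier, a filled-in main function might use some of these keys roughly as follows (illustrative only; note that wptr is incremented by the daemon, not by the job):

<perl>
sub mainFoo {
	my $i = $$::job{'wptr'};                   # current iteration
	$$::job{'status'} = "processing item $i";  # shown on the WikidAdmin special page
	eval {
		# ... do the work for item $i, e.g. update an article ...
		$$::job{'revisions'}++;            # track the wiki changes made
	};
	$$::job{'errors'} .= "$@\n" if $@;         # log any error from this iteration
}
</perl>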

Scheduled functions

Events and actions

Remote job execution

The wiki daemons now have RPC (Remote Procedure Call) capability, which allows events to be propagated amongst peers and actions to be executed by remote request. The first use of wikid RPC is the doUpdateAccount action, since changes to accounts must be securely and robustly replicated across all peers. The RPC mechanism is designed to account for the following key issues and requirements:

  • Specific recipient or broadcast messages
  • Messages keep trying until successfully propagated
  • Messages are encrypted

Distributed servers

A number of remote servers can be configured as a distributed network with a simple ring topology. The group then automatically synchronises all of its shared files, wiki content, email and the associated user accounts. The synchronisation process is designed to be robust against network outages by having a persistent job queue system. Each server's local DNS is configured such that all requests to any of the organisation's wiki, mail or file services resolve to the local server when made from within any of the LANs. When not in a LAN, the domains resolve to their "home" server.

The main workhorse of our distributed server solution is the Unison file synchroniser, which does bidirectional synchronisation of directory structures between two remote computers over an SSH connection. Unison uses the rsync algorithm to transfer only the parts of files which have changed.

Apart from the standard file synchronisation which is done with a regular Unison call, there are a few things which require more specific options and configuration.
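For the regular case this boils down to running Unison against the next peer. The sketch below is illustrative only: the wrapper sub, the shared directory and the log-file path are assumptions, not the actual wikid.pl code.

<perl>
# Illustrative only: two-way sync of a shared directory with the next peer
sub syncSharedFiles {
	my $dir = '/var/www/shared';   # hypothetical shared directory
	system 'unison', $dir, "ssh://$::netuser\@$::netpeer/$dir",
		'-batch',                                # no interactive prompts
		'-logfile', '/var/log/unison-wikid.log';
}
</perl>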

Synchronising mail

Our mail configuration procedure uses the Maildir storage format with the Exim4 mail transfer agent and the Dovecot IMAP server. It is fine to use Unison to directly synchronise Maildirs, but there are a few issues.

Problem 1
The Maildir format includes a tmp directory in each folder which contains files that are only partially written and are moved into the new directory when complete. We include ignore => "Path tmp" to prevent those from being propagated, and another similar option, ignore => "Regex .*dovecot[-.].+", to prevent Dovecot index files from being propagated. The default Maildir synchronisation entry included in our wikid.conf.sample file sets both of these ignore rules.

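The entry itself is not reproduced here, but a hedged sketch of the kind of call it describes might look like the following; only the two ignore rules come from the configuration discussed above, while the Maildir path and the rest of the command are assumptions.

<perl>
# Hypothetical sketch only; the real wikid.conf.sample entry may be structured
# differently. The two ignore rules are the ones discussed above.
system 'unison', '/home/mail/Maildir',
	"ssh://$::netuser\@$::netpeer//home/mail/Maildir",  # hypothetical Maildir location
	'-batch',
	'-ignore', 'Path tmp',                # skip partially written messages
	'-ignore', 'Regex .*dovecot[-.].+';   # skip Dovecot index and control files
</perl>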


Problem 2
This took a long time to figure out, but it actually involves only trivial additions to the Exim and Dovecot configurations. The problem is that Unison doesn't have permission to set the owner of the files it sends, and Dovecot doesn't have permission to manipulate files that are owned by the shared network SSH user.

It would be unsafe to give the shared user root privileges, so the current solution is to have the maildirs owned by the common SSH user (the wikid user by default). Both Exim and Dovecot can be configured to handle maildirs under a specific user/group; see the running all maildirs under a single user section of the mail server configuration procedure for specifics.

Problem 3
When Unison modifies a maildir, the Dovecot indexes become out of date, so IMAP users will not see the changes until they reload the folder. This should not be a problem on modern systems, because current Linux kernels include inotify and Dovecot on Debian-based systems uses it by default (see mailbox handling optimisations). To check whether it's using it, execute dovecot with the --build-options switch.

Synchronising user accounts

When an account is created or updated in a wiki daemon's local wiki, the wiki executes a PrefsPasswordAudit or AddNewAccount hook, which passes the event down the EventPipe into the wiki daemon and executes the onPrefsPasswordAudit or onAddNewAccount function, which in turn calls doUpdateAccount with the appropriate arguments extracted from $::data.

In addition to calling doUpdateAccount, which updates and synchronises the local Unix and Samba accounts, the action also calls rpcBroadcastAction so that the same update occurs on all peers (in a ring starting with $::netpeer). The rpcBroadcastAction function actually just calls rpcSendAction with an empty "To" parameter to indicate a broadcast message.

The rpcSendAction function starts a new "RpcSendAction" job in the persistent work queue so that the attempt to send the message can be retried periodically until it succeeds, even in the case of outages. The job consists of just a main function (mainRpcSendAction) which makes the periodic send attempts. The initial set up of the job is done in rpcSendAction rather than initRpcSendAction, since the data must be encrypted before it gets stored in the persistent job hash. The action and its arguments are first serialised into a string, encrypted using the $::netpass shared secret, and then converted to base64 so that the data can be supplied in URLs and command-line options.
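The encoding steps might look roughly like the following. This is a hedged sketch, not the actual wikid.pl code: the separator character, the Blowfish cipher and the way the job is queued are all assumptions; only the serialise/encrypt/base64 sequence and the names rpcSendAction and $::netpass come from the description above.

<perl>
use MIME::Base64 qw(encode_base64);
use Crypt::CBC;   # assumes Crypt::CBC (and a cipher such as Crypt::Blowfish) is installed

sub rpcSendAction {
	my ( $to, $action, @args ) = @_;

	# Serialise the action and its arguments into a single string
	my $data = join( "\x1e", $to, $action, @args );

	# Encrypt with the shared secret, then base64-encode so the result is
	# safe to pass in URLs and command-line options
	my $cipher  = Crypt::CBC->new( -key => $::netpass, -cipher => 'Blowfish' );
	my $encoded = encode_base64( $cipher->encrypt( $data ), '' );

	# Queue a persistent RpcSendAction job whose main function keeps trying
	# to deliver $encoded to the peer (assumes $::work is an array ref of job hashes)
	push @$::work, { type => 'RpcSendAction', data => $encoded };
}
</perl>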

The mainRpcSendAction function uses Net::Expect to connect to the remote peer over SSH and log in with $::netuser and $::netpass. Once it has shelled in, it executes the RPC using the following syntax from the shell:

<bash>
wikid --rpc <base64encoded-encrypted-serialised-data>
</bash>

When wikid.pl is run with the --rpc option, it simply wraps the data in an event called RpcDoAction and sends it into the wiki daemon through its usual port, as if it were a normal MediaWiki event coming through the EventPipe, which results in the onRpcDoAction function being called.

Unlike other wikid event-handler functions, the $::data available to the onRpcDoAction handler is encrypted, so it first needs to be converted back into an @args array (by base64-decoding it, decrypting it with the shared $::netpass secret, and then deserialising it back into an array). If the action exists it is called with the original arguments, otherwise an error is logged. If the action was broadcast (having an empty "to" argument), the same action and args are then sent to the next peer by calling rpcSendAction again at this point.
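The decoding side is the mirror image of the sending sketch above. Again this is only a hedged illustration; the separator, cipher and logging helper are assumptions carried over from that sketch.

<perl>
use MIME::Base64 qw(decode_base64);
use Crypt::CBC;

sub onRpcDoAction {
	# Reverse the encoding: base64-decode, decrypt, then deserialise
	my $cipher = Crypt::CBC->new( -key => $::netpass, -cipher => 'Blowfish' );
	my ( $to, $action, @args ) = split /\x1e/, $cipher->decrypt( decode_base64( $::data ) );

	# Call the named action with its original arguments if it exists (symbolic
	# call, so this sketch assumes strict refs are not in force), else log an error
	if ( defined &$action ) { &$action( @args ); }
	else { logAdd( "Unknown RPC action: $action" ); }   # logAdd is assumed

	# If the message was a broadcast (empty "to"), pass it on to the next peer
	rpcSendAction( '', $action, @args ) unless $to;
}
</perl>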

Synchronising wiki content

The wikis are to be replicated across all the servers. This hasn't been done yet, but it will be an event-driven, RPC-based solution similar to the account propagation. In the case of edit conflicts it will simply merge all revisions into the history, adding useful revision information to the edit summaries and notifying the editors involved of the conflicts.