Difference between revisions of "Livelets"

Revision as of 02:32, 5 July 2007

What is it?

The idea of this extension is to allow live articles to be transcluded which update automatically on change in a non-polling, fully event-driven way. It will use the parser-functions syntax to make it very similar in usage to normal templates. It's called a livelet since it's very similar in concept to the re-usable areas of options and information in a web page usually referred to as a portlet, except that livelets are able to accept spontaneous incoming requests, not just responses to their own requests. This means that livelets are able to stay up to date or be communicated with dynamically without any regular polling being necessary.

Since we already have all the code written to handle sockets between PERL and SWF, which won't ever be used for anything now since our peer interface is now in C and SDL, it seemed like a good opportunity to get it working in the field.

See also Category:Livelets

Development Plan

Updated: 12:31, 5 July 2007 (NZST)

The #live transclusion currently uses an xmlhttp request (XHR) object for each livelet, but soon this Javascript code will be removed and the livelet container changed to use the Mootools Ajax class. The Javascript code used to make an update request will be set in $wgLiveletsUpdateRequest, and there could also be a $wgLiveletsInitialise which allows livelets to call any necessary set up code, but Mootools doesn't need this setting as its functionality is all based on class attributes. The SWF can then call the update code when changes arrive associated with the current livelet ids (the SWF doesn't need to have any knowledge of what ids are running in the parent page's DOM; all messages sent to the SWF from the server are assumed to be current).

Livelet parameters

There are many different formats that the rendered content could take, such as a pop-up div, an iFrame or simply a link, but these kinds of things are more general properties which are also required in a non-live context. So the options that should be included in the livelet should be specific to the live updating ability of the container, and any format-related attributes can be handled by other templates or parser-functions specified in the livelet's content. After disregarding the formatting parameters that might be required, there is only really one important parameter to a livelet: update, which specifies when the content should be updated:

never: the content will not be loaded initially and also will not automatically update
once (default): means it will load after the page has loaded, but will not automatically update
>0: Any non-zero number specifies the number of seconds between each Ajax header request when polling for change.
0: Zero means stay up to date dynamically using Ajax content requests called by livelets.swf which is notified by livelets.pl when related article content changes.
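The four values of the update parameter could be interpreted by the container script roughly as follows. This is only a sketch: the function name parseUpdateMode and the returned mode labels are hypothetical, and falling back to "once" for unrecognised or negative values is an assumption, not something the plan above specifies.

```javascript
// Hypothetical helper: classify a livelet's `update` parameter.
// Mode names ("never", "once", "poll", "event") are illustrative labels only.
function parseUpdateMode(value) {
  const text = value === undefined || value === null ? "" : String(value).trim();
  // A missing parameter means the documented default, "once".
  if (text === "" || text === "once") return { mode: "once" };
  if (text === "never") return { mode: "never" };
  const seconds = Number(text);
  // Zero: no polling, updates are pushed via the SWF.
  if (seconds === 0) return { mode: "event" };
  // Positive number: poll at that interval (given in seconds).
  if (Number.isFinite(seconds) && seconds > 0) {
    return { mode: "poll", intervalMs: seconds * 1000 };
  }
  // Assumption: anything else falls back to the default.
  return { mode: "once" };
}
```

Keeping the interpretation in one small function means the container only has to branch on the resulting mode when deciding whether to load, start a timer, or wait for a notification.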

The next stage is to get the PERL daemon loading and communicating with the SWF's and the PHP. Currently the SWF is loading into the page and is able to call a JS function that reads data from the SWF. Here's a quick rundown of the current state of the livelets components:

livelets.php - Mediawiki hooks, needs to have an optional method of ensuring daemon is running
livelets.js - Redundant, by changing to Mootools, we can use simple Javascript snippets defined in the extension globals
livelets.as - Routes incoming messages from server to livelet (only one message currently; update)
livelets.pl - maintains hash of clients, titles (in hierarchy of change) and sends update messages to affected clients.

Livelet Components

livelets.php

Ensure livelets.pl daemon is running
Add the #livelet parser function and internal hook
Add the livelet action to the UnknownAction hook
Add the AfterSaveComplete hook
Insert the SWF (compiled from livelets.as) into the page somewhere invisible but active
Add the livelets.js headscripts

Parser Function

Converts the syntax to tag-hook along with the parameters

Internal Tag Hook

There's no HTML content because the parser-function doesn't actually do any transclusion; the only information that arrives here is the article name and parameters. This hook returns the JavaScript container which makes a separate request for the content when it runs in the browser. The request uses the live action and passes the parameters used in the original parser-function transclude statement.

See Rob's people-search.js for an example of this kind of GET and insert method.
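As a rough illustration of the request the container makes, the URL could be assembled like this. The function name buildLiveletUrl and the script path argument are hypothetical, the action value follows the action=livelet form described for the UnknownAction hook, and URLSearchParams is a modern browser API used here only for brevity.

```javascript
// Hypothetical helper: build the URL a livelet container would request.
// `scriptPath` and the shape of `params` are assumptions for illustration.
function buildLiveletUrl(scriptPath, title, params) {
  const query = new URLSearchParams({ title: title, action: "livelet" });
  // Pass through the parameters from the original transclude statement.
  for (const [name, value] of Object.entries(params || {})) {
    query.append(name, String(value));
  }
  return scriptPath + "?" + query.toString();
}
```

A real container would also need to avoid collisions between livelet parameters and reserved query names such as title and action, which this sketch does not attempt.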

UnknownAction Hook

If an article is requested with action=livelet, then it will be returned without any HTML head, scripts, body or skin components, making it suitable for being inserted directly into its container by the client JS.

The JS must pass any parameters the transclusion statement originally exhibited, so rather than just returning the article, it would probably change the content AfterDatabaseFetch to a normal transclude with the parameters.

AfterSaveComplete Hook

The job of this hook is to send notification to all the SWF's which have live-containers that are out of date. To do this we need a persistent list of all the current SWF clients and their live article titles with Last Modified times. Each time a save occurs, we update the list.

Only the livelets.pl daemon has up-to-date information about which SWF's need which articles, so this save hook should first ask the daemon for its version of this list, and then return an updated one. The daemon can then notify each SWF with out of date content by sending a list of the container id's needing to request new content.
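The bookkeeping described here amounts to a lookup from a saved title to the containers showing an older copy of it. The sketch below shows that logic in JavaScript purely for illustration (the daemon itself is planned in PERL); the data layout, the function name findStaleContainers and the numeric timestamps are all assumptions, and it ignores titles that are affected only indirectly through transclusion.

```javascript
// Hypothetical data layout:
//   clients = { clientId: { containerId: { title, lastModified } } }
// Returns { clientId: [containerId, ...] } for containers that are out of date.
function findStaleContainers(clients, savedTitle, savedAt) {
  const stale = {};
  for (const [clientId, containers] of Object.entries(clients)) {
    const ids = Object.entries(containers)
      .filter(([, c]) => c.title === savedTitle && c.lastModified < savedAt)
      .map(([containerId]) => containerId);
    // Only clients with at least one stale container need a message.
    if (ids.length > 0) stale[clientId] = ids;
  }
  return stale;
}
```

Each entry in the result corresponds to one message the daemon would send down an already established SWF stream.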

livelets.js

The JS can send HTTP requests itself with the XMLHttpRequest object, but must rely on the SWF to call some kind of onData event handler function. Currently the onData function doesn't need any data or parameters; it's simply a notification that something on the wiki has changed and that the live containers should check if their content is up to date.

Note: It may be a good idea to get the SWF to do the send as well (or make it optional) because the XMLHTTP object seems quite browser dependent compared to Flash6.

After that's working, we'd want to get the livelets.pl to deliver a list of the containers which are known to have out of date content, so the Last Modified headers aren't even used.
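For the first, notification-only stage, the onData handler could look something like the sketch below. The names makeOnData, fetchLastModified and reload are hypothetical, and the two callbacks are injected so that the logic stays independent of how the HTTP requests are actually made; the real requests would be asynchronous, which this simplified version glosses over.

```javascript
// Hypothetical sketch of a parameterless onData handler.
// `containers` is an array of { id, title, lastModified } records,
// `fetchLastModified(title)` returns the server's current timestamp,
// `reload(container)` re-requests and re-inserts the container's content.
function makeOnData(containers, fetchLastModified, reload) {
  return function onData() {
    const reloaded = [];
    for (const container of containers) {
      const current = fetchLastModified(container.title);
      if (current > container.lastModified) {
        reload(container);
        container.lastModified = current;
        reloaded.push(container.id);
      }
    }
    // Returning the ids is only a convenience for debugging.
    return reloaded;
  };
}
```

Once the daemon sends the list of out-of-date container ids directly, the Last Modified check in the loop becomes unnecessary and the handler can simply reload the listed containers.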

livelets.as

Even the SWF XMLSocket is crippled in three main ways:

It can only receive incoming data from servers in the same domain the SWF was served from
It can only use port numbers >1023
Most importantly - it is not a true listener, it can only listen on streams it has already established with the server. A server cannot spontaneously request a new connection from the SWF socket.

This third item is very important, because it means that a permanently running server must be present, so we may need to include a simple server daemon using the code from server.pl.

Note1: If we use livelets.pl, then the SWF must be served by that, not by the wiki, because it can only receive data from the same domain (which includes port number) as that which it was served from.

Note2: Although most PHP installations have socket support as standard now, we're still better using PERL because it still needs to be a separate port >1023 and run as a persistent daemon and we already have the code for this in PERL.

Note3: The third limitation listed above is irrelevant to this application because the SWF has to establish the stream to the server rather than vice versa, since there can be many SWF's running on the local client. They would all have to have separate ports which the server would have to be notified of anyway.

livelets.pl

NOTE: the server side will be handled by wikid.pl in the WikiDaemon extension

Although PHP has socket capability it seems that many users would not have the functionality easily available as it often requires PHP to be recompiled with the --enable-sockets switch. Also, the PHP server script would need to be set up to run as a daemon (why? because the SWF must establish a persistent connection with the server since it's not a true listener), and many PHP installs do not support command-line PHP by default. So it seems to me that a PERL server would be simplest to implement, especially since we already have working socket code in PERL, it's available by default on virtually all Linux's, and is easy to install on Win32 as well.

Rather than installing the daemon into init.d or as a service, the PHP script could check whether an instance is running and execute it if not. Installation would then be no different than for most extensions: putting the files into the extensions directory and ensuring they have the right permissions.

On Linux-like OS's it will run as a daemon, but doesn't install itself into init.d for automatic startup; instead livelets.php launches it if it isn't already running (it can send a simple ping command to the port to test if an instance is running).

The code should work on Windows, but is being developed and tested for Linux first. The socket code itself has been tested on Windows, and we have code to make it run as a Windows service, but no doubt there will be trouble and dedicated Windows development time will be required, which is unlikely to occur for some time since we have little interest in supporting corporate closed-source solutions. Windows users need to have PERL installed with the IO::Socket module; we recommend the ActivePerl package.

Examples & Usage Ideas

Ultra changes:
Forms: Allowing forms to be submitted which can be posted to the server without reloading the page. Any items which may change are made into live templates.
Caching: Normal templates which are used across many articles like those which include category links would be far more efficient when wrapped inside a live template.
Collaboration: The edit form could be modified to post the form without reloading the page, and to have a live preview of the content which would make article editing much more chat-like.
Channels: having live content opens up the whole channel aspect...

WikiFS and Livelets

I haven't thought about the full implications yet of what WikiFS and Livelets could achieve when working together, but there are two important areas of crossover:

Socket server: It would seem obvious to drop livelets.pl and use nsd.c's socket server instead.
Client tree: The job of nsd.c is to maintain a persistent node space, part of which represents the runtime state of the MediaWiki interfaces and their trees. It would make sense therefore to also use the WikiFS client-side runtime tree method to contain the livelets trees as well.

This crossover may be significant enough to use nsd.c for the livelets extension right from the start instead of making a PERL script specifically for it.

Old notes

All these notes have been thought about because the browser seemed unable to cache the pages for itself, and on further inspection it was found that our wikis weren't sending the Last Modified header, whereas Wikipedia does. I assumed this to be because of our heavy use of transclusion, parser functions and variables making our pages too dynamic to cache.

But it turns out that our pages weren't being cached because our configuration was incorrect: it had a $wgCacheEpoch setting that shouldn't have been there, meaning that although we were sending the private and not the no-cache headers, we were missing the critical Last Modified header. After fixing that, our sites are able to cache properly, and after testing the HTTP headers returned on various content I found that MediaWiki already knows how to keep track of the proper last-modified time of articles involving recursive transclusion. Parser functions and variables are handled too, because it's the individual function's job to mark the content as uncachable.

So we can use the original plan of simply ensuring that the tree content is obtained via a separate HTTP request and it won't have to load. It could even manually force the caching to be set when the noskin request is made, and that way the main content could still have caching turned off but leaving the tree caching on if we wanted.

iFrames have problems though when the size of the tree is large compared to the size of the article; it's not a very seamless integration method. I'm thinking of using the XMLHttpRequest object instead, which can retrieve the content and directly insert it into the page with document.write. A noskin query string option can be used when requesting articles which returns only the HTML of the article content itself with no HTML head/body or skin components.

The idea of being able to make this functionality work similarly to transclusion as a parser function is still extremely useful though, because the containers' content can load independently of the main article, which means they could contain content external to the wiki. Efficiency and load time can be improved massively by always wrapping a cachable transclusion around all templates which are used in many articles, like those which do categorisation, eg:

{{#cache:Template:ContactDetails}}

It also means that the content in each container could be dynamic and refresh periodically without the rest of the page having to reload - this could be used for forms as well.

The SWF part of it is still interesting too because we already have working code to allow the wiki to compile and serve SWF instances and use the peer to maintain persistent streams with all the instances so that the wiki can send messages to them. So we could very easily extend these new cachable transcludes to implement non-polling, event-driven dynamic areas of content, which would update almost as soon as they change.

It seems that there is no listener functionality in the XMLHTTP JavaScript API, so a SWF would be needed to pass information between them.

Older notes

Speed problem

There are problems with the speed of page loading when pages have a large tree in the navigation pane. This speed problem has been identified as mainly due to the time it takes to parse the large tree's wikitext into HTML, more than the time it takes to transfer the resulting HTML, which is in the region of 100KB. Both the parsing and the data transfer need to be reduced.

General solution

Treeview4 is being designed to record revision ids of each tree, which has to include changes in any transcluded trees too. When a page is requested only the revision id of each tree is sent to the client (if the tree has its cache-key parameter set). The tree javascript will then request and store each tree's content locally if it doesn't already have the content associated with the current id.
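The client-side decision described above comes down to comparing the revision ids sent with the page against what is stored locally. A minimal sketch, assuming a hypothetical function name treesToFetch and a plain object standing in for whatever local store is eventually chosen:

```javascript
// Hypothetical sketch.
// pageRevisions: { cacheKey: revisionId } as delivered with the page.
// localCache:    { cacheKey: { revision, html } } held by the client.
// Returns the cache-keys whose content must be requested from the server.
function treesToFetch(pageRevisions, localCache) {
  return Object.keys(pageRevisions).filter((key) => {
    const cached = localCache[key];
    return !cached || cached.revision !== pageRevisions[key];
  });
}
```

Any key not returned can be rendered immediately from the local copy without a further request.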

Local Caching: Flash & JavaScript

There are a number of methods that the browser can use to store information locally. The general possibilities are described here and show that using the Flash6 shared object is probably most appropriate, since cookies only allow 4K and the other options require different code for IE and Firefox.

Revision ID

Trees that need to use local caching should be given a unique ID in their cache-key parameter, with which their current revision ID will be associated. This data needs to be persistent across scripts and would probably use memcached if present, or a file if not.

The main complications with the process are that trees can be session-specific and can contain transcluded content. This dynamic aspect can be overcome by using the changeable items such as {{CURRENTUSER}} as part of the cache-key parameter.
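To illustrate how changeable items could become part of the key, the sketch below combines a tree's ID with the already expanded values of such variables. The function name buildCacheKey and the "|" and "=" separators are assumptions made for the example.

```javascript
// Hypothetical sketch: derive a cache-key from a tree ID plus the expanded
// values of any changeable items the tree depends on.
function buildCacheKey(treeId, variables) {
  // Sort the names so the same inputs always give the same key.
  const parts = Object.keys(variables)
    .sort()
    .map((name) => name + "=" + variables[name]);
  return [treeId, ...parts].join("|");
}
```

With a key built this way, two users would get separate cached copies of a session-specific tree, while a tree that depends on no variables keeps a single shared key.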

It makes sense to update the revision ID's on the SaveComplete event, but how do we know if and what trees the saved article affects?
We can't simply use the mTemplates property because we're caching just a fragment of the article... unless we make it that caching works more like a transclusion than a wrapper.
- But it would only do the transclusion if the cached content was nonexistent/invalid
- The parser-function would have to wrap the transcluded content in a private tag to get access to the final HTML
- At this final HTML stage it should know all the templates used (is that all the ones that were used recursively?)
The tag-hook unconditionally returns only the javascript local-cache-function, because even if HTML was just generated it belongs in the server cache to be requested by the SWF.
The SaveComplete hook must be unconditional because it's when we save an article that is included in a tree that we need to update the revision and invalidate the content.
So the memcache should contain a hash of all the articles and what they invalidate which is updated from the private tag-hook, but referred to by the SaveComplete hook.
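The hash of articles and what they invalidate could be organised along these lines. This is a sketch only: the class name InvalidationIndex and its methods are hypothetical, and an in-memory Map stands in for memcached.

```javascript
// Hypothetical sketch of the "what does this article invalidate" hash.
// recordRender() would be called from the private tag-hook,
// keysInvalidatedBy() from the SaveComplete hook.
class InvalidationIndex {
  constructor() {
    this.byArticle = new Map(); // article title -> Set of cache-keys
  }

  // Remember that `cacheKey` was rendered using each title in `templates`.
  recordRender(cacheKey, templates) {
    for (const title of templates) {
      if (!this.byArticle.has(title)) this.byArticle.set(title, new Set());
      this.byArticle.get(title).add(cacheKey);
    }
  }

  // Cache-keys whose revision must be bumped when `title` is saved.
  keysInvalidatedBy(title) {
    return [...(this.byArticle.get(title) || [])].sort();
  }
}
```

Whether the list of templates passed to recordRender includes recursively transcluded ones is the open question raised above; the index itself works the same either way.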

Server-side Caching

Since we require persistence for the revision IDs to work across requests, we can also store the HTML content of the tree along with the revision ID. This would mean that the rendering process would be required less often, especially if the tree is role-based and there are many users of the same role accessing the same tree. This would simply require checking if there's an entry for the tree's cache-key and if so, using that content instead of rebuilding it.
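The check-then-rebuild step could be sketched as below, shown in JavaScript only to stay consistent with the client-side examples (on the server it would live in the PHP extension). The name getTreeHtml, the Map used as the store, and the render callback are assumptions.

```javascript
// Hypothetical sketch: return cached HTML for a tree if its stored revision
// matches, otherwise render once and store the result with the revision.
function getTreeHtml(cache, cacheKey, revision, render) {
  const entry = cache.get(cacheKey);
  if (entry && entry.revision === revision) {
    return entry.html; // cache hit: no parsing needed
  }
  const html = render(); // cache miss: do the expensive rendering
  cache.set(cacheKey, { revision: revision, html: html });
  return html;
}
```

Storing the revision alongside the HTML means a save only has to bump the revision ID; the stale HTML is replaced lazily on the next request.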

Dynamic update

The javascript could periodically check the revision ID's of the cache-containers on the page and request updated HTML for them when changed. This would allow the cached areas to also act as areas of dynamic wikitext content!

JavaScript/Flash6 Cache Container

It may be best to do the caching part as a separate component to the tree itself, since the two main complications of dynamic content and transcluded content are both solved by solutions not specific to the tree. We could create a second extension allowing any article content to be wrapped in cache tags which use the same cache-key methodology (the tags are private and placed there by a parser-function syntax so that we can use the mTemplates property to know what articles are transcluded within the cache container).

The containers will have their content replaced by a JavaScript function call which obtains the content from the SWF. If the SWF doesn't have it already it must ask the server for the content associated with that key, which it will have to generate if not present.

Why not just apply it to whole pages unconditionally?

I think you only get about 100K of local storage
Its main use is for caching the tree, which is more static than the pages, but if the tree and the page content were under separate cache-keys this would mean instant loading of any unchanged previously seen page. This is supposed to already be the case with local browser caching, but it doesn't seem to work.

Try simple method first!

This whole thing may be able to use the browser cache, by simply using the key as a filename to request from the server.

@@ Line 5: / Line 5: @@
 Since we already have all the code written to handle sockets between PERL and SWF, which won't ever be used for anything now since our peer interface is now in C and SDL, it seemed like a good opportunity to get it working in the field.
+*See also [[:Category:Livelets]]
 == Development Plan ==
@@ Line 159: / Line 160: @@
 === Try simple method first! ===
 This whole thing may be able to use the browser cache, by simply using the key as a filename to request from the server.
+[[Category:Livelets]]