Livelets

From Organic Design wiki
Revision as of 03:37, 4 April 2007


What is it?

The idea of this extension is to allow live articles to be transcluded which update automatically when they change, in a non-polling, fully event-driven way. It will use the parser-function syntax to make it very similar in usage to normal templates. It's called a livelet because it's very similar in concept to the re-usable areas of options and information in a web page usually referred to as portlets, except that livelets are able to accept spontaneous incoming requests, not just responses to their own requests. This means that livelets can stay up to date or be communicated with dynamically without any regular polling being necessary.

It's being partially paid for by a client, but will of course be free and LGPL. I say partially paid for because it's being done in a far more powerful and generic way than the client requires. In fact they don't even need the live aspect; all they really need is the division of the page into separate requests to make local caching more efficient!

But since we already have all the code written to handle sockets between PERL and SWF, which would otherwise never be used now that our peer interface is written in C and SDL, it seemed like a good opportunity to get it working in the field.

Development Plan

The highest priority aspect is the new transclusion method, which makes the contents a separate request so that they can be cached independently by the browser. With that in mind, the order of development tasks is as follows:

  • livelets.php - parser function and tag hook returning the JavaScript XMLHTTP requestor, plus the livelet action
  • livelets.js - the function that does the request and update of content
  • livelets.as - set up an XMLSocket object in JS using the SWF (initially just used to replace the XMLHTTP requests)
  • livelets.pl - spawn a thread for each SWF, and propagate changes

Livelet Components

livelets.php

  • Ensure livelets.pl daemon is running
  • Add the #livelet parser function and internal hook
  • Add the livelet action to the UnknownAction hook
  • Add the AfterSaveComplete hook
  • Insert the SWF (compiled from livelets.as) into the page somewhere invisible but active
  • Add livelets.js to the head scripts

Parser Function

Converts the #livelet syntax into the internal tag hook, passing along the parameters.

Internal Tag Hook

There's no HTML content here, because the parser function doesn't actually do any transclusion; the only information that arrives is the article name and parameters. This hook returns the JavaScript container, which makes a separate request for the content when it runs in the browser. The request uses the livelet action and passes the parameters used in the original parser-function transclusion statement.
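The request the container makes can be sketched as a URL built from the article title and the original parameters. This is a minimal sketch, assuming the request goes to index.php with action=livelet and that the transclusion parameters travel as extra query fields; buildLiveletUrl and the reserved-name check are hypothetical, not part of the extension.

```javascript
// Hypothetical sketch: build the URL a livelet container requests.
// Passing transclusion parameters as extra query fields is an assumption.
function buildLiveletUrl(scriptPath, title, params) {
  const query = new URLSearchParams();
  query.set("title", title);
  query.set("action", "livelet");
  for (const [name, value] of Object.entries(params)) {
    // These names would clobber the fields the request itself depends on.
    if (name === "title" || name === "action") {
      throw new Error("reserved parameter name: " + name);
    }
    query.set(name, value);
  }
  return scriptPath + "?" + query.toString();
}

console.log(buildLiveletUrl("/wiki/index.php", "Contact Details", { user: "Bob" }));
// → /wiki/index.php?title=Contact+Details&action=livelet&user=Bob
```

URLSearchParams takes care of escaping, so titles containing spaces or ampersands survive the round trip.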

UnknownAction Hook

If an article is requested with action=livelet, it is returned without any HTML head, scripts, body or skin components, making it suitable for inserting directly into its container by the client JS.

The JS must pass along any parameters used in the original transclusion statement, so rather than just returning the article, the hook would probably change the content (in AfterDatabaseFetch) into a normal transclusion with those parameters.
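Rewriting the fetched content into a normal transclusion amounts to rebuilding the original template call from the title and the parameters the client sent back. This is a minimal sketch; toTransclusion is a hypothetical helper, and it assumes the title arrives exactly as written in the original statement (in MediaWiki, an article outside the Template namespace is transcluded with a leading colon, e.g. {{:Article}}).

```javascript
// Hypothetical sketch: turn a title plus named parameters back into a
// normal transclusion statement for the parser to render as usual.
function toTransclusion(title, params) {
  const parts = [title];
  for (const [name, value] of Object.entries(params)) {
    // Values containing "|" or "}}" would need escaping; not handled here.
    parts.push(name + "=" + value);
  }
  return "{{" + parts.join("|") + "}}";
}

console.log(toTransclusion("ContactDetails", { user: "Bob" })); // → {{ContactDetails|user=Bob}}
```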

AfterSaveComplete Hook

The job of this hook is to notify all the SWFs which have live containers that are out of date. To do this we need a persistent list of all the current SWF clients and their live article titles with Last Modified times. Each time a save occurs, we update the list.

Only the livelets.pl daemon has up-to-date information about which SWFs need which articles, so this save hook should first ask the daemon for its version of this list, and then return an updated one. The daemon can then notify each SWF with out-of-date content by sending a list of the container IDs that need to request new content.
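The daemon's list can be pictured as a map from SWF client to its containers, each recording the article title and its Last Modified time. On a save, the daemon collects the stale container IDs for each client. This is a minimal sketch; the LiveletRegistry class and its method names are hypothetical, and it assumes that each notified client will refetch, so the stored time is advanced as soon as the notice is sent.

```javascript
// Hypothetical sketch of the daemon's client list.
class LiveletRegistry {
  constructor() {
    this.clients = new Map(); // clientId -> Map(containerId -> { title, lastModified })
  }

  register(clientId, containerId, title, lastModified) {
    if (!this.clients.has(clientId)) this.clients.set(clientId, new Map());
    this.clients.get(clientId).set(containerId, { title, lastModified });
  }

  unregister(clientId) {
    this.clients.delete(clientId); // e.g. when the SWF's socket closes
  }

  // On save: return clientId -> [containerIds] showing content older than the save.
  articleSaved(title, savedAt) {
    const notifications = new Map();
    for (const [clientId, containers] of this.clients) {
      for (const [containerId, entry] of containers) {
        if (entry.title === title && entry.lastModified < savedAt) {
          if (!notifications.has(clientId)) notifications.set(clientId, []);
          notifications.get(clientId).push(containerId);
          entry.lastModified = savedAt; // assumes the client will refetch
        }
      }
    }
    return notifications;
  }
}
```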

livelets.js

The JS can send HTTP requests itself with the XMLHttpRequest object, but must rely on the SWF to call some kind of onData event handler function. Currently the onData function doesn't need any data or parameters; it's simply a notification that something on the wiki has changed and that the live containers should check whether their content is up to date.

Note: It may be a good idea to get the SWF to do the send as well (or make it optional) because the XMLHTTP object seems quite browser dependent compared to Flash6.

After that's working, we'd want livelets.pl to deliver a list of the containers which are known to have out-of-date content, so the Last Modified headers aren't even needed.
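Once the daemon sends a list of stale container IDs, the JS side only has to refetch those containers and swap in the new content. This is a minimal sketch with hypothetical names. To let the logic run outside a browser, plain objects with an html property stand in for the DOM container elements (in the page this would be innerHTML), and fetchContent stands in for the XMLHttpRequest call.

```javascript
// Hypothetical sketch: refresh only the containers the daemon reported as stale.
async function refreshContainers(staleIds, containers, fetchContent) {
  const updated = [];
  for (const id of staleIds) {
    const container = containers[id];
    if (!container) continue; // container no longer on this page
    container.html = await fetchContent(container.title);
    updated.push(id);
  }
  return updated;
}
```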

livelets.as

Even the SWF XMLSocket is crippled in three main ways:

  • It can only receive incoming data from servers in the same domain the SWF was served from
  • It can only use port numbers above 1023
  • Most importantly, it is not a true listener: it can only listen on streams it has already established with the server. A server cannot spontaneously request a new connection from the SWF socket.

This third item is very important, because it means that a permanently running server must be present, so we may need to include a simple server daemon using the code from server.pl.

Note1: If we use livelets.pl, then the SWF must be served by that, not by the wiki, because it can only receive data from the same domain (which includes port number) as that which it was served from.

Note2: Although most PHP installations now have socket support as standard, we're still better off using PERL, because the server needs to listen on a separate port above 1023 and run as a persistent daemon, and we already have the code for this in PERL.

Note3: The third limitation listed above is irrelevant to this application, because the SWF has to establish the stream to the server rather than vice versa anyway: there can be many SWFs running on the local client, and they would all need separate ports which the server would have to be notified of.

livelets.pl

Although PHP has socket capability, it seems that many users would not have the functionality easily available, as it often requires PHP to be recompiled with the --enable-sockets switch. Also, the PHP server script would need to run as a daemon, and many PHP installs do not support command-line PHP by default. So it seems to me that a PERL server would be simplest to implement, especially since we already have working socket code in PERL, PERL is available by default on virtually all Linux distributions, and it is easy to install on Win32 as well.

Rather than installing the daemon into init.d or as a service, the PHP script could check whether an instance is running and execute it if not. Installation would then be no different than for most extensions: putting the files into the extensions directory and ensuring they have the right permissions.

On Linux-like OSes it will run as a daemon, but doesn't install itself into init.d for automatic startup; instead livelets.php launches it if it isn't already running (it can send a simple ping command to the port to test whether an instance is running).

The code should work on Windows, but is being developed and tested for Linux first. The socket code itself has been tested on Windows, and we have code to make it run as a Windows service, but no doubt there will be trouble and dedicated Windows development time will be required, which is unlikely to occur for some time since we have little interest in supporting corporate closed-source solutions. Windows users need to have PERL installed with the IO::Socket module; we recommend the ActivePerl package.
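The "is it already running?" check can be sketched as an attempt to open a TCP connection to the daemon's port. This sketch is written in JavaScript (Node.js) for consistency with the other sketches; in livelets.php the equivalent would presumably use PHP's fsockopen. It simplifies the ping command mentioned above to a bare connection attempt, and isDaemonRunning, the default host and the timeout are all assumptions.

```javascript
const net = require("net");

// Hypothetical sketch: resolve true if something accepts connections on the port.
function isDaemonRunning(port, host = "127.0.0.1", timeoutMs = 1000) {
  return new Promise((resolve) => {
    const socket = net.connect({ port, host });
    const finish = (running) => {
      socket.destroy();
      resolve(running); // later calls are ignored once the promise settles
    };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.once("connect", () => finish(true));
    socket.once("error", () => finish(false)); // e.g. ECONNREFUSED
  });
}
```

If the result is false, the launcher would start the daemon in the background. A real ping command would also confirm that the process answering on that port is actually the livelets daemon.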

Examples & Usage Ideas

  • Ultra changes:
  • Forms: Allowing forms to be posted to the server without reloading the page. Any items which may change are made into live templates.
  • Caching: Normal templates which are used across many articles like those which include category links would be far more efficient when wrapped inside a live template.
  • Collaboration: The edit form could be modified to post the form without reloading the page, and to have a live preview of the content which would make article editing much more chat-like.
  • Channels: having live content opens up the whole channel aspect...

WikiFS and Livelets

I haven't thought about the full implications yet of what WikiFS and Livelets could achieve when working together, but there are two important areas of crossover:

  • Socket server: It would seem obvious to drop livelets.pl and use nsd.c's socket server instead.
  • Client tree: The job of nsd.c is to maintain a persistent node space, part of which represents the runtime state of the MediaWiki interfaces and their trees. It would make sense therefore to also use the WikiFS client-side runtime tree method to contain the livelets trees as well.

This crossover may be significant enough to use nsd.c for the livelets extension right from the start instead of making a PERL script specifically for it.


Old notes

All these notes came about because the browser seemed unable to cache the pages for itself, and on further inspection it was found that our wikis weren't sending the Last Modified header, whereas Wikipedia does. I assumed this was because our heavy use of transclusion, parser functions and variables made our pages too dynamic to cache.

But it turns out that our pages weren't being cached because our configuration was incorrect: a $wgCacheEpoch setting that shouldn't have been there meant that, although we were sending the private rather than the no-cache cache-control header, we were missing the critical Last Modified header. After fixing that, our sites are able to cache properly, and after testing the HTTP headers returned on various content I found that MediaWiki already knows how to keep track of the proper last-modified time of articles involving recursive transclusion. Parser functions and variables are handled too, because it's each individual function's job to mark the content as uncacheable.
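In general HTTP terms, the reason the Last Modified header matters is that a browser holding a cached copy sends it back as If-Modified-Since, and the server can reply "304 Not Modified" instead of resending the page. The following is a minimal, generic sketch of that check, not MediaWiki's implementation; isNotModified is a hypothetical name. Because HTTP dates have one-second resolution, it compares whole seconds.

```javascript
// Generic sketch of the conditional-GET decision behind Last-Modified.
function isNotModified(ifModifiedSince, lastModified) {
  if (!ifModifiedSince) return false; // no cached copy, send the full page
  const since = Date.parse(ifModifiedSince);
  if (Number.isNaN(since)) return false; // unparsable header, ignore it
  return Math.floor(lastModified.getTime() / 1000) <= Math.floor(since / 1000);
}
```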

So we can use the original plan of simply ensuring that the tree content is obtained via a separate HTTP request, so that it can be cached separately and won't have to be reloaded with every page. It could even manually force caching on when the noskin request is made. That way the main content could still have caching turned off while the tree caching stays on, if we wanted.

iFrames have problems though when the size of the tree is large compared to the size of the article; it's not a very seamless integration method. I'm thinking of using the XMLHttpRequest object instead, which can retrieve the content and insert it directly into the page, e.g. by setting a container element's innerHTML (document.write would replace the whole document if called after the page has loaded). A noskin query string option can be used when requesting articles, which returns only the HTML of the article content itself with no HTML head/body or skin components.

The idea of making this functionality work like transclusion, as a parser function, is still extremely useful though: since the content can load independently of the main article, it could even come from outside the wiki. Efficiency and load time can be increased massively by always wrapping a cacheable transclusion around templates which are used in many articles, like those which do categorisation, e.g.:

{{#cache:{{ContactDetails}}}}

It also means that the content in each container could be dynamic and refresh periodically without the rest of the page having to reload - this could be used for forms as well.

The SWF part of it is still interesting too, because we already have working code to allow the wiki to compile and serve SWF instances and use the peer to maintain persistent streams with all the instances, so that the wiki can send messages to them. So we could very easily extend these new cacheable transclusions to implement non-polling, event-driven dynamic areas of content, which would update almost as soon as they change.

It seems that there is no listener functionality in the XMLHTTP JavaScript API, so a SWF would be needed to pass incoming messages from the server to the page's JavaScript.




Old notes

Speed problem

There are problems with the speed of page loading when pages have a large tree in the navigation pane. This has been identified as mainly due to the time it takes to parse the large tree's wikitext into HTML, more than the time it takes to transfer the resulting HTML, which is in the region of 100KB. Both the parsing and the data transfer need to be reduced.

General solution

Treeview4 is being designed to record revision IDs for each tree, which must also reflect changes in any transcluded trees. When a page is requested, only the revision ID of each tree is sent to the client (if the tree has its cache-key parameter set). The tree JavaScript will then request and store each tree's content locally if it doesn't already have the content associated with the current ID.
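The client-side decision is a lookup keyed by cache-key: reuse the local copy if its revision ID matches the one sent with the page, otherwise fetch the content and store it under the new ID. This is a minimal sketch; getTreeContent is hypothetical, and a Map-like store stands in for the local storage (a Flash shared object in these notes), while fetchTree stands in for the request to the server.

```javascript
// Hypothetical sketch: fetch a tree only when the local copy is missing or stale.
async function getTreeContent(cacheKey, revisionId, store, fetchTree) {
  const cached = store.get(cacheKey);
  if (cached && cached.revisionId === revisionId) {
    return cached.html; // local copy is current, no request needed
  }
  const html = await fetchTree(cacheKey);
  store.set(cacheKey, { revisionId, html });
  return html;
}
```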

Local Caching: Flash & JavaScript

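There are a number of methods that the browser can use to store information locally. The general possibilities are described at http://www.niallkennedy.com/blog/archives/2007/01/ajax-performance-local-storage.html and show that using the Flash6 shared object is probably most appropriate, since cookies only allow 4K and the other options require different code for IE and Firefox.

  • Adobe - what are local shared objects? (http://www.adobe.com/products/flashplayer/articles/lso)
  • Permadi flash tips (http://www.permadi.com/tutorial/flashSharedObject/index.html)
  • Adobe tech note (http://www.adobe.com/cfusion/knowledgebase/index.cfm?id=tn_15683)
  • Flash JavaScript Integration Kit (http://weblogs.macromedia.com/flashjavascript)
  • Permadi example (http://www.permadi.com/tutorial/flashjscommand)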

Revision ID

Trees that need to use local caching should be given a unique ID in their cache-key parameter, with which their current revision ID will be associated. This data needs to persist across script executions and would probably use memcached if present, or a file if not.

The main complications with the process are that trees can be session-specific and can contain transcluded content. The session-specific aspect can be overcome by including changeable items such as {{CURRENTUSER}} in the cache-key parameter.
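Folding the changeable items into the key means each variant of a session-specific tree gets its own revision ID and cached content. This is a minimal sketch; makeCacheKey and the "name=value" joined-by-colon format are hypothetical. Names are sorted so that the same variants always produce the same key, and values are URI-encoded so they can't introduce stray separators.

```javascript
// Hypothetical sketch: derive a cache-key from a tree ID plus its variable parts.
function makeCacheKey(treeId, variants) {
  const parts = [encodeURIComponent(treeId)];
  for (const name of Object.keys(variants).sort()) {
    parts.push(name + "=" + encodeURIComponent(variants[name]));
  }
  return parts.join(":");
}

console.log(makeCacheKey("MainTree", { user: "Bob Smith" })); // → MainTree:user=Bob%20Smith
```

For a role-based tree, keying on the role instead of the user would let everyone with the same role share one entry.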

  • It makes sense to update the revision IDs on the SaveComplete event, but how do we know whether, and which, trees the saved article affects?
  • We can't simply use the mTemplates property because we're caching just a fragment of the article... unless we make caching work more like a transclusion than a wrapper.
    • But it would only do the transclusion if the cached content was nonexistent/invalid
    • The parser-function would have to wrap the transcluded content in a private tag to get access to the final HTML
    • At this final HTML stage it should know all the templates used (is that all the ones that were used recursively?)
  • The tag-hook unconditionally returns only the javascript local-cache-function, because even if HTML was just generated it belongs in the server cache to be requested by the SWF.
  • The SaveComplete hook must be unconditional because it's when we save an article that is included in a tree that we need to update the revision and invalidate the content.
  • So the memcache should contain a hash of all the articles and what they invalidate which is updated from the private tag-hook, but referred to by the SaveComplete hook.
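The "hash of all the articles and what they invalidate" can be sketched as a reverse index from article title to the cache-keys that used it. The tag-hook adds entries, and the SaveComplete hook looks up the saved title. This is a minimal sketch; the InvalidationIndex class is hypothetical, and an in-memory Map of Sets stands in for memcached.

```javascript
// Hypothetical sketch of the article -> cache-keys invalidation index.
class InvalidationIndex {
  constructor() {
    this.byArticle = new Map(); // article title -> Set of cache-keys
  }

  // From the private tag-hook, once the container's templates are known.
  recordUse(cacheKey, articleTitles) {
    for (const title of articleTitles) {
      if (!this.byArticle.has(title)) this.byArticle.set(title, new Set());
      this.byArticle.get(title).add(cacheKey);
    }
  }

  // From the SaveComplete hook: the cache-keys whose content is now stale.
  keysAffectedBy(savedTitle) {
    return [...(this.byArticle.get(savedTitle) || [])];
  }
}
```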

Server-side Caching

Since revision IDs already require persistence across requests, we can also store the HTML content of the tree along with the revision ID. The rendering process would then be required less often, especially if the tree is role-based and many users of the same role access the same tree. This would simply require checking whether there's an entry for the tree's cache-key and, if so, using that content instead of rebuilding it.

Dynamic update

The JavaScript could periodically check the revision IDs of the cache containers on the page and request updated HTML for any that have changed. This would allow the cached areas to also act as areas of dynamic wikitext content!
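Each periodic check reduces to comparing the revision IDs the page is showing with the latest ones reported by the server. This is a minimal sketch; findStaleContainers is hypothetical, and both arguments are assumed to be plain objects mapping container ID to revision ID. A caller could run it from setInterval and pass the result to whatever requests fresh HTML.

```javascript
// Hypothetical sketch: which containers show a revision other than the latest?
function findStaleContainers(shown, latest) {
  return Object.keys(shown).filter(
    (id) => id in latest && latest[id] !== shown[id]
  );
}

console.log(findStaleContainers({ tree: 3, news: 5 }, { tree: 3, news: 6 })); // → [ 'news' ]
```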

JavaScript/Flash6 Cache Container

It may be best to do the caching part as a separate component from the tree itself, since the two main complications, dynamic content and transcluded content, both have solutions that aren't specific to the tree. We could create a second extension allowing any article content to be wrapped in cache tags which use the same cache-key methodology (the tags are private and placed there by a parser-function syntax, so that we can use the mTemplates property to know what articles are transcluded within the cache container).

The containers will have their content replaced by a JavaScript function call which obtains the content from the SWF. If the SWF doesn't have it already it must ask the server for the content associated with that key, which it will have to generate if not present.

Why not just apply it to whole pages unconditionally?

  • I think you only get about 100K of local storage
  • Its main use is for caching the tree, which is more static than the pages, but if the tree and the page content were under separate cache-keys, this would mean instant loading of any unchanged, previously seen page. This is supposed to be the case already with local browser caching, but it doesn't seem to work.

Try simple method first!

This whole thing may be able to use the browser cache, by simply using the key as a filename to request from the server.
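One way to read "using the key as a filename" is to put both the cache-key and the revision ID into the requested URL. An unchanged tree then always maps to the same URL and can come straight from the browser cache, while any change produces a new URL. This is a minimal sketch; cachedContentUrl and the /livelet-cache/ path are hypothetical, and it assumes the server sends headers that let the browser keep such files.

```javascript
// Hypothetical sketch: a versioned filename per cache-key and revision.
function cachedContentUrl(cacheKey, revisionId) {
  return "/livelet-cache/" + encodeURIComponent(cacheKey) + "-" + revisionId + ".html";
}

console.log(cachedContentUrl("MainTree:user=Bob", 42)); // → /livelet-cache/MainTree%3Auser%3DBob-42.html
```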