MediaWiki 1.11 title extraction bug

From Organic Design wiki
Revision as of 04:45, 23 September 2007 by Nad (Almost working)

The problem

Upgrading to MediaWiki 1.11 has caused trouble, and this article documents my investigation into the problem. The symptom is that if $wgArticlePath is set to "/$1", any long-form URL request that passes the title as a query-string parameter fails and is redirected to a non-existent article called Wiki/index.php, i.e. the long-form URL is being treated as a friendly URL.

The cause

Version 1.11 makes some significant changes to the way the article title is extracted from the request. A new method called interpolateTitle has been added to the class of the $wgRequest singleton object (WebRequest, defined in includes/WebRequest.php), and it is called from includes/Setup.php.

Here is the old 1.10 title extraction code, which was handled directly in $wgRequest's constructor:
<php>
if ( $wgUsePathInfo ) {
	if ( isset( $_SERVER['ORIG_PATH_INFO'] ) && $_SERVER['ORIG_PATH_INFO'] != '' ) {
		# Mangled PATH_INFO
		# http://bugs.php.net/bug.php?id=31892
		# Also reported when ini_get('cgi.fix_pathinfo')==false
		$_GET['title'] = $_REQUEST['title'] = substr( $_SERVER['ORIG_PATH_INFO'], 1 );
	} elseif ( isset( $_SERVER['PATH_INFO'] ) && ($_SERVER['PATH_INFO'] != '') && $wgUsePathInfo ) {
		$_GET['title'] = $_REQUEST['title'] = substr( $_SERVER['PATH_INFO'], 1 );
	}
}
</php>
And here is the new interpolateTitle method, which gets called from includes/Setup.php. All title extraction code has been removed from the $wgRequest constructor and replaced with this new method.
<php>
/**
 * Check for title, action, and/or variant data in the URL
 * and interpolate it into the GET variables.
 * This should only be run after $wgContLang is available,
 * as we may need the list of language variants to determine
 * available variant URLs.
 */
function interpolateTitle() {
	global $wgUsePathInfo;
	if ( $wgUsePathInfo ) {
		// PATH_INFO is mangled due to http://bugs.php.net/bug.php?id=31892
		// And also by Apache 2.x, double slashes are converted to single slashes.
		// So we will use REQUEST_URI if possible.
		$matches = array();
		if ( !empty( $_SERVER['REQUEST_URI'] ) ) {
			// Slurp out the path portion to examine...
			$url = $_SERVER['REQUEST_URI'];
			if ( !preg_match( '!^https?://!', $url ) ) {
				$url = 'http://unused' . $url;
			}
			$a = parse_url( $url );
			if( $a ) {
				$path = $a['path'];

				global $wgArticlePath;
				$matches = $this->extractTitle( $path, $wgArticlePath );

				global $wgActionPaths;
				if( !$matches && $wgActionPaths) {
					$matches = $this->extractTitle( $path, $wgActionPaths, 'action' );
				}

				global $wgVariantArticlePath, $wgContLang;
				if( !$matches && $wgVariantArticlePath ) {
					$variantPaths = array();
					foreach( $wgContLang->getVariants() as $variant ) {
						$variantPaths[$variant] = str_replace( '$2', $variant, $wgVariantArticlePath );
					}
					$matches = $this->extractTitle( $path, $variantPaths, 'variant' );
				}
			}
		} elseif ( isset( $_SERVER['ORIG_PATH_INFO'] ) && $_SERVER['ORIG_PATH_INFO'] != '' ) {
			// Mangled PATH_INFO
			// http://bugs.php.net/bug.php?id=31892
			// Also reported when ini_get('cgi.fix_pathinfo')==false
			$matches['title'] = substr( $_SERVER['ORIG_PATH_INFO'], 1 );

		} elseif ( isset( $_SERVER['PATH_INFO'] ) && ($_SERVER['PATH_INFO'] != '') ) {
			// Regular old PATH_INFO yay
			$matches['title'] = substr( $_SERVER['PATH_INFO'], 1 );
		}
		foreach( $matches as $key => $val) {
			$_GET[$key] = $_REQUEST[$key] = $val;
		}
	}
}
</php>
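To see why a "/$1" article path could swallow long-form URLs, here is a minimal sketch of the kind of prefix matching that extractTitle appears to do. The extractTitle source isn't reproduced in this article, so sketchExtractTitle and sketchPath are hypothetical stand-ins written purely for illustration, and the request URI and the "/view/$1" article path are example values, not our real configuration.

```php
<?php
// Hypothetical, simplified stand-in for WebRequest::extractTitle();
// an assumption for illustration, not the real MediaWiki source.
function sketchExtractTitle( $path, $articlePath ) {
	// Everything before the $1 placeholder is the fixed prefix
	$base = str_replace( '$1', '', $articlePath );
	$baseLen = strlen( $base );
	// Match only if the path starts with the prefix and has something after it
	if ( substr( $path, 0, $baseLen ) == $base && strlen( $path ) > $baseLen ) {
		return array( 'title' => rawurldecode( substr( $path, $baseLen ) ) );
	}
	return array();
}

// Hypothetical helper: take the path portion of a request URI the same
// way interpolateTitle() does, via parse_url() on a dummy absolute URL
function sketchPath( $requestUri ) {
	$a = parse_url( 'http://unused' . $requestUri );
	return $a['path'];
}

// Example long-form request (assumed URL layout)
$path = sketchPath( '/wiki/index.php?title=Main_Page&action=edit' );

// With "/$1" the prefix is just "/", so the script's own path matches
$short = sketchExtractTitle( $path, '/$1' );

// With a distinct prefix (example value) the long-form URL does not match
$prefixed = sketchExtractTitle( $path, '/view/$1' );

var_dump( $short, $prefixed );
```

Under these assumptions the "/$1" case yields a title of wiki/index.php, and the final foreach in interpolateTitle would then write that over the title passed in the query string. That would be consistent with the redirect to Wiki/index.php described in the symptom, although I haven't confirmed it against the real extractTitle code.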

Solution

One thing the new 1.11 code shows is that the problem can only occur when the $wgUsePathInfo global is set to true, so I've set it to false and put $wgArticlePath back to "/$1", which has got our friendly URLs working again. The one drawback is that without $wgUsePathInfo our rewrite rule must translate every friendly request into the full long-form query string, which means that un-encoded ampersands are treated as query-string separators and cannot be used in article titles.
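The ampersand drawback can be illustrated with a small sketch. Our actual rewrite rule isn't shown in this article, so sketchNaiveRewrite and sketchEncodedRewrite are hypothetical stand-ins for what such a rule effectively produces, and "/R&D" is just an example title. PHP's parse_str is used because it splits a query string the same way PHP does when it fills $_GET.

```php
<?php
// Hypothetical stand-in for a rewrite rule that appends the friendly path
// to the long-form query string without encoding it (an assumption)
function sketchNaiveRewrite( $friendlyPath ) {
	return 'title=' . substr( $friendlyPath, 1 );
}

// Same idea, but percent-encoding the title first; shown only for
// comparison, not as a claim about what our rewrite rule can do
function sketchEncodedRewrite( $friendlyPath ) {
	return 'title=' . rawurlencode( substr( $friendlyPath, 1 ) );
}

// Un-encoded: the ampersand splits the title into two parameters
parse_str( sketchNaiveRewrite( '/R&D' ), $naive );

// Encoded: %26 is decoded back to "&" inside a single title parameter
parse_str( sketchEncodedRewrite( '/R&D' ), $encoded );

var_dump( $naive, $encoded );
```

In the un-encoded case the title arrives as just "R", with a stray empty parameter named "D", which is why such titles can't be reached through the friendly URLs under the current setup.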