MediaWiki 1.11 title extraction bug

From Organic Design wiki

Revision as of 05:51, 23 September 2007

The problem

There has been trouble upgrading to MediaWiki 1.11, and this article has been set up to document my investigation into the problem. The symptom is that if $wgArticlePath is set to "/$1", any long-form URL request that passes the title as a query-string parameter fails and is redirected to a non-existent article called Wiki/index.php. In other words, the long-form URL is being treated as a friendly URL.
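To make the symptom concrete, here is a minimal PHP sketch that contrasts the title carried in the query string of a long-form URL with the title that falls out of the URL path when $wgArticlePath is "/$1". It assumes the script lives at /wiki/index.php (inferred from the Wiki/index.php title in the symptom). The host example.org and the helper name describe_request are hypothetical, and ucfirst() stands in for MediaWiki's first-letter capitalization of titles. The sketch does not model how MediaWiki chooses between the two candidates. It only shows that the path alone already yields "Wiki/index.php".

```php
<?php
// Sketch of the reported symptom. Assumptions (not MediaWiki code): the
// script path is /wiki/index.php, describe_request is a hypothetical helper,
// and ucfirst() approximates MediaWiki's first-letter capitalization.
function describe_request(string $url, string $articlePath): array {
    $parts = parse_url($url);
    $path = $parts['path'] ?? '/';

    // The title the user actually asked for lives in the query string.
    $query = [];
    parse_str($parts['query'] ?? '', $query);

    // "/$1" reduces to the prefix "/", which every request path starts with.
    $prefix = str_replace('$1', '', $articlePath);
    $fromPath = null;
    if (strncmp($path, $prefix, strlen($prefix)) === 0 && strlen($path) > strlen($prefix)) {
        $fromPath = ucfirst(rawurldecode(substr($path, strlen($prefix))));
    }

    return [
        'query_title' => $query['title'] ?? null, // title from ?title=...
        'path_title'  => $fromPath,               // title implied by the path
    ];
}

$result = describe_request('http://example.org/wiki/index.php?title=Main_Page', '/$1');
// $result['query_title'] is 'Main_Page', but $result['path_title'] is 'Wiki/index.php'
```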

The cause

There have been some significant changes to the way the article title is extracted from the request in version 1.11. A new method called interpolateTitle has been added to the $wgRequest singleton object, which is defined in includes/WebRequest.php and called from includes/Setup.php. The problem has been isolated further to another new 1.11 method, extractTitle, which is called from within interpolateTitle and is shown below.

<php>
/**
 * Internal URL rewriting function; tries to extract page title and,
 * optionally, one other fixed parameter value from a URL path.
 *
 * @param string $path the URL path given from the client
 * @param array $bases one or more URLs, optionally with $1 at the end
 * @param string $key if provided, the matching key in $bases will be
 *        passed on as the value of this URL parameter
 * @return array of URL variables to interpolate; empty if no match
 */
private function extractTitle( $path, $bases, $key=false ) {
	foreach( (array)$bases as $keyValue => $base ) {
		// Find the part after $wgArticlePath
		$base = str_replace( '$1', '', $base );
		$baseLen = strlen( $base );
		if( substr( $path, 0, $baseLen ) == $base ) {
			$raw = substr( $path, $baseLen );
			if( $raw !== '' ) {
				$matches = array( 'title' => rawurldecode( $raw ) );
				if( $key ) {
					$matches[$key] = $keyValue;
				}
				return $matches;
			}
		}
	}
	return array();
}
</php>
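The matching rule in extractTitle is a plain prefix comparison. The $1 placeholder is stripped from each base, and any path that starts with the remaining prefix and has something left over becomes a title. The sketch below is a standalone copy of that logic as a free function (extract_title is a hypothetical name) so it can be run outside MediaWiki. With a base of "/$1" the prefix is just "/", which every request path begins with.

```php
<?php
// Standalone copy of the extractTitle() logic quoted above, rewritten as a
// free function (the name extract_title is hypothetical) for experimentation.
function extract_title(string $path, $bases, $key = false): array {
    foreach ((array)$bases as $keyValue => $base) {
        // Strip the placeholder to get the literal prefix, e.g. '/$1' -> '/'
        $base = str_replace('$1', '', $base);
        $baseLen = strlen($base);
        // Plain prefix test: with a prefix of '/', every path passes
        if (substr($path, 0, $baseLen) == $base) {
            $raw = substr($path, $baseLen);
            // Only a non-empty remainder counts as a title
            if ($raw !== '') {
                $matches = ['title' => rawurldecode($raw)];
                if ($key) {
                    // e.g. the matched action name when checking action paths
                    $matches[$key] = $keyValue;
                }
                return $matches;
            }
        }
    }
    return [];
}
```

For example, extract_title('/wiki/index.php', '/$1') returns ['title' => 'wiki/index.php'], while extract_title('/', '/$1') returns an empty array because nothing follows the prefix. With a hypothetical action path, the $key argument adds the matched key: extract_title('/edit/Main_Page', ['edit' => '/edit/$1'], 'action') returns ['title' => 'Main_Page', 'action' => 'edit'].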

Solution

One thing the new 1.11 code shows is that the problem can only occur when the $wgUsePathInfo global is set to true. So I've set it to false and put $wgArticlePath back to "/$1", which has got our friendly URLs working again. The one drawback is that without $wgUsePathInfo, our rewrite rule must translate every friendly request into the full long-form query string. This means unencoded ampersands are treated as query-string separators and cannot be used in article titles, but at least our friendly URLs are working again.
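The ampersand drawback can be illustrated with PHP's own query-string parsing. This is a sketch that assumes the rewrite rule pastes the friendly path into title= without encoding it. The rule itself is not shown, and parse_rewritten and the title Tom&Jerry are hypothetical. Encoding the title with rawurlencode() before it reaches the query string keeps the ampersand intact.

```php
<?php
// Sketch of the ampersand problem. Assumption: a (hypothetical) rewrite rule
// turns a friendly path like /Tom&Jerry into index.php?title=Tom&Jerry.
// Only the query-string parsing step is modelled here.
function parse_rewritten(string $title, bool $encode): array {
    // Build the query string the rewrite rule would hand to index.php
    $query = 'title=' . ($encode ? rawurlencode($title) : $title);
    $vars = [];
    parse_str($query, $vars);
    return $vars;
}

$raw = parse_rewritten('Tom&Jerry', false);    // ['title' => 'Tom', 'Jerry' => '']
$encoded = parse_rewritten('Tom&Jerry', true); // ['title' => 'Tom&Jerry']
```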