MediaWiki 1.11 title extraction bug

From Organic Design wiki

Latest revision as of 18:10, 22 May 2015

Legacy: This article describes a concept that has been superseded in the course of ongoing development on the Organic Design wiki. Please do not develop this any further or base work on this concept; it is only useful as a historic record of work done. You may find a link to the currently used concept or function in this article; if not, you can contact the author to find out what has taken the place of this legacy item.

The problem

There has been trouble upgrading to MediaWiki 1.11, and this article has been set up to document my investigation into the problem. The symptom is that if $wgArticlePath is set to "/$1", then any long-form URL request using title as a query-string parameter fails and is redirected to a non-existent article called Wiki/index.php, i.e. MediaWiki treats the long-form URL as a friendly URL. (Bug 11428)

The cause

There have been some significant changes to the way the article title is extracted from the request in version 1.11. A new method called interpolateTitle has been added to the $wgRequest singleton object; it is defined in includes/WebRequest.php and called from includes/Setup.php. The problem has been narrowed down further to another new 1.11 method called extractTitle, which is called from within interpolateTitle and is shown below. This function returns an array of key/value pairs which are then written back into $_GET and $_REQUEST, so that they appear in the environment as if they had come from a normal long-form URL request.

The problem is that when $wgUsePathInfo is set to true (and we do want it to be true, so that we can use un-encoded ampersands and question marks in article titles), this function does not return the correct value for the title key when title is supplied as a query-string item. I think it should be returning an empty array in the case of long-form URLs, or not be getting called at all.

/**
 * Internal URL rewriting function; tries to extract page title and,
 * optionally, one other fixed parameter value from a URL path.
 *
 * @param string $path the URL path given from the client
 * @param array $bases one or more URLs, optionally with $1 at the end
 * @param string $key if provided, the matching key in $bases will be
 *        passed on as the value of this URL parameter
 * @return array of URL variables to interpolate; empty if no match
 */
private function extractTitle( $path, $bases, $key=false ) {
	foreach( (array)$bases as $keyValue => $base ) {
		// Find the part after $wgArticlePath
		$base = str_replace( '$1', '', $base );
		$baseLen = strlen( $base );
		if( substr( $path, 0, $baseLen ) == $base ) {
			$raw = substr( $path, $baseLen );
			if( $raw !== '' ) {
				$matches = array( 'title' => rawurldecode( $raw ) );
				if( $key ) {
					$matches[$key] = $keyValue;
				}
				return $matches;
			}
		}
	}
	return array();
}
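To make the failure concrete, here is a minimal standalone sketch that copies the extractTitle logic above into a free function (extractTitleSketch is a hypothetical name, not part of MediaWiki) and feeds it a long-form request. It assumes, as we understand the 1.11 interpolateTitle method to work, that the path passed in is the path portion of REQUEST_URI with the query string already stripped; the host example.org is only a placeholder. With $wgArticlePath set to "/$1" the fixed prefix is just "/", which every path matches, so the script path itself comes back as the title.

```php
<?php
// Hypothetical standalone copy of the extractTitle() logic quoted above,
// pulled out of the WebRequest class so it can be run on its own.
function extractTitleSketch( $path, $bases, $key = false ) {
	foreach ( (array)$bases as $keyValue => $base ) {
		// Strip the $1 placeholder to get the fixed prefix of the article path
		$base = str_replace( '$1', '', $base );
		$baseLen = strlen( $base );
		if ( substr( $path, 0, $baseLen ) == $base ) {
			// Everything after the prefix is taken to be the title
			$raw = substr( $path, $baseLen );
			if ( $raw !== '' ) {
				$matches = array( 'title' => rawurldecode( $raw ) );
				if ( $key ) {
					$matches[$key] = $keyValue;
				}
				return $matches;
			}
		}
	}
	return array();
}

// Assumption: only the path part of the request URI is examined,
// so the title=... query-string item is discarded before matching.
$path = parse_url( 'http://example.org/wiki/index.php?title=Main_Page', PHP_URL_PATH );
echo $path, "\n"; // /wiki/index.php

// With $wgArticlePath = "/$1" the prefix is "/", which matches any path.
$longForm = extractTitleSketch( $path, '/$1' );
echo $longForm['title'], "\n"; // wiki/index.php

// A genuine friendly URL is handled as intended.
$friendly = extractTitleSketch( '/Main_Page', '/$1' );
echo $friendly['title'], "\n"; // Main_Page
```

Assuming the default first-letter capitalization of titles, wiki/index.php becomes the Wiki/index.php article described above. Nothing in extractTitle looks at the query string, which would explain why the title=... parameter is overridden.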

WebRequest Patch

One thing the new 1.11 code shows is that the problem can only occur when the $wgUsePathInfo global is set to true. Setting it to false is no solution, though: the mod_rewrite rules would then have to translate every friendly request into the full long-form query string, so un-encoded ampersands would be treated as query-string separators and could not be used in article titles.
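The ampersand problem can be illustrated with PHP's own parse_str, which splits a query string in essentially the same way PHP builds $_GET. This is a sketch under assumptions: the article title Fish_&_Chips and the rewritten query string are hypothetical, standing in for whatever a rewrite rule produces when it copies the path into title= without re-encoding it. For comparison, the last lines take the title from a hypothetical PATH_INFO value the way the 1.10 constructor does, with substr( ..., 1 ).

```php
<?php
// Hypothetical query string produced by a rewrite rule that copies the
// friendly path "/Fish_&_Chips" into title= without re-encoding the "&".
parse_str( 'title=Fish_&_Chips', $params );
var_dump( $params );
// The raw "&" acts as a parameter separator: title is cut short at "Fish_"
// and "_Chips" turns up as a separate, empty parameter.

// If the ampersand were percent-encoded the full title would survive...
parse_str( 'title=Fish_%26_Chips', $encoded );
echo $encoded['title'], "\n"; // Fish_&_Chips

// ...and with $wgUsePathInfo the 1.10 constructor takes the title straight
// from PATH_INFO (hypothetical value here), so no query-string parsing applies.
$pathInfo = '/Fish_&_Chips';
$fromPathInfo = substr( $pathInfo, 1 );
echo $fromPathInfo, "\n"; // Fish_&_Chips
```

This is why keeping $wgUsePathInfo enabled matters here: the title arrives through PATH_INFO rather than being split apart as a query string.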

I'm not sure what they're trying to do with the new function, so until they come up with a proper solution, I've replaced the WebRequest constructor with the one from 1.10 and made the interpolateTitle method return without doing anything. This allows $wgUsePathInfo to work for /wiki/index.php/foo style requests (but incidentally, %, # and ? don't seem to have been working in any of our wikia for some time now!). Here's a snippet of the includes/WebRequest.php file with the patch applied:

class WebRequest {
	function __construct() {
		$this->checkMagicQuotes();
		# The rest of the code in this function is from MediaWiki 1.10
		global $wgUsePathInfo;
		if ( $wgUsePathInfo ) {
			if ( isset( $_SERVER['ORIG_PATH_INFO'] ) && $_SERVER['ORIG_PATH_INFO'] != '' ) {
				# Mangled PATH_INFO
				# http://bugs.php.net/bug.php?id=31892
				# Also reported when ini_get('cgi.fix_pathinfo')==false
				$_GET['title'] = $_REQUEST['title'] = substr( $_SERVER['ORIG_PATH_INFO'], 1 );
			} elseif ( isset( $_SERVER['PATH_INFO'] ) && ($_SERVER['PATH_INFO'] != '') && $wgUsePathInfo ) {
				$_GET['title'] = $_REQUEST['title'] = substr( $_SERVER['PATH_INFO'], 1 );
			}
		}
	}

	/**
	 * Check for title, action, and/or variant data in the URL
	 * and interpolate it into the GET variables.
	 * This should only be run after $wgContLang is available,
	 * as we may need the list of language variants to determine
	 * available variant URLs.
	 */
	function interpolateTitle() {
		return; # add this to disable the new 1.11 title extraction functionality
		global $wgUsePathInfo;