Difference between revisions of "Extension talk:CurrentPages"

From Organic Design wiki
m (Changing to DB storage)
(Change source-code blocks to standard format)
 
(8 intermediate revisions by 2 users not shown)
Line 1: Line 1:
The main functionality is all in the function called by ExtensionSetup. It's not very efficient: the file is read and written on every request, and because of the large number of articles read over a 24-hour period, the file can grow to a megabyte or so. A better approach would be to create an additional database table to store the entries in. The [[Extension:PayPal.php|PayPal extension]] has example code for adding and using a new table. Another efficiency gain would be to store article IDs rather than title text.
+
{{ext-talk-msg|CurrentPages}}
{{code|<php>
+
== About ==
 +
This extension is basically a hit counter recording the number of normal page views to each article. The difference between this extension and the normal [[Special:Popularpages|popular pages special-page]] is that this only records the hits for the last 24 hours. It does this by storing the current hour along with each title and its view count. It creates a database table called ''currentpages_hourlyviews'' to store the data in, the structure of which is shown below.
 +
 
 +
It also stores the current hour in the database. When the current hour changes, it deletes any existing rows that carry the new hour value, since those rows are left over from the same hour slot a day earlier. This ensures that items more than 24 hours old are purged.
 +
 
 +
<pre>
 +
mysql> describe currentpages_hourlyviews;
 +
+-------+---------+------+-----+---------+-------+
 +
| Field | Type    | Null | Key | Default | Extra |
 +
+-------+---------+------+-----+---------+-------+
 +
| hour  | int(11) | YES  |     | NULL    |       |
 +
| page  | int(11) | YES  |     | NULL    |       |
 +
| views | int(11) | YES  |     | NULL    |       |
 +
+-------+---------+------+-----+---------+-------+
 +
3 rows in set (0.00 sec)
 +
</pre>
 +
 
 +
== Bugs ==
 +
*For some reason most page requests increase the count by two
 +
*If there are no requests for more than an hour, the rows for the hours in that period will not be cleared. In practice, though, a site connected to the internet is unlikely to go a whole hour without a request.
 +
 
 +
== Efficiency ==
 +
The current database storage method is very efficient. A normal page request needs only two single-row SELECT queries and one single-row UPDATE to add the new title request to the counter table. Once per hour, an additional single-row UPDATE and a multi-row DELETE clear out the titles in the 24-hour-old data. Rendering the list takes one multi-row SELECT query, limited to the number of items in the list.
 +
 
 +
Previously, the main code looked like the code shown below, serialising and deserialising the data array to and from a file. The file started getting quite large, which is a big problem considering that it's loaded and saved for every request. Because of this, the storage mechanism was changed to use the database instead.
 +
<source lang="php">
 
# Read the $egCurrentPagesData array from file
$data = file_get_contents($egCurrentPagesFile);
Line 16: Line 41:
 
# Write the $egCurrentPagesData array back to file
file_put_contents($egCurrentPagesFile, serialize($egCurrentPagesData));
</php>}}
+
</source>
== Changing to DB storage ==
{{code|<php>
# Create the DB table if it doesn't exist
$table = $db->tableName('currentpages_hourlyviews');
if (!$db->tableExists($table)) {
	$query = "CREATE TABLE $table (hour INTEGER, page INTEGER, views INTEGER);";
	$result = $db->query($query);
	$db->freeResult($result);
}

# Increment a title
SELECT * FROM $table WHERE hour = $hour AND page = $page
if ($row = fetch...) {
	$views++
	UPDATE...
}
else {
	INSERT INTO $table VALUES($hour,$page,1)
}

# Render list
SELECT page, SUM(views) AS totals FROM $table GROUP BY page ORDER BY totals DESC LIMIT $n
</php>}}
+
== Layout ==
+
I suggest the number of pageviews goes before the article name, that way long article titles won't push the page count under the main article area where it can't be seen.--[[User:Milan|Milan]] 15:19, 26 May 2008 (NZST)
+
:Good idea oi, done ;-) --[[User:Nad|nad]] 17:48, 26 May 2008 (NZST)
 

Latest revision as of 18:11, 22 May 2015

This talk page pertains specifically to the development of this extension. For more general discussion about bugs and usage etc, please refer to the mediawiki.org talk page at MW:Extension talk:CurrentPages

About

This extension is basically a hit counter recording the number of normal page views to each article. The difference between this extension and the normal popular pages special-page is that this only records the hits for the last 24 hours. It does this by storing the current hour along with each title and its view count. It creates a database table called currentpages_hourlyviews to store the data in, the structure of which is shown below.

It also stores the current hour in the database. When the current hour changes, it deletes any existing rows that carry the new hour value, since those rows are left over from the same hour slot a day earlier. This ensures that items more than 24 hours old are purged.

mysql> describe currentpages_hourlyviews;
+-------+---------+------+-----+---------+-------+
| Field | Type    | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| hour  | int(11) | YES  |     | NULL    |       | 
| page  | int(11) | YES  |     | NULL    |       | 
| views | int(11) | YES  |     | NULL    |       | 
+-------+---------+------+-----+---------+-------+
3 rows in set (0.00 sec)
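
The hourly purge described above could be sketched roughly as follows. This is only an illustration under assumptions: PDO stands in for MediaWiki's database layer, the function name cpPurgeOnHourChange is hypothetical, and because this page does not show how the extension stores the current hour, the stored hour is simply passed in and the new value returned.

```php
<?php
// Hypothetical sketch, not the extension's code: PDO stands in for
// MediaWiki's database layer and the function name is illustrative.

// When the hour has changed, delete the rows that still carry the new hour
// value (they were written during the same hour slot a day earlier).
// Returns the hour the caller should now store as the current hour.
function cpPurgeOnHourChange(PDO $db, int $storedHour, int $currentHour): int {
    if ($storedHour !== $currentHour) {
        $delete = $db->prepare('DELETE FROM currentpages_hourlyviews WHERE hour = ?');
        $delete->execute([$currentHour]);
    }
    return $currentHour;
}
```

Calling it with the same stored and current hour leaves the table untouched, so it can run on every request.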

Bugs

  • For some reason most page requests increase the count by two
  • If there are no requests for more than an hour, the rows for the hours in that period will not be cleared. In practice, though, a site connected to the internet is unlikely to go a whole hour without a request.
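
One possible way to address the second bug is to purge every hour slot that was skipped, not just the new one. The helper below is a hypothetical sketch and not part of the extension. It assumes the stored hour is an hour of day from 0 to 23, as in the strftime('%H') call of the earlier file-based code. A gap of 24 hours or more cannot be detected from the hour of day alone.

```php
<?php
// Hypothetical helper, not part of the extension: list every hour slot
// that has gone stale between the last recorded hour and the current hour.
// Both arguments are treated as hours of day and normalised to 0-23.
function cpStaleHours(int $lastHour, int $currentHour): array {
    $lastHour = (($lastHour % 24) + 24) % 24;
    $currentHour = (($currentHour % 24) + 24) % 24;
    $hours = [];
    $h = $lastHour;
    while ($h !== $currentHour) {
        $h = ($h + 1) % 24;   // step forward, wrapping past midnight
        $hours[] = $h;
    }
    return $hours;
}
```

For example, if the last request was during hour 22 and the next one arrives during hour 1, the call cpStaleHours(22, 1) yields the slots 23, 0 and 1, each of which would then be deleted.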

Efficiency

The current database storage method is very efficient. A normal page request needs only two single-row SELECT queries and one single-row UPDATE to add the new title request to the counter table. Once per hour, an additional single-row UPDATE and a multi-row DELETE clear out the titles in the 24-hour-old data. Rendering the list takes one multi-row SELECT query, limited to the number of items in the list.
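
As a rough illustration of the per-title queries and the list query, here is a minimal sketch. It is not the extension's actual code: PDO stands in for MediaWiki's database layer, the cp-prefixed function names are hypothetical, and the lookup of the stored current hour (the other single-row SELECT) is left out.

```php
<?php
// Hypothetical sketch, not the extension's code: PDO stands in for
// MediaWiki's database layer and all function names are illustrative.

// Create the counter table with the structure shown in the About section.
function cpCreateTable(PDO $db): void {
    $db->exec('CREATE TABLE IF NOT EXISTS currentpages_hourlyviews
               (hour INTEGER, page INTEGER, views INTEGER)');
}

// Record one view of article ID $page during hour $hour.
function cpIncrement(PDO $db, int $hour, int $page): void {
    $select = $db->prepare('SELECT views FROM currentpages_hourlyviews
                            WHERE hour = ? AND page = ?');
    $select->execute([$hour, $page]);
    if ($select->fetchColumn() === false) {
        // No row yet for this hour and article: start the count at 1
        $insert = $db->prepare('INSERT INTO currentpages_hourlyviews VALUES (?, ?, 1)');
        $insert->execute([$hour, $page]);
    } else {
        $update = $db->prepare('UPDATE currentpages_hourlyviews SET views = views + 1
                                WHERE hour = ? AND page = ?');
        $update->execute([$hour, $page]);
    }
}

// Return up to $n rows of page ID and total views, most viewed first.
function cpTopPages(PDO $db, int $n): array {
    $stmt = $db->prepare('SELECT page, SUM(views) AS totals
                          FROM currentpages_hourlyviews
                          GROUP BY page ORDER BY totals DESC LIMIT ?');
    $stmt->bindValue(1, $n, PDO::PARAM_INT);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

Summing over all stored hours gives the 24-hour total per article, because rows older than that have already been purged.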

Previously, the main code looked like the code shown below, serialising and deserialising the data array to and from a file. The file started getting quite large, which is a big problem considering that it's loaded and saved for every request. Because of this, the storage mechanism was changed to use the database instead.

# Read the $egCurrentPagesData array from file
$data = file_get_contents($egCurrentPagesFile);
$egCurrentPagesData = $data ? unserialize($data) : array();

# If the hour has changed, clear any existing data out
$hour = strftime('%H');
if (!isset($egCurrentPagesData[$hour]) || (isset($egCurrentPagesData['H']) && $egCurrentPagesData['H'] != $hour))
	$egCurrentPagesData[$hour] = array();
$egCurrentPagesData['H'] = $hour;

# Increment the entry for current hour and title
$egCurrentPagesData[$hour][$title] = isset($egCurrentPagesData[$hour][$title]) ? $egCurrentPagesData[$hour][$title]+1 : 1;

# Write the $egCurrentPagesData array back to file
file_put_contents($egCurrentPagesFile, serialize($egCurrentPagesData));

Layout

I suggest the number of pageviews goes before the article name, that way long article titles won't push the page count under the main article area where it can't be seen.--Milan 15:19, 26 May 2008 (NZST)

Good idea oi, done ;-) --nad 17:48, 26 May 2008 (NZST)