Difference between revisions of "Talk:SQL"

From Organic Design wiki
(Selecting current articles by category: My query)
(Update)
Line 13: Line 13:
 
See [http://upload.wikimedia.org/wikipedia/commons/4/41/Mediawiki-database-schema.png MediaWiki schema] for a description of the tables. Essentially '''categorylinks''' store the category member relationships , '''page''' identifies the title and metadata information, and '''text''' contains the actual wikitext of articles.
 
See [http://upload.wikimedia.org/wikipedia/commons/4/41/Mediawiki-database-schema.png MediaWiki schema] for a description of the tables. Essentially '''categorylinks''' store the category member relationships , '''page''' identifies the title and metadata information, and '''text''' contains the actual wikitext of articles.
  
The MediaWiki allows variable article content to be stored as key => value pairs where the atomic unit for the content varies depending on the content of the article. A way around this is to use categorization to group common atomic unit structure together. Basically this is a filtering problem, however queried atomic unit structure needs to be checked downstream,
+
The database schema used by MediaWiki allows variable article content to be stored as key => value pairs where the atomic unit for the content '''varies''' depending on the content of the article. A way around this is to use categorization to group common atomic unit structure together. Basically this is a filtering problem, however queried atomic unit structure needs to be checked downstream by any downstream processing of content within categories.
  
 +
===Example===
 +
*[[Template:SQLcat]] is transcluded into the SELECT statements of this article, changing it will change category the statements QUERY on.
 +
*[[Template:SQLprefix]] is transcluded into the SELECT statements of this article, changing it will change the database prefix statements QUERY on.
 +
SELECT  * FROM {{SQLprefix}}categorylinks WHERE cl_to LIKE '{{SQLcat}}';
  
Example;
+
;This selects all pages (article names) from category '{{SQLcat}}'.
* Category: [[:Category:Rosaceae]]
+
SELECT * FROM {{SQLprefix}}page
* Article: [[Rosaceae]] (Latest revision number is 1352)
+
WHERE page_id IN (SELECT  cl_from FROM {{SQLprefix}}categorylinks WHERE cl_to LIKE '{{SQLcat}}');
  
<pre>
 
SELECT  * from categorylinks WHERE cl_to like 'Rosaceae';
 
</pre>
 
  
;This selects all pages (article names) from category ''Rosaceae''.
+
;This selects all revisions of articles from category '{{SQLcat}}'.
<pre>
+
SELECT * FROM {{SQLprefix}}revision 
SELECT * from page
+
WHERE rev_page IN (SELECT  cl_from FROM {{SQLprefix}}categorylinks WHERE cl_to LIKE '{{SQLcat}}');
WHERE page_id IN (SELECT  cl_from from categorylinks WHERE cl_to like 'Rosaceae');
 
</pre>
 
  
;This selects all revisions of articles from category ''Rosaceae''.
 
<pre>
 
SELECT * FROM revision
 
WHERE rev_page IN (SELECT  cl_from from categorylinks WHERE cl_to like 'Rosaceae');
 
</pre>
 
 
From the output of this information it can be seen that what we want is the rev_text_id that is largest for any rev_page.
 
From the output of this information it can be seen that what we want is the rev_text_id that is largest for any rev_page.
  
 
;This selects all text from articles of category ''Rosaceae'';
 
;This selects all text from articles of category ''Rosaceae'';
<pre>
+
Select * FROM {{SQLprefix}}text
Select * from text
+
WHERE old_id IN (SELECT page_latest FROM {{SQLprefix}}page WHERE page_id IN (SELECT  cl_from FROM {{SQLprefix}}categorylinks WHERE cl_to LIKE '{{SQLcat}}'));
WHERE old_id IN (SELECT page_latest from page WHERE page_id IN (SELECT  cl_from from categorylinks WHERE cl_to like 'Rosaceae'));
 
</pre>
 
  
;Nads approach for quicker query
+
The query is extremely slow taking up to 3 seconds due to multiple nesting of SELECT statements.
<pre>
+
the following optimization by joining the two inner most nested SELECT statements speeds up the query by at least an order of magnitude.
select text from catlinks,text where catlinks.cl_to = "cat" and text.id = cat.id
+
SELECT * FROM {{SQLprefix}}text
SELECT * from 1120_text
+
WHERE old_id IN (SELECT page_latest FROM {{SQLprefix}}page,{{SQLprefix}}categorylinks WHERE cl_to = '{{SQLcat}}' AND cl_from = page_id);
WHERE old_id IN (SELECT page_id from 1120_page, 1120_categorylinks WHERE cl_to = "Selenium" AND cl_from = page_id);
 
</pre>
 

Revision as of 21:37, 19 May 2008

I need to add links to an article I have on SQL, I also propose a section listing every mediawiki extension that contains SQL queries --Sven 14:09, 11 February 2008 (NZDT)

Sounds like a job for SMW - adding script related properties to them all. Or maybe to their talk pages since scripts aren't wikitext articles - or could add a namespace for that - like properties:? --Nad 14:26, 11 February 2008 (NZDT)

Ideas for script-related properties

  • has db queries
  • classes/methods used
  • classes defined
  • hooks used
  • hooks defined
Absolutely --Sven 15:55, 11 February 2008 (NZDT)

Selecting current articles by category

See MediaWiki schema for a description of the tables. Essentially categorylinks store the category member relationships , page identifies the title and metadata information, and text contains the actual wikitext of articles.

The database schema used by MediaWiki allows variable article content to be stored as key => value pairs where the atomic unit for the content varies depending on the content of the article. A way around this is to use categorization to group common atomic unit structure together. Basically this is a filtering problem, however queried atomic unit structure needs to be checked downstream by any downstream processing of content within categories.

Example

  • Template:SQLcat is transcluded into the SELECT statements of this article, changing it will change category the statements QUERY on.
  • Template:SQLprefix is transcluded into the SELECT statements of this article, changing it will change the database prefix statements QUERY on.
SELECT  * FROM 1120_categorylinks WHERE cl_to LIKE 'Selenium';
This selects all pages (article names) from category 'Selenium'.
SELECT * FROM 1120_page
WHERE page_id IN (SELECT  cl_from FROM 1120_categorylinks WHERE cl_to LIKE 'Selenium');


This selects all revisions of articles from category 'Selenium'.
SELECT * FROM 1120_revision  
WHERE rev_page IN (SELECT  cl_from FROM 1120_categorylinks WHERE cl_to LIKE 'Selenium');

From the output of this information it can be seen that what we want is the rev_text_id that is largest for any rev_page.

This selects all text from articles of category Rosaceae;
Select * FROM 1120_text
WHERE old_id IN (SELECT page_latest FROM 1120_page WHERE page_id IN (SELECT  cl_from FROM 1120_categorylinks WHERE cl_to LIKE 'Selenium'));

The query is extremely slow taking up to 3 seconds due to multiple nesting of SELECT statements. the following optimization by joining the two inner most nested SELECT statements speeds up the query by at least an order of magnitude.

SELECT * FROM 1120_text
WHERE old_id IN (SELECT page_latest FROM 1120_page,1120_categorylinks WHERE cl_to = 'Selenium' AND cl_from = page_id);