Difference between revisions of "Talk:SQL"
(→Selecting current articles by category: My query) |
(Update) |
||
Line 13: | Line 13: | ||
See [http://upload.wikimedia.org/wikipedia/commons/4/41/Mediawiki-database-schema.png MediaWiki schema] for a description of the tables. Essentially '''categorylinks''' store the category member relationships , '''page''' identifies the title and metadata information, and '''text''' contains the actual wikitext of articles. | See [http://upload.wikimedia.org/wikipedia/commons/4/41/Mediawiki-database-schema.png MediaWiki schema] for a description of the tables. Essentially '''categorylinks''' store the category member relationships , '''page''' identifies the title and metadata information, and '''text''' contains the actual wikitext of articles. | ||
− | The MediaWiki allows variable article content to be stored as key => value pairs where the atomic unit for the content varies depending on the content of the article. A way around this is to use categorization to group common atomic unit structure together. Basically this is a filtering problem, however queried atomic unit structure needs to be checked downstream | + | The database schema used by MediaWiki allows variable article content to be stored as key => value pairs where the atomic unit for the content '''varies''' depending on the content of the article. A way around this is to use categorization to group common atomic unit structure together. Basically this is a filtering problem, however queried atomic unit structure needs to be checked downstream by any downstream processing of content within categories. |
+ | ===Example=== | ||
+ | *[[Template:SQLcat]] is transcluded into the SELECT statements of this article, changing it will change category the statements QUERY on. | ||
+ | *[[Template:SQLprefix]] is transcluded into the SELECT statements of this article, changing it will change the database prefix statements QUERY on. | ||
+ | SELECT * FROM {{SQLprefix}}categorylinks WHERE cl_to LIKE '{{SQLcat}}'; | ||
− | + | ;This selects all pages (article names) from category '{{SQLcat}}'. | |
− | * | + | SELECT * FROM {{SQLprefix}}page |
− | + | WHERE page_id IN (SELECT cl_from FROM {{SQLprefix}}categorylinks WHERE cl_to LIKE '{{SQLcat}}'); | |
− | |||
− | |||
− | |||
− | ;This selects all | + | ;This selects all revisions of articles from category '{{SQLcat}}'. |
− | + | SELECT * FROM {{SQLprefix}}revision | |
− | SELECT * | + | WHERE rev_page IN (SELECT cl_from FROM {{SQLprefix}}categorylinks WHERE cl_to LIKE '{{SQLcat}}'); |
− | WHERE | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
From the output of this information it can be seen that what we want is the rev_text_id that is largest for any rev_page. | From the output of this information it can be seen that what we want is the rev_text_id that is largest for any rev_page. | ||
;This selects all text from articles of category ''Rosaceae''; | ;This selects all text from articles of category ''Rosaceae''; | ||
− | + | Select * FROM {{SQLprefix}}text | |
− | Select * | + | WHERE old_id IN (SELECT page_latest FROM {{SQLprefix}}page WHERE page_id IN (SELECT cl_from FROM {{SQLprefix}}categorylinks WHERE cl_to LIKE '{{SQLcat}}')); |
− | WHERE old_id IN (SELECT page_latest | ||
− | |||
− | + | The query is extremely slow taking up to 3 seconds due to multiple nesting of SELECT statements. | |
− | + | the following optimization by joining the two inner most nested SELECT statements speeds up the query by at least an order of magnitude. | |
− | + | SELECT * FROM {{SQLprefix}}text | |
− | SELECT * | + | WHERE old_id IN (SELECT page_latest FROM {{SQLprefix}}page,{{SQLprefix}}categorylinks WHERE cl_to = '{{SQLcat}}' AND cl_from = page_id); |
− | WHERE old_id IN (SELECT | ||
− |
Revision as of 21:37, 19 May 2008
I need to add links to an article I have on SQL, I also propose a section listing every mediawiki extension that contains SQL queries --Sven 14:09, 11 February 2008 (NZDT)
- Sounds like a job for SMW - adding script related properties to them all. Or maybe to their talk pages since scripts aren't wikitext articles - or could add a namespace for that - like properties:? --Nad 14:26, 11 February 2008 (NZDT)
- has db queries
- classes/methods used
- classes defined
- hooks used
- hooks defined
- Absolutely --Sven 15:55, 11 February 2008 (NZDT)
Selecting current articles by category
See MediaWiki schema for a description of the tables. Essentially categorylinks store the category member relationships , page identifies the title and metadata information, and text contains the actual wikitext of articles.
The database schema used by MediaWiki allows variable article content to be stored as key => value pairs where the atomic unit for the content varies depending on the content of the article. A way around this is to use categorization to group common atomic unit structure together. Basically this is a filtering problem, however queried atomic unit structure needs to be checked downstream by any downstream processing of content within categories.
Example
- Template:SQLcat is transcluded into the SELECT statements of this article, changing it will change category the statements QUERY on.
- Template:SQLprefix is transcluded into the SELECT statements of this article, changing it will change the database prefix statements QUERY on.
SELECT * FROM 1120_categorylinks WHERE cl_to LIKE 'Selenium';
- This selects all pages (article names) from category 'Selenium'.
SELECT * FROM 1120_page WHERE page_id IN (SELECT cl_from FROM 1120_categorylinks WHERE cl_to LIKE 'Selenium');
- This selects all revisions of articles from category 'Selenium'.
SELECT * FROM 1120_revision WHERE rev_page IN (SELECT cl_from FROM 1120_categorylinks WHERE cl_to LIKE 'Selenium');
From the output of this information it can be seen that what we want is the rev_text_id that is largest for any rev_page.
- This selects all text from articles of category Rosaceae;
Select * FROM 1120_text WHERE old_id IN (SELECT page_latest FROM 1120_page WHERE page_id IN (SELECT cl_from FROM 1120_categorylinks WHERE cl_to LIKE 'Selenium'));
The query is extremely slow taking up to 3 seconds due to multiple nesting of SELECT statements. the following optimization by joining the two inner most nested SELECT statements speeds up the query by at least an order of magnitude.
SELECT * FROM 1120_text WHERE old_id IN (SELECT page_latest FROM 1120_page,1120_categorylinks WHERE cl_to = 'Selenium' AND cl_from = page_id);