Difference between revisions of "Talk:Import CSV data into a wiki"
(→Job File: note: perls indexing start @ 0)
(→Template: Full details of new functionality)
Revision as of 20:05, 28 May 2008
Description
csv2wiki imports data from a CSV file into a wiki running MediaWiki. There are two versions of this program. The first is a command-line version written in Perl. The second is a PHP version that runs as a MediaWiki command-line maintenance script, so it can work directly with the wiki database instead of going through HTTP requests.
How It Works
The script uses wiki.pl to log in and edit wiki articles.
Job File
The job file contains all the information necessary to update your wiki from the content of a single Source File. It is a newline-delimited list of key-value pairs. Fields in the job file include:
- csv: Source File # full path and file name
- wiki: http://mydomain.com/wiki/index.php5 # wiki URL (must be the long form, including index.php)
- user: username # an active user on the wiki
- pass: password # the user's password
- separator: # the double-quoted delimiter between the columns of a record, e.g. ",", "\t" etc. (default is comma; this is a regular expression, so you can use it to remove white-space, e.g. \s*,\s*, or specify tabs with \t)
- title: n # the column number that gives the article its title (Perl indexing starts from 0)
- template: # the template to use; defaults to Template:Record if none is given
For example the job file could contain:
csv: MyMileToParse.txt
wiki: http://mydomain.com/wiki/index.php5
etc. The colon (:) is required: it separates each key from its value.
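The key-value structure described above can be sketched in a few lines. This is a Python illustration, not the Perl code in csv2wiki.pl; the function name `parse_job` is hypothetical, and the handling of blank lines and key case is an assumption. The defaults are the ones named in the field list.

```python
def parse_job(text):
    """Parse newline-delimited 'key: value' pairs into a dict."""
    # Defaults taken from the field list: comma separator, Template:Record.
    job = {"separator": ",", "template": "Template:Record"}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # assumption: blank or malformed lines are skipped
        # Split on the first colon only, so the colons in a URL survive.
        key, value = line.split(":", 1)
        job[key.strip().lower()] = value.strip()
    return job


example = """csv: MyMileToParse.txt
wiki: http://mydomain.com/wiki/index.php5
title: 0
"""
job = parse_job(example)
print(job["wiki"])
```

Splitting on the first colon only is the design point here: a wiki URL contains further colons, and they must stay in the value.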
Source File
- Unicode: Ensure that the CSV file is encoded in UTF-8 if it contains special characters
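As a rough sketch of how a UTF-8 source file might be read and split using the regular-expression separator from the job file: the code below is Python rather than the script's own Perl, and the names `split_record` and `read_records` are hypothetical. It assumes one record per line and no quoted fields.

```python
import re


def split_record(line, separator=","):
    """Split one record, treating the separator as a regular expression."""
    return re.split(separator, line.rstrip("\r\n"))


def read_records(path, separator=","):
    """Read a UTF-8 source file and return a list of split records."""
    with open(path, encoding="utf-8") as handle:
        return [split_record(line, separator) for line in handle if line.strip()]


# A separator of \s*,\s* also swallows the white-space around each comma.
fields = split_record("Foo , Fodda,Fum ,  fi\n", r"\s*,\s*")
title = fields[0]  # title: 0 selects the first column (indexing starts from 0)
print(fields, title)
```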
Template
A nice approach would be for the script to fetch the template intended for use on the wiki, named by the persistent template: argument in the job.txt file. The Csv2wiki.pl script can then parse out all the {{{params}}} in that template that are not between <noinclude>...</noinclude> tags, since content between those tags is not acted upon when template instances are transcluded. This gives the script a list of template parameters to check against the column names of the input file. Using this information, the uploaded template calls can be restricted to the subset of input columns of interest, e.g.
Input has columns: Foo Fodda Fum fi
From template:Foo;
{{Foo|1= |2= |fodda=|fi=123}}
Column Fum is not uploaded as it is not in the template definition.
- Approach
- It needs to find a template as input. If the template doesn't exist, it throws a warning and uploads every column. Otherwise it uses wiki.pl to fetch the template's wikitext.
- It parses through the template and grabs all {{{params}}}
- The unique template parameters are matched against the infile columns
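The approach above could look roughly like the following. This is a Python sketch under stated assumptions, not the code in csv2wiki.pl: the function names `template_params` and `template_call` are hypothetical, the sample wikitext is invented for illustration, and matching column names to parameters case-insensitively is an assumption suggested by the Fodda/fodda example.

```python
import re


def template_params(wikitext):
    """Return the unique {{{param}}} names found outside <noinclude> sections."""
    # Drop everything between <noinclude> tags before looking for parameters.
    visible = re.sub(r"<noinclude>.*?</noinclude>", "", wikitext,
                     flags=re.S | re.I)
    names = []
    # A parameter name ends at a pipe (default value) or at the closing braces.
    for match in re.finditer(r"\{\{\{([^{}|]+)", visible):
        name = match.group(1).strip()
        if name not in names:
            names.append(name)
    return names


def template_call(template, columns, row, params):
    """Build a template call using only the columns the template defines."""
    wanted = {p.lower() for p in params}  # assumption: case-insensitive match
    parts = [f"{col}={val}" for col, val in zip(columns, row)
             if col.lower() in wanted]
    return "{{" + "|".join([template] + parts) + "}}"


# Invented template text: Fum appears only inside <noinclude>, so it is ignored.
wikitext = "{{{Foo|}}} {{{fodda}}} {{{fi|}}}<noinclude>{{{Fum}}} docs</noinclude>"
params = template_params(wikitext)
call = template_call("Foo", ["Foo", "Fodda", "Fum", "fi"],
                     ["a", "b", "c", "123"], params)
print(params, call)
```

If the template cannot be fetched, the fallback described above (warn and upload everything) would amount to passing every column name as `params`.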
Todo
Most of the template uploading functionality has been written.
- Add a prefix: argument which adds to the beginning of article names.
- Add the use strict pragma
Logging In
Windows
If you need to run Csv2wiki from a Windows machine, you will first need to install ActivePerl. Refer to the ActivePerl site for documentation.
Install and Run
- Copy your job description file (e.g. job.txt) to the same directory as the csv2wiki.pl and wiki.pl scripts
- Open a command prompt
- Change to the directory containing the csv2wiki.pl script
perl csv2wiki.pl job.txt
- You can run multiple jobs at the same time by opening multiple cmd windows in Windows. Make sure the jobs are different!
Issues
- What is the expected delay when logging in? (for troubleshooting a slow server)
- Login should take about a second
Tips
- Open the file in Excel
- Copy the spreadsheet and paste special as values only (gets rid of formulas), then resave
- Remove or replace all "," characters in the file
- Save the Excel spreadsheet as Unicode Text
- Save As, Save as Type, Unicode Text (*.txt)
- Open and edit the text in Geany (better than Notepad)
- Search and replace the TAB character with "," (cut and paste a tab character from Notepad to get this)
- NOTE: if you want to use tabs, set separator to \t
- Set the encoding (Document, Set encoding, Unicode, UTF-8)
- Save
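The re-encoding step can also be scripted instead of done by hand in an editor. The sketch below is Python, and it rests on an assumption: that the "Unicode Text" file Excel wrote is UTF-16. Check your own file before relying on it. The function name `reencode_to_utf8` is hypothetical. Tabs are left in place, so the job file would need separator set to \t.

```python
def reencode_to_utf8(src, dst, source_encoding="utf-16"):
    """Rewrite a text file as UTF-8, leaving its tabs and line endings as-is."""
    # newline="" stops Python from translating line endings on read or write.
    with open(src, encoding=source_encoding, newline="") as infile:
        text = infile.read()
    with open(dst, "w", encoding="utf-8", newline="") as outfile:
        outfile.write(text)
```

A call such as `reencode_to_utf8("export.txt", "export-utf8.txt")` would then produce the file to name in the csv: field; both file names here are placeholders.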