Difference between revisions of "Talk:Import CSV data into a wiki"

From Organic Design wiki

Revision as of 09:24, 27 March 2008

Description

csv2wiki is a tool for importing data from a CSV file into a wiki running MediaWiki. There are two versions of the program. The first is a command-line version written in Perl. The second is a PHP version that runs as a MediaWiki command-line maintenance script, so it can work directly with the wiki database instead of going through HTTP requests.

How It Works

The Perl script (csv2wiki.pl) uses wiki.pl to log in to the wiki and edit articles.
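This page does not spell out how a row of the source file becomes an article. The Python sketch below shows one plausible mapping, using the separator, title and template job-file fields described under Job File. It assumes that each line is split with the separator regular expression and that the title column is counted from zero. It also assumes that the article body is a single call to the template, with every field passed as a positional parameter. That layout is a hypothetical illustration, not confirmed behaviour of csv2wiki.pl, and the function name row_to_article is made up. The actual login and edit step, which wiki.pl performs in the Perl version, is not shown.

```python
import re


def row_to_article(line, separator, title_col, template="Record"):
    """Hypothetical: map one source-file line to an (article title, wikitext) pair.

    The real csv2wiki.pl may lay out the article differently.
    """
    # Split on the separator regular expression after dropping the line ending.
    fields = re.split(separator, line.rstrip("\r\n"))
    # Assumption: the title column is counted from zero.
    title = fields[title_col]
    # Assumed layout: every field passed to the template as a positional
    # parameter. Fields containing "|" would need escaping; not handled here.
    body = "{{%s|%s}}" % (template, "|".join(fields))
    return title, body


title, body = row_to_article("Tokyo , Japan,13000000\n", r"\s*,\s*", 0)
print(title)  # Tokyo
print(body)   # {{Record|Tokyo|Japan|13000000}}
```

The separator `\s*,\s*` removes the stray spaces around commas, which is the white-space trick mentioned in the Job File list.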

Job File

The job file contains all the information necessary to update your wiki from content in the Source File. Fields in the job file include:

  • csv: source file # full path and file name
  • wiki: http://mydomain.com/wiki/index.php5 # wiki URL (must be the long form, including the index.php part)
  • user: username # an active user on the wiki
  • pass: password # that user's password
  • separator: # delimiter between fields (defaults to a comma; this is a regular expression, so you can use it to strip surrounding white-space, e.g. \s*,\s*, or to specify tabs with \t)
  • title: n # the number of the column that gives each article its title (counting from zero, so 0 is the first column)
  • template: # defaults to Template:Record if none is given
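The list above describes the fields but not the exact file syntax. The Python sketch below reads a job file under these assumptions: one "key: value" pair per line, blank lines ignored, and no comment syntax. The "#" annotations in the list above are treated as descriptions only, not as part of the file. It applies the two documented defaults, a comma separator and the Record template. It also compiles the separator as a regular expression, as the page describes. Treating csv, wiki, user, pass and title as required is an assumption, and parse_job is a hypothetical name. The example values are illustrative placeholders.

```python
import re

# Documented defaults; which fields are required is an assumption.
DEFAULTS = {"separator": ",", "template": "Record"}
REQUIRED = ("csv", "wiki", "user", "pass", "title")


def parse_job(text):
    """Parse a job file made of 'key: value' lines into a dict (assumed format)."""
    job = dict(DEFAULTS)
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first colon only, so values such as a URL
        # ("http://...") or a Windows path keep their own colons.
        key, colon, value = line.partition(":")
        if not colon:
            raise ValueError("not a 'key: value' line: %r" % raw)
        job[key.strip().lower()] = value.strip()
    missing = [k for k in REQUIRED if k not in job]
    if missing:
        raise ValueError("missing job fields: " + ", ".join(missing))
    job["title"] = int(job["title"])                 # zero-based column index
    job["separator"] = re.compile(job["separator"])  # separator is a regex
    return job


example = parse_job(r"""
csv: c:\foo.csv
wiki: http://www.mywiki.com/wiki/index.php
user: WikiUser
pass: password
title: 7
separator: \s*,\s*
""")
print(example["title"], example["template"])  # 7 Record
```

Because the template line is omitted in the example, the documented default "Record" is filled in.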

Source File

Template

Logging In

Windows

To run csv2wiki on a Windows machine, you first need to install ActivePerl. Refer to the ActivePerl site for documentation.

Install and Run

  • Copy your job description file (e.g. job.txt) into the same directory as the csv2wiki.pl and wiki.pl scripts
  • Open a command prompt
  • Change to the directory containing the csv2wiki.pl script
  • Run the script with the job file as its argument:

    perl csv2wiki.pl job.txt

Issues

  • Unicode: make sure the CSV file is encoded in UTF-8 if it contains non-ASCII characters
  • Login delay: how long should logging in take? (troubleshooting a slow server)
    Login should take about a second.
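Before running a job, it can help to confirm that the source file really is UTF-8. The sketch below is a Python check, not part of csv2wiki. It reads the file as raw bytes and reports every line that fails to decode as UTF-8, together with the byte offset of the first bad byte on that line. The function names are hypothetical. A UTF-16 file, such as an unconverted "Unicode Text" export, will normally be flagged on its first line, because its byte-order mark is not valid UTF-8.

```python
import sys


def find_non_utf8(byte_lines):
    """Return (line number, byte offset) for each line that is not valid UTF-8."""
    bad = []
    for n, raw in enumerate(byte_lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as err:
            bad.append((n, err.start))  # offset of the first undecodable byte
    return bad


def check_file(path):
    # Binary mode: iterate over raw byte lines instead of decoded text.
    with open(path, "rb") as f:
        return find_non_utf8(f)


if __name__ == "__main__" and len(sys.argv) > 1:
    for n, offset in check_file(sys.argv[1]):
        print("line %d: invalid UTF-8 at byte %d" % (n, offset))
```

For example, `python check_utf8.py foo.csv` prints nothing if the whole file is valid UTF-8. The script name is illustrative.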

Notes

  • Save the Excel spreadsheet as Unicode Text
  • Open the Unicode text file in Notepad
  • Search for the TAB character and replace it with ","
  • Save the file, choosing UTF-8 as the encoding
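The Notepad steps above can also be scripted. The Python sketch below assumes that Excel's "Unicode Text" export is tab-delimited UTF-16 with a byte-order mark. It decodes the data, optionally replaces each tab with a new separator, and writes UTF-8. The function names and file names are hypothetical.

```python
def unicode_text_to_utf8(data, new_separator=None):
    """Re-encode 'Unicode Text' bytes (assumed UTF-16 with BOM) as UTF-8 bytes."""
    # The "utf-16" codec uses the byte-order mark to choose the byte order
    # and does not include the mark in the decoded text.
    text = data.decode("utf-16")
    if new_separator is not None:
        # Same effect as the Notepad search-and-replace step.
        text = text.replace("\t", new_separator)
    return text.encode("utf-8")


def convert_file(src, dst, new_separator=None):
    # Read and write raw bytes so line endings are left exactly as exported.
    with open(src, "rb") as fin:
        data = fin.read()
    with open(dst, "wb") as fout:
        fout.write(unicode_text_to_utf8(data, new_separator))
```

For example, `convert_file("export.txt", "export.csv")` re-encodes the file and keeps the tabs. Keeping the tabs and setting `separator: \t` in the job file avoids a problem with replacing them by commas: fields that themselves contain commas would otherwise be split wrongly.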