Difference between revisions of "Import CSV data into a wiki"

From Organic Design wiki
m (Installation)
 
(13 intermediate revisions by 2 users not shown)
Line 1: Line 1:
{{lowercase}}{{svn|tools|csv2wiki.pl}}
+
{{procedure}}
  
'''csv2wiki''' is a way to import data from a CSV file into a MediaWiki. The program is run from the shell and requires one parameter which is the filename of a text file describing the parameters for the job, for example:
+
[{{repo|tools|csv2wiki.pl}} csv2wiki.pl] is a simple perl script to import data from a CSV file into a MediaWiki.
{{code|<pre>
 
./csv2wiki.pl /home/foo/my.job
 
</pre>}}
 
  
Here is an example of what the content of the job file may look like:
+
== Installation ==
{{code|<pre>
+
The script depends on a Perl module called [{{repo|tools|MediaWiki.pm}} MediaWiki.pm] which needs to be saved in the same location, or in one of your Perl include paths.
$wiki  = "http://foo.bar/wiki/index.php";
 
$user  = "Foo";
 
$pass  = "Bar";
 
$csv  = "/home/foo/projects/bar.csv";
 
$title = "$1 $2 $3";
 
</pre>}}
 
  
== Fields and values ==
+
It also depends on ''LWP'' and ''XML::Simple''. If these dependencies are not present, Perl will raise an error specifying which one is not available and you can then install it with CPAN, e.g.
Each line of the input file will be imported into an article in the wiki, and the first line of the input file specifies the field names.
+
<source lang="bash">
{{code|<pre>
+
cpan XML::Simple
Title, Firstname, Surname
+
</source>
Mr, Bob, McFoo
+
 
Miss, Mary, Barson
+
On Debian-like systems there are also packages for the dependencies:
</pre>}}
+
<source lang="bash">
 +
apt install libwww-perl libxml-simple-perl
 +
</source>
  
 +
== Usage ==
 +
The program is run from the shell and requires one parameter which is the filename of a configuration file describing the parameters for the job.
 +
<source lang="bash">
 +
./csv2wiki.pl /home/myJob.conf
 +
</source>
  
The first row defines the fieldnames, and the second is imported into the first article with the following content:
 
{{code|<pre>
 
{{Record
 
| Title = Mr
 
| Firstname = Bob
 
| Surname = McFoo
 
}}
 
</pre>}}
 
  
 +
Here is an example the content of a configuration file:
 +
<source lang="js">
 +
wiki    = "http://foo.bar/wiki/index.php"
 +
user    = "Foo"
 +
pass    = "Bar"
 +
csv      = "/home/foo/projects/bar.csv"
 +
title    = "$1 $2 $3"
 +
template = "Customer"
 +
</source>
  
If there was already a ''Record'' template on the page, then only that template would be updated rather than the whole article.
 
  
=== Multi-select fields ===
+
Each line of the input CSV file will be imported into an article in the wiki, and the first line of the input file specifies the column header names, for example:
In the following example, some of the field names in the first row are left blank to indicate a multi-value field. The ''interests'' field can have up to three items.
+
<source>
{{code|<pre>
+
Title, Firstname, Surname
Title, Firstname, Surname, Interests,,, Email
+
Mr, Bob, McFoo
Mr, Bob, McFoo, Art, History,, bob@foo.bar
+
Miss, Mary, Barson
Miss, Mary, Barson, Chemistry, Maths, Physics, mary.barson@bar.baz
+
</source>
</pre>}}
 
  
  
In this case, the first row imported looks like this:
+
In this example the first row of data will be imported into an article called "Mr Bob McFoo" with the following content. The column header names are used as the template parameter names, with the values from that row of the CSV file being used as the values for the template parameters.
{{code|<pre>
+
<source>
{{Record
+
{{Customer
 
  | Title = Mr
 
  | Title = Mr
 
  | Firstname = Bob
 
  | Firstname = Bob
 
  | Surname = McFoo
 
  | Surname = McFoo
| Interests = Art
 
History
 
| Email = bob@foo.bar
 
 
}}
 
}}
</pre>}}
+
</source>
 +
 
  
 +
If the page already existed, then the new template would be inserted at the top of the page (or the bottom if "append" is set). Or if the template already exists in the page, then it's values will be updated.
  
Note that the '''multisep''' parameter can be set in the job file, for example if ''multisep'' were set to ";", then the above content would instead be imported as follows. The default value for ''multisep'' is the newline character because that's how the RecordAdmin extension expects multiple values to be formatted.
+
'''Note''' that pages that already have templates on them must only use a single value on each line in their syntax.
{{code|<pre>
 
{{Record
 
| Title = Mr
 
| Firstname = Bob
 
| Surname = McFoo
 
| Interests = Art;History
 
| Email = bob@foo.bar
 
}}
 
</pre>}}
 
  
== Parameters ==
+
== Configuration options ==
The job file contains all the information necessary to update your wiki from content in a single Source File which should be '''UTF-8''' encoded. Here is a description of the possible parameters in the job file and their meaning:
+
Here is a description of the possible parameters in the job file and their meaning:
*'''csv:''' ''the source file to import''
+
{| class=od-info
*'''wiki:''' ''the full long-form URL of the wiki including index.php''
+
!Name!!Meaning!!Default value
*'''user:''' ''username of a user on the wiki with permission to create the necessary articles''
+
|-
*'''pass:''' ''the users password''
+
|csv||The source file to import relative to the location of the script||-
*'''separator:''' ''also just "sep" is allowed, specifies the separator character, default is comma''
+
|-
*'''multisep:''' ''specifies the separator character to use for multi-value fields, default is newline which is used by RecordAdmin''
+
|wiki||the full long-form URL of the wiki including index.php||-
*'''title:''' ''the format of the title using '''$n''' to specify fields, default is NULL which means to use GUID's for titles''
+
|-
*'''template:''' ''The template that the parameters should be wrapped by in the created wiki articles, defaults to [[Template:Record]]''
+
|user||Username of a user on the wiki with permission to create the necessary articles||-
*'''append:''' ''Specifies whether the template should be placed before or after existing text if the template doesn't already exist in the article''
+
|-
 +
|pass||The users password||-
 +
|-
 +
|separator||Also just "sep" is allowed, specifies the separator character used in the CSV file||''comma''
 +
|-
 +
|title||The format of the title using '''$n''' to specify the column numbers, default is ''$1'' which means to use just the first column as the page title||$1
 +
|-
 +
|template||The template that the parameters should be wrapped by in the created wiki articles||Record
 +
|-
 +
|append||Set this to 1 to place the template at the end of the text if the template doesn't already exist in the article||0
 +
|-
 +
|include||A comma-separated list of the columns that should be included from the CSV, by default all columns are used||''not set''
 +
|-
 +
|}
 
[[Category:Tools]][[Category:PERL]]
 
[[Category:Tools]][[Category:PERL]]

Latest revision as of 20:52, 7 May 2020

Procedure.svg Import CSV data into a wiki
Organic Design procedure

csv2wiki.pl is a simple perl script to import data from a CSV file into a MediaWiki.

Installation

The script depends on a Perl module called MediaWiki.pm which needs to be saved in the same location, or in one of your Perl include paths.

It also depends on LWP and XML::Simple. If these dependencies are not present, Perl will raise an error specifying which one is not available and you can then install it with CPAN, e.g.

cpan XML::Simple

On Debian-like systems there are also packages for the dependencies:

apt install libwww-perl libxml-simple-perl

Usage

The program is run from the shell and requires one parameter which is the filename of a configuration file describing the parameters for the job.

./csv2wiki.pl /home/myJob.conf


Here is an example the content of a configuration file:

wiki     = "http://foo.bar/wiki/index.php"
user     = "Foo"
pass     = "Bar"
csv      = "/home/foo/projects/bar.csv"
title    = "$1 $2 $3"
template = "Customer"


Each line of the input CSV file will be imported into an article in the wiki, and the first line of the input file specifies the column header names, for example:

Title, Firstname, Surname
Mr, Bob, McFoo
Miss, Mary, Barson


In this example the first row of data will be imported into an article called "Mr Bob McFoo" with the following content. The column header names are used as the template parameter names, with the values from that row of the CSV file being used as the values for the template parameters.

{{Customer
 | Title = Mr
 | Firstname = Bob
 | Surname = McFoo
}}


If the page already existed, then the new template would be inserted at the top of the page (or the bottom if "append" is set). Or if the template already exists in the page, then it's values will be updated.

Note that pages that already have templates on them must only use a single value on each line in their syntax.

Configuration options

Here is a description of the possible parameters in the job file and their meaning:

Name Meaning Default value
csv The source file to import relative to the location of the script -
wiki the full long-form URL of the wiki including index.php -
user Username of a user on the wiki with permission to create the necessary articles -
pass The users password -
separator Also just "sep" is allowed, specifies the separator character used in the CSV file comma
title The format of the title using $n to specify the column numbers, default is $1 which means to use just the first column as the page title $1
template The template that the parameters should be wrapped by in the created wiki articles Record
append Set this to 1 to place the template at the end of the text if the template doesn't already exist in the article 0
include A comma-separated list of the columns that should be included from the CSV, by default all columns are used not set