Change the character-set of a wiki database

From Organic Design
Jump to: navigation, search
Procedure.svg Change the character-set of a wiki database
Organic Design procedure

There can be some problems with MySQL dumps concerning character encodings. Most of these problems are covered here. It seems that to ensure you have a proper UTF-8 encoded text file you must set the character set to latin1. When importing this dump you may need to manually change the /*!40101 SET NAMES latin1 */ to /*!40101 SET NAMES utf8 */.

These problems don't always seem to occur - I have usually had no trouble with non latin1 exports being reimported again with special characters working fine even though looking at the dump in a text editor shows mangled characters. But sometimes - perhaps depending on the versions of the MySQL servers being imported and exported between - these special steps need to be taken.

More info

A database's default collation can be set as shown in the following example:

ALTER DATABASE dbname CHARACTER SET utf8 COLLATE utf8_general_ci;

Or just on indivdual tables with:

ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

To convert all tables in a specified database at once use the following:

for table in `mysql -u DBUSER DBNAME --password=DBPASS -e "show tables" | grep -v Tables`;
    do mysql -u DBUSER DBNAME --password=DBPASS -e "ALTER TABLE $table CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci"; 

Using iconv

Character set specifics can be a really difficult problem with some wiki databases, I've found that sometimes the only way to get database imports to work properly is to convert the SQL dump file directly. There are examples of doing this with sed shown in the MediaWiki manual page, but sometimes even this fails and more luck can be had with iconv as in the following example taken from here.

iconv -f latin1 -t utf8 original-dump.sql -o utf8-dump.sql

Determine the Encoding of a Database

show create database <dbname>;

See also