WordPress 中文文档

Converting Database Character Sets

出自WordPress Chinese 中文文档

跳转到: 导航, 搜索

wordpress.org.cn

This article addresses, in general, the process of converting your WordPress MySQL database tables from one character set to another. Warning: character set conversion is not a simple process. Please complete a backup of your database before attempting any conversion.

目录

The History

Up to and including WordPress Version 2.1.3, most WordPress databases were created using the latin1 character set and the latin1_swedish_ci collation.

Character set and collation can now be defined

Beginning with Version 2.2, WordPress allows the user to define both the database character set and the collation in their wp-config.php file. Setting the DB_CHARSET and DB_COLLATE values in wp-config.php causes WordPress to create the database with the appropriate settings. But, the setting can only be designated for new installations, not for 'already installed' copies of WordPress. The rest of this article will explain how to convert the character set and collation for existing WordPress installations.

Converting your database

Before beginning any conversion, please backup your database. Backing Up Your Database has easy-to-follow instructions.

For discussion purposes, it is assumed you have a database in the latin1 character set that needs converting to a utf8 character set.

The Problem

To convert character sets requires using the the MySQL ALTER TABLE command. When converting the character sets, all TEXT (and similar) fields are converted to UTF-8, but that conversion will BREAK existing TEXT because the conversion expects the data to be in latin1, but WordPress may have stored unicode characters in a latin1 database, and as a result, data could end up as garbage after a conversion!

The Solution

The solution is to ALTER all TEXT and related fields to their binary counterparts, then alter the character set and finally change the binary data type fields back to TEXT.

Example steps:

  1. Place notice that blog is out of service
  2. Backup database
  3. ALTER TABLE wp_users MODIFY display_name BLOB;
  4. ...ALTER TABLE commands for all other tables/columns...
  5. ALTER DATABASE wordpress charset=utf8;
  6. ALTER TABLE wp_users charset=utf8;
  7. ...ALTER TABLE command for all other tables...
  8. ALTER TABLE wp_users MODIFY display_name TEXT CHARACTER SET utf8;
  9. ...ALTER TABLE for all other tables/columns...
  10. Add DB_CHARSET and DB_COLLATE definitions to wp-config.php
  11. Place blog back on-line

The string field types need to be converted to their binary field types counterparts. The list is as follows:

  • CHAR - BINARY
  • VARCHAR - VARBINARY
  • TINYTEXT - TINYBLOB
  • TEXT - BLOB
  • MEDIUMTEXT - MEDIUMBLOB
  • LONGTEXT - LONGBLOB

This information was originally posted by member g30rg3x in Forum Thread 117955.

SQL statements to perform all these changes can be generated with the following statements (replace MyDb with your database name). Running mysql with -s --skip-column-names will make it easier to capture the output by copy and paste

pre USE information_schema; SELECT CONCAT('ALTER TABLE ', table_name, ' MODIFY ', column_name, ' ', REPLACE(column_type, 'char', 'binary'), ';') FROM columns WHERE table_schema = 'MyDb' and data_type LIKE '%char%'; SELECT CONCAT('ALTER TABLE ', table_name, ' MODIFY ', column_name, ' ', REPLACE(column_type, 'text', 'blob'), ';') FROM columns WHERE table_schema = 'MyDb' and data_type LIKE '%text%'; SELECT CONCAT('ALTER TABLE ', table_name, ' MODIFY ', column_name, ' ', column_type, ' CHARACTER SET utf8;') FROM columns WHERE table_schema = 'MyDb' and data_type LIKE '%char%'; SELECT CONCAT('ALTER TABLE ', table_name, ' MODIFY ', column_name, ' ', column_type, ' CHARACTER SET utf8;') FROM columns WHERE table_schema = 'MyDb' and data_type LIKE '%text%'; /pre

The output of these statements will be another set of statements, based on the tables and columns in your database. Change to your Wordpress database and execute the statements generated above.

Then, change the default language:

pre ALTER DATABASE MyDb CHARACTER SET utf8; /pre

You can set the column types back by generating similar statements to the first batch. Be sure you don't accidentally change any columns that were previously in binary or blob format!

pre SELECT CONCAT('ALTER TABLE ', table_name, ' MODIFY ', column_name, ' ', REPLACE(column_type, 'binary', 'char'), ';') FROM columns WHERE table_schema = 'MyDb' and data_type LIKE '%binary%'; SELECT CONCAT('ALTER TABLE ', table_name, ' MODIFY ', column_name, ' ', REPLACE(column_type, 'blob', 'text'), ';') FROM columns WHERE table_schema = 'MyDb' and data_type LIKE '%blob%'; /pre

(Originally found in a post on Haidong Ji's blog).

ENUM and SET have more specific conversion rules: Set the character set to binary, or to UTF8 if you are sure that no ENUM or SET field has special characters that might get garbled during conversion. The SQL for this is:

  • ALTER TABLE wp_links CHANGE link_visible link_visible ENUM('Y','N') CHARACTER SET utf8;

The field name does need to be repeated, as well as the ENUM specification.

When specifying BINARY and VARBINARY, the field length also needs to be specified, and needs to be the same value as the original CHAR and VARCHAR field length. In other words, VARCHAR(200) becomes VARBINARY(200).

So, in Steps 3 and 4 change CHAR, VARCHAR, TEXT, ENUM, and SET fields to their binary counterparts (BLOB, VARBINARY, etc), in Step 5 switch the database to utf8, in Steps 6 and 7 switch all the tables to utf8, and finally, in Steps 8 and 9 return the binary fields back to the respective CHAR, VARCHAR, TEXT, ENUM, and SET data types with the utf8 character set.

The key to the conversion is that a field with a binary data type, unlike CHAR, VARCHAR, TEXT, ENUM, and SET fields, will not be converted to garbage when the database and tables are switched to utf8.

Conversion Scripts and Plugins

In the WordPress Forums, Member andersapt, in Forum Thread 117955 submitted a conversion script, Convert UTF8 SQL Generator, to automatically convert a WordPress database. (This link is currently dead.)

A plugin, UTF-8 Database Converter, is available from g30rg3_x. Carefully review the readme file included with the plugin. This plugin corrupts data in modern versions of Wordpress.

You can try a modification of UTF-8 Sanitize plugin as a last resort, it will manually replace the 'strange' characters with their UTF counterparts. As before, make sure you BACKUP your database before performing actions with the plugin, as all changes will be permanent.

Discussions on character sets

Resources

wordpress