======================================= Conversion steps to Drupal ======================================= This procedure is for Drupal 5, but should work "mostly" for Drupal 6 (will require a few modifications). Some known differences in 6 include: 'sequences' table is deprecated, files/attachments are now owned by users not nodes (addition of the new 'uploads' table), in the 'user' table, the uid field now has auto_increment... in the 'node' table there are several new columns (language, tnid, and translate)... in the 'node_revisions' table, the title field has been changed from 128 to 255 characters, and auto_increment has been added for the vid field, and a default value of '0' for nid. ======================================= Prepare the OLD site's database for UTF8 ======================================= The first step before beginning is to export from the live server, prep files for UTF8, and import new databases into local server. The OLD site has Latin1 Binary tables, which need to be converted into UTF8 before moving to Drupal. Dump the databases on the live server: mysqldump -u username -p --default-character-set=latin1 --skip-set-charset oldsitedb > oldsitedb.sql mysqldump -u username -p --default-character-set=latin1 --skip-set-charset old_forum > old_forum.sql -------------- Run iconv in order to translate the latin1 characters to utf8: iconv -f ISO-8859-1 -t UTF-8 oldsitedb.sql > oldsitedb_utf8.sql iconv -f ISO-8859-1 -t UTF-8 old_forum.sql > old_forum_utf8.sql Afterward, download the new UTF8 files to your local system. -------------- Next you need to find and replace the latin1 table info in the sql files, from Terminal command line: sed -e 's/latin1/utf8/g' -e 's/utf8_bin/utf8_general_ci/g' oldsitedb_utf8.sql > oldsitedb2_utf8.sql sed -e 's/latin1/utf8/g' -e 's/utf8_bin/utf8_general_ci/g' old_forum_utf8.sql > old_forum2_utf8.sql Afterward rename the files so the 2 is gone, and delete the extra/old version of the files if you wish. -------------- From your local terminal, create the new databases: /Applications/MAMP/Library/bin/mysql -u username -p --execute="CREATE DATABASE oldsitedb CHARACTER SET utf8 COLLATE utf8_general_ci;" /Applications/MAMP/Library/bin/mysql -u username -p --execute="CREATE DATABASE oldforum CHARACTER SET utf8 COLLATE utf8_general_ci;" -------------- Finally, import the sql files into the local databases: /Applications/MAMP/Library/bin/mysql -u username -p --default-character-set=utf8 oldsitedb < /Users/Username/ConvertSite/mysqldump/oldsitedb_utf8.sql /Applications/MAMP/Library/bin/mysql -u username -p --default-character-set=utf8 oldforum < /Users/Username/ConvertSite/mysqldump/old_forum_utf8.sql ======================================= Users: accounts, preferences, signatures ======================================= Export of users, preparing some fields along the way for use in Drupal: Signatures will not be imported into the Drupal signature field, but will be used in the more versatile "Signatures for Forums" module. Using CASE statement to replace group_id 3 and group_id 20 with '1' and group_id 15 with '0' for use in setting Drupal's "status" field where 1 = normal, 0 = blocked. Fields listed after group_id are to be used later for importing into node profiles for users. Node profiles will be created later "after" import of forum content (so the forum node IDs will be reserved). SELECT user_id, username, password, email, signature, signature_hash, join_date, last_activity, last_visit, CASE timezone WHEN '0' THEN NULL ELSE timezone * 3600 END, CASE avatar WHEN '' THEN '' ELSE CONCAT('files/pictures/', avatar) END, CASE group_id WHEN '3' THEN '1' WHEN '20' THEN '1' WHEN '15' THEN '0' ELSE '1' END, name, homepage, msn, aim, icq, yim, location, title FROM oldsitedb.user ORDER BY user_id ASC INTO OUTFILE '/Users/Username/ConvertSite/users.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'; ---------- Export a file to be used for setting Contact preferences later in Drupal... this will produce a list of user_id's that can be joined against the Drupal user table to set the contact form preference stored in the serialized data field to on: (Note: if you can get this into a column in the main user export, do so - otherwise this should work fine) SELECT user_id FROM user WHERE (flags&16) = 16 ORDER BY user_id ASC INTO OUTFILE '/Users/Username/ConvertSite/users-contactform-on.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'; ---------- Importing user data into Drupal: First create the table to be imported into.... Login to local mysql: /Applications/MAMP/Library/bin/mysql -u username -p databasename CREATE TABLE oldsite_users ( user_id int(10) NOT NULL, username varchar(60) NOT NULL, password varchar(32) NOT NULL, email varchar(64) NOT NULL, signature text NOT NULL, signature_hash varchar(10) NOT NULL, join_date int(11) NOT NULL, last_activity int(11) NOT NULL, last_visit int(11) NOT NULL, timezone varchar(8) NOT NULL, avatar varchar(255) NOT NULL, blocked_status tinyint(4) NOT NULL, name varchar(255) NOT NULL, homepage varchar(255) NOT NULL, msn varchar(255) NOT NULL, aim varchar(255) NOT NULL, icq varchar(15) NOT NULL, yim varchar(255) NOT NULL, location varchar(255) NOT NULL, title varchar(255) NOT NULL PRIMARY KEY (user_id), ); Possible issues with field length in: email varchar(64) NOT NULL --- need to check database for emails over length. signature varchar(255) NOT NULL --- setting to text as core field won't be used. ---------- Query to import the CSV file: (Note: first prepare the file by removing line 1, which is my account, user_id 1). LOAD DATA LOCAL INFILE '/Users/Username/ConvertSite/users.csv' INTO TABLE oldsite_users FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' (user_id, username, password, email, signature, signature_hash, join_date, last_activity, last_visit, timezone, avatar, blocked_status, name, homepage, msn, aim, icq, yim, location, title); ------------- Clean up the signature_hash: This just selects so you can see what will happen: SELECT REPLACE(signature, CONCAT(':', signature_hash), '') AS signature FROM oldsite_users; ***And here's the UPDATE query to make the change once you've checked that it's right: UPDATE oldsite_users SET signature = REPLACE(signature, CONCAT(':', signature_hash), ''); ----------- Now that all the data is prepped, time to create the Drupal users: INSERT INTO users (uid, name, pass, mail, mode, sort, threshold, theme, signature, created, access, login, status, timezone, language, picture, init, data) SELECT user_id, username, password, email, '0', '0', '0', '', '', join_date, last_activity, last_visit, blocked_status, timezone, '', avatar, email, 'a:1:{s:7:"contact";i:0;}' FROM oldsite_users; ------- ** Might need: Run this query to set empty timezone fields to NULL: UPDATE user SET timezone = NULL WHERE timezone = '0'; ---------- Update the sequences table (Drupal 5 only): Next update the sequences table with a new row containing users_uid and the most recent/highest user ID number (change the below number as needed): Quick lookup with: SELECT MAX(uid) FROM users; INSERT INTO sequences (name, id) VALUES ('users_uid', '21374'); ----------- That's it for users for now - later after the forum is completely imported, the rest of the stored user data will be used while generating user profile nodes. ======================================= Content: Forum threads into Drupal nodes ======================================= Query to export all needed threads from the full OLD thread table. We'll export a separate version with the primary/first post to import separately, as running the multi-search and replace Automator script for taxonomy terms on the file containing the posts would be too intense and crash prone. Also bringing along the primary post's ID (for later use in removing the primary posts from a table for use with the comment table. Remember that threads that are placeholder/forwarders won't be included, so the number will not match, though you can add it up by checking how many forwarder threads there are in the original database by running SELECT * FROM `thread` WHERE t_moved_id NOT LIKE '0'; and adding it to the results of the export, which will match the total rows in the thread table): SELECT t.t_id, t.t_user_id, t.t_username, t.t_title, t.t_visible, t.t_sticky, t.t_flags, t.t_time, t.t_lastpost_time, p.p_id, p.p_post_hash, CONCAT('cat', t.t_forum_id, 'cat') AS forum_id FROM post p, thread t WHERE p.p_thread_id = t.t_id GROUP BY p.p_thread_id ORDER BY p.p_time INTO OUTFILE '/Users/Username/ConvertSite/threads.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'; When done exporting, run the 3-part OS X Automator workflow "Taxonomy Replace" --------------- Next export the primary/first post and the thread ID, posts to be added into the table later: SELECT t.t_id, p.p_post FROM post p, thread t WHERE p.p_thread_id = t.t_id GROUP BY p.p_thread_id ORDER BY p.p_time INTO OUTFILE '/Users/Username/ConvertSite/mainposts.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'; --------------- Query to set up table/fields and import the threads data (again, minus the main posts): First create the table to be imported into.... Login to local mysql: /Applications/MAMP/Library/bin/mysql -u username -p databasename (Note: the primary key is critical - otherwise things will take many hours instead of just seconds) CREATE TABLE oldsite_threads ( t_id int(10) NOT NULL, t_user_id int(11) NOT NULL, t_username varchar(60) NOT NULL, t_title varchar(128) NOT NULL, t_visible int(11) NOT NULL, t_sticky int(11) NOT NULL, t_flags int(11) NOT NULL, t_time int(11) NOT NULL, t_lastpost_time int(11) NOT NULL, p_id mediumint(8) NOT NULL, p_post_hash varchar(10) NOT NULL, t_forum_id int(10) NOT NULL, PRIMARY KEY (t_id), INDEX (p_id) ); ------- Then load the data into the table: First set this in case there are warnings we want to look at: SET max_error_count=1000; LOAD DATA LOCAL INFILE '/Users/Username/ConvertSite/threads.csv' INTO TABLE oldsite_threads FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' (t_id, t_user_id, t_username, t_title, t_visible, t_sticky, t_flags, t_time, t_lastpost_time, p_id, p_post_hash, t_forum_id); Note: if there are warnings, run SHOW WARNINGS; ------- Now time to import the main_post data.... First create a temporary table to hold the thread IDs and main posts: CREATE TABLE oldsite_mainposts ( t_id int(10) NOT NULL, PRIMARY KEY (t_id), main_post longtext NOT NULL ); Load the data into the oldsite_mainposts table: LOAD DATA LOCAL INFILE '/Users/Username/ConvertSite/mainposts.csv' INTO TABLE oldsite_mainposts FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' (t_id, main_post); ------ Now add the main_post column to the oldsite_threads table: ALTER TABLE oldsite_threads ADD main_post longtext NOT NULL AFTER t_title; ------- Now we'll merge the oldsite_mainposts table into oldsite_threads: UPDATE oldsite_threads nst, oldsite_mainposts nsp SET oldsitet.main_post = oldsitep.main_post WHERE oldsitet.t_id = oldsitep.t_id; -------- Now it's time to clean up a few things in the oldsite_threads table: Run this UPDATE to fix the t_flags column so 0 becomes 2 and 1 stays the same (this sets the comment locked option in Drupal, where 1 = locked and 2 = open, versus the old site's 1 = locked and 0 = open): UPDATE oldsite_threads SET t_flags = '2' WHERE t_flags = '0'; --------- Next we'll clean up the main_post column by stripping out the p_post_hash that's littered through all the BB Code in the posts: (Note: run a similar query on the oldsite_replies table) This just selects so you can see what will happen: SELECT REPLACE(main_post, CONCAT(':', p_post_hash), '') AS main_post FROM oldsite_threads; ***And here's the UPDATE query to make the change once you've checked that it's right: UPDATE oldsite_threads SET main_post = REPLACE(main_post, CONCAT(':', p_post_hash), ''); ----------------------- Now the data is prepped and it's time to make Drupal's nodes. First, run this INSERT query to create the nodes themselves: INSERT INTO node (nid, vid, type, title, uid, status, created, changed, comment, promote, moderate, sticky) SELECT t_id, t_id, 'forum', t_title, t_user_id, '1', t_time, t_lastpost_time, t_flags, '0', '0', t_sticky FROM oldsite_threads; ------- Next update the sequences table with two new entries containing node_nid and node_revisions and the identical most recent/highest node ID number for each (change the below number as needed): Quick lookup with: SELECT MAX(nid) FROM node; INSERT INTO sequences (name, id) VALUES ('node_nid', '95026'); INSERT INTO sequences (name, id) VALUES ('node_revisions_vid', '95026'); ------- Next run this query to create records in the node_revisions table to mirror all the current nodes: (Note: leaving teaser blank - may run the Retease module to generate them later... also leaving body blank until the next query since we're fetching it from a different table) INSERT INTO node_revisions (nid, vid, uid, title, body, teaser, log, timestamp, format) SELECT nid, vid, uid, title, '', '', '', changed, '1' FROM node; ----- Now we'll populate the body column in the node_revisions table: UPDATE node_revisions noderev, oldsite_threads ns SET body = main_post WHERE noderev.nid = ns.t_id; ----- Next run this query to update the term_node table with the taxonomy term IDs stored in the oldsite_threads table. INSERT INTO term_node (nid, tid) SELECT t_id, t_forum_id FROM oldsite_threads; ---------- And now, a query on the forum table: INSERT INTO forum (nid, vid, tid) SELECT t_id, t_id, t_forum_id FROM oldsite_threads; --------- Now it's time to build the node_comment_statistics table. If the bug in Devel module is worked out then use that (run path: devel/rebuild_node_comment_statistics) If not, then run this PHP code: http://drupal.org/node/137458 -------- ======================================= Content: Forum posts into Drupal comments ======================================= Run this query to export all Forum posts, with all the fields we need for comments: (Note: the substr is to trim all post titles to no more than 64 characters, as the Drupal comment subject field supports only 64 characters... username is also included - however there are 6999 posts -at this time- where the user ID is set to 0 - in the export these will be set to NULL. After importing to the comments table you'll want to change instances of NULL to just an empty field... these posts will show as authored by "Anonymous"). SELECT p_id, p_thread_id, p_user_id, username, SUBSTR(p_title,1,64) AS p_title, p_post, p_post_hash, p_time FROM oldforum.post LEFT OUTER JOIN oldsitedb.user ON user.user_id = post.p_user_id INTO OUTFILE '/Users/Username/ConvertSite/allposts.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'; --------------- This query shows the posts that have no related user in the user table (just to see): SELECT p_id, p_user_id, p_title, username FROM oldforum.post LEFT OUTER JOIN oldsitedb.user ON user.user_id = post.p_user_id WHERE user.user_id IS NULL; ------- Create the table to hold posts: CREATE TABLE oldsite_replies ( p_id mediumint(8) NOT NULL, p_thread_id mediumint(8) NOT NULL, p_user_id mediumint(8) NOT NULL, username varchar(60) NOT NULL, p_title varchar(64) NOT NULL, p_post longtext NOT NULL, p_post_hash varchar(10) NOT NULL, p_time int(10) NOT NULL, PRIMARY KEY (p_id) ); -------- Now load the post data into the table: LOAD DATA LOCAL INFILE '/Users/Username/ConvertSite/allposts.csv' INTO TABLE oldsite_replies FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' (p_id, p_thread_id, p_user_id, username, p_title, p_post, p_post_hash, p_time); --------- Now it's time to remove the primary posts so all that remains are replies: Make a new table called oldsite_replies and import all Forum posts where t_id = p_thread_id, and no group by so we get all of them (the primary post and all replies). Join the table against oldsite_threads where p_id = p_id between the two tables, and DELETE the rows from oldsite_replies that match. This should leave oldsite_replies will all the rows for posts "except" for the primary/first posts. For example, p_id 64 is a primary post... after running this query it should be deleted from oldsite_replies ... leaving post 67, etc as the comments for that thread. First here is a select query to illustrate what you're about to delete. The number of results should precisely match the total rows in oldsite_threads, and should consist of just the primary posts: (Just for learning... here's a demo of both kinds of Joins, same result).... SELECT oldsite_replies.* FROM oldsite_replies, oldsite_threads WHERE oldsite_replies.p_id = oldsite_threads.p_id; SELECT oldsite_replies.* FROM oldsite_replies INNER JOIN oldsite_threads ON oldsite_replies.p_id = oldsite_threads.p_id; -------- And here is the delete query. Backup the database first just in case: DELETE oldsite_replies.* FROM oldsite_replies INNER JOIN oldsite_threads ON oldsite_replies.p_id = oldsite_threads.p_id; -------- Now cleaning up the post_hash junk: This just selects so you can see what will happen: SELECT REPLACE(p_post, CONCAT(':', p_post_hash), '') AS p_post FROM oldsite_replies; ***And here's the UPDATE query to make the change once you've checked that it's right: UPDATE oldsite_replies SET p_post = REPLACE(p_post, CONCAT(':', p_post_hash), ''); --------- And now, insert the data into the comments table: (Note: the only field we're leaving out is "thread" - will need to come back to it later to get it set up correctly). INSERT INTO comments (cid, pid, nid, uid, subject, comment, hostname, timestamp, score, status, format, thread, users, name, mail, homepage) SELECT p_id, '0', p_thread_id, p_user_id, p_title, p_post, '', p_time, '0', '0', '1', '', 'a:1:{i:0;i:0;}', username, '', '' FROM oldsite_replies; ------- Next update the sequences table with two new entries containing name and id for a new row "comments_cid" and the identical most recent/highest comment ID (change the below number as needed): (Quick look up with SELECT MAX(cid) FROM comments; ) INSERT INTO sequences (name, id) VALUES ('comments_cid', '807802'); ------------------------------- Later need to deal with the thread field in the comments table: This does NOT work... need a more comprehensive solution that may have to run in PHP to make use of the int2vancode() and vancode2int() functions. See: http://api.drupal.org/api/function/comment_save/5 And refer to good PHP examples in the vbtodrupal and phpbb2drupal modules. UPDATE comments SET thread = CONCAT(id, '/');