pyblosxom to WordPress conversion snags
• Chris Liscio
Before I forget, I should list the snags I hit when moving my pyblosxom posts to WordPress. This post is mostly for posterity's sake, and for any Googlers looking for help on the subject.
I chose to import my posts via an RSS 2.0 feed. If you already have the RSS 2.0 flavour defined, you're good. If not, or you just want something that'll get around the import bugs, download my RSS flavour.
The largest problem I encountered came from the timestamps being imported incorrectly from the pubDate tag I set up. My times all carried a -0500 time zone designation, which totally screwed up the imported timestamps. I poked around in the WordPress database and found that WordPress was storing the dates as local dates, but only after offsetting them as if converting them to GMT. It then created the GMT timestamp from that already-shifted local time, which was even further off!
The solution to this problem was to mark my dates (incorrectly, but intentionally) as GMT in the RSS feed. WordPress then imports the dates properly. While this might be a bug on WordPress's part, it actually works around another bug: pyblosxom's lack of proper time zone reporting in the RSS feed (which really only affects those of us in a time zone with daylight saving time). So, no complaints from me here.
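For illustration, here is a minimal Python sketch of what "marking the dates as GMT" amounts to. It is not the actual pyblosxom flavour template; the function name and the sample date are made up for the example. The local wall-clock time is left untouched and simply labelled GMT, instead of carrying its real -0500 offset.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def pubdate_labelled_gmt(local_time: datetime) -> str:
    """Format a naive local datetime as an RFC 822 date, labelled GMT.

    No conversion happens here: the wall-clock time is tagged as GMT as-is,
    which is the intentional mislabelling described above.
    """
    return format_datetime(local_time.replace(tzinfo=timezone.utc), usegmt=True)


# Hypothetical post time, written at 14:30 local time in a -0500 zone.
posted = datetime(2005, 3, 5, 14, 30, 0)

# What an honest feed would say (the form that imported badly):
honest = format_datetime(posted.replace(tzinfo=timezone(timedelta(hours=-5))))
print(honest)                        # Sat, 05 Mar 2005 14:30:00 -0500

# The workaround: same wall-clock time, labelled GMT.
print(pubdate_labelled_gmt(posted))  # Sat, 05 Mar 2005 14:30:00 GMT
```

The trade-off is that the feed is technically wrong for any other consumer, so this only makes sense for a one-off import.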
Now, once the data made it into WordPress, I noticed that there were extra <br/> tags showing up in my posts. It turns out that the importer decided that it should retain newlines in the RSS description tags, and then when WordPress rendered the pages it would convert the newlines to <br/> tags.
The solution to this mix-up was to manually make some changes in the WordPress database:
UPDATE wp_posts SET post_content = REPLACE(post_content, "\n", " ");
Note that the above replacement changes newline characters to a space character. Changing it to the empty string (i.e. "") would concatenate words together in many instances.
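The difference between the two replacements is easy to see on a small string. This is just a Python illustration of the same substitution the SQL performs; the function name and sample text are invented for the example.

```python
def flatten_newlines(text: str, replacement: str = " ") -> str:
    """Replace every newline in text with the given replacement string."""
    return text.replace("\n", replacement)


# A post body that was hard-wrapped in the original pyblosxom entry file.
sample = "first line\nsecond line"

print(flatten_newlines(sample))      # first line second line
print(flatten_newlines(sample, ""))  # first linesecond line
```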
The final snag was that the CDATA block termination would appear at the end of every post. The solution to that was yet another database tweak:
UPDATE wp_posts SET post_content = REPLACE(post_content, "]]>", "");
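One caveat with the query above is that it removes every "]]>" in a post, not just the stray one at the end. If any of your posts legitimately contain that sequence (say, a post about XML), a more conservative option is to strip only a trailing terminator before the content reaches the database. The following Python sketch shows the idea; the function name is hypothetical and this is not part of the WordPress importer.

```python
CDATA_END = "]]>"


def strip_trailing_cdata_end(content: str) -> str:
    """Remove a single CDATA terminator from the end of content, if present.

    Occurrences elsewhere in the text are left alone.
    """
    stripped = content.rstrip()
    if stripped.endswith(CDATA_END):
        # Drop the terminator and any whitespace that preceded it.
        return stripped[: -len(CDATA_END)].rstrip()
    return content


print(strip_trailing_cdata_end("Hello world.]]>"))          # Hello world.
print(strip_trailing_cdata_end("Use ]]> to close CDATA."))  # Use ]]> to close CDATA.
```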
After doing all of the above, my blog was completely moved over with few issues. Some of the really old posts are in pretty bad shape (some are missing their content entirely), but I will fix those manually over time.
Hopefully this helps some of you make the switch.