Migrating From WordPress To Hugo
This site has existed in some form since 2012. Way back then, I started the site with a view to sharing PowerShell scripts I was using to manage some very large Analysis Services cubes. And because I just wanted to get up and running with content and not worry about the look of the site, I decided to go with a WordPress site. And because I wanted a custom domain, I promptly chose to have a premium WordPress site.
I was very happy with the premium website, and even began customising the look of the thing once I got some content up and running. In fact, using the magic of the Wayback Machine I can show you how it looked back in May 2013.
However, I wanted to move from the WordPress-hosted option to the self-hosted WordPress option. The main reason was that there were a few widgets in the self-hosted option I could not use with my current setup. And so I decided to start a brand new blog with a view to eventually migrating all my posts to the new site.
Very soon after I had started, I realised that (to borrow a phrase) I had made a huge mistake. The mistake was that these widgets made no real improvement to what I wanted to do, which was (and still is) write blog posts. And so I wanted to get off WordPress and onto something simpler.
One of the things about a self-hosted site is that there’s a cost up front, and the cheaper you want the deal per month, the longer you have to commit to a company. And so I felt committed to sticking with my mistake. But then I remembered the sunk cost fallacy, so I bit the bullet and started up a Hugo site, which is hosted on GitLab Pages so that I can use my custom domain. And all of this is free! I also moved my custom domain email to Zoho. So now my cost of running this site has gone from a couple of hundred a year to nothing. Cool!
The real pain point, though, was how I was going to migrate my content from WordPress to markdown pages. I’ve got over 400 posts from two different WordPress websites to move. I decided that the most pragmatic thing would be to carry on writing new posts and deal with the migration later. And after starting this migration back in February, today I finally completed the move. And because someone somewhere may well be starting this painful journey, I thought I’d do what I’ve always wanted to do with this site and share some PowerShell in the hope that it makes someone’s life easier.
The first thing I did was export my old site using the WordPress admin console. This exports the entire contents of your website to a single xml file. It is frankly massive, with loads of content and metadata. Then I altered a few of the elements: I altered <dc:creator> to just <creator>, likewise stripped the prefix from the content element to leave <content><![CDATA[ , and finally did the same for the status element to leave <status>. The reason here is that I’m planning on munging a great deal of xml into markdown, and I went with the brute force and pragmatic approach to clearing up the xml. I also did a simple find and replace for any double spaces and replaced them with single spaces, as they would end up looking like some funky special character post-munge.
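The clean-up described above can be sketched roughly as follows. This is a minimal sketch, not the exact edits that were made: the function name Convert-ExportText is hypothetical, and the prefixed forms <content:encoded> and <wp:status> are an assumption about what the export contained before it was altered.

```powershell
# Hypothetical helper: applies the literal find-and-replace pairs to the export text.
function Convert-ExportText {
    param([Parameter(Mandatory)][string]$Text)

    # Assumed original (prefixed) tags on the left, simplified tags on the right.
    $pairs = [ordered]@{
        '<dc:creator>'       = '<creator>'
        '</dc:creator>'      = '</creator>'
        '<content:encoded>'  = '<content>'
        '</content:encoded>' = '</content>'
        '<wp:status>'        = '<status>'
        '</wp:status>'       = '</status>'
    }

    $result = $Text
    foreach ($old in $pairs.Keys) {
        # String.Replace is a literal, case-sensitive replacement (no regex).
        $result = $result.Replace($old, $pairs[$old])
    }

    # Collapse any run of two or more spaces into a single space.
    $result = $result -replace ' {2,}', ' '
    return $result
}

# Example usage (file names are placeholders):
# $raw = Get-Content -LiteralPath 'export.xml' -Raw -Encoding UTF8
# Convert-ExportText -Text $raw | Set-Content -LiteralPath 'export-clean.xml' -Encoding UTF8
```

Note that collapsing spaces across the whole file also flattens indentation inside any code samples in the posts, which is part of the brute force trade-off.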
With that out of the way, I then ran the PowerShell below. This will take each published post by the creator (i.e. my WordPress login name) and create a separate markdown file with the required metadata set up for use with Hugo.
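As an illustration of that step, here is a minimal sketch under stated assumptions rather than the original script. The function names (Get-ChildText, Export-PostsToMarkdown), the choice of front matter fields (title, date, draft) and the use of the post slug as the file name are all assumptions. The post body is written out as-is; any further HTML-to-markdown clean-up is left out.

```powershell
# Hypothetical helper: text of a child element matched by local name,
# so it works whether or not the namespace prefix has been stripped.
function Get-ChildText {
    param([System.Xml.XmlNode]$Node, [string]$Name)
    $child = $Node.SelectSingleNode("*[local-name()='$Name']")
    if ($null -eq $child) { return '' }
    return $child.InnerText
}

# Hypothetical function: writes one markdown file per published post by $Creator.
function Export-PostsToMarkdown {
    param(
        [Parameter(Mandatory)][string]$XmlPath,
        [Parameter(Mandatory)][string]$Creator,
        [Parameter(Mandatory)][string]$OutDir
    )

    $doc = New-Object System.Xml.XmlDocument
    $doc.Load((Resolve-Path -LiteralPath $XmlPath).ProviderPath)

    New-Item -ItemType Directory -Path $OutDir -Force | Out-Null
    $outFull = (Resolve-Path -LiteralPath $OutDir).ProviderPath
    $utf8NoBom = New-Object System.Text.UTF8Encoding($false)
    $written = 0

    foreach ($item in $doc.SelectNodes('//item')) {
        # Keep only published posts by the given creator.
        if ((Get-ChildText $item 'post_type') -ne 'post') { continue }
        if ((Get-ChildText $item 'status') -ne 'publish') { continue }
        if ((Get-ChildText $item 'creator') -ne $Creator) { continue }

        try {
            $slug = Get-ChildText $item 'post_name'
            if (-not $slug) { throw 'post has no slug' }

            # post_date is assumed to look like '2013-05-15 10:00:00'.
            $date = [datetime]::ParseExact(
                (Get-ChildText $item 'post_date'),
                'yyyy-MM-dd HH:mm:ss',
                [cultureinfo]::InvariantCulture)

            # Escape backslashes and double quotes for a quoted YAML string.
            $safeTitle = (Get-ChildText $item 'title').Replace('\', '\\').Replace('"', '\"')

            $lines = @(
                '---'
                "title: `"$safeTitle`""
                "date: $($date.ToString('yyyy-MM-dd'))"
                'draft: false'
                '---'
                ''
                (Get-ChildText $item 'content')
            )
            $path = Join-Path $outFull "$slug.md"
            [System.IO.File]::WriteAllText($path, ($lines -join "`n"), $utf8NoBom)
            $written++
        }
        catch {
            # A post that fails to convert is skipped rather than half-written.
            Write-Warning "Skipped '$(Get-ChildText $item 'title')': $_"
        }
    }
    return $written
}

# Example usage (paths and login name are placeholders):
# Export-PostsToMarkdown -XmlPath 'export-clean.xml' -Creator 'my-login' -OutDir 'content/post'
```

Matching on local-name() is a deliberate choice here: it avoids setting up an XML namespace manager for the elements that still carry a prefix.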
Simple enough to get us started. But now I need to alter the image tags and change the source code. So now I need to download the images from my site! For this, I used ExtremePictureFinder. It did not find all the images, but again, brute force and pragmatism are the order of the day here. It also quite nicely ordered them by year, so altering the image urls was a bit easier than I thought it would be.
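Altering the image urls can be sketched along these lines. This is only an illustration: the function name Convert-ImageUrl is hypothetical, and the url layout it assumes (an old base url followed by /year/month/file, mapped to a new base followed by /year/file, with any query string such as ?w=300 dropped) is an assumption and may not match your own export.

```powershell
# Hypothetical helper: rewrites old image urls to a local, by-year layout.
function Convert-ImageUrl {
    param(
        [Parameter(Mandatory)][string]$Text,
        [Parameter(Mandatory)][string]$OldBase,  # e.g. 'https://example.com/wp-content/uploads'
        [Parameter(Mandatory)][string]$NewBase   # e.g. '/images'
    )

    # <old base>/<year>/<month>/<file>, optionally followed by a query string.
    # The file name stops at quotes, whitespace, '?' or ')'.
    $pattern = [regex]::Escape($OldBase.TrimEnd('/')) +
        '/(?<year>\d{4})/\d{2}/(?<file>[^"''\s?)]+)(\?[^"''\s)]*)?'

    # ${year} and ${file} are .NET regex substitutions, not PowerShell variables.
    $replacement = $NewBase.TrimEnd('/') + '/${year}/${file}'

    return [regex]::Replace($Text, $pattern, $replacement)
}

# Example usage (urls are placeholders):
# Convert-ImageUrl -Text $body -OldBase 'https://example.com/wp-content/uploads' -NewBase '/images'
```

Keeping the year in the new path mirrors the by-year folders the downloaded images ended up in; the month folder is simply dropped.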
If at any point the munging fails, the file is not added to the new folder, as it would most likely fail a hugo build. But out of 326 posts I only lost about a dozen, so not bad. Having looked over which ones were broken, I decided not to fix them. As long as the image files are placed in the correct place in your assets, they will link fine. Mine did with no issues. So now to check in all posts and images, and now my site has the (almost) complete history of my blog posts.
It wasn’t nearly as painful as I thought it would be. It took a little while to figure out how to do this in bulk, but it sure saved a lot of effort.