Slugs: Decrufting Movable Type URLs

After upgrading to Movable Type 3.14, and before changing hosting, I really, really needed to decruft the default URLs my old Movable Type installation used. Here is how I did it:


h3. Plugins needed: Key Values and MT-Regex
First, I installed the “Key Values plugin”:http://bradchoate.com/weblog/2002/07/27/keyvalues by Brad Choate, and then customized Movable Type’s entry editing screen so that it displayed the “Keywords” field.
I also installed the “Regex Plugin”:http://bradchoate.com/weblog/2002/07/27/mtregex from Brad Choate, although it is *optional* for the scope of this document.
h3. Writing slugs: The Keywords field
After doing this, I wrote a slug by hand for each of my more than four hundred entries. A slug is written into the Keywords field as a key-value pair, for example:
bc. url=decrufting
The slug gives an entry a good-looking URL that is easy to type. For instance, this entry has the URL:
bc. http://virtuelvis.com/archives/2005/01/decrufting
This step is quite labour-intensive, but it is only *semi-optional*: if you have two different weblog entries in the same month with the same title, you *need to write a slug for at least one of them*, since both would otherwise end up with the same filename.
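To find out which entries actually need a slug, the collision rule can be checked mechanically. The Python sketch below is not part of Movable Type: it assumes you have exported your entries as (ID, year, month, title, Keywords) records, that slugs are stored as url=... lines in the Keywords field, and its dirify() is only a rough approximation of Movable Type's own title munging.

```python
import re
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Entry(NamedTuple):
    entry_id: int
    year: int
    month: int
    title: str
    keywords: str = ""  # raw Keywords field, assumed to hold lines like "url=decrufting"


def dirify(title: str) -> str:
    """Rough approximation of MT's dirify (assumption, not MT's exact code):
    lower-case, strip tags and punctuation, turn whitespace into underscores."""
    text = re.sub(r"<[^>]*>", "", title.lower())
    text = re.sub(r"[^a-z0-9\s_]", "", text)
    return re.sub(r"\s+", "_", text.strip())


def has_slug(keywords: str) -> bool:
    """True if any line of the Keywords field is a url=... pair."""
    return any(line.strip().startswith("url=") for line in keywords.splitlines())


def entries_needing_slugs(entries: List[Entry]) -> Dict[Tuple[int, int, str], List[int]]:
    """Group slug-less entries by (year, month, dirified title).
    Any group with more than one entry would collide on the same filename."""
    groups = defaultdict(list)
    for e in entries:
        if not has_slug(e.keywords):
            groups[(e.year, e.month, dirify(e.title))].append(e.entry_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

For example, two slug-less January 2005 entries titled “Decrufting” and “Decrufting!” are reported as @{(2005, 1, 'decrufting'): [1, 2]}@, while a third one with @url=decrufting-2@ in its Keywords field is left out.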
When you write your slugs, restrict yourself to the characters a-z, 0-9, -, _ and +. My own slug-writing strategy also excludes the underscore and the plus character. Avoid capital letters even where they are permitted: most people who type URLs by hand do not know that URLs are case-sensitive, and type everything in lower case.
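A small checker for these rules, assuming the permitted set is a-z, 0-9, hyphen, underscore and plus, with a stricter mode that also bans the underscore and the plus. The suggest_slug() helper is hypothetical and only produces a starting point; the final slug is still picked by hand.

```python
import re

# Assumed permitted set: a-z, 0-9, hyphen, underscore and plus.
ALLOWED = re.compile(r"[a-z0-9_+-]+")
# Stricter rule: no underscore and no plus.
STRICT = re.compile(r"[a-z0-9-]+")


def check_slug(slug: str, strict: bool = False) -> bool:
    """True if the whole slug uses only permitted characters (and is non-empty)."""
    pattern = STRICT if strict else ALLOWED
    return pattern.fullmatch(slug) is not None


def suggest_slug(title: str) -> str:
    """Hypothetical helper: lower-case the title and collapse every run of
    characters outside a-z/0-9 into a single hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
```

Upper-case letters fail both checks, which matches the advice to keep slugs in lower case.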
h3. Changing the Archive URL strategy
I then changed the archive file naming strategy. This is done by going to “Weblog Config” → “Archive Files”.
My original “Individual Entry Archive” file template read:
bc.
Depending on your own setup, the pad="0" section might be missing (you might want to make a mental note of whether you use zero-padded URLs or not). I then changed this to:
bc.
The entire section needs to be on one line. You can download the “Individual Entry Archive filename template”:http://virtuelvis.com/download/2005/01/decrufting/archivefilename.tmpl
The other two archive types I use are the monthly and category archives. The monthly archives previously read , which I changed to . Similarly the Category archive template read /index.html which was changed to /index.
h3. Setting up rewrites
I then created a new Index Template (“Templates” → “Create new index template”), with the output file set to archives/.htaccess (if your archive is located in a different directory, change this accordingly). This template has several purposes:
# It sets the DefaultType for files inside the /archives/ directory to text/html, so that the crufty .html extension can be dropped.
# It sets up redirects, so that people who try to access your old entries, for instance http://www.virtuelvis.com/archives/473.html, are sent to the new URL instead.
# Finally, a permanent redirect is set up for any file ending in .html, so that visitors are sent to the proper extensionless archive URL. This means that when someone tries to access http://www.virtuelvis.com/archives/2005/01/index.html, they are redirected to http://virtuelvis.com/archives/2005/01/index instead.
bc.. DefaultType text/html
DirectoryIndex index index.html

Redirect permanent /archives/.html
archives/

RedirectMatch permanent /archives/(.*)\.html$ archives/$1
p. Notes:
* *Step 1:* If you are using PHP files, you might want to change this section so your PHP parser is invoked.
* *Step 2:* The start and end tags for MTEntries should be on separate lines. Everything in between *should be on one line*.
* *Step 3:* .html might need to be changed if you used another file extension in the past.
You can download the “.htaccess template”:http://virtuelvis.com/download/2005/01/decrufting/htaccess.tmpl
Set this template to be rebuilt automatically when index templates are rebuilt.
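If you would rather generate the same directives outside Movable Type, for example from an exported list of entries, a sketch could look like this. The (old filename, new path) pairs and the base_url parameter are assumptions standing in for what the MTEntries loop produces; catch_all_target() mimics the catch-all RedirectMatch line with the dot in \.html escaped, so that it only matches a literal period.

```python
import re
from typing import Iterable, Optional, Tuple

HEADER = "DefaultType text/html\nDirectoryIndex index index.html\n"
# Escaped dot: "/archives/2005/01/xhtml" must not be mistaken for an .html file.
CATCH_ALL = re.compile(r"/archives/(.*)\.html$")


def build_htaccess(entries: Iterable[Tuple[str, str]], base_url: str) -> str:
    """entries: (old filename, new path) pairs, e.g. ("473.html", "archives/2005/01/decrufting").
    Emits one Redirect line per entry, then the catch-all RedirectMatch last."""
    lines = [HEADER]
    for old_name, new_path in entries:
        lines.append(f"Redirect permanent /archives/{old_name} {base_url}{new_path}")
    lines.append(f"RedirectMatch permanent /archives/(.*)\\.html$ {base_url}archives/$1")
    return "\n".join(lines) + "\n"


def catch_all_target(path: str, base_url: str) -> Optional[str]:
    """Return where the catch-all RedirectMatch would send a request path, or None."""
    m = CATCH_ALL.search(path)
    return f"{base_url}archives/{m.group(1)}" if m else None
```

The per-entry Redirect lines come before the RedirectMatch line on purpose: mod_alias applies the first matching directive, and /archives/473.html would otherwise be caught by the catch-all and sent to /archives/473.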
h3. Rebuild and test
By now, you are pretty much done. You should now *rebuild your entire Movable Type site.*
If everything went well, and you did not receive any errors during the rebuild, your Movable Type blog should now have user-friendly URLs.
You should now test:
# Try accessing an individual entry archive by its old, numeric URL. You should be redirected to the new, cruft-free URL.
# Try accessing an individual entry archive by its new URL. You should end up on the correct page, and not receive any errors.
# Try accessing a category archive by its old URL with the .html extension. You should be redirected to the correct extensionless URL.
# Try accessing a category archive by its new URL. You should end up on the correct page, and not receive any errors.
# Repeat steps three and four, but for any of the date-based archives.
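These checks can be scripted. The sketch below uses Python's standard http.client and sends HEAD requests without following redirects, so you see the 301 status and Location header that Redirect permanent should produce. The function names are illustrative, and the expected URLs are yours to fill in.

```python
import http.client
from typing import Callable, Optional, Tuple
from urllib.parse import urlsplit

Fetch = Callable[[str], Tuple[int, Optional[str]]]


def head_without_following(url: str) -> Tuple[int, Optional[str]]:
    """Send a HEAD request and return (status, Location header); redirects are NOT followed."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=10)
    path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def expect_redirect(fetch: Fetch, old_url: str, new_url: str) -> bool:
    """Old URL should answer with a permanent redirect pointing at the new URL."""
    status, location = fetch(old_url)
    return status == 301 and location == new_url


def expect_ok(fetch: Fetch, url: str) -> bool:
    """New URL should be served directly, without errors."""
    status, _ = fetch(url)
    return status == 200
```

Usage would be along the lines of @expect_redirect(head_without_following, "http://virtuelvis.com/archives/473.html", "http://virtuelvis.com/archives/2005/01/decrufting")@. Passing the fetch function in as a parameter also lets you test the checks themselves without a network connection.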
If everything worked well, you should now be able to do the following:
# Delete the old individual entry files, typically named 1.html and up.
# Delete the template used to create the .htaccess file. You should, however, _not_ delete the .htaccess file itself.
# Delete any index.html files in your archive folder.
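Before deleting anything, it can help to list what would go. This sketch only lists candidates and deletes nothing; it assumes the old entry files sit directly in the archive folder and are named with digits only (473.html, or 000473.html with zero-padding), and it never matches .htaccess.

```python
import re
from pathlib import Path
from typing import List

OLD_ENTRY = re.compile(r"\d+\.html")  # e.g. 473.html or 000473.html


def cleanup_candidates(archive_dir: str) -> List[Path]:
    """List (do not delete) old numeric entry files directly in the archive
    folder, plus every index.html anywhere below it."""
    root = Path(archive_dir)
    old_entries = [p for p in root.glob("*.html") if OLD_ENTRY.fullmatch(p.name)]
    old_indexes = list(root.rglob("index.html"))
    return sorted(old_entries + old_indexes)
```

Review the list by hand, and only then remove the files.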
h3. Filename templates explained:
Create the appropriate directory for the files. If you use daily archives, you could optionally add /%d/ inside the format attribute:
bc.
Look in MTEntryKeywords for url key-value pairs:
bc.
Does the entry have a url key-value pair? If so, use this value for creating the filename:
bc.


If there is no manual URL specified in the keywords field of a weblog entry, use the munged entry title:
bc.

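Taken together, the filename template boils down to the logic below. This Python version is a sketch, not the template itself: it assumes slugs are stored as url=... lines in the Keywords field, and its dirify() only approximates Movable Type's dirify, which lower-cases the title and replaces spaces with underscores.

```python
import re
from datetime import date
from typing import Optional


def dirify(title: str) -> str:
    """Approximation of MT's dirify (assumption): lower-case, strip tags and
    punctuation, turn whitespace into underscores."""
    text = re.sub(r"<[^>]*>", "", title.lower())
    text = re.sub(r"[^a-z0-9\s_]", "", text)
    return re.sub(r"\s+", "_", text.strip())


def url_from_keywords(keywords: str) -> Optional[str]:
    """Return the value of the first url=... line in the Keywords field, if any."""
    for line in keywords.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "url" and value.strip():
            return value.strip()
    return None


def archive_filename(posted: date, title: str, keywords: str = "", daily: bool = False) -> str:
    """Mirror the template: a %Y/%m/ (optionally %Y/%m/%d/) directory, then the
    url slug if present, otherwise the dirified title. No file extension."""
    directory = posted.strftime("%Y/%m/%d/" if daily else "%Y/%m/")
    return directory + (url_from_keywords(keywords) or dirify(title))
```

With @url=decrufting@ in the Keywords field, an entry posted in January 2005 ends up as @2005/01/decrufting@; without it, the dirified title is used instead.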

h3. Further expansions
If you installed the Regex plugin, it is possible to modify your other templates so that the trailing index is stripped from links leading to any of your archive pages. I haven’t bothered, though.
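The pattern involved is simple. It is shown here in Python rather than in MT-Regex syntax, and it assumes archive links contain /archives/: it drops a trailing index (or index.html) from such links, which works because DirectoryIndex serves index for the bare directory URL.

```python
import re

# "/archives/.../index" or "/archives/.../index.html" at the very end of a link.
TRAILING_INDEX = re.compile(r"(/archives/(?:.*/)?)index(?:\.html)?$")


def strip_index(url: str) -> str:
    """Turn .../archives/2005/01/index into .../archives/2005/01/; other links are unchanged."""
    return TRAILING_INDEX.sub(r"\1", url)
```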
There is one more thing you can do, and that is to create yearly archives, so your users can chop off the end of URLs and receive sensible pages instead of error messages or directory listings. This can be done with the “ArchiveYear”:http://mt-plugins.org/archives/entry/archiveyear.php plugin.
h3. Acknowledgements
* “Cruft-Free URLs in Movable Type”:http://diveintomark.org/archives/2003/08/15/slugs — Mark Pilgrim
* “The Ultimate Weblogging System, outlined”:http://mpt.phrasewise.com/2003/05/02#a507 — Matthew Thomas
* “How to recognize a Weblog tool by its permalinks”:http://mpt.phrasewise.com/2003/07/26#a534 — Matthew Thomas

7 Comments

  1. Do you really want a .htaccess file to be that long?! Why not simply rewrite those pages to a server-side parsed page which can handle all the permanent redirects?

  2. Anne: The .htaccess for the archive folder here is 43KB, mostly redirects using simple string comparison, and I don’t think there’s much of a parse-time difference between this .htaccess directive and invoking the PHP parser for every single page access.

  3. Friendly URLs in Movable Type

    Arve has written a very nice tutorial covering how to set up Movable Type to use search engine and user-friendly URLs. Not only does he show how to set up Movable Type so you can customise the URLs yourself,…

  4. Hi Arve, nice tutorial. I tried using the .htaccess method to create redirects similar to what you explained above, but in my case I always get a popup message in my browser that reads something like Redirection limit for this URL exceeded, and the redirection fails. I’ve since given up on the redirects and my site is littered with broken links…
    Any idea why?

  5. I’m not _entirely_ sure, but the “Redirect limit exceeded” seems to occur if there are too many redirects chained. Make sure that you aren’t doing something like this:
    * A redirects to B
    * B redirects to C
    * C redirects to A

  6. Hmm, never thought of that. I’ll let you know if it solves the problem when I try it out. Thanks.

  7. Thanks for this info. Two things to keep in mind, in regards to having all the redirects in the htaccess as opposed to having one redirect to a PHP script…
    1- In the case where you have a lot of files to redirect, and the htaccess file becomes rather large (I’m looking at one I am doing right now for a client which weighs in at +200K), Apache runs through the entire list for every single request that comes in. If done using the PHP method, only requests matching the “old” string get sent on to PHP, and Apache has way less matching work to do.
    2- I am unsure how Google feels about PHP-based header redirects… so if you (or a client) are worried about your Google “juice/karma/rank”, you may want to research this… as I guess I must now.
    Either way, you only need to generate this list once since after you’ve updated your archiving scheme, all future archive pages use the new scheme. No need to redirect URIs that don’t exist.
    🙂