I like my URLs to be semantic: it helps with SEO, and it helps users know what a page is about from the URL alone. Today I was looking over one of my old posts and found that the ™ symbol had been carried into the URL. In the admin UI the title looks like this:
Title in the Admin UI
Notice that I have used the HTML form of the & in the title. This is stripped out by WordPress's automatic URL-generating engine. However, the ™, as a Unicode character, is not removed. Some languages with non-Roman scripts need Unicode in their titles, so Unicode characters should not be disallowed there; in fact, all Unicode characters should be allowed in the title field. Unicode in the URL is sometimes allowed as well, but characters above the ASCII range are not always best practice there. In this case I think WordPress should not allow it. My permalink settings are set to custom: I use /%year%/%postname%/.
However, when a Unicode character is put into the post name, it is not necessarily stripped out. My contention is that some characters should be, or rather that more characters should be than currently are. The problem for users is that the Unicode character is passed through to the browser’s URL bar and looks like the following: http://hugh.thejourneyler.org/2010/selected-works™-bepress/ .
However, when users select the URL to copy it, what they paste is not the same as what they saw in the URL bar. They get something like the following: http://hugh.thejourneyler.org/2010/selected-works%E2%84%A2-bepress/ .
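The difference between the two URLs is percent-encoding: ™ is U+2122, its UTF-8 encoding is the three bytes E2 84 A2, and each byte is written as %XX when the URL is copied. A minimal Python sketch of that round trip follows; the variable names are my own, and it only illustrates the encoding, not what any particular browser does internally.

```python
from urllib.parse import quote, unquote

# The path as it appears in the browser's URL bar.
displayed_path = "/2010/selected-works™-bepress/"

# Percent-encode every non-ASCII byte of the UTF-8 form; "/" and "-" are left alone.
copied_path = quote(displayed_path)

# Decoding the copied form gives back the path the user originally saw.
round_tripped = unquote(copied_path)

print(copied_path)
```

Both forms point at the same page; only the second is plain ASCII, which is why it is the form that survives copy and paste.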
One solution might be for authors to use the following HTML markup in the title:
But this is not intuitive for users, nor does it offer a “thoughtless process for end users/authors”.
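To make the alternative concrete, here is a rough sketch of the kind of filtering I am arguing for: keep letters, digits and combining marks from any script, but drop symbols such as ™. It is written in Python purely for illustration. The function name `slugify_keep_letters` is hypothetical, and this is not how WordPress actually generates its slugs.

```python
import re
import unicodedata

def slugify_keep_letters(title: str) -> str:
    """Hypothetical helper: build a slug that keeps letters from any script
    but drops symbols and punctuation (™ is in the Unicode category 'So')."""
    # NFC rather than NFKC: NFKC would rewrite ™ as the letters "TM".
    title = unicodedata.normalize("NFC", title).lower()
    kept = []
    for ch in title:
        major = unicodedata.category(ch)[0]
        if major in ("L", "N", "M"):      # letters, numbers, combining marks
            kept.append(ch)
        elif ch.isspace() or ch in "-_":  # word separators become hyphens
            kept.append("-")
        # everything else (symbols, punctuation) is dropped
    # Collapse runs of hyphens and trim them from both ends.
    return re.sub(r"-+", "-", "".join(kept)).strip("-")

print(slugify_keep_letters("Selected Works™ & bepress"))
```

With a filter like this, a title in a non-Roman script keeps its letters in the slug, while the ™ never reaches the URL and authors do not have to think about markup at all.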