TM in the URL for WordPress

I like my URLs to be semantic: it helps with SEO, and it helps users know what a page is about from the URL alone. Today I was looking over one of my old posts and found that the TM symbol had been added to the URL. In the admin UI the title looks like this:

Title in the Admin UI

Notice that I have used the & as an HTML entity in the title. This is stripped out by WordPress’s automatic URL-generating engine. However, the ™ as a Unicode character is not removed. Some languages with non-Roman scripts need Unicode in their titles, so not all Unicode characters should be disallowed there. In fact, all Unicode characters should be allowed in the title field. Unicode above the ASCII range is sometimes allowed in URLs, but it is not always best practice. In this case I think WordPress should not allow it. My permalink settings are set to custom, and I use /%year%/%postname%/.

permalink settings

However, when a Unicode character is put into the postname, it is not necessarily stripped out. My contention is that some characters, or at least more characters than at present, should be. The problem for users is that the Unicode character is passed through to the browser’s URL bar, where it looks like the following:
http://hugh.thejourneyler.org/2010/selected-works™-bepress/ .
However, when users select the URL to copy it, what they paste is not what they saw in the URL bar. They get something like the following:
http://hugh.thejourneyler.org/2010/selected-works%E2%84%A2-bepress/ .
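The difference between the two URLs above is percent-encoding: the ™ character (U+2122) is written as its three UTF-8 bytes, each prefixed with %. As a minimal sketch in Python (not part of WordPress, just an illustration using the standard library), the conversion in both directions looks like this:

```python
from urllib.parse import quote, unquote

# The slug as it appears in the browser's URL bar.
slug = "selected-works™-bepress"

# The trademark sign is U+2122; its UTF-8 encoding is the bytes E2 84 A2.
utf8_bytes = "™".encode("utf-8")

# quote() percent-encodes every byte outside the unreserved ASCII set.
encoded = quote(slug)
# unquote() reverses the process and decodes the bytes as UTF-8.
decoded = unquote(encoded)

print(utf8_bytes.hex().upper())  # E284A2
print(encoded)                   # selected-works%E2%84%A2-bepress
print(decoded == slug)           # True
```

So the browser is showing the decoded form, while the clipboard receives the encoded form that is actually sent over the wire.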

One solution might be for authors to use the following HTML markup in the title:

  • &trade;
  • &#8482;

But this is not intuitive for users, nor does it present a “thoughtless process for end users/authors”.
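To make the contention concrete, here is a rough sketch of the kind of filtering I have in mind, written in Python rather than WordPress’s own PHP. The function name slugify_keep_letters is hypothetical, and this is not how WordPress actually builds slugs. The assumption is that a slug should keep letters, combining marks and digits from any script, and drop symbols and punctuation such as ™ and &.

```python
import unicodedata


def slugify_keep_letters(title: str) -> str:
    """Hypothetical slug builder, for illustration only.

    Keeps letters (L*), combining marks (M*) and numbers (N*) from any
    script; turns whitespace, hyphens and underscores into hyphens; and
    drops everything else (symbols, punctuation, control characters).
    """
    pieces = []
    for ch in title.lower():
        category = unicodedata.category(ch)
        if category[0] in ("L", "M", "N"):
            pieces.append(ch)
        elif ch.isspace() or ch in "-_":
            pieces.append("-")
        # Anything else, e.g. the trademark sign (category So), is dropped.
    slug = "".join(pieces)
    # Collapse runs of hyphens left behind by dropped characters.
    while "--" in slug:
        slug = slug.replace("--", "-")
    return slug.strip("-")


print(slugify_keep_letters("Selected Works™ & bepress"))  # selected-works-bepress
print(slugify_keep_letters("Привет мир"))                 # привет-мир
```

Filtering on the Unicode general category is one possible design choice: it lets titles in non-Roman scripts survive into the URL while still removing symbols that only cause copy-and-paste surprises.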

Finding that Apple command symbol

I have always wanted to be able to type the ⌘ symbol for various reasons, including writing tutorials, but I have not known how to access it through my keyboard. A few general, related notes:

  1. There is a nice write-up, including some history, on the Command Key (⌘) on Wikipedia.
  2. How Apple Keyboards Lost a Logo and Windows PCs Gained One
  3. PopChar is an application which helps users find obscure characters.
    PopChar is a utility for helping users find the Characters they are looking for

    This functionality is built in to OS X with Character Viewer, though it is likely that PopChar extends the user experience in some way.
    OS X Character Viewer

    Shift Key in Character Viewer

  4. This discussion on the Apple Forums describes a way to add these symbols to Pages’ auto-correction, so that Pages replaces a typed sequence of characters with the desired symbol. I have seen this done in MS Word too.
  5. A table of Unicode characters corresponding to Macintosh keyboard symbols, as they commonly appear in menus.
  6. The next two links are more detailed but similar to the above:
    • Special Key Symbols
    • Apple Keyboard Symbols
  7. Marginally relevant:
    • Multi-stroke Key Bindings
    • Keystroke mapping explained by SIL’s NRSI.

It is Unicode code point U+2318 (the HTML hex code is &#x2318;), so you can find it in the character palette under:

  • Code Tables>Unicode>2300>2318
  • or All Characters>Symbols>Technical Symbols

Apple ⌘ symbol

There are a few other ways to get at it, but that should do it for you.

On OS X, if you switch your keyboard to Unicode Hex Input, then holding down Option while typing the four hex digits of a Unicode code point produces that character, e.g. 2318 for ⌘.

The Alt/Option symbol has also been elusive. It can be found at Unicode code point U+2325.

Alt Key U+2325

Unicode and Hex Keyboard symbols
⌘ – &#x2318; – &#8984; – the Command Key symbol
⌥ – &#x2325; – &#8997; – the Option Key symbol
⇧ – &#x21E7; – &#8679; – the Shift Key (really just an outline up-arrow, not Mac-specific)

⇥ – &#x21E5; – &#8677; – the Tab Key symbol
⏎ – &#x23CE; – &#9166; – the Return Key symbol
⌫ – &#x232B; – &#9003; – the Delete Key symbol
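
The code points and HTML codes for the symbols listed above can also be worked out rather than looked up. The following is a small Python sketch; the helper names hex_entity and dec_entity are my own, for illustration. It prints the code point, the hex character reference and the decimal character reference for each key symbol, and checks that the reference decodes back to the same character.

```python
import html

# Key symbols from the list above, with the key each one stands for.
KEY_SYMBOLS = {
    "⌘": "Command",
    "⌥": "Option",
    "⇧": "Shift",
    "⇥": "Tab",
    "⏎": "Return",
    "⌫": "Delete",
}


def hex_entity(ch: str) -> str:
    """Hexadecimal HTML character reference, e.g. &#x2318;"""
    return f"&#x{ord(ch):X};"


def dec_entity(ch: str) -> str:
    """Decimal HTML character reference, e.g. &#8984;"""
    return f"&#{ord(ch)};"


for ch, key in KEY_SYMBOLS.items():
    # html.unescape() decodes a character reference back to the character.
    assert html.unescape(hex_entity(ch)) == ch
    print(f"{ch}  U+{ord(ch):04X}  {hex_entity(ch)}  {dec_entity(ch)}  {key}")

# Going the other way: from the four hex digits to the character.
print(chr(0x2318))  # ⌘
```

The last line is the same lookup that Unicode Hex Input performs when you hold Option and type 2318.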

SSH, Unix commands & RegEx

This summer I am sitting in on a computational linguistics course. It is the first instruction I have had in UNIX. Pretty awesome.
This has required me to do some googling for terminal commands.

This is kind of a sketch of where I have been.

UNIX:
http://www.osxfaq.com/Tutorials/LearningCenter/

SSH:
http://kimmo.suominen.com/docs/ssh/
http://ss64.com/osx/

TERMINAL:
http://homepage.mac.com/rgriff/files/TerminalBasics.pdf

grep:
http://www.thegeekstuff.com/2009/03/15-practical-unix-grep-command-examples/
http://en.wikipedia.org/wiki/Grep
http://www.computerhope.com/unix/ugrep.htm

Regular Expressions:
http://www.zytrax.com/tech/web/regex.htm
http://www.regular-expressions.info/tutorial.html
http://gnosis.cx/publish/programming/regular_expressions.html

RegEx and Unicode:
One of the issues that I have had with RegEx is what counts as a natural class, i.e. [A-Z], [A-Za-z], [0-9], etc. As a linguist I deal a lot with IPA characters, subscripts, superscripts, Unicode, and diacritics. How am I to define a natural class with these? Can I define a natural class based on the phonology of the language?

So I did some more searching:
http://unicode.org/reports/tr18/
http://unicode.org/reports/tr18/tr18-5.1.html
http://icu-project.org/docs/papers/iuc26_regexp.pdf
http://courses.ischool.berkeley.edu/i256/f06/papers/regexps_tutorial.pdf
http://wapedia.mobi/en/Regular_expression?t=5.
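
One possible answer to the natural-class question, sketched in Python with the standard re module: spell the class out explicitly as a character set and allow optional combining diacritics after each segment. The segment inventories below are invented for illustration and do not describe any particular language, and the helper names are hypothetical. Text is normalized to NFD first so that diacritics appear as separate combining characters.

```python
import re
import unicodedata

# Hypothetical natural classes for an imaginary language (illustrative only).
NASALS = "mnɲŋ"
VOICELESS_STOPS = "ptkʔ"

# Zero or more characters from the Combining Diacritical Marks block.
DIACRITICS = "[\u0300-\u036F]*"


def char_class(segments: str) -> str:
    """Build a regex character class from a string of single-character segments."""
    return "[" + "".join(re.escape(s) for s in segments) + "]"


# A nasal (with optional diacritics) immediately followed by a voiceless stop.
NASAL_STOP = re.compile(char_class(NASALS) + DIACRITICS + char_class(VOICELESS_STOPS))


def find_nasal_stop_clusters(text: str) -> list:
    """Return every nasal + voiceless stop sequence found in the text."""
    # NFD splits precomposed characters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return NASAL_STOP.findall(decomposed)


print(find_nasal_stop_clusters("kampa aŋka n\u0325ta"))
```

For the sample string this finds mp, ŋk, and the voiceless n (n plus combining ring below) followed by t. Classes defined this way follow the phonology of the language rather than ranges of code points, at the cost of having to list the segments by hand.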

RegEx+PERL+Unicode:
http://perldoc.perl.org/perlretut.html

PERL:
http://www.enginsite.com/Library-Perl-Regular-Expressions-Tutorial.htm
http://www.cgi101.com/book/connect/mac.html
http://www.mactech.com/articles/mactech/Vol.18/18.09/PerlforMacOSX/index.html

Python:
http://www.amk.ca/python/howto/regex/