Collect all the characters you use on a page: a bookmarklet

Yesterday Peter van Grieken asked on Twitter if anybody knew a tool that displays all the different characters used on a page. This would be handy for font subsetting: if your page uses only a few characters, you don’t need to include all the other characters in your fancy font files. Quite a few people replied to Peter’s tweet, but as far as I can tell nobody linked to a tool that does this. So I decided to create one.

It’s a bookmarklet. If you drag this list-letters bookmarklet to the bookmarks bar of a browser that supports innerText and then click it, you should see a textarea with all the characters in use on the page. I haven’t tested it thoroughly, and I’m not a JavaScript nerd, so I’m almost certain this code can be improved. For instance, the only way to close the textarea right now is to refresh the page; I’m sure you know a more elegant way to do that. It also doesn’t collect the characters that ordered lists generate, such as their numbers.

I created a gist of the code for you to play with. Please let me know if you improve it! Sander Aarts improved my original gist with this nice piece of code that even sorts the characters alphabetically. Let’s see if you can improve that even more.
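For anyone who wants to tinker, here is a minimal sketch of the same idea. It is not the code from the gist: it assumes the page text comes from `document.body.innerText`, the `listCharacters` name is hypothetical, and closing the textarea with the Escape key is just one untested answer to the refresh problem.

```javascript
// Hypothetical helper: returns every distinct character in `text`,
// sorted with the default Array.prototype.sort (UTF-16 code unit order).
function listCharacters(text) {
  // Array.from iterates by code point, so characters outside the
  // Basic Multilingual Plane (e.g. emoji) are not split in half.
  const unique = new Set(Array.from(text));
  return Array.from(unique).sort().join("");
}

// Only touch the DOM when running in a browser.
if (typeof document !== "undefined") {
  const area = document.createElement("textarea");
  area.value = listCharacters(document.body.innerText);
  area.style.cssText =
    "position:fixed;top:1em;left:1em;width:30em;height:10em;z-index:2147483647";
  // Assumption: pressing Escape while the textarea has focus removes it,
  // so you no longer need to reload the page.
  area.addEventListener("keydown", (event) => {
    if (event.key === "Escape") area.remove();
  });
  document.body.appendChild(area);
  area.focus();
}
```

To use it as a bookmarklet, wrap the code in `javascript:(function(){ /* code here */ })();` and save that as a bookmark’s URL. Note that the default sort compares UTF-16 code units, so uppercase letters come before all lowercase ones instead of in dictionary order, and whitespace such as spaces and line breaks is included in the list.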

Bram Stein reminded me that working with Unicode isn’t as easy as it seems. Make sure to take a look at his CharacterSet library if you need a more robust solution.
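Two examples of why counting characters in JavaScript is trickier than it looks: strings are stored as UTF-16 code units, and a single visible character can consist of more than one code point. The snippet below only illustrates those two pitfalls with made-up variable names; it is not taken from the CharacterSet library.

```javascript
const emoji = "😀";            // U+1F600: one code point, two UTF-16 code units
const decomposed = "e\u0301";  // "e" followed by COMBINING ACUTE ACCENT
const precomposed = "\u00e9";  // "é" as a single code point

// split("") works on UTF-16 code units and breaks the surrogate pair.
console.log(emoji.split("").length);    // 2
// Array.from (like for...of) iterates by code point.
console.log(Array.from(emoji).length);  // 1

// The same visible "é" can be two code points...
console.log(Array.from(decomposed).length);                // 2
// ...which NFC normalization composes into the single code point.
console.log(decomposed.normalize("NFC") === precomposed);  // true
```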