November 10, 2014
Domains are a wacky, wacky world that goes much deeper than somenicelooking.com. Having played with them for a bit, I’d like to share some weird things I’ve seen along the way.
Before we dive in, we need to know what a domain is. Which of the following are domains?
The answer is… they are all domains. Most people probably know or can guess the first three: the first is Facebook, the second is BBC (which takes you to their main site, bbc.com) and the third is the Hong Kong version of Google. The fourth is a Hong Kong-based newspaper and the fifth is Twitter’s URL shortening service. Surprisingly, the last 5 are domains too, although if you paste any of the last 4 into your browser you might be in for a bigger surprise (xn--… what the hell?)
1-minute primer on domains
Let’s use facebook.com as an example. The ‘com’ part is a top-level domain, or TLD. TLDs are the highest level of domains on the Internet; com sits alongside .net, .org and many others. Then, ‘facebook’ is a subdomain of ‘com’, along with ‘google’, ‘twitter’ and so on. It’s organized just like a file system: imagine that at the top level you have folders named ‘com’, ‘net’, ‘org’ etc., and you open the ‘com’ folder. Inside, you will find folders named ‘facebook’, ‘google’, ‘twitter’ etc. Just as folders can be nested many layers deep, you can have many layers of subdomains and end up with this.is.such.a.stupid.long.domain.stirling.co (but please don’t do that).
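To make the folder analogy concrete, here is a minimal Python sketch (the function name `domain_to_path` is made up for illustration) that reads a domain right to left and prints it as a folder path, TLD first. It only splits on dots; it does not check whether the domain actually exists.

```python
def domain_to_path(domain: str) -> str:
    """Turn a domain into a folder-style path, top-level domain first.

    Hypothetical helper for illustration: it only splits on dots and
    ignores a trailing dot (the DNS root); it does not look anything up.
    """
    labels = domain.rstrip(".").split(".")
    return "/" + "/".join(reversed(labels))


print(domain_to_path("facebook.com"))
# /com/facebook
print(domain_to_path("this.is.such.a.stupid.long.domain.stirling.co"))
# /co/stirling/domain/long/stupid/a/such/is/this
```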
Top level domains
Just now I mentioned com, net and org, and in the list of domains you can pick out uk, hk, cc, co, dk and jp. How did these TLDs come about? Can I make one (.stirling) and start pumping out domain names? (hi.stirling would be wonderful, but sadly, no.)
TLDs come in two kinds: generic (gTLD) and country code (ccTLD). A gTLD is meant for general use or for a theme. In fact, com derives from commercial, net from network and org from organization, so they were intended for commercial organizations, network-related companies and non-profit organizations respectively, although nowadays com, net and org domains can be used for any purpose. 在线 is also a gTLD and is Chinese for ‘online’.
ccTLD, on the other hand, stands for country code TLD and every country is assigned one. uk, hk, cc, co, dk and jp are in this category. Let’s crack the easier ones first - uk is for United Kingdom, hk is for Hong Kong, dk is for Denmark and jp is for Japan. Now to the others:
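A rough rule of thumb follows from this: ASCII TLDs that are exactly two letters long are country codes, and generic TLDs are longer. The Python sketch below encodes that heuristic (the name `tld_kind` is hypothetical). It is deliberately simplistic: it files non-English country TLDs such as .香港 (Hong Kong) and the special .arpa TLD under "gTLD", which is wrong.

```python
def tld_kind(domain: str) -> str:
    """Guess whether a domain's TLD is a ccTLD or a gTLD.

    Heuristic only: two ASCII letters -> ccTLD, anything else -> gTLD.
    Internationalized ccTLDs (e.g. .香港) and .arpa are misclassified.
    """
    tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    if len(tld) == 2 and tld.isascii() and tld.isalpha():
        return "ccTLD"
    return "gTLD"


print(tld_kind("facebook.com"))  # gTLD
print(tld_kind("stirling.co"))   # ccTLD
print(tld_kind("dk"))            # ccTLD
```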
When I first saw .co, I wondered where the ‘m’ had gone. A typo? I think most people read it as ‘company’ and move on. Here’s a hint: the country is in South America.
So, the answer is Colombia. Some people squint and see Columbia, but no, it’s Colombia. Why is Twitter using a Colombian domain? And for that matter, why am I using one? (I’m not Colombian.) One reason, at least for me, is that the nice-looking .com domains were already taken. stirling.com was registered in 2011, so when I looked into getting a personal domain in 2013, I took the best option I had: stirling.co. If I were building a startup, I wouldn’t mind shelling out money for a good .com, but for a personal domain stirling.co is as good as it gets.
If you managed to guess Colombia for the previous ccTLD, congrats! This next one is way, way harder, though. What if I told you it’s an Australian territory?
It’s the Cocos (Keeling) Islands. What? Like, do they even exist?
Yes, and they look like this.
I polled some Hong Kong friends and no one knew that ‘cc’ stood for a territory far from Hong Kong; many didn’t believe me when I said it was some Australian-owned islands in the middle of nowhere. Obscure as they are, the islands have given the world the .cc domain option.
The weird dk domain
A .dk domain on its own is not very interesting; plenty of domains like hungry.dk and fyens.dk are perfectly normal. The thing is, the ‘dk’ in the list is there by itself. It’s not somedomain.dk, it’s just dk. Does that count as a domain?
Yes. And what’s more, try going to dk./ in your browser. It will load and jump to https://www.dk-hostmaster.dk/. Why?
The reason is that, technically, top-level domains are domains too. Both ‘com’ and ‘facebook.com’ could point to a website. However, for a domain to load a website when you type it into your browser, your browser has to ‘know’ which IP address to go to. That mapping is stored in DNS as an A record (for an IPv4 address) or an AAAA record (for an IPv6 address). Facebook has set up records that point to Facebook’s servers, so you reach Facebook’s website when you type ‘facebook.com’. ‘com’, on the other hand, has no such record.
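Here is a toy model of that lookup in Python, assuming a tiny hand-written table of records; the addresses come from the ranges reserved for documentation (192.0.2.0/24 and 2001:db8::/32), not from Facebook’s real records, and the names `RECORDS` and `resolve` are made up. A real browser asks the operating system’s resolver instead (from Python, `socket.getaddrinfo` does that).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical records for illustration only: documentation-range
# addresses, not anybody's real servers.
RECORDS: Dict[Tuple[str, str], List[str]] = {
    ("facebook.com", "A"): ["192.0.2.10"],
    ("facebook.com", "AAAA"): ["2001:db8::10"],
    # 'com' is a perfectly good domain, but it has no A/AAAA entry here.
}


def resolve(name: str) -> Optional[List[str]]:
    """Return the IPv4 (A) and IPv6 (AAAA) addresses for a name, or None."""
    name = name.rstrip(".").lower()  # 'facebook.com.' and 'Facebook.com' are the same name
    addresses = RECORDS.get((name, "A"), []) + RECORDS.get((name, "AAAA"), [])
    return addresses or None  # no address record: nothing for the browser to connect to


print(resolve("facebook.com"))  # ['192.0.2.10', '2001:db8::10']
print(resolve("com"))           # None
```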
I picked dk because it’s one of the rare few TLDs that have an A or AAAA record.
The last 4 are the most interesting. These are internationalized domain names (IDNs), and they probably break a rule you hold in your head: that domains contain only a-z, 0-9 and dash (-). What are Japanese or Chinese characters doing there?
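That rule of thumb is often called the LDH (letters, digits, hyphen) rule for hostnames. A minimal Python check for it might look like the sketch below; it also enforces two details of the rule: a label is 1 to 63 characters long and cannot start or end with a dash. The names `LDH_LABEL` and `is_ldh_domain` are made up for illustration.

```python
import re

# One label: starts and ends with a letter or digit, dashes allowed inside,
# 1 to 63 characters in total. Explicit A-Za-z keeps the match ASCII-only.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_domain(domain: str) -> bool:
    """True if every dot-separated label follows the letters-digits-hyphen rule."""
    labels = domain.rstrip(".").split(".")
    return all(LDH_LABEL.fullmatch(label) for label in labels)


print(is_ldh_domain("facebook.com"))   # True
print(is_ldh_domain("gloriousπ.com"))  # False: π is not a-z, 0-9 or dash
```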
A long time ago, when the domain name system was created, not much thought was given to the fact that some people might not know English but still want to use the Internet. So it was natural that domain names were English-only. Later the problem was recognized, but the rule had been set and it was too late to change. How could they create domains that contained non-English characters but could still be represented with English characters?
The solution is not so straightforward, because whichever method you use to represent non-English characters with English characters, someone might want a domain that is exactly that English representation. Let’s say your scheme encodes characters as their English equivalents, so pi stands in for π. Then people going to applepie.com would end up at appleπe.com. Making greekpi or greek-pi stand in for π would be somewhat better, but the owner of bestgreekpies.com would not be so happy.
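The collision is easy to demonstrate. The Python sketch below implements the hypothetical ‘pi stands in for π’ scheme from the paragraph above as a single string replacement (the function name `naive_decode` is made up), and shows how it mangles perfectly ordinary ASCII domains.

```python
def naive_decode(ascii_domain: str, stand_in: str) -> str:
    """Hypothetical scheme: every occurrence of `stand_in` means π."""
    return ascii_domain.replace(stand_in, "π")


print(naive_decode("applepie.com", "pi"))            # appleπe.com
print(naive_decode("bestgreekpies.com", "greekpi"))  # bestπes.com
```

Whatever stand-in you pick, some legitimate ASCII name contains it, which is why the real scheme needs a marker nobody would choose on purpose.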
So any solution you have must result in an English representation that no sane person would want. The solution they came up with, Punycode, works somewhat like this:
- Split the domain into English (ASCII) and non-English parts. For the domain gloriousπ.com, the label gloriousπ splits into glorious and π (the com label is already ASCII and is left alone).
- Perform voodoo to turn the non-ASCII part into ASCII. Check out the Punycode article for details. To keep your sanity, just know that π turns into geg.
- Glue the 2 parts back together with a dash. Skip this step if there are no ASCII parts. We now have glorious-geg.
- Add the vomit-inducing ‘xn--’ prefix. End result: xn--glorious-geg.com!
The ‘xn--’ basically kills any desire for the English representation. Hooray!
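The steps above can be sketched in Python, which happens to ship a `punycode` text codec that does the ‘voodoo’ and the dash-gluing. This is a simplification, assuming lowercase input: real IDNA processing also normalizes and validates each label before encoding it, and the helper names `label_to_ascii` and `domain_to_ascii` are made up.

```python
def label_to_ascii(label: str) -> str:
    """Encode one label: ASCII labels pass through, others get 'xn--' + Punycode."""
    if label.isascii():
        return label
    # The built-in codec splits off the ASCII characters, encodes the rest,
    # and joins them with a dash (no dash when there are no ASCII characters).
    return "xn--" + label.encode("punycode").decode("ascii")


def domain_to_ascii(domain: str) -> str:
    """Apply the conversion label by label; the dots themselves are never encoded."""
    return ".".join(label_to_ascii(label) for label in domain.split("."))


print(domain_to_ascii("gloriousπ.com"))  # xn--glorious-geg.com
print(domain_to_ascii("π.com"))          # xn--1xa.com
```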
You can use this tool to convert IDNs into their ASCII representation.
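If you would rather not depend on an online tool, Python’s standard library includes an `idna` codec that converts whole domains in both directions. Note that it implements the older IDNA 2003 rules (RFC 3490); the Python documentation points to the third-party `idna` package for the newer IDNA 2008 rules. The wrapper names `to_ascii` and `to_unicode` are made up.

```python
def to_ascii(domain: str) -> str:
    """Unicode domain -> ASCII ('xn--') form, via the built-in idna codec."""
    return domain.encode("idna").decode("ascii")


def to_unicode(ascii_domain: str) -> str:
    """ASCII ('xn--') form -> Unicode domain, the form a browser may display."""
    return ascii_domain.encode("ascii").decode("idna")


print(to_ascii("gloriousπ.com"))           # xn--glorious-geg.com
print(to_unicode("xn--glorious-geg.com"))  # gloriousπ.com
```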
Why do I see it as xn--…?
If you’re outside Asia, it’s quite likely that the first two show up in your browser as xn--eck7a4ccc3swbx358c0guf.jp and xn--1bs9y.xn--3ds443g (even the TLD is non-English!). Some people in Asia will see them as they are. Why the difference?
The answer is that these domains can be used to do naughty things. Spot the difference:
Give up? The ‘а’ in fаcebook.com (xn--fcebook-2fg.com) is the Cyrillic small letter a, while the ‘a’ in facebook.com is the Latin ‘a’ you were expecting. The phishing possibilities are endless: faceboоk.com (xn--facebok-fjg.com), рaypal.com (xn--aypal-uye.com), bankofameriсa.com (xn--bankofameria-jhk.com).
To prevent such shenanigans, the browser decides whether to display the domain as is or in its mangled xn-- form, based on which languages you likely know, whether the characters can be mistaken for other characters, and so on. Browsers tend to err on the side of caution, which is why most people will see the mangled form and be very confused.
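One of the signals mentioned above, characters that can be mistaken for others, can be roughly approximated by checking whether a label mixes scripts. The Python sketch below is a crude, hypothetical version (the names `scripts_in` and `looks_suspicious` are made up): it guesses each letter’s script from the first word of its Unicode name and flags labels that mix Latin with anything else. Real browsers use more elaborate rules (see Unicode Technical Standard #39), and an all-Cyrillic lookalike would slip straight past this check.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Crude script guess: first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


def looks_suspicious(label: str) -> bool:
    """Hypothetical policy: flag labels that mix Latin letters with another script."""
    scripts = scripts_in(label)
    return "LATIN" in scripts and len(scripts) > 1


print(looks_suspicious("facebook"))       # False: all Latin
print(looks_suspicious("f\u0430cebook"))  # True: Latin plus CYRILLIC SMALL LETTER A
```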
What about π or ☃?
π is 3.14159…, but it’s easy to forget that it’s a Greek letter as well. Rejoice, Greek friends: you get to see π.com! ☃, on the other hand, is not a character in any alphabet. The poor Unicode snowman will never appear in the address bar.
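Unicode’s own character database records this difference, and Python’s `unicodedata` module exposes it: π is filed as a lowercase letter (general category Ll) and ☃ as an ‘other symbol’ (category So). The helper name `describe` is made up.

```python
import unicodedata


def describe(ch: str) -> tuple:
    """Return a character's official Unicode name and general category."""
    return unicodedata.name(ch), unicodedata.category(ch)


print(describe("π"))  # ('GREEK SMALL LETTER PI', 'Ll')
print(describe("☃"))  # ('SNOWMAN', 'So')
```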
Seen some weird domains floating around the interwebs? Leave your comments below!
Short URL: https://😂.cf/domain