Hosting a domain name with compound characters
08 June 2020 - By Stéphane Bortzmeyer
In a previous article, we saw that it is perfectly possible to use compound characters in a domain name. Examples of this are réussir-en.fr, académie-française.fr, and many others. These names are handled like any other domain name and can be used, for instance, in URLs (or web addresses) such as http://réforme-retraites.gouv.fr/. For the end user, they are just like other domain names and have no distinguishing features, unless the software is very old or contains bugs. However, for a technician configuring the software behind the hosting of these names and the associated services, this is not always the case, and the names often have to be handled differently.
This article is therefore intended for those technicians, for example, the system administrator of an HTTP server used to serve a website whose domain name contains these compound characters (also called "diacritical characters" or "Unicode characters"). It focuses on free software, like Apache or Nginx, since such software is essentially the basis of the infrastructure for Internet services.
Quick technical reminder
First, a very quick review of IDNs (Internationalized Domain Names). The principle is that the DNS (Domain Name System) will only handle LDH (Letters-Digits-Hyphens) names, which may contain only ASCII letters, meaning that no compound characters are permitted. This ensures that old software never encounters compound characters. Names in Unicode (the technical term is U-label) are therefore encoded in Punycode, an encoding that can represent any name in LDH (the technical term is A-label). So, académie-française.fr (the U-label) will be represented in Punycode by xn--acadmie-franaise-npb1a.fr (the A-label). In an ideal world, the system administrator would only have to handle names in Unicode. But in the real world, many software programs require the administrator who configures them to use the Punycode format.
Fortunately, there are tools to facilitate the conversion from one format to another. GNU libidn2, for example, comes with a command line tool, idn2, which enables these conversions:
% echo académie-française.fr | idn2
xn--acadmie-franaise-npb1a.fr
% echo xn--ducation-90a.gouv.fr | idn2 -d
éducation.gouv.fr
Why the 2 at the end? Because this is now version 2 of the IDN standard. The earlier tool, simply called "idn", managed version 1. Problems may arise with names that behave differently with IDN version 1 and version 2. This is the case for the German ß (eszett), which is used in four .fr names:
% echo außensteckdose.fr | idn2
xn--auensteckdose-cdb.fr
% echo außensteckdose.fr | idn
aussensteckdose.fr
The ß was changed to "ss" in IDN 1, whereas it remains unchanged in IDN 2. Another example in which differences may occur is that of scripts that were not included in the Unicode standard until after the release of IDN version 1. This is the case with Tifinagh script, which simply does not work in IDN 1:
% echo "ⴰⵣⵓⵍ.bortzmeyer.fr" | idn2
xn--4lj0cra7d.bortzmeyer.fr
% echo "ⴰⵣⵓⵍ.bortzmeyer.fr" | idn
idn: idna_to_ascii_4z: String preparation failed
And there is another problem with the eszett, which is that the round-trip (i.e. translating the U-label into an A-label and then from an A-label into a U-label) is not possible in IDN version 1, where the A-label becomes a standard ASCII domain name. This explains, for example, the Python programming language error message "UnicodeError: ('IDNA does not round-trip', b'xn--auensteckdose-cdb', b'aussensteckdose')".
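The eszett behaviour described above can be reproduced from Python, whose built-in "idna" codec implements IDN version 1. The following is only a minimal sketch using the standard library; the helper names are illustrative and do not come from any of the tools mentioned in this article.

```python
def to_a_label_idn1(name: str) -> str:
    """Convert a Unicode name to ASCII with Python's built-in IDN version 1 codec."""
    return name.encode("idna").decode("ascii")


def round_trips_idn1(a_label_name: str) -> bool:
    """Return True if the IDN version 1 codec can decode the name back to Unicode."""
    try:
        a_label_name.encode("ascii").decode("idna")
    except UnicodeError:
        # Raised, for example, with the message "IDNA does not round-trip"
        return False
    return True


# IDN version 1 maps the eszett to "ss", so the result is a plain ASCII name
print(to_a_label_idn1("außensteckdose.fr"))
# The raw Punycode codec applies no mapping and keeps the eszett
print("xn--" + "außensteckdose".encode("punycode").decode("ascii"))
# The IDN version 2 A-label cannot be decoded by the version 1 codec
print(round_trips_idn1("xn--auensteckdose-cdb.fr"))
```

The raw "punycode" codec performs none of the checks of the IDN standard, so it is used here only to show the encoding step on its own.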
DNS
Now let's get to work and create these names. You can create them in a subdomain of an existing domain (like ⴰⵣⵓⵍ.bortzmeyer.fr above) or register them with a registry that accepts these characters. Not all registries do, and those that do will not necessarily accept all possible characters, so check with the registry.
In the first case, it all depends on the software you use to provision your domain names, which may or may not handle Unicode names well. If it does not, A-labels must be used. For example, if you edit a zone file with standard syntax directly, the DNS server will probably not accept Unicode and you will have to use the Punycode format in the zone file (hence the advantage of the idn2-type tools mentioned above). Below is an example, with a comment that shows the Unicode name:
; ⴰⵣⵓⵍ
xn--4lj0cra7d IN CNAME serveur.internautique.fr.
The DNS actually allows any characters in a domain name, so a Unicode name in an encoding such as UTF-8 would probably be accepted as is by the server. But this would confuse applications, which expect to convert the name into Punycode before looking it up.
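If zone file entries are generated by a script rather than typed by hand, the conversion and the explanatory comment can be produced together. The sketch below is one possible way to do it in Python; the function names are hypothetical, and it applies only NFC normalisation, lowercasing and raw Punycode, not the full validity checks of the IDN standard.

```python
import unicodedata


def label_to_a_label(u_label: str) -> str:
    """Encode one Unicode label as an A-label using raw Punycode.

    Assumption: only NFC normalisation and lowercasing are applied here;
    the validity checks of the IDN standard are not performed.
    """
    normalised = unicodedata.normalize("NFC", u_label).lower()
    if normalised.isascii():
        return normalised  # Already plain ASCII, no encoding needed
    return "xn--" + normalised.encode("punycode").decode("ascii")


def cname_record(u_label: str, target: str) -> str:
    """Return a zone file fragment: a comment with the Unicode name, then the record."""
    return f"; {u_label}\n{label_to_a_label(u_label)} IN CNAME {target}"


print(cname_record("ⴰⵣⵓⵍ", "serveur.internautique.fr."))
```

For names registered under a real TLD, a tool such as idn2 remains the safer choice, since it also rejects labels that the standard does not allow.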
If you register a name with a domain name registry, you will often go through a registrar. So it all depends on the registrar and its software. I’ve tested two major .fr registrars and in both cases, everything worked fine. The web interface lets me type and read names in their normal Unicode format, which is definitely more user-friendly than Punycode. Note that some registries will require you to indicate which script is used for the name and do not permit script mixes.
I’ve also tested the API of a major domain name registrar and was pleasantly surprised to see that IDNs were handled correctly. I was able to send Unicode (U-labels) and everything worked correctly.
If you host this domain name on your own name servers, this, again, will depend on the software used. You may be required to configure the name server using A-labels. And remember that, contrary to a widely held myth, the DNS has in fact always allowed any characters, and is not restricted to the ASCII standard. If café.example is put in a zone file, the name server does not necessarily know whether the U-label should be translated into Punycode or kept as is. Some servers, like BIND, adopt the second behaviour and keep the name as is, which can cause surprises.
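The two readings of café.example produce two different labels on the wire, which is why a client that asks for the A-label will not find a record stored as raw UTF-8. The following Python sketch illustrates the difference for the first label only; the helper name is hypothetical.

```python
def wire_label(label: bytes) -> bytes:
    """Encode one label in DNS wire format: a length byte followed by the label bytes."""
    if not 1 <= len(label) <= 63:
        raise ValueError("a DNS label must be 1 to 63 bytes long")
    return bytes([len(label)]) + label


u_label = "café"
as_utf8 = wire_label(u_label.encode("utf-8"))
as_a_label = wire_label(b"xn--" + u_label.encode("punycode"))

print(as_utf8)     # the label kept as is, in UTF-8
print(as_a_label)  # the label translated into Punycode
print(as_utf8 == as_a_label)
```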
Once the name has been registered, several DNS clients manage the IDNs to query it. With the classic dig, in version 9.11:
The same applies for kdig, in version 2.7.6.
Drill (version 1.7.0), however, does not understand IDNs and does not handle the name correctly. It can be argued that this is a debugging tool, designed for computer engineers, and so it does not need to do by itself what a short Unix shell command can do:
Lastly, other DNS clients are implemented in the form of a web page and, for example, the DNS Looking Glass manages the IDNs: see https://dns.bortzmeyer.org/réussir-en.fr.
Whois
You may also want to use other domain name-related services, such as whois. The GNU whois client has no problem with IDNs:
% whois potamochère.fr
%% This is the AFNIC Whois server.
domain: potamochère.fr
domain-ace: xn--potamochre-66a.fr
domain-idn: potamochère.fr
registrar: GANDI
created: 2013-09-09T12:12:45Z
last-update: 2019-08-09T09:26:17Z
The same applies with other interfaces for finding information on a domain name, for example, via the Web (in this case, at Afnic), in which case names in Unicode are properly managed.
Web
Obviously, a domain name is not created just to insert information in the DNS. It is intended to be used for services, to create an online presence. Let's take the example of setting up a website. Again, the question of whether attractive U-labels (café-bien-serré.fr) can be used instead of unattractive A-labels (xn--caf-bien-serr-dhbk.fr) will depend on the software used. With Nginx (version 1.16.1), the Punycode format (xn--caf-bien-serr-dhbk.fr) must apparently be used in the server_name directive of the configuration file. Apache (version 2.4.38) allows you to use the normal Unicode format in the ServerName directive. The configuration file can be named with the Unicode name (e.g. www.potamochère.fr.conf), as Apache directives such as Include allow this.
But be wary of various utility programs and scripts written too quickly. The a2ensite script on the Debian operating system only works with LDH names (it does not permit non-ASCII characters). If the required symbolic links are in place, there is no problem with Unicode. On the other hand, server directives such as Redirect on Apache require the A-label (Punycode) to be indicated. Otherwise the Unicode is sent to the client, and some clients, such as curl, do not understand the redirect.
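When a directive or a tool requires the A-label, the host name of a URL can be converted before it is written into the configuration. Here is a minimal Python sketch, with a hypothetical function name. It assumes that the URL contains a host name but no user name or password, and that Python's built-in "idna" codec (IDN version 1) is acceptable for the name in question.

```python
from urllib.parse import urlsplit, urlunsplit


def url_with_a_label(url: str) -> str:
    """Return the URL with its host name converted to the Punycode (A-label) form.

    Assumptions: the URL has a host name and no user name or password, and
    Python's built-in "idna" codec (IDN version 1) suits the name converted.
    """
    parts = urlsplit(url)
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(url_with_a_label("https://www.potamochère.fr/"))
```

Only the host name is converted; a path containing non-ASCII characters would need separate percent-encoding.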
What about the web clients for testing? curl and wget cause no problems with Unicode:
curl prefers the IDN: even when using the -v (verbose) option, curl continues to display the Unicode format of the name, which is not the case with wget.
Note that all HTTP clients send the name in Punycode format in the Host: header. This doesn't matter as this HTTP dialogue is not seen by users directly. Incidentally, note that it is hard to rely on technical standards to know what should appear in the Host: header, as they are very complex in this respect.
And what about monitoring software like Nagios or Icinga? The monitoring plugin check_http parameters require Punycode. If another encoding is used, it is not processed and is sent as is, which typically causes an HTTP 400 error (invalid request).
Certificates
What about certificate requests? If you want your website to be authenticated, you need a certificate for your domain name and, depending on the certification authority you use, you will have to request it in the "normal" format (académie-française.fr) or the Punycode format (xn--acadmie-franaise-npb1a.fr). Note that, in the case of the example provided, académie-française.fr, it seems there is no certificate, but I expect there will be one day.
For example, the certification authority Let’s Encrypt does not permit Unicode names ("Domain name contains an invalid character"). Everything must be in Punycode.
The same applies for some very useful services when handling certificates such as crt.sh, a web interface for accessing Certificate Transparency service logs. If Unicode is entered, crt.sh simply reports that it did not find a certificate; the name has to be entered in Punycode.
Email
The problem with email is different from that of the Web. It is an older technology and, since there is no end-to-end communication, it is more difficult to negotiate with your correspondent. Note also that there are two separate problems in email addresses, one for the local part of the address (stéphane, in the hypothetical address stéphane@internet-en-coopération.fr), and one for the domain name. Punycode only applies to the domain name.
The general framework for email addresses in Unicode is called EAI, which stands for Email Address Internationalization, and has been standardised since 2012. But in practice, it has to be said that it is not very reliable: not many software programs are configured to handle it, and there is little chance that your IDN domain can be used for email. As the web interface of a registrar states when registering an IDN domain, "Please note that email addresses may not work with a domain name containing one or more special characters [sic]".
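The split between the two parts of an address can be illustrated with a short Python sketch. The function names and the address used are hypothetical, and the conversion relies on Python's built-in "idna" codec (IDN version 1). Only the domain name is converted; the local part stays in Unicode and still requires EAI support from the mail software.

```python
def split_address(address: str) -> tuple[str, str]:
    """Split an email address into its local part and its domain name."""
    local_part, _, domain = address.rpartition("@")
    if not local_part or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return local_part, domain


def domain_to_punycode(address: str) -> str:
    """Convert only the domain name to Punycode; the local part is left untouched."""
    local_part, domain = split_address(address)
    return local_part + "@" + domain.encode("idna").decode("ascii")


print(domain_to_punycode("stéphane@académie-française.fr"))
```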
Conclusion
Ideally the system administrator, like the ordinary user, could handle normal Unicode and never see the Punycode format containing xn--. But this is clearly not the case today, and various reasons relating to Internet inertia and the need not to break pre-existing habits mean that in practice, we need Punycode and must be prepared to see and handle it.