Internationalized Domain Names in Applications (IDNA)
=====================================================


Support for the Internationalized Domain Names in Applications
(IDNA) protocol as specified in `RFC 5891 <http://tools.ietf.org/html/rfc5891>`_.
This is the latest version of the protocol and is sometimes referred to as
“IDNA 2008”.

This library also provides support for Unicode Technical Standard 46,
`Unicode IDNA Compatibility Processing <http://unicode.org/reports/tr46/>`_.

This acts as a suitable replacement for the ``encodings.idna`` module that
comes with the Python standard library, which only supports the
old, deprecated IDNA specification (`RFC 3490 <http://tools.ietf.org/html/rfc3490>`_).


Basic usage is straightforward:

.. code-block:: pycon

    # Python 3
    >>> import idna
    >>> idna.encode('ドメイン.テスト')
    b'xn--eckwd4c7c.xn--zckzah'
    >>> print(idna.decode('xn--eckwd4c7c.xn--zckzah'))
    ドメイン.テスト

    # Python 2
    >>> import idna
    >>> idna.encode(u'ドメイン.テスト')
    'xn--eckwd4c7c.xn--zckzah'
    >>> print idna.decode('xn--eckwd4c7c.xn--zckzah')
    ドメイン.テスト


Packages
--------

The latest tagged release version is published in the PyPI repository:

.. image:: https://badge.fury.io/py/idna.svg
   :target: http://badge.fury.io/py/idna


Installation
------------

To install this library, you can use pip:

.. code-block:: bash

    $ pip install idna

Alternatively, you can install the package using the bundled setup script:

.. code-block:: bash

    $ python setup.py install

This library works with Python 2.6 or later, and Python 3.3 or later.


Usage
-----

For typical usage, the ``encode`` and ``decode`` functions take a domain
name argument and convert it to A-labels or U-labels, respectively.

.. code-block:: pycon

    # Python 3
    >>> import idna
    >>> idna.encode('ドメイン.テスト')
    b'xn--eckwd4c7c.xn--zckzah'
    >>> print(idna.decode('xn--eckwd4c7c.xn--zckzah'))
    ドメイン.テスト

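
An A-label is the ASCII prefix ``xn--`` followed by the Punycode (RFC 3492) encoding of the corresponding U-label, so the conversion above can be approximated with the standard library's ``punycode`` codec. The following is a minimal sketch under that assumption: ``naive_encode`` and ``naive_decode`` are hypothetical helpers, not part of this library, and unlike ``idna.encode`` and ``idna.decode`` they skip every IDNA 2008 validity check (permitted codepoints, contextual rules, bidi rules, label length).

.. code-block:: python

    def naive_encode(domain: str) -> bytes:
        """Hypothetical helper: 'xn--' + Punycode for each non-ASCII label.

        Performs no IDNA 2008 validation at all.
        """
        labels = []
        for label in domain.split("."):
            if label.isascii():
                # ASCII labels are passed through unchanged.
                labels.append(label)
            else:
                labels.append("xn--" + label.encode("punycode").decode("ascii"))
        return ".".join(labels).encode("ascii")


    def naive_decode(domain: bytes) -> str:
        """Hypothetical inverse: Punycode-decode every label with the 'xn--' prefix."""
        labels = []
        for label in domain.decode("ascii").split("."):
            if label.lower().startswith("xn--"):
                labels.append(label[4:].encode("ascii").decode("punycode"))
            else:
                labels.append(label)
        return ".".join(labels)


    print(naive_encode("ドメイン.テスト"))            # b'xn--eckwd4c7c.xn--zckzah'
    print(naive_decode(b"xn--eckwd4c7c.xn--zckzah"))  # ドメイン.テスト

Because nothing is validated, these helpers will convert labels that ``idna.encode`` rejects, so they are only useful for seeing the shape of the conversion.
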

The codec-based encoding and decoding methods are available through the
``idna.codec`` module:

.. code-block:: pycon

    # Python 2
    >>> import idna.codec
    >>> print u'домена.испытание'.encode('idna')
    xn--80ahd1agd.xn--80akhbyknj4f
    >>> print 'xn--80ahd1agd.xn--80akhbyknj4f'.decode('idna')
    домена.испытание

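
A codec module generally works by registering a search function with Python's ``codecs`` registry, after which ``str.encode`` and ``bytes.decode`` can find the codec by name. The sketch below shows that mechanism using only the standard library. It is an illustration under stated assumptions, not the contents of ``idna.codec``: the codec name ``idnasketch`` and all helper functions are hypothetical, and the conversion is bare Punycode with no IDNA 2008 validation.

.. code-block:: python

    import codecs


    def _convert_labels(text: str, to_ascii: bool) -> str:
        """Apply Punycode label by label (illustration only, no validation)."""
        out = []
        for label in text.split("."):
            if to_ascii and not label.isascii():
                out.append("xn--" + label.encode("punycode").decode("ascii"))
            elif not to_ascii and label.lower().startswith("xn--"):
                out.append(label[4:].encode("ascii").decode("punycode"))
            else:
                out.append(label)
        return ".".join(out)


    def _encode(text, errors="strict"):
        # Codec encoders return (output, number of input characters consumed).
        return _convert_labels(text, to_ascii=True).encode("ascii"), len(text)


    def _decode(data, errors="strict"):
        # bytes.decode may hand the decoder a memoryview, so copy it to bytes first.
        raw = bytes(data)
        return _convert_labels(raw.decode("ascii"), to_ascii=False), len(raw)


    def _search(name):
        # The registry passes a normalized, lower-case codec name.
        if name == "idnasketch":
            return codecs.CodecInfo(encode=_encode, decode=_decode, name="idnasketch")
        return None


    codecs.register(_search)

    print("ドメイン.テスト".encode("idnasketch"))  # b'xn--eckwd4c7c.xn--zckzah'
    print(b"xn--80ahd1agd.xn--80akhbyknj4f".decode("idnasketch"))

Registration is process-wide, which is why a codec module only needs to be imported once before the codec name can be used anywhere.
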

Conversions can be applied on a per-label basis using the ``ulabel`` or
``alabel`` functions if necessary:

.. code-block:: pycon

    # Python 2
    >>> idna.alabel(u'测试')
    'xn--0zwm56d'

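
A per-label conversion only has to deal with one label at a time, which makes it a natural place for label-level checks. The sketch below is a hypothetical illustration (``sketch_alabel`` and ``sketch_ulabel`` are not library functions) that applies just three of the requirements a real implementation enforces: the label must be non-empty, in Unicode Normalization Form C, and no longer than 63 octets once encoded. Codepoint, context, and bidi rules are deliberately left out.

.. code-block:: python

    import unicodedata


    def sketch_alabel(label: str) -> bytes:
        """Hypothetical per-label A-label conversion with only partial checks."""
        if not label:
            raise ValueError("empty label")
        if unicodedata.normalize("NFC", label) != label:
            # IDNA 2008 requires U-labels to be in Normalization Form C.
            raise ValueError("label is not in NFC form")
        if label.isascii():
            encoded = label.encode("ascii")
        else:
            encoded = b"xn--" + label.encode("punycode")
        if len(encoded) > 63:
            # The DNS limits every label to 63 octets.
            raise ValueError("label too long")
        return encoded


    def sketch_ulabel(label: bytes) -> str:
        """Hypothetical inverse of sketch_alabel."""
        if label.lower().startswith(b"xn--"):
            return label[4:].decode("punycode")
        return label.decode("ascii")


    print(sketch_alabel("测试"))         # b'xn--0zwm56d'
    print(sketch_ulabel(b"xn--0zwm56d"))  # 测试
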

Compatibility Mapping (UTS #46)
+++++++++++++++++++++++++++++++

As described in `RFC 5895 <http://tools.ietf.org/html/rfc5895>`_, the IDNA
specification no longer normalizes input from the different ways a user
may enter a domain name. This functionality, known as a “mapping”, is now
considered by the specification to be a local user-interface issue distinct
from IDNA conversion functionality.

This library provides one such mapping, which was developed by the Unicode
Consortium. Known as `Unicode IDNA Compatibility Processing <http://unicode.org/reports/tr46/>`_,
it provides both a regular mapping for typical applications and
a transitional mapping to help migrate from older IDNA 2003 applications.

For example, “Königsgäßchen” is not a permissible label because *LATIN CAPITAL
LETTER K* is not allowed (nor are capital letters in general). UTS 46 will
convert this into lower case prior to applying the IDNA conversion.


.. code-block:: pycon

    # Python 3
    >>> import idna
    >>> idna.encode(u'Königsgäßchen')
    Traceback (most recent call last):
      ...
    idna.core.InvalidCodepoint: Codepoint U+004B at position 1 of 'Königsgäßchen' not allowed
    >>> idna.encode('Königsgäßchen', uts46=True)
    b'xn--knigsgchen-b4a3dun'
    >>> print(idna.decode('xn--knigsgchen-b4a3dun'))
    königsgäßchen

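
The real UTS #46 mapping is driven by a large per-codepoint data table, so it cannot be reproduced in a few lines. As a rough approximation that happens to cover the example above, the sketch below lowercases and NFC-normalizes the input before the Punycode step. ``approx_uts46_encode`` is a hypothetical name; the sketch performs no validation and ignores everything else UTS #46 maps, such as full-width characters and other compatibility forms.

.. code-block:: python

    import unicodedata


    def approx_uts46_encode(domain: str) -> bytes:
        """Hypothetical stand-in for UTS #46 mapping: lowercase + NFC, then Punycode.

        This is NOT the UTS #46 mapping table; it only handles simple case differences.
        """
        mapped = unicodedata.normalize("NFC", domain.lower())
        labels = []
        for label in mapped.split("."):
            if label.isascii():
                labels.append(label)
            else:
                labels.append("xn--" + label.encode("punycode").decode("ascii"))
        return ".".join(labels).encode("ascii")


    print(approx_uts46_encode("Königsgäßchen"))  # b'xn--knigsgchen-b4a3dun'
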

Transitional processing provides conversions to help transition from the older
IDNA 2003 standard to the current standard. For example, in the original IDNA
specification, *LATIN SMALL LETTER SHARP S* (ß) was converted into two
*LATIN SMALL LETTER S* (ss), whereas in the current IDNA specification this
conversion is not performed.

.. code-block:: pycon

    # Python 2
    >>> idna.encode(u'Königsgäßchen', uts46=True, transitional=True)
    'xn--knigsgsschen-lcb0w'

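
UTS #46 names four “deviation” characters whose handling differs between transitional and nontransitional processing: ß, final sigma (ς), ZERO WIDTH NON-JOINER, and ZERO WIDTH JOINER. Transitional processing maps them to ``ss``, ``σ``, and nothing (removal), respectively. The sketch below layers only that step on top of a simple lowercase and NFC normalization; ``approx_transitional_label`` is a hypothetical helper, and it skips the full UTS #46 table and all validation.

.. code-block:: python

    import unicodedata

    # The four UTS #46 deviation characters and their transitional mappings.
    TRANSITIONAL_DEVIATIONS = str.maketrans({
        "\u00df": "ss",      # LATIN SMALL LETTER SHARP S -> "ss"
        "\u03c2": "\u03c3",  # GREEK SMALL LETTER FINAL SIGMA -> GREEK SMALL LETTER SIGMA
        "\u200c": None,      # ZERO WIDTH NON-JOINER is removed
        "\u200d": None,      # ZERO WIDTH JOINER is removed
    })


    def approx_transitional_label(label: str) -> bytes:
        """Hypothetical helper: lowercase, NFC, apply deviations, then Punycode."""
        mapped = unicodedata.normalize("NFC", label.lower()).translate(TRANSITIONAL_DEVIATIONS)
        if mapped.isascii():
            return mapped.encode("ascii")
        return b"xn--" + mapped.encode("punycode")


    print(approx_transitional_label("Königsgäßchen"))  # b'xn--knigsgsschen-lcb0w'
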

Implementors should use transitional processing with caution, and only in the rare
cases where conversion from legacy labels to current labels must be performed
(i.e. for IDNA implementations that pre-date 2008). For typical applications
that just need to convert labels, transitional processing is unlikely to be
beneficial and could produce unexpected, incompatible results.


``encodings.idna`` Compatibility
++++++++++++++++++++++++++++++++

Function calls from the Python built-in ``encodings.idna`` module are
mapped to their IDNA 2008 equivalents using the ``idna.compat`` module.
Simply substitute the ``import`` clause in your code to refer to the
new module name.

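
For comparison, the standard library's own ``encodings.idna`` functions can be called directly; they implement IDNA 2003, including the nameprep step that folds ß into ``ss``. The sketch below uses only those built-in calls, so it runs without this library installed.

.. code-block:: python

    import encodings.idna

    # Python's built-in module implements IDNA 2003 (RFC 3490), including nameprep.
    print(encodings.idna.ToASCII("ドメイン"))          # b'xn--eckwd4c7c'
    print(encodings.idna.ToUnicode(b"xn--eckwd4c7c"))  # ドメイン

    # Under IDNA 2003, nameprep folds LATIN SMALL LETTER SHARP S into "ss";
    # IDNA 2008 treats it as a distinct, permitted character instead.
    print(encodings.idna.ToASCII("straße"))            # b'strasse'

Switching to IDNA 2008 semantics should then be a matter of changing the import, for example to ``from idna.compat import ToASCII, ToUnicode``, assuming ``idna.compat`` exposes those names as the paragraph above describes. Expect different results for labels such as ``straße``, which IDNA 2008 no longer folds to ``strasse``.
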

Exceptions
----------

All errors raised during conversion according to the specification should be
exceptions derived from the ``idna.IDNAError`` base class.

More specific exceptions that may be raised are ``idna.IDNABidiError``,
when the error reflects an illegal combination of left-to-right and
right-to-left characters in a label; ``idna.InvalidCodepoint``, when a
specific codepoint is an illegal character in an IDN label (i.e. INVALID);
and ``idna.InvalidCodepointContext``, when the codepoint is illegal based on
its positional context (i.e. it is CONTEXTO or CONTEXTJ but the contextual
requirements are not satisfied).

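
The benefit of a shared base class is that callers can handle every conversion failure with a single ``except`` clause while still being able to inspect the specific subclass. The sketch below mirrors the documented hierarchy with local stand-in classes so that it runs without the library installed. The class names copy the documented ones, but deriving the base from ``UnicodeError``, the hypothetical ``check_label`` function, and its single toy rule (rejecting ASCII capital letters) are assumptions made for illustration.

.. code-block:: python

    class IDNAError(UnicodeError):
        """Local stand-in for idna.IDNAError (base class; UnicodeError parent assumed)."""


    class InvalidCodepoint(IDNAError):
        """Local stand-in for idna.InvalidCodepoint."""


    def check_label(label: str) -> str:
        """Hypothetical validator whose only rule rejects ASCII capital letters."""
        for index, char in enumerate(label):
            if "A" <= char <= "Z":
                raise InvalidCodepoint(
                    f"Codepoint U+{ord(char):04X} at position {index + 1} "
                    f"of {label!r} not allowed"
                )
        return label


    def try_check(label: str):
        try:
            return check_label(label)
        except IDNAError as exc:  # one clause catches every subclass
            print(f"{type(exc).__name__}: {exc}")
            return None


    try_check("königsgäßchen")  # returns the label unchanged
    try_check("Königsgäßchen")  # prints the InvalidCodepoint message, returns None
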

Testing
-------

The library has a test suite based on each rule of the IDNA specification, as
well as tests that are provided as part of the Unicode Technical Standard 46,
`Unicode IDNA Compatibility Processing <http://unicode.org/reports/tr46/>`_.

The tests are run automatically on each commit at Travis CI:

.. image:: https://travis-ci.org/kjd/idna.svg?branch=master
   :target: https://travis-ci.org/kjd/idna