# -*- encoding: utf8 -*-
#
# $Id: html.py 5409 2011-06-29 07:07:25Z rjones $
# $HeadURL: svn+ssh://svn/svn/trunk/api/eklib/html.py $
#
'''Simple, elegant HTML, XHTML and XML generation.

Constructing your HTML
----------------------

To construct HTML start with an instance of ``html.HTML()``. Add
tags by accessing the tag's attribute on that object. For example:

>>> from html import HTML
>>> h = HTML()
>>> h.p('Hello, world!')
>>> print h                          # or print(h) in python 3+
<p>Hello, world!</p>

You may supply a tag name and some text contents when creating an ``HTML``
instance. A top-level instance adds newlines around its content by default:

>>> h = HTML('html', 'text')
>>> print h
<html>
text
</html>

You may also append text content later using the tag's ``.text()`` method
or using augmented addition ``+=``. Any HTML-specific characters (``<>&"``)
in the text will be escaped for HTML safety as appropriate unless
``escape=False`` is passed. Each of the following examples uses a new
``HTML`` instance:

>>> p = h.p('hello, world!\\n')
>>> p.br
>>> p.text('more &rarr; text', escape=False)
>>> p += ' ... augmented'
>>> h.p
>>> print h
<p>hello, world!
<br>more &rarr; text ... augmented</p>
<p>

Note also that the top-level ``HTML`` object adds newlines between tags by
default. Finally, in the above you'll see an empty paragraph tag - tags with
no contents get no closing tag.

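As a rough illustration of the escaping step described above, here is a
standalone sketch. It uses ``xml.sax.saxutils.escape`` from the standard
library rather than this module's own code, and the helper name ``add_text``
and its ``escape_text`` flag are hypothetical:

```python
from xml.sax.saxutils import escape

def add_text(content, text, escape_text=True):
    # Append text to a list of content fragments, escaping &, < and >
    # unless the caller opts out (illustrative helper, not this module's API).
    content.append(escape(text) if escape_text else text)
    return content

fragments = []
add_text(fragments, 'item 2 > 1 & rising')
add_text(fragments, ' &rarr; done', escape_text=False)
print(''.join(fragments))
```

The second call passes an entity through untouched, which is the situation
``escape=False`` is meant for.
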
If the tag should have sub-tags you have two options. You may either add
the sub-tags directly on the tag:

>>> l = h.ol
>>> l.li('item 1')
>>> l.li.b('item 2 > 1')
>>> print h
<ol>
<li>item 1</li>
<li><b>item 2 &gt; 1</b></li>
</ol>

Note that the default behavior with lists (and tables) is to add newlines
between sub-tags to generate a nicer output. You can also see in that
example the chaining of tags in ``l.li.b``.

Tag attributes may be passed in as well:

>>> t = h.table(border='1')
>>> for i in range(2):
...     r = t.tr
...     r.td('column 1')
...     r.td('column 2')
>>> print t
<table border="1">
<tr><td>column 1</td><td>column 2</td></tr>
<tr><td>column 1</td><td>column 2</td></tr>
</table>

A variation on the above is to use a tag as a context variable in a
``with`` statement. The following is functionally identical to the first
list construction but with a slightly different syntax emphasising the HTML
structure:

>>> with h.ol as l:
...     l.li('item 1')
...     l.li.b('item 2 > 1')

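One way to picture how a ``with`` block can redirect new tags into the
enclosing tag is a shared stack of "current" parents. The sketch below is an
assumption-laden illustration: ``Doc`` and ``Scope`` are hypothetical names,
and tags are plain ``[name, children]`` lists rather than this module's
objects.

```python
class Doc(object):
    # Hypothetical container: tags are [name, children] lists and a stack
    # records which tag currently receives new children.
    def __init__(self):
        self.root = ['root', []]
        self.stack = [self.root]

    def tag(self, name):
        node = [name, []]
        self.stack[-1][1].append(node)
        return Scope(self, node)

class Scope(object):
    def __init__(self, doc, node):
        self.doc, self.node = doc, node

    def __enter__(self):
        # New tags are now added inside this tag.
        self.doc.stack.append(self.node)
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        # Restore the enclosing tag as the insertion point.
        self.doc.stack.pop()

doc = Doc()
with doc.tag('ol'):
    doc.tag('li')
    doc.tag('li')
doc.tag('p')
print(doc.root)
```

Once the block exits, the stack is back to the root, so the ``p`` tag lands
beside the ``ol`` rather than inside it.
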
You may turn off/on adding newlines by passing ``newlines=False`` or
``True`` to the tag (or ``HTML`` instance) at creation time:

>>> l = h.ol(newlines=False)
>>> l.li('item 1')
>>> l.li('item 2')
>>> print h
<ol><li>item 1</li><li>item 2</li></ol>

Since ``class`` is a reserved word in Python and can't be used as a keyword
argument, the library recognises ``klass`` as a substitute:

>>> print h.p('content', klass="styled")
<p class="styled">content</p>

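The ``klass`` substitution amounts to renaming one key while building the
attribute string. A minimal sketch, assuming a hypothetical helper
``render_attrs`` and using ``xml.sax.saxutils.escape`` for the value
escaping (keys are sorted here only to make the output predictable):

```python
from xml.sax.saxutils import escape

def render_attrs(**attrs):
    # Build an attribute string, renaming "klass" to "class" and escaping
    # &, <, > and double quotes in values (hypothetical helper).
    parts = []
    for key in sorted(attrs):
        name = 'class' if key == 'klass' else key
        value = escape(attrs[key], {'"': '&quot;'})
        parts.append('%s="%s"' % (name, value))
    return ' '.join(parts)

print(render_attrs(klass='styled', id='a"b'))
```
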
Unicode
-------

``HTML`` will work with either regular strings **or** unicode strings, but
not **both at the same time**.

Obtain the final unicode string by calling ``unicode()`` on the ``HTML``
instance:

>>> h = HTML()
>>> h.p(u'Some Euro: €1.14')
>>> unicode(h)
u'<p>Some Euro: €1.14</p>'

If (under Python 2.x) you add non-unicode strings or attempt to get the
resultant HTML source through any means other than ``unicode()`` then you
will most likely get one of the following errors raised:

UnicodeDecodeError
   Probably means you've added non-unicode strings to your HTML.
UnicodeEncodeError
   Probably means you're trying to get the resultant HTML using ``print``
   or ``str()`` (or ``%s``).

How generation works
--------------------

The HTML document is generated when the ``HTML`` instance is "stringified".
This could be done either by invoking ``str()`` on it, or just printing it.
It may also be returned directly as the "iterable content" from a WSGI app
function.

You may also render any tag or sub-tag at any time by stringifying it.

Tags with no contents (either text or sub-tags) will have no closing tag.
There is no "special list" of tags that must always have closing tags, so
if you need to force a closing tag you'll need to provide some content,
even if it's just a single space character.

Rendering doesn't affect the HTML document's state, so you can add to or
otherwise manipulate the HTML after you've stringified it.

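For context on the WSGI remark, here is a minimal sketch of an app function
that returns rendered markup as its iterable body. It assumes Python 3,
where WSGI bodies must be byte strings, so it encodes the text explicitly
instead of returning the document object itself. ``render_page`` is a
hypothetical stand-in for stringifying a document built with this module.

```python
def render_page():
    # Stand-in for stringifying a document built with this module.
    return '<p>Hello, world!</p>'

def app(environ, start_response):
    body = render_page().encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/html; charset=utf-8'),
                              ('Content-Length', str(len(body)))])
    # WSGI expects an iterable of byte strings.
    return [body]
```
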
Creating XHTML
--------------

To construct XHTML start with an instance of ``html.XHTML()`` and use it
as you would an ``HTML`` instance. Empty elements will now be rendered
with the appropriate XHTML minimized tag syntax. For example:

>>> from html import XHTML
>>> h = XHTML()
>>> h.p
>>> h.br
>>> print h
<p></p>
<br />

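The rule behind that output can be sketched as a single function: an element
is minimized only when it has no content and is one of the elements declared
empty. ``render_xhtml`` is a hypothetical helper and ``EMPTY_ELEMENTS`` lists
only a subset of the names the ``XHTML`` class knows about.

```python
EMPTY_ELEMENTS = {'br', 'hr', 'img', 'input', 'meta', 'link'}  # subset only

def render_xhtml(name, content=''):
    # Minimise only elements that are both empty and declared empty;
    # everything else gets an explicit closing tag.
    if not content and name.lower() in EMPTY_ELEMENTS:
        return '<%s />' % name
    return '<%s>%s</%s>' % (name, content, name)

print(render_xhtml('p'))
print(render_xhtml('br'))
```
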
Creating XML
------------

A slight tweak to the ``html.XHTML()`` implementation allows us to generate
arbitrary XML using ``html.XML()``:

>>> from html import XML
>>> h = XML('xml')
>>> h.p
>>> h.br('hi there')
>>> print h
<xml>
<p />
<br>hi there</br>
</xml>

Tags with difficult names
-------------------------

If your tag name isn't a valid Python identifier name, or if it's called
"text" or "raw_text" you can add your tag slightly more manually. A
separately created instance is a top-level object that adds newlines around
its content by default, so pass ``newlines=False`` to keep each added tag on
one line:

>>> from html import XML
>>> h = XML('xml')
>>> h += XML('some-tag', 'some text', newlines=False)
>>> h += XML('text', 'some text', newlines=False)
>>> print h
<xml>
<some-tag>some text</some-tag>
<text>some text</text>
</xml>

Version History (in Brief)
--------------------------

- 1.16 detect and raise a more useful error when some WSGI frameworks
  attempt to call HTML.read(). Also added ability to add new content using
  the += operator.
- 1.15 fix Python 3 compatibility (unit tests)
- 1.14 added plain XML support
- 1.13 allow adding (X)HTML instances (tags) as new document content
- 1.12 fix handling of XHTML empty tags when generating unicode
  output (thanks Carsten Eggers)
- 1.11 remove setuptools dependency
- 1.10 support plain ol' distutils again
- 1.9 added unicode support for Python 2.x
- 1.8 added Python 3 compatibility
- 1.7 added Python 2.5 compatibility and escape argument to tag
  construction
- 1.6 added .raw_text() and WSGI compatibility
- 1.5 added XHTML support
- 1.3 added more documentation, more tests
- 1.2 added special-case klass / class attribute
- 1.1 added escaping control
- 1.0 was the initial release

----

I would be interested to know whether this module is useful - if you use it
please indicate so at https://www.ohloh.net/p/pyhtml

This code is copyright 2009-2011 eKit.com Inc (http://www.ekit.com/)
See the end of the source file for the license of use.
XHTML support was contributed by Michael Haubenwallner.
'''
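from __future__ import with_statement

__version__ = '1.16'

import sys
# NOTE: cgi.escape() is used for all escaping below; it was removed from the
# standard library in Python 3.8, so this module needs an older interpreter.
import cgi
import unittest

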
class HTML(object):
    '''Easily generate HTML.

    >>> print HTML('html', 'some text', newlines=False)
    <html>some text</html>
    >>> h = HTML('html', newlines=False)
    >>> h.p('some text')
    >>> print h
    <html><p>some text</p></html>

    If a name is not passed in then the instance becomes a container for
    other tags that itself generates no tag:

    >>> h = HTML()
    >>> h.p('text')
    >>> h.p('text')
    >>> print h
    <p>text</p>
    <p>text</p>
    '''
    newline_default_on = set('table ol ul dl'.split())

    def __init__(self, name=None, text=None, stack=None, newlines=True,
            escape=True):
        self._name = name
        self._content = []
        self._attrs = {}
        # insert newlines between content?
        if stack is None:
            stack = [self]
            self._top = True
            self._newlines = newlines
        else:
            self._top = False
            self._newlines = name in self.newline_default_on
        self._stack = stack
        if text is not None:
            self.text(text, escape)

    def __getattr__(self, name):
        # adding a new tag or newline
        if name == 'newline':
            e = '\n'
        else:
            e = self.__class__(name, stack=self._stack)
        if self._top:
            # the top-level object adds to whichever tag is current
            self._stack[-1]._content.append(e)
        else:
            self._content.append(e)
        return e

    def __iadd__(self, other):
        if self._top:
            self._stack[-1]._content.append(other)
        else:
            self._content.append(other)
        return self

    def text(self, text, escape=True):
        '''Add text to the document. If "escape" is True any characters
        special to HTML will be escaped.
        '''
        if escape:
            text = cgi.escape(text)
        # adding text
        if self._top:
            self._stack[-1]._content.append(text)
        else:
            self._content.append(text)

    def raw_text(self, text):
        '''Add raw, unescaped text to the document. This is useful for
        explicitly adding HTML code or entities.
        '''
        return self.text(text, escape=False)

    def __call__(self, *content, **kw):
        if self._name == 'read':
            if len(content) == 1 and isinstance(content[0], int):
                raise TypeError('you appear to be calling read(%d) on '
                    'a HTML instance' % content)
            elif len(content) == 0:
                raise TypeError('you appear to be calling read() on a '
                    'HTML instance')

        # customising a tag with content or attributes
        escape = kw.pop('escape', True)
        if content:
            if escape:
                self._content = list(map(cgi.escape, content))
            else:
                # keep a list (not the argument tuple) so that later
                # .text() and += calls can still append to it
                self._content = list(content)
        if 'newlines' in kw:
            # special-case to allow control over newlines
            self._newlines = kw.pop('newlines')
        for k in kw:
            if k == 'klass':
                self._attrs['class'] = cgi.escape(kw[k], True)
            else:
                self._attrs[k] = cgi.escape(kw[k], True)
        return self

    def __enter__(self):
        # we're now adding tags to me!
        self._stack.append(self)
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        # we're done adding tags to me!
        self._stack.pop()

    def __repr__(self):
        return '<HTML %s 0x%x>' % (self._name, id(self))

    def _stringify(self, str_type):
        # turn me and my content into text
        join = '\n' if self._newlines else ''
        if self._name is None:
            return join.join(map(str_type, self._content))
        a = ['%s="%s"' % i for i in self._attrs.items()]
        l = [self._name] + a
        s = '<%s>%s' % (' '.join(l), join)
        if self._content:
            s += join.join(map(str_type, self._content))
            s += join + '</%s>' % self._name
        return s

    def __str__(self):
        return self._stringify(str)

    def __unicode__(self):
        return self._stringify(unicode)

    def __iter__(self):
        return iter([str(self)])


class XHTML(HTML):
    '''Easily generate XHTML.
    '''
    empty_elements = set('base meta link hr br param img area input col \
        colgroup basefont isindex frame'.split())

    def _stringify(self, str_type):
        # turn me and my content into text
        # honor empty and non-empty elements
        join = '\n' if self._newlines else ''
        if self._name is None:
            return join.join(map(str_type, self._content))
        a = ['%s="%s"' % i for i in self._attrs.items()]
        l = [self._name] + a
        s = '<%s>%s' % (' '.join(l), join)
        if self._content or not(self._name.lower() in self.empty_elements):
            s += join.join(map(str_type, self._content))
            s += join + '</%s>' % self._name
        else:
            s = '<%s />%s' % (' '.join(l), join)
        return s


class XML(XHTML):
    '''Easily generate XML.

    All tags with no contents are reduced to self-terminating tags.
    '''
    newline_default_on = set()          # no tags are special

    def _stringify(self, str_type):
        # turn me and my content into text
        # honor empty and non-empty elements
        join = '\n' if self._newlines else ''
        if self._name is None:
            return join.join(map(str_type, self._content))
        a = ['%s="%s"' % i for i in self._attrs.items()]
        l = [self._name] + a
        s = '<%s>%s' % (' '.join(l), join)
        if self._content:
            s += join.join(map(str_type, self._content))
            s += join + '</%s>' % self._name
        else:
            s = '<%s />%s' % (' '.join(l), join)
        return s


class TestCase(unittest.TestCase):
    def test_empty_tag(self):
        'generation of an empty HTML tag'
        self.assertEqual(str(HTML().br), '<br>')

    def test_empty_tag_xml(self):
        'generation of an empty XHTML tag'
        self.assertEqual(str(XHTML().br), '<br />')

    def test_tag_add(self):
        'test top-level tag creation'
        self.assertEqual(str(HTML('html', 'text')), '<html>\ntext\n</html>')

    def test_tag_add_no_newline(self):
        'test top-level tag creation'
        self.assertEqual(str(HTML('html', 'text', newlines=False)),
            '<html>text</html>')

    def test_iadd_tag(self):
        "test iadd'ing a tag"
        h = XML('xml')
        h += XML('some-tag', 'spam', newlines=False)
        h += XML('text', 'spam', newlines=False)
        self.assertEqual(str(h),
            '<xml>\n<some-tag>spam</some-tag>\n<text>spam</text>\n</xml>')

    def test_iadd_text(self):
        "test iadd'ing text"
        h = HTML('html', newlines=False)
        h += 'text'
        h += 'text'
        self.assertEqual(str(h), '<html>texttext</html>')

    def test_xhtml_match_tag(self):
        'check forced generation of matching tag when empty'
        self.assertEqual(str(XHTML().p), '<p></p>')

    if sys.version_info[0] == 2:
        def test_empty_tag_unicode(self):
            'generation of an empty HTML tag'
            self.assertEqual(unicode(HTML().br), unicode('<br>'))

        def test_empty_tag_xml_unicode(self):
            'generation of an empty XHTML tag'
            self.assertEqual(unicode(XHTML().br), unicode('<br />'))

        def test_xhtml_match_tag_unicode(self):
            'check forced generation of matching tag when empty'
            self.assertEqual(unicode(XHTML().p), unicode('<p></p>'))

    def test_just_tag(self):
        'generate HTML for just one tag'
        self.assertEqual(str(HTML().br), '<br>')

    def test_just_tag_xhtml(self):
        'generate XHTML for just one tag'
        self.assertEqual(str(XHTML().br), '<br />')

    def test_xml(self):
        'generate XML'
        self.assertEqual(str(XML().br), '<br />')
        self.assertEqual(str(XML().p), '<p />')
        self.assertEqual(str(XML().br('text')), '<br>text</br>')

    def test_para_tag(self):
        'generation of a tag with contents'
        h = HTML()
        h.p('hello')
        self.assertEqual(str(h), '<p>hello</p>')

    def test_escape(self):
        'escaping of special HTML characters in text'
        h = HTML()
        h.text('<>&')
        self.assertEqual(str(h), '&lt;&gt;&amp;')

    def test_no_escape(self):
        'no escaping of special HTML characters in text'
        h = HTML()
        h.text('<>&', False)
        self.assertEqual(str(h), '<>&')

    def test_escape_attr(self):
        'escaping of special HTML characters in attributes'
        h = HTML()
        h.br(id='<>&"')
        self.assertEqual(str(h), '<br id="&lt;&gt;&amp;&quot;">')

    def test_subtag_context(self):
        'generation of sub-tags using "with" context'
        h = HTML()
        with h.ol:
            h.li('foo')
            h.li('bar')
        self.assertEqual(str(h), '<ol>\n<li>foo</li>\n<li>bar</li>\n</ol>')

    def test_subtag_direct(self):
        'generation of sub-tags directly on the parent tag'
        h = HTML()
        l = h.ol
        l.li('foo')
        l.li.b('bar')
        self.assertEqual(str(h),
            '<ol>\n<li>foo</li>\n<li><b>bar</b></li>\n</ol>')

    def test_subtag_direct_context(self):
        'generation of sub-tags directly on the parent tag in "with" context'
        h = HTML()
        with h.ol as l:
            l.li('foo')
            l.li.b('bar')
        self.assertEqual(str(h),
            '<ol>\n<li>foo</li>\n<li><b>bar</b></li>\n</ol>')

    def test_subtag_no_newlines(self):
        'prevent generation of newlines against default'
        h = HTML()
        l = h.ol(newlines=False)
        l.li('foo')
        l.li('bar')
        self.assertEqual(str(h), '<ol><li>foo</li><li>bar</li></ol>')

    def test_add_text(self):
        'add text to a tag'
        h = HTML()
        p = h.p('hello, world!\n')
        p.text('more text')
        self.assertEqual(str(h), '<p>hello, world!\nmore text</p>')

    def test_add_text_newlines(self):
        'add text to a tag with newlines for prettiness'
        h = HTML()
        p = h.p('hello, world!', newlines=True)
        p.text('more text')
        self.assertEqual(str(h), '<p>\nhello, world!\nmore text\n</p>')

    def test_doc_newlines(self):
        'default document adding newlines between tags'
        h = HTML()
        h.br
        h.br
        self.assertEqual(str(h), '<br>\n<br>')

    def test_doc_no_newlines(self):
        'prevent document adding newlines between tags'
        h = HTML(newlines=False)
        h.br
        h.br
        self.assertEqual(str(h), '<br><br>')

    def test_unicode(self):
        'make sure unicode input works and results in unicode output'
        h = HTML(newlines=False)
        # Python 3 compat: "unicode" only exists under Python 2. A separate
        # local name is used because assigning to "unicode" itself would
        # shadow the builtin and always take the fallback branch.
        try:
            text_type = unicode
            TEST = 'euro \xe2\x82\xac'.decode('utf8')
        except NameError:
            text_type = str
            TEST = 'euro €'
        h.p(TEST)
        self.assertEqual(text_type(h), '<p>%s</p>' % TEST)

    def test_table(self):
        'multiple "with" context blocks'
        h = HTML()
        with h.table(border='1'):
            for i in range(2):
                with h.tr:
                    h.td('column 1')
                    h.td('column 2')
        self.assertEqual(str(h), '''<table border="1">
<tr><td>column 1</td><td>column 2</td></tr>
<tr><td>column 1</td><td>column 2</td></tr>
</table>''')

if __name__ == '__main__':
    unittest.main()

# Copyright (c) 2009 eKit.com Inc (http://www.ekit.com/)
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

# vim: set filetype=python ts=4 sw=4 et si