Metadata-Version: 2.1
Name: parsel
Version: 1.6.0
Summary: Parsel is a library to extract data from HTML and XML using XPath and CSS selectors
Home-page: https://github.com/scrapy/parsel
Author: Scrapy project
Author-email: info@scrapy.org
License: BSD
Keywords: parsel
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Natural Language :: English
Classifier: Topic :: Text Processing :: Markup
Classifier: Topic :: Text Processing :: Markup :: HTML
Classifier: Topic :: Text Processing :: Markup :: XML
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Dist: w3lib (>=1.19.0)
Requires-Dist: lxml
Requires-Dist: six (>=1.6.0)
Requires-Dist: cssselect (>=0.9)
Requires-Dist: functools32 ; python_version<'3.0'

======
Parsel
======

.. image:: https://img.shields.io/travis/scrapy/parsel/master.svg
   :target: https://travis-ci.org/scrapy/parsel
   :alt: Build Status

.. image:: https://img.shields.io/pypi/v/parsel.svg
   :target: https://pypi.python.org/pypi/parsel
   :alt: PyPI Version

.. image:: https://img.shields.io/codecov/c/github/scrapy/parsel/master.svg
   :target: http://codecov.io/github/scrapy/parsel?branch=master
   :alt: Coverage report

Parsel is a BSD-licensed Python_ library to extract and remove data from HTML_
and XML_ using XPath_ and CSS_ selectors, optionally combined with
`regular expressions`_.

Find the Parsel online documentation at https://parsel.readthedocs.org.

Example (`open online demo`_):

.. code-block:: python

    >>> from parsel import Selector
    >>> selector = Selector(text=u"""<html>
    ...         <body>
    ...             <h1>Hello, Parsel!</h1>
    ...             <ul>
    ...                 <li><a href="http://example.com">Link 1</a></li>
    ...                 <li><a href="http://scrapy.org">Link 2</a></li>
    ...             </ul>
    ...         </body>
    ...         </html>""")
    >>> selector.css('h1::text').get()
    'Hello, Parsel!'
    >>> selector.xpath('//h1/text()').re(r'\w+')
    ['Hello', 'Parsel']
    >>> for li in selector.css('ul > li'):
    ...     print(li.xpath('.//@href').get())
    http://example.com
    http://scrapy.org

.. _CSS: https://en.wikipedia.org/wiki/Cascading_Style_Sheets
.. _HTML: https://en.wikipedia.org/wiki/HTML
.. _open online demo: https://colab.research.google.com/drive/149VFa6Px3wg7S3SEnUqk--TyBrKplxCN#forceEdit=true&sandboxMode=true
.. _Python: https://www.python.org/
.. _regular expressions: https://docs.python.org/library/re.html
.. _XML: https://en.wikipedia.org/wiki/XML
.. _XPath: https://en.wikipedia.org/wiki/XPath

History
-------

1.6.0 (2020-05-07)
~~~~~~~~~~~~~~~~~~

* Python 3.4 is no longer supported
* New ``Selector.remove()`` and ``SelectorList.remove()`` methods to remove
  selected elements from the parsed document tree
* Improvements to error reporting, test coverage and documentation, and code
  cleanup
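
A minimal sketch of the new ``remove()`` method, assuming parsel 1.6.0 or a
later release that still provides it; the HTML snippet is invented for
illustration. Calling ``remove()`` on a ``SelectorList`` detaches every matched
element, so later queries on the same selector no longer see it.

.. code-block:: python

    # Sketch of SelectorList.remove(); the HTML below is made up for the example.
    from parsel import Selector

    sel = Selector(text="""
    <html><body>
      <p>keep me</p>
      <p class="ad">drop me</p>
    </body></html>
    """)

    # Detach every <p class="ad"> element from its parent element
    sel.css("p.ad").remove()

    # Only the untouched paragraph is left in the tree
    remaining = sel.css("p::text").getall()  # ['keep me']
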

1.5.2 (2019-08-09)
~~~~~~~~~~~~~~~~~~

* ``Selector.remove_namespaces`` received a significant performance improvement
* The value of ``data`` within the printable representation of a selector
  (``repr(selector)``) now ends in ``...`` when truncated, to make the
  truncation obvious.
* Minor documentation improvements.

1.5.1 (2018-10-25)
~~~~~~~~~~~~~~~~~~

* ``has-class`` XPath function handles newlines and other separators
  in class names properly;
* fixed parsing of HTML documents with null bytes;
* documentation improvements;
* Python 3.7 tests are run on CI; other test improvements.

1.5.0 (2018-07-04)
~~~~~~~~~~~~~~~~~~

* New ``Selector.attrib`` and ``SelectorList.attrib`` properties which make
  it easier to get attributes of HTML elements.
* CSS selectors became faster: compilation results are cached
  (LRU cache is used for ``css2xpath``), so there is
  less overhead when the same CSS expression is used several times.
* ``.get()`` and ``.getall()`` selector methods are documented and recommended
  over ``.extract_first()`` and ``.extract()``.
* Various documentation tweaks and improvements.
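
The following is a short sketch of the ``attrib`` properties and the
recommended ``.get()`` / ``.getall()`` methods; the two links in the HTML are
invented for the example. ``SelectorList.attrib`` reports the attributes of the
first matched element, while ``Selector.attrib`` reports those of a single
element.

.. code-block:: python

    from parsel import Selector

    sel = Selector(text=(
        '<a href="http://example.com" class="ext">Link 1</a>'
        '<a href="/local">Link 2</a>'
    ))
    links = sel.css("a")

    # Selector.attrib: attributes of one element, as a plain dict
    first_attrs = links[0].attrib  # {'href': 'http://example.com', 'class': 'ext'}

    # SelectorList.attrib: attributes of the first matched element
    first_href = links.attrib["href"]  # 'http://example.com'

    # .get() returns the first result (None if nothing matches),
    # .getall() returns every result as a list of strings
    href = sel.css("a::attr(href)").get()          # 'http://example.com'
    all_hrefs = sel.css("a::attr(href)").getall()  # ['http://example.com', '/local']
    missing = sel.css("img::attr(src)").get()      # None
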

One more change is that ``.extract()`` and ``.extract_first()`` methods
are now implemented using ``.get()`` and ``.getall()``, not the other
way around, and instead of calling ``Selector.extract`` all other methods
now call ``Selector.get`` internally. This can be **backwards incompatible**
for custom Selector subclasses which override ``Selector.extract``
without doing the same for ``Selector.get``. If you have such a Selector
subclass, make sure the ``get`` method is also overridden. For example, this::

    import parsel

    class MySelector(parsel.Selector):
        def extract(self):
            return super().extract() + " foo"

should be changed to this::

    import parsel

    class MySelector(parsel.Selector):
        def get(self):
            return super().get() + " foo"

        extract = get

1.4.0 (2018-02-08)
~~~~~~~~~~~~~~~~~~

* ``Selector`` and ``SelectorList`` can't be pickled because
  pickling/unpickling doesn't work for ``lxml.html.HtmlElement``;
  parsel now raises ``TypeError`` explicitly instead of allowing pickle to
  silently produce wrong output. This is technically backwards-incompatible
  if you're using Python < 3.6.
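
A minimal sketch of the behaviour described above: pickling a selector fails
loudly with ``TypeError``. The example only checks the exception type and does
not rely on the exact error message, which is an implementation detail.

.. code-block:: python

    import pickle

    from parsel import Selector

    sel = Selector(text="<p>hello</p>")

    try:
        pickle.dumps(sel)
        pickle_error = None
    except TypeError as exc:
        # parsel refuses to pickle selectors instead of producing broken output
        pickle_error = exc
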

1.3.1 (2017-12-28)
~~~~~~~~~~~~~~~~~~

* Fix artifact uploads to PyPI.

1.3.0 (2017-12-28)
~~~~~~~~~~~~~~~~~~

* ``has-class`` XPath extension function;
* ``parsel.xpathfuncs.set_xpathfunc`` is a simplified way to register
  XPath extensions;
* ``Selector.remove_namespaces`` now removes namespace declarations;
* Python 3.3 support is dropped;
* ``make htmlview`` command for easier Parsel docs development.
* CI: PyPy installation is fixed; parsel now runs tests for PyPy3 as well.
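
The sketch below exercises ``has-class`` and ``set_xpathfunc``, assuming a
parsel release where both are available. The ``ends-with`` function is a
hypothetical extension written only for this example (XPath 1.0 has no
built-in of that name); it is not shipped with parsel. Note that
``set_xpathfunc`` registers the function for every XPath evaluated by lxml in
the process, so the example unregisters it again by passing ``None``.

.. code-block:: python

    from parsel import Selector
    from parsel.xpathfuncs import set_xpathfunc

    sel = Selector(text="""
    <p class="intro lead">alpha</p>
    <p class="lead-in">beta</p>
    <a href="/report.pdf">Report</a>
    <a href="/index.html">Home</a>
    """)

    # has-class() compares whole class tokens, so "lead" does not match "lead-in"
    lead_text = sel.xpath('//p[has-class("lead")]/text()').getall()  # ['alpha']


    def ends_with(context, value, suffix):
        """Hypothetical XPath extension: true if value ends with suffix."""
        # lxml passes node-set arguments (such as @href) as lists of strings
        if isinstance(value, list):
            value = value[0] if value else ""
        return str(value).endswith(suffix)


    # Register the hypothetical function under the name "ends-with"
    set_xpathfunc("ends-with", ends_with)
    pdf_links = sel.xpath('//a[ends-with(@href, ".pdf")]/@href').getall()  # ['/report.pdf']

    # Passing None removes the extension function again
    set_xpathfunc("ends-with", None)
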

1.2.0 (2017-05-17)
~~~~~~~~~~~~~~~~~~

* Add ``SelectorList.get`` and ``SelectorList.getall``
  methods as aliases for ``SelectorList.extract_first``
  and ``SelectorList.extract`` respectively
* Add default value parameter to ``SelectorList.re_first`` method
* Add ``Selector.re_first`` method
* Add ``replace_entities`` argument on ``.re()`` and ``.re_first()``
  to turn off replacing of character entity references
* Bug fix: detect ``None`` result from lxml parsing and fall back to an empty document
* Rearrange XML/HTML examples in the selectors usage docs
* Travis CI:

  * Test against Python 3.6
  * Test against PyPy using "Portable PyPy for Linux" distribution
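
A short sketch of ``re()``, ``re_first()`` with its ``default`` parameter, and
``Selector.re_first``; the price list in the HTML is invented for illustration.
When the pattern contains a group, the captured group is returned rather than
the whole match.

.. code-block:: python

    from parsel import Selector

    sel = Selector(text="<ul><li>Price: 10 EUR</li><li>Price: 25 EUR</li></ul>")
    prices = sel.css("li::text")

    all_prices = prices.re(r"(\d+) EUR")                     # ['10', '25']
    first_price = prices.re_first(r"(\d+) EUR")              # '10'
    no_match = prices.re_first(r"(\d+) USD", default="n/a")  # 'n/a'

    # Selector.re_first works on a single selector as well
    second_price = prices[1].re_first(r"(\d+) EUR")          # '25'
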

1.1.0 (2016-11-22)
~~~~~~~~~~~~~~~~~~

* Change default HTML parser to `lxml.html.HTMLParser <http://lxml.de/api/lxml.html.HTMLParser-class.html>`_,
  which makes it easier to use some HTML-specific features
* Add ``css2xpath`` function to translate CSS to XPath
* Add support for ad-hoc namespace declarations
* Add support for XPath variables
* Documentation improvements and updates
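
The sketch below shows the three query features from this release side by
side: ``css2xpath``, XPath variables bound from keyword arguments, and an
ad-hoc namespace prefix passed to a single ``.xpath()`` call. The documents,
the ``$section`` variable and the ``http://example.com/x`` namespace URI are
invented for illustration.

.. code-block:: python

    from parsel import Selector, css2xpath

    # css2xpath() translates a CSS selector into an equivalent XPath expression
    h1_xpath = css2xpath("h1")  # 'descendant-or-self::h1'

    page = Selector(text=(
        '<div id="images"><a href="a.jpg">A</a></div>'
        '<div id="misc"><a href="b.txt">B</a></div>'
    ))
    # XPath variables: $section is bound from the keyword argument
    image_links = page.xpath(
        "//div[@id=$section]/a/@href", section="images"
    ).getall()  # ['a.jpg']

    doc = Selector(
        text='<root xmlns:x="http://example.com/x"><x:item>1</x:item></root>',
        type="xml",
    )
    # Ad-hoc namespaces: the "ns" prefix is only declared for this query
    items = doc.xpath(
        "//ns:item/text()", namespaces={"ns": "http://example.com/x"}
    ).getall()  # ['1']
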

1.0.3 (2016-07-29)
~~~~~~~~~~~~~~~~~~

* Add BSD-3-Clause license file
* Re-enable PyPy tests
* Integrate py.test runs with setuptools (needed for Debian packaging)
* Changelog is now called ``NEWS``

1.0.2 (2016-04-26)
~~~~~~~~~~~~~~~~~~

* Fix bug in exception handling causing original traceback to be lost
* Added docstrings and other doc fixes

1.0.1 (2015-08-24)
~~~~~~~~~~~~~~~~~~

* Updated PyPI classifiers
* Added docstrings for csstranslator module and other doc fixes

1.0.0 (2015-08-22)
~~~~~~~~~~~~~~~~~~

* Documentation fixes

0.9.6 (2015-08-14)
~~~~~~~~~~~~~~~~~~

* Updated documentation
* Extended test coverage

0.9.5 (2015-08-11)
~~~~~~~~~~~~~~~~~~

* Support for extending ``SelectorList``

0.9.4 (2015-08-10)
~~~~~~~~~~~~~~~~~~

* Try workaround for travis-ci/dpl#253

0.9.3 (2015-08-07)
~~~~~~~~~~~~~~~~~~

* Add ``base_url`` argument

0.9.2 (2015-08-07)
~~~~~~~~~~~~~~~~~~

* Rename module ``unified`` -> ``selector`` and promote ``root`` attribute
* Add ``create_root_node`` function

0.9.1 (2015-08-04)
~~~~~~~~~~~~~~~~~~

* Setup Sphinx build and docs structure
* Build universal wheels
* Rename some leftovers from package extraction

0.9.0 (2015-07-30)
~~~~~~~~~~~~~~~~~~

* First release on PyPI.