- Metadata-Version: 2.1
- Name: parsel
- Version: 1.6.0
- Summary: Parsel is a library to extract data from HTML and XML using XPath and CSS selectors
- Home-page: https://github.com/scrapy/parsel
- Author: Scrapy project
- Author-email: info@scrapy.org
- License: BSD
- Keywords: parsel
- Platform: UNKNOWN
- Classifier: Development Status :: 5 - Production/Stable
- Classifier: Intended Audience :: Developers
- Classifier: License :: OSI Approved :: BSD License
- Classifier: Natural Language :: English
- Classifier: Topic :: Text Processing :: Markup
- Classifier: Topic :: Text Processing :: Markup :: HTML
- Classifier: Topic :: Text Processing :: Markup :: XML
- Classifier: Programming Language :: Python :: 2
- Classifier: Programming Language :: Python :: 2.7
- Classifier: Programming Language :: Python :: 3
- Classifier: Programming Language :: Python :: 3.5
- Classifier: Programming Language :: Python :: 3.6
- Classifier: Programming Language :: Python :: 3.7
- Classifier: Programming Language :: Python :: 3.8
- Classifier: Programming Language :: Python :: Implementation :: CPython
- Classifier: Programming Language :: Python :: Implementation :: PyPy
- Requires-Dist: w3lib (>=1.19.0)
- Requires-Dist: lxml
- Requires-Dist: six (>=1.6.0)
- Requires-Dist: cssselect (>=0.9)
- Requires-Dist: functools32 ; python_version<'3.0'
- ======
- Parsel
- ======
- .. image:: https://img.shields.io/travis/scrapy/parsel/master.svg
-    :target: https://travis-ci.org/scrapy/parsel
-    :alt: Build Status
- .. image:: https://img.shields.io/pypi/v/parsel.svg
-    :target: https://pypi.python.org/pypi/parsel
-    :alt: PyPI Version
- .. image:: https://img.shields.io/codecov/c/github/scrapy/parsel/master.svg
-    :target: http://codecov.io/github/scrapy/parsel?branch=master
-    :alt: Coverage report
- Parsel is a BSD-licensed Python_ library to extract and remove data from HTML_
- and XML_ using XPath_ and CSS_ selectors, optionally combined with
- `regular expressions`_.
- Find the Parsel online documentation at https://parsel.readthedocs.org.
- Example (`open online demo`_):
- .. code-block:: python
-     >>> from parsel import Selector
-     >>> selector = Selector(text=u"""<html>
-             <body>
-                 <h1>Hello, Parsel!</h1>
-                 <ul>
-                     <li><a href="http://example.com">Link 1</a></li>
-                     <li><a href="http://scrapy.org">Link 2</a></li>
-                 </ul>
-             </body>
-             </html>""")
-     >>> selector.css('h1::text').get()
-     'Hello, Parsel!'
-     >>> selector.xpath('//h1/text()').re(r'\w+')
-     ['Hello', 'Parsel']
-     >>> for li in selector.css('ul > li'):
-     ...     print(li.xpath('.//@href').get())
-     http://example.com
-     http://scrapy.org
- .. _CSS: https://en.wikipedia.org/wiki/Cascading_Style_Sheets
- .. _HTML: https://en.wikipedia.org/wiki/HTML
- .. _open online demo: https://colab.research.google.com/drive/149VFa6Px3wg7S3SEnUqk--TyBrKplxCN#forceEdit=true&sandboxMode=true
- .. _Python: https://www.python.org/
- .. _regular expressions: https://docs.python.org/library/re.html
- .. _XML: https://en.wikipedia.org/wiki/XML
- .. _XPath: https://en.wikipedia.org/wiki/XPath
- History
- -------
- 1.6.0 (2020-05-07)
- ~~~~~~~~~~~~~~~~~~
- * Python 3.4 is no longer supported
- * New ``Selector.remove()`` and ``SelectorList.remove()`` methods to remove
- selected elements from the parsed document tree
- * Improvements to error reporting, test coverage and documentation, and code
- cleanup
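As a rough illustration of what removing selected elements from a parsed tree means, here is a minimal sketch that uses the standard library's ``xml.etree.ElementTree`` as a stand-in. It is not Parsel's implementation; with Parsel itself the call would go through ``Selector.remove()`` or ``SelectorList.remove()``. The markup and the ``ad`` class name are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one <li> is marked for removal.
root = ET.fromstring(
    "<ul><li>keep</li><li class='ad'>drop</li><li>keep too</li></ul>"
)

# ElementTree has no parent pointers, so build a child -> parent map first.
parents = {child: parent for parent in root.iter() for child in parent}

# Select the unwanted nodes, then detach each one from its parent.
for node in root.findall(".//li[@class='ad']"):
    parents[node].remove(node)

remaining = [li.text for li in root.iter("li")]
print(remaining)
```

After the loop, later queries against ``root`` no longer see the removed element, which is the effect the changelog entry describes for the parsed document tree.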
- 1.5.2 (2019-08-09)
- ~~~~~~~~~~~~~~~~~~
- * ``Selector.remove_namespaces`` received a significant performance improvement
- * The value of ``data`` within the printable representation of a selector
- (``repr(selector)``) now ends in ``...`` when truncated, to make the
- truncation obvious.
- * Minor documentation improvements.
- 1.5.1 (2018-10-25)
- ~~~~~~~~~~~~~~~~~~
- * ``has-class`` XPath function handles newlines and other separators
- in class names properly;
- * fixed parsing of HTML documents with null bytes;
- * documentation improvements;
- * Python 3.7 tests are run on CI; other test improvements.
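The class-matching rule behind ``has-class`` can be sketched in plain Python: a class attribute is a list of whitespace-separated tokens, so a newline separates names just as a space does. The helper below is a hypothetical illustration of that rule, not the XPath function's actual code. It relies on ``str.split()``, which splits on any whitespace and is therefore slightly broader than HTML's own separator set.

```python
def has_class(class_attr, *names):
    """Return True if every name is one of the whitespace-separated tokens.

    Hypothetical helper for illustration; `class_attr` may be None when the
    element has no class attribute.
    """
    tokens = set((class_attr or "").split())
    return all(name in tokens for name in names)


print(has_class("foo bar", "foo"))           # single name, space separator
print(has_class("foo\nbar", "foo", "bar"))   # newline acts as a separator
print(has_class("foobar", "foo"))            # substring is not a match
```

Matching on whole tokens rather than substrings is what distinguishes this from a naive ``contains(@class, "foo")`` test.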
- 1.5.0 (2018-07-04)
- ~~~~~~~~~~~~~~~~~~
- * New ``Selector.attrib`` and ``SelectorList.attrib`` properties which make
- it easier to get attributes of HTML elements.
- * CSS selectors became faster: compilation results are cached
- (LRU cache is used for ``css2xpath``), so there is
- less overhead when the same CSS expression is used several times.
- * ``.get()`` and ``.getall()`` selector methods are documented and recommended
- over ``.extract_first()`` and ``.extract()``.
- * Various documentation tweaks and improvements.
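The caching idea mentioned for ``css2xpath`` can be sketched with ``functools.lru_cache``. The ``translate`` function below is a toy stand-in that only understands tag names and the ``>`` combinator, and the cache size is an arbitrary assumption. It is meant to show the pattern, not Parsel's translator.

```python
from functools import lru_cache

calls = []  # records every real (uncached) translation


@lru_cache(maxsize=256)  # cache size chosen arbitrarily for the sketch
def translate(css):
    """Toy CSS-to-XPath translation: tag names and '>' only."""
    calls.append(css)
    return "descendant-or-self::" + css.replace(" > ", "/")


first = translate("ul > li")   # computed and stored in the cache
second = translate("ul > li")  # served from the cache
info = translate.cache_info()
print(first, info.hits, info.misses)
```

Repeating the same expression costs a dictionary lookup instead of a full translation, which is where the reduced overhead comes from.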
- One more change is that ``.extract()`` and ``.extract_first()`` methods
- are now implemented using ``.get()`` and ``.getall()``, not the other
- way around, and instead of calling ``Selector.extract`` all other methods
- now call ``Selector.get`` internally. It can be **backwards incompatible**
- in case of custom Selector subclasses which override ``Selector.extract``
- without doing the same for ``Selector.get``. If you have such a Selector
- subclass, make sure the ``get`` method is also overridden. For example, this::
-     class MySelector(parsel.Selector):
-         def extract(self):
-             return super().extract() + " foo"
- should be changed to this::
-     class MySelector(parsel.Selector):
-         def get(self):
-             return super().get() + " foo"
-         extract = get
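To see why overriding only ``extract`` stops working, consider this self-contained sketch. ``Base`` is a toy class, not ``parsel.Selector``, and ``describe`` is a hypothetical method standing in for any other method that calls ``get`` internally.

```python
class Base:
    """Toy stand-in for a selector whose other methods call get()."""

    def __init__(self, value):
        self.value = value

    def get(self):
        return self.value

    def extract(self):
        # extract() is implemented on top of get(), not the other way around.
        return self.get()

    def describe(self):
        # Hypothetical internal caller: it uses get(), never extract().
        return "<" + self.get() + ">"


class OnlyExtract(Base):
    def extract(self):
        return super().extract() + " foo"


class GetAndExtract(Base):
    def get(self):
        return super().get() + " foo"

    extract = get


print(OnlyExtract("x").describe())    # the extract() override is bypassed
print(GetAndExtract("x").describe())  # the get() override is picked up
```

In the first subclass the customization is silently skipped by every internal caller; in the second, both entry points share the same behavior.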
- 1.4.0 (2018-02-08)
- ~~~~~~~~~~~~~~~~~~
- * ``Selector`` and ``SelectorList`` can't be pickled because
- pickling/unpickling doesn't work for ``lxml.html.HtmlElement``;
- parsel now raises TypeError explicitly instead of allowing pickle to
- silently produce wrong output. This is technically backwards-incompatible
- if you're using Python < 3.6.
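The general pattern of refusing to pickle can be sketched as follows. ``Unpicklable`` is a hypothetical class, not Parsel's code: it raises ``TypeError`` from ``__getstate__`` so that ``pickle`` fails loudly. A common workaround, assumed here, is to pickle the source text and rebuild the object afterwards.

```python
import pickle


class Unpicklable:
    """Toy object that wraps state which cannot be serialized safely."""

    def __init__(self, text):
        self.text = text

    def __getstate__(self):
        # Fail explicitly rather than let pickle emit a broken payload.
        raise TypeError("can't pickle %s objects" % type(self).__name__)


obj = Unpicklable("<p>hi</p>")

try:
    pickle.dumps(obj)
except TypeError as exc:
    error = str(exc)
else:
    error = None

# Workaround: serialize the plain text and reconstruct the object later.
restored = Unpicklable(pickle.loads(pickle.dumps(obj.text)))
print(error, restored.text)
```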
- 1.3.1 (2017-12-28)
- ~~~~~~~~~~~~~~~~~~
- * Fix artifact uploads to pypi.
- 1.3.0 (2017-12-28)
- ~~~~~~~~~~~~~~~~~~
- * ``has-class`` XPath extension function;
- * ``parsel.xpathfuncs.set_xpathfunc`` is a simplified way to register
- XPath extensions;
- * ``Selector.remove_namespaces`` now removes namespace declarations;
- * Python 3.3 support is dropped;
- * ``make htmlview`` command for easier Parsel docs development.
- * CI: PyPy installation is fixed; parsel now runs tests for PyPy3 as well.
- 1.2.0 (2017-05-17)
- ~~~~~~~~~~~~~~~~~~
- * Add ``SelectorList.get`` and ``SelectorList.getall``
- methods as aliases for ``SelectorList.extract_first``
- and ``SelectorList.extract`` respectively
- * Add default value parameter to ``SelectorList.re_first`` method
- * Add ``Selector.re_first`` method
- * Add ``replace_entities`` argument on ``.re()`` and ``.re_first()``
- to turn off replacing of character entity references
- * Bug fix: detect ``None`` result from lxml parsing and fall back to an empty document
- * Rearrange XML/HTML examples in the selectors usage docs
- * Travis CI:
-   * Test against Python 3.6
-   * Test against PyPy using "Portable PyPy for Linux" distribution
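The "first match or default" behavior described for ``re_first`` can be sketched with the standard ``re`` module. The function below is a simplified, hypothetical stand-in that works on a plain string and returns the whole match; it does not reproduce Parsel's handling of groups or entity replacement.

```python
import re


def re_first(pattern, text, default=None):
    """Return the first match of `pattern` in `text`, or `default` if none."""
    match = re.search(pattern, text)
    return match.group(0) if match else default


print(re_first(r"\d+", "Price: 42 EUR"))
print(re_first(r"\d+", "no digits here", default="n/a"))
```

A default value spares callers the ``None`` check when a missing match has an obvious fallback.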
- 1.1.0 (2016-11-22)
- ~~~~~~~~~~~~~~~~~~
- * Change default HTML parser to `lxml.html.HTMLParser <http://lxml.de/api/lxml.html.HTMLParser-class.html>`_,
- which makes it easier to use some HTML-specific features
- * Add css2xpath function to translate CSS to XPath
- * Add support for ad-hoc namespace declarations
- * Add support for XPath variables
- * Documentation improvements and updates
- 1.0.3 (2016-07-29)
- ~~~~~~~~~~~~~~~~~~
- * Add BSD-3-Clause license file
- * Re-enable PyPy tests
- * Integrate py.test runs with setuptools (needed for Debian packaging)
- * Changelog is now called ``NEWS``
- 1.0.2 (2016-04-26)
- ~~~~~~~~~~~~~~~~~~
- * Fix bug in exception handling causing original traceback to be lost
- * Added docstrings and other doc fixes
- 1.0.1 (2015-08-24)
- ~~~~~~~~~~~~~~~~~~
- * Updated PyPI classifiers
- * Added docstrings for csstranslator module and other doc fixes
- 1.0.0 (2015-08-22)
- ~~~~~~~~~~~~~~~~~~
- * Documentation fixes
- 0.9.6 (2015-08-14)
- ~~~~~~~~~~~~~~~~~~
- * Updated documentation
- * Extended test coverage
- 0.9.5 (2015-08-11)
- ~~~~~~~~~~~~~~~~~~
- * Support for extending SelectorList
- 0.9.4 (2015-08-10)
- ~~~~~~~~~~~~~~~~~~
- * Try workaround for travis-ci/dpl#253
- 0.9.3 (2015-08-07)
- ~~~~~~~~~~~~~~~~~~
- * Add base_url argument
- 0.9.2 (2015-08-07)
- ~~~~~~~~~~~~~~~~~~
- * Rename module unified -> selector and promote root attribute
- * Add create_root_node function
- 0.9.1 (2015-08-04)
- ~~~~~~~~~~~~~~~~~~
- * Setup Sphinx build and docs structure
- * Build universal wheels
- * Rename some leftovers from package extraction
- 0.9.0 (2015-07-30)
- ~~~~~~~~~~~~~~~~~~
- * First release on PyPI.