dayuan
/
manyi


			
							123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287
							Metadata-Version: 2.1
Name: parsel
Version: 1.6.0
Summary: Parsel is a library to extract data from HTML and XML using XPath and CSS selectors
Home-page: https://github.com/scrapy/parsel
Author: Scrapy project
Author-email: info@scrapy.org
License: BSD
Keywords: parsel
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Natural Language :: English
Classifier: Topic :: Text Processing :: Markup
Classifier: Topic :: Text Processing :: Markup :: HTML
Classifier: Topic :: Text Processing :: Markup :: XML
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Dist: w3lib (>=1.19.0)
Requires-Dist: lxml
Requires-Dist: six (>=1.6.0)
Requires-Dist: cssselect (>=0.9)
Requires-Dist: functools32 ; python_version<'3.0'

======
Parsel
======

.. image:: https://img.shields.io/travis/scrapy/parsel/master.svg
   :target: https://travis-ci.org/scrapy/parsel
   :alt: Build Status

.. image:: https://img.shields.io/pypi/v/parsel.svg
   :target: https://pypi.python.org/pypi/parsel
   :alt: PyPI Version

.. image:: https://img.shields.io/codecov/c/github/scrapy/parsel/master.svg
   :target: http://codecov.io/github/scrapy/parsel?branch=master
   :alt: Coverage report


Parsel is a BSD-licensed Python_ library to extract and remove data from HTML_
and XML_ using XPath_ and CSS_ selectors, optionally combined with
`regular expressions`_.

Find the Parsel online documentation at https://parsel.readthedocs.org.

Example (`open online demo`_):

.. code-block:: python

    >>> from parsel import Selector
    >>> selector = Selector(text=u"""<html>
            <body>
                <h1>Hello, Parsel!</h1>
                <ul>
                    <li><a href="http://example.com">Link 1</a></li>
                    <li><a href="http://scrapy.org">Link 2</a></li>
                </ul>
            </body>
            </html>""")
    >>> selector.css('h1::text').get()
    'Hello, Parsel!'
    >>> selector.xpath('//h1/text()').re(r'\w+')
    ['Hello', 'Parsel']
    >>> for li in selector.css('ul > li'):
    ...     print(li.xpath('.//@href').get())
    http://example.com
    http://scrapy.org


.. _CSS: https://en.wikipedia.org/wiki/Cascading_Style_Sheets
.. _HTML: https://en.wikipedia.org/wiki/HTML
.. _open online demo: https://colab.research.google.com/drive/149VFa6Px3wg7S3SEnUqk--TyBrKplxCN#forceEdit=true&sandboxMode=true
.. _Python: https://www.python.org/
.. _regular expressions: https://docs.python.org/library/re.html
.. _XML: https://en.wikipedia.org/wiki/XML
.. _XPath: https://en.wikipedia.org/wiki/XPath


History
-------

1.6.0 (2020-05-07)
~~~~~~~~~~~~~~~~~~

* Python 3.4 is no longer supported
* New ``Selector.remove()`` and ``SelectorList.remove()`` methods to remove
  selected elements from the parsed document tree
* Improvements to error reporting, test coverage and documentation, and code
  cleanup


1.5.2 (2019-08-09)
~~~~~~~~~~~~~~~~~~

* ``Selector.remove_namespaces`` received a significant performance improvement
* The value of ``data`` within the printable representation of a selector
  (``repr(selector)``) now ends in ``...`` when truncated, to make the
  truncation obvious.
* Minor documentation improvements.


1.5.1 (2018-10-25)
~~~~~~~~~~~~~~~~~~

* ``has-class`` XPath function handles newlines and other separators
  in class names properly;
* fixed parsing of HTML documents with null bytes;
* documentation improvements;
* Python 3.7 tests are run on CI; other test improvements.


1.5.0 (2018-07-04)
~~~~~~~~~~~~~~~~~~

* New ``Selector.attrib`` and ``SelectorList.attrib`` properties which make
  it easier to get attributes of HTML elements.
* CSS selectors became faster: compilation results are cached
  (LRU cache is used for ``css2xpath``), so there is
  less overhead when the same CSS expression is used several times.
* ``.get()`` and ``.getall()`` selector methods are documented and recommended
  over ``.extract_first()`` and ``.extract()``.
* Various documentation tweaks and improvements.

One more change is that ``.extract()`` and  ``.extract_first()`` methods
are now implemented using ``.get()`` and ``.getall()``, not the other
way around, and instead of calling ``Selector.extract`` all other methods
now call ``Selector.get`` internally. It can be **backwards incompatible**
in case of custom Selector subclasses which override ``Selector.extract``
without doing the same for ``Selector.get``. If you have such Selector
subclass, make sure ``get`` method is also overridden. For example, this::

    class MySelector(parsel.Selector):
        def extract(self):
            return super().extract() + " foo"

should be changed to this::

    class MySelector(parsel.Selector):
        def get(self):
            return super().get() + " foo"
        extract = get


1.4.0 (2018-02-08)
~~~~~~~~~~~~~~~~~~

* ``Selector`` and ``SelectorList`` can't be pickled because
  pickling/unpickling doesn't work for ``lxml.html.HtmlElement``;
  parsel now raises TypeError explicitly instead of allowing pickle to
  silently produce wrong output. This is technically backwards-incompatible
  if you're using Python < 3.6.


1.3.1 (2017-12-28)
~~~~~~~~~~~~~~~~~~

* Fix artifact uploads to pypi.


1.3.0 (2017-12-28)
~~~~~~~~~~~~~~~~~~

* ``has-class`` XPath extension function;
* ``parsel.xpathfuncs.set_xpathfunc`` is a simplified way to register
  XPath extensions;
* ``Selector.remove_namespaces`` now removes namespace declarations;
* Python 3.3 support is dropped;
* ``make htmlview`` command for easier Parsel docs development.
* CI: PyPy installation is fixed; parsel now runs tests for PyPy3 as well.


1.2.0 (2017-05-17)
~~~~~~~~~~~~~~~~~~

* Add ``SelectorList.get`` and ``SelectorList.getall``
  methods as aliases for ``SelectorList.extract_first``
  and ``SelectorList.extract`` respectively
* Add default value parameter to ``SelectorList.re_first`` method
* Add ``Selector.re_first`` method
* Add ``replace_entities`` argument on ``.re()`` and ``.re_first()``
  to turn off replacing of character entity references
* Bug fix: detect ``None`` result from lxml parsing and fallback with an empty document
* Rearrange XML/HTML examples in the selectors usage docs
* Travis CI:

  * Test against Python 3.6
  * Test against PyPy using "Portable PyPy for Linux" distribution


1.1.0 (2016-11-22)
~~~~~~~~~~~~~~~~~~

* Change default HTML parser to `lxml.html.HTMLParser <http://lxml.de/api/lxml.html.HTMLParser-class.html>`_,
  which makes easier to use some HTML specific features
* Add css2xpath function to translate CSS to XPath
* Add support for ad-hoc namespaces declarations
* Add support for XPath variables
* Documentation improvements and updates


1.0.3 (2016-07-29)
~~~~~~~~~~~~~~~~~~

* Add BSD-3-Clause license file
* Re-enable PyPy tests
* Integrate py.test runs with setuptools (needed for Debian packaging)
* Changelog is now called ``NEWS``


1.0.2 (2016-04-26)
~~~~~~~~~~~~~~~~~~

* Fix bug in exception handling causing original traceback to be lost
* Added docstrings and other doc fixes


1.0.1 (2015-08-24)
~~~~~~~~~~~~~~~~~~

* Updated PyPI classifiers
* Added docstrings for csstranslator module and other doc fixes


1.0.0 (2015-08-22)
~~~~~~~~~~~~~~~~~~

* Documentation fixes


0.9.6 (2015-08-14)
~~~~~~~~~~~~~~~~~~

* Updated documentation
* Extended test coverage


0.9.5 (2015-08-11)
~~~~~~~~~~~~~~~~~~

* Support for extending SelectorList


0.9.4 (2015-08-10)
~~~~~~~~~~~~~~~~~~

* Try workaround for travis-ci/dpl#253


0.9.3 (2015-08-07)
~~~~~~~~~~~~~~~~~~

* Add base_url argument


0.9.2 (2015-08-07)
~~~~~~~~~~~~~~~~~~

* Rename module unified -> selector and promoted root attribute
* Add create_root_node function


0.9.1 (2015-08-04)
~~~~~~~~~~~~~~~~~~

* Setup Sphinx build and docs structure
* Build universal wheels
* Rename some leftovers from package extraction


0.9.0 (2015-07-30)
~~~~~~~~~~~~~~~~~~

* First release on PyPI.