lxml.cssselect module
CSS Selectors based on XPath.
This module supports selecting XML/HTML tags based on CSS selectors. See the CSSSelector class for details.
This is a thin wrapper around cssselect 0.7 or later.
- class lxml.cssselect.CSSSelector(css, namespaces=None, translator='xml')
Bases: XPath
A CSS selector.
Usage:
>>> from lxml import etree, cssselect
>>> select = cssselect.CSSSelector("a tag > child")
>>> root = etree.XML("<a><b><c/><tag><child>TEXT</child></tag></b></a>")
>>> [ el.tag for el in select(root) ]
['child']
To use CSS namespaces, pass a prefix-to-namespace mapping as the namespaces keyword argument:
>>> rdfns = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
>>> select_ns = cssselect.CSSSelector('root > rdf|Description',
...                                   namespaces={'rdf': rdfns})
>>> rdf = etree.XML((
...     '<root xmlns:rdf="%s">'
...     '<rdf:Description>blah</rdf:Description>'
...     '</root>') % rdfns)
>>> [(el.tag, el.text) for el in select_ns(rdf)]
[('{http://www.w3.org/1999/02/22-rdf-syntax-ns#}Description', 'blah')]
- evaluate(self, _eval_arg, **_variables)
Evaluate an XPath expression.
Instead of calling this method, you can also call the evaluator object itself.
Variables may be provided as keyword arguments. Note that namespaces are currently not supported for variables.
- Deprecated: call the object, not its method.
- error_log
- path
The literal XPath expression.
- class lxml.cssselect.LxmlHTMLTranslator(xhtml=False)
Bases: LxmlTranslator, HTMLTranslator
lxml extensions + HTML support.
- xpathexpr_cls
alias of XPathExpr
- css_to_xpath(css, prefix='descendant-or-self::')
Translate a group of selectors to XPath.
Pseudo-elements are not supported here since XPath only knows about “real” elements.
- Parameters:
css – A group of selectors as a Unicode string.
prefix – This string is prepended to the XPath expression for each selector. The default makes selectors scoped to the context node’s subtree.
- Raises:
SelectorSyntaxError on invalid selectors, ExpressionError on unknown/unsupported selectors, including pseudo-elements.
- Returns:
The equivalent XPath 1.0 expression as a Unicode string.
- pseudo_never_matches(xpath)
Common implementation for pseudo-classes that never match.
- selector_to_xpath(selector, prefix='descendant-or-self::', translate_pseudo_elements=False)
Translate a parsed selector to XPath.
- Parameters:
selector – A parsed Selector object.
prefix – This string is prepended to the resulting XPath expression. The default makes selectors scoped to the context node’s subtree.
translate_pseudo_elements – Unless this is set to True (as css_to_xpath() does), the pseudo_element attribute of the selector is ignored. It is the caller’s responsibility to reject selectors with pseudo-elements, or to account for them somehow.
- Raises:
ExpressionError on unknown/unsupported selectors.
- Returns:
The equivalent XPath 1.0 expression as a Unicode string.
- xpath(parsed_selector)
Translate any parsed selector object.
- xpath_active_pseudo(xpath)
Common implementation for pseudo-classes that never match.
- xpath_attrib(selector)
Translate an attribute selector.
- xpath_attrib_dashmatch(xpath, name, value)
- xpath_attrib_different(xpath, name, value)
- xpath_attrib_equals(xpath, name, value)
- xpath_attrib_exists(xpath, name, value)
- xpath_attrib_includes(xpath, name, value)
- xpath_attrib_prefixmatch(xpath, name, value)
- xpath_attrib_substringmatch(xpath, name, value)
- xpath_attrib_suffixmatch(xpath, name, value)
- xpath_checked_pseudo(xpath)
Common implementation for pseudo-classes that never match.
- xpath_child_combinator(left, right)
right is an immediate child of left
- xpath_class(class_selector)
Translate a class selector.
- xpath_combinedselector(combined)
Translate a combined selector.
- xpath_contains_function(xpath, function)
- xpath_descendant_combinator(left, right)
right is a child, grand-child or further descendant of left
- xpath_direct_adjacent_combinator(left, right)
right is a sibling immediately after left
- xpath_disabled_pseudo(xpath)
Common implementation for pseudo-classes that never match.
- xpath_element(selector)
Translate a type or universal selector.
- xpath_empty_pseudo(xpath)
- xpath_enabled_pseudo(xpath)
Common implementation for pseudo-classes that never match.
- xpath_first_child_pseudo(xpath)
- xpath_first_of_type_pseudo(xpath)
- xpath_focus_pseudo(xpath)
Common implementation for pseudo-classes that never match.
- xpath_function(function)
Translate a functional pseudo-class.
- xpath_hash(id_selector)
Translate an ID selector.
- xpath_hover_pseudo(xpath)
Common implementation for pseudo-classes that never match.
- xpath_indirect_adjacent_combinator(left, right)
right is a sibling after left, immediately or not
- xpath_lang_function(xpath, function)
- xpath_last_child_pseudo(xpath)
- xpath_last_of_type_pseudo(xpath)
- xpath_link_pseudo(xpath)
Common implementation for pseudo-classes that never match.
- static xpath_literal(s)
- xpath_negation(negation)
- xpath_nth_child_function(xpath, function, last=False, add_name_test=True)
- xpath_nth_last_child_function(xpath, function)
- xpath_nth_last_of_type_function(xpath, function)
- xpath_nth_of_type_function(xpath, function)
- xpath_only_child_pseudo(xpath)
- xpath_only_of_type_pseudo(xpath)
- xpath_pseudo(pseudo)
Translate a pseudo-class.
- xpath_pseudo_element(xpath, pseudo_element)
Translate a pseudo-element.
Defaults to not supporting pseudo-elements at all, but can be overridden by sub-classes.
- xpath_root_pseudo(xpath)
- xpath_scope_pseudo(xpath)
- xpath_target_pseudo(xpath)
Common implementation for pseudo-classes that never match.
- xpath_visited_pseudo(xpath)
Common implementation for pseudo-classes that never match.
- attribute_operator_mapping = {'!=': 'different', '$=': 'suffixmatch', '*=': 'substringmatch', '=': 'equals', '^=': 'prefixmatch', 'exists': 'exists', '|=': 'dashmatch', '~=': 'includes'}
- combinator_mapping = {' ': 'descendant', '+': 'direct_adjacent', '>': 'child', '~': 'indirect_adjacent'}
- id_attribute = 'id'
The attribute used for ID selectors depends on the document language: http://www.w3.org/TR/selectors/#id-selectors
- lang_attribute = 'lang'
The attribute used for :lang() depends on the document language: http://www.w3.org/TR/selectors/#lang-pseudo
- lower_case_attribute_names = False
- lower_case_attribute_values = False
- lower_case_element_names = False
The case sensitivity of document language element names, attribute names, and attribute values in selectors depends on the document language. http://www.w3.org/TR/selectors/#casesens
When a document language defines one of these as case-insensitive, cssselect assumes that the document parser makes the parsed values lower-case. Making the selector lower-case too makes the comparison case-insensitive.
In HTML, element names and attribute names (but not attribute values) are case-insensitive. All of lxml.html, html5lib, BeautifulSoup4 and HTMLParser make them lower-case in their parse result, so the assumption holds.
- class lxml.cssselect.LxmlTranslator
Bases: GenericTranslator
A custom CSS selector to XPath translator with lxml-specific extensions.
- xpathexpr_cls
alias of XPathExpr
- css_to_xpath(css, prefix='descendant-or-self::')
Translate a group of selectors to XPath.
Pseudo-elements are not supported here since XPath only knows about “real” elements.
- Parameters:
css – A group of selectors as a Unicode string.
prefix – This string is prepended to the XPath expression for each selector. The default makes selectors scoped to the context node’s subtree.
- Raises:
SelectorSyntaxError on invalid selectors, ExpressionError on unknown/unsupported selectors, including pseudo-elements.
- Returns:
The equivalent XPath 1.0 expression as a Unicode string.
- pseudo_never_matches(xpath)
Common implementation for pseudo-classes that never match.
- selector_to_xpath(selector, prefix='descendant-or-self::', translate_pseudo_elements=False)
Translate a parsed selector to XPath.
- Parameters:
selector – A parsed Selector object.
prefix – This string is prepended to the resulting XPath expression. The default makes selectors scoped to the context node’s subtree.
translate_pseudo_elements – Unless this is set to True (as css_to_xpath() does), the pseudo_element attribute of the selector is ignored. It is the caller’s responsibility to reject selectors with pseudo-elements, or to account for them somehow.
- Raises:
ExpressionError on unknown/unsupported selectors.
- Returns:
The equivalent XPath 1.0 expression as a Unicode string.
- xpath(parsed_selector)
Translate any parsed selector object.
- xpath_active_pseudo(xpath)
Common implementation for pseudo-classes that never match.
- xpath_attrib(selector)
Translate an attribute selector.
- xpath_attrib_dashmatch(xpath, name, value)
- xpath_attrib_different(xpath, name, value)
- xpath_attrib_equals(xpath, name, value)
- xpath_attrib_exists(xpath, name, value)
- xpath_attrib_includes(xpath, name, value)
- xpath_attrib_prefixmatch(xpath, name, value)
- xpath_attrib_substringmatch(xpath, name, value)
- xpath_attrib_suffixmatch(xpath, name, value)
- xpath_checked_pseudo(xpath)
Common implementation for pseudo-classes that never match.
- xpath_child_combinator(left, right)
right is an immediate child of left
- xpath_class(class_selector)
Translate a class selector.
- xpath_combinedselector(combined)
Translate a combined selector.
- xpath_descendant_combinator(left, right)
right is a child, grand-child or further descendant of left
- xpath_direct_adjacent_combinator(left, right)
right is a sibling immediately after left
- xpath_disabled_pseudo(xpath)
Common implementation for pseudo-classes that never match.
- xpath_element(selector)
Translate a type or universal selector.
- xpath_empty_pseudo(xpath)
- xpath_enabled_pseudo(xpath)
Common implementation for pseudo-classes that never match.
- xpath_first_child_pseudo(xpath)
- xpath_first_of_type_pseudo(xpath)
- xpath_focus_pseudo(xpath)
Common implementation for pseudo-classes that never match.
- xpath_function(function)
Translate a functional pseudo-class.
- xpath_hash(id_selector)
Translate an ID selector.
- xpath_hover_pseudo(xpath)
Common implementation for pseudo-classes that never match.
- xpath_indirect_adjacent_combinator(left, right)
right is a sibling after left, immediately or not
- xpath_lang_function(xpath, function)
- xpath_last_child_pseudo(xpath)
- xpath_last_of_type_pseudo(xpath)
- xpath_link_pseudo(xpath)
Common implementation for pseudo-classes that never match.
- static xpath_literal(s)
- xpath_negation(negation)
- xpath_nth_child_function(xpath, function, last=False, add_name_test=True)
- xpath_nth_last_child_function(xpath, function)
- xpath_nth_last_of_type_function(xpath, function)
- xpath_nth_of_type_function(xpath, function)
- xpath_only_child_pseudo(xpath)
- xpath_only_of_type_pseudo(xpath)
- xpath_pseudo(pseudo)
Translate a pseudo-class.
- xpath_pseudo_element(xpath, pseudo_element)
Translate a pseudo-element.
Defaults to not supporting pseudo-elements at all, but can be overridden by sub-classes.
- xpath_root_pseudo(xpath)
- xpath_scope_pseudo(xpath)
- xpath_target_pseudo(xpath)
Common implementation for pseudo-classes that never match.
- xpath_visited_pseudo(xpath)
Common implementation for pseudo-classes that never match.
- attribute_operator_mapping = {'!=': 'different', '$=': 'suffixmatch', '*=': 'substringmatch', '=': 'equals', '^=': 'prefixmatch', 'exists': 'exists', '|=': 'dashmatch', '~=': 'includes'}
- combinator_mapping = {' ': 'descendant', '+': 'direct_adjacent', '>': 'child', '~': 'indirect_adjacent'}
- id_attribute = 'id'
The attribute used for ID selectors depends on the document language: http://www.w3.org/TR/selectors/#id-selectors
- lang_attribute = 'xml:lang'
The attribute used for :lang() depends on the document language: http://www.w3.org/TR/selectors/#lang-pseudo
- lower_case_attribute_names = False
- lower_case_attribute_values = False
- lower_case_element_names = False
The case sensitivity of document language element names, attribute names, and attribute values in selectors depends on the document language. http://www.w3.org/TR/selectors/#casesens
When a document language defines one of these as case-insensitive, cssselect assumes that the document parser makes the parsed values lower-case. Making the selector lower-case too makes the comparison case-insensitive.
In HTML, element names and attribute names (but not attribute values) are case-insensitive. All of lxml.html, html5lib, BeautifulSoup4 and HTMLParser make them lower-case in their parse result, so the assumption holds.