Parsing HTML5 data-* attribute values with Selenium in Python
Solution 1
If you have elements like the following:
<rect rx="3" ry="3" width="76%" height="40" transform="translate(0,40)" data-value="75" class="bar">bar1</rect>
<rect rx="3" ry="3" width="76%" height="40" transform="translate(0,40)" data-value="76" class="bar">bar2</rect>
You can get the text value and the attribute value as follows:
elements = driver.find_elements_by_class_name('bar')
for element in elements:
    print(element.text)
    print(element.get_attribute('data-value'))
This prints out:
bar1
75
bar2
76
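On Selenium 4.3 and later, the find_elements_by_* helpers no longer exist, so the loop above needs the generic find_elements call instead. The following is a minimal sketch, assuming a live driver is passed in. The helper name read_bars and the CLASS_NAME constant are made up for illustration; real code would normally import By from selenium.webdriver.common.by and pass By.CLASS_NAME.

```python
# "class name" is the string behind selenium.webdriver.common.by.By.CLASS_NAME.
# It is spelled out here only so the sketch has no import-time dependency.
CLASS_NAME = "class name"


def read_bars(driver, class_name="bar", attribute="data-value"):
    """Return a list of (text, attribute value) pairs for matching elements."""
    pairs = []
    for element in driver.find_elements(CLASS_NAME, class_name):
        # get_attribute returns None when the element lacks the attribute.
        pairs.append((element.text, element.get_attribute(attribute)))
    return pairs


# Usage (needs a running browser session, e.g. driver = webdriver.Firefox()):
#     for text, value in read_bars(driver):
#         print(text, value)
```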
Solution 2
You mention you tried:
for text in driver.find_elements_by_class_name('bar'):
    print(data_value.text)
Since data_value is not defined anywhere, this won't work. If you did print(text.text), you should get the text of each element that has the bar class. (This is essentially what you do in your first snippet.)
You also mention this:
for data in driver.find_elements_by_xpath('//*[contains(@data-value)]/@data-value'):
    print(data.text)
This cannot work because Selenium's find_element(s)... functions cannot return anything other than elements or lists of elements. You are trying to get it to return an attribute, which won't work. XPath generally allows it, but when you use XPath through Selenium you cannot get anything other than elements.
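If you want to keep an XPath-based lookup, one option is to let the XPath select the elements that carry the attribute and then read the attribute in Python. Note also that XPath's contains() takes two arguments, so contains(@data-value) is an error on its own; a bare [@data-value] predicate is enough to test for the attribute. A minimal sketch, where the helper name data_values and the XPATH constant are hypothetical (real code would normally pass By.XPATH):

```python
# "xpath" is the string behind selenium.webdriver.common.by.By.XPATH.
XPATH = "xpath"


def data_values(driver, attribute="data-value"):
    """Return the attribute value of every element that has the attribute."""
    # //*[@data-value] selects elements, which is what Selenium can return;
    # the attribute itself is then read with get_attribute.
    xpath = "//*[@{}]".format(attribute)
    return [el.get_attribute(attribute) for el in driver.find_elements(XPATH, xpath)]


# Usage (needs a running browser session):
#     for value in data_values(driver):
#         print(value)
```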
You could do what Jessamyn Smith suggested or:
results = driver.execute_script("""
var els = document.getElementsByClassName("bar");
var ret = [];
for (var i = 0, el; (el = els[i]); ++i) {
    ret.push([el.textContent, el.attributes["data-value"].value]);
}
return ret;
""")
for r in results:
    print(r[0], r[1])
This takes a single round-trip between your script and the browser, whereas looping and calling .text and .get_attribute() involves two round-trips per iteration. The JavaScript builds a list of pairs of results. Each pair contains the text of the element in the first position, and the value of data-value in the second position.
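The same single round-trip idea can be made reusable by passing the class name and the attribute name into the script through the extra arguments of execute_script, which the browser exposes as arguments[0], arguments[1], and so on. This is only a sketch: fetch_pairs and SCRIPT are hypothetical names, and it uses getAttribute so that elements without the attribute yield None instead of raising a JavaScript error.

```python
SCRIPT = """
var els = document.getElementsByClassName(arguments[0]);
var ret = [];
for (var i = 0; i < els.length; ++i) {
    ret.push([els[i].textContent, els[i].getAttribute(arguments[1])]);
}
return ret;
"""


def fetch_pairs(driver, class_name, attribute):
    """Collect (text, attribute value) pairs with one execute_script call."""
    rows = driver.execute_script(SCRIPT, class_name, attribute)
    # Selenium returns nested lists; convert each row to a tuple.
    return [(text, value) for text, value in rows]


# Usage (needs a running browser session):
#     for text, value in fetch_pairs(driver, "bar", "data-value"):
#         print(text, value)
```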
metersk
Updated on July 01, 2022
Comments
-
metersk almost 2 years
I am parsing a JS generated webpage like so:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get('https://www.consumerbarometer.com/en/graph-builder/?question=M1&filter=country:singapore,canada,mexico,brazil,argentina,united_states,bulgaria,austria,belgium,croatia,czech_republic,denmark,estonia,finland,france,germany,greece,hungary,italy,ireland,latvia,lithuania,norway,netherlands,poland,portugal,russia,romania,serbia,slovakia,spain,slovenia,sweden,switzerland,ukraine,united_kingdom,australia,china,israel,hong_kong_sar,japan,korea,new_zealand,malaysia,taiwan,turkey,vietnam')

# wait for svg to appear
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.TAG_NAME, 'svg')))

for text in driver.find_elements_by_class_name('bar-text-label'):
    print(text.text)

driver.close()
Besides getting the text from the class bar-text-label, I would also like to get values from an HTML5 data attribute. For example, given <rect rx="3" ry="3" width="76%" height="40" transform="translate(0,40)" data-value="76" class="bar"></rect>, I would like to be able to parse 76 from it. Is this possible to do in Selenium?
I tried both of the below, with no success:
for text in driver.find_elements_by_class_name('bar'):
    print(data_value.text)

for data in driver.find_elements_by_xpath('//*[contains(@data-value)]/@data-value'):
    print(data.text)
-
metersk about 9 years
This is very interesting. I did not know you could execute JS like that.
-
Louis about 9 years
I did not know either at first. If you run everything locally, the difference is not great, but if you use Sauce Labs, BrowserStack or something similar to run tests remotely, the round-trips add up a lot. I've cut the time it takes to run large test suites in half by combining multiple Selenium calls into a single execute_script (or execute_async_script) call.