Selenium find all elements by XPath


To grab all the prices on that page, use an XPath like this:

Header = driver.find_elements_by_xpath("//span[contains(concat(' ', normalize-space(@class), ' '), ' price-amount ')]")

which means: find all span elements whose class attribute contains the token price-amount. The wrapping with concat and normalize-space is needed because @class is a space-separated list of tokens, so a plain @class='price-amount' comparison would miss spans that also carry other classes.

The same elements can be found more simply with a CSS locator:

.price-amount
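Why the token-wise match matters can be sketched offline with the standard library's html.parser; the sample markup and class names below are made up for illustration. A naive @class='price-amount' comparison would miss the second span, while the token match finds it and still rejects price-amount-old:

```python
from html.parser import HTMLParser

# Made-up markup for illustration: one plain class, one multi-class span
# with messy whitespace, and one class that merely *contains* the substring.
SAMPLE = (
    '<span class="price-amount">12,00 €</span>'
    '<span class=" sale  price-amount ">9,50 €</span>'
    '<span class="price-amount-old">20,00 €</span>'
)

class PriceParser(HTMLParser):
    """Collects the text of spans whose class token list contains 'price-amount'."""

    def __init__(self):
        super().__init__()
        self._grab = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        # Token-wise match, like the concat/normalize-space trick in the XPath.
        self._grab = tag == "span" and "price-amount" in classes

    def handle_data(self, data):
        if self._grab:
            self.prices.append(data)
            self._grab = False

parser = PriceParser()
parser.feed(SAMPLE)
print(parser.prices)  # -> ['12,00 €', '9,50 €']
```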
Author: Serious Ruffy

Updated on December 13, 2020

Comments

  • Serious Ruffy over 3 years

    I used Selenium to scrape a scrolling website with the code below:

    import requests
    from bs4 import BeautifulSoup
    import csv
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    import time
    import unittest
    import re

    output_file = open("Kijubi.csv", "w", newline='')

    class Crawling(unittest.TestCase):
        def setUp(self):
            self.driver = webdriver.Firefox()
            self.driver.set_window_size(1024, 768)
            self.base_url = "http://www.viatorcom.de/"
            self.accept_next_alert = True

        def test_sel(self):
            driver = self.driver
            delay = 3
            driver.get(self.base_url + "de/7132/Seoul/d973-allthingstodo")
            # scroll a few times so lazily loaded results appear
            for i in range(1, 6):
                driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
                time.sleep(2)
            html_source = driver.page_source
            data = html_source.encode("utf-8")
    

    My next step was to extract specific information from the page, such as the price.

    Hence, I added the following code:

    all_spans = driver.find_elements_by_xpath("/html/body/div[5]/div/div[3]/div[2]/div[2]/div[1]/div[1]/div")
    print(all_spans)
    for price in all_spans:
        Header = driver.find_elements_by_xpath("/html/body/div[5]/div/div[3]/div[2]/div[2]/div[1]/div[1]/div/div[2]/div[2]/span[2]")
        for span in Header:
            print(span.text)
    

    But I get just one price instead of all of them. Could you give me feedback on how I could improve my code? Thanks :)
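A fixed number of scrolls is fragile on lazy-loading pages like this one. A more robust pattern, sketched here against a generic Selenium driver and not tested against the live site, is to keep scrolling until document.body.scrollHeight stops growing:

```python
import time

def scroll_to_bottom(driver, pause=2.0, max_rounds=30):
    """Keep scrolling until the page height stops growing (sketch only)."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)
        height = driver.execute_script("return document.body.scrollHeight")
        if height == last_height:
            break  # nothing new was loaded by the last scroll
        last_height = height
    return last_height

# Tiny stand-in driver so the loop logic can be exercised without a browser:
# the fake "page" grows twice by 500px, then stops.
class FakeDriver:
    def __init__(self):
        self.height = 1000
        self.scrolls = 0

    def execute_script(self, script):
        if script.startswith("return"):
            return self.height
        self.scrolls += 1
        if self.scrolls <= 2:
            self.height += 500

final_height = scroll_to_bottom(FakeDriver(), pause=0)
print(final_height)  # -> 2000
```

With a real driver you would call scroll_to_bottom(driver) right after driver.get(...), before reading page_source.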

    EDIT

    Thanks to you guys, I managed to get it running. Here is the additional code:

    elements = driver.find_elements_by_xpath("//div[@id='productList']/div/div")

    innerElements = 15
    outerElements = len(elements) / innerElements
    print(innerElements, "\t", outerElements, "\t", len(elements))

    for j in range(1, int(outerElements)):
        for i in range(1, int(innerElements)):
            headline = driver.find_element_by_xpath("//div[@id='productList']/div[" + str(j) + "]/div[" + str(i) + "]/div/div[2]/h2/a").text
            price = driver.find_element_by_xpath("//div[@id='productList']/div[" + str(j) + "]/div[" + str(i) + "]/div/div[2]/div[2]/span[2]").text
            deeplink = driver.find_element_by_xpath("//div[@id='productList']/div[" + str(j) + "]/div[" + str(i) + "]/div/div[2]/h2/a").get_attribute("href")
            print("Header: " + headline + " | " + "Price: " + price + " | " + "Deeplink: " + deeplink)
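Building those indexed XPaths by string concatenation is easy to get wrong. A small helper (a hypothetical name, shown only for illustration) makes the pattern clearer with an f-string:

```python
def product_xpath(outer, inner, tail):
    """Build one indexed product XPath; outer/inner are 1-based div indices."""
    return f"//div[@id='productList']/div[{outer}]/div[{inner}]/{tail}"

xpath = product_xpath(1, 2, "div/div[2]/h2/a")
print(xpath)  # -> //div[@id='productList']/div[1]/div[2]/div/div[2]/h2/a
```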
    

    Now my last issue is that I still do not get back the last 20 prices, the ones with an English description. I only get the prices that have a German description. The English ones are not fetched, although they share the same HTML structure.

    E.g. the HTML structure for the English items:

         headline =   driver.find_element_by_xpath("//div[@id='productList']/div[6]/div[1]/div/div[2]/h2/a")
    

    Do you guys know what I have to modify? Any feedback is appreciated:)
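One way to avoid the hard-coded div[6]/div[1] indices altogether, which is usually what breaks when a page mixes sections (here German and English products), is to fetch each product container once and then run relative queries against it. In Selenium that would mean calling find_element with a relative XPath such as ".//h2/a" on each element returned by find_elements. The idea can be demonstrated offline with the standard library's ElementTree on made-up markup (the structure and class names below are assumptions for illustration):

```python
import xml.etree.ElementTree as ET

# Assumed, simplified product-list markup for illustration only.
SNIPPET = """
<div id="productList">
  <div class="product"><h2><a href="/seoul-tour">Seoul Tour</a></h2><span>25 €</span></div>
  <div class="product"><h2><a href="/palace-visit">Palace Visit</a></h2><span>40 €</span></div>
</div>
"""

root = ET.fromstring(SNIPPET)
rows = []
# One relative query per product node - no fixed div[6]/div[1] indices,
# so every product is found no matter where it sits in the list.
for product in root.findall("./div[@class='product']"):
    link = product.find("h2/a")
    price = product.find("span")
    rows.append((link.text, link.get("href"), price.text))

for headline, href, price in rows:
    print(f"Header: {headline} | Price: {price} | Deeplink: {href}")
```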

  • Serious Ruffy over 8 years
    Thanks for your feedback