Navigating through pagination with Selenium in Python

Solution 1

Before moving on to automating any scenario, always write down the manual steps you would perform to execute the scenario. The manual steps for what you want to do (as I understand from the question) are:

1) Go to site - https://services.wiltshire.gov.uk/PlanningGIS/LLPG/WeeklyList

2) Select first week option

3) Click search

4) Get the data from every page

5) Load the url again

6) Select second week option

7) Click search

8) Get the data from every page

.. and so on.

You have a loop to select the different weeks, but inside each iteration of that loop you also need another loop that iterates over all the result pages for that week. Since you are not doing that, your code only returns the data from the first page.
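
To make the structure concrete, here is a purely illustrative sketch of that nesting (the week names and page counts below are made up; in the real script they come from the <select> options and the pagination bar):

weeks = ["week 1", "week 2", "week 3"]                      # stand-ins for the <select> options
pages_per_week = {"week 1": 3, "week 2": 1, "week 3": 4}    # stand-ins for the page counts

all_data = []
for week in weeks:                                   # outer loop: one search per week
    for page in range(1, pages_per_week[week] + 1):  # inner loop: every result page of that week
        all_data.append((week, page))                # stand-in for getData() on that page
print(all_data)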

Another problem is with how you are locating the 'Next' button:

driver.find_element_by_xpath('//*[@id="form1"]/div[3]/a[4]').click()

You are selecting the 4th <a> element, which is of course not robust, because on different pages the Next button's index will be different. Instead, use this better locator:

driver.find_element_by_xpath("//a[contains(text(),'Next')]").click()

Logic for creating the loop that iterates through the pages:

First you will need the number of pages. I got that by locating the <a> element immediately before the "Next" button; as the pagination HTML in your question shows, the text of that element is the last page number, i.e. the total number of pages.

I did that using the following code:

number_of_pages = int(driver.find_element_by_xpath("//a[contains(text(),'Next')]/preceding-sibling::a[1]").text)

Now, once you have the number of pages in number_of_pages, you only need to click the "Next" button number_of_pages - 1 times, because the first page is already loaded!
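
One caveat: if a week ever returns just a single page of results, the "Next" link may not be rendered at all, and the locator above would then throw a NoSuchElementException. A slightly more defensive variant (a sketch, assuming that is how the site behaves) is:

# Fall back to 1 page when the "Next" link (and hence its preceding sibling) is absent
page_links = driver.find_elements_by_xpath("//a[contains(text(),'Next')]/preceding-sibling::a[1]")
number_of_pages = int(page_links[0].text) if page_links else 1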

Final code for your main function:

def main():
    all_data = []
    select = Select(driver.find_element_by_xpath("//select[@class='formitem' and @id='selWeek']"))
    list_options = select.options

    for item in range(len(list_options)):
        # Re-locate the <select>, because the page reloads after every search
        select = Select(driver.find_element_by_xpath("//select[@class='formitem' and @id='selWeek']"))
        select.select_by_index(item)
        driver.find_element_by_css_selector("input.formbutton#csbtnSearch").click()
        number_of_pages = int(driver.find_element_by_xpath("//a[contains(text(),'Next')]/preceding-sibling::a[1]").text)
        for j in range(number_of_pages - 1):
            all_data.extend(getData())
            driver.find_element_by_xpath("//a[contains(text(),'Next')]").click()
            time.sleep(1)  # requires `import time` at the top of the script
        all_data.extend(getData())  # scrape the last page, which the loop leaves open
        driver.get(url)

    with open('wiltshire.json', 'w+') as f:
        json.dump(all_data, f)
    driver.quit()

Solution 2

The following approach simply worked for me.

driver.find_element_by_link_text("3").click()
driver.find_element_by_link_text("4").click()
....
driver.find_element_by_link_text("Next").click()    

Solution 3

First, get the total number of pages from the pagination bar, using:

# `ins` is the webdriver instance; BeautifulSoup comes from `from bs4 import BeautifulSoup`
ins.get('https://services.wiltshire.gov.uk/PlanningGIS/LLPG/WeeklyList/10702380,1')
ins.find_element_by_class_name("pagination")   # make sure the pagination bar has loaded
source = BeautifulSoup(ins.page_source, 'html.parser')
div = source.find_all('div', {'class': 'pagination'})
all_as = div[0].find_all('a')
total = 0

for i in range(len(all_as)):
    if 'Next' in all_as[i].text:
        total = int(all_as[i - 1].text)   # the <a> just before "Next" holds the last page number
        break

Now just loop through that range of pages:

for page in range(1, total + 1):
    ins.get('https://services.wiltshire.gov.uk/PlanningGIS/LLPG/WeeklyList/10702380,{}'.format(page))

For each page number, get the page source and then extract the data from it. Note: don't forget to sleep when going from one page to the next.
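
Putting those pieces together, a rough sketch of the whole loop (the row parsing is only illustrative and `ins` is the webdriver instance used above; adjust the selectors to the real table markup):

import time
from bs4 import BeautifulSoup

all_rows = []
for page in range(1, total + 1):
    ins.get('https://services.wiltshire.gov.uk/PlanningGIS/LLPG/WeeklyList/10702380,{}'.format(page))
    soup = BeautifulSoup(ins.page_source, 'html.parser')
    all_rows.extend(soup.find_all('tr'))   # collect every table row on this page
    time.sleep(1)                          # pause between page loads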


Comments

  • Abdul Jamac
    Abdul Jamac almost 2 years

    I'm scraping this website using Python and Selenium. I have the code working, but it currently only scrapes the first page. I would like to iterate through all the pages and scrape them all, but they handle pagination in a weird way. How would I go through the pages and scrape them one by one?

    Pagination HTML:

    <div class="pagination">
        <a href="/PlanningGIS/LLPG/WeeklyList/41826123,1" title="Go to first page">First</a>
        <a href="/PlanningGIS/LLPG/WeeklyList/41826123,1" title="Go to previous page">Prev</a>
        <a href="/PlanningGIS/LLPG/WeeklyList/41826123,1" title="Go to page 1">1</a>
        <span class="current">2</span>
        <a href="/PlanningGIS/LLPG/WeeklyList/41826123,3" title="Go to page 3">3</a>
        <a href="/PlanningGIS/LLPG/WeeklyList/41826123,4" title="Go to page 4">4</a>
        <a href="/PlanningGIS/LLPG/WeeklyList/41826123,3" title="Go to next page">Next</a>
        <a href="/PlanningGIS/LLPG/WeeklyList/41826123,4" title="Go to last page">Last</a>
    </div>
    

    Scraper:

    import re
    import json
    import requests
    from selenium import webdriver
    from selenium.webdriver.support.ui import Select
    from selenium.webdriver.chrome.options import Options
    
    options = Options()
    # options.add_argument('--headless')
    options.add_argument("start-maximized")
    options.add_argument('disable-infobars')
    driver=webdriver.Chrome(chrome_options=options, 
    executable_path=r'/Users/weaabduljamac/Downloads/chromedriver')
    
    url = 'https://services.wiltshire.gov.uk/PlanningGIS/LLPG/WeeklyList'
    driver.get(url)
    
    def getData():
        data = []
        rows = driver.find_element_by_xpath('//*[@id="form1"]/table/tbody').find_elements_by_tag_name('tr')
        for row in rows:
            app_number = row.find_elements_by_tag_name('td')[1].text
            address = row.find_elements_by_tag_name('td')[2].text
            proposals = row.find_elements_by_tag_name('td')[3].text
            status = row.find_elements_by_tag_name('td')[4].text
            data.append({"CaseRef": app_number, "address": address, "proposals": proposals, "status": status})
        print(data)
        return data
    
    
    def main():
     all_data = []
     select = Select(driver.find_element_by_xpath("//select[@class='formitem' and @id='selWeek']"))
     list_options = select.options
    
     for item in range(len(list_options)):
        select = Select(driver.find_element_by_xpath("//select[@class='formitem' and @id='selWeek']"))
        select.select_by_index(str(item))
        driver.find_element_by_css_selector("input.formbutton#csbtnSearch").click()
        all_data.extend( getData() )
        driver.find_element_by_xpath('//*[@id="form1"]/div[3]/a[4]').click()
        driver.get(url)
    
     with open( 'wiltshire.json', 'w+' ) as f:
        json.dump( all_data, f )
     driver.quit()
    
    
    if __name__ == "__main__":
        main()
    
    • Swadhikar
      Swadhikar over 5 years
      Have you tried scrolling down using JavaScript?
    • undetected Selenium
      undetected Selenium over 5 years
      Where is the code block to handle pagination?
    • Abdul Jamac
      Abdul Jamac over 5 years
      @DebanjanB That's what I need help with, because the first three anchor tags are always the page you're on. I don't know how to iterate through it, since every search result has a different number of pages.
  • Abdul Jamac
    Abdul Jamac over 5 years
    thank you so much this worked and made perfect sense
  • Abdul Jamac
    Abdul Jamac over 5 years
    Quick question: what is 'preceding-sibling::a[1]' doing? Is it getting the number 13?
  • Shivam Mishra
    Shivam Mishra over 5 years
    @AbdulJamac yes, I have also mentioned that in the updated answer.
  • Abdul Jamac
    Abdul Jamac over 5 years
    So now that you have the 13, does it click the Next button 13 times, and every time it clicks the button does it subtract 1 from number_of_pages?
  • Shivam Mishra
    Shivam Mishra over 5 years
    @AbdulJamac Suppose there are 13 pages: the inner loop will be executed 12 times. Since the first page is already loaded, we only have to click the Next button 12 (13 - 1) times.
  • Abdul Jamac
    Abdul Jamac over 5 years
    Hey, I've got another navigation question; if you can help, that would be great: stackoverflow.com/questions/51987761/…
  • Shivam Mishra
    Shivam Mishra over 5 years
    @AbdulJamac Hey, it seems that you have already accepted an answer to this question. Nevertheless, I gave another solution which you can try.
  • Heinz
    Heinz over 4 years
    What if the pagination is implemented with only "Previous" and "Next" links, so that to move around you have to click those two links?