Sunday, 27 September 2020

Getting started with Python Web Scraping

Getting started with Python on a Mac was fairly straightforward, but I hit a few stumbling blocks on Windows. The easiest way to get up and running with Python, a decent IDE and terminal, and the additional libraries I needed was to install Anaconda on Windows and use the Spyder IDE.
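
The full Anaconda distribution should already include the two libraries used below, but if they are missing they can be added from the Anaconda Prompt (these are the standard package names on the default conda channel):

conda install requests beautifulsoup4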

Using Beautiful Soup for web scraping, I wrote the following script to get the current top non-fiction audiobooks from Audible:

# Script to get top Audible personal development books
import requests
from bs4 import BeautifulSoup

URL = 'https://www.amazon.co.uk/Best-Sellers-Books-Self-Help-How/zgbs/books/2996349031/ref=zg_bs_nav_b_3_2996114031'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')

# The chart is an ordered list; each entry is an <li> with the class below
results = soup.find('ol', class_='a-ordered-list a-vertical')
list_elems = results.find_all('li', class_='zg-item-immersion')

for list_elem in list_elems[:50]:
    rank_elem = list_elem.find('span', class_='zg-badge-text')
    title_elem = list_elem.find('div', class_='p13n-sc-truncate p13n-sc-line-clamp-1')
    author_elem = list_elem.find('span', class_='a-size-small a-color-base')

    # Strip the padding and line breaks Amazon puts around the title text
    title_text = title_elem.text.replace('  ', '').replace('\n', '')

    print(rank_elem.text.replace('#', '') + ' - ' + title_text + ' - ' + author_elem.text.replace('\t', ''))
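
One caveat: Amazon sometimes serves a bot-check page to the default requests user agent, in which case soup.find() returns None and the script falls over. A minimal sketch of a guard against that, sending a browser-style User-Agent header (the header value here is just an example):

headers = {'User-Agent': 'Mozilla/5.0'}  # example browser-style UA string
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')

results = soup.find('ol', class_='a-ordered-list a-vertical')
if results is None:
    # Most likely a bot-check page or a layout change, so stop with a clear message
    raise SystemExit('Could not find the best sellers list on the page')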

Running the script prints the top 50 entries, one per line, in the format rank - title - author.
