Showing posts with label Python. Show all posts

Sunday, 27 September 2020

Getting started with Python Web Scraping

Getting started with Python on a Mac was fairly straightforward, but I hit a few stumbling blocks on Windows. The easiest way I found to get Python, a decent IDE and terminal, and the additional libraries on Windows was to install Anaconda and use the Spyder IDE.

The following is a script I wrote, using Beautiful Soup for web scraping, to get the current top non-fiction audiobooks from Audible:

# Script to get top Audible personal development books
import requests
from bs4 import BeautifulSoup

URL = 'https://www.amazon.co.uk/Best-Sellers-Books-Self-Help-How/zgbs/books/2996349031/ref=zg_bs_nav_b_3_2996114031'
page = requests.get(URL)
page.raise_for_status()  # stop early if the request was rejected
soup = BeautifulSoup(page.content, 'html.parser')

# The best sellers sit in a single ordered list; find() returns None if it is missing
results = soup.find('ol', class_='a-ordered-list a-vertical')
list_elems = results.find_all('li', class_='zg-item-immersion') if results else []

for list_elem in list_elems[:50]:
    rank_elem = list_elem.find('span', class_='zg-badge-text')
    title_elem = list_elem.find('div', class_='p13n-sc-truncate p13n-sc-line-clamp-1')
    author_elem = list_elem.find('span', class_='a-size-small a-color-base')

    # Skip entries where any of the three fields could not be found
    if rank_elem is None or title_elem is None or author_elem is None:
        continue

    # Strip the padding and newlines that surround the title text
    title = title_elem.text.replace('  ', '').replace('\n', '')

    print(rank_elem.text.replace('#', '') + ' - ' + title + ' - ' + author_elem.text.replace('\t', ''))

Running it prints one line per book, in the form rank - title - author.


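The text clean-up in the script relies on chained replace() calls, which can be fragile if the page changes how it pads its text. As a rough sketch of an alternative, assuming the scraped strings only need their whitespace normalised, a small helper can collapse any run of spaces, tabs or newlines into a single space. The names clean_text and format_row and the sample strings are made up for illustration and are not part of the original script.

```python
import re


def clean_text(raw):
    """Collapse runs of whitespace (spaces, tabs, newlines) into single spaces."""
    return re.sub(r'\s+', ' ', raw).strip()


def format_row(rank, title, author):
    """Build one 'rank - title - author' line from raw scraped strings."""
    # lstrip('#') drops the leading '#' from a rank badge such as '#1'
    return ' - '.join([clean_text(rank).lstrip('#'),
                       clean_text(title),
                       clean_text(author)])


# Placeholder strings standing in for rank_elem.text, title_elem.text, etc.
sample = format_row('#1', '\n      Example Title\n   ', '\tExample Author')
print(sample)  # 1 - Example Title - Example Author
```

Unlike replace('  ', ''), which deletes each pair of spaces outright, this keeps a single space wherever whitespace separated two words, so titles with irregular spacing should stay readable.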
 

Updating a massive number of rows whilst avoiding blocking

The following SQL is a good means to split an update on a massive table into smaller chunks, whilst reducing blocking. The method is to upda...