Content Gardening

22 Apr 2018

This article starts a series where we are going to discuss how to use Selenium, from Python, for automating website navigation and doing visual data scraping.

We will go through the installation steps and start with a couple of examples.

Installation step 1: Python bindings for Selenium

You need the Python "selenium" package, which you can install by doing pip install selenium.

Even better, use "virtualenv" to create isolated Python environments. In my case, as discussed in previous posts, I currently use "Pyenv" to manage multiple Python installations, including virtual environments.

To create my environment, I could do:

pyenv virtualenv 3.6.5 env_selenium

This gives me a virtual environment for any Selenium-related work I have to do. Of course, to get started, I have to activate that environment and then "pip install" the selenium package.

pyenv activate env_selenium

pip install selenium
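
To confirm that the package really landed in the active environment, a quick check along these lines can help. It is only a sketch: the helper name is_installed is made up for this example, and it relies on nothing beyond the standard library.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in this environment."""
    # find_spec returns None when no importable module has that name.
    return importlib.util.find_spec(module_name) is not None


if is_installed("selenium"):
    print("selenium is available in this environment")
else:
    print("selenium is missing; run 'pip install selenium' first")
```

Running it with the virtual environment activated and again without it is a simple way to see that the environment is actually isolated.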

Installation step 2: the Chrome driver

Selenium requires a driver to interface with the chosen browser: Firefox requires "geckodriver", while Chrome requires "ChromeDriver".

I will focus on ChromeDriver, which I have experience with. I recently needed to install it on my Ubuntu Linux box for a project, and it is now part of my standard toolset.

To use Selenium with ChromeDriver, you need to install Chrome itself (if it is not yet installed) and then ChromeDriver.

On Ubuntu 16.04, the following commands might help install Chrome:

# install dependencies
sudo apt-get update
sudo apt-get install -y unzip openjdk-8-jre-headless xvfb libxi6 libgconf-2-4

# Chrome
curl -sS -o - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
echo "deb http://dl.google.com/linux/chrome/deb/ stable main" | sudo tee -a /etc/apt/sources.list.d/google-chrome.list
sudo apt-get -y update
sudo apt-get -y install google-chrome-stable

These other commands help install ChromeDriver version 2.38, by downloading the release archive:

wget -N https://chromedriver.storage.googleapis.com/2.38/chromedriver_linux64.zip -P ~/
unzip ~/chromedriver_linux64.zip -d ~/
rm ~/chromedriver_linux64.zip
sudo mv -f ~/chromedriver /usr/local/bin/chromedriver
sudo chown root:root /usr/local/bin/chromedriver
sudo chmod 0755 /usr/local/bin/chromedriver
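
Before moving on, it can be worth checking from Python that the driver is reachable on the PATH, since that is where Selenium looks for it by default. The following is a minimal sketch under that assumption; driver_version is a hypothetical helper name, and it simply runs the driver with its --version flag.

```python
import shutil
import subprocess


def driver_version(executable="chromedriver"):
    """Return the version line printed by the driver, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        timeout=10,
    )
    return result.stdout.strip()


print(driver_version() or "chromedriver was not found on PATH")
```

If the helper returns None, the usual suspects are a failed mv step or a PATH that does not include /usr/local/bin.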

Next, let's see examples of using Selenium with ChromeDriver.

Check the source of a web page

If you execute the following code snippet in your Python virtualenv, you should see the HTML content of the Python.org homepage displayed in the terminal as the output:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("http://www.python.org") 

print(driver.page_source)
driver.close()

What are we doing here?

First, we import the "selenium.webdriver" module, which provides all the WebDriver implementations, including Firefox and Chrome.

Then, an instance of the Chrome WebDriver is created, with driver = webdriver.Chrome().

The driver.get method navigates to the page at the given URL. The driver waits until the page has loaded (that is, until the page's load event has fired) and then returns control to our script.

We can get the source of the page using the page_source attribute on the driver object.

Finally, the browser window is closed with driver.close().

In some cases, such as when running the script via an SSH session on a server, you need Chrome to run in headless mode. To ensure that, Selenium allows us to tweak the Chrome options, using selenium.webdriver.chrome.options.Options(), as you can see in the actual code I use in my real use cases:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.set_headless(headless=True)

driver = webdriver.Chrome(chrome_options=options)
driver.get("http://www.python.org") 

print(driver.page_source)
driver.close()

Closing the driver using Python's Context Manager feature

I like using Python's context managers feature and the with syntax, but unfortunately, it is not directly possible here, since the driver object that we get is not a Python context manager.

No problem, the standard contextlib.closing can help: it wraps any object that has a close() method and calls that method when the with block exits. So the version of the code that I prefer using is:

from contextlib import closing

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.set_headless(headless=True)

with closing(webdriver.Chrome(chrome_options=options)) as driver:
    driver.get("http://www.python.org") 

    print(driver.page_source)

Understand when you really need Chrome to quit

It is important to call the close() method on the driver when you are done, since it closes the browser window or tab that currently has the focus. But if you want to shut down the driver instance, which closes all browser windows and cleans up the resources used in memory and on disk (temporary files), you have to call the quit() method on the driver instead.

There are situations where what you really want is to quit the driver, so you would need the following version:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.set_headless(headless=True)

driver = webdriver.Chrome(chrome_options=options)
driver.get("http://www.python.org") 

print(driver.page_source)
driver.quit()
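
If you like the with syntax but want quit() rather than close(), one option is a small helper modeled on contextlib.closing. This is a sketch, not something Selenium ships: the name quitting is invented for this example, and it works with any object that has a quit() method.

```python
from contextlib import contextmanager


@contextmanager
def quitting(driver):
    """Yield the driver, then call its quit() method even if the block raises."""
    try:
        yield driver
    finally:
        driver.quit()


# Usage, assuming the same imports and options object as in the snippet above:
#
# with quitting(webdriver.Chrome(chrome_options=options)) as driver:
#     driver.get("http://www.python.org")
#     print(driver.page_source)
```

Because quit() sits in a finally clause, the browser and the driver process are shut down even when the code inside the with block raises an exception.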

Send a query via the search form and get the results

As a second example, let's try this code:

from contextlib import closing

from selenium import webdriver
from selenium.webdriver.common.keys import Keys 
from selenium.webdriver.chrome.options import Options

options = Options()
options.set_headless(headless=True)

with closing(webdriver.Chrome(chrome_options=options)) as driver:
    driver.get("http://www.python.org") 

    elem = driver.find_element_by_name("q")
    elem.clear()
    elem.send_keys("pycon")
    elem.send_keys(Keys.RETURN) 

    print(driver.page_source)

There is a new import line for the Keys class, which provides constants for keyboard keys such as RETURN, CONTROL, and ALT.

WebDriver offers a number of ways to find elements using one of the "find_element_by_*" methods. For example, the search input text element can be located by its name attribute using the find_element_by_name method.

Next, we send keys; this is similar to typing on your keyboard. We call elem.clear() first to remove any text that may have been pre-filled in the input.

After submitting the form with the RETURN key, we should get the results page.

Actually extracting the information from the results page

The more interesting part is getting to specific pieces of information on the results page. For example, we could use the driver.find_elements_by_tag_name() method to find all the div elements. This method returns a list, and we can iterate through it to get the text content of each element.

Our consolidated code would be:

from contextlib import closing

from selenium import webdriver
from selenium.webdriver.common.keys import Keys 
from selenium.webdriver.chrome.options import Options

options = Options()
options.set_headless(headless=True)

with closing(webdriver.Chrome(chrome_options=options)) as driver:
    driver.get("http://www.python.org") 

    elem = driver.find_element_by_name("q")
    elem.clear()
    elem.send_keys("pycon")
    elem.send_keys(Keys.RETURN) 

    elems = driver.find_elements_by_tag_name("div")
    for elem in elems:
        print(elem.text)
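
Since div elements nest, and the text of an element includes the text of its children, printing every div tends to produce empty lines and repeated blocks. A small helper like the one below can tidy that up. It is a sketch that only assumes each element has a text attribute; the name distinct_texts is made up for this example, and it drops exact repeats only, not a parent whose text merely contains a child's text.

```python
def distinct_texts(elements):
    """Collect the stripped, non-empty text of each element, skipping exact repeats."""
    seen = set()
    texts = []
    for element in elements:
        text = element.text.strip()
        if text and text not in seen:
            seen.add(text)
            texts.append(text)
    return texts


# Usage inside the with block above, in place of the for loop:
#
# for text in distinct_texts(driver.find_elements_by_tag_name("div")):
#     print(text)
```

The order of first appearance is preserved, so the output still follows the layout of the page.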

Wrapping things up

This was just scratching the surface, but we have seen the main part of the techniques you would use to start with web browsing automation and web scraping. In the next articles, we will look at more possibilities, including how other libraries could be combined with Selenium for more productivity.

Python, Selenium, And ChromeDriver: Automating Web Interactions - Part 1