ProgBlog: November 2013

Tuesday, November 19, 2013

Installing Selenium WebDriver and Python on Amazon Linux

I'm configuring a headless server, hosted on AWS, for browser-based automated testing. I'll use Selenium WebDriver for browser automation, and I've chosen Python as the programming language.

I've seen instructions for setting up Selenium headless automated testing in Ubuntu. However, I decided to try Amazon Linux instead of Ubuntu, so those instructions were a good start, but not exactly what I needed.

I also borrowed heavily from this excellent article on installing Firefox on Amazon Linux.

And my Python code is straight from the example in the Selenium documentation.

Here are the steps that worked for me:

One-time initial setup:

Creating an AWS Instance:

Log in to AWS and create a new Instance from Amazon Linux AMI 2013.09.1 (64-bit).
I made my Instance a t1.micro, but you might want to choose something more powerful that will run faster.
I created a new Security Group. You can do the same, or use an existing one. At a minimum, you must give your IP address access on port 22 (SSH).
I associated my existing keypair with the Instance. If you don't already have a keypair, you can create a new one.
I named my Instance tester. You can of course name it anything you like, or even leave the name blank.

Connect to the Instance using your favorite SSH client. I used the default browser-based Java client. The username is ec2-user.

Install Selenium (which requires pip, which in turn requires setuptools):


wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py
sudo python ez_setup.py
wget https://raw.github.com/pypa/pip/master/contrib/get-pip.py
sudo python get-pip.py
sudo pip install selenium
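Before moving on, it can be worth confirming that Python can actually find the packages you just installed. The snippet below is only a sketch: module_available is a made-up helper name, and importlib.util.find_spec needs Python 3.4 or later. If your default python is Python 2 (as the print statements in tester.py further down suggest), running python -c "import selenium" does the same job.

```python
import importlib.util

def module_available(name):
    """Return True if Python can locate a top-level module called `name`."""
    return importlib.util.find_spec(name) is not None

# Report on each package this step was supposed to install.
for name in ("setuptools", "pip", "selenium"):
    status = "found" if module_available(name) else "MISSING"
    print(name, status)
```

Any line that prints MISSING points at the install command to re-run.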

Install Firefox (which requires GTK+, the GIMP Toolkit):

Use your favorite text editor (I used vim) to create a new file named gtk-firefox.sh, and insert these lines:


#!/bin/bash
# GTK+ and Firefox for Amazon Linux
# Written by Joseph Lawson 2012-06-03
# http://joekiller.com
# http://joekiller.com/2012/06/03/install-firefox-on-amazon-linux-x86_64-compiling-gtk/
 
# chmod 755 ./gtk-firefox.sh
# sudo ./gtk-firefox.sh
 
 
TARGET=/usr/local
 
function init()
{
export installroot=$TARGET/src
export workpath=$TARGET
 
yum --assumeyes install make libjpeg-devel libpng-devel \
libtiff-devel gcc libffi-devel gettext-devel libmpc-devel \
libstdc++46-devel xauth gcc-c++ libtool libX11-devel \
libXext-devel libXinerama-devel libXi-devel libxml2-devel \
libXrender-devel libXrandr-devel libXt dbus-glib
mkdir -p $workpath
mkdir -p $installroot
cd $installroot
PKG_CONFIG_PATH="$workpath/lib/pkgconfig"
PATH=$workpath/bin:$PATH
export PKG_CONFIG_PATH PATH
 
bash -c "
cat << EOF > /etc/ld.so.conf.d/firefox.conf
$workpath/lib
$workpath/firefox
EOF
ldconfig
"
}
 
function finish()
{
    cd $workpath
    wget -r --no-parent --reject "index.html*" -nH --cut-dirs=7 http://releases.mozilla.org/pub/mozilla.org/firefox/releases/latest/linux-x86_64/en-US/
    tar xvf firefox*
    cd bin
    ln -s ../firefox/firefox
    ldconfig
}
 
function install()
{
    wget $1
    FILE=`basename $1`
    if [ ${FILE: -3} == ".xz" ]
       then tar xvfJ $FILE
       else tar xvf $FILE
    fi
    SHORT=${FILE:0:4}*
    cd $SHORT
    ./configure --prefix=$workpath
    make
    make install
    ldconfig
    cd ..
}
 
init
install ftp://ftp.gnu.org/gnu/autoconf/autoconf-2.69.tar.xz
install http://download.savannah.gnu.org/releases/freetype/freetype-2.4.9.tar.gz
install http://www.freedesktop.org/software/fontconfig/release/fontconfig-2.9.0.tar.gz
install http://ftp.gnome.org/pub/gnome/sources/glib/2.32/glib-2.32.3.tar.xz
install http://cairographics.org/releases/pixman-0.26.0.tar.gz
install http://cairographics.org/releases/cairo-1.12.2.tar.xz
install http://ftp.gnome.org/pub/gnome/sources/pango/1.30/pango-1.30.0.tar.xz
install http://ftp.gnome.org/pub/gnome/sources/atk/2.4/atk-2.4.0.tar.xz
install http://ftp.gnome.org/pub/GNOME/sources/gdk-pixbuf/2.26/gdk-pixbuf-2.26.1.tar.xz
install http://ftp.gnome.org/pub/gnome/sources/gtk+/2.24/gtk+-2.24.10.tar.xz
finish
 
 
# adds the /usr/local/bin to your path by updating your .bashrc file.
cat << EOF >> ~/.bashrc
PATH=/usr/local/bin:\$PATH
export PATH
EOF

chmod 755 ./gtk-firefox.sh
sudo ./gtk-firefox.sh

On my t1.micro Instance, this step took over an hour to run!

Create a "hello world" Selenium automation program in Python:

Use a text editor to create a file named tester.py, and insert these lines:


from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait # available since 2.4.0
from selenium.webdriver.support import expected_conditions as EC # available since 2.26.0

# Create a new instance of the Firefox driver
driver = webdriver.Firefox()

# go to the google home page
driver.get("http://www.google.com")

# find the element that's name attribute is q (the google search box)
inputElement = driver.find_element_by_name("q")

# type in the search
inputElement.send_keys("cheese!")

# submit the form (although google automatically searches now without submitting)
inputElement.submit()

# the page is ajaxy so the title is originally this:
print driver.title

try:
    # we have to wait for the page to refresh, the last thing that seems to be updated is the title
    WebDriverWait(driver, 10).until(EC.title_contains("cheese!"))

    # You should see "cheese! - Google Search"
    print driver.title

finally:
    driver.quit()

Set up the ability to run Firefox headless:

Append this line to .bashrc: export DISPLAY=:10
sudo yum install Xvfb

Step that must be performed after each time you reboot the Instance:

sudo Xvfb :10 -ac
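If starting Xvfb by hand gets tedious, a small Python wrapper could launch it and run the test in one go. This is a rough sketch under several assumptions, not something tested on this Instance: the names xvfb_command, env_for_display and run_headless are invented for illustration, Xvfb must be on the PATH, the wrapper must run with enough privileges to start it, and the two-second pause is a crude guess at how long Xvfb needs to come up.

```python
import os
import subprocess
import time

def xvfb_command(display=":10"):
    """Build the same command line used above: Xvfb <display> -ac."""
    return ["Xvfb", display, "-ac"]

def env_for_display(display=":10", base_env=None):
    """Return a copy of the environment with DISPLAY pointing at Xvfb."""
    env = dict(os.environ if base_env is None else base_env)
    env["DISPLAY"] = display
    return env

def run_headless(script="tester.py", display=":10"):
    """Start Xvfb, run the Selenium script against it, then stop Xvfb."""
    xvfb = subprocess.Popen(xvfb_command(display))
    try:
        time.sleep(2)  # crude: give Xvfb a moment to start listening
        return subprocess.call(["python", script], env=env_for_display(display))
    finally:
        xvfb.terminate()
        xvfb.wait()
```

Calling run_headless() would then stand in for the manual Xvfb step, at the cost of starting and stopping Xvfb on every run.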

Steps that must be performed each time you connect to the Instance after disconnecting:

Connect to the Instance using an SSH client.
Now open a second SSH session.

(If you try to run Firefox in the first SSH session, you’ll get an error: Xlib: extension "RANDR" missing on display ":10".)

firefox
python tester.py

If it's working, you'll see the output Google, followed by cheese! - Google Search.

Tuesday, November 12, 2013

A Python program to FTP a mix of text and binary files

When using FTP to download files from a server to your local computer, you must of course be sensitive to line endings. My server is a Linux box, so line endings are just LF. My laptop runs Windows 7, with CRLF line endings.

I recently wrote a Python program to automate the backup of files from the server to my laptop, and had to take this difference into account.

To make matters worse, some of the text files on the server have CRLF, even though it's a Linux box. The files came from a variety of sources -- uploaded by different developers, created in WordPress admin, etc. -- and therefore don't have consistent line endings.

And there's another wrinkle: Some of the files are UTF-8 encoded and contain characters that can't be represented as ASCII.

At first, I thought I would simply use the Python ftplib module's retrbinary() method for binary files and retrlines() for text files. That didn't work out. The biggest obstacle was errors that resulted with the UTF-8 files when trying to read a line of text that contained a non-ASCII character and write it to a file. Two lesser hurdles were (1) figuring out when it was really necessary to add a CR, vs when the CR was already present, and (2) the possibility that the last line of the file doesn't end with a newline of any sort.

Eventually, I hit on a simple approach that is almost (but not quite) 100% effective. It's good enough for my requirement, which is just to create a backup that's usable in case of disaster. My program uses retrbinary() for ALL files. If the file extension indicates a binary file, no additional action is taken. If the file extension indicates a text file, we replace every occurrence of the byte 0x0a (LF) with the two bytes 0x0d and 0x0a (CRLF) -- unless the 0x0a was already preceded by 0x0d, in which case we leave it alone.
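The replacement rule itself is compact enough to write as a single regular expression. The sketch below is not the code the backup program uses (that appears further down and works byte by byte). It only illustrates the rule on a complete byte string, with to_crlf as a hypothetical function name. Because it works on raw bytes, UTF-8 text passes through without ever being decoded, which sidesteps the non-ASCII errors described earlier.

```python
import re

# Matches a LF that is not already preceded by a CR.
BARE_LF = re.compile(rb"(?<!\r)\n")

def to_crlf(data):
    """Return `data` (bytes) with every bare LF replaced by CRLF."""
    return BARE_LF.sub(b"\r\n", data)

# A mix of LF and CRLF endings, plus a non-ASCII character.
mixed = "caf\u00e9\nalready\r\nlast".encode("utf-8")
print(to_crlf(mixed))  # b'caf\xc3\xa9\r\nalready\r\nlast'
```

The existing CRLF is left alone, the bare LF gains a CR, and the two bytes of the accented character are untouched.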

Why is this only ALMOST 100% effective? Because it doesn't handle the case where a LF is the first byte in a buffer read by retrbinary(), and was preceded by a CR in the PREVIOUS buffer. In this case, my program inserts an extraneous CR. It wouldn't be terribly hard to fix, but not worth the trouble for my purpose.
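For the record, one possible fix is to remember the last byte of each buffer between callbacks instead of starting from zero every time. The sketch below is a guess at how that could look, not part of the backup program. CrlfWriter is a hypothetical name, and it wraps any object with a write() method that accepts bytes.

```python
import io

class CrlfWriter:
    """Callback for retrbinary() that converts bare LF to CRLF,
    remembering the previous byte across buffers."""

    def __init__(self, out):
        self.out = out            # file-like object opened in binary mode
        self.previous_byte = 0    # last byte of the previous buffer

    def __call__(self, buffer):
        converted = bytearray()
        for b in buffer:
            # Add a CR only when this LF was not preceded by one,
            # even if that CR arrived at the end of the previous buffer.
            if b == 0x0A and self.previous_byte != 0x0D:
                converted.append(0x0D)
            converted.append(b)
            self.previous_byte = b
        self.out.write(converted)

# Simulate a CRLF pair that is split across two buffers.
out = io.BytesIO()
writer = CrlfWriter(out)
writer(b"one\r")
writer(b"\ntwo\n")
print(out.getvalue())  # b'one\r\ntwo\r\n'
```

With ftplib, an instance would be passed as the callback, along the lines of ftp.retrbinary('RETR ' + fname, CrlfWriter(f)).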

Here's a simplified excerpt of the Python code:


from ftplib import FTP

# This is the callback function for retrbinary().
def processBytes(buffer):
 # Use the file that was opened in retrieve()
 global f

 previousByte = 0

 # Create an empty byte array.
 buffer2 = bytearray()

 # Loop over all the bytes that were read by retrbinary()
 for b in buffer:

  # Is the byte a LF? Is it NOT preceded by a CR?
  if b == 0x0a and previousByte != 0x0d:

   # Prepend a CR to the LF.
   buffer2.append(0x0d)
  buffer2.append(b)
  previousByte = b

 # Write the modified byte array to the local file.
 f.write(buffer2)

# Retrieve the specified file from the FTP server.
def retrieve(fname):
 global f
 global ftp

 print(fname)

 # Open the local file for writing in binary mode.
 f = open(fname, 'wb')

 # Determine whether the file is binary or text. In this simplified example, a file is
 # binary if and only if the file extension is GIF, JPG or PNG.
 name = fname.lower()
 if name.endswith('.gif') or name.endswith('.jpg') or name.endswith('.png'):

  # Binary file. Don't modify the bytes. Just write them to the local file.
  ftp.retrbinary('RETR ' + fname, f.write)
 else:

  # Text file. Insert CR's as needed before writing to the local file.
  ftp.retrbinary('RETR ' + fname, processBytes)

 # Close the local file so that all buffered bytes are flushed to disk.
 f.close()

def main():
 global ftp

 print("STARTING...")

 # Connect to the FTP server. Replace the arguments with your URL, username and password.
 ftp = FTP('ftp.example.com', 'someuser', 'somepassword')

 # List all the files in the current directory, including the file type in the results.
 files = ftp.mlsd('', ['type'])
 for file in files:

  # If it's a file, not a directory, then download it.
  if file[1]['type'] == 'file':
   retrieve(file[0])
 ftp.quit()
 print("DONE!")

if __name__ == '__main__':
 main()
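Once the list of binary file types grows past a handful, the chain of endswith() calls gets unwieldy, and a set lookup on the extension scales better. A minimal sketch, assuming a hypothetical helper named is_binary and an illustrative (not exhaustive) list of extensions:

```python
import os.path

# Illustrative only; extend this to match the files on your server.
BINARY_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".zip", ".pdf"}

def is_binary(fname):
    """Return True if the file extension marks `fname` as binary."""
    extension = os.path.splitext(fname)[1].lower()
    return extension in BINARY_EXTENSIONS

for name in ("logo.PNG", "index.php", "README"):
    print(name, is_binary(name))
```

Files with no extension at all, like README here, fall through to the text branch, which matches the behavior of the simplified excerpt.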

Monday, November 11, 2013

A Python program to back up all repos in a GitHub account

The goal: Automatically find all repos in a specified GitHub account and back them up to folders on the local hard disk.

I've seen instructions for backing up GitHub using Ruby (such as this example) or a shell script (like this one), but I wanted to use Python, which was already installed on my computer.

My environment:

Windows 7
Python 3.3.0

Here's the Python code, gitback.py:


import json
import os
import sys
import urllib.request
import zipfile

def main():
 # Change this to your own GitHub username.
 user = 'xxxxx'

 # The repos will be backed up to the directory you
 # specify as a command-line argument. Example: gitback.py c:\gitbkup
 target = sys.argv[1]

 # Delete the target directory if it exists. Then create it.
 if (target != None and target != '' and os.path.exists(target)):
  os.system('rd /S /Q ' + target)
 os.makedirs(target)

 # Get up to 100 repos, the most the GitHub API returns in one page.
 # If your account has more than 100 repos, request further pages
 # by adding the page parameter (page=2, page=3, ...).
 req = urllib.request.Request('https://api.github.com/users/' + user + '/repos?per_page=100')
 resp = urllib.request.urlopen(req)

 # The result is in byte format, so we need to specify
 # the encoding and decode it.
 content = resp.read().decode('utf-8')

 # The result is in JSON format. Parse the JSON.
 data = json.loads(content)

 # Loop over all the repos, cloning each one.
 for repo in data:
  name = repo['name']
  os.system('git clone git@github.com:' + user + '/' + name + ' ' + target + "\\" + name)

if __name__ == '__main__':
 main()
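For accounts with more than 100 repos, the listing has to be fetched page by page, because the GitHub API returns at most 100 items per request. The sketch below shows one way to do that. The names github_page, all_repo_names and fetch_page are hypothetical, and the page fetcher is passed in as a parameter so the loop can be tried out with fake data instead of a live account.

```python
import json
import urllib.request

def github_page(user, page):
    """Fetch one page (up to 100 entries) of a user's public repos."""
    url = ('https://api.github.com/users/' + user +
           '/repos?per_page=100&page=' + str(page))
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode('utf-8'))

def all_repo_names(user, fetch_page=github_page):
    """Collect repo names from successive pages until one comes back empty."""
    names = []
    page = 1
    while True:
        repos = fetch_page(user, page)
        if not repos:
            break
        names.extend(repo['name'] for repo in repos)
        page += 1
    return names

# Try the paging loop with canned data instead of the network.
fake_pages = {1: [{'name': 'alpha'}, {'name': 'beta'}], 2: [{'name': 'gamma'}]}

def fake_fetch(user, page):
    return fake_pages.get(page, [])

print(all_repo_names('someone', fake_fetch))  # ['alpha', 'beta', 'gamma']
```

In gitback.py, the names returned by all_repo_names(user) could feed the same git clone loop in place of the single request.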