As mentioned in my previous post, I started playing with AWS instances this week to work on projects. Necessary? Probably not. I could have chosen different data sets or projects instead. But that’s simply not how I work: I get an idea in my head, then I become stubborn [and incredibly sidetracked] about solving it.
After relaunching my second AWS instance in the same day, I realized it would become an incredible pain to relaunch and set up an environment every time. I heard on the Partially Derivative podcast that environment setup can be automated with shell scripts. While it would [probably] be relatively easy to set up environments with a distribution such as Anaconda, I am very much an “I want to tinker around and set this up myself” kind of person.
With some Google-fu, I created a shell script that installs curl, Git, virtualenv, and a basic stack of Python data science libraries. It also configures a publicly accessible Jupyter notebook (with a password) and some basic Git settings. There’s probably more I can flesh out, but it serves as a basic framework for automating environment setup.
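For a sense of what a script like this involves, here is a minimal sketch. It is not the script from the repo: the package list, the default virtualenv location ($HOME/venv), the function names and the DRY_RUN switch are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an Ubuntu setup script; NOT the script from the repo.
# Package names, the venv path and the DRY_RUN switch are illustrative assumptions.
set -euo pipefail

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# System tools: curl, Git and virtualenv from the Ubuntu archive.
install_system_packages() {
  run sudo apt-get update
  run sudo apt-get install -y curl git python3-pip virtualenv
}

# A virtualenv holding a basic Python data science stack plus Jupyter.
create_venv() {
  local venv_dir="${1:-$HOME/venv}"
  run virtualenv -p python3 "$venv_dir"
  run "$venv_dir/bin/pip" install numpy scipy pandas matplotlib scikit-learn jupyter
}

# Basic global Git identity.
configure_git() {
  run git config --global user.name "$1"
  run git config --global user.email "$2"
}
```

Calling install_system_packages, then create_venv, then configure_git "Your Name" you@example.com would perform the setup. Prefixing any call with DRY_RUN=1 prints the commands instead of running them, which is handy when editing the script for a new instance type.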
You can view the full script here: https://github.com/byronhousten/scripts/blob/master/install_python_data_science_stack.sh
This should work on Ubuntu AWS EC2 instances. Other Linux flavors will probably need some editing to work.
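Since the script targets Ubuntu, a guard near the top can stop it early on other distributions instead of failing halfway through. This is a minimal sketch, assuming the host provides /etc/os-release (standard on current Ubuntu and Amazon Linux images); the function names is_ubuntu and require_ubuntu are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: detect Ubuntu via the ID= field of /etc/os-release.
# The optional path argument exists only to make the check easy to test.
is_ubuntu() {
  local release_file="${1:-/etc/os-release}"
  [ -r "$release_file" ] || return 1
  local distro_id
  # Source the file in a subshell so its variables don't leak into the caller.
  distro_id=$(unset ID; . "$release_file" && echo "${ID:-}")
  [ "$distro_id" = "ubuntu" ]
}

# Print a message and fail if the host does not look like Ubuntu.
require_ubuntu() {
  if ! is_ubuntu "$@"; then
    echo "This setup targets Ubuntu; other Linux flavors need edits." >&2
    return 1
  fi
}
```

A line such as require_ubuntu || exit 1 at the start of the install script would make the Ubuntu assumption explicit.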
Step 1: SSH into your AWS instance
ssh -i [EC2 key file name] -L 8888:localhost:8888 ubuntu@[AWS DNS address]
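The -L 8888:localhost:8888 option forwards port 8888 on your machine through the SSH connection to port 8888 on the instance. Typing the full command after every relaunch gets old, so one option (a sketch, not part of the author's script) is to store the same settings as a host alias in ~/.ssh/config. HostName, User, IdentityFile and LocalForward are standard OpenSSH client options; the function name add_ec2_host and the example values are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: append an ssh_config host alias equivalent to the ssh command above.
# Arguments: alias, EC2 public DNS name, key file, optional config path.
add_ec2_host() {
  local alias_name="$1" dns_name="$2" key_file="$3"
  local config_file="${4:-$HOME/.ssh/config}"
  mkdir -p "$(dirname "$config_file")"
  cat >> "$config_file" <<EOF

Host $alias_name
    HostName $dns_name
    User ubuntu
    IdentityFile $key_file
    LocalForward 8888 localhost:8888
EOF
  # OpenSSH expects the config file not to be writable by others.
  chmod 600 "$config_file"
}
```

After add_ec2_host my-ec2 [AWS DNS address] [EC2 key file name], running ssh my-ec2 behaves like the longer command. A relaunched instance typically gets a new public DNS name, so edit the HostName line rather than appending a second entry with the same alias (ssh uses the first match).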
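Step 2: Download the install script and the password helper (use the raw file URLs)
wget https://raw.githubusercontent.com/byronhousten/scripts/master/install_python_data_science_stack.sh
wget https://raw.githubusercontent.com/byronhousten/scripts/master/passwd.py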
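A github.com link of the form .../blob/master/file serves GitHub's HTML file viewer rather than the file itself, so downloading it with wget saves a web page under the script's name. A quick check before running anything can catch that. This is a minimal sketch; is_html_page and check_downloads are hypothetical helper names, and the HTML test is a heuristic that looks for a line starting with <!DOCTYPE html or <html.

```shell
#!/usr/bin/env bash
# Sketch: make sure downloaded files are real files, not GitHub HTML pages.

# Succeeds if the file contains a line starting with <!DOCTYPE html or <html.
is_html_page() {
  grep -qiE '^[[:space:]]*<(!doctype html|html)' "$1"
}

# Check every file given; report problems on stderr and fail if any exist.
check_downloads() {
  local f status=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
    elif is_html_page "$f"; then
      echo "$f is an HTML page, not the raw file; re-download it" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, check_downloads install_python_data_science_stack.sh passwd.py exits nonzero and names any file that is missing or turned out to be a web page.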
Step 3: Make the script executable
chmod a+x install_python_data_science_stack.sh
Step 4: Run the script
./install_python_data_science_stack.sh
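Step 5: Go to https://[Amazon DNS]:8888/. This is your Jupyter notebook address, just as localhost:8888 would be if the notebook were running on your local machine. For it to be reachable, the instance’s security group must allow inbound traffic on port 8888.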
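For that address to work, the notebook server has to listen on all network interfaces rather than only localhost, require a password, and (for the https:// URL) have a certificate. The sketch below shows one assumed way to write such a config for the classic Notebook, using its NotebookApp options (Notebook 7 and JupyterLab use ServerApp instead). It is not taken from the repo; the function name, argument order and paths are hypothetical. In classic Notebook releases a password hash can be produced with notebook.auth.passwd(), and a self-signed certificate with openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mykey.key -out mycert.pem.

```shell
#!/usr/bin/env bash
# Sketch: write a classic-Notebook config for a password-protected public server.
# Arguments: password hash, certificate file, key file, optional config dir.
write_jupyter_config() {
  local password_hash="$1" certfile="$2" keyfile="$3"
  local config_dir="${4:-$HOME/.jupyter}"
  mkdir -p "$config_dir"
  # Heredoc lines stay unindented: the output is a Python file.
  cat > "$config_dir/jupyter_notebook_config.py" <<EOF
c.NotebookApp.ip = '0.0.0.0'
c.NotebookApp.port = 8888
c.NotebookApp.open_browser = False
c.NotebookApp.password = '$password_hash'
c.NotebookApp.certfile = '$certfile'
c.NotebookApp.keyfile = '$keyfile'
EOF
  # The file holds the password hash, so keep it private.
  chmod 600 "$config_dir/jupyter_notebook_config.py"
}
```

Setting ip to 0.0.0.0 is what makes the server answer on the instance’s public address; with the default (localhost only), the notebook would be reachable only through the SSH tunnel from Step 1.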