If you are working on a big data machine learning project, you will most likely need a cloud provider such as Amazon Web Services (AWS), Google Cloud Platform, or Microsoft Azure. My first exposure to cloud-based computing was with AWS, so that is the platform I am most familiar with. I wish setting up a virtual environment with the correct operating system and packages were as easy as clicking on an option, but for security reasons, and because of the wide range of options on offer, that is not the case.
Here are the steps:
- Select an EC2 AMI & Instance
- Change your .pem key permissions
- Access your EC2 instance via SSH
- Manage your data via SCP or Git
Select An EC2 AMI & Instance
Sign in to your AWS account, or create one if you haven’t yet. You will spend most of your time managing your AWS instances in the AWS Management Console.
Select the EC2 service.
Click Launch Instance.
Now, you need to select your preferred AMI (Amazon Machine Image). AMIs vary by operating system and preinstalled packages. In my experience, most AMIs are based on a Linux distribution such as Amazon Linux or Ubuntu. I recommend the Ubuntu Deep Learning AMI for starters.
Next, select your preferred instance type. Instance types vary by storage capacity, along with CPU and GPU capabilities. Amazon details the options on its EC2 instance types page.
Note: If you select a relatively powerful instance, then AWS requires you to send in an explanation of your usage intentions. They usually reply within a couple days.
Change Your .pem Key Permissions
After selecting your instance, you are asked to create a new or choose an existing key pair as shown below.
Your key will be downloaded to your computer as a .pem file. It is best practice to carefully store and organize your keys. AWS does not keep a copy of the private key, so if you lose it you cannot download it again, and regaining access to your EC2 instance is difficult at best. Please read the prior sentence 10 times. Launch your instance.
I usually store my key in the hidden folder named .ssh on macOS. Assuming your key was downloaded to the Downloads folder, open a Terminal window and type the following command with your own key name:
mv ~/Downloads/yourkey.pem ~/.ssh/
A freshly downloaded key file is typically readable by other users on your computer, and ssh refuses to use a private key whose permissions are that open. To avoid that issue, restrict the key so that only you can read it. Assuming you are in the directory where your key is located, type the following:
chmod 400 yourkey.pem
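If you want to confirm that the permissions took effect, here is a minimal sketch. It assumes mode 400 (owner read-only), and it works on a throwaway file in a temporary directory; demo_key.pem is a stand-in name, not a real AWS key.

```shell
# Sketch: set owner-read-only permissions on a key file and check the result.
# demo_key.pem is a hypothetical stand-in created in a temp directory.
key_dir="$(mktemp -d)"
key_file="$key_dir/demo_key.pem"
touch "$key_file"

chmod 400 "$key_file"    # owner may read; group and others get nothing

# The first 10 characters of `ls -l` output are the file type and mode bits.
mode="$(ls -l "$key_file" | cut -c1-10)"
echo "$mode"             # prints -r--------
```

Run the same ls -l check on your real key in ~/.ssh; anything other than dashes in the group and other columns means ssh may reject it.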
Access Your EC2 Instance via SSH
EC2 instance, check; .pem key, check. Before proceeding, you need to locate your public DNS, highlighted in green. Click on your newly created instance and a description box should appear like the one below.
You use the ssh (secure shell) command to access your instance. Open a Terminal window and type the following, obviously substituting your own key name and public DNS.
ssh -i ~/.ssh/yourkey.pem ubuntu@your_public_DNS
The Terminal window should show a prompt welcoming you to the instance’s virtual environment. From this window, you can transfer your files and work on your awesome projects!
Note: Depending on your AMI’s operating system, you will type “ec2-user” (usually for Amazon Linux), “ubuntu”, or the default user name for your OS before the “@your_public_DNS” in the ssh command above.
BIG NOTE: Your public DNS can change whenever you stop and start your instance (unless you attach an Elastic IP address). So if you have trouble accessing your instance, more than likely your public DNS has changed, in which case look up the current public DNS in the console and use that instead.
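To keep the login name and the public DNS straight, I find it helps to assemble the ssh command from variables and print it before running it. The sketch below is only an illustration: ec2_login_user is a hypothetical helper of my own, it covers just the two AMI families mentioned above, and your_public_DNS is a placeholder.

```shell
# Hypothetical helper: map an AMI family to its default login name.
# Only the two families mentioned in this guide are covered.
ec2_login_user() {
  case "$1" in
    ubuntu)       echo "ubuntu" ;;
    amazon-linux) echo "ec2-user" ;;
    *)            echo "unknown AMI family: $1" >&2; return 1 ;;
  esac
}

key="$HOME/.ssh/yourkey.pem"
host="your_public_DNS"    # placeholder; copy the current value from the EC2 console
user="$(ec2_login_user ubuntu)"

# Print the command rather than running it, so it can be checked first.
echo "ssh -i $key $user@$host"
```

Once the printed command looks right, paste it into the Terminal. When the public DNS changes, only the host variable needs updating.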
Manage Your Data Via SCP or Git
To transfer files between your computer and instance, you use the scp (secure copy) command. Open a new Terminal window on your own computer, not one that is logged in to the instance, and type the following, obviously substituting your own key name and public DNS.
Upload a File From Computer to Instance
scp -i ~/.ssh/yourkey.pem ~/path/to/file ubuntu@your_public_DNS:path/to/file
Download a File From Instance to Computer
scp -i ~/.ssh/yourkey.pem ubuntu@your_public_DNS:path/to/file ~/path/on/local/machine
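The same command can copy a whole folder if you add the -r (recursive) flag. Here is a minimal sketch, assuming a local folder named project and the ubuntu login name. ec2_upload_dir and its DRY_RUN switch are hypothetical conveniences of mine, not part of scp.

```shell
# Hypothetical helper: copy a whole directory to the instance with scp -r.
# With DRY_RUN=1 it prints the scp command instead of running it.
ec2_upload_dir() {
  local key="$1" src="$2" dest="$3"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "scp -i $key -r $src $dest"
  else
    scp -i "$key" -r "$src" "$dest"
  fi
}

# Dry run: shows the command that would be executed.
DRY_RUN=1 ec2_upload_dir ~/.ssh/yourkey.pem ~/project ubuntu@your_public_DNS:project
```

Drop the DRY_RUN=1 prefix to perform the actual copy once the printed command looks right.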
Amazingly, in the Terminal window that is logged in to your EC2 instance, you can use Git commands to pull and push files between your instance environment and your GitHub repositories, as you would normally on your local machine. All data on the instance is stored on AWS’s servers, not on your computer.
REALLY IMPORTANT NOTE:
AWS charges you by the time your instance is running, at a rate set by the instance specifications you chose. Whether you are doing light or heavy work, if your instance is turned on, AWS will charge you.
A few of my colleagues and I made the rookie mistake of not understanding this, and out of ignorance and negligence we each ran up bills of several hundred dollars. Luckily, Amazon was sympathetic and rescinded the charges. However, you should not let this situation happen.
Hence, whenever you are finished with a working session, whether it lasted 1 minute or 1 hour, always stop your instance. Then simply start it when you resume work. A stopped instance no longer accrues compute charges, although its attached storage is still billed. Terminating deletes your entire instance, so you would have to create a new one from scratch.
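If you prefer the Terminal to the console, stopping and starting can also be done with the AWS command line tool. This is only a sketch, and it assumes things not covered in this guide: that the AWS CLI is installed and configured with your credentials, and that you know your instance ID (the one below is a placeholder). The function names are my own.

```shell
# Assumes the AWS CLI is installed and configured; the ID below is a placeholder.
instance_id="i-0123456789abcdef0"

# Thin wrappers around the AWS CLI (function names are hypothetical).
stop_instance()  { aws ec2 stop-instances  --instance-ids "$1"; }
start_instance() { aws ec2 start-instances --instance-ids "$1"; }

# Look up the current public DNS name, which can change after a stop and start.
public_dns() {
  aws ec2 describe-instances --instance-ids "$1" \
    --query 'Reservations[0].Instances[0].PublicDnsName' --output text
}
```

At the end of a session you would run stop_instance "$instance_id", and when you resume, start_instance "$instance_id" followed by public_dns "$instance_id" to get the address for ssh.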
And that’s a wrap! I had a harrowing experience with setting up AWS when I started out, but hopefully this guide can be your little cheat sheet.
Sometimes it is not the material we learn that is difficult; it is our learning methods and resources that contribute to the challenge.