AWS Learner Lab Tutorial: A Comprehensive Guide
This article provides a comprehensive guide to using AWS Academy Learner Lab, covering everything from initial setup and EC2 instance creation to running Jupyter Notebooks and practicing basic shell scripting. This tutorial is designed to help students in courses like DS5110/CS5501 effectively utilize AWS resources for their assignments.
Introduction to AWS Academy Learner Lab
AWS Academy provides access to a wide range of computing, storage, and network resources on Amazon Web Services (AWS). The Learner Lab environment offers a pre-configured, sandboxed environment for students to learn and experiment with AWS services without the risk of incurring unexpected costs.
Initial Setup and Registration
Before using AWS Academy Learner Lab, registration through AWS Academy Canvas is required. Note that this is different from UVA Canvas. After logging in to AWS Academy Canvas, navigate to the AWS Academy Learner Lab under the Dashboard. Accept the invitation to join the AWS Academy Learner Lab. Click on AWS Academy Learner Lab at the Dashboard and then click on Modules to view the available modules. It's highly recommended to go through the AWS Academy Learner Lab Student Guide to familiarize yourself with the environment, including how to start a lab and track your credit usage.
Launching the Learner Lab
To start using AWS resources, click on Launch AWS Academy Learner Lab under Modules. This leads to the Terms and Conditions, which you need to accept to enter the lab session. The lab session provides a web terminal interface with limited functionality. Start the lab by clicking on Start Lab at the top of the web terminal interface. This process may take a few minutes, especially the first time, as AWS Academy creates a temporary AWS account for you.
Monitoring Your Budget
The Learner Lab provides a cloud credit of $100. Monitor your lab budget in the lab interface. The remaining budget information is displayed at the top of the screen. This data comes from AWS Budgets, which typically updates every 8 to 12 hours. If you exceed your lab budget, your lab account will be disabled, and all progress and resources will be lost. It is highly recommended to push your progress to online repositories such as GitHub frequently to avoid data loss. When using GitHub or any other online repository service, create a private repository and DO NOT share any of your code with other students or the internet.
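Because the budget display can lag by several hours, a rough estimate of your own can help you plan. The sketch below divides a remaining budget by an hourly rate; the function name hours_left and the rate in the example are assumptions for illustration, not values from the lab, so check current EC2 pricing for your instance type.

```shell
#!/bin/bash
# Estimate how many instance-hours a remaining budget can cover.
# Usage: hours_left <remaining_budget_usd> <hourly_rate_usd>
hours_left() {
    # awk performs the floating-point division; %d truncates to whole hours.
    awk -v budget="$1" -v rate="$2" 'BEGIN { printf "%d\n", budget / rate }'
}

# Example: the full $100 credit at an ASSUMED rate of $0.0832 per hour.
# hours_left 100 0.0832
```

If you run several instances at once, multiply the hourly rate by the number of instances before calling the function. Storage and data transfer are not included in this estimate.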
Accessing the AWS Console
Once the lab has started (indicated by a green signal light), click on the green signal light to access your AWS Console Home web page. A started lab session has a session duration of up to four hours. When the lab session timer runs to 0:00, the session will automatically end, but any data and resources created in the AWS account will be retained. Running EC2 instances will be stopped and then automatically restarted the next time you start a session.
Part 1: Creating and Accessing EC2 Instances
At the AWS Console Home page, click on Services at the top left corner and click on Compute to view all the available computing services. Click on EC2 to start creating new EC2 VM (virtual machine) instances.
Step 1: Create a Name and Choose an OS Image
Under EC2, click on Launch instances to start creating a new VM. Type a name to label your VM under Name and tags. For example, name your first instance vm0. It is recommended to use Ubuntu Server 22.04 LTS as the OS image and 64-bit (x86) as the architecture. Specify the number of instances to launch at this time. To start off, choose 1. Later for Assignment 1 choose 2 and for Assignment 2 choose 5.
Step 2: Choose an EC2 Instance Type
Next, choose an EC2 instance type. For testing, you can always create one or more t2.micro or t1.micro instances, both of which are free tier eligible. Note, however, that the Learner Lab offers only a limited range of EC2 instance types. It is recommended to use the t3.large instance type, which comes with 2 vCPU cores and 8 GB of memory.
Step 3: Choose an SSH Key Pair
When creating new VMs, it is important to choose a key pair for SSH login. Under Key pair (login), click on the drop-down menu and select vockey (type rsa).
Step 4: Check HTTP/HTTPS Options
Under Network settings, you may check Allow HTTPS traffic from the internet and Allow HTTP traffic from the internet so that you can access web servers hosted on your EC2 VM. (Apache Spark comes with a web-based dashboard, which requires HTTP/HTTPS traffic.)
Step 5: Configure Storage
You may also increase the storage capacity of the Root volume under Configure storage. By default, you will be allocated a small 8 GB root disk. It is recommended to increase it to 100 GB so that you have sufficient disk capacity for installing dependencies and storing datasets. Optionally, you can also add a new 100 GB EBS volume in case the 100 GB root volume runs out of capacity.
Step 6: Configure the Subnet Availability Zone
It is recommended to launch all your EC2 instances in the same availability zone. Any US East zone should be fine. This can be configured in Network settings. Once you have chosen a zone, stick with it whenever you create new EC2 instances; this gives the best network performance among your EC2 instances. For example, you may choose a subnet within the us-east-1a availability zone and keep using it.
Step 7: Launch the EC2 Instance
After finalizing the EC2 VM configuration, click on the Launch instance button on the right side to launch the configured VM. It may take a few minutes to start the VM.
Step 8: Download the SSH Private Key
Now go back to the Learner Lab web page. Click on the AWS Details tab to download the SSH key. Under SSH key, click on Download PEM to download the SSH key to your local computer. It comes with a default name of labsuser.pem. To view the connection instructions, click on the instance ID in your AWS Console, then click on Connect at the top right corner of the page to view the public DNS address of your EC2 instance. At this point you should be able to log in via SSH to the remote Linux VM that you have just created on AWS.
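As a minimal sketch of what the login command typically looks like, the helper below only prints the command so you can review it before running it. The function name ssh_login_command and the DNS name in the example are hypothetical placeholders; ubuntu is the default login user on Ubuntu Server images.

```shell
#!/bin/bash
# Print the SSH command for an Ubuntu EC2 instance (dry run: nothing is executed).
# Usage: ssh_login_command <path_to_pem> <public_dns>
ssh_login_command() {
    local key="$1" host="$2"
    # "ubuntu" is the default login user on Ubuntu Server images.
    printf 'ssh -i %s ubuntu@%s\n' "$key" "$host"
}

# Example with a placeholder DNS name (replace it with the one shown under Connect):
# ssh_login_command ~/labsuser.pem ec2-XX-XX-XX-XX.compute-1.amazonaws.com
```

Copy the printed command into your terminal, or use the exact command that the Connect page suggests if it differs.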
Troubleshooting SSH Issues
If you have trouble signing up for AWS Academy because the term item does not show up, the cause is a browser compatibility issue. Update Firefox and then try logging in with Firefox.
For Windows users, it is recommended to install WSL (the Windows Subsystem for Linux); see Microsoft's WSL installation documentation. With WSL, you can use the familiar Linux commands for SSH, file operations, and all other kinds of shell-related tasks. If you work with WSL, make sure to store your SSH key file under your Ubuntu home directory, because the Windows file system does not work well with chmod 400. To get to your home directory (/home/<user>), type cd with no arguments.
Accessing an SSH key file stored under a cloud drive (OneDrive, Dropbox, or Google Drive) might cause a permission issue. If that happens, move the .pem key file to a local directory that is not under any cloud-managed folder; that should fix the permission issue.
To copy your submission file from the remote EC2 instance to your local computer, use scp (secure copy over SSH). Note that calling scp on the EC2 instance to push a file out will not work: your computer is behind NAT, so it is almost never directly addressable. Instead, run scp on your own computer to pull the file from the remote EC2 instance, which is addressable.
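The two most common fixes above can be sketched as follows, assuming an Ubuntu instance with the default ubuntu user. The helper names secure_key and scp_pull_command are hypothetical; the second one only prints the command rather than running it.

```shell
#!/bin/bash
# Restrict the private key so that only its owner can read it.
# SSH refuses to use a key file with looser permissions.
secure_key() {
    chmod 400 "$1"
}

# Print the scp command that pulls a file FROM the EC2 instance INTO the
# current local directory. Run it on your own computer, not on the instance.
# Usage: scp_pull_command <path_to_pem> <public_dns> <remote_path>
scp_pull_command() {
    local key="$1" host="$2" remote_path="$3"
    printf 'scp -i %s ubuntu@%s:%s .\n' "$key" "$host" "$remote_path"
}
```

For example, secure_key ~/labsuser.pem followed by scp_pull_command ~/labsuser.pem with your instance's public DNS name and the path of your submission file prints a command you can paste into your local terminal.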
Part 2: Setting Up an AWS-EC2-Hosted Jupyter Notebook Service
There are two ways of writing/editing PySpark programs on a remote cloud server:
- Use a shell text editor of your choice (e.g., VIM, nano, Emacs, etc.).
- Launch a Jupyter Notebook and directly write PySpark code there.
This tutorial is provided in case you need to use Jupyter Notebook for Assignment 1, though you can complete the assignment without it. Once pip3 has been installed in Step 1, running which pip3 should output something like /usr/bin/pip3 if the installation succeeded.
Step 1: Install Jupyter Notebook
To install Jupyter Notebook:
    sudo apt update
    sudo apt install -y python3-pip
    pip3 install jupyter
By default, pip3 installs everything in /home/ubuntu/.local/bin on your EC2 instance. This path, however, is not included in the environment variable $PATH, so your shell will not be able to locate programs installed through pip3.
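One way to handle the missing $PATH entry is to append the directory only when it is not already present. This is a minimal sketch; the helper name path_contains is hypothetical, and $HOME/.local/bin is assumed to resolve to /home/ubuntu/.local/bin on the instance.

```shell
#!/bin/bash
# Return success if directory $1 appears as an entry in the PATH-style string $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Append pip3's user-level bin directory to PATH only if it is missing.
LOCAL_BIN="$HOME/.local/bin"
if ! path_contains "$LOCAL_BIN" "$PATH"; then
    export PATH="$PATH:$LOCAL_BIN"
fi
```

Wrapping both sides in colons makes the match apply to whole entries only, so a longer path that merely contains the same text as a substring is not counted. The change lasts for the current shell session; adding the same lines to ~/.bashrc would make it persistent.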
Step 2: Launch Jupyter Notebook
To launch Jupyter Notebook, run the following commands:
    export PATH=$PATH:/home/ubuntu/.local/bin
    jupyter notebook --no-browser --port=8888
To reach the notebook from your own computer, connect to the instance over SSH with the -L option. The -L option forwards any connection made to a given TCP port on your local client host (8000 in this tutorial) to a port on the remote host (the EC2 instance running Jupyter Notebook, which listens on port 8888 above). On your local client machine, open a web browser, go to localhost:8000, and paste the token when prompted. Now you can start writing Python code in the Notebook GUI from your browser.
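The port forwarding described above might look like the following. This is a sketch that only prints the command; the function name tunnel_command and the host name used in any example are hypothetical, while ports 8000 and 8888 are the ones used in this tutorial.

```shell
#!/bin/bash
# Print an SSH command that forwards a local port to Jupyter on the instance.
# Usage: tunnel_command <path_to_pem> <public_dns> [local_port] [remote_port]
tunnel_command() {
    local key="$1" host="$2" local_port="${3:-8000}" remote_port="${4:-8888}"
    # -L local_port:localhost:remote_port ("localhost" is resolved on the EC2 side).
    printf 'ssh -i %s -L %s:localhost:%s ubuntu@%s\n' \
        "$key" "$local_port" "$remote_port" "$host"
}
```

Run the printed command on your local machine and keep that SSH session open while you use the notebook; closing it also closes the tunnel.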
Step 3: Work on Python Programming
Create a new Notebook and select the default Python 3 (ipykernel) Notebook kernel. Then you are good to go.
Part 3: Shell Scripting
In this part, you will practice some basic shell command and shell scripting skills by completing the following tasks.
Task 1: Check System and Environment Information
Collect some system and environment information using basic shell commands. You can use man, the Linux manual command, to find out what a command does. For example, typing man lscpu tells you what lscpu is.
Create the following files:
- os.txt: containing the output of cat /etc/os-release
- cpu.txt: containing the output of lscpu
- mem.txt: containing the output of free -m
- pip3.txt: containing the output of pip3 --version
- jupyter.txt: containing the output of jupyter --version
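The five files can be produced with ordinary output redirection. The sketch below wraps the redirection in a small helper; the function names capture and collect_info are hypothetical, and collect_info assumes that pip3 and Jupyter are already installed.

```shell
#!/bin/bash
# Run a command and save its standard output to a file.
# Usage: capture <output_file> <command> [args...]
capture() {
    local out="$1"
    shift
    "$@" > "$out"
}

# Create the five files required by Task 1 in the current directory.
collect_info() {
    capture os.txt      cat /etc/os-release
    capture cpu.txt     lscpu
    capture mem.txt     free -m
    capture pip3.txt    pip3 --version
    capture jupyter.txt jupyter --version
}
```

Calling collect_info on the EC2 instance is equivalent to typing each command with > filename appended, for example lscpu > cpu.txt.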
Task 2: Analyze StackOverflow Data
Download a zip file containing StackOverflow post data and extract it. Then, count the number of lines containing the text "python" in the extracted files.
    sudo apt install -y gdown
    gdown --id 1K_49asiVhp6jE6a_Kj9GKe3EKpWpzhK
    unzip stackoverflow.zip
    grep -c "python" posts_1.csv
    grep -c "python" posts_2.csv
In the above commands, the first command installs the download tool gdown so that you can use it to download large files from Google Drive. Try running some shell commands to extract the contents and print how many lines contain the text "python". Now, combine these commands in a count_python.sh script file; the script should start with a shebang line so that it runs with bash:
    #!/bin/bash
    unzip stackoverflow.zip
    echo "posts_1.csv: $(grep -c "python" posts_1.csv)"
    echo "posts_2.csv: $(grep -c "python" posts_2.csv)"
HINTS: You can use unzip to extract the CSV files from the zip archive. If unzip is not installed by default, run sudo apt install -y unzip to install it on your EC2 instance.
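It is worth knowing exactly what grep -c reports: it counts matching lines, not individual matches, and it is case-sensitive unless told otherwise. The sketch below contrasts the variants; the function names are hypothetical and are not part of the assignment.

```shell
#!/bin/bash
# Number of LINES containing the pattern (what grep -c reports).
count_lines() { grep -c "$1" "$2"; }

# Same, but ignoring case, so "Python" and "PYTHON" also match.
count_lines_nocase() { grep -ci "$1" "$2"; }

# Number of individual OCCURRENCES: -o prints each match on its own line.
count_matches() { grep -o "$1" "$2" | wc -l | tr -d ' '; }
```

The task asks for the number of lines containing the text "python", so count_lines corresponds to what the task statement describes; the other two are shown only for comparison.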
Deliverables
You should submit a tar.gz file to Canvas, which follows the naming convention of LastName_FirstName_ComputingID_A0.tar.gz. The submitted file should contain the following: os.txt, cpu.txt, mem.txt, pip3.txt, jupyter.txt, count_python.sh.
HINTS: You can use the following command to create a tar.gz file: tar -czvf [submission_file_name] os.txt cpu.txt mem.txt pip3.txt jupyter.txt count_python.sh.
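A minimal sketch of packaging and then checking the archive is shown below. The function names make_submission and list_submission are hypothetical; the archive name follows the naming convention stated above.

```shell
#!/bin/bash
# Build the archive name from the naming convention, then pack the six files.
# Prints the archive name on success.
# Usage: make_submission <LastName> <FirstName> <ComputingID>
make_submission() {
    local archive="${1}_${2}_${3}_A0.tar.gz"
    tar -czf "$archive" os.txt cpu.txt mem.txt pip3.txt jupyter.txt count_python.sh \
        && echo "$archive"
}

# List the contents of an archive without extracting it.
list_submission() {
    tar -tzf "$1"
}
```

Listing the archive before uploading is a quick way to confirm that all six files are present and that none of them was left out by a typo.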
Autograder
You can use the supplied autograding test suite to test your work and environment setup. Use wget to download these two files from the links above. Run python3 autograder.py to test your work. The test result will be written to a test.json file in your working directory. This will probably be your grade, but autograders are imperfect, so we reserve the right to deduct further points.
Creating an AWS Cloud Lab in Vocareum
Vocareum is a platform that simplifies the process of creating and managing cloud-based labs, which can be particularly useful for educational purposes. Here's a guide to creating an AWS Cloud Lab within Vocareum:
Creating an Assignment
- From your course page, select the 'Edit Assignments' tab and then navigate to and select the 'New Assignment' tab.
- Enter the title of your assignment and press 'Save and Continue'.
Enabling Cloud Resources
- Select 'Part 1' to access the Part settings of your Assignment.
- From the Part settings, navigate to Resources.
- Select your Cloud Vendor from the dropdown (In this case, AWS).
- When you have made your selections, click 'Save Part' before moving on.
NOTE: Always select 'Save Part' before moving on to a different section of settings.
Setting a Budget
- In the same Part settings, navigate to Budgets.
- In this section, you can determine multiple factors relating to budgeting cloud resources for learners.
- Budget can be set based on allotted time and spend, per month or in total.
- Scroll down the Budget settings further to specify resource management within the lab.
- Define session length by time (including extensions).
- Set 'End Lab' behavior to either terminate resources completely or put them in a stopped state so the student can return to them how they were left.
- You can also set whether to terminate resources if a student needs to reset Vocareum lab back to its original state.
- When you have made your selections, click 'Save Part' before moving on.
Interface Options
- In the same Part settings, navigate to 'Interface'.
- Specific to cloud labs, you can set helpful features for students in the interface, from necessary controls such as start, stop, and reset to helpful information like a timer, active budget, and progress of the lab in regards to cloud resources.
- When you have made your selections, click 'Save Part' before moving on.
Lab Policies and Templates
- From the Assignment settings of your cloud lab, select Configure Workspace to navigate to the teacher authoring environment.
- Within the Vocareum Notebook, select Files to open the File Browser.
- From the file browser, open the /voc directory.
- Navigate to voc/private, and right-click to create a new file or upload your lab policy. The file must be named lab.policy.
- If you are not uploading an existing policy, create your file and select lab.policy to open the file editor. You can use the example below as a starting point by copying it into lab.policy. This example only permits the launch of smaller EC2 instances:
{
  "policies": [
    {
      "Action": "launch",
      "Effect": "Allow",
      "Resource": "ec2.instance",
      "Condition": {
        "StringEquals": {
          "ec2:InstanceType": [
            "t2.micro",
            "t3.nano"
          ]
        }
      }
    }
  ]
}
The same process can be followed to add an AWS CloudFormation template to your lab to pre-configure AWS resources. Navigate to voc/private and right-click to create a new file or upload your CloudFormation template. In this case, the file must be named lab.template.
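Since lab.policy is a JSON file, a quick syntax check before uploading can catch a missing brace or comma. The sketch below uses the json.tool module from Python's standard library and assumes python3 is available where you run it; the function name is_valid_json is hypothetical. It checks JSON syntax only, not whether Vocareum accepts the policy's contents.

```shell
#!/bin/bash
# Return success if the file parses as JSON, failure otherwise.
# Relies on Python's built-in json.tool module (assumes python3 is installed).
is_valid_json() {
    python3 -m json.tool "$1" > /dev/null 2>&1
}

# Example:
# is_valid_json lab.policy && echo "lab.policy is well-formed JSON"
```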
Adding Lab Instructions (README.md)
- From the file browser, navigate to voc/docs.
- Right-click to upload or create a new file. The file must be named README.md.
- Select the file to open the file editor.
Accessing the AWS Management Console
The lab interface includes a button to launch the web AWS Management Console. Click the AWS button and a new tab will open. At this point, you have left the student portal and are now in the full AWS Management Console. While some features and services may be unavailable to you, the overall interface is the same as a full commercial AWS account paid for with real dollars, and any instructions or tutorials you find online will be usable here. In the top bar, you can see that your user credential is voclabs/userXXXXXX, indicating the linkage between the full AWS system and your Vocareum-created credential.
For this lab, you should create an Amazon Linux 2 virtual machine. Let's do this the quick and dirty way via the GUI to get started. Search for "Amazon Linux 2 AMI (HVM), SSD Volume Type"; it's the first option in the list. For THIS LAB only, leave "Delete on Termination" checked. This means that when you terminate your EC2 virtual machine, the corresponding EBS (Elastic Block Store) disk image containing your operating system, all your data, and anything else you did on that virtual machine will be DELETED. For today, that's fine; we're just getting our feet wet. Create a new tag. Leave the default options here. Don't be alarmed when it says your instance configuration is not eligible for the free usage tier: you have a dedicated pool of credits for the classroom.
You will be prompted to either select an existing key pair or create a new one. What is a key pair? A key pair consists of a public key that AWS stores and a private key file that you store. Together, they allow you to connect to your instance securely. For Windows AMIs, the private key file is required to obtain the password used to log in to your instance. For Linux AMIs, the private key file allows you to securely SSH into your instance. Since this is your first instance, you don't have a key pair yet. Select "Download Key Pair" to download the file "COMP-175-Lab-1.pem". DO NOT LOSE THIS FILE! YOU CANNOT OBTAIN THIS PRIVATE KEY AGAIN.
While you're viewing your instance, take a moment to confirm that you set the tag correctly. Select your instance, and then click on the "Tag" tab below. An instance can have many tags, and each tag has a "key" and a "value" associated with it. The "key" for this tag should be Name (a reserved word for AWS), and the "value" for the tag should be the human-readable string that you want to appear. Notice how AWS then places your string in the table of all instances under the Name column.
Right-click on the instance you just created. (You should be in the AWS Management Console, in the EC2 service, in the Instances panel.) Choose "Connect" and then "SSH client" as the connection method. Run your native SSH program and paste in the command that AWS suggested (with the private key, username, and hostname filled in already). If you use a graphical SSH client instead, the free "home edition" is sufficient, and either the installer edition or the portable edition is fine. Using the GUI, specify the hostname, the username, and, under the "Advanced SSH Settings" tab, the location of the private key file you wish to use in connecting to your VM.
Once connected, answer the following questions: What is my username? What is my private (AWS internal) IP address? What is the hostname of my VM? How long has the system been running?
Finally, TERMINATE YOUR EC2 VIRTUAL MACHINE to avoid paying AWS money for a virtual machine that you no longer need. Confirm that you do indeed want to terminate your instance and that you're aware that the default action is for EBS volumes to be deleted when doing so. You should be able to watch the instance shut down in the Instances panel, and confirm that the EBS image has been deleted in the Volumes panel. Go back to the Vocareum dashboard and click the "End Lab" button. There shouldn't be anything running in your lab now, and it will end automatically after 4 hours, but it's good practice to stop the lab when you're finished working.
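For the questions to answer while logged in to the VM (username, private IP address, hostname, and uptime), standard Linux commands are enough. The sketch below collects them in one place; the function name vm_report is hypothetical, and other commands give the same information.

```shell
#!/bin/bash
# Print basic facts about the machine you are logged in to.
vm_report() {
    echo "username: $(id -un)"
    echo "hostname: $(uname -n)"
    # hostname -I lists the addresses assigned to the machine; on an EC2
    # instance the first one is normally the private (internal) address.
    echo "private_ip: $(hostname -I 2>/dev/null | awk '{print $1}')"
    # uptime reports how long the system has been running.
    echo "uptime: $(uptime 2>/dev/null)"
}
```

Run vm_report after connecting over SSH and before terminating the instance.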
If you “stop” a virtual machine, you can start it again later via a single click in the GUI. Ephemeral (local) storage will be deleted, but storage on EBS (Elastic Block Store) will persist. If you “terminate” a virtual machine, you can no longer start it again and will have to create a new VM instead. Ephemeral (local) storage is lost, and depending on the setting you chose when launching the VM, the EBS disk image may also be automatically deleted. The figure below illustrates the different states your EC2 instance will be in over its lifecycle.

