In this tutorial, we'll walk you through setting up your deep learning environment on Google Cloud Platform (GCP). Please follow the steps carefully, and don't hesitate to ask for help if you run into problems.
Important note: these setup instructions are part of Assignment 0.
Go to the Google Cloud Console (https://cloud.google.com/) and sign in with your LionMail account (yourUNI@columbia.edu).
As a new user, you can get $300 in free credits by clicking 'Try it free'. However, these free credits may not allow you to use GPU resources. You can use them to explore GCP for a while.
Redeem your educational coupon. NEW COUPON RULE TBA. Using a GPU costs approximately $1/hour, so please manage your resources wisely. A good way to do this is to set up a local deep learning environment, debug your code there, and only then run it on Google Cloud.
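To plan around the roughly $1/hour GPU rate, a back-of-the-envelope calculation helps. The sketch below is only illustrative: the $100 budget is a hypothetical figure (the actual coupon rules are TBA), the $1/hour rate is the approximate figure quoted above, and real bills also include CPU, disk and network charges that it ignores.

```python
def affordable_gpu_hours(budget_usd, rate_usd_per_hour=1.0):
    """Hours of GPU time a credit budget covers at a flat hourly rate."""
    if rate_usd_per_hour <= 0:
        raise ValueError("rate must be positive")
    return budget_usd / rate_usd_per_hour


def hours_left(budget_usd, hours_used, rate_usd_per_hour=1.0):
    """GPU hours still available after some usage (never negative)."""
    remaining_usd = max(budget_usd - hours_used * rate_usd_per_hour, 0.0)
    return affordable_gpu_hours(remaining_usd, rate_usd_per_hour)


if __name__ == "__main__":
    budget = 100.0  # hypothetical coupon value; the real amount is TBA
    print(affordable_gpu_hours(budget))       # 100.0
    print(hours_left(budget, hours_used=30))  # 70.0
```

Every hour an instance is left running counts against this budget, which is why debugging locally first pays off.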
If you have received the coupon code, go to https://console.cloud.google.com/education, select your LionMail account at the top right, and redeem the coupon.
Now, you can visit your Google Cloud dashboard.
Create your project: click 'Select a project' -> 'NEW PROJECT'. For administrative reasons, we suggest that you use 'ecbm4040-yourUNI' as your project name. Choose the correct billing account, i.e. the one from the coupon. After a few seconds, you should see your newly created project's homepage.
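The console derives a project ID from the project name, and GCP project IDs must be 6 to 30 characters of lowercase letters, digits or hyphens, starting with a letter and not ending with a hyphen. The hypothetical helpers below build the suggested 'ecbm4040-yourUNI' ID and check it against those rules; the ID must also be globally unique, which only the console can confirm.

```python
import re

# GCP project ID rules: 6-30 chars of [a-z0-9-], first char a letter,
# last char not a hyphen (1 + 4..28 + 1 characters).
PROJECT_ID_RE = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")


def suggested_project_id(uni):
    """Build the suggested 'ecbm4040-yourUNI' ID, lower-casing the UNI."""
    return "ecbm4040-" + uni.strip().lower()


def is_valid_project_id(project_id):
    """True if project_id satisfies the format rules (not uniqueness)."""
    return PROJECT_ID_RE.match(project_id) is not None


if __name__ == "__main__":
    pid = suggested_project_id("AB1234")  # hypothetical UNI
    print(pid, is_valid_project_id(pid))
```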
Create a new GCE virtual machine (VM) instance. We provide you with two options:
Make sure to select the project you just created, 'ecbm4040-yourUNI'.
Enable the Compute Engine API and edit your GPU quota.
If your quota is 0, select it, click 'Edit Quotas', and submit a request. Wait for Google to process the request; you should receive an e-mail confirming that it succeeded.
Now you can create your instance.
Note: check GPU availability on this site; you may need to choose a different zone.
In the boot disk section, select the custom image 'ecbm4040-imageforstudents-tf18cuda90cudnn705' under the project 'ecbm4040-ta'.
This image provides TensorFlow 1.8.0, CUDA 9.0 and cuDNN v7.0.5.
A 50GB disk should be enough for the course.
Note: you can always create another instance with more computing power for your project; the procedure is the same.
There are two ways to connect to your instance.
Note: the image we provide has everything installed under the user 'ecbm4040'. If you SSH in as another user, some components (such as Miniconda) will not work.
Connect online. Click the 'SSH' button next to your running instance. This method is very useful for uploading files from your local machine to the instance.
After connecting, switch your Linux user to 'ecbm4040'.
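Since Miniconda and 'dlenv' only work for the 'ecbm4040' user, a quick sanity check after switching users can save some confusion. This is a minimal sketch with hypothetical helper names; running whoami in the shell gives the same information.

```python
import getpass

EXPECTED_USER = "ecbm4040"  # user that owns Miniconda and 'dlenv' in the course image


def current_user():
    """Login name of the current process, or 'unknown' if it cannot be determined."""
    try:
        return getpass.getuser()
    except Exception:  # e.g. no passwd entry inside some containers
        return "unknown"


def check_user(user=None):
    """Return a short message saying whether `user` is the course user."""
    user = user if user is not None else current_user()
    if user == EXPECTED_USER:
        return "OK: running as '%s'" % user
    return "Warning: running as '%s', expected '%s'" % (user, EXPECTED_USER)


if __name__ == "__main__":
    print(check_user())
```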
Connect with the Google Cloud SDK. This method is recommended and will be used later for Jupyter Notebook.
For this method, you need to install the Google Cloud SDK first.
After the installation, the SDK can be called from the command line. Open a console and initialize your gcloud account with the gcloud init command. If this is the first time you have installed the SDK, you'll be directed to a website. Settings such as the zone and project ID should match your earlier online settings.
After the initialization, type gcloud init again, and you should see something like the following:
Now you can use ssh tools provided by Google Cloud SDK to connect to your instance with the following command:
gcloud compute ssh ecbm4040@your_instance_name
When you are done, type exit to close the connection.
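If you connect often, you may want to script the SSH command. The sketch below only assembles the same gcloud compute ssh invocation as a Python argument list. The --zone flag is optional (without it, gcloud uses your configured default zone or asks you to choose one), and the helper name and instance name are placeholders.

```python
import shlex


def gcloud_ssh_cmd(instance, user="ecbm4040", zone=None):
    """Argument list for 'gcloud compute ssh USER@INSTANCE [--zone ZONE]'."""
    cmd = ["gcloud", "compute", "ssh", "%s@%s" % (user, instance)]
    if zone:
        cmd += ["--zone", zone]
    return cmd


if __name__ == "__main__":
    cmd = gcloud_ssh_cmd("your-instance-name", zone="us-east1-d")
    # Print a shell-quoted version you can paste into a terminal.
    print(" ".join(shlex.quote(arg) for arg in cmd))
    # To connect directly instead: import subprocess; subprocess.run(cmd, check=True)
```

Building an argument list rather than one string avoids shell-quoting problems if you later pass it to subprocess.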
This step lets you check whether the tools mentioned above are available.
CUDA examination. First, check whether a GPU device is available:
ecbm4040@your-instance-name: $ nvidia-smi
If a GPU is available, the command shows basic information about your GPU device.
Second, verify the CUDA installation.
ecbm4040@your-instance-name: $ nvcc -V
If CUDA is correctly installed, this command prints its version.
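Both checks can also be scripted, for example to confirm that a new instance is ready before a long run. This sketch assumes nvidia-smi and nvcc are on the PATH of the 'ecbm4040' user and that nvcc -V prints a line such as "Cuda compilation tools, release 9.0, ..."; it calls subprocess.run with universal_newlines instead of newer keywords so it also works on older Python 3 versions.

```python
import re
import shutil
import subprocess


def run_tool(args):
    """Run a command-line tool; return its stdout, or None if missing or failing."""
    if shutil.which(args[0]) is None:
        return None
    result = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    return result.stdout if result.returncode == 0 else None


def cuda_release(nvcc_output):
    """Extract the release number (e.g. '9.0') from 'nvcc -V' output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output or "")
    return match.group(1) if match else None


if __name__ == "__main__":
    print("nvidia-smi works:", run_tool(["nvidia-smi"]) is not None)
    print("CUDA release:", cuda_release(run_tool(["nvcc", "-V"])))
```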
Miniconda is a lightweight version of Anaconda that helps you manage different Python environments. In our source image, a virtual Python environment 'dlenv' has already been set up; use the following command to activate it. We also recommend using this environment for your future assignments.
ecbm4040@your-instance-name: $ source activate dlenv
After activation, you can list all installed packages with:
(dlenv)ecbm4040@your-instance-name: $ conda list
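If you only care about a few packages, you can filter the conda list output instead of scrolling through it. The parser below assumes the usual layout (comment lines starting with '#', then name and version as the first two columns). The package names in the usage part are guesses, since the image may install TensorFlow as either tensorflow or tensorflow-gpu.

```python
import shutil
import subprocess


def parse_conda_list(text):
    """Map package name -> version from 'conda list' output, skipping comments."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            packages[fields[0]] = fields[1]
    return packages


if __name__ == "__main__":
    if shutil.which("conda"):
        out = subprocess.run(["conda", "list"], stdout=subprocess.PIPE,
                             universal_newlines=True).stdout
        pkgs = parse_conda_list(out)
        for name in ("tensorflow-gpu", "tensorflow", "jupyter"):  # assumed names
            print(name, pkgs.get(name, "not found"))
```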
TensorFlow is an open-source deep learning library developed by Google. To check its installation, type python and try to run the following code (note: don't mistake the Python prompt >>> for the Linux command prompt $):
>>> import tensorflow as tf
>>> # Creates a graph.
>>> a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
>>> b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
>>> c = tf.matmul(a, b)
>>> # Creates a session with log_device_placement set to True.
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
>>> # Runs the op.
>>> print(sess.run(c))
If TensorFlow is correctly installed and using the GPU as its backend, you will see something like this:
Found device 0 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:04.0
Total memory: 11.17GiB
Free memory: 11.11GiB
2017-05-19 01:52:26.255195: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
2017-05-19 01:52:26.255216: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0: Y
2017-05-19 01:52:26.255229: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0)
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K80, pci bus id: 0000:00:04.0
2017-05-19 01:52:26.320382: I tensorflow/core/common_runtime/direct_session.cc:257] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K80, pci bus id: 0000:00:04.0
MatMul: (MatMul): /job:localhost/replica:0/task:0/gpu:0
2017-05-19 01:52:40.431855: I tensorflow/core/common_runtime/simple_placer.cc:841] MatMul: (MatMul)/job:localhost/replica:0/task:0/gpu:0
b: (Const): /job:localhost/replica:0/task:0/gpu:0
2017-05-19 01:52:40.431913: I tensorflow/core/common_runtime/simple_placer.cc:841] b: (Const)/job:localhost/replica:0/task:0/gpu:0
a: (Const): /job:localhost/replica:0/task:0/gpu:0
2017-05-19 01:52:40.431929: I tensorflow/core/common_runtime/simple_placer.cc:841] a: (Const)/job:localhost/replica:0/task:0/gpu:0
[[ 22. 28.]
[ 49. 64.]]
Type exit() to quit Python.
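To see why the expected result is [[22, 28], [49, 64]], note that tf.constant fills the given shape in row-major order, so a = [[1, 2, 3], [4, 5, 6]] and b = [[1, 2], [3, 4], [5, 6]]. The plain-Python sketch below (no TensorFlow needed; the helper names are ours) reproduces the same product, which lets you check the numbers independently of device placement.

```python
def reshape(values, rows, cols):
    """Row-major reshape, like tf.constant(values, shape=[rows, cols])."""
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]


def matmul(a, b):
    """Multiply matrices given as lists of rows: c[i][j] = sum_k a[i][k] * b[k][j]."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
a = reshape(vals, 2, 3)  # [[1, 2, 3], [4, 5, 6]]
b = reshape(vals, 3, 2)  # [[1, 2], [3, 4], [5, 6]]
print(matmul(a, b))      # [[22.0, 28.0], [49.0, 64.0]]
```

For example, the top-left entry is 1*1 + 2*3 + 3*5 = 22.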
Jupyter is a web-based Python programming environment that lets you edit code, display plots and show animations. You can even write your whole report in a Jupyter notebook, since it supports LaTeX. For future assignments, we also require you to use Jupyter to present your work. Now let's go through how to open a Jupyter notebook on your Google Cloud VM instance.
Jupyter has been installed in the 'dlenv' virtual environment.
Configure your Jupyter notebook
First, generate a new configuration file.
(dlenv)ecbm4040@your-instance-name: $ jupyter notebook --generate-config
Open this config file.
(dlenv)ecbm4040@your-instance-name: $ vi ~/.jupyter/jupyter_notebook_config.py
Add the following lines to the file. If you are new to Linux and don't know how to use the vi editor, see this quick tutorial: https://www.cs.colostate.edu/helpdocs/vi.html
c = get_config()
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.port = 9999  # or another port number
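If you prefer not to edit the file in vi, the same four settings can be appended with a small script. This is a hypothetical helper, not part of Jupyter: it assumes the default config path generated above, and it skips appending when an uncommented c.NotebookApp.port line is already present (the commented-out defaults written by --generate-config are ignored).

```python
import os

# The settings from the step above; 9999 is the tutorial's default port.
NOTEBOOK_SETTINGS = """c = get_config()
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.port = {port}
"""

DEFAULT_PATH = os.path.expanduser("~/.jupyter/jupyter_notebook_config.py")


def append_notebook_config(path=DEFAULT_PATH, port=9999):
    """Append the settings unless an uncommented c.NotebookApp.port line exists.

    Returns True if the file was modified, False if it was left alone.
    """
    existing = ""
    if os.path.exists(path):
        with open(path) as f:
            existing = f.read()
    # Commented defaults start with '#', so they do not match here.
    if any(line.strip().startswith("c.NotebookApp.port")
           for line in existing.splitlines()):
        return False
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, "a") as f:
        f.write("\n" + NOTEBOOK_SETTINGS.format(port=port))
    return True
```

Usage: run append_notebook_config() once inside the 'dlenv' environment on the instance, then open the file to double-check the result.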
Generate your Jupyter login password (press Enter for no password).
(dlenv)ecbm4040@your-instance-name: $ jupyter notebook password
Enter password:
Verify password:
[NotebookPasswordApp] Wrote hashed password to /Users/you/.jupyter/jupyter_notebook_config.json
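To confirm that the password step wrote something, you can inspect the JSON file it reports. This sketch assumes the hash is stored under the NotebookApp -> password key (the layout used by the classic notebook package; other Jupyter versions may differ). It only checks that a hash is present and cannot tell whether you chose an empty password.

```python
import json
import os

DEFAULT_JSON = os.path.expanduser("~/.jupyter/jupyter_notebook_config.json")


def notebook_password_set(path=DEFAULT_JSON):
    """True if the JSON config holds a non-empty NotebookApp.password hash."""
    try:
        with open(path) as f:
            config = json.load(f)
    except (OSError, ValueError):  # missing file or invalid JSON
        return False
    section = config.get("NotebookApp", {}) if isinstance(config, dict) else {}
    return bool(section.get("password"))


if __name__ == "__main__":
    print("password hash present:", notebook_password_set())
```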
Open Jupyter notebook.
(dlenv)ecbm4040@your-instance-name: $ jupyter notebook
Your Jupyter notebook server is now running remotely. To view your notebooks in your browser, you need to connect your local computer to the server.
Open a console on your local computer and use SSH to connect to the Jupyter notebook server. Type the following command to set up a connection with your remote instance (replace us-east1-d with your instance's zone). Note that in "-L 9999:localhost:9999", the first "9999" is your local port, which you can change if you want. The second "9999" is the remote port, which must match the port the Jupyter notebook server is running on.
gcloud compute ssh --ssh-flag="-L 9999:localhost:9999" --zone "us-east1-d" "ecbm4040@your-instance-name"
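The port rule from the note above can be made explicit in code. The hypothetical helpers below split an ssh -L spec into its three parts and build the --ssh-flag value, with the remote port taken from your c.NotebookApp.port setting. ssh's optional bind-address prefix and IPv6 hosts are deliberately not handled.

```python
def parse_local_forward(spec):
    """Split 'LOCAL_PORT:HOST:REMOTE_PORT' (ssh -L syntax) into its three parts."""
    local, host, remote = spec.split(":")  # ValueError unless exactly 3 parts
    local_port, remote_port = int(local), int(remote)
    for port in (local_port, remote_port):
        if not 1 <= port <= 65535:
            raise ValueError("port out of range: %d" % port)
    return local_port, host, remote_port


def tunnel_flag(local_port=9999, jupyter_port=9999):
    """--ssh-flag value forwarding local_port to Jupyter's port on the instance.

    jupyter_port must equal c.NotebookApp.port in the remote config.
    """
    spec = "%d:localhost:%d" % (local_port, jupyter_port)
    parse_local_forward(spec)  # validate the port numbers
    return "--ssh-flag=-L " + spec


if __name__ == "__main__":
    # Use local port 8888 while Jupyter listens on 9999 on the instance.
    print(tunnel_flag(8888, 9999))  # --ssh-flag=-L 8888:localhost:9999
```

In that case you would browse to http://localhost:8888 locally. When passing the flag through a shell, quote it as in the command above.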
Open your browser (Chrome, IE, etc.) and go to http://localhost:9999 or https://localhost:9999. You will be directed to your remote Jupyter server. Type in the Jupyter password you created before, and you can then access your home directory.
An optional way to connect to Jupyter Notebook without the Google Cloud SDK: start the notebook server from an SSH session in the GCP console, then go to http://yourExternalIP:9999 in your browser.
You should be directed to the Jupyter notebook homepage.
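When using the external-IP route, a quick reachability test from your local machine tells you whether the port answers at all. This is a minimal sketch: EXTERNAL_IP is a placeholder you must replace, and a False result may mean the notebook server is not running or that the port is blocked, for example by a firewall rule.

```python
import socket

EXTERNAL_IP = "127.0.0.1"  # placeholder: replace with your instance's external IP


def port_is_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host unreachable
        return False


if __name__ == "__main__":
    print("port 9999 reachable:", port_is_open(EXTERNAL_IP, 9999))
```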
You have now finished the prerequisites of Assignment 0. Please proceed to the rest of it.
Optional: if you'd like to play with Jupyter notebook and explore it further, here are some interesting tutorials:
http://jakevdp.github.io/blog/2017/03/03/reproducible-data-analysis-in-jupyter/
http://ipywidgets.readthedocs.io/en/latest/examples/Lorenz%20Differential%20Equations.html
ECBM E4040 Neural Networks and Deep Learning, 2017.
Columbia University