In this tutorial, we'll walk you through the setup of your deep learning environment using Google Cloud Platform (GCP). Please follow the steps carefully, and don't hesitate to ask for help if you run into problems.

Important note: these setup instructions are part of Assignment 0.

Step 1: Log into Google Cloud Account

  1. Log into your LionMail account

  2. Visit https://cloud.google.com/ and sign in using your cloud.cs account. You should see your account info on the top right of the page.

  3. As a new user, you can get $300 of free credits by clicking 'Try it free'. Note that these free credits may or may not allow you to use GPU resources; it varies from account to account. You can explore GCP for a while with the free credits.



  4. Redeem your educational coupon. During course pre-registration, each student will receive $20 in credits (the coupon code will be sent to your LionMail). Later, every enrolled student will receive an additional $150 in credits. GPU usage costs approximately $1/hour, so please manage your resources wisely. A good way to do this is to create a local deep learning environment, debug your code there, and only then run it on Google Cloud.

    If you have received the coupon code, go to https://console.cloud.google.com/education, select your cloud.cs account on the top right, and redeem the coupon.

    Now you can visit your Google Cloud dashboard.


Step 2: Create your project and Google Compute Engine (GCE) instance

  1. Create your project. For administrative reasons, we suggest that you use 'ecbm4040-your_uni' as your project name. Choose the correct billing account.

  2. Create a new GCE virtual machine (VM) instance. We provide you with two options:

    • To create a VM instance based on the image provided by the ECBM E4040 TAs, which includes everything you need (CUDA, Miniconda, Jupyter, TensorFlow, etc.), proceed with the following steps.
    • If you're interested in exploring GCP and would like to configure your own VM instance from zero, see gcp-from-zero.

    First, make sure to select the project you just created.


    Check your available GPU quota. From the left navigation bar, select 'IAM & Admin' -> 'Quotas'.


    In the quotas page, set Quota type to 'All quotas', then locate Google Compute Engine API - NVIDIA K80 GPUs in the us-east1 region. If your quota is 0, select it and click 'Edit Quotas'. Give Google a moment to process your request; you should receive an e-mail informing you of the increase.

    Note: If you don't see the GPU option in the quotas page (a known issue that Google has not yet fixed), try visiting GCP in your browser's incognito mode, and clear your browser's cookies and cache. Switching to another browser (Firefox, Edge, Safari, etc.) might also help.


    Now, go to 'Compute Engine' menu and choose 'VM Instances'.

    Configure your instance as follows, and click 'Create':

    • Set zone as us-east1-d.
    • Choose the number of CPUs and memory size as you wish.
    • Select 1 Tesla K80 GPU.
    • Select the image "ecbm4040-students-image-py3-tf13" under project "ecbm4040-tas-dl" with project id "dltest-167717" as the boot disk.
    • Check 'Allow HTTP traffic' and 'Allow HTTPS traffic'.



    Wait a few minutes. The newly created VM instance will start running as soon as it is created. You are charged only while the instance is running, so remember to stop it every time you finish your work.


Step 3: Connect to Your Instance

There are two methods to connect.

Note: the image we provide has everything installed under the user 'ecbm4040'. If you SSH in as another user, some components (such as Miniconda) will not work.

Step 4: Tools

This step lets you check whether the aforementioned tools are available.

  1. CUDA examination. First, check whether the GPU device is available:

    ecbm4040@your-instance-name: $ nvidia-smi

    If a GPU is available, this will show some basic info about your GPU device.

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla K80           Off  | 0000:00:04.0     Off |                    0 |
    | N/A   49C    P0    71W / 149W |      0MiB / 11439MiB |    100%      Default |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID  Type  Process name                               Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
                

    Second, check CUDA toolkits.

    ecbm4040@your-instance-name: $ nvcc -V

    If it is correctly installed, this command will print its version.

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2016 NVIDIA Corporation
    Built on Tue_Jan_10_13:22:03_CST_2017
    Cuda compilation tools, release 8.0, V8.0.61
  2. Miniconda. Miniconda is a lightweight version of Anaconda that helps you manage different Python environments. In our source image, a virtual Python environment 'dlenv' has already been set up; you just need to use the following command to activate it. We also recommend using this environment for your future assignments.

    ecbm4040@your-instance-name: $ source activate dlenv

    After the activation, you can review all installed packages by

    (dlenv)ecbm4040@your-instance-name: $ conda list
  3. TensorFlow is an open-source library for deep learning developed by Google. To check its installation, type python, and try running the following code:

    (Note: don't mistake the Python prompt >>> for the Linux command prompt $.)

    >>> import tensorflow as tf
    >>> # Create a graph.
    >>> a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    >>> b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    >>> c = tf.matmul(a, b)
    >>> # Create a session with log_device_placement set to True.
    >>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
    >>> # Run the op.
    >>> print(sess.run(c))

    If TensorFlow is correctly installed and using the GPU backend, you will see output similar to:

    Found device 0 with properties:
    name: Tesla K80
    major: 3 minor: 7 memoryClockRate (GHz) 0.8235
    pciBusID 0000:00:04.0
    Total memory: 11.17GiB
    Free memory: 11.11GiB
    2017-05-19 01:52:26.255195: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 
    2017-05-19 01:52:26.255216: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y 
    2017-05-19 01:52:26.255229: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0)
    Device mapping:
    /job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K80, pci bus id: 0000:00:04.0
    2017-05-19 01:52:26.320382: I tensorflow/core/common_runtime/direct_session.cc:257] Device mapping:
    /job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K80, pci bus id: 0000:00:04.0
    
    MatMul: (MatMul): /job:localhost/replica:0/task:0/gpu:0
    2017-05-19 01:52:40.431855: I tensorflow/core/common_runtime/simple_placer.cc:841] MatMul: (MatMul)/job:localhost/replica:0/task:0/gpu:0
    b: (Const): /job:localhost/replica:0/task:0/gpu:0
    2017-05-19 01:52:40.431913: I tensorflow/core/common_runtime/simple_placer.cc:841] b: (Const)/job:localhost/replica:0/task:0/gpu:0
    a: (Const): /job:localhost/replica:0/task:0/gpu:0
    2017-05-19 01:52:40.431929: I tensorflow/core/common_runtime/simple_placer.cc:841] a: (Const)/job:localhost/replica:0/task:0/gpu:0
    [[ 22.  28.]
     [ 49.  64.]]

    Type exit() to quit Python.
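
    The matrix product above is easy to sanity-check without a GPU or TensorFlow. The following NumPy snippet (our own addition, not part of the assignment) multiplies the same matrices, so you can confirm the expected result by hand on your local machine:

    ```python
    import numpy as np

    # Same values and shapes as in the TensorFlow snippet above.
    a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]).reshape(2, 3)
    b = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]).reshape(3, 2)

    # Matrix multiplication: (2x3) @ (3x2) -> (2x2).
    c = a @ b
    print(c)
    # [[ 22.  28.]
    #  [ 49.  64.]]
    ```

    If the TensorFlow session printed a different matrix, something is wrong with the installation rather than with the math.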


Step 5: Jupyter Notebook

Jupyter is a web-based Python programming environment that lets you edit code, display plots, and show animations. You can even write your whole report in a Jupyter notebook, since it supports LaTeX syntax. For future assignments, we also require you to use Jupyter to show your work. Now let's open a Jupyter notebook on your Google Cloud VM instance.

Jupyter has been installed in the 'dlenv' virtual environment.

  1. Configure your Jupyter notebook

    First, generate a new configuration file.

    (dlenv)ecbm4040@your-instance-name: $ jupyter notebook --generate-config

    Open this config file.

    (dlenv)ecbm4040@your-instance-name: $ vi ~/.jupyter/jupyter_notebook_config.py

    Add the following lines to the file. If you are new to Linux and don't know how to use the vi editor, see this quick tutorial: https://www.cs.colostate.edu/helpdocs/vi.html

    c = get_config()
    c.NotebookApp.ip = '*'
    c.NotebookApp.open_browser = False
    c.NotebookApp.port = 9999      # or another port number
  2. Generate your Jupyter login password.

    (dlenv)ecbm4040@your-instance-name: $ jupyter notebook password
    Enter password:  
    Verify password: 
    [NotebookPasswordApp] Wrote hashed password to /home/ecbm4040/.jupyter/jupyter_notebook_config.json
  3. Open Jupyter notebook.

    (dlenv)ecbm4040@your-instance-name: $ jupyter notebook

    Now, your Jupyter notebook server is running remotely. You need to connect your local computer to the server in order to view your Jupyter notebooks with your browser.

  4. Open a console and use SSH to connect to the Jupyter notebook server. Type in the following command to set up a connection with your remote instance. Note that in "-L 9999:localhost:9999", the first "9999" is your local port, and you can use a different port number if you want. The second "9999" is the remote port number, and it must match the port the Jupyter notebook server is running on.

    gcloud compute ssh --ssh-flag="-L 9999:localhost:9999"  --zone "us-east1-d" "ecbm4040@your-instance-name"
  5. Open your browser (Chrome, Firefox, etc.) and go to http://localhost:9999; you will be directed to your remote Jupyter server. Type in the Jupyter password you created before, and you can now enter your home directory.
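
    If the page doesn't load, you can check whether the tunnel is actually forwarding the port. This small helper (our own sketch, not part of the assignment) tries to open a TCP connection to the local end of the tunnel from your local machine:

    ```python
    import socket

    def port_open(host, port, timeout=2.0):
        """Return True if something accepts TCP connections at host:port."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    # After opening the SSH tunnel, this should print True.
    # (9999 assumes you kept the default local port from the gcloud command.)
    print(port_open("localhost", 9999))
    ```

    If it prints False, the SSH tunnel is not up (or you forwarded a different local port); re-run the gcloud command from the previous step.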



Now, you have finished most of Assignment 0. Please proceed to the rest of it.

Optional: If you would like to play with Jupyter notebooks and explore them further, here are some interesting tutorial links.

http://jakevdp.github.io/blog/2017/03/03/reproducible-data-analysis-in-jupyter/

http://ipywidgets.readthedocs.io/en/latest/examples/Lorenz%20Differential%20Equations.html


ECBM E4040 Neural Networks and Deep Learning, 2017.

Columbia University