-
Notifications
You must be signed in to change notification settings - Fork 1
/
jh-intro_slides.html
218 lines (218 loc) · 12.8 KB
/
jh-intro_slides.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta name="generator" content="pandoc" />
<meta name="author" content="March 7, 2017" />
<title>Savio Jupyterhub training: JupyterHub on the Berkeley Savio high-performance computing cluster</title>
<style type="text/css">code{white-space: pre;}</style>
<link rel="stylesheet" type="text/css" media="screen, projection, print"
href="http://www.w3.org/Talks/Tools/Slidy2/styles/slidy.css" />
<script src="http://www.w3.org/Talks/Tools/Slidy2/scripts/slidy.js"
charset="utf-8" type="text/javascript"></script>
</head>
<body>
<div class="slide titlepage">
<h1 class="title">Savio Jupyterhub training: JupyterHub on the Berkeley Savio high-performance computing cluster</h1>
<p class="author">
March 7, 2017
</p>
<p class="date">Chris Paciorek and Maurice Manning</p>
</div>
<div id="introduction" class="slide section level1">
<h1>Introduction</h1>
<p>We'll do this mostly as a demonstration. I encourage you to login to your account and try out the various examples yourself as we go through them.</p>
<p>The first half of this material is based on the Savio JupyterHub documention we have prepared and continue to prepare, available at <a href="http://research-it.berkeley.edu/services/high-performance-computing/using-juypterhub-savio">http://research-it.berkeley.edu/services/high-performance-computing/using-juypterhub-savio</a>.</p>
<p>The materials for this tutorial are available using git at <a href="https://github.com/ucberkeley/savio-training-jupyterhub-2017">https://github.com/ucberkeley/savio-training-jupyterhub-2017</a> or simply as a <a href="https://github.com/ucberkeley/savio-training-jupyterhub-2017/archive/master.zip">zip file</a>.</p>
<p>These <em>jh-intro.html</em> and <em>jh-intro_slides.html</em> files were created from <em>jh-intro.md</em> by running <code>make all</code> (see <em>Makefile</em> for details on how that creates the html files).</p>
<p>You can find the material from previous trainings at:</p>
<ul>
<li>Introduction to Savio (August 2016): <a href="https://github.com/ucberkeley/savio-training-intro-2017/archive/master.zip">zip file</a></li>
<li>Parallelization on Savio (September 2016): <a href="https://github.com/ucberkeley/savio-training-parallel-2017/archive/master.zip">zip file</a></li>
</ul>
</div>
<div id="outline" class="slide section level1">
<h1>Outline</h1>
<p>This training session will cover the following topics:</p>
<ul>
<li>Review of Jupyter/IPython notebook basics</li>
<li>Introduction to Jupyterhub on Savio
<ul>
<li>Logging on and getting started</li>
<li>Managing notebooks</li>
</ul></li>
<li>Parallel processing using IPython Notebooks on Savio
<ul>
<li>Setting up a basic IPython cluster on one node</li>
<li>Customizing your IPython cluster setup</li>
<li>Setting up an IPython cluster to run on multiple nodes</li>
</ul></li>
<li>Integration with Box
<ul>
<li>Use case(s) for Box integration</li>
<li>Installing the Box SDK in your kernel of choice</li>
<li>Authentication</li>
<li>Creating a client id</li>
<li>Bootstrap process and notebook</li>
<li>Notebook authentication boilerplate code</li>
<li>Retrieving data from Box</li>
<li>Pushing data to Box</li>
<li>Modularization</li>
</ul></li>
</ul>
</div>
<div id="review-of-jupyteripython-notebook-basics" class="slide section level1">
<h1>Review of Jupyter/IPython notebook basics</h1>
<p>We'll review basic usage of a notebook using <em>notebook-example.ipynb</em>.</p>
<p>I've installed the Anaconda distribution of Python (actually a reduced form version called Miniconda) on my laptop, from <a href="https://conda.io/miniconda.html">https://conda.io/miniconda.html</a> and then from the command line installed Jupyter:</p>
<pre><code>conda install jupyter</code></pre>
<p>which also installs IPython. Then I can start a Jupyter notebook server and run notebooks by simply invoking <code>jupyter notebook</code> or (to open an existing notebook) <code>jupyter notebook notebook-example.ipynb</code>.</p>
</div>
<div id="introduction-to-jupyterhub-on-savio-logging-on" class="slide section level1">
<h1>Introduction to Jupyterhub on Savio: logging on</h1>
<p>We can run notebooks on Savio, using a variety of partitions/nodes.</p>
<p>We need to login via a browser from <a href="https://ln000.brc.berkeley.edu">https://ln000.brc.berkeley.edu</a>, authenticating using our usual PIN and one-time password (from Google Authenticator).</p>
<p>We can choose from:</p>
<ul>
<li>Local server
<ul>
<li>testing</li>
<li>controlling multi-node parallel clusters</li>
</ul></li>
<li>Savio (1 node)
<ul>
<li>for parallel computation</li>
</ul></li>
<li>Savio2 (1 node)
<ul>
<li>for parallel computation</li>
</ul></li>
<li>Savio2_HTC (1 cpu)
<ul>
<li>for serial computation</li>
</ul></li>
<li>Let us know if you need other configurations (e.g., GPU nodes, big memory nodes, etc.)</li>
</ul>
<p>Then we can immediately start working on our example notebook, <em>notebook-example.ipynb</em>.</p>
<p>We could also start a terminal and get a view on the filesystem and on the processes running on the node.</p>
<p>You can use either Python 2 or Python 3, but note that Python 3 is Python 3.5.1 (not the Python 3.2.3 available through the Savio module).</p>
</div>
<div id="introduction-to-jupyterhub-on-savio-installing-python-packages" class="slide section level1">
<h1>Introduction to Jupyterhub on Savio: installing Python packages</h1>
<p>Since our JupyterHub installation uses Python 3.5.1 (not the Python 3.2.3 available via the Savio module system), to install additional Python 3 packages you'll need to run the Python 3.5.1 version of pip, e.g.,</p>
<pre><code>/global/software/sl-6.x86_64/modules/langs/python/3.5.1/bin/pip install --user statsmodels</code></pre>
<p>To install Python 2 packages, follow the usual procedure:</p>
<pre><code>module load python/2.7.8
pip install --user statsmodels</code></pre>
<p>All of the Python 2 packages available via the Python 2 module and all of the Python 3.5.1 packages we've installed in <em>/global/software/sl-6.x86_64/modules/langs/python/3.5.1/lib/python3.5/site-packages</em> and <em>/global/software/sl-6.x86_64/modules/langs/python/3.5.1/lib/python3.5</em> are available in your IPython notebooks.</p>
<p>To see what packages you've installed we can look in <em>~/.local/lib/python2.7</em> and <em>~/.local/lib/python3.5</em>.</p>
</div>
<div id="introduction-to-jupyterhub-on-savio-managing-notebooks" class="slide section level1">
<h1>Introduction to Jupyterhub on Savio: managing notebooks</h1>
<p>The key navigation points in your Jupyter browser session are:</p>
<ul>
<li><code>jupyter</code> banner: to navigate within your session (to the file manager and between processes)</li>
<li><code>Running -> shutdown</code>: to terminate an individual process</li>
<li><code>Logout</code>: to logout of the session but NOT terminate your notebook/terminal processes (running processes on compute nodes will still incur charges)</li>
<li><code>Control Panel -> Stop My Server</code>: to terminate all processes and your server</li>
</ul>
</div>
<div id="ipython-clusters-one-node-cluster----setup" class="slide section level1">
<h1>IPython clusters: one node cluster -- setup</h1>
<p>One can do parallel processing in a variety of ways in Python. Here we'll cover IPython parallel clusters. We'll start with a one node cluster. By default the parallel workers will run Python 3.5.1, so it's best to start a Python 3 notebook.</p>
<p>Such clusters (except for basic testing) should be run under the <code>Savio - 1 node</code> or <code>Savio2 - 1 node</code> job profile.</p>
<ul>
<li><p>Invoke</p>
<pre><code>/global/software/sl-6.x86_64/modules/langs/python/3.5.1/bin/ipcluster \
nbextension enable --user</code></pre>
from a terminal (either logging in via SSH or in a Jupyter terminal session).</li>
<li><p>You should see that the <code>Clusters</code> tab is now <code>IPython Clusters</code> on the main Jupyter page. If not, then stop and restart your server through <code>Control Panel</code>.</p></li>
</ul>
</div>
<div id="ipython-clusters-one-node-cluster----running" class="slide section level1">
<h1>IPython clusters: one node cluster -- running</h1>
<ul>
<li>Start a server under the <code>Savio - 1 node</code> or <code>Savio2 - 1 node</code> job profile.</li>
<li>Under <code>IPython Clusters</code>, select the number of engines (i.e., cores, cpus) and <code>Start</code> under the <code>default</code> profile.</li>
<li>Start or go to a running notebook.</li>
<li>This code should connect you to your cluster:</li>
</ul>
<pre><code>import ipyparallel as ipp # Python 3
rc = ipp.Client(profile='default', cluster_id='')</code></pre>
<p>The file <em>parallel-example.ipynb</em> has some examples of some basic usage.</p>
</div>
<div id="ipython-clusters-customization----setup" class="slide section level1">
<h1>IPython clusters: customization -- setup</h1>
<p>Suppose you want to run your workers in a different Savio partition, change the time limit, use more than one node for your computation, or use Python 2.7.8 with an IPython cluster.</p>
<p>In that case you need to set up a custom cluster profile. This involves setting values for SLURM options in such a way that IPython can submit the appropriate job to SLURM.</p>
<p>Again, this is based on Python 3.5.1 and would involve some customization for Python 2.7.8 (see comments below and in <em>custom_profile_code.py</em>).</p>
<p>Here are the steps (also documented on the Savio JupyterHub page):</p>
<ul>
<li><p>In a terminal (either from JupyterHub or not), enter:</p>
<pre><code>PROFILENAME=myNewProfile
/global/software/sl-6.x86_64/modules/langs/python/3.5.1/bin/ipython \
profile create --parallel --profile=${PROFILENAME}
# module load python/2.7.8 ipython; ipython profile create --parallel \
--profile=${PROFILENAME} # for Python 2.7.8</code></pre></li>
<li><p>Now go to the newly created directory for the profile:</p>
<pre><code>cd $HOME/.ipython/profile_${PROFILENAME}</code></pre></li>
<li><p>Add the following to <em>ipcontroller_config.py</em>:</p>
<pre><code>cat >> ipcontroller_config.py << EOF
import netifaces
c.IPControllerApp.location = netifaces.ifaddresses('eth0')[netifaces.AF_INET][0]['addr']
c.HubFactory.ip = '*'
EOF</code></pre></li>
<li><p>Now modify the options in <em>custom_profile_code.py</em> and append it to <em>ipcluster_config.py</em>.</p>
<pre><code>cat custom_profile_code.py >> ipcluster_config.py</code></pre></li>
</ul>
</div>
<div id="ipython-clusters-customization----using" class="slide section level1">
<h1>IPython clusters: customization -- using</h1>
<p>Now we should be able to start a cluster using this profile under the <em>IPython Clusters</em> tab, as follows.</p>
<p>Select the number of engines corresponding to the number of tasks you specified in ipcluster_config.py (48 in this case).</p>
<p>Then start a Notebook and connect to the cluster.</p>
<p>For example, we use the same code as we did before, but needing to specify the profile name, 'myNewProfile' in this case.</p>
<pre><code>import ipyparallel as ipp # Python 3.5.1
# import IPython.parallel as ipp # Python 2.7.8
rc = ipp.Client(profile='myNewProfile', cluster_id='')
rc.ids</code></pre>
</div>
<div id="integration-with-box" class="slide section level1">
<h1>Integration with Box</h1>
<p>Maurice will lead this section, using the <em>BoxAuthenticationBootstrap.ipynb</em> and <em>TransferFilesFromBoxToSavioScratch.ipynb</em> notebooks.</p>
</div>
<div id="how-to-get-additional-help" class="slide section level1">
<h1>How to get additional help</h1>
<ul>
<li>For technical issues and questions about using Savio:
<ul>
<li>brc-hpc-help@berkeley.edu</li>
</ul></li>
<li>For questions about computing resources in general, including cloud computing:
<ul>
<li>brc@berkeley.edu</li>
</ul></li>
<li>For questions about data management (including HIPAA-protected data):
<ul>
<li>researchdata@berkeley.edu</li>
</ul></li>
</ul>
</div>
<div id="how-to-help-us" class="slide section level1">
<h1>How to help us</h1>
<ul>
<li>Please help us justify the campus investment in Savio (and keep it available in the future) by <a href="https://docs.google.com/a/berkeley.edu/forms/d/e/1FAIpQLSdqhh2A77-l8N3eOcOzrH508UKfhIvPn8h5gLDUJ9XrRLvA5Q/viewform">telling us how BRC impacts your research</a>, e.g., through
<ul>
<li>publications about research supported by BRC</li>
<li>grants for research that will be supported by BRC resources or consulting</li>
<li>recruitment or retention cases in which BRC resources/services play a role</li>
<li>classes that will be supported by the BRC program</li>
</ul></li>
<li>Please fill out an evaluation form</li>
</ul>
</div>
</body>
</html>