<?xml version='1.0' encoding="UTF-8"?>
<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/hpc-howto.xml,v 1.15 2010/06/07 09:08:37 nightmorph Exp $ -->
<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">

<guide>
<title>High Performance Computing on Gentoo Linux</title>

<author title="Author">
  <mail link="marc@adelielinux.com">Marc St-Pierre</mail>
</author>
</author>
<author title="Assistant/Research">
  <mail link="olivier@adelielinux.com">Olivier Crete</mail>
</author>
<author title="Reviewer">
  <mail link="dberkholz@gentoo.org">Donnie Berkholz</mail>
</author>
<author title="Editor">
  <mail link="nightmorph"/>
</author>

<!-- No licensing information; this document has been written by a third-party
     organisation without additional licensing information.

     In other words, this is copyright adelielinux R&D; Gentoo only has
     permission to distribute this document as-is and update it when appropriate
     as long as the adelie linux R&D notice stays
-->

<abstract>
This document was written by people at the Adelie Linux R&amp;D Center
&lt;http://www.adelielinux.com&gt; as a step-by-step guide to turn a Gentoo
system into a High Performance Computing (HPC) system.
</abstract>

<version>1.7</version>
<date>2010-06-07</date>

<chapter>
<title>Introduction</title>
<section>
<body>

<p>
Gentoo Linux is a special flavor of Linux that can be automatically optimized
and customized for just about any application or need. Extreme performance,
configurability and a top-notch user and developer community are all hallmarks
of the Gentoo experience.
</p>

<p>
Thanks to a technology called Portage, Gentoo Linux can become an ideal secure
server, development workstation, professional desktop, gaming system, embedded
solution or... a High Performance Computing system. Because of its
near-unlimited adaptability, we call Gentoo Linux a metadistribution.
</p>

<p>
This document explains how to turn a Gentoo system into a High Performance
Computing system. Step by step, it explains what packages one may want to
install and helps configure them.
</p>

<p>
Obtain Gentoo Linux from the website <uri>http://www.gentoo.org</uri>, and
We refer to the <uri link="/doc/en/handbook/">Gentoo Linux Handbooks</uri> in
this section.
</note>

<p>
During the installation process, you will have to set your USE flags in
<path>/etc/make.conf</path>. We recommend that you deactivate all the default
flags (see <path>/etc/make.profile/make.defaults</path>) by negating them in
<path>make.conf</path>. However, you may want to keep USE flags such as 3dnow,
gpm, mmx, nptl, nptlonly, sse, ncurses, pam and tcpd. Refer to the USE
documentation for more information.
</p>

<pre caption="USE Flags">
USE="-oss 3dnow -apm -avi -berkdb -crypt -cups -encode -gdbm -gif gpm -gtk
-imlib -java -jpeg -kde -gnome -libg++ -libwww -mikmod mmx -motif -mpeg ncurses
-nls nptl nptlonly -ogg -opengl pam -pdflib -png -python -qt4 -qtmt
-quicktime -readline -sdl -slang -spell -ssl -svga tcpd -truetype -vorbis -X
-xml2 -xv -zlib"
</pre>

<p>
Or simply:
</p>
<note>
The <e>tcpd</e> USE flag increases security for packages such as xinetd.
</note>

<p>
For stability reasons, in step 15 ("Installing the kernel and a System
Logger") we recommend vanilla-sources, the official kernel sources released on
<uri>http://www.kernel.org/</uri>, unless you require special support such as
XFS.
</p>

<pre caption="Installing vanilla-sources">
# <i>emerge -a syslog-ng vanilla-sources</i>
</pre>

<p>
When you install miscellaneous packages, we recommend installing the
following:
</p>

<pre caption="Installing necessary packages">
# <i>emerge -a nfs-utils portmap tcpdump ssmtp iptables xinetd</i>
</pre>

</body>
</section>
<section>
<title>Communication Layer (TCP/IP Network)</title>
<body>

<p>
A cluster requires a communication layer to interconnect the slave nodes to
the master node. Typically, a Fast Ethernet or Gigabit Ethernet LAN is used,
since these offer a good price/performance ratio. Other possibilities include
interconnects such as <uri link="http://www.myricom.com/">Myrinet</uri> and
<uri link="http://quadrics.com/">QsNet</uri>.
</p>

<p>
A cluster is composed of two node types: master and slave. Typically, your
cluster will have one master node and several slave nodes.
</p>

<p>
The master node is the cluster's server. It is responsible for telling the
slave nodes what to do. This server will typically run such daemons as dhcpd,
nfs, pbs-server, and pbs-sched. Your master node will allow interactive
sessions for users and accept jobs for execution.
</p>

<p>
The slave nodes listen for instructions from the master node (via ssh or rsh,
perhaps). They should be dedicated to number crunching and therefore should
not run any unnecessary services.
</p>

<p>
The rest of this document assumes a cluster configured as in the hosts file
below. Every node should have such a hosts file (<path>/etc/hosts</path>) with
an entry for each node participating in the cluster.
</p>

<pre caption="/etc/hosts">
# Adelie Linux Research &amp; Development Center
# /etc/hosts

127.0.0.1 localhost

192.168.1.100 master.adelie master

192.168.1.1 node01.adelie node01
192.168.1.2 node02.adelie node02
</pre>
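
<p>
With more than a handful of nodes, writing this file by hand becomes tedious
and error-prone. The shell function below is a minimal sketch and is not part
of any Gentoo package. <c>gen_hosts</c> is a hypothetical helper that prints
entries following the scheme above: the master at 192.168.1.100 and slave
<c>nodeNN</c> at 192.168.1.N. It does not check the addresses against the
dynamic <c>range</c> in the DHCP configuration of this section, so keep node
addresses outside that range.
</p>

<pre caption="Generating /etc/hosts (sketch)">
#!/bin/sh
# gen_hosts N: print an /etc/hosts file for the master node and N slave
# nodes, using the addressing scheme shown above (hypothetical helper).
gen_hosts() {
    printf '127.0.0.1 localhost\n\n'
    printf '192.168.1.100 master.adelie master\n\n'
    i=1
    while [ "$i" -le "$1" ]; do
        # Zero-pad the node number so that 1 becomes node01
        name=$(printf 'node%02d' "$i")
        printf '192.168.1.%d %s.adelie %s\n' "$i" "$name" "$name"
        i=$((i + 1))
    done
}
</pre>

<p>
For example, <c>gen_hosts 2</c> prints the same host entries as the file above.
Review the output before copying it to <path>/etc/hosts</path> on every node.
</p>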

<p>
To set up the dedicated cluster LAN, edit the <path>/etc/conf.d/net</path>
file on the master node.
</p>

<pre caption="/etc/conf.d/net">
# Global config file for net.* rc-scripts

# This is basically the ifconfig argument without the ifconfig $iface
#

iface_eth1="dhcp"
</pre>

<p>
Finally, set up a DHCP daemon on the master node to avoid having to maintain a
network configuration on each slave node.
</p>

<pre caption="/etc/dhcp/dhcpd.conf">
# Adelie Linux Research &amp; Development Center
  option domain-name "adelie";
  range 192.168.1.10 192.168.1.99;
  option routers 192.168.1.100;

  host node01.adelie {
    # MAC address of network card on node 01
    hardware ethernet 00:07:e9:0f:e2:d4;
    fixed-address 192.168.1.1;
  }
  host node02.adelie {
    # MAC address of network card on node 02
    hardware ethernet 00:07:e9:0f:e2:6b;
    fixed-address 192.168.1.2;
  }
}
</pre>
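
<p>
The <c>host</c> declarations can also be generated from a simple inventory
file. The sketch below assumes a hypothetical file with one node per line,
giving the short name, the MAC address and the fixed IP address separated by
whitespace (for example <c>node01 00:07:e9:0f:e2:d4 192.168.1.1</c>). The
<c>.adelie</c> domain is hard-coded to match the configuration above. The
function does not validate MAC or IP addresses.
</p>

<pre caption="Generating dhcpd host declarations (sketch)">
#!/bin/sh
# gen_dhcp_hosts FILE: print one dhcpd.conf host declaration for every
# "name mac ip" line in FILE; comments and blank lines are skipped.
gen_dhcp_hosts() {
    awk '
        /^[[:space:]]*#/ { next }
        NF == 0 { next }
        {
            printf "  host %s.adelie {\n", $1
            printf "    hardware ethernet %s;\n", $2
            printf "    fixed-address %s;\n", $3
            printf "  }\n"
        }
    ' "$1"
}
</pre>

<p>
Paste the output inside the block that holds the <c>range</c> statement, then
restart dhcpd so that it rereads its configuration.
</p>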
<section>
<title>NFS/NIS</title>
<body>

<p>
The Network File System (NFS) was developed to allow machines to mount a disk
partition on a remote machine as if it were on a local hard drive. This allows
for fast, seamless sharing of files across a network.
</p>

<p>
Other systems that provide functionality similar to NFS could also be used in
a cluster environment. The <uri link="http://www.openafs.org">Andrew File
System from IBM</uri>, recently open-sourced, provides a file sharing mechanism
with some additional security and performance features. The <uri
link="http://www.coda.cs.cmu.edu/">Coda File System</uri> is still in
development, but is designed to work well with disconnected clients. Many of
the features of the Andrew and Coda file systems are slated for inclusion in
the next version of <uri link="http://www.nfsv4.org">NFS (Version 4)</uri>.
The advantage of NFS today is that it is mature, standard, well understood,
and robustly supported across a variety of platforms.
</p>

<pre caption="Ebuilds for NFS-support">
# <i>emerge -a nfs-utils portmap</i>
</pre>

<p>
Configure and install a kernel to support NFS v3 on all nodes:
</p>
CONFIG_NFSD_V3=y
CONFIG_LOCKD_V4=y
</pre>
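
<p>
Before rebooting, you may want to confirm that the options above actually
ended up in your kernel configuration. The following is a minimal sketch.
<c>check_kconfig</c> is a hypothetical helper, and
<path>/usr/src/linux/.config</path> is only the usual location of the
configuration file. Options built as modules (<c>=m</c>) are accepted as well
as built-in ones.
</p>

<pre caption="Checking the kernel configuration (sketch)">
#!/bin/sh
# check_kconfig CONFIG_FILE OPTION...: print every OPTION that is neither
# built in (=y) nor a module (=m); return 1 if any of them is missing.
check_kconfig() {
    cfg=$1
    shift
    missing=0
    for opt in "$@"; do
        if ! grep -Eq "^${opt}=(y|m)\$" "$cfg"; then
            echo "missing: $opt"
            missing=1
        fi
    done
    return "$missing"
}

# Example: check the NFS options in the current kernel sources.
# check_kconfig /usr/src/linux/.config CONFIG_NFSD_V3 CONFIG_LOCKD_V4
</pre>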

<p>
On the master node, edit your <path>/etc/hosts.allow</path> file to allow
connections from slave nodes. If your cluster LAN is on 192.168.1.0/24,
your <path>hosts.allow</path> will look like:
</p>

<pre caption="hosts.allow">
portmap:192.168.1.0/255.255.255.0
</pre>

<p>
Edit the <path>/etc/exports</path> file of the master node to export a work
directory structure (<path>/home</path> is good for this).
</p>

<pre caption="/etc/exports">
/home/ *(rw)
</pre>

<p>
Add nfs to your master node's default runlevel:
</p>

<pre caption="Adding NFS to the default runlevel">
# <i>rc-update add nfs default</i>
</pre>

<p>
To mount the NFS-exported filesystem from the master, you also have to
configure your slave nodes' <path>/etc/fstab</path>. Add a line like this
one:
</p>

<pre caption="/etc/fstab">
master:/home/ /home nfs rw,exec,noauto,nouser,async 0 0
</pre>

<p>
You'll also need to set up your nodes so that they mount the NFS filesystem at
boot by issuing this command:
</p>

<pre caption="Adding nfsmount to the default runlevel">
# <i>rc-update add nfsmount default</i>
<section>
<title>RSH/SSH</title>
<body>

<p>
SSH is a protocol for secure remote login and other secure network services
over an insecure network. OpenSSH uses public key cryptography to provide
secure authentication. The first step in configuring OpenSSH on the cluster is
to generate the public key, which is shared with remote systems, and the
private key, which is kept on the local system.
</p>

<p>
For transparent cluster usage, private/public keys may be used. This process
has two steps:
</p>

<ul>
  <li>Generate public and private keys</li>
  <li>Copy public key to slave nodes</li>
</ul>

<p>
For user-based authentication, generate and copy the keys as follows:
</p>

<pre caption="SSH key authentication">
# <i>ssh-keygen -t dsa</i>
Generating public/private dsa key pair.
root@master's password:
id_dsa.pub 100% 234 2.0MB/s 00:00
</pre>
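
<p>
On a larger cluster, copying the public key to each slave node by hand gets
tedious. The sketch below assumes the hosts file layout used in this guide
(address, full name, short name, with slave names starting with
<c>node</c>) and the default key file <path>~/.ssh/id_dsa.pub</path>.
<c>cluster_nodes</c> and <c>push_key</c> are hypothetical helper names. Each
node still asks for its root password once.
</p>

<pre caption="Distributing the public key (sketch)">
#!/bin/sh
# cluster_nodes [HOSTS_FILE]: print the short names of the slave nodes,
# i.e. third-column names starting with "node" (default: /etc/hosts).
cluster_nodes() {
    awk '
        /^[[:space:]]*#/ { next }
        $3 ~ /^node/ { print $3 }
    ' "${1:-/etc/hosts}"
}

# push_key [HOSTS_FILE]: append the local DSA public key to root's
# authorized_keys file on every slave node.
push_key() {
    for n in $(cluster_nodes "$1"); do
        cat "$HOME/.ssh/id_dsa.pub" |
            ssh "root@$n" 'mkdir -p ~/.ssh; chmod 700 ~/.ssh; cat >> ~/.ssh/authorized_keys'
    done
}
</pre>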

<note>
Host keys must have an empty passphrase. RSA is required for host-based
authentication.
</note>

<p>
For host-based authentication, you will also need to edit your
<path>/etc/ssh/shosts.equiv</path>.
</p>

<pre caption="/etc/ssh/shosts.equiv">
node01.adelie

<pre caption="sshd configurations">
# $OpenBSD: sshd_config,v 1.42 2001/09/20 20:57:51 mouring Exp $
# This sshd was compiled with PATH=/usr/bin:/bin:/usr/sbin:/sbin

# This is the sshd server system-wide configuration file.  See sshd(8)
# for more information.

# HostKeys for protocol version 2
HostKey /etc/ssh/ssh_host_rsa_key
</pre>

<p>
If your application requires RSH communications, you will need to emerge
<c>net-misc/netkit-rsh</c> and <c>sys-apps/xinetd</c>.
</p>

<pre caption="Installing necessary applications">
# <i>emerge -a xinetd</i>
# <i>emerge -a netkit-rsh</i>
</pre>

<p>
Then configure the rsh daemon by editing your <path>/etc/xinetd.d/rsh</path>
file.
</p>

<pre caption="rsh">
# Adelie Linux Research &amp; Development Center
# /etc/xinetd.d/rsh
Or you can simply trust your cluster LAN:
</p>

<pre caption="hosts.allow">
# Adelie Linux Research &amp; Development Center
# /etc/hosts.allow

ALL:192.168.1.0/255.255.255.0
</pre>

<p>
Finally, configure host authentication from <path>/etc/hosts.equiv</path>.
</p>

<pre caption="hosts.equiv">
# Adelie Linux Research &amp; Development Center
# /etc/hosts.equiv
<section>
<title>NTP</title>
<body>

<p>
The Network Time Protocol (NTP) is used to synchronize the time of a computer
client or server to another server or reference time source, such as a radio
or satellite receiver or modem. It typically provides accuracy within a
millisecond on LANs and up to a few tens of milliseconds on WANs relative to
Coordinated Universal Time (UTC), for example via a Global Positioning System
(GPS) receiver. Typical NTP configurations use multiple redundant servers and
diverse network paths in order to achieve high accuracy and reliability.
</p>

<p>
Select an NTP server geographically close to you from <uri
link="http://www.eecis.udel.edu/~mills/ntp/servers.html">Public NTP Time
Servers</uri>, and configure your <path>/etc/conf.d/ntp</path> and
<path>/etc/ntp.conf</path> files on the master node.
</p>

<pre caption="Master /etc/conf.d/ntp">
# /etc/conf.d/ntpd

# NOTES:
#  - NTPDATE variables below are used if you wish to set your
#    clock when you start the ntp init.d script
NTPDATE_CMD="ntpdate"

# Options to pass to the above command
# Most people should just uncomment this variable and
# change 'someserver' to a valid hostname which you
# can acquire from the URLs below
NTPDATE_OPTS="-b ntp1.cmc.ec.gc.ca"

##
# A list of available servers is available here:
# http://www.eecis.udel.edu/~mills/ntp/servers.html
#NTPD_OPTS=""
</pre>

<p>
Edit your <path>/etc/ntp.conf</path> file on the master to set up an external
synchronization source:
</p>

<pre caption="Master ntp.conf">
# Adelie Linux Research &amp; Development Center
# Synchronization source #2
server ntp2.cmc.ec.gc.ca
restrict ntp2.cmc.ec.gc.ca
stratum 10
driftfile /etc/ntp.drift.server
logfile /var/log/ntp
broadcast 192.168.1.255
restrict default kod
restrict 127.0.0.1
restrict 192.168.1.0 mask 255.255.255.0
</pre>

<p>
On all your slave nodes, set up the master node as the synchronization
source.
</p>

<pre caption="Node /etc/conf.d/ntp">
# /etc/conf.d/ntpd

NTPDATE_WARN="n"
NTPDATE_CMD="ntpdate"
NTPDATE_OPTS="-b master"
# Synchronization source #1
server master
restrict master
stratum 11
driftfile /etc/ntp.drift.server
logfile /var/log/ntp
restrict default kod
restrict 127.0.0.1
</pre>

<p>
<pre caption="Adding ntpd to the default runlevel">
# <i>rc-update add ntpd default</i>
</pre>

<note>
NTP will not update the local clock if the time difference between your
synchronization source and the local clock is too great.
</note>
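
<p>
Once ntpd is running, <c>ntpq -p</c> lists the daemon's peers and marks the
one currently selected for synchronization with an asterisk (<c>*</c>) in the
first column. As a minimal sketch relying on that output format, the
hypothetical <c>ntp_sys_peer</c> function below prints the selected peer. You
can use it to check that each slave node has locked onto <c>master</c>. Just
after ntpd starts, no peer may be selected yet, so allow a few minutes.
</p>

<pre caption="Checking the synchronization source (sketch)">
#!/bin/sh
# ntp_sys_peer: read "ntpq -p" output on standard input and print the
# remote name of the system peer (the line whose tally code is '*').
# Returns 1 when no peer has been selected yet.
ntp_sys_peer() {
    awk '
        substr($0, 1, 1) == "*" { print substr($1, 2); found = 1 }
        END { if (!found) exit 1 }
    '
}

# Usage on a slave node:
# ntpq -p | ntp_sys_peer
</pre>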

</body>
</section>
<p>
To set up a firewall on your cluster, you will need iptables.
</p>

<pre caption="Installing iptables">
# <i>emerge -a iptables</i>
</pre>

<p>
Required kernel configuration:
</p>
And the rules required for this firewall:
</p>

<pre caption="rule-save">
# Adelie Linux Research &amp; Development Center
# /var/lib/iptables/rule-save

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
<section>
<title>OpenPBS</title>
<body>

<p>
The Portable Batch System (PBS) is a flexible batch queueing and workload
management system originally developed for NASA. It operates on networked,
multi-platform UNIX environments, including heterogeneous clusters of
workstations, supercomputers, and massively parallel systems. Development of
PBS is provided by Altair Grid Technologies.
</p>

<pre caption="Installing openpbs">
# <i>emerge -a openpbs</i>
</pre>

<note>
The OpenPBS ebuild does not currently set proper permissions on the var
directories used by OpenPBS.
</note>

<p>
Before you start using OpenPBS, some configuration is required. The files you
will need to customize for your system are:
</p>

<ul>
  <li><path>/etc/pbs_environment</path></li>
  <li><path>/var/spool/PBS/server_name</path></li>
  <li><path>/var/spool/PBS/server_priv/nodes</path></li>
  <li><path>/var/spool/PBS/mom_priv/config</path></li>
  <li><path>/var/spool/PBS/sched_priv/sched_config</path></li>
</ul>
731 721
732<p> 722<p>
733Here is a sample sched_config: 723Here is a sample sched_config:
734</p> 724</p>
769set server resources_default.nodes = 1 759set server resources_default.nodes = 1
770set server scheduler_iteration = 60 760set server scheduler_iteration = 60
771</pre> 761</pre>
772 762
773<p> 763<p>
774To submit a task to OpenPBS, the command <c>qsub</c> is used with some 764To submit a task to OpenPBS, the command <c>qsub</c> is used with some
775optional parameters. In the exemple below, "-l" allows you to specify 765optional parameters. In the example below, "-l" allows you to specify
776the resources required, "-j oe" merges standard output and 766the resources required, "-j oe" merges standard output and
777standard error into one file, and "-m" will e-mail the user at begining (b), end (e) 767standard error into one file, and "-m" will e-mail the user at beginning (b), end (e)
778and on abort (a) of the job. 768and on abort (a) of the job.
779</p> 769</p>
780 770
781<pre caption="Submitting a task"> 771<pre caption="Submitting a task">
782<comment>(submit and request from OpenPBS that myscript be executed on 2 nodes)</comment> 772<comment>(submit and request from OpenPBS that myscript be executed on 2 nodes)</comment>
783# <i>qsub -l nodes=2 -j oe -m abe myscript</i> 773# <i>qsub -l nodes=2 -j oe -m abe myscript</i>
784</pre> 774</pre>
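<p>
The same options can instead be written into the job script as <c>#PBS</c> directive lines. <c>qsub</c> reads these lines when the script is submitted, and the shell treats them as ordinary comments. The following is a minimal sketch of such a <path>myscript</path>. It assumes an MPICH program named <c>hello++</c> (used here only as an example) in the directory the job is submitted from:
</p>

<pre caption="myscript with embedded PBS directives (sketch)">
#!/bin/sh
# Run the job with /bin/sh regardless of the user's login shell.
#PBS -S /bin/sh
# Same requests as the qsub command line above.
#PBS -l nodes=2
#PBS -j oe
#PBS -m abe

# PBS starts jobs in the user's home directory; change to the
# directory from which qsub was run.
cd "$PBS_O_WORKDIR" || exit 1

# $PBS_NODEFILE lists the hosts allocated to this job, one per line;
# count the non-empty lines to get the number of processes to start.
NP=$(grep -c . "$PBS_NODEFILE")
mpirun -machinefile "$PBS_NODEFILE" -np "$NP" ./hello++
</pre>

<p>
With the options embedded, the job can be submitted with a plain <c>qsub myscript</c>. Options given on the <c>qsub</c> command line take precedence over the directives in the script.
</p>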
785 775
786<p> 776<p>
787Normally jobs submitted to OpenPBS are in the form of scripts. Sometimes, you 777Normally jobs submitted to OpenPBS are in the form of scripts. Sometimes, you
788may want to try a task manually. To request an interactive shell from OpenPBS, 778may want to try a task manually. To request an interactive shell from OpenPBS,
789use the "-I" parameter. 779use the "-I" parameter.
790</p> 780</p>
791 781
792<pre caption="Requesting an interactive shell"> 782<pre caption="Requesting an interactive shell">
793# <i>qsub -I</i> 783# <i>qsub -I</i>
809<section> 799<section>
810<title>MPICH</title> 800<title>MPICH</title>
811<body> 801<body>
812 802
813<p> 803<p>
814Message passing is a paradigm used widely on certain classes of parallel 804Message passing is a paradigm used widely on certain classes of parallel
815machines, especially those with distributed memory. MPICH is a freely 805machines, especially those with distributed memory. MPICH is a freely
816available, portable implementation of MPI, the Standard for message-passing 806available, portable implementation of MPI, the Standard for message-passing
817libraries. 807libraries.
818</p> 808</p>
819 809
820<p> 810<p>
821The mpich ebuild provided by Adelie Linux allows for two USE flags: 811The mpich ebuild provided by Adelie Linux allows for two USE flags:
822<e>doc</e> and <e>crypt</e>. <e>doc</e> will cause documentation to be 812<e>doc</e> and <e>crypt</e>. <e>doc</e> will cause documentation to be
823installed, while <e>crypt</e> will configure MPICH to use <c>ssh</c> instead 813installed, while <e>crypt</e> will configure MPICH to use <c>ssh</c> instead
824of <c>rsh</c>. 814of <c>rsh</c>.
825</p> 815</p>
826 816
827<pre caption="Installing the mpich application"> 817<pre caption="Installing the mpich application">
828# <i>emerge -p mpich</i> 818# <i>emerge -a mpich</i>
829# <i>emerge mpich</i>
830</pre> 819</pre>
831 820
832<p> 821<p>
833You may need to export an MPICH work directory to all your slave nodes in 822You may need to export an MPICH work directory to all your slave nodes in
834<path>/etc/exports</path>: 823<path>/etc/exports</path>:
835</p> 824</p>
836 825
837<pre caption="/etc/exports"> 826<pre caption="/etc/exports">
838/home *(rw) 827/home *(rw)
839</pre> 828</pre>
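<p>
After you change <path>/etc/exports</path>, the NFS server must re-read it before the slave nodes can mount the new export. The following assumes the standard NFS utilities are installed and the server is called <e>master</e>, the host name used in this guide's examples. With those assumptions, the export can be refreshed and checked like this:
</p>

<pre caption="Refreshing and checking the NFS exports">
<comment>(on the server: re-export everything listed in /etc/exports)</comment>
# <i>exportfs -ra</i>
<comment>(on a slave node: list the directories exported by master)</comment>
# <i>showmount -e master</i>
</pre>

<p>
The <c>*</c> in the example above lets any host mount <path>/home</path>. On a shared network you may prefer to restrict the export to the cluster subnet, for example <c>/home 192.168.1.0/24(rw)</c> (the address is only an example).
</p>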
840 829
841<p> 830<p>
842Most massively parallel processors (MPPs) provide a way to start a program on 831Most massively parallel processors (MPPs) provide a way to start a program on
843a requested number of processors; <c>mpirun</c> makes use of the appropriate 832a requested number of processors; <c>mpirun</c> makes use of the appropriate
844command whenever possible. In contrast, workstation clusters require that each 833command whenever possible. In contrast, workstation clusters require that each
845process in a parallel job be started individually, though programs to help 834process in a parallel job be started individually, though programs to help
846start these processes exist. Because workstation clusters are not already 835start these processes exist. Because workstation clusters are not already
847organized as an MPP, additional information is required to make use of them. 836organized as an MPP, additional information is required to make use of them.
848Mpich should be installed with a list of participating workstations in the 837Mpich should be installed with a list of participating workstations in the
849file <path>machines.LINUX</path> in the directory 838file <path>machines.LINUX</path> in the directory
850<path>/usr/share/mpich/</path>. This file is used by <c>mpirun</c> to choose 839<path>/usr/share/mpich/</path>. This file is used by <c>mpirun</c> to choose
851processors to run on. 840processors to run on.
852</p> 841</p>
853 842
854<p> 843<p>
855Edit this file to reflect your cluster LAN configuration: 844Edit this file to reflect your cluster LAN configuration:
856</p> 845</p>
857 846
858<pre caption="/usr/share/mpich/machines.LINUX"> 847<pre caption="/usr/share/mpich/machines.LINUX">
859# Change this file to contain the machines that you want to use 848# Change this file to contain the machines that you want to use
860# to run MPI jobs on. The format is one host name per line, with either 849# to run MPI jobs on. The format is one host name per line, with either
861# hostname 850# hostname
862# or 851# or
863# hostname:n 852# hostname:n
864# where n is the number of processors in an SMP. The hostname should 853# where n is the number of processors in an SMP. The hostname should
865# be the same as the result from the command "hostname" 854# be the same as the result from the command "hostname"
866master 855master
867node01 856node01
868node02 857node02
869# node03 858# node03
870# node04 859# node04
871# ... 860# ...
872</pre> 861</pre>
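<p>
Each entry is either <c>hostname</c> or <c>hostname:n</c>, so the total number of processors listed in the file can be computed and passed to <c>mpirun -np</c>. The following is a minimal <c>sh</c> sketch. <c>count_mpich_slots</c> is a hypothetical helper, not a command shipped with MPICH:
</p>

<pre caption="Counting the processors in machines.LINUX (sketch)">
# count_mpich_slots FILE
# Hypothetical helper: prints the total number of processors listed in an
# MPICH machines file. Comments (# ...) and blank lines are ignored;
# "host" counts as one processor and "host:n" counts as n.
count_mpich_slots() {
    awk -F: '
        { sub(/#.*/, ""); gsub(/[ \t]/, "") }  # strip comments and blanks
        NF == 0 { next }                        # skip now-empty lines
        { total += ($2 != "" ? $2 : 1) }        # host = 1, host:n = n
        END { print total + 0 }
    ' "$1"
}
</pre>

<p>
With the sample file above, <c>count_mpich_slots /usr/share/mpich/machines.LINUX</c> prints 3 (one each for master, node01 and node02).
</p>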
873 862
874<p> 863<p>
875Use the script <c>tstmachines</c> in <path>/usr/sbin/</path> to ensure that 864Use the script <c>tstmachines</c> in <path>/usr/sbin/</path> to ensure that
876you can use all of the machines that you have listed. This script performs 865you can use all of the machines that you have listed. This script performs
877an <c>rsh</c> and a short directory listing; this tests both that you have 866an <c>rsh</c> and a short directory listing; this tests both that you have
878access to the node and that a program in the current directory is visible on 867access to the node and that a program in the current directory is visible on
879the remote node. If there are any problems, they will be listed. These 868the remote node. If there are any problems, they will be listed. These
880problems must be fixed before proceeding. 869problems must be fixed before proceeding.
881</p> 870</p>
882 871
883<p> 872<p>
884The required argument to <c>tstmachines</c> is the name of the architecture; this 873The required argument to <c>tstmachines</c> is the name of the architecture; this
885is the same name as the extension on the machines file. For example, the 874is the same name as the extension on the machines file. For example, the
886following tests that a program in the current directory can be executed by 875following tests that a program in the current directory can be executed by
887all of the machines in the LINUX machines list. 876all of the machines in the LINUX machines list.
888</p> 877</p>
889 878
890<pre caption="Running a test"> 879<pre caption="Running a test">
891# <i>/usr/local/mpich/sbin/tstmachines LINUX</i> 880# <i>/usr/local/mpich/sbin/tstmachines LINUX</i>
892</pre> 881</pre>
893 882
894<note> 883<note>
895This program is silent if all is well; if you want to see what it is doing, 884This program is silent if all is well; if you want to see what it is doing,
896use the -v (for verbose) argument: 885use the -v (for verbose) argument:
897</note> 886</note>
898 887
899<pre caption="Running a test verbosely"> 888<pre caption="Running a test verbosely">
900# <i>/usr/local/mpich/sbin/tstmachines -v LINUX</i> 889# <i>/usr/local/mpich/sbin/tstmachines -v LINUX</i>
912Trying user program on host1.uoffoo.edu ... 901Trying user program on host1.uoffoo.edu ...
913Trying user program on host2.uoffoo.edu ... 902Trying user program on host2.uoffoo.edu ...
914</pre> 903</pre>
915 904
916<p> 905<p>
917If <c>tstmachines</c> finds a problem, it will suggest possible reasons and 906If <c>tstmachines</c> finds a problem, it will suggest possible reasons and
918solutions. In brief, there are three tests: 907solutions. In brief, there are three tests:
919</p> 908</p>
920 909
921<ul> 910<ul>
922 <li> 911 <li>
923 <e>Can processes be started on remote machines?</e> tstmachines attempts 912 <e>Can processes be started on remote machines?</e> tstmachines attempts
924 to run the shell command <c>true</c> on each machine in the machines file by 913 to run the shell command <c>true</c> on each machine in the machines file by
925 using the remote shell command. 914 using the remote shell command.
926 </li> 915 </li>
927 <li> 916 <li>
928 <e>Is the current working directory available to all machines?</e> This 917 <e>Is the current working directory available to all machines?</e> This
929 attempts to list a file that tstmachines creates by running <c>ls</c> through the 918 attempts to list a file that tstmachines creates by running <c>ls</c> through the
930 remote shell command. 919 remote shell command.
931 </li> 920 </li>
932 <li> 921 <li>
933 <e>Can user programs be run on remote systems?</e> This checks that shared 922 <e>Can user programs be run on remote systems?</e> This checks that shared
934 libraries and other components have been properly installed on all 923 libraries and other components have been properly installed on all
935 machines. 924 machines.
936 </li> 925 </li>
937</ul> 926</ul>
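<p>
The first of these checks can also be run by hand. This is useful when <c>tstmachines</c> reports a problem and you want to see which host fails. The following is a minimal <c>sh</c> sketch that assumes the machines file format described above. <c>check_mpich_hosts</c> is a hypothetical helper. Its optional second argument selects the remote shell; use <c>ssh</c> for an MPICH built with the <e>crypt</e> USE flag.
</p>

<pre caption="Checking remote process startup by hand (sketch)">
# check_mpich_hosts FILE [REMOTE_SHELL]
# Hypothetical helper: runs "true" on every host listed in an MPICH
# machines file through the remote shell (rsh unless another command is
# given), prints one status line per host, and returns non-zero if any
# host failed.
check_mpich_hosts() {
    rsh_cmd=${2:-rsh}
    failed=0
    # Strip comments, blanks and any ":n" processor count to get host names.
    for host in $(awk -F: '{ sub(/#.*/, ""); gsub(/[ \t]/, "") } NF { print $1 }' "$1"); do
        if "$rsh_cmd" "$host" true; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done
    return "$failed"
}
</pre>

<p>
For example, <c>check_mpich_hosts /usr/share/mpich/machines.LINUX ssh</c> prints a status line for master, node01 and node02.
</p>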
938 927
939<p> 928<p>
946# <i>make hello++</i> 935# <i>make hello++</i>
947# <i>mpirun -machinefile /usr/share/mpich/machines.LINUX -np 1 hello++</i> 936# <i>mpirun -machinefile /usr/share/mpich/machines.LINUX -np 1 hello++</i>
948</pre> 937</pre>
949 938
950<p> 939<p>
951For further information on MPICH, consult the documentation at <uri 940For further information on MPICH, consult the documentation at <uri
952link="http://www-unix.mcs.anl.gov/mpi/mpich/docs/mpichman-chp4/mpichman-chp4.htm">http://www-unix.mcs.anl.gov/mpi/mpich/docs/mpichman-chp4/mpichman-chp4.htm</uri>. 941link="http://www-unix.mcs.anl.gov/mpi/mpich/docs/mpichman-chp4/mpichman-chp4.htm">http://www-unix.mcs.anl.gov/mpi/mpich/docs/mpichman-chp4/mpichman-chp4.htm</uri>.
953</p> 942</p>
954 943
955</body> 944</body>
956</section> 945</section>
980<title>Bibliography</title> 969<title>Bibliography</title>
981<section> 970<section>
982<body> 971<body>
983 972
984<p> 973<p>
985The original document is published at the <uri 974The original document is published at the <uri
986link="http://www.adelielinux.com">Adelie Linux R&amp;D Centre</uri> web site, 975link="http://www.adelielinux.com">Adelie Linux R&amp;D Centre</uri> web site,
987and is reproduced here with the permission of the authors and <uri 976and is reproduced here with the permission of the authors and <uri
988link="http://www.cyberlogic.ca">Cyberlogic</uri>'s Adelie Linux R&amp;D 977link="http://www.cyberlogic.ca">Cyberlogic</uri>'s Adelie Linux R&amp;D
989Centre. 978Centre.
990</p> 979</p>
991 980
992<ul> 981<ul>
993 <li><uri>http://www.gentoo.org</uri>, Gentoo Technologies, Inc.</li> 982 <li><uri>http://www.gentoo.org</uri>, Gentoo Foundation, Inc.</li>
994 <li> 983 <li>
995 <uri link="http://www.adelielinux.com">http://www.adelielinux.com</uri>, 984 <uri link="http://www.adelielinux.com">http://www.adelielinux.com</uri>,
996 Adelie Linux Research and Development Centre 985 Adelie Linux Research and Development Centre
997 </li> 986 </li>
998 <li> 987 <li>
999 <uri link="http://nfs.sourceforge.net/">http://nfs.sourceforge.net</uri>, 988 <uri link="http://nfs.sourceforge.net/">http://nfs.sourceforge.net</uri>,
1000 Linux NFS Project 989 Linux NFS Project
1001 </li> 990 </li>
1002 <li> 991 <li>
1003 <uri link="http://www-unix.mcs.anl.gov/mpi/mpich/">http://www-unix.mcs.anl.gov/mpi/mpich/</uri>, 992 <uri link="http://www-unix.mcs.anl.gov/mpi/mpich/">http://www-unix.mcs.anl.gov/mpi/mpich/</uri>,
1004 Mathematics and Computer Science Division, Argonne National Laboratory 993 Mathematics and Computer Science Division, Argonne National Laboratory
1005 </li> 994 </li>
1006 <li> 995 <li>
1007 <uri link="http://www.ntp.org/">http://ntp.org</uri> 996 <uri link="http://www.ntp.org/">http://ntp.org</uri>
1008 </li> 997 </li>
1009 <li> 998 <li>
1010 <uri link="http://www.eecis.udel.edu/~mills/">http://www.eecis.udel.edu/~mills/</uri>, 999 <uri link="http://www.eecis.udel.edu/~mills/">http://www.eecis.udel.edu/~mills/</uri>,
1011 David L. Mills, University of Delaware 1000 David L. Mills, University of Delaware
1012 </li> 1001 </li>
1013 <li> 1002 <li>
1014 <uri link="http://www.ietf.org/html.charters/secsh-charter.html">http://www.ietf.org/html.charters/secsh-charter.html</uri>, 1003 <uri link="http://www.ietf.org/html.charters/secsh-charter.html">http://www.ietf.org/html.charters/secsh-charter.html</uri>,
1015 Secure Shell Working Group, IETF, Internet Society 1004 Secure Shell Working Group, IETF, Internet Society
1016 </li> 1005 </li>
1017 <li> 1006 <li>
1018 <uri link="http://www.linuxsecurity.com/">http://www.linuxsecurity.com/</uri>, 1007 <uri link="http://www.linuxsecurity.com/">http://www.linuxsecurity.com/</uri>,
1019 Guardian Digital 1008 Guardian Digital
1020 </li> 1009 </li>
1021 <li> 1010 <li>
1022 <uri link="http://www.openpbs.org/">http://www.openpbs.org/</uri>, 1011 <uri link="http://www.openpbs.org/">http://www.openpbs.org/</uri>,
1023 Altair Grid Technologies, LLC. 1012 Altair Grid Technologies, LLC.
1024 </li> 1013 </li>
1025</ul> 1014</ul>
1026 1015
1027</body> 1016</body>

