Diff of /xml/htdocs/doc/en/hpc-howto.xml

Revision 1.5 Revision 1.15
<?xml version='1.0' encoding="UTF-8"?>
<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/hpc-howto.xml,v 1.15 2010/06/07 09:08:37 nightmorph Exp $ -->
<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">

<guide>
<title>High Performance Computing on Gentoo Linux</title>

<author title="Author">
  <mail link="marc@adelielinux.com">Marc St-Pierre</mail>
</author>
...
</author>
<author title="Assistant/Research">
  <mail link="olivier@adelielinux.com">Olivier Crete</mail>
</author>
<author title="Reviewer">
  <mail link="dberkholz@gentoo.org">Donnie Berkholz</mail>
</author>
<author title="Editor">
  <mail link="nightmorph"/>
</author>

<!-- No licensing information; this document has been written by a third-party
     organisation without additional licensing information.

     In other words, this is copyright adelielinux R&D; Gentoo only has
     permission to distribute this document as-is and update it when appropriate
     as long as the adelie linux R&D notice stays
-->

<abstract>
This document was written by people at the Adelie Linux R&amp;D Center
&lt;http://www.adelielinux.com&gt; as a step-by-step guide to turn a Gentoo
System into a High Performance Computing (HPC) system.
</abstract>

<version>1.7</version>
<date>2010-06-07</date>

<chapter>
<title>Introduction</title>
<section>
<body>

<p>
Gentoo Linux is a special flavor of Linux that can be automatically optimized
and customized for just about any application or need. Extreme performance,
configurability and a top-notch user and developer community are all hallmarks
of the Gentoo experience.
</p>

<p>
Thanks to a technology called Portage, Gentoo Linux can become an ideal secure
server, development workstation, professional desktop, gaming system, embedded
solution or... a High Performance Computing system. Because of its
near-unlimited adaptability, we call Gentoo Linux a metadistribution.
</p>

<p>
This document explains how to turn a Gentoo system into a High Performance
Computing system. Step by step, it explains what packages one may want to
install and helps configure them.
</p>

<p>
Obtain Gentoo Linux from the website <uri>http://www.gentoo.org</uri>, and
...
We refer to the <uri link="/doc/en/handbook/">Gentoo Linux Handbooks</uri> in
this section.
</note>

<p>
During the installation process, you will have to set your USE flags in
<path>/etc/make.conf</path>. We recommend that you deactivate all the
defaults (see <path>/etc/make.profile/make.defaults</path>) by negating them in
<path>make.conf</path>. However, you may want to keep USE flags such as 3dnow,
gpm, mmx, nptl, nptlonly, sse, ncurses, pam and tcpd. Refer to the USE
documentation for more information.
</p>

<pre caption="USE Flags">
USE="-oss 3dnow -apm -avi -berkdb -crypt -cups -encode -gdbm -gif gpm -gtk
-imlib -java -jpeg -kde -gnome -libg++ -libwww -mikmod mmx -motif -mpeg ncurses
-nls nptl nptlonly -ogg -opengl pam -pdflib -png -python -qt4 -qtmt
-quicktime -readline -sdl -slang -spell -ssl -svga tcpd -truetype -vorbis -X
-xml2 -xv -zlib"
</pre>

<p>
Or simply:
</p>
...
<note>
The <e>tcpd</e> USE flag increases security for packages such as xinetd.
</note>
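
<p>
In the USE line above, a flag prefixed with <c>-</c> is disabled and a bare
flag is enabled. To quickly see which flags a given USE line actually turns
on, you can use the shell function below. It is a hypothetical helper for
illustration, not part of Portage. To see the flags Portage really uses after
the profile and <path>make.conf</path> are combined, run <c>emerge --info</c>.
</p>

<pre caption="Listing the enabled flags of a USE line">
#!/bin/sh
# Print the flags of a USE string that are enabled (not negated).
# Hypothetical helper for illustration only.
enabled_use_flags() {
    set -f                      # no globbing while splitting (e.g. "-*")
    for flag in $1; do
        case $flag in
            -*) ;;              # "-flag": explicitly disabled, skip it
            *)  echo "$flag" ;;
        esac
    done
    set +f                      # re-enable globbing
}
</pre>

<p>
For example, <c>enabled_use_flags "-oss 3dnow -apm gpm"</c> prints
<c>3dnow</c> and <c>gpm</c>, one per line.
</p>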

<p>
In step 15 ("Installing the kernel and a System Logger"), we recommend the
vanilla-sources for stability reasons. These are the official kernel sources
released on <uri>http://www.kernel.org/</uri>; choose other sources only if
you require special support such as xfs.
</p>

<pre caption="Installing vanilla-sources">
# <i>emerge -a syslog-ng vanilla-sources</i>
</pre>

<p>
When you install miscellaneous packages, we recommend installing the
following:
</p>

<pre caption="Installing necessary packages">
# <i>emerge -a nfs-utils portmap tcpdump ssmtp iptables xinetd</i>
</pre>

</body>
</section>
<section>
<title>Communication Layer (TCP/IP Network)</title>
<body>

<p>
A cluster requires a communication layer to interconnect the slave nodes to
the master node. Typically, a Fast Ethernet or Gigabit Ethernet LAN is used,
since these offer a good price/performance ratio. Other possibilities include
products like <uri link="http://www.myricom.com/">Myrinet</uri>, <uri
link="http://quadrics.com/">QsNet</uri> or others.
</p>

<p>
A cluster is composed of two node types: master and slave. Typically, your
cluster will have one master node and several slave nodes.
</p>

<p>
The master node is the cluster's server. It is responsible for telling the
slave nodes what to do. This server will typically run such daemons as dhcpd,
nfs, pbs-server, and pbs-sched. Your master node will allow interactive
sessions for users, and accept job executions.
</p>

<p>
The slave nodes listen for instructions (via ssh/rsh perhaps) from the master
node. They should be dedicated to crunching results and therefore should not
run any unnecessary services.
</p>

<p>
The rest of this document assumes the cluster configuration described by the
hosts file below. Every node should have such a hosts file
(<path>/etc/hosts</path>) with an entry for each node participating in the
cluster.
</p>

<pre caption="/etc/hosts">
# Adelie Linux Research &amp; Development Center
# /etc/hosts

127.0.0.1       localhost

192.168.1.100   master.adelie master

192.168.1.1     node01.adelie node01
192.168.1.2     node02.adelie node02
</pre>
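
<p>
Since every node needs the same <path>/etc/hosts</path>, it can help to
generate the file instead of editing each copy by hand. The shell function
below is a minimal sketch. It assumes the layout used in this guide: the master
at 192.168.1.100, slave nodes <c>node01</c>, <c>node02</c>, ... at
192.168.1.1, 192.168.1.2, ..., and the domain <c>adelie</c>. The function name
and its arguments are hypothetical.
</p>

<pre caption="Generating /etc/hosts for the cluster">
#!/bin/sh
# Print an /etc/hosts file for a master plus COUNT slave nodes.
# Usage: gen_hosts COUNT [DOMAIN] [NET]
#   COUNT   number of slave nodes (node01 .. nodeNN)
#   DOMAIN  cluster domain name (default: adelie)
#   NET     first three octets of the cluster subnet (default: 192.168.1)
gen_hosts() {
    count=$1
    domain=${2:-adelie}
    net=${3:-192.168.1}

    echo "127.0.0.1 localhost"
    echo
    echo "$net.100 master.$domain master"
    echo

    i=1
    while [ "$i" -le "$count" ]; do
        name=$(printf 'node%02d' "$i")      # node01, node02, ...
        echo "$net.$i $name.$domain $name"
        i=$((i + 1))
    done
}
</pre>

<p>
Running <c>gen_hosts 2 &gt; /etc/hosts</c> reproduces the entries shown above.
Keep COUNT below 100 so that node addresses never reach the master's .100.
</p>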

<p>
To set up your cluster's dedicated LAN, edit the <path>/etc/conf.d/net</path>
file on the master node.
</p>

<pre caption="/etc/conf.d/net">
# Global config file for net.* rc-scripts

# This is basically the ifconfig argument without the ifconfig $iface
#

...
iface_eth1="dhcp"
</pre>

<p>
Finally, set up a DHCP daemon on the master node to avoid having to maintain a
network configuration on each slave node.
</p>

<pre caption="/etc/dhcp/dhcpd.conf">
# Adelie Linux Research &amp; Development Center
...
    option domain-name "adelie";
    range 192.168.1.10 192.168.1.99;
    option routers 192.168.1.100;

    host node01.adelie {
        # MAC address of network card on node 01
        hardware ethernet 00:07:e9:0f:e2:d4;
        fixed-address 192.168.1.1;
    }
    host node02.adelie {
        # MAC address of network card on node 02
        hardware ethernet 00:07:e9:0f:e2:6b;
        fixed-address 192.168.1.2;
    }
}
</pre>
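
<p>
Each slave node needs its own <c>host</c> declaration, which gets tedious as
the cluster grows. The sketch below assumes you keep a plain-text list with one
"name MAC-address IP" line per node. The list format and the function names
are hypothetical. It prints declarations in the same form as above, ready to
paste inside the <c>subnet</c> block.
</p>

<pre caption="Generating dhcpd host declarations">
#!/bin/sh
# Print one dhcpd.conf host declaration.
# Usage: dhcp_host NAME MAC IP [DOMAIN]
# Returns 1 without printing anything if MAC is not six hex octets.
dhcp_host() {
    name=$1; mac=$2; ip=$3; domain=${4:-adelie}

    if ! printf '%s\n' "$mac" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
    then
        return 1
    fi

    echo "host $name.$domain {"
    echo "    # MAC address of network card on $name"
    echo "    hardware ethernet $mac;"
    echo "    fixed-address $ip;"
    echo "}"
}

# Read "name mac ip" lines on standard input; skip blank and # lines.
dhcp_hosts_from_list() {
    while read -r n m a; do
        case $n in
            ''|'#'*) continue ;;
        esac
        dhcp_host "$n" "$m" "$a" || return 1
    done
}
</pre>

<p>
For example, <c>dhcp_hosts_from_list &lt; nodes.txt</c> prints one block per
node listed in the hypothetical file <path>nodes.txt</path>, and stops at the
first malformed MAC address.
</p>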
...
<section>
<title>NFS/NIS</title>
<body>

<p>
The Network File System (NFS) was developed to allow machines to mount a disk
partition on a remote machine as if it were on a local hard drive. This allows
for fast, seamless sharing of files across a network.
</p>

<p>
There are other systems that provide similar functionality to NFS which could
be used in a cluster environment. The <uri
link="http://www.openafs.org">Andrew File System
from IBM</uri>, recently open-sourced, provides a file sharing mechanism with
some additional security and performance features. The <uri
link="http://www.coda.cs.cmu.edu/">Coda File System</uri> is still in
development, but is designed to work well with disconnected clients. Many
of the features of the Andrew and Coda file systems are slated for inclusion
in the next version of <uri link="http://www.nfsv4.org">NFS (Version 4)</uri>.
The advantage of NFS today is that it is mature, standard, well understood,
and supported robustly across a variety of platforms.
</p>

<pre caption="Ebuilds for NFS-support">
# <i>emerge -a nfs-utils portmap</i>
</pre>

<p>
Configure and install a kernel to support NFS v3 on all nodes:
</p>
...
CONFIG_NFSD_V3=y
CONFIG_LOCKD_V4=y
</pre>

<p>
On the master node, edit your <path>/etc/hosts.allow</path> file to allow
connections from slave nodes. If your cluster LAN is on 192.168.1.0/24,
your <path>hosts.allow</path> will look like:
</p>

<pre caption="hosts.allow">
portmap:192.168.1.0/255.255.255.0
</pre>

<p>
Edit the <path>/etc/exports</path> file of the master node to export a work
directory structure (/home is good for this).
</p>

<pre caption="/etc/exports">
/home/ *(rw)
</pre>

<p>
Add nfs to your master node's default runlevel:
</p>
...
<pre caption="Adding NFS to the default runlevel">
# <i>rc-update add nfs default</i>
</pre>

<p>
To mount the NFS-exported filesystem from the master, you also have to
configure your slave nodes' <path>/etc/fstab</path>. Add a line like this
one:
</p>

<pre caption="/etc/fstab">
master:/home/ /home nfs rw,exec,noauto,nouser,async 0 0
</pre>
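
<p>
If you manage several slave nodes, you may prefer a script that adds this
entry only when it is missing, so that it is safe to run repeatedly. The
function below is a minimal sketch with a hypothetical name. It appends
exactly the fstab line shown above. The fstab path is an argument, so you can
try it on a copy before touching <path>/etc/fstab</path>.
</p>

<pre caption="Adding the NFS entry to fstab once">
#!/bin/sh
# Append the /home NFS entry to an fstab file unless a non-comment line
# already uses /home as its mount point (field 2).
# Usage: add_nfs_home [FSTAB]   (default: /etc/fstab)
add_nfs_home() {
    fstab=${1:-/etc/fstab}
    entry='master:/home/ /home nfs rw,exec,noauto,nouser,async 0 0'

    [ -f "$fstab" ] || { echo "add_nfs_home: $fstab not found"; return 1; }

    # Field 2 of an fstab line is the mount point; ignore comment lines.
    if awk '$1 !~ /^#/ { if ($2 == "/home") found = 1 } END { exit !found }' "$fstab"
    then
        echo "/home is already listed in $fstab; leaving it unchanged"
    else
        echo "$entry" >> "$fstab"
        echo "added: $entry"
    fi
}
</pre>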

<p>
You'll also need to set up your nodes so that they mount the nfs filesystem by
issuing this command:
</p>

<pre caption="Adding nfsmount to the default runlevel">
# <i>rc-update add nfsmount default</i>
...
<section>
<title>RSH/SSH</title>
<body>

<p>
SSH is a protocol for secure remote login and other secure network services
over an insecure network. OpenSSH uses public key cryptography to provide
secure authorization. The first step in configuring OpenSSH on the cluster is
to generate a key pair: a public key, which is shared with remote systems, and
a private key, which is kept on the local system.
</p>

<p>
For transparent cluster usage, private/public keys may be used. This process
has two steps:
</p>

<ul>
  <li>Generate public and private keys</li>
  <li>Copy public key to slave nodes</li>
</ul>

<p>
For user-based authentication, generate and copy as follows:
</p>

<pre caption="SSH key authentication">
# <i>ssh-keygen -t dsa</i>
Generating public/private dsa key pair.
...
root@master's password:
id_dsa.pub 100% 234 2.0MB/s 00:00
</pre>
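
<p>
On the slave node side, "copying the public key" means appending the contents
of <path>id_dsa.pub</path> to the <path>authorized_keys</path> file of the
target account. The function below is a minimal sketch of that step. It
assumes the key file has already been copied onto the node; the function name
and the location of the copied key are hypothetical. It also applies the
restrictive permissions that sshd's <c>StrictModes</c> check expects.
</p>

<pre caption="Installing a copied public key on a node">
#!/bin/sh
# Append a public key to authorized_keys unless that exact key line is
# already present.
# Usage: install_pubkey KEYFILE [SSHDIR]   (SSHDIR defaults to ~/.ssh)
install_pubkey() {
    key_file=$1
    ssh_dir=${2:-$HOME/.ssh}
    auth=$ssh_dir/authorized_keys

    [ -r "$key_file" ] || { echo "install_pubkey: cannot read $key_file"; return 1; }

    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$auth"
    chmod 600 "$auth"

    # -F fixed strings, -x whole-line match, -f read patterns from the key file
    if grep -Fqx -f "$key_file" "$auth"; then
        echo "key already present in $auth"
    else
        cat "$key_file" >> "$auth"
        echo "key added to $auth"
    fi
}
</pre>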

<note>
Host keys must have an empty passphrase. RSA is required for host-based
authentication.
</note>

<p>
For host-based authentication, you will also need to edit your
<path>/etc/ssh/shosts.equiv</path>.
</p>

<pre caption="/etc/ssh/shosts.equiv">
node01.adelie
...

<pre caption="sshd configurations">
# $OpenBSD: sshd_config,v 1.42 2001/09/20 20:57:51 mouring Exp $
# This sshd was compiled with PATH=/usr/bin:/bin:/usr/sbin:/sbin

# This is the sshd server system-wide configuration file. See sshd(8)
# for more information.

# HostKeys for protocol version 2
HostKey /etc/ssh/ssh_host_rsa_key
</pre>

<p>
If your application requires RSH communication, you will need to emerge
<c>net-misc/netkit-rsh</c> and <c>sys-apps/xinetd</c>.
</p>

<pre caption="Installing necessary applications">
# <i>emerge -a xinetd</i>
# <i>emerge -a netkit-rsh</i>
</pre>

<p>
Then configure the rsh daemon by editing your <path>/etc/xinetd.d/rsh</path>
file.
</p>

<pre caption="rsh">
# Adelie Linux Research &amp; Development Center
# /etc/xinetd.d/rsh
...
Or you can simply trust your cluster LAN:
</p>

<pre caption="hosts.allow">
# Adelie Linux Research &amp; Development Center
# /etc/hosts.allow

ALL:192.168.1.0/255.255.255.0
</pre>
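
<p>
Both <path>/etc/ssh/shosts.equiv</path> and <path>/etc/hosts.equiv</path>
list the cluster's host names, which are already recorded in
<path>/etc/hosts</path>. The function below is a minimal sketch for deriving
that list. Its name is hypothetical, and it assumes the cluster subnet prefix
192.168.1. used throughout this guide. Review the output before installing it,
for example to decide whether the master itself belongs in the list.
</p>

<pre caption="Listing cluster hosts for the equiv files">
#!/bin/sh
# Print the canonical host name (field 2) of every /etc/hosts entry whose
# address starts with PREFIX. The trailing dot in the default prefix keeps
# addresses such as 192.168.10.x from matching.
# Usage: equiv_hosts [HOSTSFILE] [PREFIX]
equiv_hosts() {
    hosts_file=${1:-/etc/hosts}
    prefix=${2:-192.168.1.}
    awk -v p="$prefix" 'index($1, p) == 1 { print $2 }' "$hosts_file"
}
</pre>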

<p>
Finally, configure host authentication from <path>/etc/hosts.equiv</path>.
</p>

<pre caption="hosts.equiv">
# Adelie Linux Research &amp; Development Center
# /etc/hosts.equiv
...
<section>
<title>NTP</title>
<body>

<p>
The Network Time Protocol (NTP) is used to synchronize the time of a computer
client or server to another server or reference time source, such as a radio
or satellite receiver or modem. It provides accuracies typically within a
millisecond on LANs and up to a few tens of milliseconds on WANs relative to
Coordinated Universal Time (UTC) via a Global Positioning System (GPS)
receiver, for example. Typical NTP configurations use multiple redundant
servers and diverse network paths in order to achieve high accuracy and
reliability.
</p>

<p>
Select an NTP server geographically close to you from <uri
link="http://www.eecis.udel.edu/~mills/ntp/servers.html">Public NTP Time
Servers</uri>, and configure your <path>/etc/conf.d/ntp</path> and
<path>/etc/ntp.conf</path> files on the master node.
</p>

<pre caption="Master /etc/conf.d/ntp">
# /etc/conf.d/ntpd

# NOTES:
#  - NTPDATE variables below are used if you wish to set your
#    clock when you start the ntp init.d script
...
NTPDATE_CMD="ntpdate"

# Options to pass to the above command
# Most people should just uncomment this variable and
# change 'someserver' to a valid hostname which you
# can acquire from the URLs below
NTPDATE_OPTS="-b ntp1.cmc.ec.gc.ca"

##
# A list of available servers is available here:
# http://www.eecis.udel.edu/~mills/ntp/servers.html
...
#NTPD_OPTS=""

</pre>

<p>
Edit your <path>/etc/ntp.conf</path> file on the master to set up an external
synchronization source:
</p>

<pre caption="Master ntp.conf">
# Adelie Linux Research &amp; Development Center
...
# Synchronization source #2
server ntp2.cmc.ec.gc.ca
restrict ntp2.cmc.ec.gc.ca
stratum 10
driftfile /etc/ntp.drift.server
logfile /var/log/ntp
broadcast 192.168.1.255
restrict default kod
restrict 127.0.0.1
restrict 192.168.1.0 mask 255.255.255.0
</pre>

<p>
On all your slave nodes, set up the master node as your synchronization
source.
</p>

<pre caption="Node /etc/conf.d/ntp">
# /etc/conf.d/ntpd

NTPDATE_WARN="n"
NTPDATE_CMD="ntpdate"
NTPDATE_OPTS="-b master"
...
# Synchronization source #1
server master
restrict master
stratum 11
driftfile /etc/ntp.drift.server
logfile /var/log/ntp
restrict default kod
restrict 127.0.0.1
</pre>

<p>
...
<pre caption="Adding ntpd to the default runlevel">
# <i>rc-update add ntpd default</i>
</pre>

<note>
NTP will not update the local clock if the time difference between your
synchronization source and the local clock is too great.
</note>
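
<p>
How great is "too great"? By default, ntpd gives up when the offset exceeds
its panic threshold of 1000 seconds (the <c>tinker panic</c> setting). The
usual remedy is to set the clock once with <c>ntpdate -b master</c>, which the
NTPDATE settings above do when the init script starts, and let ntpd take over
from there. The sketch below only encodes the panic-threshold comparison. The
function name is hypothetical, and the offset has to be measured separately,
for example with <c>ntpdate -q master</c>.
</p>

<pre caption="Checking an offset against the ntpd panic threshold">
#!/bin/sh
# Succeed (exit status 0) if OFFSET, in seconds, is larger in magnitude
# than LIMIT, i.e. too large for ntpd to correct by itself.
# Usage: too_far_for_ntpd OFFSET [LIMIT]   (LIMIT defaults to 1000)
too_far_for_ntpd() {
    awk -v o="$1" -v l="${2:-1000}" 'BEGIN {
        x = o + 0
        if (0 > x) x = -x          # use the absolute value of the offset
        exit !(x > l + 0)
    }'
}
</pre>

<p>
For instance, <c>too_far_for_ntpd 3600</c> succeeds, while
<c>too_far_for_ntpd 0.25</c> fails.
</p>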

</body>
</section>
...
<p>
To set up a firewall on your cluster, you will need iptables.
</p>

<pre caption="Installing iptables">
# <i>emerge -a iptables</i>
</pre>

<p>
Required kernel configuration:
</p>
...
The rules required for this firewall are:
</p>

<pre caption="rule-save">
# Adelie Linux Research &amp; Development Center
# /var/lib/iptables/rule-save

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
...
<section>
<title>OpenPBS</title>
<body>

<p>
The Portable Batch System (PBS) is a flexible batch queueing and workload
management system originally developed for NASA. It operates on networked,
multi-platform UNIX environments, including heterogeneous clusters of
workstations, supercomputers, and massively parallel systems. Development of
PBS is provided by Altair Grid Technologies.
</p>

<pre caption="Installing openpbs">
# <i>emerge -a openpbs</i>
</pre>

<note>
The OpenPBS ebuild does not currently set proper permissions on the
directories under <path>/var</path> that OpenPBS uses.
</note>

<p>
Before you start using OpenPBS, some configuration is required. The files
you will need to customize for your system are:
</p>
721 713
722<ul> 714<ul>
723 <li>/etc/pbs_environment</li> 715 <li>/etc/pbs_environment</li>
724 <li>/var/spool/PBS/server_name</li> 716 <li>/var/spool/PBS/server_name</li>
725 <li>/var/spool/PBS/server_priv/nodes</li> 717 <li>/var/spool/PBS/server_priv/nodes</li>
726 <li>/var/spool/PBS/mom_priv/config</li> 718 <li>/var/spool/PBS/mom_priv/config</li>
727 <li>/var/spool/PBS/sched_priv/sched_config</li> 719 <li>/var/spool/PBS/sched_priv/sched_config</li>
728</ul> 720</ul>
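Three of the files listed above can be written with a small helper. The sketch below is an assumption-laden example: the host names master, node01 and node02, the two processors per node, and the helper name <c>write_pbs_config</c> are all hypothetical placeholders. It also assumes the OpenPBS conventions that <path>server_priv/nodes</path> takes one host per line with an optional <c>np=</c> processor count and that <path>mom_priv/config</path> names the server with <c>$clienthost</c>. Check these against your OpenPBS version before relying on it.

```shell
#!/bin/sh
# write_pbs_config DIR: write minimal OpenPBS configuration files under DIR
# (normally /var/spool/PBS). The host names master, node01 and node02 and the
# processor counts are hypothetical placeholders for your own cluster.
write_pbs_config() {
    pbs_home=$1
    server=master

    mkdir -p "$pbs_home/server_priv" "$pbs_home/mom_priv" || return 1

    # server_name: the host that runs pbs_server.
    echo "$server" > "$pbs_home/server_name"

    # server_priv/nodes: one execution host per line; np= sets how many
    # virtual processors the server may allocate on that host (assumed syntax).
    cat > "$pbs_home/server_priv/nodes" <<EOF
node01 np=2
node02 np=2
EOF

    # mom_priv/config: let pbs_mom accept requests from the server host.
    # Each compute node needs its own copy of this file.
    echo "\$clienthost $server" > "$pbs_home/mom_priv/config"
}
```

Run as root, <c>write_pbs_config /var/spool/PBS</c> writes the files on the server; <path>mom_priv/config</path> must then be copied to every node. <path>/etc/pbs_environment</path> and <path>sched_priv/sched_config</path> are not covered by this sketch.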
729 721
730<p> 722<p>
731Here is a sample sched_config: 723Here is a sample sched_config:
732</p> 724</p>
767set server resources_default.nodes = 1 759set server resources_default.nodes = 1
768set server scheduler_iteration = 60 760set server scheduler_iteration = 60
769</pre> 761</pre>
770 762
771<p> 763<p>
772To submit a task to OpenPBS, the command <c>qsub</c> is used with some 764To submit a task to OpenPBS, the command <c>qsub</c> is used with some
773optional parameters. In the exemple below, "-l" allows you to specify 765optional parameters. In the example below, "-l" allows you to specify
774the resources required, "-j" provides for redirection of standard out and 766the resources required, "-j" provides for redirection of standard out and
775standard error, and the "-m" will e-mail the user at begining (b), end (e) 767standard error, and the "-m" will e-mail the user at beginning (b), end (e)
776and on abort (a) of the job. 768and on abort (a) of the job.
777</p> 769</p>
778 770
779<pre caption="Submitting a task"> 771<pre caption="Submitting a task">
780<comment>(submit and request from OpenPBS that myscript be executed on 2 nodes)</comment> 772<comment>(submit and request from OpenPBS that myscript be executed on 2 nodes)</comment>
781# <i>qsub -l nodes=2 -j oe -m abe myscript</i> 773# <i>qsub -l nodes=2 -j oe -m abe myscript</i>
782</pre> 774</pre>
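The same options can also live inside the job script itself as <c>#PBS</c> directive lines, which <c>qsub</c> reads before the first command. The following sketch creates a hypothetical <c>myscript</c> carrying the options from the command above; its body (printing the allocated hosts) is only a placeholder for real work. It relies on the standard PBS job variables <c>PBS_O_WORKDIR</c>, <c>PBS_NODEFILE</c> and <c>PBS_JOBID</c>.

```shell
#!/bin/sh
# Create "myscript", a minimal OpenPBS job script. The #PBS lines carry the
# same options as the qsub command line above (placeholder job body).
cat > myscript <<'EOF'
#!/bin/sh
#PBS -l nodes=2
#PBS -j oe
#PBS -m abe

# The job starts in the user's home directory; return to the directory
# qsub was run from.
cd "$PBS_O_WORKDIR" || exit 1

# $PBS_NODEFILE lists the hosts allocated to this job, one per line.
echo "Job $PBS_JOBID running on:"
cat "$PBS_NODEFILE"
EOF
chmod +x myscript
```

With the directives embedded, <c>qsub myscript</c> is enough; options given on the <c>qsub</c> command line still override the ones in the script.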
783 775
784<p> 776<p>
785Normally jobs submitted to OpenPBS are in the form of scripts. Sometimes, you 777Normally jobs submitted to OpenPBS are in the form of scripts. Sometimes, you
786may want to try a task manually. To request an interactive shell from OpenPBS, 778may want to try a task manually. To request an interactive shell from OpenPBS,
787use the "-I" parameter. 779use the "-I" parameter.
788</p> 780</p>
789 781
790<pre caption="Requesting an interactive shell"> 782<pre caption="Requesting an interactive shell">
791# <i>qsub -I</i> 783# <i>qsub -I</i>
807<section> 799<section>
808<title>MPICH</title> 800<title>MPICH</title>
809<body> 801<body>
810 802
811<p> 803<p>
812Message passing is a paradigm used widely on certain classes of parallel 804Message passing is a paradigm used widely on certain classes of parallel
813machines, especially those with distributed memory. MPICH is a freely 805machines, especially those with distributed memory. MPICH is a freely
814available, portable implementation of MPI, the Standard for message-passing 806available, portable implementation of MPI, the Standard for message-passing
815libraries. 807libraries.
816</p> 808</p>
817 809
818<p> 810<p>
819The mpich ebuild provided by Adelie Linux allows for two USE flags: 811The mpich ebuild provided by Adelie Linux allows for two USE flags:
820<e>doc</e> and <e>crypt</e>. <e>doc</e> will cause documentation to be 812<e>doc</e> and <e>crypt</e>. <e>doc</e> will cause documentation to be
821installed, while <e>crypt</e> will configure MPICH to use <c>ssh</c> instead 813installed, while <e>crypt</e> will configure MPICH to use <c>ssh</c> instead
822of <c>rsh</c>. 814of <c>rsh</c>.
823</p> 815</p>
824 816
825<pre caption="Installing the mpich application"> 817<pre caption="Installing the mpich application">
826# <i>emerge -p mpich</i> 818# <i>emerge -a mpich</i>
827# <i>emerge mpich</i>
828</pre> 819</pre>
829 820
830<p> 821<p>
831You may need to export an MPICH work directory to all your slave nodes in 822You may need to export an MPICH work directory to all your slave nodes in
832<path>/etc/exports</path>: 823<path>/etc/exports</path>:
833</p> 824</p>
834 825
835<pre caption="/etc/exports"> 826<pre caption="/etc/exports">
836/home *(rw) 827/home *(rw)
837</pre> 828</pre>
838 829
839<p> 830<p>
840Most massively parallel processors (MPPs) provide a way to start a program on 831Most massively parallel processors (MPPs) provide a way to start a program on
841a requested number of processors; <c>mpirun</c> makes use of the appropriate 832a requested number of processors; <c>mpirun</c> makes use of the appropriate
842command whenever possible. In contrast, workstation clusters require that each 833command whenever possible. In contrast, workstation clusters require that each
843process in a parallel job be started individually, though programs to help 834process in a parallel job be started individually, though programs to help
844start these processes exist. Because workstation clusters are not already 835start these processes exist. Because workstation clusters are not already
845organized as an MPP, additional information is required to make use of them. 836organized as an MPP, additional information is required to make use of them.
846Mpich should be installed with a list of participating workstations in the 837Mpich should be installed with a list of participating workstations in the
847file <path>machines.LINUX</path> in the directory 838file <path>machines.LINUX</path> in the directory
848<path>/usr/share/mpich/</path>. This file is used by <c>mpirun</c> to choose 839<path>/usr/share/mpich/</path>. This file is used by <c>mpirun</c> to choose
849processors to run on. 840processors to run on.
850</p> 841</p>
851 842
852<p> 843<p>
853Edit this file to reflect your cluster LAN configuration: 844Edit this file to reflect your cluster LAN configuration:
854</p> 845</p>
855 846
856<pre caption="/usr/share/mpich/machines.LINUX"> 847<pre caption="/usr/share/mpich/machines.LINUX">
857# Change this file to contain the machines that you want to use 848# Change this file to contain the machines that you want to use
858# to run MPI jobs on. The format is one host name per line, with either 849# to run MPI jobs on. The format is one host name per line, with either
859# hostname 850# hostname
860# or 851# or
861# hostname:n 852# hostname:n
862# where n is the number of processors in an SMP. The hostname should 853# where n is the number of processors in an SMP. The hostname should
863# be the same as the result from the command "hostname" 854# be the same as the result from the command "hostname"
864master 855master
865node01 856node01
866node02 857node02
867# node03 858# node03
868# node04 859# node04
869# ... 860# ...
870</pre> 861</pre>
871 862
872<p> 863<p>
873Use the script <c>tstmachines</c> in <path>/usr/sbin/</path> to ensure that 864Use the script <c>tstmachines</c> in <path>/usr/sbin/</path> to ensure that
874you can use all of the machines that you have listed. This script performs 865you can use all of the machines that you have listed. This script performs
875an <c>rsh</c> and a short directory listing; this tests both that you have 866an <c>rsh</c> and a short directory listing; this tests both that you have
876access to the node and that a program in the current directory is visible on 867access to the node and that a program in the current directory is visible on
877the remote node. If there are any problems, they will be listed. These 868the remote node. If there are any problems, they will be listed. These
878problems must be fixed before proceeding. 869problems must be fixed before proceeding.
879</p> 870</p>
880 871
881<p> 872<p>
882The only argument to <c>tstmachines</c> is the name of the architecture; this 873The only argument to <c>tstmachines</c> is the name of the architecture; this
883is the same name as the extension on the machines file. For example, the 874is the same name as the extension on the machines file. For example, the
884following tests that a program in the current directory can be executed by 875following tests that a program in the current directory can be executed by
885all of the machines in the LINUX machines list. 876all of the machines in the LINUX machines list.
886</p> 877</p>
887 878
888<pre caption="Running a test"> 879<pre caption="Running a test">
889# <i>/usr/local/mpich/sbin/tstmachines LINUX</i> 880# <i>/usr/local/mpich/sbin/tstmachines LINUX</i>
890</pre> 881</pre>
891 882
892<note> 883<note>
893This program is silent if all is well; if you want to see what it is doing, 884This program is silent if all is well; if you want to see what it is doing,
894use the -v (for verbose) argument: 885use the -v (for verbose) argument:
895</note> 886</note>
896 887
897<pre caption="Running a test verbosely"> 888<pre caption="Running a test verbosely">
898# <i>/usr/local/mpich/sbin/tstmachines -v LINUX</i> 889# <i>/usr/local/mpich/sbin/tstmachines -v LINUX</i>
910Trying user program on host1.uoffoo.edu ... 901Trying user program on host1.uoffoo.edu ...
911Trying user program on host2.uoffoo.edu ... 902Trying user program on host2.uoffoo.edu ...
912</pre> 903</pre>
913 904
914<p> 905<p>
915If <c>tstmachines</c> finds a problem, it will suggest possible reasons and 906If <c>tstmachines</c> finds a problem, it will suggest possible reasons and
916solutions. In brief, there are three tests: 907solutions. In brief, there are three tests:
917</p> 908</p>
918 909
919<ul> 910<ul>
920 <li> 911 <li>
921 <e>Can processes be started on remote machines?</e> tstmachines attempts 912 <e>Can processes be started on remote machines?</e> tstmachines attempts
922 to run the shell command <c>true</c> on each machine in the machines file by 913 to run the shell command <c>true</c> on each machine in the machines file by
923 using the remote shell command. 914 using the remote shell command.
924 </li> 915 </li>
925 <li> 916 <li>
926 <e>Is current working directory available to all machines?</e> This 917 <e>Is current working directory available to all machines?</e> This
927 attempts to ls a file that tstmachines creates by running ls using the 918 attempts to ls a file that tstmachines creates by running ls using the
928 remote shell command. 919 remote shell command.
929 </li> 920 </li>
930 <li> 921 <li>
931 <e>Can user programs be run on remote systems?</e> This checks that shared 922 <e>Can user programs be run on remote systems?</e> This checks that shared
932 libraries and other components have been properly installed on all 923 libraries and other components have been properly installed on all
933 machines. 924 machines.
934 </li> 925 </li>
935</ul> 926</ul>
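When <c>tstmachines</c> reports a failure, the first of these tests can be repeated by hand to see which host is at fault. The function below is a minimal sketch, not part of MPICH: the name <c>check_hosts</c> and the <c>RSH</c> variable are hypothetical. It assumes <c>ssh</c> as the remote shell (as with the <e>crypt</e> USE flag); set <c>RSH=rsh</c> otherwise. <c>BatchMode=yes</c> makes <c>ssh</c> fail instead of prompting for a password.

```shell
#!/bin/sh
# check_hosts FILE: try to run "true" on every host in an MPICH machines file,
# mirroring the first tstmachines test. RSH selects the remote shell
# (hypothetical variable; defaults to non-interactive ssh).
check_hosts() {
    failed=0
    # Strip comments, then drop any ":n" processor count; blank lines vanish
    # during word splitting.
    for host in $(sed -e 's/#.*//' -e 's/:.*//' "$1"); do
        if ! ${RSH:-ssh -o BatchMode=yes} "$host" true </dev/null; then
            echo "cannot start a process on $host" >&2
            failed=1
        fi
    done
    return $failed
}
```

For example, <c>check_hosts /usr/share/mpich/machines.LINUX</c> prints one line per unreachable host and returns a non-zero status if any host failed.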
936 927
937<p> 928<p>
944# <i>make hello++</i> 935# <i>make hello++</i>
945# <i>mpirun -machinefile /usr/share/mpich/machines.LINUX -np 1 hello++</i> 936# <i>mpirun -machinefile /usr/share/mpich/machines.LINUX -np 1 hello++</i>
946</pre> 937</pre>
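The <c>mpirun</c> example above starts a single process. To start one process per processor listed in the machines file, the <c>-np</c> value can be derived from the file itself. This is a sketch under the format described in the file's own comments (<c>hostname</c> or <c>hostname:n</c>); the helper name <c>total_procs</c> is hypothetical.

```shell
#!/bin/sh
# total_procs FILE: add up the processors listed in an MPICH machines file.
# A bare "hostname" line counts as 1, "hostname:n" counts as n, and comments
# and blank lines are ignored.
total_procs() {
    sed -e 's/#.*//' "$1" |
        awk -F: '{ gsub(/[ \t]+/, "") } NF { total += (NF > 1 ? $2 : 1) } END { print total + 0 }'
}
```

With the sample machines file shown earlier (master, node01 and node02, one processor each) this yields 3, and it can be used as <c>mpirun -machinefile /usr/share/mpich/machines.LINUX -np "$(total_procs /usr/share/mpich/machines.LINUX)" hello++</c>.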
947 938
948<p> 939<p>
949For further information on MPICH, consult the documentation at <uri 940For further information on MPICH, consult the documentation at <uri
950link="http://www-unix.mcs.anl.gov/mpi/mpich/docs/mpichman-chp4/mpichman-chp4.htm">http://www-unix.mcs.anl.gov/mpi/mpich/docs/mpichman-chp4/mpichman-chp4.htm</uri>. 941link="http://www-unix.mcs.anl.gov/mpi/mpich/docs/mpichman-chp4/mpichman-chp4.htm">http://www-unix.mcs.anl.gov/mpi/mpich/docs/mpichman-chp4/mpichman-chp4.htm</uri>.
951</p> 942</p>
952 943
953</body> 944</body>
954</section> 945</section>
978<title>Bibliography</title> 969<title>Bibliography</title>
979<section> 970<section>
980<body> 971<body>
981 972
982<p> 973<p>
983The original document is published at the <uri 974The original document is published at the <uri
984link="http://www.adelielinux.com">Adelie Linux R&amp;D Centre</uri> web site, 975link="http://www.adelielinux.com">Adelie Linux R&amp;D Centre</uri> web site,
985and is reproduced here with the permission of the authors and <uri 976and is reproduced here with the permission of the authors and <uri
986link="http://www.cyberlogic.ca">Cyberlogic</uri>'s Adelie Linux R&amp;D 977link="http://www.cyberlogic.ca">Cyberlogic</uri>'s Adelie Linux R&amp;D
987Centre. 978Centre.
988</p> 979</p>
989 980
990<ul> 981<ul>
991 <li><uri>http://www.gentoo.org</uri>, Gentoo Technologies, Inc.</li> 982 <li><uri>http://www.gentoo.org</uri>, Gentoo Foundation, Inc.</li>
992 <li> 983 <li>
993 <uri link="http://www.adelielinux.com">http://www.adelielinux.com</uri>, 984 <uri link="http://www.adelielinux.com">http://www.adelielinux.com</uri>,
994 Adelie Linux Research and Development Centre 985 Adelie Linux Research and Development Centre
995 </li> 986 </li>
996 <li> 987 <li>
997 <uri link="http://nfs.sourceforge.net/">http://nfs.sourceforge.net</uri>, 988 <uri link="http://nfs.sourceforge.net/">http://nfs.sourceforge.net</uri>,
998 Linux NFS Project 989 Linux NFS Project
999 </li> 990 </li>
1000 <li> 991 <li>
1001 <uri link="http://www-unix.mcs.anl.gov/mpi/mpich/">http://www-unix.mcs.anl.gov/mpi/mpich/</uri>, 992 <uri link="http://www-unix.mcs.anl.gov/mpi/mpich/">http://www-unix.mcs.anl.gov/mpi/mpich/</uri>,
1002 Mathematics and Computer Science Division, Argonne National Laboratory 993 Mathematics and Computer Science Division, Argonne National Laboratory
1003 </li> 994 </li>
1004 <li> 995 <li>
1005 <uri link="http://www.ntp.org/">http://ntp.org</uri> 996 <uri link="http://www.ntp.org/">http://ntp.org</uri>
1006 </li> 997 </li>
1007 <li> 998 <li>
1008 <uri link="http://www.eecis.udel.edu/~mills/">http://www.eecis.udel.edu/~mills/</uri>, 999 <uri link="http://www.eecis.udel.edu/~mills/">http://www.eecis.udel.edu/~mills/</uri>,
1009 David L. Mills, University of Delaware 1000 David L. Mills, University of Delaware
1010 </li> 1001 </li>
1011 <li> 1002 <li>
1012 <uri link="http://www.ietf.org/html.charters/secsh-charter.html">http://www.ietf.org/html.charters/secsh-charter.html</uri>, 1003 <uri link="http://www.ietf.org/html.charters/secsh-charter.html">http://www.ietf.org/html.charters/secsh-charter.html</uri>,
1013 Secure Shell Working Group, IETF, Internet Society 1004 Secure Shell Working Group, IETF, Internet Society
1014 </li> 1005 </li>
1015 <li> 1006 <li>
1016 <uri link="http://www.linuxsecurity.com/">http://www.linuxsecurity.com/</uri>, 1007 <uri link="http://www.linuxsecurity.com/">http://www.linuxsecurity.com/</uri>,
1017 Guardian Digital 1008 Guardian Digital
1018 </li> 1009 </li>
1019 <li> 1010 <li>
1020 <uri link="http://www.openpbs.org/">http://www.openpbs.org/</uri>, 1011 <uri link="http://www.openpbs.org/">http://www.openpbs.org/</uri>,
1021 Altair Grid Technologies, LLC. 1012 Altair Grid Technologies, LLC.
1022 </li> 1013 </li>
1023</ul> 1014</ul>
1024 1015
1025</body> 1016</body>
