Diff of /xml/htdocs/doc/en/hpc-howto.xml


Revision 1.2 Revision 1.15
1<?xml version='1.0' encoding="UTF-8"?> 1<?xml version='1.0' encoding="UTF-8"?>
2
3<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/hpc-howto.xml,v 1.2 2005/01/03 13:52:07 neysx Exp $ --> 2<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/hpc-howto.xml,v 1.15 2010/06/07 09:08:37 nightmorph Exp $ -->
4
5<!DOCTYPE guide SYSTEM "/dtd/guide.dtd"> 3<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">
6<guide link="/doc/en/hpc-howto.xml">
7 4
5<guide>
8<title>High Performance Computing on Gentoo Linux</title> 6<title>High Performance Computing on Gentoo Linux</title>
9 7
10<author title="Author"> 8<author title="Author">
11 <mail link="marc@adelielinux.com">Marc St-Pierre</mail> 9 <mail link="marc@adelielinux.com">Marc St-Pierre</mail>
12</author> 10</author>
18</author> 16</author>
19<author title="Assistant/Research"> 17<author title="Assistant/Research">
20 <mail link="olivier@adelielinux.com">Olivier Crete</mail> 18 <mail link="olivier@adelielinux.com">Olivier Crete</mail>
21</author> 19</author>
22<author title="Reviewer"> 20<author title="Reviewer">
23 <mail link="spyderous@gentoo.org">Donnie Berkholz</mail> 21 <mail link="dberkholz@gentoo.org">Donnie Berkholz</mail>
22</author>
23<author title="Editor">
24 <mail link="nightmorph"/>
24</author> 25</author>
25 26
26<!-- No licensing information; this document has been written by a third-party 27<!-- No licensing information; this document has been written by a third-party
27 organisation without additional licensing information. 28 organisation without additional licensing information.
28 29
29 In other words, this is copyright adelielinux R&D; Gentoo only has 30 In other words, this is copyright adelielinux R&D; Gentoo only has
30 permission to distribute this document as-is and update it when appropriate 31 permission to distribute this document as-is and update it when appropriate
31 as long as the adelie linux R&D notice stays 32 as long as the adelie linux R&D notice stays
32--> 33-->
33 34
34<abstract> 35<abstract>
35This document was written by people at the Adelie Linux R&amp;D Center 36This document was written by people at the Adelie Linux R&amp;D Center
36&lt;http://www.adelielinux.com&gt; as a step-by-step guide to turn a Gentoo 37&lt;http://www.adelielinux.com&gt; as a step-by-step guide to turn a Gentoo
37System into an High Performance Computing (HPC) system. 38System into a High Performance Computing (HPC) system.
38</abstract> 39</abstract>
39 40
40<version>1.0</version> 41<version>1.7</version>
41<date>2003-08-01</date> 42<date>2010-06-07</date>
42 43
43<chapter> 44<chapter>
44<title>Introduction</title> 45<title>Introduction</title>
45<section> 46<section>
46<body> 47<body>
47 48
48<p> 49<p>
49Gentoo Linux is a special flavor of Linux that can be automatically optimized 50Gentoo Linux is a special flavor of Linux that can be automatically optimized
50and customized for just about any application or need. Extreme performance, 51and customized for just about any application or need. Extreme performance,
51configurability and a top-notch user and developer community are all hallmarks 52configurability and a top-notch user and developer community are all hallmarks
52of the Gentoo experience. 53of the Gentoo experience.
53</p> 54</p>
54 55
55<p> 56<p>
56Thanks to a technology called Portage, Gentoo Linux can become an ideal secure 57Thanks to a technology called Portage, Gentoo Linux can become an ideal secure
57server, development workstation, professional desktop, gaming system, embedded 58server, development workstation, professional desktop, gaming system, embedded
58solution or... a High Performance Computing system. Because of its 59solution or... a High Performance Computing system. Because of its
59near-unlimited adaptability, we call Gentoo Linux a metadistribution. 60near-unlimited adaptability, we call Gentoo Linux a metadistribution.
60</p> 61</p>
61 62
62<p> 63<p>
63This document explains how to turn a Gentoo system into a High Performance 64This document explains how to turn a Gentoo system into a High Performance
64Computing system. Step by step, it explains what packages one may want to 65Computing system. Step by step, it explains what packages one may want to
65install and helps configure them. 66install and helps configure them.
66</p> 67</p>
67 68
68<p> 69<p>
69Obtain Gentoo Linux from the website <uri>http://www.gentoo.org</uri>, and 70Obtain Gentoo Linux from the website <uri>http://www.gentoo.org</uri>, and
85We refer to the <uri link="/doc/en/handbook/">Gentoo Linux Handbooks</uri> in 86We refer to the <uri link="/doc/en/handbook/">Gentoo Linux Handbooks</uri> in
86this section. 87this section.
87</note> 88</note>
88 89
89<p> 90<p>
90During the installation process, you will have to set your USE variables in 91During the installation process, you will have to set your USE variables in
91<path>/etc/make.conf</path>. We recommend that you deactivate all the 92<path>/etc/make.conf</path>. We recommend that you deactivate all the
92defaults (see <path>/etc/make.profile/make.defaults</path>) by negating them 93defaults (see <path>/etc/make.profile/make.defaults</path>) by negating them in
93in make.conf. However, you may want to keep such use variables as x86, 3dnow, 94make.conf. However, you may want to keep such use variables as 3dnow, gpm,
94gpm, mmx, sse, ncurses, pam and tcpd. Refer to the USE documentation for more 95mmx, nptl, nptlonly, sse, ncurses, pam and tcpd. Refer to the USE documentation
95information. 96for more information.
96</p> 97</p>
97 98
98<pre caption="USE Flags"> 99<pre caption="USE Flags">
99# Copyright 2000-2003 Daniel Robbins, Gentoo Technologies, Inc.
100# Contains local system settings for Portage system
101
102# Please review 'man make.conf' for more information.
103
104USE="-oss 3dnow -apm -arts -avi -berkdb -crypt -cups -encode -gdbm 100USE="-oss 3dnow -apm -avi -berkdb -crypt -cups -encode -gdbm -gif gpm -gtk
105-gif gpm -gtk -imlib -java -jpeg -kde -gnome -libg++ -libwww -mikmod 101-imlib -java -jpeg -kde -gnome -libg++ -libwww -mikmod mmx -motif -mpeg ncurses
106mmx -motif -mpeg ncurses -nls -oggvorbis -opengl pam -pdflib -png 102-nls nptl nptlonly -ogg -opengl pam -pdflib -png -python -qt4 -qtmt
107-python -qt -qtmt -quicktime -readline -sdl -slang -spell -ssl 103-quicktime -readline -sdl -slang -spell -ssl -svga tcpd -truetype -vorbis -X
108-svga tcpd -truetype -X -xml2 -xmms -xv -zlib" 104-xml2 -xv -zlib"
109</pre> 105</pre>
110 106
111<p> 107<p>
112Or simply: 108Or simply:
113</p> 109</p>
114 110
115<pre caption="USE Flags - simplified version"> 111<pre caption="USE Flags - simplified version">
116# Copyright 2000-2003 Daniel Robbins, Gentoo Technologies, Inc.
117# Contains local system settings for Portage system
118
119# Please review 'man make.conf' for more information.
120
121USE="-* 3dnow gpm mmx ncurses pam sse tcpd" 112USE="-* 3dnow gpm mmx ncurses pam sse tcpd"
122</pre> 113</pre>
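To illustrate why the simplified form has the same effect as negating every default by hand, here is a minimal Python sketch of how an incremental variable such as USE can be resolved token by token: <c>-*</c> clears everything set so far, <c>-flag</c> removes one flag, and a bare <c>flag</c> adds it. The function name <c>resolve_use</c> and the sample profile defaults are hypothetical, and this is a simplified model rather than Portage's actual code.

```python
def resolve_use(defaults, use_string):
    """Apply a USE string on top of a collection of default flags.

    Simplified model of an incremental variable: tokens are processed
    left to right. This is an illustration, not Portage's implementation.
    """
    enabled = set(defaults)
    for token in use_string.split():
        if token == "-*":
            enabled.clear()             # drop every flag enabled so far
        elif token.startswith("-"):
            enabled.discard(token[1:])  # disable a single flag
        else:
            enabled.add(token)          # enable a single flag
    return enabled


# Hypothetical profile defaults, for illustration only.
profile_defaults = ["X", "gtk", "oss", "pam"]
simplified = "-* 3dnow gpm mmx ncurses pam sse tcpd"
print(sorted(resolve_use(profile_defaults, simplified)))
```

Because <c>-*</c> comes first, the result does not depend on which defaults the profile happened to enable.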
123 114
124<note> 115<note>
125The <e>tcpd</e> USE flag increases security for packages such as xinetd. 116The <e>tcpd</e> USE flag increases security for packages such as xinetd.
126</note> 117</note>
127 118
128<p> 119<p>
129In step 15 ("Installing the kernel and a System Logger") for stability 120In step 15 ("Installing the kernel and a System Logger") for stability
130reasons, we recommend the vanilla-sources, the official kernel sources 121reasons, we recommend the vanilla-sources, the official kernel sources
131released on <uri>http://www.kernel.org/</uri>, unless you require special 122released on <uri>http://www.kernel.org/</uri>, unless you require special
132support such as xfs. 123support such as xfs.
133</p> 124</p>
134 125
135<pre caption="Installing vanilla-sources"> 126<pre caption="Installing vanilla-sources">
136# <i>emerge -p syslog-ng vanilla-sources</i> 127# <i>emerge -a syslog-ng vanilla-sources</i>
137</pre> 128</pre>
138 129
139<p> 130<p>
140When you install miscellaneous packages, we recommend installing the 131When you install miscellaneous packages, we recommend installing the
141following: 132following:
142</p> 133</p>
143 134
144<pre caption="Installing necessary packages"> 135<pre caption="Installing necessary packages">
145# <i>emerge -p nfs-utils portmap tcpdump ssmtp iptables xinetd</i> 136# <i>emerge -a nfs-utils portmap tcpdump ssmtp iptables xinetd</i>
146</pre> 137</pre>
147 138
148</body> 139</body>
149</section> 140</section>
150<section> 141<section>
151<title>Communication Layer (TCP/IP Network)</title> 142<title>Communication Layer (TCP/IP Network)</title>
152<body> 143<body>
153 144
154<p> 145<p>
155A cluster requires a communication layer to interconnect the slave nodes to 146A cluster requires a communication layer to interconnect the slave nodes to
156the master node. Typically, a FastEthernet or GigaEthernet LAN can be used 147the master node. Typically, a FastEthernet or GigaEthernet LAN can be used
157since they have a good price/performance ratio. Other possibilities include 148since they have a good price/performance ratio. Other possibilities include
158use of products like <uri link="http://www.myricom.com/">Myrinet</uri>, <uri 149use of products like <uri link="http://www.myricom.com/">Myrinet</uri>, <uri
159link="http://quadrics.com/">QsNet</uri> or others. 150link="http://quadrics.com/">QsNet</uri> or others.
160</p> 151</p>
161 152
162<p> 153<p>
163A cluster is composed of two node types: master and slave. Typically, your 154A cluster is composed of two node types: master and slave. Typically, your
164cluster will have one master node and several slave nodes. 155cluster will have one master node and several slave nodes.
165</p> 156</p>
166 157
167<p> 158<p>
168The master node is the cluster's server. It is responsible for telling the 159The master node is the cluster's server. It is responsible for telling the
169slave nodes what to do. This server will typically run such daemons as dhcpd, 160slave nodes what to do. This server will typically run such daemons as dhcpd,
170nfs, pbs-server, and pbs-sched. Your master node will allow interactive 161nfs, pbs-server, and pbs-sched. Your master node will allow interactive
171sessions for users, and accept job executions. 162sessions for users, and accept job executions.
172</p> 163</p>
173 164
174<p> 165<p>
175The slave nodes listen for instructions (via ssh/rsh perhaps) from the master 166The slave nodes listen for instructions (via ssh/rsh perhaps) from the master
176node. They should be dedicated to crunching results and therefore should not 167node. They should be dedicated to crunching results and therefore should not
177run any unecessary services. 168run any unnecessary services.
178</p>
179
180<p> 169</p>
170
171<p>
181The rest of this documentation will assume a cluster configuration as per the 172The rest of this documentation will assume a cluster configuration as per the
182hosts file below. You should maintain on every node such a hosts file 173hosts file below. You should maintain on every node such a hosts file
183(<path>/etc/hosts</path>) with entries for each node participating in the 174(<path>/etc/hosts</path>) with entries for each node participating in the
184cluster. 175cluster.
185</p> 176</p>
186 177
187<pre caption="/etc/hosts"> 178<pre caption="/etc/hosts">
188# Adelie Linux Research &amp; Development Center 179# Adelie Linux Research &amp; Development Center
189# /etc/hosts 180# /etc/hosts
190 181
191127.0.0.1 localhost 182127.0.0.1 localhost
192 183
193192.168.1.100 master.adelie master 184192.168.1.100 master.adelie master
194 185
195192.168.1.1 node01.adelie node01 186192.168.1.1 node01.adelie node01
196192.168.1.2 node02.adelie node02 187192.168.1.2 node02.adelie node02
197</pre> 188</pre>
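For larger clusters, keeping this file identical on every node by hand becomes tedious. The following is a minimal Python sketch that generates the same entries, assuming the addressing scheme shown above (master at 192.168.1.100, node NN at 192.168.1.NN, domain "adelie"). The function name <c>hosts_lines</c> and its parameters are hypothetical.

```python
def hosts_lines(node_count, domain="adelie", network="192.168.1", master_host=100):
    """Return /etc/hosts lines for one master and node_count slave nodes.

    Assumes node NN lives at <network>.NN, as in the sample hosts file.
    """
    if not 1 <= node_count < master_host:
        raise ValueError("node_count must be between 1 and master_host - 1")
    lines = [
        "127.0.0.1 localhost",
        f"{network}.{master_host} master.{domain} master",
    ]
    for n in range(1, node_count + 1):
        name = f"node{n:02d}"  # zero-padded: node01, node02, ...
        lines.append(f"{network}.{n} {name}.{domain} {name}")
    return lines


print("\n".join(hosts_lines(2)))
```

The output can be reviewed and then copied to <path>/etc/hosts</path> on each node.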
198 189
199<p> 190<p>
200To set up your cluster's dedicated LAN, edit your <path>/etc/conf.d/net</path> 191To set up your cluster's dedicated LAN, edit your <path>/etc/conf.d/net</path>
201file on the master node. 192file on the master node.
202</p> 193</p>
203 194
204<pre caption="/etc/conf.d/net"> 195<pre caption="/etc/conf.d/net">
205# Copyright 1999-2002 Gentoo Technologies, Inc.
206# Distributed under the terms of the GNU General Public License, v2 or later
207
208# Global config file for net.* rc-scripts 196# Global config file for net.* rc-scripts
209 197
210# This is basically the ifconfig argument without the ifconfig $iface 198# This is basically the ifconfig argument without the ifconfig $iface
211# 199#
212 200
215iface_eth1="dhcp" 203iface_eth1="dhcp"
216</pre> 204</pre>
217 205
218 206
219<p> 207<p>
220Finally, set up a DHCP daemon on the master node to avoid having to maintain a 208Finally, set up a DHCP daemon on the master node to avoid having to maintain a
221network configuration on each slave node. 209network configuration on each slave node.
222</p> 210</p>
223 211
224<pre caption="/etc/dhcp/dhcpd.conf"> 212<pre caption="/etc/dhcp/dhcpd.conf">
225# Adelie Linux Research &amp; Development Center 213# Adelie Linux Research &amp; Development Center
233 option domain-name "adelie"; 221 option domain-name "adelie";
234 range 192.168.1.10 192.168.1.99; 222 range 192.168.1.10 192.168.1.99;
235 option routers 192.168.1.100; 223 option routers 192.168.1.100;
236 224
237 host node01.adelie { 225 host node01.adelie {
238 # MAC address of network card on node 01 226 # MAC address of network card on node 01
239 hardware ethernet 00:07:e9:0f:e2:d4; 227 hardware ethernet 00:07:e9:0f:e2:d4;
240 fixed-address 192.168.1.1; 228 fixed-address 192.168.1.1;
241 } 229 }
242 host node02.adelie { 230 host node02.adelie {
243 # MAC address of network card on node 02 231 # MAC address of network card on node 02
244 hardware ethernet 00:07:e9:0f:e2:6b; 232 hardware ethernet 00:07:e9:0f:e2:6b;
245 fixed-address 192.168.1.2; 233 fixed-address 192.168.1.2;
246 } 234 }
247} 235}
248</pre> 236</pre>
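Each slave node needs one <c>host</c> block that ties its MAC address to a fixed address. As a rough sketch, assuming you keep a small table of node names, MAC addresses and IPs, the blocks could be generated with Python as shown below. The function name <c>dhcp_host_block</c> is hypothetical; the MAC addresses are the ones from the sample above.

```python
import re

MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$", re.IGNORECASE)


def dhcp_host_block(name, mac, ip, domain="adelie"):
    """Return a dhcpd.conf host block binding a MAC address to a fixed IP."""
    if not MAC_RE.match(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    return (
        f"host {name}.{domain} {{\n"
        f"    hardware ethernet {mac};\n"
        f"    fixed-address {ip};\n"
        f"}}"
    )


nodes = [
    ("node01", "00:07:e9:0f:e2:d4", "192.168.1.1"),
    ("node02", "00:07:e9:0f:e2:6b", "192.168.1.2"),
]
for name, mac, ip in nodes:
    print(dhcp_host_block(name, mac, ip))
```

The generated blocks belong inside the subnet declaration of <path>/etc/dhcp/dhcpd.conf</path>.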
252<section> 240<section>
253<title>NFS/NIS</title> 241<title>NFS/NIS</title>
254<body> 242<body>
255 243
256<p> 244<p>
257The Network File System (NFS) was developed to allow machines to mount a disk 245The Network File System (NFS) was developed to allow machines to mount a disk
258partition on a remote machine as if it were on a local hard drive. This allows 246partition on a remote machine as if it were on a local hard drive. This allows
259for fast, seamless sharing of files across a network. 247for fast, seamless sharing of files across a network.
260</p> 248</p>
261 249
262<p> 250<p>
263There are other systems that provide similar functionality to NFS which could 251There are other systems that provide similar functionality to NFS which could
264be used in a cluster environment. The <uri 252be used in a cluster environment. The <uri
265link="http://www.transarc.com/Product/EFS/AFS/index.html">Andrew File System 253link="http://www.openafs.org">Andrew File System
266from IBM</uri>, recently open-sourced, provides a file sharing mechanism with 254from IBM</uri>, recently open-sourced, provides a file sharing mechanism with
267some additional security and performance features. The <uri 255some additional security and performance features. The <uri
268link="http://www.coda.cs.cmu.edu/">Coda File System</uri> is still in 256link="http://www.coda.cs.cmu.edu/">Coda File System</uri> is still in
269development, but is designed to work well with disconnected clients. Many 257development, but is designed to work well with disconnected clients. Many
270of the features of the Andrew and Coda file systems are slated for inclusion 258of the features of the Andrew and Coda file systems are slated for inclusion
271in the next version of <uri link="http://www.nfsv4.org">NFS (Version 4)</uri>. 259in the next version of <uri link="http://www.nfsv4.org">NFS (Version 4)</uri>.
272The advantage of NFS today is that it is mature, standard, well understood, 260The advantage of NFS today is that it is mature, standard, well understood,
273and supported robustly across a variety of platforms. 261and supported robustly across a variety of platforms.
274</p> 262</p>
275 263
276<pre caption="Ebuilds for NFS-support"> 264<pre caption="Ebuilds for NFS-support">
277# <i>emerge -p nfs-utils portmap</i> 265# <i>emerge -a nfs-utils portmap</i>
278# <i>emerge nfs-utils portmap</i>
279</pre> 266</pre>
280 267
281<p> 268<p>
282Configure and install a kernel to support NFS v3 on all nodes: 269Configure and install a kernel to support NFS v3 on all nodes:
283</p> 270</p>
290CONFIG_NFSD_V3=y 277CONFIG_NFSD_V3=y
291CONFIG_LOCKD_V4=y 278CONFIG_LOCKD_V4=y
292</pre> 279</pre>
293 280
294<p> 281<p>
295On the master node, edit your <path>/etc/hosts.allow</path> file to allow 282On the master node, edit your <path>/etc/hosts.allow</path> file to allow
296connections from slave nodes. If your cluster LAN is on 192.168.1.0/24, 283connections from slave nodes. If your cluster LAN is on 192.168.1.0/24,
297your <path>hosts.allow</path> will look like: 284your <path>hosts.allow</path> will look like:
298</p> 285</p>
299 286
300<pre caption="hosts.allow"> 287<pre caption="hosts.allow">
301portmap:192.168.1.0/255.255.255.0 288portmap:192.168.1.0/255.255.255.0
302</pre> 289</pre>
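The network/netmask pair in this rule covers the same addresses as 192.168.1.0/24. If you are unsure whether a given node address is matched, a quick check with Python's standard <c>ipaddress</c> module can help. This is only a sketch; the helper name <c>in_cluster_lan</c> is hypothetical and it does not read <path>/etc/hosts.allow</path> itself.

```python
import ipaddress

# Same network as in the hosts.allow rule, written with an explicit netmask.
CLUSTER_LAN = ipaddress.ip_network("192.168.1.0/255.255.255.0")


def in_cluster_lan(address):
    """Return True if the address belongs to the cluster LAN."""
    return ipaddress.ip_address(address) in CLUSTER_LAN


for addr in ("192.168.1.1", "192.168.1.100", "192.168.2.1"):
    print(addr, in_cluster_lan(addr))
```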
303 290
304<p> 291<p>
305Edit the <path>/etc/exports</path> file of the master node to export a work 292Edit the <path>/etc/exports</path> file of the master node to export a work
306directory struture (/home is good for this). 293directory structure (/home is good for this).
307</p> 294</p>
308 295
309<pre caption="/etc/exports"> 296<pre caption="/etc/exports">
310/home/ *(rw) 297/home/ *(rw)
311</pre> 298</pre>
312 299
313<p> 300<p>
314Add nfs to your master node's default runlevel: 301Add nfs to your master node's default runlevel:
315</p> 302</p>
317<pre caption="Adding NFS to the default runlevel"> 304<pre caption="Adding NFS to the default runlevel">
318# <i>rc-update add nfs default</i> 305# <i>rc-update add nfs default</i>
319</pre> 306</pre>
320 307
321<p> 308<p>
322To mount the nfs exported filesystem from the master, you also have to 309To mount the nfs exported filesystem from the master, you also have to
323configure your slave nodes' <path>/etc/fstab</path>. Add a line like this 310configure your slave nodes' <path>/etc/fstab</path>. Add a line like this
324one: 311one:
325</p> 312</p>
326 313
327<pre caption="/etc/fstab"> 314<pre caption="/etc/fstab">
328master:/home/ /home nfs rw,exec,noauto,nouser,async 0 0 315master:/home/ /home nfs rw,exec,noauto,nouser,async 0 0
329</pre> 316</pre>
330 317
331<p> 318<p>
332You'll also need to set up your nodes so that they mount the nfs filesystem by 319You'll also need to set up your nodes so that they mount the nfs filesystem by
333issuing this command: 320issuing this command:
334</p> 321</p>
335 322
336<pre caption="Adding nfsmount to the default runlevel"> 323<pre caption="Adding nfsmount to the default runlevel">
337# <i>rc-update add nfsmount default</i> 324# <i>rc-update add nfsmount default</i>
342<section> 329<section>
343<title>RSH/SSH</title> 330<title>RSH/SSH</title>
344<body> 331<body>
345 332
346<p> 333<p>
347SSH is a protocol for secure remote login and other secure network services 334SSH is a protocol for secure remote login and other secure network services
348over an insecure network. OpenSSH uses public key cryptography to provide 335over an insecure network. OpenSSH uses public key cryptography to provide
349secure authorization. Generating the public key, which is shared with remote 336secure authorization. Generating the public key, which is shared with remote
350systems, and the private key which is kept on the local system, is done first 337systems, and the private key which is kept on the local system, is done first
351to configure OpenSSH on the cluster. 338to configure OpenSSH on the cluster.
352</p> 339</p>
353 340
354<p> 341<p>
355For transparent cluster usage, private/public keys may be used. This process 342For transparent cluster usage, private/public keys may be used. This process
356has two steps: 343has two steps:
357</p> 344</p>
358 345
359<ul> 346<ul>
360 <li>Generate public and private keys</li> 347 <li>Generate public and private keys</li>
361 <li>Copy public key to slave nodes</li> 348 <li>Copy public key to slave nodes</li>
362</ul> 349</ul>
363 350
364<p> 351<p>
365For user based authentification, general and copy as follows: 352For user based authentication, generate and copy as follows:
366</p> 353</p>
367 354
368<pre caption="SSH key authentication"> 355<pre caption="SSH key authentication">
369# <i>ssh-keygen -t dsa</i> 356# <i>ssh-keygen -t dsa</i>
370Generating public/private dsa key pair. 357Generating public/private dsa key pair.
387root@master's password: 374root@master's password:
388id_dsa.pub 100% 234 2.0MB/s 00:00 375id_dsa.pub 100% 234 2.0MB/s 00:00
389</pre> 376</pre>
390 377
391<note> 378<note>
392Host keys must have an empty passphrase. RSA is required for host-based 379Host keys must have an empty passphrase. RSA is required for host-based
393authentification. 380authentication.
394</note> 381</note>
395 382
396<p> 383<p>
397For host based authentication, you will also need to edit your 384For host based authentication, you will also need to edit your
398<path>/etc/ssh/shosts.equiv</path>. 385<path>/etc/ssh/shosts.equiv</path>.
399</p> 386</p>
400 387
401<pre caption="/etc/ssh/shosts.equiv"> 388<pre caption="/etc/ssh/shosts.equiv">
402node01.adelie 389node01.adelie
410 397
411<pre caption="sshd configurations"> 398<pre caption="sshd configurations">
412# $OpenBSD: sshd_config,v 1.42 2001/09/20 20:57:51 mouring Exp $ 399# $OpenBSD: sshd_config,v 1.42 2001/09/20 20:57:51 mouring Exp $
413# This sshd was compiled with PATH=/usr/bin:/bin:/usr/sbin:/sbin 400# This sshd was compiled with PATH=/usr/bin:/bin:/usr/sbin:/sbin
414 401
415# This is the sshd server system-wide configuration file. See sshd(8) 402# This is the sshd server system-wide configuration file. See sshd(8)
416# for more information. 403# for more information.
417 404
418# HostKeys for protocol version 2 405# HostKeys for protocol version 2
419HostKey /etc/ssh/ssh_host_rsa_key 406HostKey /etc/ssh/ssh_host_rsa_key
420</pre> 407</pre>
421 408
422<p> 409<p>
423If your application requires RSH communications, you will need to emerge 410If your application requires RSH communications, you will need to emerge
424net-misc/netkit-rsh and sys-apps/xinetd. 411<c>net-misc/netkit-rsh</c> and <c>sys-apps/xinetd</c>.
425</p> 412</p>
426 413
427<pre caption="Installing necessary applications"> 414<pre caption="Installing necessary applications">
428# <i>emerge -p xinetd</i> 415# <i>emerge -a xinetd</i>
429# <i>emerge xinetd</i>
430# <i>emerge -p netkit-rsh</i> 416# <i>emerge -a netkit-rsh</i>
431# <i>emerge netkit-rsh</i>
432</pre> 417</pre>
433 418
434<p> 419<p>
435Then configure the rsh daemon. Edit your <path>/etc/xinetd.d/rsh</path> file. 420Then configure the rsh daemon. Edit your <path>/etc/xinetd.d/rsh</path> file.
436</p> 421</p>
437 422
438<pre caption="rsh"> 423<pre caption="rsh">
439# Adelie Linux Research &amp; Development Center 424# Adelie Linux Research &amp; Development Center
440# /etc/xinetd.d/rsh 425# /etc/xinetd.d/rsh
469Or you can simply trust your cluster LAN: 454Or you can simply trust your cluster LAN:
470</p> 455</p>
471 456
472<pre caption="hosts.allow"> 457<pre caption="hosts.allow">
473# Adelie Linux Research &amp; Development Center 458# Adelie Linux Research &amp; Development Center
474# /etc/hosts.allow 459# /etc/hosts.allow
475 460
476ALL:192.168.1.0/255.255.255.0 461ALL:192.168.1.0/255.255.255.0
477</pre> 462</pre>
478 463
479<p> 464<p>
480Finally, configure host authentification from <path>/etc/hosts.equiv</path>. 465Finally, configure host authentication from <path>/etc/hosts.equiv</path>.
481</p> 466</p>
482 467
483<pre caption="hosts.equiv"> 468<pre caption="hosts.equiv">
484# Adelie Linux Research &amp; Development Center 469# Adelie Linux Research &amp; Development Center
485# /etc/hosts.equiv 470# /etc/hosts.equiv
502<section> 487<section>
503<title>NTP</title> 488<title>NTP</title>
504<body> 489<body>
505 490
506<p> 491<p>
507The Network Time Protocol (NTP) is used to synchronize the time of a computer 492The Network Time Protocol (NTP) is used to synchronize the time of a computer
508client or server to another server or reference time source, such as a radio 493client or server to another server or reference time source, such as a radio
509or satellite receiver or modem. It provides accuracies typically within a 494or satellite receiver or modem. It provides accuracies typically within a
510millisecond on LANs and up to a few tens of milliseconds on WANs relative to 495millisecond on LANs and up to a few tens of milliseconds on WANs relative to
511Coordinated Universal Time (UTC) via a Global Positioning System (GPS) 496Coordinated Universal Time (UTC) via a Global Positioning System (GPS)
512receiver, for example. Typical NTP configurations utilize multiple redundant 497receiver, for example. Typical NTP configurations utilize multiple redundant
513servers and diverse network paths in order to achieve high accuracy and 498servers and diverse network paths in order to achieve high accuracy and
514reliability. 499reliability.
515</p> 500</p>
516 501
517<p> 502<p>
518Select an NTP server geographically close to you from <uri 503Select an NTP server geographically close to you from <uri
519link="http://www.eecis.udel.edu/~mills/ntp/servers.html">Public NTP Time 504link="http://www.eecis.udel.edu/~mills/ntp/servers.html">Public NTP Time
520Servers</uri>, and configure your <path>/etc/conf.d/ntp</path> and 505Servers</uri>, and configure your <path>/etc/conf.d/ntp</path> and
521<path>/etc/ntp.conf</path> files on the master node. 506<path>/etc/ntp.conf</path> files on the master node.
522</p> 507</p>
523 508
524<pre caption="Master /etc/conf.d/ntp"> 509<pre caption="Master /etc/conf.d/ntp">
525# Copyright 1999-2002 Gentoo Technologies, Inc.
526# Distributed under the terms of the GNU General Public License v2
527# /etc/conf.d/ntpd 510# /etc/conf.d/ntpd
528 511
529# NOTES: 512# NOTES:
530# - NTPDATE variables below are used if you wish to set your 513# - NTPDATE variables below are used if you wish to set your
531# clock when you start the ntp init.d script 514# clock when you start the ntp init.d script
546NTPDATE_CMD="ntpdate" 529NTPDATE_CMD="ntpdate"
547 530
548# Options to pass to the above command 531# Options to pass to the above command
549# Most people should just uncomment this variable and 532# Most people should just uncomment this variable and
550# change 'someserver' to a valid hostname which you 533# change 'someserver' to a valid hostname which you
551# can aquire from the URL's below 534# can acquire from the URL's below
552NTPDATE_OPTS="-b ntp1.cmc.ec.gc.ca" 535NTPDATE_OPTS="-b ntp1.cmc.ec.gc.ca"
553 536
554## 537##
555# A list of available servers is available here: 538# A list of available servers is available here:
556# http://www.eecis.udel.edu/~mills/ntp/servers.html 539# http://www.eecis.udel.edu/~mills/ntp/servers.html
564#NTPD_OPTS="" 547#NTPD_OPTS=""
565 548
566</pre> 549</pre>
567 550
568<p> 551<p>
569Edit your <path>/etc/ntp.conf</path> file on the master to set up an external 552Edit your <path>/etc/ntp.conf</path> file on the master to set up an external
570synchronization source: 553synchronization source:
571</p> 554</p>
572 555
573<pre caption="Master ntp.conf"> 556<pre caption="Master ntp.conf">
574# Adelie Linux Research &amp; Development Center 557# Adelie Linux Research &amp; Development Center
580# Synchronization source #2 563# Synchronization source #2
581server ntp2.cmc.ec.gc.ca 564server ntp2.cmc.ec.gc.ca
582restrict ntp2.cmc.ec.gc.ca 565restrict ntp2.cmc.ec.gc.ca
583stratum 10 566stratum 10
584driftfile /etc/ntp.drift.server 567driftfile /etc/ntp.drift.server
585logfile /var/log/ntp 568logfile /var/log/ntp
586broadcast 192.168.1.255 569broadcast 192.168.1.255
587restrict default kod 570restrict default kod
588restrict 127.0.0.1 571restrict 127.0.0.1
589restrict 192.168.1.0 mask 255.255.255.0 572restrict 192.168.1.0 mask 255.255.255.0
590</pre> 573</pre>
591 574
592<p> 575<p>
593And on all your slave nodes, set up your synchronization source as your master 576And on all your slave nodes, set up your synchronization source as your master
594node. 577node.
595</p> 578</p>
596 579
597<pre caption="Node /etc/conf.d/ntp"> 580<pre caption="Node /etc/conf.d/ntp">
598# Copyright 1999-2002 Gentoo Technologies, Inc.
599# Distributed under the terms of the GNU General Public License v2
600# /etc/conf.d/ntpd 581# /etc/conf.d/ntpd
601 582
602NTPDATE_WARN="n" 583NTPDATE_WARN="n"
603NTPDATE_CMD="ntpdate" 584NTPDATE_CMD="ntpdate"
604NTPDATE_OPTS="-b master" 585NTPDATE_OPTS="-b master"
611# Synchronization source #1 592# Synchronization source #1
612server master 593server master
613restrict master 594restrict master
614stratum 11 595stratum 11
615driftfile /etc/ntp.drift.server 596driftfile /etc/ntp.drift.server
616logfile /var/log/ntp 597logfile /var/log/ntp
617restrict default kod 598restrict default kod
618restrict 127.0.0.1 599restrict 127.0.0.1
619</pre> 600</pre>
620 601
621<p> 602<p>
625<pre caption="Adding ntpd to the default runlevel"> 606<pre caption="Adding ntpd to the default runlevel">
626# <i>rc-update add ntpd default</i> 607# <i>rc-update add ntpd default</i>
627</pre> 608</pre>
628 609
629<note> 610<note>
630NTP will not update the local clock if the time difference between your 611NTP will not update the local clock if the time difference between your
631synchronization source and the local clock is too great. 612synchronization source and the local clock is too great.
632</note> 613</note>
633 614
634</body> 615</body>
635</section> 616</section>
640<p> 621<p>
641To set up a firewall on your cluster, you will need iptables. 622To set up a firewall on your cluster, you will need iptables.
642</p> 623</p>
643 624
644<pre caption="Installing iptables"> 625<pre caption="Installing iptables">
645# <i>emerge -p iptables</i> 626# <i>emerge -a iptables</i>
646# <i>emerge iptables</i>
647</pre> 627</pre>
648 628
649<p> 629<p>
650Required kernel configuration: 630Required kernel configuration:
651</p> 631</p>
667And the rules required for this firewall: 647And the rules required for this firewall:
668</p> 648</p>
669 649
670<pre caption="rule-save"> 650<pre caption="rule-save">
671# Adelie Linux Research &amp; Development Center 651# Adelie Linux Research &amp; Development Center
672# /var/lib/iptbles/rule-save 652# /var/lib/iptables/rule-save
673 653
674*filter 654*filter
675:INPUT ACCEPT [0:0] 655:INPUT ACCEPT [0:0]
676:FORWARD ACCEPT [0:0] 656:FORWARD ACCEPT [0:0]
677:OUTPUT ACCEPT [0:0] 657:OUTPUT ACCEPT [0:0]
708<section> 688<section>
709<title>OpenPBS</title> 689<title>OpenPBS</title>
710<body> 690<body>
711 691
712<p> 692<p>
713The Portable Batch System (PBS) is a flexible batch queueing and workload 693The Portable Batch System (PBS) is a flexible batch queueing and workload
714management system originally developed for NASA. It operates on networked, 694management system originally developed for NASA. It operates on networked,
715multi-platform UNIX environments, including heterogeneous clusters of 695multi-platform UNIX environments, including heterogeneous clusters of
716workstations, supercomputers, and massively parallel systems. Development of 696workstations, supercomputers, and massively parallel systems. Development of
717PBS is provided by Altair Grid Technologies. 697PBS is provided by Altair Grid Technologies.
718</p> 698</p>
719 699
720<pre caption="Installing openpbs"> 700<pre caption="Installing openpbs">
721# <i>emerge -p openpbs</i> 701# <i>emerge -a openpbs</i>
722</pre> 702</pre>
723 703
724<note> 704<note>
725The OpenPBS ebuild does not currently set proper permissions on var-directories 705The OpenPBS ebuild does not currently set proper permissions on var-directories
726used by OpenPBS. 706used by OpenPBS.
727</note> 707</note>
728 708
729<p> 709<p>
730Before you start using OpenPBS, some configuration is required. The files 710Before you start using OpenPBS, some configuration is required. The files
731you will need to personalize for your system are: 711you will need to personalize for your system are:
732</p> 712</p>
733 713
734<ul> 714<ul>
735 <li>/etc/pbs_environment</li> 715 <li>/etc/pbs_environment</li>
736 <li>/var/spool/PBS/server_name</li> 716 <li>/var/spool/PBS/server_name</li>
737 <li>/var/spool/PBS/server_priv/nodes</li> 717 <li>/var/spool/PBS/server_priv/nodes</li>
738 <li>/var/spool/PBS/mom_priv/config</li> 718 <li>/var/spool/PBS/mom_priv/config</li>
739 <li>/var/spool/PBS/sched_priv/sched_config</li> 719 <li>/var/spool/PBS/sched_priv/sched_config</li>
740</ul> 720</ul>
741 721
742<p> 722<p>
743Here is a sample sched_config: 723Here is a sample sched_config:
744</p> 724</p>
779set server resources_default.nodes = 1 759set server resources_default.nodes = 1
780set server scheduler_iteration = 60 760set server scheduler_iteration = 60
781</pre> 761</pre>
782 762
783<p> 763<p>
784To submit a task to OpenPBS, the command <c>qsub</c> is used with some 764To submit a task to OpenPBS, the command <c>qsub</c> is used with some
785optional parameters. In the exemple below, "-l" allows you to specify 765optional parameters. In the example below, "-l" allows you to specify
786the resources required, "-j" provides for redirection of standard out and 766the resources required, "-j" provides for redirection of standard out and
787standard error, and the "-m" will e-mail the user at begining (b), end (e) 767standard error, and the "-m" will e-mail the user at beginning (b), end (e)
788and on abort (a) of the job. 768and on abort (a) of the job.
789</p> 769</p>
790 770
791<pre caption="Submitting a task"> 771<pre caption="Submitting a task">
792<comment>(submit and request from OpenPBS that myscript be executed on 2 nodes)</comment> 772<comment>(submit and request from OpenPBS that myscript be executed on 2 nodes)</comment>
793# <i>qsub -l nodes=2 -j oe -m abe myscript</i> 773# <i>qsub -l nodes=2 -j oe -m abe myscript</i>
794</pre> 774</pre>
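The options passed to qsub above can also be stored in the job script itself, because qsub reads directives from lines that begin with #PBS at the top of the script. The following is only a sketch of what myscript might look like; the body is hypothetical. PBS_O_WORKDIR and PBS_NODEFILE are set by PBS inside a job, and the fallback values are an assumption added so that the script can also be tried outside of PBS.

```shell
#!/bin/sh
# myscript: hypothetical job script. The #PBS lines mirror the
# qsub options "-l nodes=2 -j oe -m abe" used above.
#PBS -l nodes=2
#PBS -j oe
#PBS -m abe

# PBS_O_WORKDIR is the directory qsub was run from; PBS_NODEFILE lists
# the nodes allocated to the job. The fallbacks (current directory,
# /dev/null) are assumptions for running the script by hand.
cd "${PBS_O_WORKDIR:-.}" || exit 1
nodefile="${PBS_NODEFILE:-/dev/null}"

# Count the distinct nodes allocated to this job.
node_count=$(sort -u "$nodefile" | wc -l | tr -d ' ')
echo "Job running on $(uname -n) with $node_count allocated node(s)"
```

With the directives in place, the job could be submitted with just "qsub myscript"; options given on the command line still take precedence over the #PBS lines.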
795 775
796<p> 776<p>
797Normally jobs submitted to OpenPBS are in the form of scripts. Sometimes, you 777Normally jobs submitted to OpenPBS are in the form of scripts. Sometimes, you
798may want to try a task manually. To request an interactive shell from OpenPBS, 778may want to try a task manually. To request an interactive shell from OpenPBS,
799use the "-I" parameter. 779use the "-I" parameter.
800</p> 780</p>
801 781
802<pre caption="Requesting an interactive shell"> 782<pre caption="Requesting an interactive shell">
803# <i>qsub -I</i> 783# <i>qsub -I</i>
819<section> 799<section>
820<title>MPICH</title> 800<title>MPICH</title>
821<body> 801<body>
822 802
823<p> 803<p>
824Message passing is a paradigm used widely on certain classes of parallel 804Message passing is a paradigm used widely on certain classes of parallel
825machines, especially those with distributed memory. MPICH is a freely 805machines, especially those with distributed memory. MPICH is a freely
826available, portable implementation of MPI, the Standard for message-passing 806available, portable implementation of MPI, the Standard for message-passing
827libraries. 807libraries.
828</p> 808</p>
829 809
830<p> 810<p>
831The mpich ebuild provided by Adelie Linux allows for two USE flags: 811The mpich ebuild provided by Adelie Linux allows for two USE flags:
832<e>doc</e> and <e>crypt</e>. <e>doc</e> will cause documentation to be 812<e>doc</e> and <e>crypt</e>. <e>doc</e> will cause documentation to be
833installed, while <e>crypt</e> will configure MPICH to use <c>ssh</c> instead 813installed, while <e>crypt</e> will configure MPICH to use <c>ssh</c> instead
834of <c>rsh</c>. 814of <c>rsh</c>.
835</p> 815</p>
836 816
837<pre caption="Installing the mpich application"> 817<pre caption="Installing the mpich application">
838# <i>emerge -p mpich</i> 818# <i>emerge -a mpich</i>
839# <i>emerge mpich</i>
840</pre> 819</pre>
841 820
842<p> 821<p>
843You may need to export an MPICH work directory to all your slave nodes in 822You may need to export an MPICH work directory to all your slave nodes in
844<path>/etc/exports</path>: 823<path>/etc/exports</path>:
845</p> 824</p>
846 825
847<pre caption="/etc/exports"> 826<pre caption="/etc/exports">
848/home *(rw) 827/home *(rw)
849</pre> 828</pre>
850 829
851<p> 830<p>
852Most massively parallel processors (MPPs) provide a way to start a program on 831Most massively parallel processors (MPPs) provide a way to start a program on
853a requested number of processors; <c>mpirun</c> makes use of the appropriate 832a requested number of processors; <c>mpirun</c> makes use of the appropriate
854command whenever possible. In contrast, workstation clusters require that each 833command whenever possible. In contrast, workstation clusters require that each
855process in a parallel job be started individually, though programs to help 834process in a parallel job be started individually, though programs to help
856start these processes exist. Because workstation clusters are not already 835start these processes exist. Because workstation clusters are not already
857organized as an MPP, additional information is required to make use of them. 836organized as an MPP, additional information is required to make use of them.
858MPICH should be installed with a list of participating workstations in the 837MPICH should be installed with a list of participating workstations in the
859file <path>machines.LINUX</path> in the directory 838file <path>machines.LINUX</path> in the directory
860<path>/usr/share/mpich/</path>. This file is used by <c>mpirun</c> to choose 839<path>/usr/share/mpich/</path>. This file is used by <c>mpirun</c> to choose
861processors to run on. 840processors to run on.
862</p> 841</p>
863 842
864<p> 843<p>
865Edit this file to reflect your cluster LAN configuration: 844Edit this file to reflect your cluster LAN configuration:
866</p> 845</p>
867 846
868<pre caption="/usr/share/mpich/machines.LINUX"> 847<pre caption="/usr/share/mpich/machines.LINUX">
869# Change this file to contain the machines that you want to use 848# Change this file to contain the machines that you want to use
870# to run MPI jobs on. The format is one host name per line, with either 849# to run MPI jobs on. The format is one host name per line, with either
871# hostname 850# hostname
872# or 851# or
873# hostname:n 852# hostname:n
874# where n is the number of processors in an SMP. The hostname should 853# where n is the number of processors in an SMP. The hostname should
875# be the same as the result from the command "hostname" 854# be the same as the result from the command "hostname"
876master 855master
877node01 856node01
878node02 857node02
879# node03 858# node03
880# node04 859# node04
881# ... 860# ...
882</pre> 861</pre>
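To check how many processor slots a machines file describes, the "hostname" and "hostname:n" entries can be summed with a few lines of shell. This is a minimal sketch, not part of MPICH; the function name count_slots is hypothetical.

```shell
#!/bin/sh
# count_slots FILE
# Sum the processor slots listed in a machines file. A plain
# "hostname" line counts as 1, "hostname:n" counts as n.
count_slots() {
    awk '
        /^[ \t]*#/ || /^[ \t]*$/ { next }   # skip comments and blank lines
        {
            n = split($1, part, ":")        # "hostname" or "hostname:n"
            total += (n > 1 ? part[2] : 1)
        }
        END { print total + 0 }
    ' "$1"
}
```

For the sample file above, "count_slots /usr/share/mpich/machines.LINUX" would count three slots (master, node01 and node02), which is a reasonable starting value for the -np argument of mpirun.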
883 862
884<p> 863<p>
885Use the script <c>tstmachines</c> in <path>/usr/sbin/</path> to ensure that 864Use the script <c>tstmachines</c> in <path>/usr/sbin/</path> to ensure that
886you can use all of the machines that you have listed. This script performs 865you can use all of the machines that you have listed. This script performs
887an <c>rsh</c> and a short directory listing; this tests that you both have 866an <c>rsh</c> and a short directory listing; this tests that you both have
888access to the node and that a program in the current directory is visible on 867access to the node and that a program in the current directory is visible on
889the remote node. If there are any problems, they will be listed. These 868the remote node. If there are any problems, they will be listed. These
890problems must be fixed before proceeding. 869problems must be fixed before proceeding.
891</p> 870</p>
892 871
893<p> 872<p>
894The only argument to <c>tstmachines</c> is the name of the architecture; this 873The only argument to <c>tstmachines</c> is the name of the architecture; this
895is the same name as the extension on the machines file. For example, the 874is the same name as the extension on the machines file. For example, the
896following tests that a program in the current directory can be executed by 875following tests that a program in the current directory can be executed by
897all of the machines in the LINUX machines list. 876all of the machines in the LINUX machines list.
898</p> 877</p>
899 878
900<pre caption="Running a test"> 879<pre caption="Running a test">
901# <i>/usr/local/mpich/sbin/tstmachines LINUX</i> 880# <i>/usr/local/mpich/sbin/tstmachines LINUX</i>
902</pre> 881</pre>
903 882
904<note> 883<note>
905This program is silent if all is well; if you want to see what it is doing, 884This program is silent if all is well; if you want to see what it is doing,
906use the -v (for verbose) argument: 885use the -v (for verbose) argument:
907</note> 886</note>
908 887
909<pre caption="Running a test verbosely"> 888<pre caption="Running a test verbosely">
910# <i>/usr/local/mpich/sbin/tstmachines -v LINUX</i> 889# <i>/usr/local/mpich/sbin/tstmachines -v LINUX</i>
922Trying user program on host1.uoffoo.edu ... 901Trying user program on host1.uoffoo.edu ...
923Trying user program on host2.uoffoo.edu ... 902Trying user program on host2.uoffoo.edu ...
924</pre> 903</pre>
925 904
926<p> 905<p>
927If <c>tstmachines</c> finds a problem, it will suggest possible reasons and 906If <c>tstmachines</c> finds a problem, it will suggest possible reasons and
928solutions. In brief, there are three tests: 907solutions. In brief, there are three tests:
929</p> 908</p>
930 909
931<ul> 910<ul>
932 <li> 911 <li>
933 <e>Can processes be started on remote machines?</e> tstmachines attempts 912 <e>Can processes be started on remote machines?</e> tstmachines attempts
934 to run the shell command <c>true</c> on each machine in the machines file by 914 to run the shell command <c>true</c> on each machine in the machines file by
935 using the remote shell command. 914 using the remote shell command.
936 </li> 915 </li>
937 <li> 916 <li>
938 <e>Is the current working directory available to all machines?</e> This 918 <e>Is the current working directory available to all machines?</e> This
939 attempts to ls a file that tstmachines creates by running ls using the 918 attempts to ls a file that tstmachines creates by running ls using the
940 remote shell command. 919 remote shell command.
941 </li> 920 </li>
942 <li> 921 <li>
943 <e>Can user programs be run on remote systems?</e> This checks that shared 922 <e>Can user programs be run on remote systems?</e> This checks that shared
944 libraries and other components have been properly installed on all 923 libraries and other components have been properly installed on all
945 machines. 924 machines.
946 </li> 925 </li>
947</ul> 926</ul>
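When tstmachines reports a problem, the first two tests can be approximated by hand for a single host. The sketch below is an assumption about how such a check might be written, not the tstmachines implementation; the function name check_host and the RSH variable are hypothetical. Because rsh does not reliably pass back the exit status of the remote command, the second test compares the output of ls instead of its return code.

```shell
#!/bin/sh
# Remote shell command. rsh is what tstmachines is described as using;
# set RSH=ssh if MPICH was built with the crypt USE flag.
RSH="${RSH:-rsh}"

# check_host HOST FILE
#   HOST: a name from the machines file
#   FILE: absolute path of a regular file that should be visible on
#         every node, for example one under the NFS-exported /home
check_host() {
    host=$1
    file=$2
    # Test 1: can a process be started on the remote machine?
    if ! $RSH "$host" true; then
        echo "$host: cannot run a remote command"
        return 1
    fi
    # Test 2: is the file visible from the remote machine?
    # ls prints the path back only if the file exists there.
    out=$($RSH "$host" ls "$file" 2>/dev/null)
    if [ "$out" != "$file" ]; then
        echo "$host: $file is not visible"
        return 1
    fi
    echo "$host: ok"
}

# Example (host name and file are placeholders):
#   check_host node01 "$HOME/.bashrc"
```

The third test, running a user program remotely, is left out here because it depends on a compiled MPI program being available in the shared directory.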
948 927
949<p> 928<p>
956# <i>make hello++</i> 935# <i>make hello++</i>
957# <i>mpirun -machinefile /usr/share/mpich/machines.LINUX -np 1 hello++</i> 936# <i>mpirun -machinefile /usr/share/mpich/machines.LINUX -np 1 hello++</i>
958</pre> 937</pre>
959 938
960<p> 939<p>
961For further information on MPICH, consult the documentation at <uri 940For further information on MPICH, consult the documentation at <uri
962link="http://www-unix.mcs.anl.gov/mpi/mpich/docs/mpichman-chp4/mpichman-chp4.htm">http://www-unix.mcs.anl.gov/mpi/mpich/docs/mpichman-chp4/mpichman-chp4.htm</uri>. 941link="http://www-unix.mcs.anl.gov/mpi/mpich/docs/mpichman-chp4/mpichman-chp4.htm">http://www-unix.mcs.anl.gov/mpi/mpich/docs/mpichman-chp4/mpichman-chp4.htm</uri>.
963</p> 942</p>
964 943
965</body> 944</body>
966</section> 945</section>
990<title>Bibliography</title> 969<title>Bibliography</title>
991<section> 970<section>
992<body> 971<body>
993 972
994<p> 973<p>
995The original document is published at the <uri 974The original document is published at the <uri
996link="http://www.adelielinux.com">Adelie Linux R&amp;D Centre</uri> web site, 975link="http://www.adelielinux.com">Adelie Linux R&amp;D Centre</uri> web site,
997and is reproduced here with the permission of the authors and <uri 976and is reproduced here with the permission of the authors and <uri
998link="http://www.cyberlogic.ca">Cyberlogic</uri>'s Adelie Linux R&amp;D 977link="http://www.cyberlogic.ca">Cyberlogic</uri>'s Adelie Linux R&amp;D
999Centre. 978Centre.
1000</p> 979</p>
1001 980
1002<ul> 981<ul>
1003 <li><uri>http://www.gentoo.org</uri>, Gentoo Technologies, Inc.</li> 982 <li><uri>http://www.gentoo.org</uri>, Gentoo Foundation, Inc.</li>
1004 <li> 983 <li>
1005 <uri link="http://www.adelielinux.com">http://www.adelielinux.com</uri>, 984 <uri link="http://www.adelielinux.com">http://www.adelielinux.com</uri>,
1006 Adelie Linux Research and Development Centre 985 Adelie Linux Research and Development Centre
1007 </li> 986 </li>
1008 <li> 987 <li>
1009 <uri link="http://nfs.sourceforge.net/">http://nfs.sourceforge.net</uri>, 988 <uri link="http://nfs.sourceforge.net/">http://nfs.sourceforge.net</uri>,
1010 Linux NFS Project 989 Linux NFS Project
1011 </li> 990 </li>
1012 <li> 991 <li>
1013 <uri link="http://www-unix.mcs.anl.gov/mpi/mpich/">http://www-unix.mcs.anl.gov/mpi/mpich/</uri>, 992 <uri link="http://www-unix.mcs.anl.gov/mpi/mpich/">http://www-unix.mcs.anl.gov/mpi/mpich/</uri>,
1014 Mathematics and Computer Science Division, Argonne National Laboratory 993 Mathematics and Computer Science Division, Argonne National Laboratory
1015 </li> 994 </li>
1016 <li> 995 <li>
1017 <uri link="http://www.ntp.org/">http://ntp.org</uri> 996 <uri link="http://www.ntp.org/">http://ntp.org</uri>
1018 </li> 997 </li>
1019 <li> 998 <li>
1020 <uri link="http://www.eecis.udel.edu/~mills/">http://www.eecis.udel.edu/~mills/</uri>, 999 <uri link="http://www.eecis.udel.edu/~mills/">http://www.eecis.udel.edu/~mills/</uri>,
1021 David L. Mills, University of Delaware 1000 David L. Mills, University of Delaware
1022 </li> 1001 </li>
1023 <li> 1002 <li>
1024 <uri link="http://www.ietf.org/html.charters/secsh-charter.html">http://www.ietf.org/html.charters/secsh-charter.html</uri>, 1003 <uri link="http://www.ietf.org/html.charters/secsh-charter.html">http://www.ietf.org/html.charters/secsh-charter.html</uri>,
1025 Secure Shell Working Group, IETF, Internet Society 1004 Secure Shell Working Group, IETF, Internet Society
1026 </li> 1005 </li>
1027 <li> 1006 <li>
1028 <uri link="http://www.linuxsecurity.com/">http://www.linuxsecurity.com/</uri>, 1007 <uri link="http://www.linuxsecurity.com/">http://www.linuxsecurity.com/</uri>,
1029 Guardian Digital 1008 Guardian Digital
1030 </li> 1009 </li>
1031 <li> 1010 <li>
1032 <uri link="http://www.openpbs.org/">http://www.openpbs.org/</uri>, 1011 <uri link="http://www.openpbs.org/">http://www.openpbs.org/</uri>,
1033 Altair Grid Technologies, LLC. 1012 Altair Grid Technologies, LLC.
1034 </li> 1013 </li>
1035</ul> 1014</ul>
1036 1015
1037</body> 1016</body>
