Diff of /xml/htdocs/doc/en/hpc-howto.xml

Revision 1.13 Revision 1.15
1<?xml version='1.0' encoding="UTF-8"?> 1<?xml version='1.0' encoding="UTF-8"?>
2<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/hpc-howto.xml,v 1.13 2006/12/18 21:47:19 nightmorph Exp $ --> 2<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/hpc-howto.xml,v 1.15 2010/06/07 09:08:37 nightmorph Exp $ -->
3<!DOCTYPE guide SYSTEM "/dtd/guide.dtd"> 3<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">
4 4
5<guide link="/doc/en/hpc-howto.xml"> 5<guide>
6<title>High Performance Computing on Gentoo Linux</title> 6<title>High Performance Computing on Gentoo Linux</title>
7 7
8<author title="Author"> 8<author title="Author">
9 <mail link="marc@adelielinux.com">Marc St-Pierre</mail> 9 <mail link="marc@adelielinux.com">Marc St-Pierre</mail>
10</author> 10</author>
18 <mail link="olivier@adelielinux.com">Olivier Crete</mail> 18 <mail link="olivier@adelielinux.com">Olivier Crete</mail>
19</author> 19</author>
20<author title="Reviewer"> 20<author title="Reviewer">
21 <mail link="dberkholz@gentoo.org">Donnie Berkholz</mail> 21 <mail link="dberkholz@gentoo.org">Donnie Berkholz</mail>
22</author> 22</author>
23<author title="Editor">
24 <mail link="nightmorph"/>
25</author>
23 26
24<!-- No licensing information; this document has been written by a third-party 27<!-- No licensing information; this document has been written by a third-party
25 organisation without additional licensing information. 28 organisation without additional licensing information.
26 29
27 In other words, this is copyright adelielinux R&D; Gentoo only has 30 In other words, this is copyright adelielinux R&D; Gentoo only has
28 permission to distribute this document as-is and update it when appropriate 31 permission to distribute this document as-is and update it when appropriate
29 as long as the adelie linux R&D notice stays 32 as long as the adelie linux R&D notice stays
30--> 33-->
31 34
32<abstract> 35<abstract>
33This document was written by people at the Adelie Linux R&amp;D Center 36This document was written by people at the Adelie Linux R&amp;D Center
34&lt;http://www.adelielinux.com&gt; as a step-by-step guide to turn a Gentoo 37&lt;http://www.adelielinux.com&gt; as a step-by-step guide to turn a Gentoo
35System into a High Performance Computing (HPC) system. 38System into a High Performance Computing (HPC) system.
36</abstract> 39</abstract>
37 40
38<version>1.6</version> 41<version>1.7</version>
39<date>2006-12-18</date> 42<date>2010-06-07</date>
40 43
41<chapter> 44<chapter>
42<title>Introduction</title> 45<title>Introduction</title>
43<section> 46<section>
44<body> 47<body>
45 48
46<p> 49<p>
47Gentoo Linux is a special flavor of Linux that can be automatically optimized 50Gentoo Linux is a special flavor of Linux that can be automatically optimized
48and customized for just about any application or need. Extreme performance, 51and customized for just about any application or need. Extreme performance,
49configurability and a top-notch user and developer community are all hallmarks 52configurability and a top-notch user and developer community are all hallmarks
50of the Gentoo experience. 53of the Gentoo experience.
51</p> 54</p>
52 55
53<p> 56<p>
54Thanks to a technology called Portage, Gentoo Linux can become an ideal secure 57Thanks to a technology called Portage, Gentoo Linux can become an ideal secure
55server, development workstation, professional desktop, gaming system, embedded 58server, development workstation, professional desktop, gaming system, embedded
56solution or... a High Performance Computing system. Because of its 59solution or... a High Performance Computing system. Because of its
57near-unlimited adaptability, we call Gentoo Linux a metadistribution. 60near-unlimited adaptability, we call Gentoo Linux a metadistribution.
58</p> 61</p>
59 62
60<p> 63<p>
61This document explains how to turn a Gentoo system into a High Performance 64This document explains how to turn a Gentoo system into a High Performance
62Computing system. Step by step, it explains what packages one may want to 65Computing system. Step by step, it explains what packages one may want to
63install and helps configure them. 66install and helps configure them.
64</p> 67</p>
65 68
66<p> 69<p>
67Obtain Gentoo Linux from the website <uri>http://www.gentoo.org</uri>, and 70Obtain Gentoo Linux from the website <uri>http://www.gentoo.org</uri>, and
84this section. 87this section.
85</note> 88</note>
86 89
87<p> 90<p>
88During the installation process, you will have to set your USE variables in 91During the installation process, you will have to set your USE variables in
89<path>/etc/make.conf</path>. We recommend that you deactivate all the 92<path>/etc/make.conf</path>. We recommend that you deactivate all the
90defaults (see <path>/etc/make.profile/make.defaults</path>) by negating them in 93defaults (see <path>/etc/make.profile/make.defaults</path>) by negating them in
91make.conf. However, you may want to keep such use variables as x86, 3dnow, gpm, 94make.conf. However, you may want to keep such use variables as 3dnow, gpm,
92mmx, nptl, nptlonly, sse, ncurses, pam and tcpd. Refer to the USE documentation 95mmx, nptl, nptlonly, sse, ncurses, pam and tcpd. Refer to the USE documentation
93for more information. 96for more information.
94</p> 97</p>
95 98
96<pre caption="USE Flags"> 99<pre caption="USE Flags">
97USE="-oss 3dnow -apm -arts -avi -berkdb -crypt -cups -encode -gdbm -gif gpm -gtk 100USE="-oss 3dnow -apm -avi -berkdb -crypt -cups -encode -gdbm -gif gpm -gtk
98-imlib -java -jpeg -kde -gnome -libg++ -libwww -mikmod mmx -motif -mpeg ncurses 101-imlib -java -jpeg -kde -gnome -libg++ -libwww -mikmod mmx -motif -mpeg ncurses
99-nls nptl nptlonly -oggvorbis -opengl pam -pdflib -png -python -qt3 -qt4 -qtmt 102-nls nptl nptlonly -ogg -opengl pam -pdflib -png -python -qt4 -qtmt
100-quicktime -readline -sdl -slang -spell -ssl -svga tcpd -truetype -X -xml2 -xv 103-quicktime -readline -sdl -slang -spell -ssl -svga tcpd -truetype -vorbis -X
101-zlib" 104-xml2 -xv -zlib"
102</pre> 105</pre>
103 106
104<p> 107<p>
105Or simply: 108Or simply:
106</p> 109</p>
112<note> 115<note>
113The <e>tcpd</e> USE flag increases security for packages such as xinetd. 116The <e>tcpd</e> USE flag increases security for packages such as xinetd.
114</note> 117</note>
115 118
116<p> 119<p>
117In step 15 ("Installing the kernel and a System Logger") for stability 120In step 15 ("Installing the kernel and a System Logger") for stability
118reasons, we recommend the vanilla-sources, the official kernel sources 121reasons, we recommend the vanilla-sources, the official kernel sources
119released on <uri>http://www.kernel.org/</uri>, unless you require special 122released on <uri>http://www.kernel.org/</uri>, unless you require special
120support such as xfs. 123support such as xfs.
121</p> 124</p>
122 125
123<pre caption="Installing vanilla-sources"> 126<pre caption="Installing vanilla-sources">
124# <i>emerge -p syslog-ng vanilla-sources</i> 127# <i>emerge -a syslog-ng vanilla-sources</i>
125</pre> 128</pre>
126 129
127<p> 130<p>
128When you install miscellaneous packages, we recommend installing the 131When you install miscellaneous packages, we recommend installing the
129following: 132following:
130</p> 133</p>
131 134
132<pre caption="Installing necessary packages"> 135<pre caption="Installing necessary packages">
133# <i>emerge -p nfs-utils portmap tcpdump ssmtp iptables xinetd</i> 136# <i>emerge -a nfs-utils portmap tcpdump ssmtp iptables xinetd</i>
134</pre> 137</pre>
135 138
136</body> 139</body>
137</section> 140</section>
138<section> 141<section>
139<title>Communication Layer (TCP/IP Network)</title> 142<title>Communication Layer (TCP/IP Network)</title>
140<body> 143<body>
141 144
142<p> 145<p>
143A cluster requires a communication layer to interconnect the slave nodes to 146A cluster requires a communication layer to interconnect the slave nodes to
144the master node. Typically, a FastEthernet or GigaEthernet LAN can be used 147the master node. Typically, a FastEthernet or GigaEthernet LAN can be used
145since they have a good price/performance ratio. Other possibilities include 148since they have a good price/performance ratio. Other possibilities include
146use of products like <uri link="http://www.myricom.com/">Myrinet</uri>, <uri 149use of products like <uri link="http://www.myricom.com/">Myrinet</uri>, <uri
147link="http://quadrics.com/">QsNet</uri> or others. 150link="http://quadrics.com/">QsNet</uri> or others.
148</p> 151</p>
149 152
150<p> 153<p>
151A cluster is composed of two node types: master and slave. Typically, your 154A cluster is composed of two node types: master and slave. Typically, your
152cluster will have one master node and several slave nodes. 155cluster will have one master node and several slave nodes.
153</p> 156</p>
154 157
155<p> 158<p>
156The master node is the cluster's server. It is responsible for telling the 159The master node is the cluster's server. It is responsible for telling the
157slave nodes what to do. This server will typically run such daemons as dhcpd, 160slave nodes what to do. This server will typically run such daemons as dhcpd,
158nfs, pbs-server, and pbs-sched. Your master node will allow interactive 161nfs, pbs-server, and pbs-sched. Your master node will allow interactive
159sessions for users, and accept job executions. 162sessions for users, and accept job executions.
160</p> 163</p>
161 164
162<p> 165<p>
163The slave nodes listen for instructions (via ssh/rsh perhaps) from the master 166The slave nodes listen for instructions (via ssh/rsh perhaps) from the master
164node. They should be dedicated to crunching results and therefore should not 167node. They should be dedicated to crunching results and therefore should not
165run any unnecessary services. 168run any unnecessary services.
166</p> 169</p>
167 170
168<p> 171<p>
169The rest of this documentation will assume a cluster configuration as per the 172The rest of this documentation will assume a cluster configuration as per the
170hosts file below. You should maintain on every node such a hosts file 173hosts file below. You should maintain on every node such a hosts file
171(<path>/etc/hosts</path>) with entries for each node participating in the 174(<path>/etc/hosts</path>) with entries for each node participating in the
172cluster. 175cluster.
173</p> 176</p>
174 177
175<pre caption="/etc/hosts"> 178<pre caption="/etc/hosts">
176# Adelie Linux Research &amp; Development Center 179# Adelie Linux Research &amp; Development Center
183192.168.1.1 node01.adelie node01 186192.168.1.1 node01.adelie node01
184192.168.1.2 node02.adelie node02 187192.168.1.2 node02.adelie node02
185</pre> 188</pre>
186 189
187<p> 190<p>
188To set up your dedicated cluster LAN, edit your <path>/etc/conf.d/net</path> 191To set up your dedicated cluster LAN, edit your <path>/etc/conf.d/net</path>
189file on the master node. 192file on the master node.
190</p> 193</p>
191 194
192<pre caption="/etc/conf.d/net"> 195<pre caption="/etc/conf.d/net">
193# Global config file for net.* rc-scripts 196# Global config file for net.* rc-scripts
200iface_eth1="dhcp" 203iface_eth1="dhcp"
201</pre> 204</pre>
202 205
203 206
204<p> 207<p>
205Finally, set up a DHCP daemon on the master node to avoid having to maintain a 208Finally, set up a DHCP daemon on the master node to avoid having to maintain a
206network configuration on each slave node. 209network configuration on each slave node.
207</p> 210</p>
208 211
209<pre caption="/etc/dhcp/dhcpd.conf"> 212<pre caption="/etc/dhcp/dhcpd.conf">
210# Adelie Linux Research &amp; Development Center 213# Adelie Linux Research &amp; Development Center
237<section> 240<section>
238<title>NFS/NIS</title> 241<title>NFS/NIS</title>
239<body> 242<body>
240 243
241<p> 244<p>
242The Network File System (NFS) was developed to allow machines to mount a disk 245The Network File System (NFS) was developed to allow machines to mount a disk
243partition on a remote machine as if it were on a local hard drive. This allows 246partition on a remote machine as if it were on a local hard drive. This allows
244for fast, seamless sharing of files across a network. 247for fast, seamless sharing of files across a network.
245</p> 248</p>
246 249
247<p> 250<p>
248There are other systems that provide similar functionality to NFS which could 251There are other systems that provide similar functionality to NFS which could
249be used in a cluster environment. The <uri 252be used in a cluster environment. The <uri
250link="http://www.openafs.org">Andrew File System 253link="http://www.openafs.org">Andrew File System
251from IBM</uri>, recently open-sourced, provides a file sharing mechanism with 254from IBM</uri>, recently open-sourced, provides a file sharing mechanism with
252some additional security and performance features. The <uri 255some additional security and performance features. The <uri
253link="http://www.coda.cs.cmu.edu/">Coda File System</uri> is still in 256link="http://www.coda.cs.cmu.edu/">Coda File System</uri> is still in
254development, but is designed to work well with disconnected clients. Many 257development, but is designed to work well with disconnected clients. Many
255of the features of the Andrew and Coda file systems are slated for inclusion 258of the features of the Andrew and Coda file systems are slated for inclusion
256in the next version of <uri link="http://www.nfsv4.org">NFS (Version 4)</uri>. 259in the next version of <uri link="http://www.nfsv4.org">NFS (Version 4)</uri>.
257The advantage of NFS today is that it is mature, standard, well understood, 260The advantage of NFS today is that it is mature, standard, well understood,
258and supported robustly across a variety of platforms. 261and supported robustly across a variety of platforms.
259</p> 262</p>
260 263
261<pre caption="Ebuilds for NFS-support"> 264<pre caption="Ebuilds for NFS-support">
262# <i>emerge -p nfs-utils portmap</i> 265# <i>emerge -a nfs-utils portmap</i>
263# <i>emerge nfs-utils portmap</i>
264</pre> 266</pre>
265 267
266<p> 268<p>
267Configure and install a kernel to support NFS v3 on all nodes: 269Configure and install a kernel to support NFS v3 on all nodes:
268</p> 270</p>
275CONFIG_NFSD_V3=y 277CONFIG_NFSD_V3=y
276CONFIG_LOCKD_V4=y 278CONFIG_LOCKD_V4=y
277</pre> 279</pre>
278 280
279<p> 281<p>
280On the master node, edit your <path>/etc/hosts.allow</path> file to allow 282On the master node, edit your <path>/etc/hosts.allow</path> file to allow
281connections from slave nodes. If your cluster LAN is on 192.168.1.0/24, 283connections from slave nodes. If your cluster LAN is on 192.168.1.0/24,
282your <path>hosts.allow</path> will look like: 284your <path>hosts.allow</path> will look like:
283</p> 285</p>
284 286
285<pre caption="hosts.allow"> 287<pre caption="hosts.allow">
286portmap:192.168.1.0/255.255.255.0 288portmap:192.168.1.0/255.255.255.0
287</pre> 289</pre>
288 290
289<p> 291<p>
290Edit the <path>/etc/exports</path> file of the master node to export a work 292Edit the <path>/etc/exports</path> file of the master node to export a work
291directory structure (/home is good for this). 293directory structure (/home is good for this).
292</p> 294</p>
293 295
294<pre caption="/etc/exports"> 296<pre caption="/etc/exports">
295/home/ *(rw) 297/home/ *(rw)
302<pre caption="Adding NFS to the default runlevel"> 304<pre caption="Adding NFS to the default runlevel">
303# <i>rc-update add nfs default</i> 305# <i>rc-update add nfs default</i>
304</pre> 306</pre>
305 307
306<p> 308<p>
307To mount the nfs exported filesystem from the master, you also have to 309To mount the nfs exported filesystem from the master, you also have to
308configure your slave nodes' <path>/etc/fstab</path>. Add a line like this 310configure your slave nodes' <path>/etc/fstab</path>. Add a line like this
309one: 311one:
310</p> 312</p>
311 313
312<pre caption="/etc/fstab"> 314<pre caption="/etc/fstab">
313master:/home/ /home nfs rw,exec,noauto,nouser,async 0 0 315master:/home/ /home nfs rw,exec,noauto,nouser,async 0 0
314</pre> 316</pre>
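
<p>
As a quick check before relying on the boot-time mount, the export can be
tested by hand from a slave node. The commands below are a minimal sketch,
assuming that the master's <c>nfs</c> service is already running and that the
hostname <c>master</c> resolves through <path>/etc/hosts</path>:
</p>

<pre caption="Testing the NFS export from a slave node (sketch)">
<comment>(list the directories exported by the master)</comment>
# <i>showmount -e master</i>
<comment>(mount /home using the options from the /etc/fstab entry)</comment>
# <i>mount /home</i>
<comment>(confirm that /home is now served by master:/home/)</comment>
# <i>df -h /home</i>
</pre>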
315 317
316<p> 318<p>
317You'll also need to set up your nodes so that they mount the nfs filesystem by 319You'll also need to set up your nodes so that they mount the nfs filesystem by
318issuing this command: 320issuing this command:
319</p> 321</p>
320 322
321<pre caption="Adding nfsmount to the default runlevel"> 323<pre caption="Adding nfsmount to the default runlevel">
322# <i>rc-update add nfsmount default</i> 324# <i>rc-update add nfsmount default</i>
327<section> 329<section>
328<title>RSH/SSH</title> 330<title>RSH/SSH</title>
329<body> 331<body>
330 332
331<p> 333<p>
332SSH is a protocol for secure remote login and other secure network services 334SSH is a protocol for secure remote login and other secure network services
333over an insecure network. OpenSSH uses public key cryptography to provide 335over an insecure network. OpenSSH uses public key cryptography to provide
334secure authorization. Generating the public key, which is shared with remote 336secure authorization. Generating the public key, which is shared with remote
335systems, and the private key which is kept on the local system, is done first 337systems, and the private key which is kept on the local system, is done first
336to configure OpenSSH on the cluster. 338to configure OpenSSH on the cluster.
337</p> 339</p>
338 340
339<p> 341<p>
340For transparent cluster usage, private/public keys may be used. This process 342For transparent cluster usage, private/public keys may be used. This process
341has two steps: 343has two steps:
342</p> 344</p>
343 345
344<ul> 346<ul>
345 <li>Generate public and private keys</li> 347 <li>Generate public and private keys</li>
372root@master's password: 374root@master's password:
373id_dsa.pub 100% 234 2.0MB/s 00:00 375id_dsa.pub 100% 234 2.0MB/s 00:00
374</pre> 376</pre>
375 377
376<note> 378<note>
377Host keys must have an empty passphrase. RSA is required for host-based 379Host keys must have an empty passphrase. RSA is required for host-based
378authentication. 380authentication.
379</note> 381</note>
380 382
381<p> 383<p>
382For host based authentication, you will also need to edit your 384For host based authentication, you will also need to edit your
383<path>/etc/ssh/shosts.equiv</path>. 385<path>/etc/ssh/shosts.equiv</path>.
384</p> 386</p>
385 387
386<pre caption="/etc/ssh/shosts.equiv"> 388<pre caption="/etc/ssh/shosts.equiv">
387node01.adelie 389node01.adelie
395 397
396<pre caption="sshd configurations"> 398<pre caption="sshd configurations">
397# $OpenBSD: sshd_config,v 1.42 2001/09/20 20:57:51 mouring Exp $ 399# $OpenBSD: sshd_config,v 1.42 2001/09/20 20:57:51 mouring Exp $
398# This sshd was compiled with PATH=/usr/bin:/bin:/usr/sbin:/sbin 400# This sshd was compiled with PATH=/usr/bin:/bin:/usr/sbin:/sbin
399 401
400# This is the sshd server system-wide configuration file. See sshd(8) 402# This is the sshd server system-wide configuration file. See sshd(8)
401# for more information. 403# for more information.
402 404
403# HostKeys for protocol version 2 405# HostKeys for protocol version 2
404HostKey /etc/ssh/ssh_host_rsa_key 406HostKey /etc/ssh/ssh_host_rsa_key
405</pre> 407</pre>
406 408
407<p> 409<p>
408If your application requires RSH communications, you will need to emerge 410If your application requires RSH communications, you will need to emerge
409net-misc/netkit-rsh and sys-apps/xinetd. 411<c>net-misc/netkit-rsh</c> and <c>sys-apps/xinetd</c>.
410</p> 412</p>
411 413
412<pre caption="Installing necessary applications"> 414<pre caption="Installing necessary applications">
413# <i>emerge -p xinetd</i> 415# <i>emerge -a xinetd</i>
414# <i>emerge xinetd</i>
415# <i>emerge -p netkit-rsh</i> 416# <i>emerge -a netkit-rsh</i>
416# <i>emerge netkit-rsh</i>
417</pre> 417</pre>
418 418
419<p> 419<p>
420Then configure the rsh daemon. Edit your <path>/etc/xinetd.d/rsh</path> file. 420Then configure the rsh daemon. Edit your <path>/etc/xinetd.d/rsh</path> file.
421</p> 421</p>
422 422
423<pre caption="rsh"> 423<pre caption="rsh">
424# Adelie Linux Research &amp; Development Center 424# Adelie Linux Research &amp; Development Center
425# /etc/xinetd.d/rsh 425# /etc/xinetd.d/rsh
454Or you can simply trust your cluster LAN: 454Or you can simply trust your cluster LAN:
455</p> 455</p>
456 456
457<pre caption="hosts.allow"> 457<pre caption="hosts.allow">
458# Adelie Linux Research &amp; Development Center 458# Adelie Linux Research &amp; Development Center
459# /etc/hosts.allow 459# /etc/hosts.allow
460 460
461ALL:192.168.1.0/255.255.255.0 461ALL:192.168.1.0/255.255.255.0
462</pre> 462</pre>
463 463
464<p> 464<p>
487<section> 487<section>
488<title>NTP</title> 488<title>NTP</title>
489<body> 489<body>
490 490
491<p> 491<p>
492The Network Time Protocol (NTP) is used to synchronize the time of a computer 492The Network Time Protocol (NTP) is used to synchronize the time of a computer
493client or server to another server or reference time source, such as a radio 493client or server to another server or reference time source, such as a radio
494or satellite receiver or modem. It provides accuracies typically within a 494or satellite receiver or modem. It provides accuracies typically within a
495millisecond on LANs and up to a few tens of milliseconds on WANs relative to 495millisecond on LANs and up to a few tens of milliseconds on WANs relative to
496Coordinated Universal Time (UTC) via a Global Positioning System (GPS) 496Coordinated Universal Time (UTC) via a Global Positioning System (GPS)
497receiver, for example. Typical NTP configurations utilize multiple redundant 497receiver, for example. Typical NTP configurations utilize multiple redundant
498servers and diverse network paths in order to achieve high accuracy and 498servers and diverse network paths in order to achieve high accuracy and
499reliability. 499reliability.
500</p> 500</p>
501 501
502<p> 502<p>
503Select an NTP server geographically close to you from <uri 503Select an NTP server geographically close to you from <uri
504link="http://www.eecis.udel.edu/~mills/ntp/servers.html">Public NTP Time 504link="http://www.eecis.udel.edu/~mills/ntp/servers.html">Public NTP Time
505Servers</uri>, and configure your <path>/etc/conf.d/ntp</path> and 505Servers</uri>, and configure your <path>/etc/conf.d/ntp</path> and
506<path>/etc/ntp.conf</path> files on the master node. 506<path>/etc/ntp.conf</path> files on the master node.
507</p> 507</p>
508 508
509<pre caption="Master /etc/conf.d/ntp"> 509<pre caption="Master /etc/conf.d/ntp">
510# /etc/conf.d/ntpd 510# /etc/conf.d/ntpd
547#NTPD_OPTS="" 547#NTPD_OPTS=""
548 548
549</pre> 549</pre>
550 550
551<p> 551<p>
552Edit your <path>/etc/ntp.conf</path> file on the master to set up an external 552Edit your <path>/etc/ntp.conf</path> file on the master to set up an external
553synchronization source: 553synchronization source:
554</p> 554</p>
555 555
556<pre caption="Master ntp.conf"> 556<pre caption="Master ntp.conf">
557# Adelie Linux Research &amp; Development Center 557# Adelie Linux Research &amp; Development Center
563# Synchronization source #2 563# Synchronization source #2
564server ntp2.cmc.ec.gc.ca 564server ntp2.cmc.ec.gc.ca
565restrict ntp2.cmc.ec.gc.ca 565restrict ntp2.cmc.ec.gc.ca
566stratum 10 566stratum 10
567driftfile /etc/ntp.drift.server 567driftfile /etc/ntp.drift.server
568logfile /var/log/ntp 568logfile /var/log/ntp
569broadcast 192.168.1.255 569broadcast 192.168.1.255
570restrict default kod 570restrict default kod
571restrict 127.0.0.1 571restrict 127.0.0.1
572restrict 192.168.1.0 mask 255.255.255.0 572restrict 192.168.1.0 mask 255.255.255.0
573</pre> 573</pre>
574 574
575<p> 575<p>
576And on all your slave nodes, set up your synchronization source as your master 576And on all your slave nodes, set up your synchronization source as your master
577node. 577node.
578</p> 578</p>
579 579
580<pre caption="Node /etc/conf.d/ntp"> 580<pre caption="Node /etc/conf.d/ntp">
581# /etc/conf.d/ntpd 581# /etc/conf.d/ntpd
592# Synchronization source #1 592# Synchronization source #1
593server master 593server master
594restrict master 594restrict master
595stratum 11 595stratum 11
596driftfile /etc/ntp.drift.server 596driftfile /etc/ntp.drift.server
597logfile /var/log/ntp 597logfile /var/log/ntp
598restrict default kod 598restrict default kod
599restrict 127.0.0.1 599restrict 127.0.0.1
600</pre> 600</pre>
601 601
602<p> 602<p>
606<pre caption="Adding ntpd to the default runlevel"> 606<pre caption="Adding ntpd to the default runlevel">
607# <i>rc-update add ntpd default</i> 607# <i>rc-update add ntpd default</i>
608</pre> 608</pre>
609 609
610<note> 610<note>
611NTP will not update the local clock if the time difference between your 611NTP will not update the local clock if the time difference between your
612synchronization source and the local clock is too great. 612synchronization source and the local clock is too great.
613</note> 613</note>
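
<p>
If a node's clock is too far off for <c>ntpd</c> to correct, one common
approach is to step the clock once and then restart the daemon. The commands
below are a minimal sketch, assuming the stock <c>ntpd</c> from net-misc/ntp
and its <path>/etc/init.d/ntpd</path> init script:
</p>

<pre caption="Stepping the clock once, then checking synchronization (sketch)">
# <i>/etc/init.d/ntpd stop</i>
<comment>(-g permits one large correction, -q exits once the clock is set)</comment>
# <i>ntpd -gq</i>
# <i>/etc/init.d/ntpd start</i>
<comment>(list the configured peers and their current offsets)</comment>
# <i>ntpq -p</i>
</pre>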
614 614
615</body> 615</body>
616</section> 616</section>
621<p> 621<p>
622To set up a firewall on your cluster, you will need iptables. 622To set up a firewall on your cluster, you will need iptables.
623</p> 623</p>
624 624
625<pre caption="Installing iptables"> 625<pre caption="Installing iptables">
626# <i>emerge -p iptables</i> 626# <i>emerge -a iptables</i>
627# <i>emerge iptables</i>
628</pre> 627</pre>
629 628
630<p> 629<p>
631Required kernel configuration: 630Required kernel configuration:
632</p> 631</p>
689<section> 688<section>
690<title>OpenPBS</title> 689<title>OpenPBS</title>
691<body> 690<body>
692 691
693<p> 692<p>
694The Portable Batch System (PBS) is a flexible batch queueing and workload 693The Portable Batch System (PBS) is a flexible batch queueing and workload
695management system originally developed for NASA. It operates on networked, 694management system originally developed for NASA. It operates on networked,
696multi-platform UNIX environments, including heterogeneous clusters of 695multi-platform UNIX environments, including heterogeneous clusters of
697workstations, supercomputers, and massively parallel systems. Development of 696workstations, supercomputers, and massively parallel systems. Development of
698PBS is provided by Altair Grid Technologies. 697PBS is provided by Altair Grid Technologies.
699</p> 698</p>
700 699
701<pre caption="Installing openpbs"> 700<pre caption="Installing openpbs">
702# <i>emerge -p openpbs</i> 701# <i>emerge -a openpbs</i>
703</pre> 702</pre>
704 703
705<note> 704<note>
706The OpenPBS ebuild does not currently set proper permissions on var-directories 705The OpenPBS ebuild does not currently set proper permissions on var-directories
707used by OpenPBS. 706used by OpenPBS.
708</note> 707</note>
709 708
710<p> 709<p>
711Before you start using OpenPBS, some configuration is required. The files 710Before you start using OpenPBS, some configuration is required. The files
712you will need to personalize for your system are: 711you will need to personalize for your system are:
713</p> 712</p>
714 713
715<ul> 714<ul>
716 <li>/etc/pbs_environment</li> 715 <li>/etc/pbs_environment</li>
760set server resources_default.nodes = 1 759set server resources_default.nodes = 1
761set server scheduler_iteration = 60 760set server scheduler_iteration = 60
762</pre> 761</pre>
763 762
764<p> 763<p>
765To submit a task to OpenPBS, the command <c>qsub</c> is used with some 764To submit a task to OpenPBS, the command <c>qsub</c> is used with some
766optional parameters. In the example below, "-l" allows you to specify 765optional parameters. In the example below, "-l" allows you to specify
767the resources required, "-j" provides for redirection of standard out and 766the resources required, "-j" provides for redirection of standard out and
768standard error, and the "-m" will e-mail the user at beginning (b), end (e) 767standard error, and the "-m" will e-mail the user at beginning (b), end (e)
769and on abort (a) of the job. 768and on abort (a) of the job.
770</p> 769</p>
771 770
772<pre caption="Submitting a task"> 771<pre caption="Submitting a task">
773<comment>(submit and request from OpenPBS that myscript be executed on 2 nodes)</comment> 772<comment>(submit and request from OpenPBS that myscript be executed on 2 nodes)</comment>
774# <i>qsub -l nodes=2 -j oe -m abe myscript</i> 773# <i>qsub -l nodes=2 -j oe -m abe myscript</i>
775</pre> 774</pre>
776 775
777<p> 776<p>
778Normally jobs submitted to OpenPBS are in the form of scripts. Sometimes, you 777Normally jobs submitted to OpenPBS are in the form of scripts. Sometimes, you
779may want to try a task manually. To request an interactive shell from OpenPBS, 778may want to try a task manually. To request an interactive shell from OpenPBS,
780use the "-I" parameter. 779use the "-I" parameter.
781</p> 780</p>
782 781
783<pre caption="Requesting an interactive shell"> 782<pre caption="Requesting an interactive shell">
784# <i>qsub -I</i> 783# <i>qsub -I</i>
800<section> 799<section>
801<title>MPICH</title> 800<title>MPICH</title>
802<body> 801<body>
803 802
804<p> 803<p>
805Message passing is a paradigm used widely on certain classes of parallel 804Message passing is a paradigm used widely on certain classes of parallel
806machines, especially those with distributed memory. MPICH is a freely 805machines, especially those with distributed memory. MPICH is a freely
807available, portable implementation of MPI, the Standard for message-passing 806available, portable implementation of MPI, the Standard for message-passing
808libraries. 807libraries.
809</p> 808</p>
810 809
811<p> 810<p>
812The mpich ebuild provided by Adelie Linux allows for two USE flags: 811The mpich ebuild provided by Adelie Linux allows for two USE flags:
813<e>doc</e> and <e>crypt</e>. <e>doc</e> will cause documentation to be 812<e>doc</e> and <e>crypt</e>. <e>doc</e> will cause documentation to be
814installed, while <e>crypt</e> will configure MPICH to use <c>ssh</c> instead 813installed, while <e>crypt</e> will configure MPICH to use <c>ssh</c> instead
815of <c>rsh</c>. 814of <c>rsh</c>.
816</p> 815</p>
817 816
818<pre caption="Installing the mpich application"> 817<pre caption="Installing the mpich application">
819# <i>emerge -p mpich</i> 818# <i>emerge -a mpich</i>
820# <i>emerge mpich</i>
821</pre> 819</pre>
822 820
823<p> 821<p>
824You may need to export a mpich work directory to all your slave nodes in 822You may need to export a mpich work directory to all your slave nodes in
825<path>/etc/exports</path>: 823<path>/etc/exports</path>:
826</p> 824</p>
827 825
828<pre caption="/etc/exports"> 826<pre caption="/etc/exports">
829/home *(rw) 827/home *(rw)
830</pre> 828</pre>
831 829
832<p> 830<p>
833Most massively parallel processors (MPPs) provide a way to start a program on 831Most massively parallel processors (MPPs) provide a way to start a program on
834a requested number of processors; <c>mpirun</c> makes use of the appropriate 832a requested number of processors; <c>mpirun</c> makes use of the appropriate
835command whenever possible. In contrast, workstation clusters require that each 833command whenever possible. In contrast, workstation clusters require that each
836process in a parallel job be started individually, though programs to help 834process in a parallel job be started individually, though programs to help
837start these processes exist. Because workstation clusters are not already 835start these processes exist. Because workstation clusters are not already
838organized as an MPP, additional information is required to make use of them. 836organized as an MPP, additional information is required to make use of them.
839Mpich should be installed with a list of participating workstations in the 837Mpich should be installed with a list of participating workstations in the
840file <path>machines.LINUX</path> in the directory 838file <path>machines.LINUX</path> in the directory
841<path>/usr/share/mpich/</path>. This file is used by <c>mpirun</c> to choose 839<path>/usr/share/mpich/</path>. This file is used by <c>mpirun</c> to choose
842processors to run on. 840processors to run on.
843</p> 841</p>
844 842
845<p> 843<p>
846Edit this file to reflect your cluster LAN configuration: 844Edit this file to reflect your cluster LAN configuration:
847</p> 845</p>
848 846
849<pre caption="/usr/share/mpich/machines.LINUX"> 847<pre caption="/usr/share/mpich/machines.LINUX">
850# Change this file to contain the machines that you want to use 848# Change this file to contain the machines that you want to use
851# to run MPI jobs on. The format is one host name per line, with either 849# to run MPI jobs on. The format is one host name per line, with either
852# hostname 850# hostname
853# or 851# or
854# hostname:n 852# hostname:n
855# where n is the number of processors in an SMP. The hostname should 853# where n is the number of processors in an SMP. The hostname should
856# be the same as the result from the command "hostname" 854# be the same as the result from the command "hostname"
857master 855master
858node01 856node01
859node02 857node02
860# node03 858# node03
861# node04 859# node04
862# ... 860# ...
863</pre> 861</pre>
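The machines file format described in the comments above (one entry per line, either hostname or hostname:n, with # starting a comment) can be read with a few lines of Python. This is only an illustrative sketch, not part of MPICH: the function name parse_machines and the SAMPLE text are hypothetical, and treating a bare hostname as one processor is an assumption.

```python
from typing import List, Tuple


def parse_machines(text: str) -> List[Tuple[str, int]]:
    """Return (hostname, processors) pairs from machines-file text."""
    entries = []
    for raw in text.splitlines():
        # Drop anything after '#' and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        host, sep, count = line.partition(":")
        # Assumption: a bare hostname counts as a single processor.
        entries.append((host, int(count) if sep else 1))
    return entries


# Hypothetical sample in the same format as machines.LINUX.
SAMPLE = """\
# one host name per line
master
node01:2
node02
# node03
"""

if __name__ == "__main__":
    for host, procs in parse_machines(SAMPLE):
        print(host, procs)
```

A check like this can catch a malformed hostname:n entry (int() raises ValueError) before the file is handed to mpirun.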
864 862
865<p> 863<p>
866Use the script <c>tstmachines</c> in <path>/usr/sbin/</path> to ensure that 864Use the script <c>tstmachines</c> in <path>/usr/sbin/</path> to ensure that
867you can use all of the machines that you have listed. This script performs 865you can use all of the machines that you have listed. This script performs
868an <c>rsh</c> and a short directory listing; this tests both that you have 866an <c>rsh</c> and a short directory listing; this tests both that you have
869access to the node and that a program in the current directory is visible on 867access to the node and that a program in the current directory is visible on
870the remote node. If there are any problems, they will be listed. These 868the remote node. If there are any problems, they will be listed. These
871problems must be fixed before proceeding. 869problems must be fixed before proceeding.
872</p> 870</p>
873 871
874<p> 872<p>
875The only argument to <c>tstmachines</c> is the name of the architecture; this 873The only argument to <c>tstmachines</c> is the name of the architecture; this
876is the same name as the extension on the machines file. For example, the 874is the same name as the extension on the machines file. For example, the
877following tests that a program in the current directory can be executed by 875following tests that a program in the current directory can be executed by
878all of the machines in the LINUX machines list. 876all of the machines in the LINUX machines list.
879</p> 877</p>
880 878
881<pre caption="Running a test"> 879<pre caption="Running a test">
882# <i>/usr/sbin/tstmachines LINUX</i> 880# <i>/usr/sbin/tstmachines LINUX</i>
883</pre> 881</pre>
884 882
885<note> 883<note>
886This program is silent if all is well; if you want to see what it is doing, 884This program is silent if all is well; if you want to see what it is doing,
887use the -v (for verbose) argument: 885use the -v (for verbose) argument:
888</note> 886</note>
889 887
890<pre caption="Running a test verbosely"> 888<pre caption="Running a test verbosely">
891# <i>/usr/sbin/tstmachines -v LINUX</i> 889# <i>/usr/sbin/tstmachines -v LINUX</i>
903Trying user program on host1.uoffoo.edu ... 901Trying user program on host1.uoffoo.edu ...
904Trying user program on host2.uoffoo.edu ... 902Trying user program on host2.uoffoo.edu ...
905</pre> 903</pre>
906 904
907<p> 905<p>
908If <c>tstmachines</c> finds a problem, it will suggest possible reasons and 906If <c>tstmachines</c> finds a problem, it will suggest possible reasons and
909solutions. In brief, there are three tests: 907solutions. In brief, there are three tests:
910</p> 908</p>
911 909
912<ul> 910<ul>
913 <li> 911 <li>
914 <e>Can processes be started on remote machines?</e> <c>tstmachines</c> attempts 912 <e>Can processes be started on remote machines?</e> <c>tstmachines</c> attempts
915 to run the shell command <c>true</c> on each machine in the machines file by 913 to run the shell command <c>true</c> on each machine in the machines file by
916 using the remote shell command. 914 using the remote shell command.
917 </li> 915 </li>
918 <li> 916 <li>
919 <e>Is the current working directory available to all machines?</e> This 917 <e>Is the current working directory available to all machines?</e> This
920 test creates a file and attempts to list it with <c>ls</c> through the 918 test creates a file and attempts to list it with <c>ls</c> through the
921 remote shell command. 919 remote shell command.
922 </li> 920 </li>
923 <li> 921 <li>
924 <e>Can user programs be run on remote systems?</e> This checks that shared 922 <e>Can user programs be run on remote systems?</e> This checks that shared
925 libraries and other components have been properly installed on all 923 libraries and other components have been properly installed on all
926 machines. 924 machines.
927 </li> 925 </li>
928</ul> 926</ul>
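The first two checks in the list above can be approximated by hand. The Python sketch below is not the tstmachines implementation; it only illustrates the idea under stated assumptions. The names remote_checks and failing_hosts are hypothetical, the remote shell defaults to rsh (pass "ssh" if that is what your cluster uses), and probe_file is assumed to be an absolute path on the shared directory.

```python
import subprocess
from typing import Callable, Dict, List


def remote_checks(hosts: List[str], probe_file: str,
                  remote_shell: str = "rsh") -> Dict[str, List[List[str]]]:
    """Build, per host, the commands for the first two checks."""
    checks = {}
    for host in hosts:
        checks[host] = [
            # Check 1: can a process be started remotely at all?
            [remote_shell, host, "true"],
            # Check 2: is the shared file visible on the remote node?
            [remote_shell, host, "ls", probe_file],
        ]
    return checks


def failing_hosts(checks: Dict[str, List[List[str]]],
                  runner: Callable = subprocess.run) -> List[str]:
    """Run each command and return hosts where any check exits non-zero."""
    failed = []
    for host, commands in checks.items():
        for argv in commands:
            if runner(argv).returncode != 0:
                failed.append(host)
                break  # one failure is enough to flag the host
    return failed
```

Calling failing_hosts(remote_checks(["node01", "node02"], "/home/user/probe")) would run the commands for real; the host names and path here are placeholders.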
929 927
930<p> 928<p>
937# <i>make hello++</i> 935# <i>make hello++</i>
938# <i>mpirun -machinefile /usr/share/mpich/machines.LINUX -np 1 hello++</i> 936# <i>mpirun -machinefile /usr/share/mpich/machines.LINUX -np 1 hello++</i>
939</pre> 937</pre>
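When launching jobs from a script, it can help to assemble the mpirun command line programmatically. The sketch below uses only the -machinefile and -np options shown above; the helper name mpirun_argv is hypothetical, and the process-count validation is an added assumption rather than something mpirun requires of callers.

```python
import shlex
from typing import List


def mpirun_argv(machinefile: str, nprocs: int, program: str) -> List[str]:
    """Return the argument list for an mpirun invocation."""
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    return ["mpirun", "-machinefile", machinefile, "-np", str(nprocs), program]


if __name__ == "__main__":
    argv = mpirun_argv("/usr/share/mpich/machines.LINUX", 1, "hello++")
    # Print the shell-quoted command instead of running it.
    print(shlex.join(argv))
```

The returned list can be passed directly to subprocess.run once the printed command looks right.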
940 938
941<p> 939<p>
942For further information on MPICH, consult the documentation at <uri 940For further information on MPICH, consult the documentation at <uri
943link="http://www-unix.mcs.anl.gov/mpi/mpich/docs/mpichman-chp4/mpichman-chp4.htm">http://www-unix.mcs.anl.gov/mpi/mpich/docs/mpichman-chp4/mpichman-chp4.htm</uri>. 941link="http://www-unix.mcs.anl.gov/mpi/mpich/docs/mpichman-chp4/mpichman-chp4.htm">http://www-unix.mcs.anl.gov/mpi/mpich/docs/mpichman-chp4/mpichman-chp4.htm</uri>.
944</p> 942</p>
945 943
946</body> 944</body>
947</section> 945</section>
971<title>Bibliography</title> 969<title>Bibliography</title>
972<section> 970<section>
973<body> 971<body>
974 972
975<p> 973<p>
976The original document is published at the <uri 974The original document is published at the <uri
977link="http://www.adelielinux.com">Adelie Linux R&amp;D Centre</uri> web site, 975link="http://www.adelielinux.com">Adelie Linux R&amp;D Centre</uri> web site,
978and is reproduced here with the permission of the authors and <uri 976and is reproduced here with the permission of the authors and <uri
979link="http://www.cyberlogic.ca">Cyberlogic</uri>'s Adelie Linux R&amp;D 977link="http://www.cyberlogic.ca">Cyberlogic</uri>'s Adelie Linux R&amp;D
980Centre. 978Centre.
981</p> 979</p>
982 980
983<ul> 981<ul>
984 <li><uri>http://www.gentoo.org</uri>, Gentoo Foundation, Inc.</li> 982 <li><uri>http://www.gentoo.org</uri>, Gentoo Foundation, Inc.</li>
985 <li> 983 <li>
986 <uri link="http://www.adelielinux.com">http://www.adelielinux.com</uri>, 984 <uri link="http://www.adelielinux.com">http://www.adelielinux.com</uri>,
987 Adelie Linux Research and Development Centre 985 Adelie Linux Research and Development Centre
988 </li> 986 </li>
989 <li> 987 <li>
990 <uri link="http://nfs.sourceforge.net/">http://nfs.sourceforge.net</uri>, 988 <uri link="http://nfs.sourceforge.net/">http://nfs.sourceforge.net</uri>,
991 Linux NFS Project 989 Linux NFS Project
992 </li> 990 </li>
993 <li> 991 <li>
994 <uri link="http://www-unix.mcs.anl.gov/mpi/mpich/">http://www-unix.mcs.anl.gov/mpi/mpich/</uri>, 992 <uri link="http://www-unix.mcs.anl.gov/mpi/mpich/">http://www-unix.mcs.anl.gov/mpi/mpich/</uri>,
995 Mathematics and Computer Science Division, Argonne National Laboratory 993 Mathematics and Computer Science Division, Argonne National Laboratory
996 </li> 994 </li>
997 <li> 995 <li>
998 <uri link="http://www.ntp.org/">http://ntp.org</uri> 996 <uri link="http://www.ntp.org/">http://ntp.org</uri>
999 </li> 997 </li>
1000 <li> 998 <li>
1001 <uri link="http://www.eecis.udel.edu/~mills/">http://www.eecis.udel.edu/~mills/</uri>, 999 <uri link="http://www.eecis.udel.edu/~mills/">http://www.eecis.udel.edu/~mills/</uri>,
1002 David L. Mills, University of Delaware 1000 David L. Mills, University of Delaware
1003 </li> 1001 </li>
1004 <li> 1002 <li>
1005 <uri link="http://www.ietf.org/html.charters/secsh-charter.html">http://www.ietf.org/html.charters/secsh-charter.html</uri>, 1003 <uri link="http://www.ietf.org/html.charters/secsh-charter.html">http://www.ietf.org/html.charters/secsh-charter.html</uri>,
1006 Secure Shell Working Group, IETF, Internet Society 1004 Secure Shell Working Group, IETF, Internet Society
1007 </li> 1005 </li>
1008 <li> 1006 <li>
1009 <uri link="http://www.linuxsecurity.com/">http://www.linuxsecurity.com/</uri>, 1007 <uri link="http://www.linuxsecurity.com/">http://www.linuxsecurity.com/</uri>,
1010 Guardian Digital 1008 Guardian Digital
1011 </li> 1009 </li>
1012 <li> 1010 <li>
1013 <uri link="http://www.openpbs.org/">http://www.openpbs.org/</uri>, 1011 <uri link="http://www.openpbs.org/">http://www.openpbs.org/</uri>,
1014 Altair Grid Technologies, LLC. 1012 Altair Grid Technologies, LLC.
1015 </li> 1013 </li>
1016</ul> 1014</ul>
1017 1015
1018</body> 1016</body>
