<?xml version='1.0' encoding="UTF-8"?>
<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/hpc-howto.xml,v 1.14 2008/05/19 20:56:20 swift Exp $ -->
<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">

<guide link="/doc/en/hpc-howto.xml">
<title>High Performance Computing on Gentoo Linux</title>


In other words, this is copyright adelielinux R&D; Gentoo only has
permission to distribute this document as-is and update it when appropriate
as long as the adelie linux R&D notice stays
-->

<abstract>
This document was written by people at the Adelie Linux R&amp;D Center
&lt;http://www.adelielinux.com&gt; as a step-by-step guide to turn a Gentoo
system into a High Performance Computing (HPC) system.
</abstract>
<title>Introduction</title>
<section>
<body>

<p>
Gentoo Linux is a special flavor of Linux that can be automatically optimized
and customized for just about any application or need. Extreme performance,
configurability and a top-notch user and developer community are all hallmarks
of the Gentoo experience.
</p>
52 52
53<p> 53<p>
54Thanks to a technology called Portage, Gentoo Linux can become an ideal secure 54Thanks to a technology called Portage, Gentoo Linux can become an ideal secure
55server, development workstation, professional desktop, gaming system, embedded 55server, development workstation, professional desktop, gaming system, embedded
56solution or... a High Performance Computing system. Because of its 56solution or... a High Performance Computing system. Because of its
57near-unlimited adaptability, we call Gentoo Linux a metadistribution. 57near-unlimited adaptability, we call Gentoo Linux a metadistribution.
58</p> 58</p>
59 59
60<p> 60<p>
61This document explains how to turn a Gentoo system into a High Performance 61This document explains how to turn a Gentoo system into a High Performance
62Computing system. Step by step, it explains what packages one may want to 62Computing system. Step by step, it explains what packages one may want to
63install and helps configure them. 63install and helps configure them.
64</p> 64</p>
65 65
66<p> 66<p>
67Obtain Gentoo Linux from the website <uri>http://www.gentoo.org</uri>, and 67Obtain Gentoo Linux from the website <uri>http://www.gentoo.org</uri>, and
84this section. 84this section.
85</note> 85</note>

<p>
During the installation process, you will have to set your USE variables in
<path>/etc/make.conf</path>. We recommend that you deactivate all the
defaults (see <path>/etc/make.profile/make.defaults</path>) by negating them in
make.conf. However, you may want to keep such USE flags as x86, 3dnow, gpm,
mmx, nptl, nptlonly, sse, ncurses, pam and tcpd. Refer to the USE documentation
for more information.
</p>

<pre caption="USE Flags">
USE="-oss 3dnow -apm -arts -avi -berkdb -crypt -cups -encode -gdbm -gif gpm -gtk
<note>
The <e>tcpd</e> USE flag increases security for packages such as xinetd.
</note>

<p>
In step 15 ("Installing the kernel and a System Logger"), for stability
reasons, we recommend the vanilla-sources, the official kernel sources
released on <uri>http://www.kernel.org/</uri>, unless you require special
support such as xfs.
</p>

<pre caption="Installing vanilla-sources">
# <i>emerge -p syslog-ng vanilla-sources</i>
</pre>

<p>
When you install miscellaneous packages, we recommend installing the
following:
</p>

<pre caption="Installing necessary packages">
# <i>emerge -p nfs-utils portmap tcpdump ssmtp iptables xinetd</i>
<section>
<title>Communication Layer (TCP/IP Network)</title>
<body>

<p>
A cluster requires a communication layer to interconnect the slave nodes to
the master node. Typically, a Fast Ethernet or Gigabit Ethernet LAN can be
used since they have a good price/performance ratio. Other possibilities
include use of products like <uri link="http://www.myricom.com/">Myrinet</uri>,
<uri link="http://quadrics.com/">QsNet</uri> or others.
</p>

<p>
A cluster is composed of two node types: master and slave. Typically, your
cluster will have one master node and several slave nodes.
</p>

<p>
The master node is the cluster's server. It is responsible for telling the
slave nodes what to do. This server will typically run such daemons as dhcpd,
nfs, pbs-server, and pbs-sched. Your master node will allow interactive
sessions for users, and accept job executions.
</p>

<p>
The slave nodes listen for instructions (via ssh/rsh perhaps) from the master
node. They should be dedicated to crunching results and therefore should not
run any unnecessary services.
</p>

<p>
The rest of this documentation will assume a cluster configuration as per the
hosts file below. You should maintain on every node such a hosts file
(<path>/etc/hosts</path>) with entries for each node participating in the
cluster.
</p>

<pre caption="/etc/hosts">
# Adelie Linux Research &amp; Development Center
192.168.1.1 node01.adelie node01
192.168.1.2 node02.adelie node02
</pre>

<p>
To set up your cluster's dedicated LAN, edit your <path>/etc/conf.d/net</path>
file on the master node.
</p>

<pre caption="/etc/conf.d/net">
# Global config file for net.* rc-scripts
iface_eth1="dhcp"
</pre>


<p>
Finally, set up a DHCP daemon on the master node to avoid having to maintain a
network configuration on each slave node.
</p>

<pre caption="/etc/dhcp/dhcpd.conf">
# Adelie Linux Research &amp; Development Center
<section>
<title>NFS/NIS</title>
<body>

<p>
The Network File System (NFS) was developed to allow machines to mount a disk
partition on a remote machine as if it were on a local hard drive. This allows
for fast, seamless sharing of files across a network.
</p>

<p>
There are other systems that provide similar functionality to NFS which could
be used in a cluster environment. The <uri
link="http://www.openafs.org">Andrew File System
from IBM</uri>, recently open-sourced, provides a file sharing mechanism with
some additional security and performance features. The <uri
link="http://www.coda.cs.cmu.edu/">Coda File System</uri> is still in
development, but is designed to work well with disconnected clients. Many
of the features of the Andrew and Coda file systems are slated for inclusion
in the next version of <uri link="http://www.nfsv4.org">NFS (Version 4)</uri>.
The advantage of NFS today is that it is mature, standard, well understood,
and supported robustly across a variety of platforms.
</p>
260 260
261<pre caption="Ebuilds for NFS-support"> 261<pre caption="Ebuilds for NFS-support">
262# <i>emerge -p nfs-utils portmap</i> 262# <i>emerge -p nfs-utils portmap</i>
275CONFIG_NFSD_V3=y 275CONFIG_NFSD_V3=y
276CONFIG_LOCKD_V4=y 276CONFIG_LOCKD_V4=y
277</pre> 277</pre>
278 278
279<p> 279<p>
280On the master node, edit your <path>/etc/hosts.allow</path> file to allow 280On the master node, edit your <path>/etc/hosts.allow</path> file to allow
281connections from slave nodes. If your cluster LAN is on 192.168.1.0/24, 281connections from slave nodes. If your cluster LAN is on 192.168.1.0/24,
282your <path>hosts.allow</path> will look like: 282your <path>hosts.allow</path> will look like:
283</p> 283</p>
284 284
285<pre caption="hosts.allow"> 285<pre caption="hosts.allow">
286portmap:192.168.1.0/255.255.255.0 286portmap:192.168.1.0/255.255.255.0
287</pre> 287</pre>
288 288
289<p> 289<p>
290Edit the <path>/etc/exports</path> file of the master node to export a work 290Edit the <path>/etc/exports</path> file of the master node to export a work
291directory structure (/home is good for this). 291directory structure (/home is good for this).
292</p> 292</p>
293 293
294<pre caption="/etc/exports"> 294<pre caption="/etc/exports">
295/home/ *(rw) 295/home/ *(rw)
<pre caption="Adding NFS to the default runlevel">
# <i>rc-update add nfs default</i>
</pre>
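
<p>
To verify that the export is active before configuring the slave nodes, you
can re-read <path>/etc/exports</path> and query the export list on the master.
This check is a suggestion; both tools ship with nfs-utils:
</p>

<pre caption="Checking the exports on the master">
# <i>exportfs -ra</i>  <comment>(re-read /etc/exports)</comment>
# <i>showmount -e master</i>
</pre>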

<p>
To mount the nfs exported filesystem from the master, you also have to
configure your slave nodes' <path>/etc/fstab</path>. Add a line like this
one:
</p>

<pre caption="/etc/fstab">
master:/home/ /home nfs rw,exec,noauto,nouser,async 0 0
</pre>

<p>
You'll also need to set up your nodes so that they mount the nfs filesystem by
issuing this command:
</p>

<pre caption="Adding nfsmount to the default runlevel">
# <i>rc-update add nfsmount default</i>
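
<p>
It may also help to test the mount by hand on a slave node before relying on
the init script; this extra step is a suggestion, not part of the original
instructions:
</p>

<pre caption="Testing the NFS mount manually">
# <i>mount master:/home/ /home</i>
# <i>df /home</i>  <comment>(the filesystem should show up as master:/home/)</comment>
</pre>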
<section>
<title>RSH/SSH</title>
<body>

<p>
SSH is a protocol for secure remote login and other secure network services
over an insecure network. OpenSSH uses public key cryptography to provide
secure authorization. The first step in configuring OpenSSH on the cluster is
to generate the key pair: the public key, which is shared with remote systems,
and the private key, which is kept on the local system.
</p>

<p>
For transparent cluster usage, private/public keys may be used. This process
has two steps:
</p>

<ul>
  <li>Generate public and private keys</li>
root@master's password:
id_dsa.pub              100%  234   2.0MB/s   00:00
</pre>

<note>
Host keys must have an empty passphrase. RSA is required for host-based
authentication.
</note>

<p>
For host-based authentication, you will also need to edit your
<path>/etc/ssh/shosts.equiv</path>.
</p>

<pre caption="/etc/ssh/shosts.equiv">
node01.adelie

<pre caption="sshd configurations">
# $OpenBSD: sshd_config,v 1.42 2001/09/20 20:57:51 mouring Exp $
# This sshd was compiled with PATH=/usr/bin:/bin:/usr/sbin:/sbin

# This is the sshd server system-wide configuration file.  See sshd(8)
# for more information.

# HostKeys for protocol version 2
HostKey /etc/ssh/ssh_host_rsa_key
</pre>

<p>
If your applications require RSH communications, you will need to emerge
net-misc/netkit-rsh and sys-apps/xinetd.
</p>

<pre caption="Installing necessary applications">
# <i>emerge -p xinetd</i>
# <i>emerge -p netkit-rsh</i>
# <i>emerge netkit-rsh</i>
</pre>

<p>
Then configure the rsh daemon. Edit your <path>/etc/xinetd.d/rsh</path> file.
</p>

<pre caption="rsh">
# Adelie Linux Research &amp; Development Center
# /etc/xinetd.d/rsh
Or you can simply trust your cluster LAN:
</p>

<pre caption="hosts.allow">
# Adelie Linux Research &amp; Development Center
# /etc/hosts.allow

ALL:192.168.1.0/255.255.255.0
</pre>

<p>
<section>
<title>NTP</title>
<body>

<p>
The Network Time Protocol (NTP) is used to synchronize the time of a computer
client or server to another server or reference time source, such as a radio
or satellite receiver or modem. It provides accuracies typically within a
millisecond on LANs and up to a few tens of milliseconds on WANs relative to
Coordinated Universal Time (UTC) via a Global Positioning Service (GPS)
receiver, for example. Typical NTP configurations utilize multiple redundant
servers and diverse network paths in order to achieve high accuracy and
reliability.
</p>

<p>
Select an NTP server geographically close to you from <uri
link="http://www.eecis.udel.edu/~mills/ntp/servers.html">Public NTP Time
Servers</uri>, and configure your <path>/etc/conf.d/ntp</path> and
<path>/etc/ntp.conf</path> files on the master node.
</p>

<pre caption="Master /etc/conf.d/ntp">
# /etc/conf.d/ntpd
#NTPD_OPTS=""

</pre>

<p>
Edit your <path>/etc/ntp.conf</path> file on the master to set up an external
synchronization source:
</p>

<pre caption="Master ntp.conf">
# Adelie Linux Research &amp; Development Center
# Synchronization source #2
server ntp2.cmc.ec.gc.ca
restrict ntp2.cmc.ec.gc.ca
stratum 10
driftfile /etc/ntp.drift.server
logfile /var/log/ntp
broadcast 192.168.1.255
restrict default kod
restrict 127.0.0.1
restrict 192.168.1.0 mask 255.255.255.0
</pre>

<p>
And on all your slave nodes, set up your synchronization source as your master
node.
</p>

<pre caption="Node /etc/conf.d/ntp">
# /etc/conf.d/ntpd
# Synchronization source #1
server master
restrict master
stratum 11
driftfile /etc/ntp.drift.server
logfile /var/log/ntp
restrict default kod
restrict 127.0.0.1
</pre>

<p>
<pre caption="Adding ntpd to the default runlevel">
# <i>rc-update add ntpd default</i>
</pre>

<note>
NTP will not update the local clock if the time difference between your
synchronization source and the local clock is too great.
</note>
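
<p>
If a node's clock starts out too far off, one option is to force a one-time
synchronization with <c>ntpdate</c> (shipped with the ntp package) before
starting the daemon; this extra step is a suggestion, not part of the
original instructions:
</p>

<pre caption="One-shot synchronization against the master">
# <i>ntpdate -b master</i>  <comment>(step the clock to the master's time)</comment>
</pre>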

</body>
</section>
<section>
<title>OpenPBS</title>
<body>

<p>
The Portable Batch System (PBS) is a flexible batch queueing and workload
management system originally developed for NASA. It operates on networked,
multi-platform UNIX environments, including heterogeneous clusters of
workstations, supercomputers, and massively parallel systems. Development of
PBS is provided by Altair Grid Technologies.
</p>

<pre caption="Installing openpbs">
# <i>emerge -p openpbs</i>
</pre>

<note>
The OpenPBS ebuild does not currently set proper permissions on the
var-directories used by OpenPBS.
</note>

<p>
Before you start using OpenPBS, some configuration is required. The files
you will need to personalize for your system are:
</p>

<ul>
  <li>/etc/pbs_environment</li>
set server resources_default.nodes = 1
set server scheduler_iteration = 60
</pre>

<p>
To submit a task to OpenPBS, the command <c>qsub</c> is used with some
optional parameters. In the example below, "-l" allows you to specify
the resources required, "-j" provides for redirection of standard out and
standard error, and the "-m" will e-mail the user at beginning (b), end (e)
and on abort (a) of the job.
</p>

<pre caption="Submitting a task">
<comment>(submit and request from OpenPBS that myscript be executed on 2 nodes)</comment>
# <i>qsub -l nodes=2 -j oe -m abe myscript</i>
</pre>
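
<p>
The same options can also be embedded in the job script itself as <c>#PBS</c>
directives, so they need not be repeated on every submission. The script
below is only an illustration; <c>my_program</c> is a placeholder for
whatever you actually want to run:
</p>

<pre caption="myscript with embedded PBS directives">
#!/bin/sh
#PBS -l nodes=2
#PBS -j oe
#PBS -m abe

<comment># Start from the directory the job was submitted from</comment>
cd $PBS_O_WORKDIR
./my_program
</pre>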

<p>
Normally jobs submitted to OpenPBS are in the form of scripts. Sometimes, you
may want to try a task manually. To request an interactive shell from OpenPBS,
use the "-I" parameter.
</p>

<pre caption="Requesting an interactive shell">
# <i>qsub -I</i>
<section>
<title>MPICH</title>
<body>

<p>
Message passing is a paradigm used widely on certain classes of parallel
machines, especially those with distributed memory. MPICH is a freely
available, portable implementation of MPI, the Standard for message-passing
libraries.
</p>

<p>
The mpich ebuild provided by Adelie Linux allows for two USE flags:
<e>doc</e> and <e>crypt</e>. <e>doc</e> will cause documentation to be
installed, while <e>crypt</e> will configure MPICH to use <c>ssh</c> instead
of <c>rsh</c>.
</p>

<pre caption="Installing the mpich application">
# <i>emerge -p mpich</i>
# <i>emerge mpich</i>
</pre>

<p>
You may need to export an mpich work directory to all your slave nodes in
<path>/etc/exports</path>:
</p>

<pre caption="/etc/exports">
/home *(rw)
</pre>

<p>
Most massively parallel processors (MPPs) provide a way to start a program on
a requested number of processors; <c>mpirun</c> makes use of the appropriate
command whenever possible. In contrast, workstation clusters require that each
process in a parallel job be started individually, though programs to help
start these processes exist. Because workstation clusters are not already
organized as an MPP, additional information is required to make use of them.
Mpich should be installed with a list of participating workstations in the
file <path>machines.LINUX</path> in the directory
<path>/usr/share/mpich/</path>. This file is used by <c>mpirun</c> to choose
processors to run on.
</p>

<p>
Edit this file to reflect your cluster LAN configuration:
</p>

<pre caption="/usr/share/mpich/machines.LINUX">
# Change this file to contain the machines that you want to use
# to run MPI jobs on.  The format is one host name per line, with either
#    hostname
# or
#    hostname:n
# where n is the number of processors in an SMP.  The hostname should
# be the same as the result from the command "hostname"
master
node01
node02
# node03
# node04
# ...
</pre>
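
<p>
With the machines file in place, a parallel program is started with
<c>mpirun</c>, passing the desired number of processes with <c>-np</c>. The
program name here is only a placeholder:
</p>

<pre caption="Launching a job with mpirun">
# <i>mpirun -np 4 ./my_mpi_program</i>
</pre>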

<p>
Use the script <c>tstmachines</c> in <path>/usr/sbin/</path> to ensure that
you can use all of the machines that you have listed. This script performs
an <c>rsh</c> and a short directory listing; this tests that you both have
access to the node and that a program in the current directory is visible on
the remote node. If there are any problems, they will be listed. These
problems must be fixed before proceeding.
</p>

<p>
The only argument to <c>tstmachines</c> is the name of the architecture; this
is the same name as the extension on the machines file. For example, the
following tests that a program in the current directory can be executed by
all of the machines in the LINUX machines list.
</p>
880 880
<pre caption="Running a test">
# <i>/usr/local/mpich/sbin/tstmachines LINUX</i>
</pre>

<note>
This program is silent if all is well; if you want to see what it is doing,
use the <c>-v</c> (verbose) argument:
</note>
889 889
<pre caption="Running a test verbosely">
# <i>/usr/local/mpich/sbin/tstmachines -v LINUX</i>
Trying user program on host1.uoffoo.edu ...
Trying user program on host2.uoffoo.edu ...
</pre>
906 906
<p>
If <c>tstmachines</c> finds a problem, it will suggest possible causes and
solutions. In brief, it runs three tests:
</p>
911 911
<ul>
  <li>
    <e>Can processes be started on remote machines?</e> tstmachines attempts
    to run the shell command <c>true</c> on each machine in the machines file
    by using the remote shell command.
  </li>
  <li>
    <e>Is the current working directory available to all machines?</e> This
    attempts to <c>ls</c> a file that tstmachines creates by running <c>ls</c>
    using the remote shell command.
  </li>
  <li>
    <e>Can user programs be run on remote systems?</e> This checks that shared
    libraries and other components have been properly installed on all
    machines.
  </li>
</ul>
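<p>
The first two of these checks can also be reproduced by hand. The sketch
below is an illustration only, not the actual <c>tstmachines</c> code: the
<c>RSH</c> variable is an assumption (set it to <c>rsh</c> or <c>ssh</c> as
configured on your cluster), and <c>check_node</c> is a name invented here:
</p>

```shell
# Manual version of the first two tstmachines checks (illustrative sketch).
# RSH is an assumption: set it to rsh or ssh as used on your cluster.
RSH="${RSH:-rsh}"

check_node() {
    host=$1
    # Check 1: can a process be started on the remote machine?
    $RSH "$host" true || { echo "$host: cannot start remote processes"; return 1; }
    # Check 2: is the current working directory visible on the remote machine?
    probe=".probe.$$"
    touch "$probe"
    if $RSH "$host" "ls $PWD/$probe" >/dev/null 2>&1; then
        echo "$host: ok"
    else
        echo "$host: $PWD is not shared with this node"
    fi
    rm -f "$probe"
}

# Example: check_node node01
```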
929 929
<p>
Once the machines file checks out, test the installation by compiling and
running one of the example programs shipped with MPICH:
</p>

<pre caption="Compiling and running an example program">
# <i>make hello++</i>
# <i>mpirun -machinefile /usr/share/mpich/machines.LINUX -np 1 hello++</i>
</pre>
940 940
<p>
For further information on MPICH, consult the documentation at <uri
link="http://www-unix.mcs.anl.gov/mpi/mpich/docs/mpichman-chp4/mpichman-chp4.htm">http://www-unix.mcs.anl.gov/mpi/mpich/docs/mpichman-chp4/mpichman-chp4.htm</uri>.
</p>
945 945
</body>
</section>
</chapter>

<chapter>
<title>Bibliography</title>
<section>
<body>
974 974
<p>
The original document is published at the <uri
link="http://www.adelielinux.com">Adelie Linux R&amp;D Centre</uri> web site,
and is reproduced here with the permission of the authors and <uri
link="http://www.cyberlogic.ca">Cyberlogic</uri>'s Adelie Linux R&amp;D
Centre.
</p>
982 982
<ul>
  <li><uri>http://www.gentoo.org</uri>, Gentoo Foundation, Inc.</li>
  <li>
    <uri link="http://www.adelielinux.com">http://www.adelielinux.com</uri>,
    Adelie Linux Research and Development Centre
  </li>
  <li>
    <uri link="http://nfs.sourceforge.net/">http://nfs.sourceforge.net</uri>,
    Linux NFS Project
  </li>
  <li>
    <uri link="http://www-unix.mcs.anl.gov/mpi/mpich/">http://www-unix.mcs.anl.gov/mpi/mpich/</uri>,
    Mathematics and Computer Science Division, Argonne National Laboratory
  </li>
  <li>
    <uri link="http://www.ntp.org/">http://www.ntp.org/</uri>,
    The NTP Project
  </li>
  <li>
    <uri link="http://www.eecis.udel.edu/~mills/">http://www.eecis.udel.edu/~mills/</uri>,
    David L. Mills, University of Delaware
  </li>
  <li>
    <uri link="http://www.ietf.org/html.charters/secsh-charter.html">http://www.ietf.org/html.charters/secsh-charter.html</uri>,
    Secure Shell Working Group, IETF, Internet Society
  </li>
  <li>
    <uri link="http://www.linuxsecurity.com/">http://www.linuxsecurity.com/</uri>,
    Guardian Digital
  </li>
  <li>
    <uri link="http://www.openpbs.org/">http://www.openpbs.org/</uri>,
    Altair Grid Technologies, LLC.
  </li>
</ul>

</body>
</section>
</chapter>
</guide>
