<?xml version='1.0' encoding="UTF-8"?>
<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/hpc-howto.xml,v 1.6 2005/10/04 22:39:47 rane Exp $ -->
<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">

<guide link="/doc/en/hpc-howto.xml">
<title>High Performance Computing on Gentoo Linux</title>

<author title="Author">
  <mail link="marc@adelielinux.com">Marc St-Pierre</mail>
</author>
<author title="Author">
  <mail link="benoit@adelielinux.com">Benoit Morin</mail>
</author>
<author title="Assistant/Research">
  <mail link="jean-francois@adelielinux.com">Jean-Francois Richard</mail>
</author>
<author title="Assistant/Research">
<p>
A cluster is composed of two node types: master and slave. Typically, your
cluster will have one master node and several slave nodes.
</p>

<p>
The master node is the cluster's server. It is responsible for telling the
slave nodes what to do. This server will typically run daemons such as dhcpd,
nfs, pbs-server, and pbs-sched. Your master node will allow interactive
sessions for users and accept job submissions.
</p>

<p>
The slave nodes listen for instructions (via ssh/rsh perhaps) from the master
node. They should be dedicated to crunching results and therefore should not
run any unnecessary services.
</p>

<p>
The rest of this documentation will assume a cluster configuration as per the
hosts file below. You should maintain on every node a hosts file
(<path>/etc/hosts</path>) with entries for each node participating in the
cluster.
</p>

<pre caption="/etc/hosts">
# Adelie Linux Research &amp; Development Center
# /etc/hosts

127.0.0.1       localhost

192.168.1.100   master.adelie master

192.168.1.1     node01.adelie node01
192.168.1.2     node02.adelie node02
</pre>

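As the cluster grows, the numbered node entries can be generated rather than typed by hand. A minimal sketch, assuming the same 192.168.1.x / node0x.adelie naming scheme used above:

```shell
# Print /etc/hosts entries for slave nodes 1..N
# (N=2 matches the example cluster; raise it for more nodes)
N=2
for i in $(seq 1 "$N"); do
    printf '192.168.1.%d\tnode%02d.adelie node%02d\n' "$i" "$i" "$i"
done
```

Redirect the output into the hosts file on each node (or distribute one copy with scp) so every node sees the same mapping.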
<p>
To set up your cluster's dedicated LAN, edit your <path>/etc/conf.d/net</path>
file on the master node.
</p>

<pre caption="/etc/conf.d/net">
# Copyright 1999-2002 Gentoo Technologies, Inc.
# Distributed under the terms of the GNU General Public License, v2 or later

# Global config file for net.* rc-scripts

# This is basically the ifconfig argument without the ifconfig $iface
#

<pre caption="/etc/dhcp/dhcpd.conf">
# Adelie Linux Research &amp; Development Center
# /etc/dhcp/dhcpd.conf

log-facility local7;
ddns-update-style none;
use-host-decl-names on;

subnet 192.168.1.0 netmask 255.255.255.0 {
  option domain-name "adelie";
  range 192.168.1.10 192.168.1.99;
  option routers 192.168.1.100;

  host node01.adelie {
    # MAC address of network card on node 01
    hardware ethernet 00:07:e9:0f:e2:d4;
    fixed-address 192.168.1.1;
  }
  host node02.adelie {
    # MAC address of network card on node 02
    hardware ethernet 00:07:e9:0f:e2:6b;
    fixed-address 192.168.1.2;
  }
}
</pre>

</body>
</section>
<section>
<title>NFS/NIS</title>
<body>

<p>
The Network File System (NFS) was developed to allow machines to mount a disk
partition on a remote machine as if it were on a local hard drive. This allows
CONFIG_LOCKD_V4=y
</pre>

<p>
On the master node, edit your <path>/etc/hosts.allow</path> file to allow
connections from slave nodes. If your cluster LAN is on 192.168.1.0/24,
your <path>hosts.allow</path> will look like:
</p>

<pre caption="hosts.allow">
portmap:192.168.1.0/255.255.255.0
</pre>

<p>
Edit the <path>/etc/exports</path> file of the master node to export a work
directory structure (/home is good for this).
</p>

<pre caption="/etc/exports">
/home/  *(rw)
</pre>

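If the NFS server is already running when you change <path>/etc/exports</path>, it will not pick the edit up by itself. A quick sketch of the usual reload, assuming the standard nfs-utils <c>exportfs</c> tool is installed:

```shell
# Re-read /etc/exports and re-export all entries without restarting NFS
exportfs -ra

# List what is currently exported, with options, to verify
exportfs -v
```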
<p>
Add nfs to your master node's default runlevel:
</p>

<pre caption="Adding NFS to the default runlevel">
# <i>rc-update add nfs default</i>
</pre>

<p>
To mount the nfs exported filesystem from the master, you also have to
configure your slave nodes' <path>/etc/fstab</path>. Add a line like this
one:
</p>

<pre caption="/etc/fstab">
master:/home/  /home  nfs  rw,exec,noauto,nouser,async  0 0
</pre>

<p>
You'll also need to set up your nodes so that they mount the nfs filesystem by
issuing this command:
</p>

<pre caption="Adding nfsmount to the default runlevel">
# <i>rc-update add nfsmount default</i>
</pre>

</body>
</section>
<section>
<title>RSH/SSH</title>
systems, and the private key which is kept on the local system, is done first
to configure OpenSSH on the cluster.
</p>

<p>
For transparent cluster usage, private/public keys may be used. This process
has two steps:
</p>

<ul>
  <li>Generate public and private keys</li>
  <li>Copy public key to slave nodes</li>
</ul>

<p>
For user-based authentication, generate and copy as follows:
</p>

<pre caption="SSH key authentication">
# <i>ssh-keygen -t dsa</i>
Generating public/private dsa key pair.
Enter file in which to save the key (/root/.ssh/id_dsa): /root/.ssh/id_dsa
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
f1:45:15:40:fd:3c:2d:f7:9f:ea:55:df:76:2f:a4:1f root@master

<comment>WARNING! If you already have an "authorized_keys" file,
please append to it, do not use the following command.</comment>

# <i>scp /root/.ssh/id_dsa.pub node01:/root/.ssh/authorized_keys</i>
root@master's password:
id_dsa.pub   100%  234   2.0MB/s   00:00

# <i>scp /root/.ssh/id_dsa.pub node02:/root/.ssh/authorized_keys</i>
root@master's password:
id_dsa.pub   100%  234   2.0MB/s   00:00
</pre>

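The warning above matters because <c>scp</c> replaces the remote file wholesale. The safe pattern is to append. A sketch, demonstrated locally on throwaway files with placeholder key text:

```shell
# Append a public key instead of overwriting authorized_keys.
# (Placeholder keys and a temp directory stand in for the real files;
# on the cluster the same append runs on the remote side, e.g.
#   cat /root/.ssh/id_dsa.pub | ssh node01 'cat >> /root/.ssh/authorized_keys')
dir=$(mktemp -d)
echo "ssh-dss AAAA...existing root@node01" > "$dir/authorized_keys"
echo "ssh-dss AAAA...newkey root@master"   > "$dir/id_dsa.pub"
cat "$dir/id_dsa.pub" >> "$dir/authorized_keys"   # append keeps the old key
wc -l < "$dir/authorized_keys"                    # both keys are now present
```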
<note>
Host keys must have an empty passphrase. RSA is required for host-based
authentication.
</note>

<p>
For host-based authentication, you will also need to edit your
<path>/etc/ssh/shosts.equiv</path>.
</p>

<pre caption="/etc/ssh/shosts.equiv">
node01.adelie
node02.adelie
master.adelie
</pre>

<p>
And a few modifications to the <path>/etc/ssh/sshd_config</path> file:
in.rshd:192.168.1.0/255.255.255.0
</pre>

<p>
Or you can simply trust your cluster LAN:
</p>

<pre caption="hosts.allow">
# Adelie Linux Research &amp; Development Center
# /etc/hosts.allow

ALL:192.168.1.0/255.255.255.0
</pre>

<p>
Finally, configure host authentication in <path>/etc/hosts.equiv</path>.
</p>

<pre caption="hosts.equiv">
# Adelie Linux Research &amp; Development Center
# /etc/hosts.equiv

master
node01
node02
</pre>

<p>
And add xinetd to your default runlevel:
</p>

# - read each of the comments above each of the variable

# Comment this out if you dont want the init script to warn
# about not having ntpdate setup
NTPDATE_WARN="n"

# Command to run to set the clock initially
# Most people should just uncomment this line ...
# however, if you know what you're doing, and you
# want to use ntpd to set the clock, change this to 'ntpd'
NTPDATE_CMD="ntpdate"

# Options to pass to the above command
# Most people should just uncomment this variable and
# change 'someserver' to a valid hostname which you
# can acquire from the URL's below
NTPDATE_OPTS="-b ntp1.cmc.ec.gc.ca"

##
# A list of available servers is available here:
# http://www.eecis.udel.edu/~mills/ntp/servers.html
# Please follow the rules of engagement and use a
# Stratum 2 server (unless you qualify for Stratum 1)
##

# Options to pass to the ntpd process that will *always* be run
# Most people should not uncomment this line ...
# however, if you know what you're doing, feel free to tweak
#NTPD_OPTS=""

</pre>
CONFIG_IP_NF_MATCH_STATE=y
CONFIG_IP_NF_FILTER=y
CONFIG_IP_NF_TARGET_REJECT=y
CONFIG_IP_NF_NAT=y
CONFIG_IP_NF_NAT_NEEDED=y
CONFIG_IP_NF_TARGET_MASQUERADE=y
CONFIG_IP_NF_TARGET_LOG=y
</pre>

<p>
And the rules required for this firewall:
</p>

<pre caption="rule-save">
# Adelie Linux Research &amp; Development Center
# /var/lib/iptables/rule-save

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
-A INPUT -s 192.168.1.0/255.255.255.0 -i eth1 -j ACCEPT
-A INPUT -s 127.0.0.1 -i lo -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -j LOG
-A INPUT -j REJECT --reject-with icmp-port-unreachable
COMMIT
*nat
:PREROUTING ACCEPT [0:0]
<pre caption="Installing openpbs">
# <i>emerge -p openpbs</i>
</pre>

<note>
The OpenPBS ebuild does not currently set proper permissions on the var
directories used by OpenPBS.
</note>

<p>
Before you start using OpenPBS, some configuration is required. The files
you will need to personalize for your system are:
</p>

<ul>
  <li>/etc/pbs_environment</li>
  <li>/var/spool/PBS/server_name</li>
  <li>/var/spool/PBS/server_priv/nodes</li>
  <li>/var/spool/PBS/mom_priv/config</li>
  <li>/var/spool/PBS/sched_priv/sched_config</li>
</ul>

<p>
Here is a sample sched_config:
</p>

<pre caption="/var/spool/PBS/sched_priv/sched_config">
#
# Create queues and set their attributes.
#
#
# Create and define queue upto4nodes
#
create queue upto4nodes
set queue upto4nodes queue_type = Execution
#
set server scheduling = True
set server acl_host_enable = True
set server default_queue = default
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_default.neednodes = 1
set server resources_default.nodect = 1
set server resources_default.nodes = 1
set server scheduler_iteration = 60
</pre>

<p>
To submit a task to OpenPBS, the command <c>qsub</c> is used with some
optional parameters. In the example below, "-l" allows you to specify
the resources required, "-j" provides for redirection of standard output and
standard error, and the "-m" will e-mail the user at beginning (b), end (e)
and on abort (a) of the job.
</p>

<pre caption="Submitting a task">
<comment>(submit and request from OpenPBS that myscript be executed on 2 nodes)</comment>
# <i>qsub -l nodes=2 -j oe -m abe myscript</i>
</pre>

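The same options can also live inside the job script itself as #PBS directive lines, so a plain qsub myscript picks them up. A minimal sketch (myscript is the hypothetical script from above; PBS_O_WORKDIR is an environment variable PBS sets to the directory qsub was run from):

```shell
#!/bin/bash
# myscript: a minimal PBS job script (hypothetical example).
# The #PBS lines mirror the qsub flags shown above:
#   -l nodes=2   request two nodes
#   -j oe        merge standard output and standard error
#   -m abe       mail on abort, begin and end
#PBS -l nodes=2
#PBS -j oe
#PBS -m abe

# PBS sets PBS_O_WORKDIR to the submission directory
cd "${PBS_O_WORKDIR:-.}"
echo "job started in $(pwd)"
```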
<p>
Normally jobs submitted to OpenPBS are in the form of scripts. Sometimes, you
may want to try a task manually. To request an interactive shell from OpenPBS,
use the "-I" parameter.
</p>

<pre caption="Requesting an interactive shell">
installed, while <e>crypt</e> will configure MPICH to use <c>ssh</c> instead
of <c>rsh</c>.
</p>

<pre caption="Installing the mpich application">
# <i>emerge -p mpich</i>
# <i>emerge mpich</i>
</pre>

<p>
You may need to export an mpich work directory to all your slave nodes in
<path>/etc/exports</path>:
</p>

<pre caption="/etc/exports">
/home  *(rw)
</pre>

<p>
Most massively parallel processors (MPPs) provide a way to start a program on
a requested number of processors; <c>mpirun</c> makes use of the appropriate
command whenever possible. In contrast, workstation clusters require that each
process in a parallel job be started individually, though programs to help
start these processes exist. Because workstation clusters are not already
organized as an MPP, additional information is required to make use of them.
Mpich should be installed with a list of participating workstations in the
file <path>machines.LINUX</path> in the directory
<path>/usr/share/mpich/</path>. This file is used by <c>mpirun</c> to choose
processors to run on.
</p>

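For the example cluster above, the machines file would simply list the participating hosts, one per line. A sketch, with host names taken from the /etc/hosts file earlier in this guide:

```shell
# /usr/share/mpich/machines.LINUX
# One participating workstation per line; mpirun picks hosts from
# this list (cycling through it again if -np exceeds its length)
node01.adelie
node02.adelie
```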
<pre caption="Output of the above command">
Trying true on host1.uoffoo.edu ...
Trying true on host2.uoffoo.edu ...
Trying ls on host1.uoffoo.edu ...
Trying ls on host2.uoffoo.edu ...
Trying user program on host1.uoffoo.edu ...
Trying user program on host2.uoffoo.edu ...
</pre>

<p>
If <c>tstmachines</c> finds a problem, it will suggest possible reasons and
solutions. In brief, there are three tests:
</p>

<ul>
  <li>
    <e>Can processes be started on remote machines?</e> tstmachines attempts
    to run the shell command true on each machine in the machines file by
    using the remote shell command.
  </li>
  <li>
    <e>Is the current working directory available to all machines?</e> This
    attempts to ls a file that tstmachines creates by running ls using the
    remote shell command.
  </li>
  <li>
    <e>Can user programs be run on remote systems?</e> This checks that shared
    libraries and other components have been properly installed on all
    machines.
  </li>
</ul>

<p>
And the required test for every development tool:
</p>

<pre caption="Testing a development tool">
# <i>cd ~</i>
# <i>cp /usr/share/mpich/examples1/hello++.c ~</i>
# <i>make hello++</i>
# <i>mpirun -machinefile /usr/share/mpich/machines.LINUX -np 1 hello++</i>

<chapter>
<title>Bibliography</title>
<section>
<body>

<p>
The original document is published at the <uri
link="http://www.adelielinux.com">Adelie Linux R&amp;D Centre</uri> web site,
and is reproduced here with the permission of the authors and <uri
link="http://www.cyberlogic.ca">Cyberlogic</uri>'s Adelie Linux R&amp;D
Centre.
</p>

<ul>
  <li><uri>http://www.gentoo.org</uri>, Gentoo Technologies, Inc.</li>
  <li>
    <uri link="http://www.adelielinux.com">http://www.adelielinux.com</uri>,
    Adelie Linux Research and Development Centre
  </li>
  <li>
    <uri link="http://nfs.sourceforge.net/">http://nfs.sourceforge.net</uri>,
    Linux NFS Project
  </li>
  <li>
    <uri link="http://www-unix.mcs.anl.gov/mpi/mpich/">http://www-unix.mcs.anl.gov/mpi/mpich/</uri>,
    Mathematics and Computer Science Division, Argonne National Laboratory
  </li>
  <li>
    <uri link="http://www.ntp.org/">http://ntp.org</uri>
  </li>
  <li>
    <uri link="http://www.eecis.udel.edu/~mills/">http://www.eecis.udel.edu/~mills/</uri>,
    David L. Mills, University of Delaware
  </li>
  <li>
    <uri link="http://www.ietf.org/html.charters/secsh-charter.html">http://www.ietf.org/html.charters/secsh-charter.html</uri>,
    Secure Shell Working Group, IETF, Internet Society
  </li>
  <li>
    <uri link="http://www.linuxsecurity.com/">http://www.linuxsecurity.com/</uri>,
    Guardian Digital
  </li>
  <li>
    <uri link="http://www.openpbs.org/">http://www.openpbs.org/</uri>,
    Altair Grid Technologies, LLC.
  </li>
</ul>

</body>
</section>
</chapter>

</guide>
