
Diff of /xml/htdocs/doc/en/hpc-howto.xml


Revision 1.5 vs. Revision 1.6
<?xml version='1.0' encoding="UTF-8"?>
<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/hpc-howto.xml,v 1.6 2005/10/04 22:39:47 rane Exp $ -->
<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">

<guide link="/doc/en/hpc-howto.xml">
<title>High Performance Computing on Gentoo Linux</title>

</p>

<p>
The slave nodes listen for instructions from the master node (for example, via
ssh or rsh). They should be dedicated to crunching results and therefore should
not run any unnecessary services.
</p>
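
<p>
One way to keep a slave node lean is to use <c>rc-update</c>. Review the
services in the node's default runlevel and remove the ones it does not need.
The service removed below, <c>cupsd</c>, is only a hypothetical example. Check
the output of the first command before removing anything.
</p>

<pre caption="Reviewing services on a slave node">
<comment>(list the services started in the default runlevel)</comment>
# <i>rc-update show default</i>
<comment>(hypothetical example: drop a printing service the node does not need)</comment>
# <i>rc-update del cupsd default</i>
</pre>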

<p>
The rest of this documentation assumes the cluster configuration described by
the hosts file below. You should maintain on every node such a hosts file

<pre caption="/etc/hosts">
# Adelie Linux Research &amp; Development Center
# /etc/hosts

127.0.0.1       localhost

192.168.1.100   master.adelie master

192.168.1.1     node01.adelie node01
192.168.1.2     node02.adelie node02
</pre>
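
<p>
Once the hosts file is in place, you can check name resolution from any node.
<c>getent</c> resolves names through the same Name Service Switch
configuration as other programs. Each name should come back with the address
listed in <path>/etc/hosts</path>:
</p>

<pre caption="Checking name resolution">
# <i>getent hosts master node01 node02</i>
</pre>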

<p>
To set up the LAN dedicated to your cluster, edit the
<path>/etc/conf.d/net</path> file on the master node.
  option domain-name "adelie";
  range 192.168.1.10 192.168.1.99;
  option routers 192.168.1.100;

  host node01.adelie {
    # MAC address of network card on node 01
    hardware ethernet 00:07:e9:0f:e2:d4;
    fixed-address 192.168.1.1;
  }
  host node02.adelie {
    # MAC address of network card on node 02
    hardware ethernet 00:07:e9:0f:e2:6b;
    fixed-address 192.168.1.2;
  }
}
</pre>
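
<p>
Before (re)starting the DHCP server, the ISC <c>dhcpd</c> daemon can check the
configuration for syntax errors without serving any leases. The path below is
an assumption. Point <c>-cf</c> at wherever the configuration above is stored
on your master node.
</p>

<pre caption="Testing the DHCP configuration">
<comment>(-t only parses the file; the path is assumed)</comment>
# <i>dhcpd -t -cf /etc/dhcp/dhcpd.conf</i>
</pre>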
portmap:192.168.1.0/255.255.255.0
</pre>

<p>
Edit the <path>/etc/exports</path> file on the master node to export a work
directory structure (<path>/home</path> is a good choice).
</p>

<pre caption="/etc/exports">
/home/ *(rw)
</pre>
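
<p>
The <c>*</c> wildcard exports <path>/home</path> to any host that can reach the
master. The first part of the block below is a more restrictive variant. It
assumes that only the 192.168.1.0/255.255.255.0 cluster subnet used elsewhere
in this guide needs access. Once the NFS server is running, <c>exportfs -ra</c>
re-reads <path>/etc/exports</path> after edits. Running <c>showmount -e</c>
from a slave node lists what the master exports.
</p>

<pre caption="Restricting and checking the export">
<comment>(/etc/exports on the master: export only to the cluster subnet)</comment>
/home/ 192.168.1.0/255.255.255.0(rw)

<comment>(on the master, after editing /etc/exports)</comment>
# <i>exportfs -ra</i>
<comment>(on a slave node)</comment>
# <i>showmount -e master</i>
</pre>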

<p>
Add nfs to your master node's default runlevel:
</p>
configure your slave nodes' <path>/etc/fstab</path>. Add a line like this
one:
</p>

<pre caption="/etc/fstab">
master:/home/   /home   nfs     rw,exec,noauto,nouser,async     0 0
</pre>

<p>
You'll also need to set up your nodes so that they mount the NFS filesystem by
issuing this command:
  <li>Generate public and private keys</li>
  <li>Copy public key to slave nodes</li>
</ul>

<p>
For user-based authentication, generate and copy the keys as follows:
</p>

<pre caption="SSH key authentication">
# <i>ssh-keygen -t dsa</i>
Generating public/private dsa key pair.
id_dsa.pub 100% 234 2.0MB/s 00:00
</pre>
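
<p>
Once the public key has been copied, logins from the master should no longer
prompt for a password. A quick check, assuming the node names from the hosts
file above, is to run a trivial command on each node. The OpenSSH
<c>BatchMode</c> option makes <c>ssh</c> fail instead of prompting, so any
node printed below still needs attention:
</p>

<pre caption="Checking key-based logins">
# <i>for n in node01 node02; do ssh -o BatchMode=yes $n true || echo "$n: key login failed"; done</i>
</pre>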

<note>
Host keys must have an empty passphrase. RSA is required for host-based
authentication.
</note>

<p>
For host-based authentication, you will also need to edit your
<path>/etc/ssh/shosts.equiv</path>.

ALL:192.168.1.0/255.255.255.0
</pre>

<p>
Finally, configure host authentication in <path>/etc/hosts.equiv</path>.
</p>

<pre caption="hosts.equiv">
# Adelie Linux Research &amp; Development Center
# /etc/hosts.equiv
NTPDATE_CMD="ntpdate"

# Options to pass to the above command
# Most people should just uncomment this variable and
# change 'someserver' to a valid hostname which you
# can acquire from the URLs below
NTPDATE_OPTS="-b ntp1.cmc.ec.gc.ca"

##
# A list of available servers is available here:
# http://www.eecis.udel.edu/~mills/ntp/servers.html
Here are the rules required for this firewall:
</p>

<pre caption="rule-save">
# Adelie Linux Research &amp; Development Center
# /var/lib/iptables/rule-save

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
Before you start using OpenPBS, some configuration is required. The files you
will need to customize for your system are:
</p>

<ul>
  <li>/etc/pbs_environment</li>
  <li>/var/spool/PBS/server_name</li>
  <li>/var/spool/PBS/server_priv/nodes</li>
  <li>/var/spool/PBS/mom_priv/config</li>
  <li>/var/spool/PBS/sched_priv/sched_config</li>
</ul>
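
<p>
For the example cluster in this guide, the server name and node list could
look like this sketch. It assumes the PBS server runs on <c>master</c> and that
each slave node contributes a single processor. An optional <c>np=</c> value on
a node line declares more processors for that node.
</p>

<pre caption="server_name and server_priv/nodes (sketch)">
<comment>(/var/spool/PBS/server_name)</comment>
master

<comment>(/var/spool/PBS/server_priv/nodes)</comment>
node01
node02
</pre>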

<p>
Here is a sample <path>sched_config</path>:
</p>
set server scheduler_iteration = 60
</pre>
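
<p>
Lines such as <c>set server scheduler_iteration = 60</c> use <c>qmgr</c>
syntax. Assuming these settings have been applied through <c>qmgr</c>, the
server's current configuration, including <c>scheduler_iteration</c>, can be
reviewed with:
</p>

<pre caption="Reviewing the server configuration">
# <i>qmgr -c "print server"</i>
</pre>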

<p>
To submit a task to OpenPBS, use the <c>qsub</c> command with optional
parameters. In the example below, <c>-l</c> specifies the resources required
and <c>-j</c> merges standard output and standard error into a single stream.
<c>-m</c> sends e-mail to the user when the job begins (b), ends (e), or is
aborted (a).
</p>
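
<p>
You can also write the options described above into the job script itself as
<c>#PBS</c> directive lines. <c>qsub</c> reads these before the first command
in the script. The script below is a hypothetical <c>myscript</c>, and the
<c>oe</c> argument to <c>-j</c> is an assumed choice. Only the directives
mirror the options above. The body changes to the directory <c>qsub</c> was run
from and prints the nodes PBS allocated.
</p>

<pre caption="myscript (hypothetical job script)">
#!/bin/sh
# request 2 nodes, merge stderr into stdout, mail on abort/begin/end
#PBS -l nodes=2
#PBS -j oe
#PBS -m abe

# PBS starts jobs in the home directory; return to where qsub was run
cd "$PBS_O_WORKDIR"
# PBS_NODEFILE names a file listing the nodes assigned to this job
cat "$PBS_NODEFILE"
</pre>

<p>
With the directives in the file, <c>qsub myscript</c> submits it without extra
command-line options. Options given on the command line take precedence over
the directives.
</p>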

<pre caption="Submitting a task">
<comment>(submit and request from OpenPBS that myscript be executed on 2 nodes)</comment>
You may need to export an MPICH work directory to all your slave nodes in
<path>/etc/exports</path>:
</p>

<pre caption="/etc/exports">
/home *(rw)
</pre>

<p>
Most massively parallel processors (MPPs) provide a way to start a program on
a requested number of processors; <c>mpirun</c> makes use of the appropriate
If <c>tstmachines</c> finds a problem, it will suggest possible reasons and
solutions. In brief, there are three tests:
</p>

<ul>
  <li>
    <e>Can processes be started on remote machines?</e> tstmachines attempts
    to run the shell command <c>true</c> on each machine in the machines file
    by using the remote shell command.
  </li>
  <li>
    <e>Is the current working directory available to all machines?</e> This
    attempts to <c>ls</c> a file that tstmachines creates, running <c>ls</c>
    through the remote shell command.
  </li>
  <li>
    <e>Can user programs be run on remote systems?</e> This checks that shared
    libraries and other components have been properly installed on all
    machines.
  </li>
</ul>
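
<p>
When <c>tstmachines</c> reports a failure, you can repeat the first two tests
by hand against a single node to see the underlying error. This sketch assumes
<c>ssh</c> is the remote shell and uses <c>node01</c> from the hosts file. The
file name <path>tst-probe</path> is hypothetical. The local shell expands
<c>$PWD</c>, so the remote <c>ls</c> receives the absolute path of the current
directory.
</p>

<pre caption="Repeating the tstmachines checks by hand">
<comment>(test 1: can a process be started remotely?)</comment>
$ <i>ssh node01 true</i>
<comment>(test 2: is the current directory visible on the node?)</comment>
$ <i>touch $PWD/tst-probe</i>
$ <i>ssh node01 ls $PWD/tst-probe</i>
$ <i>rm $PWD/tst-probe</i>
</pre>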
link="http://www.cyberlogic.ca">Cyberlogic</uri>'s Adelie Linux R&amp;D
Centre.
</p>

<ul>
  <li><uri>http://www.gentoo.org</uri>, Gentoo Technologies, Inc.</li>
  <li>
    <uri link="http://www.adelielinux.com">http://www.adelielinux.com</uri>,
    Adelie Linux Research and Development Centre
  </li>
  <li>
    <uri link="http://nfs.sourceforge.net/">http://nfs.sourceforge.net</uri>,
    Linux NFS Project
  </li>
  <li>
    <uri link="http://www-unix.mcs.anl.gov/mpi/mpich/">http://www-unix.mcs.anl.gov/mpi/mpich/</uri>,
    Mathematics and Computer Science Division, Argonne National Laboratory
  </li>
  <li>
    <uri link="http://www.ntp.org/">http://ntp.org</uri>
  </li>
  <li>
    <uri link="http://www.eecis.udel.edu/~mills/">http://www.eecis.udel.edu/~mills/</uri>,
    David L. Mills, University of Delaware
  </li>
  <li>
    <uri link="http://www.ietf.org/html.charters/secsh-charter.html">http://www.ietf.org/html.charters/secsh-charter.html</uri>,
    Secure Shell Working Group, IETF, Internet Society
  </li>
  <li>
    <uri link="http://www.linuxsecurity.com/">http://www.linuxsecurity.com/</uri>,
    Guardian Digital
  </li>
  <li>
    <uri link="http://www.openpbs.org/">http://www.openpbs.org/</uri>,
    Altair Grid Technologies, LLC.
  </li>
</ul>

