
Diff of /xml/htdocs/doc/en/hpc-howto.xml

Revision 1.5 (left) vs. Revision 1.13 (right)
1<?xml version='1.0' encoding="UTF-8"?> 1<?xml version='1.0' encoding="UTF-8"?>
2<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/hpc-howto.xml,v 1.5 2005/10/04 19:05:41 rane Exp $ --> 2<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/hpc-howto.xml,v 1.13 2006/12/18 21:47:19 nightmorph Exp $ -->
3<!DOCTYPE guide SYSTEM "/dtd/guide.dtd"> 3<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">
4 4
5<guide link="/doc/en/hpc-howto.xml"> 5<guide link="/doc/en/hpc-howto.xml">
6<title>High Performance Computing on Gentoo Linux</title> 6<title>High Performance Computing on Gentoo Linux</title>
7 7
16</author> 16</author>
17<author title="Assistant/Research"> 17<author title="Assistant/Research">
18 <mail link="olivier@adelielinux.com">Olivier Crete</mail> 18 <mail link="olivier@adelielinux.com">Olivier Crete</mail>
19</author> 19</author>
20<author title="Reviewer"> 20<author title="Reviewer">
21 <mail link="spyderous@gentoo.org">Donnie Berkholz</mail> 21 <mail link="dberkholz@gentoo.org">Donnie Berkholz</mail>
22</author> 22</author>
23 23
24<!-- No licensing information; this document has been written by a third-party 24<!-- No licensing information; this document has been written by a third-party
25 organisation without additional licensing information. 25 organisation without additional licensing information.
26 26
30--> 30-->
31 31
32<abstract> 32<abstract>
33This document was written by people at the Adelie Linux R&amp;D Center 33This document was written by people at the Adelie Linux R&amp;D Center
34&lt;http://www.adelielinux.com&gt; as a step-by-step guide to turn a Gentoo 34&lt;http://www.adelielinux.com&gt; as a step-by-step guide to turn a Gentoo
35System into an High Performance Computing (HPC) system. 35System into a High Performance Computing (HPC) system.
36</abstract> 36</abstract>
37 37
38<version>1.2</version> 38<version>1.6</version>
39<date>2003-08-01</date> 39<date>2006-12-18</date>
40 40
41<chapter> 41<chapter>
42<title>Introduction</title> 42<title>Introduction</title>
43<section> 43<section>
44<body> 44<body>
83We refer to the <uri link="/doc/en/handbook/">Gentoo Linux Handbooks</uri> in 83We refer to the <uri link="/doc/en/handbook/">Gentoo Linux Handbooks</uri> in
84this section. 84this section.
85</note> 85</note>
86 86
87<p> 87<p>
88During the installation process, you will have to set your USE variables in 88During the installation process, you will have to set your USE variables in
89<path>/etc/make.conf</path>. We recommended that you deactivate all the 89<path>/etc/make.conf</path>. We recommended that you deactivate all the
90defaults (see <path>/etc/make.profile/make.defaults</path>) by negating them 90defaults (see <path>/etc/make.profile/make.defaults</path>) by negating them in
91in make.conf. However, you may want to keep such use variables as x86, 3dnow, 91make.conf. However, you may want to keep such use variables as x86, 3dnow, gpm,
92gpm, mmx, sse, ncurses, pam and tcpd. Refer to the USE documentation for more 92mmx, nptl, nptlonly, sse, ncurses, pam and tcpd. Refer to the USE documentation
93information. 93for more information.
94</p> 94</p>
95 95
96<pre caption="USE Flags"> 96<pre caption="USE Flags">
97USE="-oss 3dnow -apm -arts -avi -berkdb -crypt -cups -encode -gdbm 97USE="-oss 3dnow -apm -arts -avi -berkdb -crypt -cups -encode -gdbm -gif gpm -gtk
98-gif gpm -gtk -imlib -java -jpeg -kde -gnome -libg++ -libwww -mikmod 98-imlib -java -jpeg -kde -gnome -libg++ -libwww -mikmod mmx -motif -mpeg ncurses
99mmx -motif -mpeg ncurses -nls -oggvorbis -opengl pam -pdflib -png 99-nls nptl nptlonly -oggvorbis -opengl pam -pdflib -png -python -qt3 -qt4 -qtmt
100-python -qt -qtmt -quicktime -readline -sdl -slang -spell -ssl 100-quicktime -readline -sdl -slang -spell -ssl -svga tcpd -truetype -X -xml2 -xv
101-svga tcpd -truetype -X -xml2 -xmms -xv -zlib" 101-zlib"
102</pre> 102</pre>
103 103
104<p> 104<p>
105Or simply: 105Or simply:
106</p> 106</p>
160</p> 160</p>
161 161
162<p> 162<p>
163The slave nodes listen for instructions (via ssh/rsh perhaps) from the master 163The slave nodes listen for instructions (via ssh/rsh perhaps) from the master
164node. They should be dedicated to crunching results and therefore should not 164node. They should be dedicated to crunching results and therefore should not
165run any unecessary services. 165run any unnecessary services.
166</p> 166</p>
167 167
168<p> 168<p>
169The rest of this documentation will assume a cluster configuration as per the 169The rest of this documentation will assume a cluster configuration as per the
170hosts file below. You should maintain on every node such a hosts file 170hosts file below. You should maintain on every node such a hosts file
174 174
175<pre caption="/etc/hosts"> 175<pre caption="/etc/hosts">
176# Adelie Linux Research &amp; Development Center 176# Adelie Linux Research &amp; Development Center
177# /etc/hosts 177# /etc/hosts
178 178
179127.0.0.1 localhost 179127.0.0.1 localhost
180 180
181192.168.1.100 master.adelie master 181192.168.1.100 master.adelie master
182 182
183192.168.1.1 node01.adelie node01 183192.168.1.1 node01.adelie node01
184192.168.1.2 node02.adelie node02 184192.168.1.2 node02.adelie node02
185</pre> 185</pre>
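The same /etc/hosts file has to be present on every node. The hypothetical bash function gen_cluster_hosts below prints such a file for a given number of slave nodes. It is a minimal sketch and assumes the addressing scheme used in this guide: the master at 192.168.1.100, slave node N at 192.168.1.N, and the domain adelie. It refuses more than 9 nodes because the dhcpd.conf in this guide hands out dynamic leases starting at 192.168.1.10.

```shell
# gen_cluster_hosts COUNT: hypothetical helper that prints an /etc/hosts
# file following this guide's addressing scheme.
gen_cluster_hosts() {
  local count=${1:-0} i name
  # Stay below 192.168.1.10, where this guide's dynamic DHCP range starts.
  if (( count < 1 || count > 9 )); then
    echo "gen_cluster_hosts: node count must be between 1 and 9" >&2
    return 1
  fi
  printf '127.0.0.1\tlocalhost\n\n'
  printf '192.168.1.100\tmaster.adelie master\n\n'
  for (( i = 1; i <= count; i++ )); do
    printf -v name 'node%02d' "$i"          # node01, node02, ...
    printf '192.168.1.%d\t%s.adelie %s\n' "$i" "$name" "$name"
  done
}
```

For example, `gen_cluster_hosts 2 > /etc/hosts` reproduces the entries shown above, separated by tabs rather than spaces. Copy the result to each node.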
186 186
187<p> 187<p>
188To setup your cluster dedicated LAN, edit your <path>/etc/conf.d/net</path> 188To setup your cluster dedicated LAN, edit your <path>/etc/conf.d/net</path>
189file on the master node. 189file on the master node.
190</p> 190</p>
191 191
192<pre caption="/etc/conf.d/net"> 192<pre caption="/etc/conf.d/net">
193# Copyright 1999-2002 Gentoo Technologies, Inc.
194# Distributed under the terms of the GNU General Public License, v2 or later
195
196# Global config file for net.* rc-scripts 193# Global config file for net.* rc-scripts
197 194
198# This is basically the ifconfig argument without the ifconfig $iface 195# This is basically the ifconfig argument without the ifconfig $iface
199# 196#
200 197
221 option domain-name "adelie"; 218 option domain-name "adelie";
222 range 192.168.1.10 192.168.1.99; 219 range 192.168.1.10 192.168.1.99;
223 option routers 192.168.1.100; 220 option routers 192.168.1.100;
224 221
225 host node01.adelie { 222 host node01.adelie {
226 # MAC address of network card on node 01 223 # MAC address of network card on node 01
227 hardware ethernet 00:07:e9:0f:e2:d4; 224 hardware ethernet 00:07:e9:0f:e2:d4;
228 fixed-address 192.168.1.1; 225 fixed-address 192.168.1.1;
229 } 226 }
230 host node02.adelie { 227 host node02.adelie {
231 # MAC address of network card on node 02 228 # MAC address of network card on node 02
232 hardware ethernet 00:07:e9:0f:e2:6b; 229 hardware ethernet 00:07:e9:0f:e2:6b;
233 fixed-address 192.168.1.2; 230 fixed-address 192.168.1.2;
234 } 231 }
235} 232}
236</pre> 233</pre>
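Each slave node needs its own host block in dhcpd.conf, like the node01 and node02 entries above. The hypothetical helper dhcp_host_block below prints such a block from a node name, a MAC address and a fixed IP, and rejects malformed MAC addresses first. It is a sketch only: the output format simply mirrors the entries in this guide.

```shell
# dhcp_host_block NAME MAC IP: print an ISC dhcpd "host" declaration
# shaped like the node01/node02 entries above (hypothetical helper).
dhcp_host_block() {
  local name=${1:-} mac=${2:-} ip=${3:-}
  # Six colon-separated pairs of hex digits, e.g. 00:07:e9:0f:e2:d4
  local re='^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
  if [[ ! $mac =~ $re ]]; then
    echo "dhcp_host_block: invalid MAC address '$mac'" >&2
    return 1
  fi
  printf '  host %s.adelie {\n' "$name"
  printf '    # MAC address of network card on %s\n' "$name"
  printf '    hardware ethernet %s;\n' "$mac"
  printf '    fixed-address %s;\n' "$ip"
  printf '  }\n'
}
```

For example, `dhcp_host_block node02 00:07:e9:0f:e2:6b 192.168.1.2` prints the node02 entry. Paste the output inside the subnet declaration.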
289portmap:192.168.1.0/255.255.255.0 286portmap:192.168.1.0/255.255.255.0
290</pre> 287</pre>
291 288
292<p> 289<p>
293Edit the <path>/etc/exports</path> file of the master node to export a work 290Edit the <path>/etc/exports</path> file of the master node to export a work
294directory struture (/home is good for this). 291directory structure (/home is good for this).
295</p> 292</p>
296 293
297<pre caption="/etc/exports"> 294<pre caption="/etc/exports">
298/home/ *(rw) 295/home/ *(rw)
299</pre> 296</pre>
300 297
301<p> 298<p>
302Add nfs to your master node's default runlevel: 299Add nfs to your master node's default runlevel:
303</p> 300</p>
311configure your salve nodes' <path>/etc/fstab</path>. Add a line like this 308configure your salve nodes' <path>/etc/fstab</path>. Add a line like this
312one: 309one:
313</p> 310</p>
314 311
315<pre caption="/etc/fstab"> 312<pre caption="/etc/fstab">
316master:/home/ /home nfs rw,exec,noauto,nouser,async 0 0 313master:/home/ /home nfs rw,exec,noauto,nouser,async 0 0
317</pre> 314</pre>
318 315
319<p> 316<p>
320You'll also need to set up your nodes so that they mount the nfs filesystem by 317You'll also need to set up your nodes so that they mount the nfs filesystem by
321issuing this command: 318issuing this command:
348 <li>Generate public and private keys</li> 345 <li>Generate public and private keys</li>
349 <li>Copy public key to slave nodes</li> 346 <li>Copy public key to slave nodes</li>
350</ul> 347</ul>
351 348
352<p> 349<p>
353For user based authentification, generate and copy as follows: 350For user based authentication, generate and copy as follows:
354</p> 351</p>
355 352
356<pre caption="SSH key authentication"> 353<pre caption="SSH key authentication">
357# <i>ssh-keygen -t dsa</i> 354# <i>ssh-keygen -t dsa</i>
358Generating public/private dsa key pair. 355Generating public/private dsa key pair.
376id_dsa.pub 100% 234 2.0MB/s 00:00 373id_dsa.pub 100% 234 2.0MB/s 00:00
377</pre> 374</pre>
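With more than a couple of slave nodes, copying the public key by hand becomes tedious. The hypothetical helper cluster_nodes below lists the slave node names found in a hosts file, so that a loop can push the key to each node. It is a minimal sketch and assumes the nodeNN naming used in the /etc/hosts file of this guide.

```shell
# cluster_nodes [HOSTS_FILE]: print the short names of all slave nodes
# (nodeNN aliases) found in a hosts file, one per line.
cluster_nodes() {
  local file=${1:-/etc/hosts}
  awk '$1 !~ /^#/ {
         for (i = 2; i <= NF; i++) {
           if ($i ~ /^#/) break            # rest of the line is a comment
           if ($i ~ /^node[0-9]+$/) print $i
         }
       }' "$file" | sort -u
}
```

OpenSSH's ssh-copy-id can then install the key generated above on every node, for example: `for n in $(cluster_nodes); do ssh-copy-id -i ~/.ssh/id_dsa.pub "$n"; done`. Whether ssh-copy-id is available depends on the installed OpenSSH version.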
378 375
379<note> 376<note>
380Host keys must have an empty passphrase. RSA is required for host-based 377Host keys must have an empty passphrase. RSA is required for host-based
381authentification. 378authentication.
382</note> 379</note>
383 380
384<p> 381<p>
385For host based authentication, you will also need to edit your 382For host based authentication, you will also need to edit your
386<path>/etc/ssh/shosts.equiv</path>. 383<path>/etc/ssh/shosts.equiv</path>.
463 460
464ALL:192.168.1.0/255.255.255.0 461ALL:192.168.1.0/255.255.255.0
465</pre> 462</pre>
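A net/mask pattern such as ALL:192.168.1.0/255.255.255.0 matches a client when the client address ANDed with the mask equals the net part, as described in hosts_access(5). The short sketch below expresses that test in bash arithmetic, which is handy for checking which addresses the rule admits. The names ip_to_int and ip_in_net are hypothetical, and plain decimal IPv4 dotted quads are assumed.

```shell
# ip_to_int A.B.C.D: convert a dotted-quad IPv4 address to a 32-bit integer.
# No validation: octets are assumed to be plain decimal numbers 0-255.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# ip_in_net ADDR NET MASK: succeed when (ADDR & MASK) == NET.
ip_in_net() {
  local addr net mask
  addr=$(ip_to_int "$1")
  net=$(ip_to_int "$2")
  mask=$(ip_to_int "$3")
  (( (addr & mask) == net ))
}
```

For example, `ip_in_net 192.168.1.2 192.168.1.0 255.255.255.0` succeeds for node02, while an address outside the cluster LAN, such as 10.0.0.5, fails.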
466 463
467<p> 464<p>
468Finally, configure host authentification from <path>/etc/hosts.equiv</path>. 465Finally, configure host authentication from <path>/etc/hosts.equiv</path>.
469</p> 466</p>
470 467
471<pre caption="hosts.equiv"> 468<pre caption="hosts.equiv">
472# Adelie Linux Research &amp; Development Center 469# Adelie Linux Research &amp; Development Center
473# /etc/hosts.equiv 470# /etc/hosts.equiv
508Servers</uri>, and configure your <path>/etc/conf.d/ntp</path> and 505Servers</uri>, and configure your <path>/etc/conf.d/ntp</path> and
509<path>/etc/ntp.conf</path> files on the master node. 506<path>/etc/ntp.conf</path> files on the master node.
510</p> 507</p>
511 508
512<pre caption="Master /etc/conf.d/ntp"> 509<pre caption="Master /etc/conf.d/ntp">
513# Copyright 1999-2002 Gentoo Technologies, Inc.
514# Distributed under the terms of the GNU General Public License v2
515# /etc/conf.d/ntpd 510# /etc/conf.d/ntpd
516 511
517# NOTES: 512# NOTES:
518# - NTPDATE variables below are used if you wish to set your 513# - NTPDATE variables below are used if you wish to set your
519# clock when you start the ntp init.d script 514# clock when you start the ntp init.d script
534NTPDATE_CMD="ntpdate" 529NTPDATE_CMD="ntpdate"
535 530
536# Options to pass to the above command 531# Options to pass to the above command
537# Most people should just uncomment this variable and 532# Most people should just uncomment this variable and
538# change 'someserver' to a valid hostname which you 533# change 'someserver' to a valid hostname which you
539# can aquire from the URL's below 534# can acquire from the URL's below
540NTPDATE_OPTS="-b ntp1.cmc.ec.gc.ca" 535NTPDATE_OPTS="-b ntp1.cmc.ec.gc.ca"
541 536
542## 537##
543# A list of available servers is available here: 538# A list of available servers is available here:
544# http://www.eecis.udel.edu/~mills/ntp/servers.html 539# http://www.eecis.udel.edu/~mills/ntp/servers.html
581And on all your slave nodes, setup your synchronization source as your master 576And on all your slave nodes, setup your synchronization source as your master
582node. 577node.
583</p> 578</p>
584 579
585<pre caption="Node /etc/conf.d/ntp"> 580<pre caption="Node /etc/conf.d/ntp">
586# Copyright 1999-2002 Gentoo Technologies, Inc.
587# Distributed under the terms of the GNU General Public License v2
588# /etc/conf.d/ntpd 581# /etc/conf.d/ntpd
589 582
590NTPDATE_WARN="n" 583NTPDATE_WARN="n"
591NTPDATE_CMD="ntpdate" 584NTPDATE_CMD="ntpdate"
592NTPDATE_OPTS="-b master" 585NTPDATE_OPTS="-b master"
655And the rules required for this firewall: 648And the rules required for this firewall:
656</p> 649</p>
657 650
658<pre caption="rule-save"> 651<pre caption="rule-save">
659# Adelie Linux Research &amp; Development Center 652# Adelie Linux Research &amp; Development Center
660# /var/lib/iptbles/rule-save 653# /var/lib/iptables/rule-save
661 654
662*filter 655*filter
663:INPUT ACCEPT [0:0] 656:INPUT ACCEPT [0:0]
664:FORWARD ACCEPT [0:0] 657:FORWARD ACCEPT [0:0]
665:OUTPUT ACCEPT [0:0] 658:OUTPUT ACCEPT [0:0]
718Before starting using OpenPBS, some configurations are required. The files 711Before starting using OpenPBS, some configurations are required. The files
719you will need to personalize for your system are: 712you will need to personalize for your system are:
720</p> 713</p>
721 714
722<ul> 715<ul>
723 <li>/etc/pbs_environment</li> 716 <li>/etc/pbs_environment</li>
724 <li>/var/spool/PBS/server_name</li> 717 <li>/var/spool/PBS/server_name</li>
725 <li>/var/spool/PBS/server_priv/nodes</li> 718 <li>/var/spool/PBS/server_priv/nodes</li>
726 <li>/var/spool/PBS/mom_priv/config</li> 719 <li>/var/spool/PBS/mom_priv/config</li>
727 <li>/var/spool/PBS/sched_priv/sched_config</li> 720 <li>/var/spool/PBS/sched_priv/sched_config</li>
728</ul> 721</ul>
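Of the files listed above, /var/spool/PBS/server_priv/nodes names the execution hosts that pbs_server may schedule jobs on. The hypothetical helper pbs_nodes_file below writes that file from the node list. It is a minimal sketch and assumes one line per host followed by an np= processor count. That is the usual OpenPBS nodes-file syntax, but verify it against the OpenPBS documentation for your version.

```shell
# pbs_nodes_file NP NODE...: print a server_priv/nodes file with one line
# per execution host, each advertising NP processors (hypothetical helper).
pbs_nodes_file() {
  local np=${1:-} node
  local re='^[1-9][0-9]*$'
  if [[ ! $np =~ $re ]]; then
    echo "pbs_nodes_file: processor count must be a positive integer" >&2
    return 1
  fi
  shift
  for node in "$@"; do
    printf '%s np=%s\n' "$node" "$np"
  done
}
```

For dual-processor nodes, use `pbs_nodes_file 2 node01 node02 > /var/spool/PBS/server_priv/nodes`. Then restart pbs_server so that it rereads the node list.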
729 722
730<p> 723<p>
731Here is a sample sched_config: 724Here is a sample sched_config:
732</p> 725</p>
768set server scheduler_iteration = 60 761set server scheduler_iteration = 60
769</pre> 762</pre>
770 763
771<p> 764<p>
772To submit a task to OpenPBS, the command <c>qsub</c> is used with some 765To submit a task to OpenPBS, the command <c>qsub</c> is used with some
773optional parameters. In the exemple below, "-l" allows you to specify 766optional parameters. In the example below, "-l" allows you to specify
774the resources required, "-j" provides for redirection of standard out and 767the resources required, "-j" provides for redirection of standard out and
775standard error, and the "-m" will e-mail the user at begining (b), end (e) 768standard error, and the "-m" will e-mail the user at beginning (b), end (e)
776and on abort (a) of the job. 769and on abort (a) of the job.
777</p> 770</p>
778 771
779<pre caption="Submitting a task"> 772<pre caption="Submitting a task">
780<comment>(submit and request from OpenPBS that myscript be executed on 2 nodes)</comment> 773<comment>(submit and request from OpenPBS that myscript be executed on 2 nodes)</comment>
831You may need to export a mpich work directory to all your slave nodes in 824You may need to export a mpich work directory to all your slave nodes in
832<path>/etc/exports</path>: 825<path>/etc/exports</path>:
833</p> 826</p>
834 827
835<pre caption="/etc/exports"> 828<pre caption="/etc/exports">
836/home *(rw) 829/home *(rw)
837</pre> 830</pre>
838 831
839<p> 832<p>
840Most massively parallel processors (MPPs) provide a way to start a program on 833Most massively parallel processors (MPPs) provide a way to start a program on
841a requested number of processors; <c>mpirun</c> makes use of the appropriate 834a requested number of processors; <c>mpirun</c> makes use of the appropriate
915If <c>tstmachines</c> finds a problem, it will suggest possible reasons and 908If <c>tstmachines</c> finds a problem, it will suggest possible reasons and
916solutions. In brief, there are three tests: 909solutions. In brief, there are three tests:
917</p> 910</p>
918 911
919<ul> 912<ul>
920 <li> 913 <li>
921 <e>Can processes be started on remote machines?</e> tstmachines attempts 914 <e>Can processes be started on remote machines?</e> tstmachines attempts
922 to run the shell command true on each machine in the machines files by 915 to run the shell command true on each machine in the machines files by
923 using the remote shell command. 916 using the remote shell command.
924 </li> 917 </li>
925 <li> 918 <li>
926 <e>Is current working directory available to all machines?</e> This 919 <e>Is current working directory available to all machines?</e> This
927 attempts to ls a file that tstmachines creates by running ls using the 920 attempts to ls a file that tstmachines creates by running ls using the
928 remote shell command. 921 remote shell command.
929 </li> 922 </li>
930 <li> 923 <li>
931 <e>Can user programs be run on remote systems?</e> This checks that shared 924 <e>Can user programs be run on remote systems?</e> This checks that shared
932 libraries and other components have been properly installed on all 925 libraries and other components have been properly installed on all
933 machines. 926 machines.
934 </li> 927 </li>
935</ul> 928</ul>
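When tstmachines reports a failure, it can help to repeat its first two checks by hand on a single node. The hypothetical function check_node below is a rough sketch, not the tstmachines script itself. It takes the remote shell command as its first argument (ssh in the setup described here) and assumes that the current directory lives on the NFS-shared /home. It does not cover the third check, which runs a user program on each node.

```shell
# check_node RSH HOST: approximate the first two tstmachines tests.
#   1. Can a process be started on HOST?  (run `true` remotely)
#   2. Is the current directory visible on HOST?  (ls a marker file)
check_node() {
  local rsh=$1 host=$2 marker
  if ! $rsh "$host" true; then
    echo "$host: cannot start processes with $rsh" >&2
    return 1
  fi
  # Create a marker file here; it should appear on HOST via NFS.
  marker=$(mktemp "$PWD/.tstnode.XXXXXX") || return 1
  if ! $rsh "$host" ls "$marker" > /dev/null; then
    echo "$host: $PWD is not available" >&2
    rm -f "$marker"
    return 2
  fi
  rm -f "$marker"
  echo "$host: ok"
}
```

A typical run would be `for n in node01 node02; do check_node ssh "$n"; done`. Because ssh joins its arguments into a single remote command line, this sketch assumes the working directory path contains no spaces.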
986link="http://www.cyberlogic.ca">Cyberlogic</uri>'s Adelie Linux R&amp;D 979link="http://www.cyberlogic.ca">Cyberlogic</uri>'s Adelie Linux R&amp;D
987Centre. 980Centre.
988</p> 981</p>
989 982
990<ul> 983<ul>
991 <li><uri>http://www.gentoo.org</uri>, Gentoo Technologies, Inc.</li> 984 <li><uri>http://www.gentoo.org</uri>, Gentoo Foundation, Inc.</li>
992 <li> 985 <li>
993 <uri link="http://www.adelielinux.com">http://www.adelielinux.com</uri>, 986 <uri link="http://www.adelielinux.com">http://www.adelielinux.com</uri>,
994 Adelie Linux Research and Development Centre 987 Adelie Linux Research and Development Centre
995 </li> 988 </li>
996 <li> 989 <li>
997 <uri link="http://nfs.sourceforge.net/">http://nfs.sourceforge.net</uri>, 990 <uri link="http://nfs.sourceforge.net/">http://nfs.sourceforge.net</uri>,
998 Linux NFS Project 991 Linux NFS Project
999 </li> 992 </li>
1000 <li> 993 <li>
1001 <uri link="http://www-unix.mcs.anl.gov/mpi/mpich/">http://www-unix.mcs.anl.gov/mpi/mpich/</uri>, 994 <uri link="http://www-unix.mcs.anl.gov/mpi/mpich/">http://www-unix.mcs.anl.gov/mpi/mpich/</uri>,
1002 Mathematics and Computer Science Division, Argonne National Laboratory 995 Mathematics and Computer Science Division, Argonne National Laboratory
1003 </li> 996 </li>
1004 <li> 997 <li>
1005 <uri link="http://www.ntp.org/">http://ntp.org</uri> 998 <uri link="http://www.ntp.org/">http://ntp.org</uri>
1006 </li> 999 </li>
1007 <li> 1000 <li>
1008 <uri link="http://www.eecis.udel.edu/~mills/">http://www.eecis.udel.edu/~mills/</uri>, 1001 <uri link="http://www.eecis.udel.edu/~mills/">http://www.eecis.udel.edu/~mills/</uri>,
1009 David L. Mills, University of Delaware 1002 David L. Mills, University of Delaware
1010 </li> 1003 </li>
1011 <li> 1004 <li>
1012 <uri link="http://www.ietf.org/html.charters/secsh-charter.html">http://www.ietf.org/html.charters/secsh-charter.html</uri>, 1005 <uri link="http://www.ietf.org/html.charters/secsh-charter.html">http://www.ietf.org/html.charters/secsh-charter.html</uri>,
1013 Secure Shell Working Group, IETF, Internet Society 1006 Secure Shell Working Group, IETF, Internet Society
1014 </li> 1007 </li>
1015 <li> 1008 <li>
1016 <uri link="http://www.linuxsecurity.com/">http://www.linuxsecurity.com/</uri>, 1009 <uri link="http://www.linuxsecurity.com/">http://www.linuxsecurity.com/</uri>,
1017 Guardian Digital 1010 Guardian Digital
1018 </li> 1011 </li>
1019 <li> 1012 <li>
1020 <uri link="http://www.openpbs.org/">http://www.openpbs.org/</uri>, 1013 <uri link="http://www.openpbs.org/">http://www.openpbs.org/</uri>,
1021 Altair Grid Technologies, LLC. 1014 Altair Grid Technologies, LLC.
1022 </li> 1015 </li>
1023</ul> 1016</ul>
1024 1017
