Diff of /xml/htdocs/doc/en/hpc-howto.xml

Revision 1.4 Revision 1.13
1<?xml version='1.0' encoding="UTF-8"?> 1<?xml version='1.0' encoding="UTF-8"?>
2
3<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/hpc-howto.xml,v 1.4 2005/05/20 16:54:18 neysx Exp $ --> 2<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/hpc-howto.xml,v 1.13 2006/12/18 21:47:19 nightmorph Exp $ -->
4
5<!DOCTYPE guide SYSTEM "/dtd/guide.dtd"> 3<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">
4
6<guide link="/doc/en/hpc-howto.xml"> 5<guide link="/doc/en/hpc-howto.xml">
7
8<title>High Performance Computing on Gentoo Linux</title> 6<title>High Performance Computing on Gentoo Linux</title>
9 7
10<author title="Author"> 8<author title="Author">
11 <mail link="marc@adelielinux.com">Marc St-Pierre</mail> 9 <mail link="marc@adelielinux.com">Marc St-Pierre</mail>
12</author> 10</author>
18</author> 16</author>
19<author title="Assistant/Research"> 17<author title="Assistant/Research">
20 <mail link="olivier@adelielinux.com">Olivier Crete</mail> 18 <mail link="olivier@adelielinux.com">Olivier Crete</mail>
21</author> 19</author>
22<author title="Reviewer"> 20<author title="Reviewer">
23 <mail link="spyderous@gentoo.org">Donnie Berkholz</mail> 21 <mail link="dberkholz@gentoo.org">Donnie Berkholz</mail>
24</author> 22</author>
25 23
26<!-- No licensing information; this document has been written by a third-party 24<!-- No licensing information; this document has been written by a third-party
27 organisation without additional licensing information. 25 organisation without additional licensing information.
28 26
32--> 30-->
33 31
34<abstract> 32<abstract>
35This document was written by people at the Adelie Linux R&amp;D Center 33This document was written by people at the Adelie Linux R&amp;D Center
36&lt;http://www.adelielinux.com&gt; as a step-by-step guide to turn a Gentoo 34&lt;http://www.adelielinux.com&gt; as a step-by-step guide to turn a Gentoo
37System into an High Performance Computing (HPC) system. 35System into a High Performance Computing (HPC) system.
38</abstract> 36</abstract>
39 37
40<version>1.1</version> 38<version>1.6</version>
41<date>2003-08-01</date> 39<date>2006-12-18</date>
42 40
43<chapter> 41<chapter>
44<title>Introduction</title> 42<title>Introduction</title>
45<section> 43<section>
46<body> 44<body>
85We refer to the <uri link="/doc/en/handbook/">Gentoo Linux Handbooks</uri> in 83We refer to the <uri link="/doc/en/handbook/">Gentoo Linux Handbooks</uri> in
86this section. 84this section.
87</note> 85</note>
88 86
89<p> 87<p>
90During the installation process, you will have to set your USE variables in 88During the installation process, you will have to set your USE variables in
91<path>/etc/make.conf</path>. We recommend that you deactivate all the 89<path>/etc/make.conf</path>. We recommend that you deactivate all the
92defaults (see <path>/etc/make.profile/make.defaults</path>) by negating them 90defaults (see <path>/etc/make.profile/make.defaults</path>) by negating them in
93in make.conf. However, you may want to keep such use variables as x86, 3dnow, 91make.conf. However, you may want to keep such use variables as x86, 3dnow, gpm,
94gpm, mmx, sse, ncurses, pam and tcpd. Refer to the USE documentation for more 92mmx, nptl, nptlonly, sse, ncurses, pam and tcpd. Refer to the USE documentation
95information. 93for more information.
96</p> 94</p>
97 95
98<pre caption="USE Flags"> 96<pre caption="USE Flags">
99USE="-oss 3dnow -apm -arts -avi -berkdb -crypt -cups -encode -gdbm 97USE="-oss 3dnow -apm -arts -avi -berkdb -crypt -cups -encode -gdbm -gif gpm -gtk
100-gif gpm -gtk -imlib -java -jpeg -kde -gnome -libg++ -libwww -mikmod 98-imlib -java -jpeg -kde -gnome -libg++ -libwww -mikmod mmx -motif -mpeg ncurses
101mmx -motif -mpeg ncurses -nls -oggvorbis -opengl pam -pdflib -png 99-nls nptl nptlonly -oggvorbis -opengl pam -pdflib -png -python -qt3 -qt4 -qtmt
102-python -qt -qtmt -quicktime -readline -sdl -slang -spell -ssl 100-quicktime -readline -sdl -slang -spell -ssl -svga tcpd -truetype -X -xml2 -xv
103-svga tcpd -truetype -X -xml2 -xmms -xv -zlib" 101-zlib"
104</pre> 102</pre>
105 103
106<p> 104<p>
107Or simply: 105Or simply:
108</p> 106</p>
162</p> 160</p>
163 161
164<p> 162<p>
165The slave nodes listen for instructions (via ssh/rsh perhaps) from the master 163The slave nodes listen for instructions (via ssh/rsh perhaps) from the master
166node. They should be dedicated to crunching results and therefore should not 164node. They should be dedicated to crunching results and therefore should not
167run any unecessary services. 165run any unnecessary services.
168</p> 166</p>
169 167
170<p> 168<p>
171The rest of this documentation will assume a cluster configuration as per the 169The rest of this documentation will assume a cluster configuration as per the
172hosts file below. You should maintain on every node such a hosts file 170hosts file below. You should maintain on every node such a hosts file
176 174
177<pre caption="/etc/hosts"> 175<pre caption="/etc/hosts">
178# Adelie Linux Research &amp; Development Center 176# Adelie Linux Research &amp; Development Center
179# /etc/hosts 177# /etc/hosts
180 178
181127.0.0.1 localhost 179127.0.0.1 localhost
182 180
183192.168.1.100 master.adelie master 181192.168.1.100 master.adelie master
184 182
185192.168.1.1 node01.adelie node01 183192.168.1.1 node01.adelie node01
186192.168.1.2 node02.adelie node02 184192.168.1.2 node02.adelie node02
187</pre> 185</pre>
188 186
189<p> 187<p>
190To set up your cluster dedicated LAN, edit your <path>/etc/conf.d/net</path> 188To set up your cluster dedicated LAN, edit your <path>/etc/conf.d/net</path>
191file on the master node. 189file on the master node.
192</p> 190</p>
193 191
194<pre caption="/etc/conf.d/net"> 192<pre caption="/etc/conf.d/net">
195# Copyright 1999-2002 Gentoo Technologies, Inc.
196# Distributed under the terms of the GNU General Public License, v2 or later
197
198# Global config file for net.* rc-scripts 193# Global config file for net.* rc-scripts
199 194
200# This is basically the ifconfig argument without the ifconfig $iface 195# This is basically the ifconfig argument without the ifconfig $iface
201# 196#
202 197
223 option domain-name "adelie"; 218 option domain-name "adelie";
224 range 192.168.1.10 192.168.1.99; 219 range 192.168.1.10 192.168.1.99;
225 option routers 192.168.1.100; 220 option routers 192.168.1.100;
226 221
227 host node01.adelie { 222 host node01.adelie {
228 # MAC address of network card on node 01 223 # MAC address of network card on node 01
229 hardware ethernet 00:07:e9:0f:e2:d4; 224 hardware ethernet 00:07:e9:0f:e2:d4;
230 fixed-address 192.168.1.1; 225 fixed-address 192.168.1.1;
231 } 226 }
232 host node02.adelie { 227 host node02.adelie {
233 # MAC address of network card on node 02 228 # MAC address of network card on node 02
234 hardware ethernet 00:07:e9:0f:e2:6b; 229 hardware ethernet 00:07:e9:0f:e2:6b;
235 fixed-address 192.168.1.2; 230 fixed-address 192.168.1.2;
236 } 231 }
237} 232}
238</pre> 233</pre>
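
The host entries above follow a regular pattern (name, MAC address, fixed address), so for larger clusters they could be generated rather than typed by hand. The following is a minimal sketch, not part of the original guide: the function name dhcp_hosts and its "name mac ip" input format are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the guide): print one dhcpd.conf host
# stanza per input line. Assumed input format on stdin, one node per line:
#   <name> <mac> <ip>
dhcp_hosts() {
    while read -r name mac ip; do
        # Skip blank lines and comment lines.
        case "$name" in ''|'#'*) continue ;; esac
        printf '  host %s {\n' "$name"
        printf '    hardware ethernet %s;\n' "$mac"
        printf '    fixed-address %s;\n' "$ip"
        printf '  }\n'
    done
}
```

Feeding it a line such as "node01.adelie 00:07:e9:0f:e2:d4 192.168.1.1" prints a stanza in the same shape as the ones shown above; the result still has to be pasted inside the enclosing subnet block by hand.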
250</p> 245</p>
251 246
252<p> 247<p>
253There are other systems that provide similar functionality to NFS which could 248There are other systems that provide similar functionality to NFS which could
254be used in a cluster environment. The <uri 249be used in a cluster environment. The <uri
255link="http://www.transarc.com/Product/EFS/AFS/index.html">Andrew File System 250link="http://www.openafs.org">Andrew File System
256from IBM</uri>, recently open-sourced, provides a file sharing mechanism with 251from IBM</uri>, recently open-sourced, provides a file sharing mechanism with
257some additional security and performance features. The <uri 252some additional security and performance features. The <uri
258link="http://www.coda.cs.cmu.edu/">Coda File System</uri> is still in 253link="http://www.coda.cs.cmu.edu/">Coda File System</uri> is still in
259development, but is designed to work well with disconnected clients. Many 254development, but is designed to work well with disconnected clients. Many
260of the features of the Andrew and Coda file systems are slated for inclusion 255of the features of the Andrew and Coda file systems are slated for inclusion
291portmap:192.168.1.0/255.255.255.0 286portmap:192.168.1.0/255.255.255.0
292</pre> 287</pre>
293 288
294<p> 289<p>
295Edit the <path>/etc/exports</path> file of the master node to export a work 290Edit the <path>/etc/exports</path> file of the master node to export a work
296directory struture (/home is good for this). 291directory structure (/home is good for this).
297</p> 292</p>
298 293
299<pre caption="/etc/exports"> 294<pre caption="/etc/exports">
300/home/ *(rw) 295/home/ *(rw)
301</pre> 296</pre>
302 297
303<p> 298<p>
304Add nfs to your master node's default runlevel: 299Add nfs to your master node's default runlevel:
305</p> 300</p>
313configure your slave nodes' <path>/etc/fstab</path>. Add a line like this 308configure your slave nodes' <path>/etc/fstab</path>. Add a line like this
314one: 309one:
315</p> 310</p>
316 311
317<pre caption="/etc/fstab"> 312<pre caption="/etc/fstab">
318master:/home/ /home nfs rw,exec,noauto,nouser,async 0 0 313master:/home/ /home nfs rw,exec,noauto,nouser,async 0 0
319</pre> 314</pre>
320 315
321<p> 316<p>
322You'll also need to set up your nodes so that they mount the nfs filesystem by 317You'll also need to set up your nodes so that they mount the nfs filesystem by
323issuing this command: 318issuing this command:
350 <li>Generate public and private keys</li> 345 <li>Generate public and private keys</li>
351 <li>Copy public key to slave nodes</li> 346 <li>Copy public key to slave nodes</li>
352</ul> 347</ul>
353 348
354<p> 349<p>
355For user based authentification, generate and copy as follows: 350For user based authentication, generate and copy as follows:
356</p> 351</p>
357 352
358<pre caption="SSH key authentication"> 353<pre caption="SSH key authentication">
359# <i>ssh-keygen -t dsa</i> 354# <i>ssh-keygen -t dsa</i>
360Generating public/private dsa key pair. 355Generating public/private dsa key pair.
378id_dsa.pub 100% 234 2.0MB/s 00:00 373id_dsa.pub 100% 234 2.0MB/s 00:00
379</pre> 374</pre>
380 375
381<note> 376<note>
382Host keys must have an empty passphrase. RSA is required for host-based 377Host keys must have an empty passphrase. RSA is required for host-based
383authentification. 378authentication.
384</note> 379</note>
385 380
386<p> 381<p>
387For host based authentication, you will also need to edit your 382For host based authentication, you will also need to edit your
388<path>/etc/ssh/shosts.equiv</path>. 383<path>/etc/ssh/shosts.equiv</path>.
465 460
466ALL:192.168.1.0/255.255.255.0 461ALL:192.168.1.0/255.255.255.0
467</pre> 462</pre>
468 463
469<p> 464<p>
470Finally, configure host authentification from <path>/etc/hosts.equiv</path>. 465Finally, configure host authentication from <path>/etc/hosts.equiv</path>.
471</p> 466</p>
472 467
473<pre caption="hosts.equiv"> 468<pre caption="hosts.equiv">
474# Adelie Linux Research &amp; Development Center 469# Adelie Linux Research &amp; Development Center
475# /etc/hosts.equiv 470# /etc/hosts.equiv
510Servers</uri>, and configure your <path>/etc/conf.d/ntp</path> and 505Servers</uri>, and configure your <path>/etc/conf.d/ntp</path> and
511<path>/etc/ntp.conf</path> files on the master node. 506<path>/etc/ntp.conf</path> files on the master node.
512</p> 507</p>
513 508
514<pre caption="Master /etc/conf.d/ntp"> 509<pre caption="Master /etc/conf.d/ntp">
515# Copyright 1999-2002 Gentoo Technologies, Inc.
516# Distributed under the terms of the GNU General Public License v2
517# /etc/conf.d/ntpd 510# /etc/conf.d/ntpd
518 511
519# NOTES: 512# NOTES:
520# - NTPDATE variables below are used if you wish to set your 513# - NTPDATE variables below are used if you wish to set your
521# clock when you start the ntp init.d script 514# clock when you start the ntp init.d script
536NTPDATE_CMD="ntpdate" 529NTPDATE_CMD="ntpdate"
537 530
538# Options to pass to the above command 531# Options to pass to the above command
539# Most people should just uncomment this variable and 532# Most people should just uncomment this variable and
540# change 'someserver' to a valid hostname which you 533# change 'someserver' to a valid hostname which you
541# can aquire from the URL's below 534# can acquire from the URL's below
542NTPDATE_OPTS="-b ntp1.cmc.ec.gc.ca" 535NTPDATE_OPTS="-b ntp1.cmc.ec.gc.ca"
543 536
544## 537##
545# A list of available servers is available here: 538# A list of available servers is available here:
546# http://www.eecis.udel.edu/~mills/ntp/servers.html 539# http://www.eecis.udel.edu/~mills/ntp/servers.html
583And on all your slave nodes, set up your synchronization source as your master 576And on all your slave nodes, set up your synchronization source as your master
584node. 577node.
585</p> 578</p>
586 579
587<pre caption="Node /etc/conf.d/ntp"> 580<pre caption="Node /etc/conf.d/ntp">
588# Copyright 1999-2002 Gentoo Technologies, Inc.
589# Distributed under the terms of the GNU General Public License v2
590# /etc/conf.d/ntpd 581# /etc/conf.d/ntpd
591 582
592NTPDATE_WARN="n" 583NTPDATE_WARN="n"
593NTPDATE_CMD="ntpdate" 584NTPDATE_CMD="ntpdate"
594NTPDATE_OPTS="-b master" 585NTPDATE_OPTS="-b master"
657And the rules required for this firewall: 648And the rules required for this firewall:
658</p> 649</p>
659 650
660<pre caption="rule-save"> 651<pre caption="rule-save">
661# Adelie Linux Research &amp; Development Center 652# Adelie Linux Research &amp; Development Center
662# /var/lib/iptbles/rule-save 653# /var/lib/iptables/rule-save
663 654
664*filter 655*filter
665:INPUT ACCEPT [0:0] 656:INPUT ACCEPT [0:0]
666:FORWARD ACCEPT [0:0] 657:FORWARD ACCEPT [0:0]
667:OUTPUT ACCEPT [0:0] 658:OUTPUT ACCEPT [0:0]
720Before you start using OpenPBS, some configuration is required. The files 711Before you start using OpenPBS, some configuration is required. The files
721you will need to personalize for your system are: 712you will need to personalize for your system are:
722</p> 713</p>
723 714
724<ul> 715<ul>
725 <li>/etc/pbs_environment</li> 716 <li>/etc/pbs_environment</li>
726 <li>/var/spool/PBS/server_name</li> 717 <li>/var/spool/PBS/server_name</li>
727 <li>/var/spool/PBS/server_priv/nodes</li> 718 <li>/var/spool/PBS/server_priv/nodes</li>
728 <li>/var/spool/PBS/mom_priv/config</li> 719 <li>/var/spool/PBS/mom_priv/config</li>
729 <li>/var/spool/PBS/sched_priv/sched_config</li> 720 <li>/var/spool/PBS/sched_priv/sched_config</li>
730</ul> 721</ul>
731 722
732<p> 723<p>
733Here is a sample sched_config: 724Here is a sample sched_config:
734</p> 725</p>
770set server scheduler_iteration = 60 761set server scheduler_iteration = 60
771</pre> 762</pre>
772 763
773<p> 764<p>
774To submit a task to OpenPBS, the command <c>qsub</c> is used with some 765To submit a task to OpenPBS, the command <c>qsub</c> is used with some
775optional parameters. In the exemple below, "-l" allows you to specify 766optional parameters. In the example below, "-l" allows you to specify
776the resources required, "-j" provides for redirection of standard out and 767the resources required, "-j" provides for redirection of standard out and
777standard error, and the "-m" will e-mail the user at begining (b), end (e) 768standard error, and the "-m" will e-mail the user at beginning (b), end (e)
778and on abort (a) of the job. 769and on abort (a) of the job.
779</p> 770</p>
780 771
781<pre caption="Submitting a task"> 772<pre caption="Submitting a task">
782<comment>(submit and request from OpenPBS that myscript be executed on 2 nodes)</comment> 773<comment>(submit and request from OpenPBS that myscript be executed on 2 nodes)</comment>
833You may need to export a mpich work directory to all your slave nodes in 824You may need to export a mpich work directory to all your slave nodes in
834<path>/etc/exports</path>: 825<path>/etc/exports</path>:
835</p> 826</p>
836 827
837<pre caption="/etc/exports"> 828<pre caption="/etc/exports">
838/home *(rw) 829/home *(rw)
839</pre> 830</pre>
840 831
841<p> 832<p>
842Most massively parallel processors (MPPs) provide a way to start a program on 833Most massively parallel processors (MPPs) provide a way to start a program on
843a requested number of processors; <c>mpirun</c> makes use of the appropriate 834a requested number of processors; <c>mpirun</c> makes use of the appropriate
917If <c>tstmachines</c> finds a problem, it will suggest possible reasons and 908If <c>tstmachines</c> finds a problem, it will suggest possible reasons and
918solutions. In brief, there are three tests: 909solutions. In brief, there are three tests:
919</p> 910</p>
920 911
921<ul> 912<ul>
922 <li> 913 <li>
923 <e>Can processes be started on remote machines?</e> tstmachines attempts 914 <e>Can processes be started on remote machines?</e> tstmachines attempts
924 to run the shell command true on each machine in the machines files by 915 to run the shell command true on each machine in the machines files by
925 using the remote shell command. 916 using the remote shell command.
926 </li> 917 </li>
927 <li> 918 <li>
928 <e>Is current working directory available to all machines?</e> This 919 <e>Is current working directory available to all machines?</e> This
929 attempts to ls a file that tstmachines creates by running ls using the 920 attempts to ls a file that tstmachines creates by running ls using the
930 remote shell command. 921 remote shell command.
931 </li> 922 </li>
932 <li> 923 <li>
933 <e>Can user programs be run on remote systems?</e> This checks that shared 924 <e>Can user programs be run on remote systems?</e> This checks that shared
934 libraries and other components have been properly installed on all 925 libraries and other components have been properly installed on all
935 machines. 926 machines.
936 </li> 927 </li>
937</ul> 928</ul>
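
The first of these tests can be approximated by hand when tstmachines itself is unavailable or its output is unclear. The sketch below is not the MPICH script: check_machines is a hypothetical function, and the RSH variable and the machines-file path you pass in are assumptions to adapt to your own setup. It runs the shell command true on every host listed in a machines file and reports which hosts could not be reached.

```shell
#!/bin/sh
# Hypothetical helper, not part of MPICH: mimic the first tstmachines test
# by running `true` on each host named in a machines file.
# RSH is an assumption: set it to whatever remote shell your cluster uses.
RSH="${RSH:-ssh}"

check_machines() {
    machines_file="$1"
    failed=0
    while read -r host rest; do
        # Skip blank lines and comment lines.
        case "$host" in ''|'#'*) continue ;; esac
        # Machines files may use "host:ncpus"; keep only the host part.
        host="${host%%:*}"
        # Redirect stdin so the remote shell cannot consume the host list.
        if $RSH "$host" true </dev/null >/dev/null 2>&1; then
            echo "ok $host"
        else
            echo "FAILED $host"
            failed=1
        fi
    done < "$machines_file"
    return "$failed"
}
```

Calling check_machines with the path to your machines file prints one line per host and returns a non-zero status if any host failed, which makes it usable from other scripts. It does not cover the second and third tests (shared working directory and user programs).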
988link="http://www.cyberlogic.ca">Cyberlogic</uri>'s Adelie Linux R&amp;D 979link="http://www.cyberlogic.ca">Cyberlogic</uri>'s Adelie Linux R&amp;D
989Centre. 980Centre.
990</p> 981</p>
991 982
992<ul> 983<ul>
993 <li><uri>http://www.gentoo.org</uri>, Gentoo Technologies, Inc.</li> 984 <li><uri>http://www.gentoo.org</uri>, Gentoo Foundation, Inc.</li>
994 <li> 985 <li>
995 <uri link="http://www.adelielinux.com">http://www.adelielinux.com</uri>, 986 <uri link="http://www.adelielinux.com">http://www.adelielinux.com</uri>,
996 Adelie Linux Research and Development Centre 987 Adelie Linux Research and Development Centre
997 </li> 988 </li>
998 <li> 989 <li>
999 <uri link="http://nfs.sourceforge.net/">http://nfs.sourceforge.net</uri>, 990 <uri link="http://nfs.sourceforge.net/">http://nfs.sourceforge.net</uri>,
1000 Linux NFS Project 991 Linux NFS Project
1001 </li> 992 </li>
1002 <li> 993 <li>
1003 <uri link="http://www-unix.mcs.anl.gov/mpi/mpich/">http://www-unix.mcs.anl.gov/mpi/mpich/</uri>, 994 <uri link="http://www-unix.mcs.anl.gov/mpi/mpich/">http://www-unix.mcs.anl.gov/mpi/mpich/</uri>,
1004 Mathematics and Computer Science Division, Argonne National Laboratory 995 Mathematics and Computer Science Division, Argonne National Laboratory
1005 </li> 996 </li>
1006 <li> 997 <li>
1007 <uri link="http://www.ntp.org/">http://ntp.org</uri> 998 <uri link="http://www.ntp.org/">http://ntp.org</uri>
1008 </li> 999 </li>
1009 <li> 1000 <li>
1010 <uri link="http://www.eecis.udel.edu/~mills/">http://www.eecis.udel.edu/~mills/</uri>, 1001 <uri link="http://www.eecis.udel.edu/~mills/">http://www.eecis.udel.edu/~mills/</uri>,
1011 David L. Mills, University of Delaware 1002 David L. Mills, University of Delaware
1012 </li> 1003 </li>
1013 <li> 1004 <li>
1014 <uri link="http://www.ietf.org/html.charters/secsh-charter.html">http://www.ietf.org/html.charters/secsh-charter.html</uri>, 1005 <uri link="http://www.ietf.org/html.charters/secsh-charter.html">http://www.ietf.org/html.charters/secsh-charter.html</uri>,
1015 Secure Shell Working Group, IETF, Internet Society 1006 Secure Shell Working Group, IETF, Internet Society
1016 </li> 1007 </li>
1017 <li> 1008 <li>
1018 <uri link="http://www.linuxsecurity.com/">http://www.linuxsecurity.com/</uri>, 1009 <uri link="http://www.linuxsecurity.com/">http://www.linuxsecurity.com/</uri>,
1019 Guardian Digital 1010 Guardian Digital
1020 </li> 1011 </li>
1021 <li> 1012 <li>
1022 <uri link="http://www.openpbs.org/">http://www.openpbs.org/</uri>, 1013 <uri link="http://www.openpbs.org/">http://www.openpbs.org/</uri>,
1023 Altair Grid Technologies, LLC. 1014 Altair Grid Technologies, LLC.
1024 </li> 1015 </li>
1025</ul> 1016</ul>
1026 1017
