
Diff of /xml/htdocs/doc/en/hpc-howto.xml


Revision 1.3 Revision 1.13
1<?xml version='1.0' encoding="UTF-8"?> 1<?xml version='1.0' encoding="UTF-8"?>
2
3<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/hpc-howto.xml,v 1.3 2005/05/13 20:15:50 neysx Exp $ --> 2<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/hpc-howto.xml,v 1.13 2006/12/18 21:47:19 nightmorph Exp $ -->
4
5<!DOCTYPE guide SYSTEM "/dtd/guide.dtd"> 3<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">
4
6<guide link="/doc/en/hpc-howto.xml"> 5<guide link="/doc/en/hpc-howto.xml">
7
8<title>High Performance Computing on Gentoo Linux</title> 6<title>High Performance Computing on Gentoo Linux</title>
9 7
10<author title="Author"> 8<author title="Author">
11 <mail link="marc@adelielinux.com">Marc St-Pierre</mail> 9 <mail link="marc@adelielinux.com">Marc St-Pierre</mail>
12</author> 10</author>
18</author> 16</author>
19<author title="Assistant/Research"> 17<author title="Assistant/Research">
20 <mail link="olivier@adelielinux.com">Olivier Crete</mail> 18 <mail link="olivier@adelielinux.com">Olivier Crete</mail>
21</author> 19</author>
22<author title="Reviewer"> 20<author title="Reviewer">
23 <mail link="spyderous@gentoo.org">Donnie Berkholz</mail> 21 <mail link="dberkholz@gentoo.org">Donnie Berkholz</mail>
24</author> 22</author>
25 23
26<!-- No licensing information; this document has been written by a third-party 24<!-- No licensing information; this document has been written by a third-party
27 organisation without additional licensing information. 25 organisation without additional licensing information.
28 26
32--> 30-->
33 31
34<abstract> 32<abstract>
35This document was written by people at the Adelie Linux R&amp;D Center 33This document was written by people at the Adelie Linux R&amp;D Center
36&lt;http://www.adelielinux.com&gt; as a step-by-step guide to turn a Gentoo 34&lt;http://www.adelielinux.com&gt; as a step-by-step guide to turn a Gentoo
37System into an High Performance Computing (HPC) system. 35System into a High Performance Computing (HPC) system.
38</abstract> 36</abstract>
39 37
40<version>1.0</version> 38<version>1.6</version>
41<date>2003-08-01</date> 39<date>2006-12-18</date>
42 40
43<chapter> 41<chapter>
44<title>Introduction</title> 42<title>Introduction</title>
45<section> 43<section>
46<body> 44<body>
85We refer to the <uri link="/doc/en/handbook/">Gentoo Linux Handbooks</uri> in 83We refer to the <uri link="/doc/en/handbook/">Gentoo Linux Handbooks</uri> in
86this section. 84this section.
87</note> 85</note>
88 86
89<p> 87<p>
90During the installation process, you will have to set your USE variables in 88During the installation process, you will have to set your USE variables in
91<path>/etc/make.conf</path>. We recommended that you deactivate all the 89<path>/etc/make.conf</path>. We recommended that you deactivate all the
92defaults (see <path>/etc/make.profile/make.defaults</path>) by negating them 90defaults (see <path>/etc/make.profile/make.defaults</path>) by negating them in
93in make.conf. However, you may want to keep such use variables as x86, 3dnow, 91make.conf. However, you may want to keep such use variables as x86, 3dnow, gpm,
94gpm, mmx, sse, ncurses, pam and tcpd. Refer to the USE documentation for more 92mmx, nptl, nptlonly, sse, ncurses, pam and tcpd. Refer to the USE documentation
95information. 93for more information.
96</p> 94</p>
97 95
98<pre caption="USE Flags"> 96<pre caption="USE Flags">
99# Copyright 2000-2003 Daniel Robbins, Gentoo Technologies, Inc.
100# Contains local system settings for Portage system
101
102# Please review 'man make.conf' for more information.
103
104USE="-oss 3dnow -apm -arts -avi -berkdb -crypt -cups -encode -gdbm 97USE="-oss 3dnow -apm -arts -avi -berkdb -crypt -cups -encode -gdbm -gif gpm -gtk
105-gif gpm -gtk -imlib -java -jpeg -kde -gnome -libg++ -libwww -mikmod 98-imlib -java -jpeg -kde -gnome -libg++ -libwww -mikmod mmx -motif -mpeg ncurses
106mmx -motif -mpeg ncurses -nls -oggvorbis -opengl pam -pdflib -png 99-nls nptl nptlonly -oggvorbis -opengl pam -pdflib -png -python -qt3 -qt4 -qtmt
107-python -qt -qtmt -quicktime -readline -sdl -slang -spell -ssl 100-quicktime -readline -sdl -slang -spell -ssl -svga tcpd -truetype -X -xml2 -xv
108-svga tcpd -truetype -X -xml2 -xmms -xv -zlib" 101-zlib"
109</pre> 102</pre>
110 103
111<p> 104<p>
112Or simply: 105Or simply:
113</p> 106</p>
114 107
115<pre caption="USE Flags - simplified version"> 108<pre caption="USE Flags - simplified version">
116# Copyright 2000-2003 Daniel Robbins, Gentoo Technologies, Inc.
117# Contains local system settings for Portage system
118
119# Please review 'man make.conf' for more information.
120
121USE="-* 3dnow gpm mmx ncurses pam sse tcpd" 109USE="-* 3dnow gpm mmx ncurses pam sse tcpd"
122</pre> 110</pre>
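Negating the defaults works because USE is an incremental variable in Portage. A `-*` token discards every flag inherited so far, `-flag` removes one flag, and a bare `flag` adds it. The Python sketch below models only that token-by-token rule. It assumes a single defaults string followed by a single make.conf string, and it ignores profile stacking, package.use and environment overrides. The function name `effective_use` and the sample profile flags are hypothetical.

```python
def effective_use(defaults: str, make_conf: str) -> set[str]:
    """Apply USE tokens from the profile defaults, then from make.conf."""
    flags: set[str] = set()
    for token in (defaults + " " + make_conf).split():
        if token == "-*":
            flags.clear()             # drop everything enabled so far
        elif token.startswith("-"):
            flags.discard(token[1:])  # disable a single flag
        else:
            flags.add(token)          # enable a single flag
    return flags


if __name__ == "__main__":
    # Hypothetical profile defaults; the real ones live in make.defaults.
    profile = "x86 gtk oss berkdb crypt"
    simplified = "-* 3dnow gpm mmx ncurses pam sse tcpd"
    print(sorted(effective_use(profile, simplified)))
    # prints: ['3dnow', 'gpm', 'mmx', 'ncurses', 'pam', 'sse', 'tcpd']
```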
123 111
124<note> 112<note>
125The <e>tcpd</e> USE flag increases security for packages such as xinetd. 113The <e>tcpd</e> USE flag increases security for packages such as xinetd.
172</p> 160</p>
173 161
174<p> 162<p>
175The slave nodes listen for instructions (via ssh/rsh perhaps) from the master 163The slave nodes listen for instructions (via ssh/rsh perhaps) from the master
176node. They should be dedicated to crunching results and therefore should not 164node. They should be dedicated to crunching results and therefore should not
177run any unecessary services. 165run any unnecessary services.
178</p> 166</p>
179 167
180<p> 168<p>
181The rest of this documentation will assume a cluster configuration as per the 169The rest of this documentation will assume a cluster configuration as per the
182hosts file below. You should maintain on every node such a hosts file 170hosts file below. You should maintain on every node such a hosts file
186 174
187<pre caption="/etc/hosts"> 175<pre caption="/etc/hosts">
188# Adelie Linux Research &amp; Development Center 176# Adelie Linux Research &amp; Development Center
189# /etc/hosts 177# /etc/hosts
190 178
191127.0.0.1 localhost 179127.0.0.1 localhost
192 180
193192.168.1.100 master.adelie master 181192.168.1.100 master.adelie master
194 182
195192.168.1.1 node01.adelie node01 183192.168.1.1 node01.adelie node01
196192.168.1.2 node02.adelie node02 184192.168.1.2 node02.adelie node02
197</pre> 185</pre>
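The same hosts file has to be kept on every node. Generating it from one table on the master is one way to keep the copies identical. The Python sketch below assumes the addresses and names of the sample above. The `render_hosts` helper and its input format are hypothetical.

```python
def render_hosts(domain: str, hosts: list[tuple[str, str]]) -> str:
    """Return /etc/hosts text: loopback first, then one line per host."""
    lines = ["127.0.0.1\tlocalhost"]
    for address, name in hosts:
        # Fully qualified name first, then the short alias.
        lines.append(f"{address}\t{name}.{domain} {name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    cluster = [
        ("192.168.1.100", "master"),
        ("192.168.1.1", "node01"),
        ("192.168.1.2", "node02"),
    ]
    print(render_hosts("adelie", cluster), end="")
```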
198 186
199<p> 187<p>
200To setup your cluster dedicated LAN, edit your <path>/etc/conf.d/net</path> 188To setup your cluster dedicated LAN, edit your <path>/etc/conf.d/net</path>
201file on the master node. 189file on the master node.
202</p> 190</p>
203 191
204<pre caption="/etc/conf.d/net"> 192<pre caption="/etc/conf.d/net">
205# Copyright 1999-2002 Gentoo Technologies, Inc.
206# Distributed under the terms of the GNU General Public License, v2 or later
207
208# Global config file for net.* rc-scripts 193# Global config file for net.* rc-scripts
209 194
210# This is basically the ifconfig argument without the ifconfig $iface 195# This is basically the ifconfig argument without the ifconfig $iface
211# 196#
212 197
233 option domain-name "adelie"; 218 option domain-name "adelie";
234 range 192.168.1.10 192.168.1.99; 219 range 192.168.1.10 192.168.1.99;
235 option routers 192.168.1.100; 220 option routers 192.168.1.100;
236 221
237 host node01.adelie { 222 host node01.adelie {
238 # MAC address of network card on node 01 223 # MAC address of network card on node 01
239 hardware ethernet 00:07:e9:0f:e2:d4; 224 hardware ethernet 00:07:e9:0f:e2:d4;
240 fixed-address 192.168.1.1; 225 fixed-address 192.168.1.1;
241 } 226 }
242 host node02.adelie { 227 host node02.adelie {
243 # MAC address of network card on node 02 228 # MAC address of network card on node 02
244 hardware ethernet 00:07:e9:0f:e2:6b; 229 hardware ethernet 00:07:e9:0f:e2:6b;
245 fixed-address 192.168.1.2; 230 fixed-address 192.168.1.2;
246 } 231 }
247} 232}
248</pre> 233</pre>
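Each slave node needs its own `host` declaration, which maps its MAC address to a fixed address. The Python sketch below emits such declarations in the layout of the sample. It also rejects any fixed address that falls inside the dynamic `range`, which is 192.168.1.10 to 192.168.1.99 in the sample. Keeping the two apart is assumed to be good practice, and the sample already does so. The helper name `host_blocks` is hypothetical.

```python
import ipaddress


def host_blocks(domain, nodes, range_start, range_end):
    """Build dhcpd.conf host declarations from (name, mac, ip) tuples."""
    low = ipaddress.IPv4Address(range_start)
    high = ipaddress.IPv4Address(range_end)
    blocks = []
    for name, mac, ip in nodes:
        # Refuse a fixed address that dhcpd could also hand out dynamically.
        if low <= ipaddress.IPv4Address(ip) <= high:
            raise ValueError(f"{name}: {ip} lies inside the dynamic range")
        blocks.append(
            f"host {name}.{domain} {{\n"
            f"    hardware ethernet {mac.lower()};\n"
            f"    fixed-address {ip};\n"
            "}"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    nodes = [
        ("node01", "00:07:e9:0f:e2:d4", "192.168.1.1"),
        ("node02", "00:07:e9:0f:e2:6b", "192.168.1.2"),
    ]
    print(host_blocks("adelie", nodes, "192.168.1.10", "192.168.1.99"))
```

Paste the output into the same block as the `range` and `option routers` statements.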
260</p> 245</p>
261 246
262<p> 247<p>
263There are other systems that provide similar functionality to NFS which could 248There are other systems that provide similar functionality to NFS which could
264be used in a cluster environment. The <uri 249be used in a cluster environment. The <uri
265link="http://www.transarc.com/Product/EFS/AFS/index.html">Andrew File System 250link="http://www.openafs.org">Andrew File System
266from IBM</uri>, recently open-sourced, provides a file sharing mechanism with 251from IBM</uri>, recently open-sourced, provides a file sharing mechanism with
267some additional security and performance features. The <uri 252some additional security and performance features. The <uri
268link="http://www.coda.cs.cmu.edu/">Coda File System</uri> is still in 253link="http://www.coda.cs.cmu.edu/">Coda File System</uri> is still in
269development, but is designed to work well with disconnected clients. Many 254development, but is designed to work well with disconnected clients. Many
270of the features of the Andrew and Coda file systems are slated for inclusion 255of the features of the Andrew and Coda file systems are slated for inclusion
301portmap:192.168.1.0/255.255.255.0 286portmap:192.168.1.0/255.255.255.0
302</pre> 287</pre>
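The hosts.allow entry above uses a `net/mask` pattern. Under the hosts_access(5) rule, such a pattern matches a client when the client address, ANDed with the mask, equals the network number. The Python sketch below implements that rule. It covers only the `n.n.n.n/m.m.m.m` form and none of the other tcp_wrappers pattern types. The function name `matches` is hypothetical.

```python
import ipaddress


def matches(client: str, pattern: str) -> bool:
    """True if client falls inside a tcp_wrappers 'net/mask' pattern."""
    net_text, mask_text = pattern.split("/")
    net = int(ipaddress.IPv4Address(net_text))
    mask = int(ipaddress.IPv4Address(mask_text))
    addr = int(ipaddress.IPv4Address(client))
    # Keep only the network bits of the client, then compare.
    return (addr & mask) == net


if __name__ == "__main__":
    rule = "192.168.1.0/255.255.255.0"
    for host in ("192.168.1.2", "192.168.1.100", "10.0.0.5"):
        print(host, matches(host, rule))
```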
303 288
304<p> 289<p>
305Edit the <path>/etc/exports</path> file of the master node to export a work 290Edit the <path>/etc/exports</path> file of the master node to export a work
306directory struture (/home is good for this). 291directory structure (/home is good for this).
307</p> 292</p>
308 293
309<pre caption="/etc/exports"> 294<pre caption="/etc/exports">
310/home/ *(rw) 295/home/ *(rw)
311</pre> 296</pre>
312 297
313<p> 298<p>
314Add nfs to your master node's default runlevel: 299Add nfs to your master node's default runlevel:
315</p> 300</p>
323configure your salve nodes' <path>/etc/fstab</path>. Add a line like this 308configure your salve nodes' <path>/etc/fstab</path>. Add a line like this
324one: 309one:
325</p> 310</p>
326 311
327<pre caption="/etc/fstab"> 312<pre caption="/etc/fstab">
328master:/home/ /home nfs rw,exec,noauto,nouser,async 0 0 313master:/home/ /home nfs rw,exec,noauto,nouser,async 0 0
329</pre> 314</pre>
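The fstab line follows the whitespace-separated fields described in fstab(5). The Python sketch below splits the sample entry into those fields, using the man page's names `fs_spec` through `fs_passno`. It assumes that the last two numeric fields default to 0 when they are omitted, which is how recent util-linux treats them. The helper `parse_fstab_line` is hypothetical.

```python
FIELDS = ("fs_spec", "fs_file", "fs_vfstype", "fs_mntops", "fs_freq", "fs_passno")


def parse_fstab_line(line: str) -> dict:
    """Split one fstab entry into its named fields."""
    parts = line.split()
    if not 4 <= len(parts) <= 6:
        raise ValueError(f"expected 4 to 6 fields, got {len(parts)}")
    parts += ["0"] * (6 - len(parts))  # fs_freq and fs_passno default to 0
    entry = dict(zip(FIELDS, parts))
    entry["fs_mntops"] = entry["fs_mntops"].split(",")
    entry["fs_freq"] = int(entry["fs_freq"])
    entry["fs_passno"] = int(entry["fs_passno"])
    return entry


if __name__ == "__main__":
    entry = parse_fstab_line("master:/home/ /home nfs rw,exec,noauto,nouser,async 0 0")
    # 'noauto' keeps "mount -a" from mounting the share at boot,
    # so it has to be mounted explicitly on each slave node.
    print(entry["fs_spec"], entry["fs_mntops"])
```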
330 315
331<p> 316<p>
332You'll also need to set up your nodes so that they mount the nfs filesystem by 317You'll also need to set up your nodes so that they mount the nfs filesystem by
333issuing this command: 318issuing this command:
360 <li>Generate public and private keys</li> 345 <li>Generate public and private keys</li>
361 <li>Copy public key to slave nodes</li> 346 <li>Copy public key to slave nodes</li>
362</ul> 347</ul>
363 348
364<p> 349<p>
365For user based authentification, generate and copy as follows: 350For user based authentication, generate and copy as follows:
366</p> 351</p>
367 352
368<pre caption="SSH key authentication"> 353<pre caption="SSH key authentication">
369# <i>ssh-keygen -t dsa</i> 354# <i>ssh-keygen -t dsa</i>
370Generating public/private dsa key pair. 355Generating public/private dsa key pair.
388id_dsa.pub 100% 234 2.0MB/s 00:00 373id_dsa.pub 100% 234 2.0MB/s 00:00
389</pre> 374</pre>
390 375
391<note> 376<note>
392Host keys must have an empty passphrase. RSA is required for host-based 377Host keys must have an empty passphrase. RSA is required for host-based
393authentification. 378authentication.
394</note> 379</note>
395 380
396<p> 381<p>
397For host based authentication, you will also need to edit your 382For host based authentication, you will also need to edit your
398<path>/etc/ssh/shosts.equiv</path>. 383<path>/etc/ssh/shosts.equiv</path>.
475 460
476ALL:192.168.1.0/255.255.255.0 461ALL:192.168.1.0/255.255.255.0
477</pre> 462</pre>
478 463
479<p> 464<p>
480Finally, configure host authentification from <path>/etc/hosts.equiv</path>. 465Finally, configure host authentication from <path>/etc/hosts.equiv</path>.
481</p> 466</p>
482 467
483<pre caption="hosts.equiv"> 468<pre caption="hosts.equiv">
484# Adelie Linux Research &amp; Development Center 469# Adelie Linux Research &amp; Development Center
485# /etc/hosts.equiv 470# /etc/hosts.equiv
520Servers</uri>, and configure your <path>/etc/conf.d/ntp</path> and 505Servers</uri>, and configure your <path>/etc/conf.d/ntp</path> and
521<path>/etc/ntp.conf</path> files on the master node. 506<path>/etc/ntp.conf</path> files on the master node.
522</p> 507</p>
523 508
524<pre caption="Master /etc/conf.d/ntp"> 509<pre caption="Master /etc/conf.d/ntp">
525# Copyright 1999-2002 Gentoo Technologies, Inc.
526# Distributed under the terms of the GNU General Public License v2
527# /etc/conf.d/ntpd 510# /etc/conf.d/ntpd
528 511
529# NOTES: 512# NOTES:
530# - NTPDATE variables below are used if you wish to set your 513# - NTPDATE variables below are used if you wish to set your
531# clock when you start the ntp init.d script 514# clock when you start the ntp init.d script
546NTPDATE_CMD="ntpdate" 529NTPDATE_CMD="ntpdate"
547 530
548# Options to pass to the above command 531# Options to pass to the above command
549# Most people should just uncomment this variable and 532# Most people should just uncomment this variable and
550# change 'someserver' to a valid hostname which you 533# change 'someserver' to a valid hostname which you
551# can aquire from the URL's below 534# can acquire from the URL's below
552NTPDATE_OPTS="-b ntp1.cmc.ec.gc.ca" 535NTPDATE_OPTS="-b ntp1.cmc.ec.gc.ca"
553 536
554## 537##
555# A list of available servers is available here: 538# A list of available servers is available here:
556# http://www.eecis.udel.edu/~mills/ntp/servers.html 539# http://www.eecis.udel.edu/~mills/ntp/servers.html
593And on all your slave nodes, setup your synchronization source as your master 576And on all your slave nodes, setup your synchronization source as your master
594node. 577node.
595</p> 578</p>
596 579
597<pre caption="Node /etc/conf.d/ntp"> 580<pre caption="Node /etc/conf.d/ntp">
598# Copyright 1999-2002 Gentoo Technologies, Inc.
599# Distributed under the terms of the GNU General Public License v2
600# /etc/conf.d/ntpd 581# /etc/conf.d/ntpd
601 582
602NTPDATE_WARN="n" 583NTPDATE_WARN="n"
603NTPDATE_CMD="ntpdate" 584NTPDATE_CMD="ntpdate"
604NTPDATE_OPTS="-b master" 585NTPDATE_OPTS="-b master"
667And the rules required for this firewall: 648And the rules required for this firewall:
668</p> 649</p>
669 650
670<pre caption="rule-save"> 651<pre caption="rule-save">
671# Adelie Linux Research &amp; Development Center 652# Adelie Linux Research &amp; Development Center
672# /var/lib/iptbles/rule-save 653# /var/lib/iptables/rule-save
673 654
674*filter 655*filter
675:INPUT ACCEPT [0:0] 656:INPUT ACCEPT [0:0]
676:FORWARD ACCEPT [0:0] 657:FORWARD ACCEPT [0:0]
677:OUTPUT ACCEPT [0:0] 658:OUTPUT ACCEPT [0:0]
730Before starting using OpenPBS, some configurations are required. The files 711Before starting using OpenPBS, some configurations are required. The files
731you will need to personalize for your system are: 712you will need to personalize for your system are:
732</p> 713</p>
733 714
734<ul> 715<ul>
735 <li>/etc/pbs_environment</li> 716 <li>/etc/pbs_environment</li>
736 <li>/var/spool/PBS/server_name</li> 717 <li>/var/spool/PBS/server_name</li>
737 <li>/var/spool/PBS/server_priv/nodes</li> 718 <li>/var/spool/PBS/server_priv/nodes</li>
738 <li>/var/spool/PBS/mom_priv/config</li> 719 <li>/var/spool/PBS/mom_priv/config</li>
739 <li>/var/spool/PBS/sched_priv/sched_config</li> 720 <li>/var/spool/PBS/sched_priv/sched_config</li>
740</ul> 721</ul>
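Of these files, `server_priv/nodes` lists the execution hosts that the server may schedule on, with one host per line. The Python sketch below renders that file for the slave nodes of the sample hosts file. It assumes the `hostname np=N` syntax, where N is the number of processors per node. The helper name and the processor counts are hypothetical.

```python
def render_nodes(nodes: dict[str, int]) -> str:
    """Return server_priv/nodes text with one 'hostname np=N' line per host."""
    for name, cpus in nodes.items():
        if cpus < 1:
            raise ValueError(f"{name}: np must be at least 1")
    return "".join(f"{name} np={cpus}\n" for name, cpus in nodes.items())


if __name__ == "__main__":
    # Hypothetical processor counts for the two slave nodes.
    print(render_nodes({"node01": 2, "node02": 2}), end="")
```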
741 722
742<p> 723<p>
743Here is a sample sched_config: 724Here is a sample sched_config:
744</p> 725</p>
780set server scheduler_iteration = 60 761set server scheduler_iteration = 60
781</pre> 762</pre>
782 763
783<p> 764<p>
784To submit a task to OpenPBS, the command <c>qsub</c> is used with some 765To submit a task to OpenPBS, the command <c>qsub</c> is used with some
785optional parameters. In the exemple below, "-l" allows you to specify 766optional parameters. In the example below, "-l" allows you to specify
786the resources required, "-j" provides for redirection of standard out and 767the resources required, "-j" provides for redirection of standard out and
787standard error, and the "-m" will e-mail the user at begining (b), end (e) 768standard error, and the "-m" will e-mail the user at beginning (b), end (e)
788and on abort (a) of the job. 769and on abort (a) of the job.
789</p> 770</p>
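A wrapper script can assemble the three options just described. The Python sketch below only builds the argument vector and submits nothing. `qsub_args` is hypothetical. The sketch assumes `nodes=N` as the resource request and `oe` as the join value, which merges standard error into standard output. Only the a/b/e mail letters discussed above are accepted.

```python
import shlex


def qsub_args(script: str, nodes: int, mail: str = "abe", join: str = "oe") -> list[str]:
    """Build a qsub command line; nothing is executed here."""
    if not mail or not set(mail) <= set("abe"):
        raise ValueError("mail options must be a subset of 'a', 'b', 'e'")
    return ["qsub", "-l", f"nodes={nodes}", "-j", join, "-m", mail, script]


if __name__ == "__main__":
    print(shlex.join(qsub_args("myscript", 2)))
    # prints: qsub -l nodes=2 -j oe -m abe myscript
```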
790 771
791<pre caption="Submitting a task"> 772<pre caption="Submitting a task">
792<comment>(submit and request from OpenPBS that myscript be executed on 2 nodes)</comment> 773<comment>(submit and request from OpenPBS that myscript be executed on 2 nodes)</comment>
843You may need to export a mpich work directory to all your slave nodes in 824You may need to export a mpich work directory to all your slave nodes in
844<path>/etc/exports</path>: 825<path>/etc/exports</path>:
845</p> 826</p>
846 827
847<pre caption="/etc/exports"> 828<pre caption="/etc/exports">
848/home *(rw) 829/home *(rw)
849</pre> 830</pre>
850 831
851<p> 832<p>
852Most massively parallel processors (MPPs) provide a way to start a program on 833Most massively parallel processors (MPPs) provide a way to start a program on
853a requested number of processors; <c>mpirun</c> makes use of the appropriate 834a requested number of processors; <c>mpirun</c> makes use of the appropriate
927If <c>tstmachines</c> finds a problem, it will suggest possible reasons and 908If <c>tstmachines</c> finds a problem, it will suggest possible reasons and
928solutions. In brief, there are three tests: 909solutions. In brief, there are three tests:
929</p> 910</p>
930 911
931<ul> 912<ul>
932 <li> 913 <li>
933 <e>Can processes be started on remote machines?</e> tstmachines attempts 914 <e>Can processes be started on remote machines?</e> tstmachines attempts
934 to run the shell command true on each machine in the machines files by 915 to run the shell command true on each machine in the machines files by
935 using the remote shell command. 916 using the remote shell command.
936 </li> 917 </li>
937 <li> 918 <li>
938 <e>Is current working directory available to all machines?</e> This 919 <e>Is current working directory available to all machines?</e> This
939 attempts to ls a file that tstmachines creates by running ls using the 920 attempts to ls a file that tstmachines creates by running ls using the
940 remote shell command. 921 remote shell command.
941 </li> 922 </li>
942 <li> 923 <li>
943 <e>Can user programs be run on remote systems?</e> This checks that shared 924 <e>Can user programs be run on remote systems?</e> This checks that shared
944 libraries and other components have been properly installed on all 925 libraries and other components have been properly installed on all
945 machines. 926 machines.
946 </li> 927 </li>
947</ul> 928</ul>
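The first of these checks amounts to running `true` on every host in the machines file through the remote shell. The Python sketch below reads a machines file and builds the command for each host without executing it. It assumes one host per line, an optional `:count` suffix and `#` comments, as used by the MPICH ch_p4 device. It also assumes `ssh` as the remote shell. `parse_machines` and `remote_true_commands` are hypothetical.

```python
def parse_machines(text: str) -> list[tuple[str, int]]:
    """Return (host, processor count) pairs from a machines file."""
    machines = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        host, _, count = line.partition(":")
        machines.append((host, int(count) if count else 1))
    return machines


def remote_true_commands(machines, rsh="ssh"):
    """One '<rsh> host true' command per machine, like the first check."""
    return [[rsh, host, "true"] for host, _ in machines]


if __name__ == "__main__":
    sample = "# cluster nodes\nnode01\nnode02:2\n"
    for cmd in remote_true_commands(parse_machines(sample)):
        print(" ".join(cmd))
```

On the master node, each command could be passed to `subprocess.run(cmd, check=True)`. A failure there points at the same remote-shell problems that tstmachines reports.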
998link="http://www.cyberlogic.ca">Cyberlogic</uri>'s Adelie Linux R&amp;D 979link="http://www.cyberlogic.ca">Cyberlogic</uri>'s Adelie Linux R&amp;D
999Centre. 980Centre.
1000</p> 981</p>
1001 982
1002<ul> 983<ul>
1003 <li><uri>http://www.gentoo.org</uri>, Gentoo Technologies, Inc.</li> 984 <li><uri>http://www.gentoo.org</uri>, Gentoo Foundation, Inc.</li>
1004 <li> 985 <li>
1005 <uri link="http://www.adelielinux.com">http://www.adelielinux.com</uri>, 986 <uri link="http://www.adelielinux.com">http://www.adelielinux.com</uri>,
1006 Adelie Linux Research and Development Centre 987 Adelie Linux Research and Development Centre
1007 </li> 988 </li>
1008 <li> 989 <li>
1009 <uri link="http://nfs.sourceforge.net/">http://nfs.sourceforge.net</uri>, 990 <uri link="http://nfs.sourceforge.net/">http://nfs.sourceforge.net</uri>,
1010 Linux NFS Project 991 Linux NFS Project
1011 </li> 992 </li>
1012 <li> 993 <li>
1013 <uri link="http://www-unix.mcs.anl.gov/mpi/mpich/">http://www-unix.mcs.anl.gov/mpi/mpich/</uri>, 994 <uri link="http://www-unix.mcs.anl.gov/mpi/mpich/">http://www-unix.mcs.anl.gov/mpi/mpich/</uri>,
1014 Mathematics and Computer Science Division, Argonne National Laboratory 995 Mathematics and Computer Science Division, Argonne National Laboratory
1015 </li> 996 </li>
1016 <li> 997 <li>
1017 <uri link="http://www.ntp.org/">http://ntp.org</uri> 998 <uri link="http://www.ntp.org/">http://ntp.org</uri>
1018 </li> 999 </li>
1019 <li> 1000 <li>
1020 <uri link="http://www.eecis.udel.edu/~mills/">http://www.eecis.udel.edu/~mills/</uri>, 1001 <uri link="http://www.eecis.udel.edu/~mills/">http://www.eecis.udel.edu/~mills/</uri>,
1021 David L. Mills, University of Delaware 1002 David L. Mills, University of Delaware
1022 </li> 1003 </li>
1023 <li> 1004 <li>
1024 <uri link="http://www.ietf.org/html.charters/secsh-charter.html">http://www.ietf.org/html.charters/secsh-charter.html</uri>, 1005 <uri link="http://www.ietf.org/html.charters/secsh-charter.html">http://www.ietf.org/html.charters/secsh-charter.html</uri>,
1025 Secure Shell Working Group, IETF, Internet Society 1006 Secure Shell Working Group, IETF, Internet Society
1026 </li> 1007 </li>
1027 <li> 1008 <li>
1028 <uri link="http://www.linuxsecurity.com/">http://www.linuxsecurity.com/</uri>, 1009 <uri link="http://www.linuxsecurity.com/">http://www.linuxsecurity.com/</uri>,
1029 Guardian Digital 1010 Guardian Digital
1030 </li> 1011 </li>
1031 <li> 1012 <li>
1032 <uri link="http://www.openpbs.org/">http://www.openpbs.org/</uri>, 1013 <uri link="http://www.openpbs.org/">http://www.openpbs.org/</uri>,
1033 Altair Grid Technologies, LLC. 1014 Altair Grid Technologies, LLC.
1034 </li> 1015 </li>
1035</ul> 1016</ul>
1036 1017
