| 1 | <?xml version='1.0' encoding="UTF-8"?> |
1 | <?xml version='1.0' encoding="UTF-8"?> |
| 2 | <!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/hpc-howto.xml,v 1.5 2005/10/04 19:05:41 rane Exp $ --> |
2 | <!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/hpc-howto.xml,v 1.6 2005/10/04 22:39:47 rane Exp $ --> |
| 3 | <!DOCTYPE guide SYSTEM "/dtd/guide.dtd"> |
3 | <!DOCTYPE guide SYSTEM "/dtd/guide.dtd"> |
| 4 | |
4 | |
| 5 | <guide link="/doc/en/hpc-howto.xml"> |
5 | <guide link="/doc/en/hpc-howto.xml"> |
| 6 | <title>High Performance Computing on Gentoo Linux</title> |
6 | <title>High Performance Computing on Gentoo Linux</title> |
| 7 | |
7 | |
| 8 | <author title="Author"> |
8 | <author title="Author"> |
| 9 | <mail link="marc@adelielinux.com">Marc St-Pierre</mail> |
9 | <mail link="marc@adelielinux.com">Marc St-Pierre</mail> |
| 10 | </author> |
10 | </author> |
| 11 | <author title="Author"> |
11 | <author title="Author"> |
| 12 | <mail link="benoit@adelielinux.com">Benoit Morin</mail> |
12 | <mail link="benoit@adelielinux.com">Benoit Morin</mail> |
| 13 | </author> |
13 | </author> |
| 14 | <author title="Assistant/Research"> |
14 | <author title="Assistant/Research"> |
| 15 | <mail link="jean-francois@adelielinux.com">Jean-Francois Richard</mail> |
15 | <mail link="jean-francois@adelielinux.com">Jean-Francois Richard</mail> |
| 16 | </author> |
16 | </author> |
| 17 | <author title="Assistant/Research"> |
17 | <author title="Assistant/Research"> |
| … | |
… | |
| 150 | <p> |
150 | <p> |
| 151 | A cluster is composed of two node types: master and slave. Typically, your |
151 | A cluster is composed of two node types: master and slave. Typically, your |
| 152 | cluster will have one master node and several slave nodes. |
152 | cluster will have one master node and several slave nodes. |
| 153 | </p> |
153 | </p> |
| 154 | |
154 | |
| 155 | <p> |
155 | <p> |
| 156 | The master node is the cluster's server. It is responsible for telling the |
156 | The master node is the cluster's server. It is responsible for telling the |
| 157 | slave nodes what to do. This server will typically run such daemons as dhcpd, |
157 | slave nodes what to do. This server will typically run such daemons as dhcpd, |
| 158 | nfs, pbs-server, and pbs-sched. Your master node will allow interactive |
158 | nfs, pbs-server, and pbs-sched. Your master node will allow interactive |
| 159 | sessions for users, and accept job executions. |
159 | sessions for users, and accept job executions. |
| 160 | </p> |
160 | </p> |
| 161 | |
161 | |
| 162 | <p> |
162 | <p> |
| 163 | The slave nodes listen for instructions (via ssh/rsh perhaps) from the master |
163 | The slave nodes listen for instructions (via ssh/rsh perhaps) from the master |
| 164 | node. They should be dedicated to crunching results and therefore should not |
164 | node. They should be dedicated to crunching results and therefore should not |
| 165 | run any unecessary services. |
165 | run any unnecessary services. |
| 166 | </p> |
166 | </p> |
| 167 | |
167 | |
| 168 | <p> |
168 | <p> |
| 169 | The rest of this documentation will assume a cluster configuration as per the |
169 | The rest of this documentation will assume a cluster configuration as per the |
| 170 | hosts file below. You should maintain on every node such a hosts file |
170 | hosts file below. You should maintain on every node such a hosts file |
| 171 | (<path>/etc/hosts</path>) with entries for each node participating in the |
171 | (<path>/etc/hosts</path>) with entries for each node participating in the |
| 172 | cluster. |
172 | cluster. |
| 173 | </p> |
173 | </p> |
| 174 | |
174 | |
| 175 | <pre caption="/etc/hosts"> |
175 | <pre caption="/etc/hosts"> |
| 176 | # Adelie Linux Research & Development Center |
176 | # Adelie Linux Research & Development Center |
| 177 | # /etc/hosts |
177 | # /etc/hosts |
| 178 | |
178 | |
| 179 | 127.0.0.1 localhost |
179 | 127.0.0.1 localhost |
| 180 | |
180 | |
| 181 | 192.168.1.100 master.adelie master |
181 | 192.168.1.100 master.adelie master |
| 182 | |
182 | |
| 183 | 192.168.1.1 node01.adelie node01 |
183 | 192.168.1.1 node01.adelie node01 |
| 184 | 192.168.1.2 node02.adelie node02 |
184 | 192.168.1.2 node02.adelie node02 |
| 185 | </pre> |
185 | </pre> |
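
The hosts file above follows a regular pattern: one master entry, then numbered slave nodes on consecutive addresses. A minimal sketch that generates the same entries for a larger cluster is shown here. The function name and its parameters are hypothetical helpers, not part of any Gentoo tool, and it assumes the nodes are numbered consecutively from the first node address.

```python
import ipaddress


def hosts_entries(master_ip, first_node_ip, node_count, domain="adelie"):
    """Return /etc/hosts lines for one master and node_count slave nodes.

    Hypothetical helper: assumes slave nodes occupy consecutive addresses
    starting at first_node_ip and are named node01, node02, ...
    """
    lines = [
        "127.0.0.1\tlocalhost",
        "",
        f"{master_ip}\tmaster.{domain}\tmaster",
        "",
    ]
    base = ipaddress.IPv4Address(first_node_ip)
    for i in range(node_count):
        name = f"node{i + 1:02d}"
        # IPv4Address + int yields the next address in sequence
        lines.append(f"{base + i}\t{name}.{domain}\t{name}")
    return lines


if __name__ == "__main__":
    # Reproduces the two-node layout used in this guide
    print("\n".join(hosts_entries("192.168.1.100", "192.168.1.1", 2)))
```

The same output can be copied to every node, which keeps the name-to-address mapping identical across the cluster.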
| 186 | |
186 | |
| 187 | <p> |
187 | <p> |
| 188 | To set up your cluster's dedicated LAN, edit your <path>/etc/conf.d/net</path> |
188 | To set up your cluster's dedicated LAN, edit your <path>/etc/conf.d/net</path> |
| 189 | file on the master node. |
189 | file on the master node. |
| 190 | </p> |
190 | </p> |
| 191 | |
191 | |
| 192 | <pre caption="/etc/conf.d/net"> |
192 | <pre caption="/etc/conf.d/net"> |
| 193 | # Copyright 1999-2002 Gentoo Technologies, Inc. |
193 | # Copyright 1999-2002 Gentoo Technologies, Inc. |
| 194 | # Distributed under the terms of the GNU General Public License, v2 or later |
194 | # Distributed under the terms of the GNU General Public License, v2 or later |
| 195 | |
195 | |
| 196 | # Global config file for net.* rc-scripts |
196 | # Global config file for net.* rc-scripts |
| 197 | |
197 | |
| 198 | # This is basically the ifconfig argument without the ifconfig $iface |
198 | # This is basically the ifconfig argument without the ifconfig $iface |
| 199 | # |
199 | # |
| … | |
… | |
| 211 | |
211 | |
| 212 | <pre caption="/etc/dhcp/dhcpd.conf"> |
212 | <pre caption="/etc/dhcp/dhcpd.conf"> |
| 213 | # Adelie Linux Research & Development Center |
213 | # Adelie Linux Research & Development Center |
| 214 | # /etc/dhcp/dhcpd.conf |
214 | # /etc/dhcp/dhcpd.conf |
| 215 | |
215 | |
| 216 | log-facility local7; |
216 | log-facility local7; |
| 217 | ddns-update-style none; |
217 | ddns-update-style none; |
| 218 | use-host-decl-names on; |
218 | use-host-decl-names on; |
| 219 | |
219 | |
| 220 | subnet 192.168.1.0 netmask 255.255.255.0 { |
220 | subnet 192.168.1.0 netmask 255.255.255.0 { |
| 221 | option domain-name "adelie"; |
221 | option domain-name "adelie"; |
| 222 | range 192.168.1.10 192.168.1.99; |
222 | range 192.168.1.10 192.168.1.99; |
| 223 | option routers 192.168.1.100; |
223 | option routers 192.168.1.100; |
| 224 | |
224 | |
| 225 | host node01.adelie { |
225 | host node01.adelie { |
| 226 | # MAC address of network card on node 01 |
226 | # MAC address of network card on node 01 |
| 227 | hardware ethernet 00:07:e9:0f:e2:d4; |
227 | hardware ethernet 00:07:e9:0f:e2:d4; |
| 228 | fixed-address 192.168.1.1; |
228 | fixed-address 192.168.1.1; |
| 229 | } |
229 | } |
| 230 | host node02.adelie { |
230 | host node02.adelie { |
| 231 | # MAC address of network card on node 02 |
231 | # MAC address of network card on node 02 |
| 232 | hardware ethernet 00:07:e9:0f:e2:6b; |
232 | hardware ethernet 00:07:e9:0f:e2:6b; |
| 233 | fixed-address 192.168.1.2; |
233 | fixed-address 192.168.1.2; |
| 234 | } |
234 | } |
| 235 | } |
235 | } |
| 236 | </pre> |
236 | </pre> |
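
In the configuration above, the fixed addresses given to node01 and node02 (192.168.1.1 and 192.168.1.2) lie inside the subnet but outside the dynamic range 192.168.1.10 to 192.168.1.99. A small sketch of a sanity check for new host entries follows. The constants mirror the sample file, the function name is hypothetical, and keeping fixed addresses out of the dynamic pool is treated here as a convention assumed for this cluster rather than something the sample file enforces.

```python
import ipaddress

# Values copied from the sample dhcpd.conf in this guide
SUBNET = ipaddress.ip_network("192.168.1.0/255.255.255.0")
RANGE_START = ipaddress.ip_address("192.168.1.10")
RANGE_END = ipaddress.ip_address("192.168.1.99")


def fixed_address_ok(ip):
    """Return True if ip is in the cluster subnet and outside the dynamic pool.

    Hypothetical helper for checking a fixed-address value before adding
    a new host declaration.
    """
    addr = ipaddress.ip_address(ip)
    in_subnet = addr in SUBNET
    in_pool = RANGE_START <= addr <= RANGE_END
    return in_subnet and not in_pool


if __name__ == "__main__":
    for candidate in ("192.168.1.1", "192.168.1.50", "10.0.0.1"):
        print(candidate, fixed_address_ok(candidate))
```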
| 237 | |
237 | |
| 238 | </body> |
238 | </body> |
| 239 | </section> |
239 | </section> |
| 240 | <section> |
240 | <section> |
| 241 | <title>NFS/NIS</title> |
241 | <title>NFS/NIS</title> |
| 242 | <body> |
242 | <body> |
| 243 | |
243 | |
| 244 | <p> |
244 | <p> |
| 245 | The Network File System (NFS) was developed to allow machines to mount a disk |
245 | The Network File System (NFS) was developed to allow machines to mount a disk |
| 246 | partition on a remote machine as if it were on a local hard drive. This allows |
246 | partition on a remote machine as if it were on a local hard drive. This allows |
| … | |
… | |
| 279 | CONFIG_LOCKD_V4=y |
279 | CONFIG_LOCKD_V4=y |
| 280 | </pre> |
280 | </pre> |
| 281 | |
281 | |
| 282 | <p> |
282 | <p> |
| 283 | On the master node, edit your <path>/etc/hosts.allow</path> file to allow |
283 | On the master node, edit your <path>/etc/hosts.allow</path> file to allow |
| 284 | connections from slave nodes. If your cluster LAN is on 192.168.1.0/24, |
284 | connections from slave nodes. If your cluster LAN is on 192.168.1.0/24, |
| 285 | your <path>hosts.allow</path> will look like: |
285 | your <path>hosts.allow</path> will look like: |
| 286 | </p> |
286 | </p> |
| 287 | |
287 | |
| 288 | <pre caption="hosts.allow"> |
288 | <pre caption="hosts.allow"> |
| 289 | portmap:192.168.1.0/255.255.255.0 |
289 | portmap:192.168.1.0/255.255.255.0 |
| 290 | </pre> |
290 | </pre> |
| 291 | |
291 | |
| 292 | <p> |
292 | <p> |
| 293 | Edit the <path>/etc/exports</path> file of the master node to export a work |
293 | Edit the <path>/etc/exports</path> file of the master node to export a work |
| 294 | directory struture (/home is good for this). |
294 | directory structure (/home is good for this). |
| 295 | </p> |
295 | </p> |
| 296 | |
296 | |
| 297 | <pre caption="/etc/exports"> |
297 | <pre caption="/etc/exports"> |
| 298 | /home/ *(rw) |
298 | /home/ *(rw) |
| 299 | </pre> |
299 | </pre> |
| 300 | |
300 | |
| 301 | <p> |
301 | <p> |
| 302 | Add nfs to your master node's default runlevel: |
302 | Add nfs to your master node's default runlevel: |
| 303 | </p> |
303 | </p> |
| 304 | |
304 | |
| 305 | <pre caption="Adding NFS to the default runlevel"> |
305 | <pre caption="Adding NFS to the default runlevel"> |
| 306 | # <i>rc-update add nfs default</i> |
306 | # <i>rc-update add nfs default</i> |
| 307 | </pre> |
307 | </pre> |
| 308 | |
308 | |
| 309 | <p> |
309 | <p> |
| 310 | To mount the nfs exported filesystem from the master, you also have to |
310 | To mount the nfs exported filesystem from the master, you also have to |
| 311 | configure your slave nodes' <path>/etc/fstab</path>. Add a line like this |
311 | configure your slave nodes' <path>/etc/fstab</path>. Add a line like this |
| 312 | one: |
312 | one: |
| 313 | </p> |
313 | </p> |
| 314 | |
314 | |
| 315 | <pre caption="/etc/fstab"> |
315 | <pre caption="/etc/fstab"> |
| 316 | master:/home/ /home nfs rw,exec,noauto,nouser,async 0 0 |
316 | master:/home/ /home nfs rw,exec,noauto,nouser,async 0 0 |
| 317 | </pre> |
317 | </pre> |
| 318 | |
318 | |
| 319 | <p> |
319 | <p> |
| 320 | You'll also need to set up your nodes so that they mount the nfs filesystem by |
320 | You'll also need to set up your nodes so that they mount the nfs filesystem by |
| 321 | issuing this command: |
321 | issuing this command: |
| 322 | </p> |
322 | </p> |
| 323 | |
323 | |
| 324 | <pre caption="Adding nfsmount to the default runlevel"> |
324 | <pre caption="Adding nfsmount to the default runlevel"> |
| 325 | # <i>rc-update add nfsmount default</i> |
325 | # <i>rc-update add nfsmount default</i> |
| 326 | </pre> |
326 | </pre> |
| 327 | |
327 | |
| 328 | </body> |
328 | </body> |
| 329 | </section> |
329 | </section> |
| 330 | <section> |
330 | <section> |
| 331 | <title>RSH/SSH</title> |
331 | <title>RSH/SSH</title> |
| … | |
… | |
| 338 | systems, and the private key which is kept on the local system, is done first |
338 | systems, and the private key which is kept on the local system, is done first |
| 339 | to configure OpenSSH on the cluster. |
339 | to configure OpenSSH on the cluster. |
| 340 | </p> |
340 | </p> |
| 341 | |
341 | |
| 342 | <p> |
342 | <p> |
| 343 | For transparent cluster usage, private/public keys may be used. This process |
343 | For transparent cluster usage, private/public keys may be used. This process |
| 344 | has two steps: |
344 | has two steps: |
| 345 | </p> |
345 | </p> |
| 346 | |
346 | |
| 347 | <ul> |
347 | <ul> |
| 348 | <li>Generate public and private keys</li> |
348 | <li>Generate public and private keys</li> |
| 349 | <li>Copy public key to slave nodes</li> |
349 | <li>Copy public key to slave nodes</li> |
| 350 | </ul> |
350 | </ul> |
| 351 | |
351 | |
| 352 | <p> |
352 | <p> |
| 353 | For user based authentification, generate and copy as follows: |
353 | For user based authentication, generate and copy as follows: |
| 354 | </p> |
354 | </p> |
| 355 | |
355 | |
| 356 | <pre caption="SSH key authentication"> |
356 | <pre caption="SSH key authentication"> |
| 357 | # <i>ssh-keygen -t dsa</i> |
357 | # <i>ssh-keygen -t dsa</i> |
| 358 | Generating public/private dsa key pair. |
358 | Generating public/private dsa key pair. |
| 359 | Enter file in which to save the key (/root/.ssh/id_dsa): /root/.ssh/id_dsa |
359 | Enter file in which to save the key (/root/.ssh/id_dsa): /root/.ssh/id_dsa |
| 360 | Enter passphrase (empty for no passphrase): |
360 | Enter passphrase (empty for no passphrase): |
| 361 | Enter same passphrase again: |
361 | Enter same passphrase again: |
| 362 | Your identification has been saved in /root/.ssh/id_dsa. |
362 | Your identification has been saved in /root/.ssh/id_dsa. |
| 363 | Your public key has been saved in /root/.ssh/id_dsa.pub. |
363 | Your public key has been saved in /root/.ssh/id_dsa.pub. |
| 364 | The key fingerprint is: |
364 | The key fingerprint is: |
| 365 | f1:45:15:40:fd:3c:2d:f7:9f:ea:55:df:76:2f:a4:1f root@master |
365 | f1:45:15:40:fd:3c:2d:f7:9f:ea:55:df:76:2f:a4:1f root@master |
| 366 | |
366 | |
| 367 | <comment>WARNING! If you already have an "authorized_keys" file, |
367 | <comment>WARNING! If you already have an "authorized_keys" file, |
| 368 | please append to it, do not use the following command.</comment> |
368 | please append to it, do not use the following command.</comment> |
| 369 | |
369 | |
| 370 | # <i>scp /root/.ssh/id_dsa.pub node01:/root/.ssh/authorized_keys</i> |
370 | # <i>scp /root/.ssh/id_dsa.pub node01:/root/.ssh/authorized_keys</i> |
| 371 | root@node01's password: |
371 | root@node01's password: |
| 372 | id_dsa.pub 100% 234 2.0MB/s 00:00 |
372 | id_dsa.pub 100% 234 2.0MB/s 00:00 |
| 373 | |
373 | |
| 374 | # <i>scp /root/.ssh/id_dsa.pub node02:/root/.ssh/authorized_keys</i> |
374 | # <i>scp /root/.ssh/id_dsa.pub node02:/root/.ssh/authorized_keys</i> |
| 375 | root@node02's password: |
375 | root@node02's password: |
| 376 | id_dsa.pub 100% 234 2.0MB/s 00:00 |
376 | id_dsa.pub 100% 234 2.0MB/s 00:00 |
| 377 | </pre> |
377 | </pre> |
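
The warning in the listing above says to append to an existing authorized_keys file instead of overwriting it. One way to do that safely is sketched here, under the assumption that the public key text has already been transferred to the node by some other means. The function name is hypothetical. It skips keys that are already present and keeps the file private to its owner.

```python
import tempfile
from pathlib import Path


def append_public_key(pub_key_text, authorized_keys_path):
    """Append a public key line to authorized_keys unless it is already there.

    Hypothetical helper. Returns True if the key was appended, False if an
    identical line already existed.
    """
    path = Path(authorized_keys_path)
    key = pub_key_text.strip()
    text = path.read_text() if path.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    with path.open("a") as fh:
        # Make sure the previous entry is terminated before adding ours
        if text and not text.endswith("\n"):
            fh.write("\n")
        fh.write(key + "\n")
    path.chmod(0o600)  # keep the file readable by its owner only
    return True


if __name__ == "__main__":
    # Demonstration against a temporary directory, not a real ~/.ssh
    demo = Path(tempfile.mkdtemp()) / ".ssh" / "authorized_keys"
    print(append_public_key("ssh-dss AAAAexample root@master", demo))
    print(append_public_key("ssh-dss AAAAexample root@master", demo))
```

Running it twice with the same key leaves a single entry in the file, so the step can be repeated without creating duplicates.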
| 378 | |
378 | |
| 379 | <note> |
379 | <note> |
| 380 | Host keys must have an empty passphrase. RSA is required for host-based |
380 | Host keys must have an empty passphrase. RSA is required for host-based |
| 381 | authentification. |
381 | authentication. |
| 382 | </note> |
382 | </note> |
| 383 | |
383 | |
| 384 | <p> |
384 | <p> |
| 385 | For host based authentication, you will also need to edit your |
385 | For host based authentication, you will also need to edit your |
| 386 | <path>/etc/ssh/shosts.equiv</path>. |
386 | <path>/etc/ssh/shosts.equiv</path>. |
| 387 | </p> |
387 | </p> |
| 388 | |
388 | |
| 389 | <pre caption="/etc/ssh/shosts.equiv"> |
389 | <pre caption="/etc/ssh/shosts.equiv"> |
| 390 | node01.adelie |
390 | node01.adelie |
| 391 | node02.adelie |
391 | node02.adelie |
| 392 | master.adelie |
392 | master.adelie |
| 393 | </pre> |
393 | </pre> |
| 394 | |
394 | |
| 395 | <p> |
395 | <p> |
| 396 | And a few modifications to the <path>/etc/ssh/sshd_config</path> file: |
396 | And a few modifications to the <path>/etc/ssh/sshd_config</path> file: |
| … | |
… | |
| 453 | in.rshd:192.168.1.0/255.255.255.0 |
453 | in.rshd:192.168.1.0/255.255.255.0 |
| 454 | </pre> |
454 | </pre> |
| 455 | |
455 | |
| 456 | <p> |
456 | <p> |
| 457 | Or you can simply trust your cluster LAN: |
457 | Or you can simply trust your cluster LAN: |
| 458 | </p> |
458 | </p> |
| 459 | |
459 | |
| 460 | <pre caption="hosts.allow"> |
460 | <pre caption="hosts.allow"> |
| 461 | # Adelie Linux Research & Development Center |
461 | # Adelie Linux Research & Development Center |
| 462 | # /etc/hosts.allow |
462 | # /etc/hosts.allow |
| 463 | |
463 | |
| 464 | ALL:192.168.1.0/255.255.255.0 |
464 | ALL:192.168.1.0/255.255.255.0 |
| 465 | </pre> |
465 | </pre> |
| 466 | |
466 | |
| 467 | <p> |
467 | <p> |
| 468 | Finally, configure host authentification from <path>/etc/hosts.equiv</path>. |
468 | Finally, configure host authentication from <path>/etc/hosts.equiv</path>. |
| 469 | </p> |
469 | </p> |
| 470 | |
470 | |
| 471 | <pre caption="hosts.equiv"> |
471 | <pre caption="hosts.equiv"> |
| 472 | # Adelie Linux Research & Development Center |
472 | # Adelie Linux Research & Development Center |
| 473 | # /etc/hosts.equiv |
473 | # /etc/hosts.equiv |
| 474 | |
474 | |
| 475 | master |
475 | master |
| 476 | node01 |
476 | node01 |
| 477 | node02 |
477 | node02 |
| 478 | </pre> |
478 | </pre> |
| 479 | |
479 | |
| 480 | <p> |
480 | <p> |
| 481 | And, add xinetd to your default runlevel: |
481 | And, add xinetd to your default runlevel: |
| 482 | </p> |
482 | </p> |
| 483 | |
483 | |
| … | |
… | |
| 524 | # - read each of the comments above each of the variable |
524 | # - read each of the comments above each of the variable |
| 525 | |
525 | |
| 526 | # Comment this out if you dont want the init script to warn |
526 | # Comment this out if you dont want the init script to warn |
| 527 | # about not having ntpdate setup |
527 | # about not having ntpdate setup |
| 528 | NTPDATE_WARN="n" |
528 | NTPDATE_WARN="n" |
| 529 | |
529 | |
| 530 | # Command to run to set the clock initially |
530 | # Command to run to set the clock initially |
| 531 | # Most people should just uncomment this line ... |
531 | # Most people should just uncomment this line ... |
| 532 | # however, if you know what you're doing, and you |
532 | # however, if you know what you're doing, and you |
| 533 | # want to use ntpd to set the clock, change this to 'ntpd' |
533 | # want to use ntpd to set the clock, change this to 'ntpd' |
| 534 | NTPDATE_CMD="ntpdate" |
534 | NTPDATE_CMD="ntpdate" |
| 535 | |
535 | |
| 536 | # Options to pass to the above command |
536 | # Options to pass to the above command |
| 537 | # Most people should just uncomment this variable and |
537 | # Most people should just uncomment this variable and |
| 538 | # change 'someserver' to a valid hostname which you |
538 | # change 'someserver' to a valid hostname which you |
| 539 | # can aquire from the URL's below |
539 | # can acquire from the URL's below |
| 540 | NTPDATE_OPTS="-b ntp1.cmc.ec.gc.ca" |
540 | NTPDATE_OPTS="-b ntp1.cmc.ec.gc.ca" |
| 541 | |
541 | |
| 542 | ## |
542 | ## |
| 543 | # A list of available servers is available here: |
543 | # A list of available servers is available here: |
| 544 | # http://www.eecis.udel.edu/~mills/ntp/servers.html |
544 | # http://www.eecis.udel.edu/~mills/ntp/servers.html |
| 545 | # Please follow the rules of engagement and use a |
545 | # Please follow the rules of engagement and use a |
| 546 | # Stratum 2 server (unless you qualify for Stratum 1) |
546 | # Stratum 2 server (unless you qualify for Stratum 1) |
| 547 | ## |
547 | ## |
| 548 | |
548 | |
| 549 | # Options to pass to the ntpd process that will *always* be run |
549 | # Options to pass to the ntpd process that will *always* be run |
| 550 | # Most people should not uncomment this line ... |
550 | # Most people should not uncomment this line ... |
| 551 | # however, if you know what you're doing, feel free to tweak |
551 | # however, if you know what you're doing, feel free to tweak |
| 552 | #NTPD_OPTS="" |
552 | #NTPD_OPTS="" |
| 553 | |
553 | |
| 554 | </pre> |
554 | </pre> |
| … | |
… | |
| 645 | CONFIG_IP_NF_MATCH_STATE=y |
645 | CONFIG_IP_NF_MATCH_STATE=y |
| 646 | CONFIG_IP_NF_FILTER=y |
646 | CONFIG_IP_NF_FILTER=y |
| 647 | CONFIG_IP_NF_TARGET_REJECT=y |
647 | CONFIG_IP_NF_TARGET_REJECT=y |
| 648 | CONFIG_IP_NF_NAT=y |
648 | CONFIG_IP_NF_NAT=y |
| 649 | CONFIG_IP_NF_NAT_NEEDED=y |
649 | CONFIG_IP_NF_NAT_NEEDED=y |
| 650 | CONFIG_IP_NF_TARGET_MASQUERADE=y |
650 | CONFIG_IP_NF_TARGET_MASQUERADE=y |
| 651 | CONFIG_IP_NF_TARGET_LOG=y |
651 | CONFIG_IP_NF_TARGET_LOG=y |
| 652 | </pre> |
652 | </pre> |
| 653 | |
653 | |
| 654 | <p> |
654 | <p> |
| 655 | And the rules required for this firewall: |
655 | And the rules required for this firewall: |
| 656 | </p> |
656 | </p> |
| 657 | |
657 | |
| 658 | <pre caption="rule-save"> |
658 | <pre caption="rule-save"> |
| 659 | # Adelie Linux Research & Development Center |
659 | # Adelie Linux Research & Development Center |
| 660 | # /var/lib/iptbles/rule-save |
660 | # /var/lib/iptables/rule-save |
| 661 | |
661 | |
| 662 | *filter |
662 | *filter |
| 663 | :INPUT ACCEPT [0:0] |
663 | :INPUT ACCEPT [0:0] |
| 664 | :FORWARD ACCEPT [0:0] |
664 | :FORWARD ACCEPT [0:0] |
| 665 | :OUTPUT ACCEPT [0:0] |
665 | :OUTPUT ACCEPT [0:0] |
| 666 | -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT |
666 | -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT |
| 667 | -A INPUT -p tcp -m tcp --dport 22 -j ACCEPT |
667 | -A INPUT -p tcp -m tcp --dport 22 -j ACCEPT |
| 668 | -A INPUT -s 192.168.1.0/255.255.255.0 -i eth1 -j ACCEPT |
668 | -A INPUT -s 192.168.1.0/255.255.255.0 -i eth1 -j ACCEPT |
| 669 | -A INPUT -s 127.0.0.1 -i lo -j ACCEPT |
669 | -A INPUT -s 127.0.0.1 -i lo -j ACCEPT |
| 670 | -A INPUT -p icmp -j ACCEPT |
670 | -A INPUT -p icmp -j ACCEPT |
| 671 | -A INPUT -j LOG |
671 | -A INPUT -j LOG |
| 672 | -A INPUT -j REJECT --reject-with icmp-port-unreachable |
672 | -A INPUT -j REJECT --reject-with icmp-port-unreachable |
| 673 | COMMIT |
673 | COMMIT |
| 674 | *nat |
674 | *nat |
| 675 | :PREROUTING ACCEPT [0:0] |
675 | :PREROUTING ACCEPT [0:0] |
| … | |
… | |
| 708 | <pre caption="Installing openpbs"> |
708 | <pre caption="Installing openpbs"> |
| 709 | # <i>emerge -p openpbs</i> |
709 | # <i>emerge -p openpbs</i> |
| 710 | </pre> |
710 | </pre> |
| 711 | |
711 | |
| 712 | <note> |
712 | <note> |
| 713 | OpenPBS ebuild does not currently set proper permissions on var-directories |
713 | OpenPBS ebuild does not currently set proper permissions on var-directories |
| 714 | used by OpenPBS. |
714 | used by OpenPBS. |
| 715 | </note> |
715 | </note> |
| 716 | |
716 | |
| 717 | <p> |
717 | <p> |
| 718 | Before you start using OpenPBS, some configuration is required. The files |
718 | Before you start using OpenPBS, some configuration is required. The files |
| 719 | you will need to personalize for your system are: |
719 | you will need to personalize for your system are: |
| 720 | </p> |
720 | </p> |
| 721 | |
721 | |
| 722 | <ul> |
722 | <ul> |
| 723 | <li>/etc/pbs_environment</li> |
723 | <li>/etc/pbs_environment</li> |
| 724 | <li>/var/spool/PBS/server_name</li> |
724 | <li>/var/spool/PBS/server_name</li> |
| 725 | <li>/var/spool/PBS/server_priv/nodes</li> |
725 | <li>/var/spool/PBS/server_priv/nodes</li> |
| 726 | <li>/var/spool/PBS/mom_priv/config</li> |
726 | <li>/var/spool/PBS/mom_priv/config</li> |
| 727 | <li>/var/spool/PBS/sched_priv/sched_config</li> |
727 | <li>/var/spool/PBS/sched_priv/sched_config</li> |
| 728 | </ul> |
728 | </ul> |
| 729 | |
729 | |
| 730 | <p> |
730 | <p> |
| 731 | Here is a sample sched_config: |
731 | Here is a sample sched_config: |
| 732 | </p> |
732 | </p> |
| 733 | |
733 | |
| 734 | <pre caption="/var/spool/PBS/sched_priv/sched_config"> |
734 | <pre caption="/var/spool/PBS/sched_priv/sched_config"> |
| 735 | # |
735 | # |
| 736 | # Create queues and set their attributes. |
736 | # Create queues and set their attributes. |
| 737 | # |
737 | # |
| 738 | # |
738 | # |
| 739 | # Create and define queue upto4nodes |
739 | # Create and define queue upto4nodes |
| 740 | # |
740 | # |
| 741 | create queue upto4nodes |
741 | create queue upto4nodes |
| 742 | set queue upto4nodes queue_type = Execution |
742 | set queue upto4nodes queue_type = Execution |
| … | |
… | |
| 758 | # |
758 | # |
| 759 | set server scheduling = True |
759 | set server scheduling = True |
| 760 | set server acl_host_enable = True |
760 | set server acl_host_enable = True |
| 761 | set server default_queue = default |
761 | set server default_queue = default |
| 762 | set server log_events = 511 |
762 | set server log_events = 511 |
| 763 | set server mail_from = adm |
763 | set server mail_from = adm |
| 764 | set server query_other_jobs = True |
764 | set server query_other_jobs = True |
| 765 | set server resources_default.neednodes = 1 |
765 | set server resources_default.neednodes = 1 |
| 766 | set server resources_default.nodect = 1 |
766 | set server resources_default.nodect = 1 |
| 767 | set server resources_default.nodes = 1 |
767 | set server resources_default.nodes = 1 |
| 768 | set server scheduler_iteration = 60 |
768 | set server scheduler_iteration = 60 |
| 769 | </pre> |
769 | </pre> |
| 770 | |
770 | |
| 771 | <p> |
771 | <p> |
| 772 | To submit a task to OpenPBS, the command <c>qsub</c> is used with some |
772 | To submit a task to OpenPBS, the command <c>qsub</c> is used with some |
| 773 | optional parameters. In the exemple below, "-l" allows you to specify |
773 | optional parameters. In the example below, "-l" allows you to specify |
| 774 | the resources required, "-j" provides for redirection of standard out and |
774 | the resources required, "-j" provides for redirection of standard out and |
| 775 | standard error, and the "-m" will e-mail the user at begining (b), end (e) |
775 | standard error, and the "-m" will e-mail the user at beginning (b), end (e) |
| 776 | and on abort (a) of the job. |
776 | and on abort (a) of the job. |
| 777 | </p> |
777 | </p> |
| 778 | |
778 | |
| 779 | <pre caption="Submitting a task"> |
779 | <pre caption="Submitting a task"> |
| 780 | <comment>(submit and request from OpenPBS that myscript be executed on 2 nodes)</comment> |
780 | <comment>(submit and request from OpenPBS that myscript be executed on 2 nodes)</comment> |
| 781 | # <i>qsub -l nodes=2 -j oe -m abe myscript</i> |
781 | # <i>qsub -l nodes=2 -j oe -m abe myscript</i> |
| 782 | </pre> |
782 | </pre> |
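
When jobs are submitted from a wrapper script, the qsub invocation above can be assembled programmatically. The following sketch builds the argument list using only the options described in this section ("-l", "-j" and "-m"). The function name and its parameters are hypothetical, and it only constructs the command without contacting a PBS server.

```python
def qsub_command(script, nodes=1, join_streams=True, mail_events="abe"):
    """Build the argument list for a qsub call like the one shown above.

    Hypothetical helper: join_streams adds "-j oe" and mail_events is
    passed to "-m" (a = abort, b = beginning, e = end).
    """
    args = ["qsub", "-l", f"nodes={nodes}"]
    if join_streams:
        args += ["-j", "oe"]
    if mail_events:
        args += ["-m", mail_events]
    args.append(script)
    return args


if __name__ == "__main__":
    # Prints the command; pass the list to subprocess.run() to submit it
    print(" ".join(qsub_command("myscript", nodes=2)))
```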
| 783 | |
783 | |
| 784 | <p> |
784 | <p> |
| 785 | Normally jobs submitted to OpenPBS are in the form of scripts. Sometimes, you |
785 | Normally jobs submitted to OpenPBS are in the form of scripts. Sometimes, you |
| 786 | may want to try a task manually. To request an interactive shell from OpenPBS, |
786 | may want to try a task manually. To request an interactive shell from OpenPBS, |
| 787 | use the "-I" parameter. |
787 | use the "-I" parameter. |
| 788 | </p> |
788 | </p> |
| 789 | |
789 | |
| 790 | <pre caption="Requesting an interactive shell"> |
790 | <pre caption="Requesting an interactive shell"> |
| … | |
… | |
| 821 | installed, while <e>crypt</e> will configure MPICH to use <c>ssh</c> instead |
821 | installed, while <e>crypt</e> will configure MPICH to use <c>ssh</c> instead |
| 822 | of <c>rsh</c>. |
822 | of <c>rsh</c>. |
| 823 | </p> |
823 | </p> |
| 824 | |
824 | |
| 825 | <pre caption="Installing the mpich application"> |
825 | <pre caption="Installing the mpich application"> |
| 826 | # <i>emerge -p mpich</i> |
826 | # <i>emerge -p mpich</i> |
| 827 | # <i>emerge mpich</i> |
827 | # <i>emerge mpich</i> |
| 828 | </pre> |
828 | </pre> |
| 829 | |
829 | |
| 830 | <p> |
830 | <p> |
| 831 | You may need to export a mpich work directory to all your slave nodes in |
831 | You may need to export a mpich work directory to all your slave nodes in |
| 832 | <path>/etc/exports</path>: |
832 | <path>/etc/exports</path>: |
| 833 | </p> |
833 | </p> |
| 834 | |
834 | |
| 835 | <pre caption="/etc/exports"> |
835 | <pre caption="/etc/exports"> |
| 836 | /home *(rw) |
836 | /home *(rw) |
| 837 | </pre> |
837 | </pre> |
| 838 | |
838 | |
| 839 | <p> |
839 | <p> |
| 840 | Most massively parallel processors (MPPs) provide a way to start a program on |
840 | Most massively parallel processors (MPPs) provide a way to start a program on |
| 841 | a requested number of processors; <c>mpirun</c> makes use of the appropriate |
841 | a requested number of processors; <c>mpirun</c> makes use of the appropriate |
| 842 | command whenever possible. In contrast, workstation clusters require that each |
842 | command whenever possible. In contrast, workstation clusters require that each |
| 843 | process in a parallel job be started individually, though programs to help |
843 | process in a parallel job be started individually, though programs to help |
| 844 | start these processes exist. Because workstation clusters are not already |
844 | start these processes exist. Because workstation clusters are not already |
| 845 | organized as an MPP, additional information is required to make use of them. |
845 | organized as an MPP, additional information is required to make use of them. |
| 846 | Mpich should be installed with a list of participating workstations in the |
846 | Mpich should be installed with a list of participating workstations in the |
| 847 | file <path>machines.LINUX</path> in the directory |
847 | file <path>machines.LINUX</path> in the directory |
| 848 | <path>/usr/share/mpich/</path>. This file is used by <c>mpirun</c> to choose |
848 | <path>/usr/share/mpich/</path>. This file is used by <c>mpirun</c> to choose |
| 849 | processors to run on. |
849 | processors to run on. |
| 850 | </p> |
850 | </p> |
| 851 | |
851 | |
| … | |
… | |
| 905 | <pre caption="Output of the above command"> |
905 | <pre caption="Output of the above command"> |
| 906 | Trying true on host1.uoffoo.edu ... |
906 | Trying true on host1.uoffoo.edu ... |
| 907 | Trying true on host2.uoffoo.edu ... |
907 | Trying true on host2.uoffoo.edu ... |
| 908 | Trying ls on host1.uoffoo.edu ... |
908 | Trying ls on host1.uoffoo.edu ... |
| 909 | Trying ls on host2.uoffoo.edu ... |
909 | Trying ls on host2.uoffoo.edu ... |
| 910 | Trying user program on host1.uoffoo.edu ... |
910 | Trying user program on host1.uoffoo.edu ... |
| 911 | Trying user program on host2.uoffoo.edu ... |
911 | Trying user program on host2.uoffoo.edu ... |
| 912 | </pre> |
912 | </pre> |
| 913 | |
913 | |
| 914 | <p> |
914 | <p> |
| 915 | If <c>tstmachines</c> finds a problem, it will suggest possible reasons and |
915 | If <c>tstmachines</c> finds a problem, it will suggest possible reasons and |
| 916 | solutions. In brief, there are three tests: |
916 | solutions. In brief, there are three tests: |
| 917 | </p> |
917 | </p> |
| 918 | |
918 | |
| 919 | <ul> |
919 | <ul> |
| 920 | <li> |
920 | <li> |
| 921 | <e>Can processes be started on remote machines?</e> tstmachines attempts |
921 | <e>Can processes be started on remote machines?</e> tstmachines attempts |
| 922 | to run the shell command true on each machine in the machines files by |
922 | to run the shell command true on each machine in the machines files by |
| 923 | using the remote shell command. |
923 | using the remote shell command. |
| 924 | </li> |
924 | </li> |
| 925 | <li> |
925 | <li> |
| 926 | <e>Is current working directory available to all machines?</e> This |
926 | <e>Is current working directory available to all machines?</e> This |
| 927 | attempts to ls a file that tstmachines creates by running ls using the |
927 | attempts to ls a file that tstmachines creates by running ls using the |
| 928 | remote shell command. |
928 | remote shell command. |
| 929 | </li> |
929 | </li> |
| 930 | <li> |
930 | <li> |
| 931 | <e>Can user programs be run on remote systems?</e> This checks that shared |
931 | <e>Can user programs be run on remote systems?</e> This checks that shared |
| 932 | libraries and other components have been properly installed on all |
932 | libraries and other components have been properly installed on all |
| 933 | machines. |
933 | machines. |
| 934 | </li> |
934 | </li> |
| 935 | </ul> |
935 | </ul> |
| 936 | |
936 | |
| 937 | <p> |
937 | <p> |
| 938 | And the required test for every development tool: |
938 | And the required test for every development tool: |
| 939 | </p> |
939 | </p> |
| 940 | |
940 | |
| 941 | <pre caption="Testing a development tool"> |
941 | <pre caption="Testing a development tool"> |
| 942 | # <i>cd ~</i> |
942 | # <i>cd ~</i> |
| 943 | # <i>cp /usr/share/mpich/examples1/hello++.c ~</i> |
943 | # <i>cp /usr/share/mpich/examples1/hello++.c ~</i> |
| 944 | # <i>make hello++</i> |
944 | # <i>make hello++</i> |
| 945 | # <i>mpirun -machinefile /usr/share/mpich/machines.LINUX -np 1 hello++</i> |
945 | # <i>mpirun -machinefile /usr/share/mpich/machines.LINUX -np 1 hello++</i> |
| … | |
… | |
| 976 | |
976 | |
| 977 | <chapter> |
977 | <chapter> |
| 978 | <title>Bibliography</title> |
978 | <title>Bibliography</title> |
| 979 | <section> |
979 | <section> |
| 980 | <body> |
980 | <body> |
| 981 | |
981 | |
| 982 | <p> |
982 | <p> |
| 983 | The original document is published at the <uri |
983 | The original document is published at the <uri |
| 984 | link="http://www.adelielinux.com">Adelie Linux R&amp;D Centre</uri> web site, |
984 | link="http://www.adelielinux.com">Adelie Linux R&amp;D Centre</uri> web site, |
| 985 | and is reproduced here with the permission of the authors and <uri |
985 | and is reproduced here with the permission of the authors and <uri |
| 986 | link="http://www.cyberlogic.ca">Cyberlogic</uri>'s Adelie Linux R&amp;D |
986 | link="http://www.cyberlogic.ca">Cyberlogic</uri>'s Adelie Linux R&amp;D |
| 987 | Centre. |
987 | Centre. |
| 988 | </p> |
988 | </p> |
| 989 | |
989 | |
| 990 | <ul> |
990 | <ul> |
| 991 | <li><uri>http://www.gentoo.org</uri>, Gentoo Technologies, Inc.</li> |
991 | <li><uri>http://www.gentoo.org</uri>, Gentoo Technologies, Inc.</li> |
| 992 | <li> |
992 | <li> |
| 993 | <uri link="http://www.adelielinux.com">http://www.adelielinux.com</uri>, |
993 | <uri link="http://www.adelielinux.com">http://www.adelielinux.com</uri>, |
| 994 | Adelie Linux Research and Development Centre |
994 | Adelie Linux Research and Development Centre |
| 995 | </li> |
995 | </li> |
| 996 | <li> |
996 | <li> |
| 997 | <uri link="http://nfs.sourceforge.net/">http://nfs.sourceforge.net</uri>, |
997 | <uri link="http://nfs.sourceforge.net/">http://nfs.sourceforge.net</uri>, |
| 998 | Linux NFS Project |
998 | Linux NFS Project |
| 999 | </li> |
999 | </li> |
| 1000 | <li> |
1000 | <li> |
| 1001 | <uri link="http://www-unix.mcs.anl.gov/mpi/mpich/">http://www-unix.mcs.anl.gov/mpi/mpich/</uri>, |
1001 | <uri link="http://www-unix.mcs.anl.gov/mpi/mpich/">http://www-unix.mcs.anl.gov/mpi/mpich/</uri>, |
| 1002 | Mathematics and Computer Science Division, Argonne National Laboratory |
1002 | Mathematics and Computer Science Division, Argonne National Laboratory |
| 1003 | </li> |
1003 | </li> |
| 1004 | <li> |
1004 | <li> |
| 1005 | <uri link="http://www.ntp.org/">http://ntp.org</uri> |
1005 | <uri link="http://www.ntp.org/">http://ntp.org</uri> |
| 1006 | </li> |
1006 | </li> |
| 1007 | <li> |
1007 | <li> |
| 1008 | <uri link="http://www.eecis.udel.edu/~mills/">http://www.eecis.udel.edu/~mills/</uri>, |
1008 | <uri link="http://www.eecis.udel.edu/~mills/">http://www.eecis.udel.edu/~mills/</uri>, |
| 1009 | David L. Mills, University of Delaware |
1009 | David L. Mills, University of Delaware |
| 1010 | </li> |
1010 | </li> |
| 1011 | <li> |
1011 | <li> |
| 1012 | <uri link="http://www.ietf.org/html.charters/secsh-charter.html">http://www.ietf.org/html.charters/secsh-charter.html</uri>, |
1012 | <uri link="http://www.ietf.org/html.charters/secsh-charter.html">http://www.ietf.org/html.charters/secsh-charter.html</uri>, |
| 1013 | Secure Shell Working Group, IETF, Internet Society |
1013 | Secure Shell Working Group, IETF, Internet Society |
| 1014 | </li> |
1014 | </li> |
| 1015 | <li> |
1015 | <li> |
| 1016 | <uri link="http://www.linuxsecurity.com/">http://www.linuxsecurity.com/</uri>, |
1016 | <uri link="http://www.linuxsecurity.com/">http://www.linuxsecurity.com/</uri>, |
| 1017 | Guardian Digital |
1017 | Guardian Digital |
| 1018 | </li> |
1018 | </li> |
| 1019 | <li> |
1019 | <li> |
| 1020 | <uri link="http://www.openpbs.org/">http://www.openpbs.org/</uri>, |
1020 | <uri link="http://www.openpbs.org/">http://www.openpbs.org/</uri>, |
| 1021 | Altair Grid Technologies, LLC. |
1021 | Altair Grid Technologies, LLC. |
| 1022 | </li> |
1022 | </li> |
| 1023 | </ul> |
1023 | </ul> |
| 1024 | |
1024 | |
| 1025 | </body> |
1025 | </body> |
| 1026 | </section> |
1026 | </section> |
| 1027 | </chapter> |
1027 | </chapter> |
| 1028 | |
1028 | |
| 1029 | </guide> |
1029 | </guide> |