Contents of /xml/htdocs/doc/en/multipath.xml


Revision 1.1
Wed Sep 10 23:14:34 2008 UTC by nightmorph
Branch: MAIN
File MIME type: application/xml
Added new multipath guide from bug 233527. thanks to tsunam and all the guys from liquidustech who assembled the thing. and to rane for some xml cleanups. added to the sysadmin_specific documentation category.

<?xml version='1.0' encoding="UTF-8"?>
<!-- $Header: $ -->
<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">

<guide>
<title>Multipathing for Gentoo</title>

<author title="Author">
  <mail link="tsunam"/>
</author>
<author title="Author">
  <mail link="matthew.summers@liquidustech.com">Matthew Summers</mail>
</author>
<author title="Author">
  <mail link="richard.anderson@liquidustech.com">Richard Anderson</mail>
</author>
<author title="Author">
  <mail link="steve.rucker@liquidustech.com">Steve Rucker</mail>
</author>
<author title="Editor">
  <mail link="nightmorph"/>
</author>

<abstract>
This document teaches you how to set up multipathing services for data storage.
</abstract>

<!-- The content of this document is licensed under the CC-BY-SA license -->
<!-- See http://creativecommons.org/licenses/by-sa/2.5 -->
<license/>

<version>1</version>
<date>2008-09-10</date>

<chapter>
<title>Introduction</title>
<section>
<body>

<p>
Multipathing services, generally deployed in enterprise environments, provide
high-performance, load-balanced, and fault-tolerant data storage, either locally
or via a storage area network (SAN). Multipathing allows a single storage device
to be accessed transparently across one or more paths. For example, if a server
Host Bus Adapter (HBA) has two connections, one to each of two Fibre Channel
switches, and both switches connect to a SAN, then when the HBA module loads and
scans the bus, it will see four paths to the SAN: every route from the HBA,
through either switch, to a port on the storage device counts as a separate
path. Multipath takes advantage of this by letting you use each path
simultaneously or independently, ensuring a constant and reliable connection to
the stored data. Because of this redundancy, Multipath provides failover for
every connection point: if one path is lost, critical data remains available
through the others.
</p>

<p>
In the most basic sense, multipathing is made of two distinct parts:
<c>device-mapper</c> and <c>multipath-tools</c>. <b>Device Mapper</b> is the
first key element. Administrators are probably familiar with Device Mapper from
LVM, EVMS, or dm-crypt; multipathing is simply another user of it. In short,
Device Mapper works in kernel space: it takes a block device such as
<path>/dev/sda</path> (all SAN-based targets appear as some type of SCSI device)
and maps it to another device.
</p>

<p>
On a lower level, Device Mapper creates a virtual block device that accepts all
of the commands of a regular block device but passes the actual data on to the
real block device. As stated above, the mapping is handled entirely in kernel
space, not in user space.
</p>

<p>
<b>Multipath Tools</b> is a set of userspace tools that interacts with Device
Mapper and creates structures for device handling, implementing I/O multipathing
at the OS level. In a typical SAN environment, you will have multiple paths to
the same storage device: one or two Fibre Channel cards in your server connect
to a switch, which in turn connects to the storage itself (as in the scenario
discussed above). Administrators could therefore see the same device up to four
times (each card sees the LUN twice, once for each path available to it), so a
single drive could be recognized as <path>sda</path>, <path>sdb</path>,
<path>sdc</path>, and <path>sdd</path>. If you were to mount
<path>/dev/sda</path> on <path>/san1</path>, for instance, all I/O would travel
over a single path: from one Fibre Channel card, through a switch, to a port on
the storage device. If any of those points failed, you would suddenly lose your
storage device and have to unmount it and remount it using another device, such
as <path>sdb</path>.
</p>

<p>
This scenario is not ideal, because you are only using one of the four possible
paths. This is where the combination of Multipath Tools and Device Mapper helps.
As already explained, Device Mapper creates virtual block devices and passes the
data on to the real block devices. Multipath Tools recognizes that
<path>sda</path> through <path>sdd</path> are the same LUN and has Device Mapper
present them as a single virtual device, which sends I/O over whichever paths
are available.
</p>
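
<p>
Once multipathing is configured, you can inspect these virtual devices with
<c>dmsetup</c>, the Device Mapper command-line tool. The commands below are a
sketch: <c>EVA_SAN</c> is the example alias used later in this guide, so
substitute the name of one of your own maps.
</p>

<pre caption="Inspecting multipath maps with dmsetup">
<comment>(List the Device Mapper devices that use the multipath target)</comment>
# <i>dmsetup ls --target multipath</i>
<comment>(Show how one map is built from its paths, listed by major:minor number)</comment>
# <i>dmsetup table EVA_SAN</i>
</pre>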

</body>
</section>
</chapter>

<chapter>
<title>Installation and Configuration</title>
<section>
<title>Installation and Tools</title>
<body>

<p>
You need to emerge <c>multipath-tools</c> and <c>sg3_utils</c>. For each disk,
you want to find its <c>wwid</c> (World Wide Identifier). You can use
<c>sg_vpd</c> (provided by <c>sg3_utils</c>) to do this.
</p>

<pre caption="Installing multipath-tools and initial configuration">
# <i>emerge multipath-tools sg3_utils</i>
<comment>(Replace /dev/DEVICE with your disk to find its wwid)</comment>
# <i>/usr/bin/sg_vpd --page=di /dev/DEVICE</i>
</pre>

<p>
DEVICE is the sd device, such as <path>sda</path>. The ID that comes back will
start with <c>0x6</c>. Replace the leading <c>0x</c> with <c>3</c>, and you
will have the proper ID to put into the multipath <c>wwid</c> setting in
<path>/etc/multipath.conf</path>. More on this in the next chapter.
</p>
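
<p>
The conversion can be scripted. The following is a minimal sketch, not part of
<c>multipath-tools</c>: the function names <c>naa_to_wwid</c> and
<c>wwid_of</c> are hypothetical, and <c>wwid_of</c> assumes that
<c>sg_vpd --page=di</c> prints the identifier as a single hexadecimal token
beginning with <c>0x6</c>.
</p>

<pre caption="Sketch: turning the sg_vpd ID into a wwid">
#!/bin/sh
# Replace the leading "0x" of the ID reported by sg_vpd with "3"
naa_to_wwid() {
  printf '3%s\n' "${1#0x}"
}

# Hypothetical helper: query a disk and print its wwid (assumes the
# 0x6... identifier appears as one token in the sg_vpd output)
wwid_of() {
  naa=$(sg_vpd --page=di "$1" | grep -o '0x6[0-9a-fA-F]*' | head -n 1)
  [ -n "$naa" ] || return 1
  naa_to_wwid "$naa"
}

# Prints 3600508b4001044ee00013000031e0000 (the EVA_SAN wwid used later)
naa_to_wwid 0x600508b4001044ee00013000031e0000
</pre>

<p>
To look up a real disk, run <c>wwid_of /dev/sda</c> after defining the
functions.
</p>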

</body>
</section>
<section>
<title>Configuring Gentoo for multipathing</title>
<body>

<p>
To configure Gentoo for multipathing, your kernel needs the following settings:
</p>

<pre caption="Adding multipath support">
Device Drivers --->
  SCSI device support --->
    &lt;*&gt; SCSI target support
    &lt;*&gt; SCSI disk support
    [*] Probe all LUNs on each SCSI device
  [*] Multiple devices driver support (RAID and LVM) --->
    &lt;*&gt; Multipath I/O support
    &lt;*&gt; Device mapper support
    &lt;*&gt;   Multipath target
    <comment>(Select your device from the list)</comment>
    &lt;*&gt;     EMC CX/AX multipath support
    &lt;*&gt;     LSI/Engenio RDAC multipath support
    &lt;*&gt;     HP MSA multipath support
</pre>

<note>
<c>scsi_id</c> is determined by the targets. It works much like IDE drives,
where each cable has two spots to which you can connect a drive, and an
administrator sets one drive as master and the other as slave, or sets them to
autoselect, by changing the jumpers. Each drive or Logical Unit Number (LUN) has
a unique ID, which ranges from 0 to 254. A device with ID 0 will be discovered
before a device with, for example, ID 120, because the LIP (a scan of the SCSI
bus for devices that respond) starts from 0 and works its way upwards.
</note>

<p>
In the kernel configuration, make sure <c>CONFIG_SCSI_MULTI_LUN=y</c> is set so
that the SCSI subsystem probes all Logical Unit Numbers (LUNs). This is
recommended because otherwise scanning stops at the first gap: if you have
devices on IDs <c>0</c> and <c>2</c> but not on <c>1</c>, you will get the
device for ID <c>0</c> but not the one for ID <c>2</c>. Also enable the driver
for your SCSI hardware, such as a QLogic 2400 card, which is in the SCSI
low-level drivers area.
</p>
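
<p>
Before rebuilding, you can check an existing kernel configuration for the core
options. The function below is a hypothetical helper, not a Gentoo tool; it
only looks for <c>CONFIG_BLK_DEV_SD</c> (SCSI disk support),
<c>CONFIG_SCSI_MULTI_LUN</c> (probe all LUNs), <c>CONFIG_BLK_DEV_DM</c> (Device
Mapper support), and <c>CONFIG_DM_MULTIPATH</c> (multipath target), so check
the driver for your HBA and storage hardware yourself:
</p>

<pre caption="Sketch: checking a kernel .config for multipath options">
#!/bin/sh
# Usage: check_multipath_kconfig /usr/src/linux/.config
# Prints one line per option and returns 1 if any option is missing.
check_multipath_kconfig() {
  config=$1
  missing=0
  for sym in CONFIG_BLK_DEV_SD CONFIG_SCSI_MULTI_LUN CONFIG_BLK_DEV_DM CONFIG_DM_MULTIPATH; do
    # Built in (=y) or as a module (=m) both count as enabled here
    if grep -Eq "^${sym}=[ym]" "$config" 2>/dev/null; then
      echo "ok      $sym"
    else
      echo "missing $sym"
      missing=1
    fi
  done
  return "$missing"
}
</pre>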

<p>
For a better understanding, consider the following scenarios.
</p>

<p>
There are three drives with IDs 0, 1, and 2. Without the "probe all LUNs"
setting, you will see IDs 0, 1, and 2 as <path>sda</path>, <path>sdb</path>,
and <path>sdc</path>; all devices are seen. Now delete the ID 1 drive. You
might expect IDs 0 and 2 to still be seen, as <path>sda</path> and
<path>sdb</path> (the ID 2 drive moving to <path>sdb</path> because nothing
else fills that name). However, if you don't probe all LUNs, the scan behaves as
follows:
</p>

<p>
Scenario 1: Without "probe all LUNs", the scan starts and ID 0 is seen. ID 0 is
assigned <path>sda</path>, and the scan moves on to find ID 1. If ID 1 is not
detected, scanning stops and is considered complete, as if all devices had been
scanned, even if there is a device on ID 2 or any subsequent ID. Reboot for
scenario 2.
</p>

<p>
Scenario 2: With "probe all LUNs", the scan starts and detects ID 0. This ID is
assigned <path>sda</path>, and the scan continues to the next device. If ID 1
is not detected, scanning continues to look for more devices. ID 2 is located
and assigned <path>sdb</path>. If no devices (IDs) are detected beyond that,
scanning is considered complete.
</p>

<note>
Although it may seem unlikely, or even unnecessary, to have devices spaced many
LUNs apart, you should still probe all LUNs to account for every possibility.
An administrator can have many reasons (business or personal) for such a setup.
The second scenario is therefore the one to aim for, ensuring that all devices
are recognized and assigned an ID during the multipath setup process.
</note>

<p>
So, once you probe all LUNs, all devices will be recognized and assigned an ID
in Multipath.
</p>
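
<p>
On kernels of this era, <c>CONFIG_SCSI_MULTI_LUN</c> sets the default value of
the <c>scsi_mod</c> parameter <c>max_luns</c>, which you can inspect at
runtime. After adding LUNs on the storage side, you can also ask an HBA to
rescan its bus instead of rebooting. A sketch; <c>host0</c> is an assumption,
so pick the <path>/sys/class/scsi_host</path> entry that belongs to your HBA:
</p>

<pre caption="Checking LUN probing and rescanning a SCSI host">
<comment>(Maximum number of LUNs probed per target; 1 means only LUN 0)</comment>
# <i>cat /sys/module/scsi_mod/parameters/max_luns</i>
<comment>(Rescan all channels, targets, and LUNs on host0)</comment>
# <i>echo "- - -" &gt; /sys/class/scsi_host/host0/scan</i>
</pre>

<p>
The value can also be set at boot with the <c>scsi_mod.max_luns=</c> kernel
parameter.
</p>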

</body>
</section>
</chapter>

<chapter>
<title>Architectural Overview</title>
<section>
<body>

<p>
Multipath Tools organizes the devices mentioned earlier into priority groups.
After you have configured <c>multipath-tools</c> and started it with
<c>/etc/init.d/multipath start</c>, you can list the groups with
<c>multipath -l</c>. The output will look like the following:
</p>

<pre caption="multipath -l output">
EVA_SAN (3600508b4001044ee00013000031e0000)
[size=300 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:1 sda 8:0 [active]
\_ round-robin 0 [enabled]
 \_ 0:0:1:1 sdb 8:16 [active]

EVA_SAN2 (3600508b4001044ee0001300003880000)
[size=300 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:2 sdc 8:32 [active]
\_ round-robin 0 [enabled]
 \_ 0:0:1:2 sdd 8:48 [active]
</pre>

<p>
By default, multipath uses the first priority group (for EVA_SAN2, for
instance, the top round-robin group, which contains <path>sdc</path>). When a
group contains more than one path, the round-robin selector alternates I/O
between them; if one of those paths fails, all I/O is pushed to the remaining
paths in the group and continues. Only when every path in the active group has
failed does multipath fail over to the next priority group.
</p>
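
<p>
To check at a glance how many paths each map has, you could filter this output.
The function below is a hypothetical helper, not part of
<c>multipath-tools</c>, and it assumes the layout shown above: a map line of
the form <c>NAME (wwid)</c> followed by path lines containing the sd device and
its major:minor number. Other versions of <c>multipath-tools</c> format their
output differently, so adjust the patterns if needed.
</p>

<pre caption="Sketch: counting the paths in each multipath map">
#!/bin/sh
# Usage: multipath -l | count_paths
count_paths() {
  awk '
    # A map line such as "EVA_SAN (3600508b...)" starts a new map
    /^[A-Za-z0-9_-]+ [(]/ { name = $1; order[++n] = name; next }
    # A path line such as " \_ 0:0:0:1 sda 8:0 [active]" belongs to the current map
    / sd[a-z]+ [0-9]+:[0-9]+/ { paths[name]++ }
    # Print the maps in the order in which they appeared
    END { for (i = 1; n >= i; i++) print order[i], paths[order[i]] + 0 }
  '
}
</pre>

<p>
For the sample output above, <c>count_paths</c> prints <c>EVA_SAN 2</c> and
<c>EVA_SAN2 2</c>.
</p>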

</body>
</section>
<section>
<title>Typical Configuration</title>
<body>

<p>
A typical Multipath configuration looks like the following:
</p>

<pre caption="A typical /etc/multipath.conf file">
defaults {
        udev_dir                /dev
        polling_interval        15
        selector                "round-robin 0"
        path_grouping_policy    group_by_prio
        failback                5
        path_checker            tur
        prio_callout            "/sbin/mpath_prio_tpc /dev/%n"
        rr_min_io               100
        rr_weight               uniform
        no_path_retry           queue
        user_friendly_names     yes
}
blacklist {
        devnode cciss
        devnode fd
        devnode hd
        devnode md
        devnode sr
        devnode scd
        devnode st
        devnode ram
        devnode raw
        devnode loop
        devnode sda
}

multipaths {
        multipath {
                <comment>(Replace WWID with your wwid. To find it, use /usr/bin/sg_vpd --page=di /dev/DEVICE.
                The ID will start with 0x6. Remove the 0x and replace it with 3.)</comment>
                wwid    WWID
                alias   DB_SAN
        }
}
devices {
        device {
                <comment>(White spacing is important in these two strings; they must match the vendor specifications.)</comment>
                vendor  "IBM "
                product "1815 FAStT "
        }
}
</pre>

<impo>
For your devices, it is best to <c>cat</c>
<path>/sys/block/sd(device)/device/model</path> and <c>cat</c>
<path>/sys/block/sd(device)/device/vendor</path>, and copy both strings exactly
into the devices section of <path>/etc/multipath.conf</path> (the model string
goes into <c>product</c> and the vendor string into <c>vendor</c>). You might
not always see the white spacing, but in this case it is part of the name. One
reason for the devices section is that not every vendor's string follows the
kernel's naming conventions, so the device is not always detected as required.
</impo>
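
<p>
Because trailing spaces are easy to lose when copying by hand, you may prefer
to generate the stanza. This is a sketch, not a <c>multipath-tools</c> utility;
the function name <c>print_device_stanza</c> is hypothetical, and the loop
assumes your SAN disks appear as <path>/sys/block/sd*</path>:
</p>

<pre caption="Sketch: printing a device stanza from sysfs">
#!/bin/sh
# Print a multipath.conf device stanza for one SCSI device directory,
# for example /sys/block/sda/device. $(cat ...) drops the trailing newline
# but keeps trailing spaces, which are part of the vendor and model names.
print_device_stanza() {
  vendor=$(cat "$1/vendor")
  model=$(cat "$1/model")
  printf 'device {\n\tvendor  "%s"\n\tproduct "%s"\n}\n' "$vendor" "$model"
}

for dev in /sys/block/sd*/device; do
  # Skip entries without a readable vendor file (or an unmatched glob)
  [ -r "$dev/vendor" ] || continue
  print_device_stanza "$dev"
done
</pre>

<p>
Review the output and paste only the stanzas you need inside the
<c>devices</c> section.
</p>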

<p>
For an HP EVA SAN, whose device information is already part of the kernel's SAN
hardware detection, a typical configuration needs no devices section and looks
like this:
</p>

<pre caption="EVA_SAN configuration">
multipaths {
        multipath {
                wwid    3600508b4001044ee00013000031e0000
                alias   EVA_SAN
        }
        multipath {
                wwid    3600508b4001044ee0001300003880000
                alias   EVA_SAN2
        }
}
</pre>
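
<p>
Once the maps exist, use the multipath device rather than an individual path
such as <path>/dev/sda</path>; that is what avoids the single-path problem
described in the introduction. A sketch, assuming the <c>EVA_SAN</c> alias
above already holds a filesystem and that the <path>/san1</path> mount point
exists:
</p>

<pre caption="Using the multipath device">
<comment>(Aliased multipath devices appear under /dev/mapper)</comment>
# <i>ls /dev/mapper</i>
# <i>mount /dev/mapper/EVA_SAN /san1</i>
</pre>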

</body>
</section>
</chapter>

<chapter>
<title>Setting Up Your Own Configuration</title>
<section>
<body>

<p>
The multipath configuration is fairly simple to accomplish because the only file
that needs modification is <path>/etc/multipath.conf</path>.
</p>

<p>
To begin, set <b>polling_interval</b> to how often (in seconds) path checks
will be performed to ensure that each path is alive and healthy.
</p>

<p>
<b>selector</b> should be set to <c>"round-robin 0"</c>.
</p>

<note>
This round-robin value is the only selector value used in this configuration.
</note>

<p>
<b>prio_callout</b> can be quite important. There are a number of different
priority callouts for different devices, such as:
</p>

<ul>
  <li>mpath_prio_alua</li>
  <li>mpath_prio_emc</li>
  <li>mpath_prio_hds_modular</li>
  <li>mpath_prio_netapp</li>
  <li>mpath_prio_tpc</li>
</ul>

<note>
For most people, <c>mpath_prio_tpc</c> will suffice, as it is a conservative
checker. Others, like <c>mpath_prio_netapp</c>, have special functionality for
priority grouping on specific hardware, such as NetApp storage.
</note>

<p>
<b>path_grouping_policy</b> has a few different options: <c>failover</c>,
<c>multibus</c>, and <c>group_by_prio</c>. <c>failover</c> puts only one path
in each priority group. <c>multibus</c> puts all paths into one priority group.
<c>group_by_prio</c> groups paths by a "priority value": paths with the same
priority value are grouped together, and the priority values are determined by
the callout.
</p>
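
<p>
The grouping rule of <c>group_by_prio</c> can be illustrated with a toy model.
This is not how <c>multipath-tools</c> is implemented; the function name and the
<c>path:priority</c> input format are made up for the example, and in practice
the priorities come from the callout. Groups are listed highest priority first,
since that is the group used first:
</p>

<pre caption="Sketch: a toy model of group_by_prio">
#!/bin/sh
# Arguments are hypothetical "path:priority" pairs, e.g. sda:50
group_by_prio() {
  printf '%s\n' "$@" |
    awk -F: '{ members[$2] = members[$2] " " $1 }
             END { for (p in members) print p ":" members[p] }' |
    sort -t: -k1,1nr
}

# Paths sharing a priority value end up in the same group
group_by_prio sda:50 sdb:10 sdc:50 sdd:10
</pre>

<p>
The last command prints <c>50: sda sdc</c> followed by <c>10: sdb sdd</c>;
with <c>failover</c>, each of the four paths would instead be its own group.
</p>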

<p>
<b>no_path_retry</b> is set to <c>queue</c> because most people would rather
not have I/O fail outright. If all paths fail, for instance, I/O queues up until
a path returns, and then everything is sent. Depending on your transfers, this
can cause load issues.
</p>
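
<p>
<c>no_path_retry</c> also accepts <c>fail</c>, which fails I/O immediately when
no path is left, or a number of retries, after which queueing stops. The
retries are counted in path checks, so with the <c>polling_interval</c> of 15
seconds from the typical configuration, a value of 12 keeps I/O queued for
about 12 checks of 15 seconds each, roughly 180 seconds, before failing it. A
sketch of that setting:
</p>

<pre caption="Limiting how long I/O stays queued">
defaults {
        polling_interval        15
        <comment>(Stop queueing after 12 failed path checks, about 180 seconds)</comment>
        no_path_retry           12
}
</pre>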

<p>
<b>rr_min_io</b> is the number of I/Os to send down one path before switching
to the next path in the same group. If <path>sda</path> and <path>sdb</path>
were in the same group, an <c>rr_min_io</c> of 100 would send 100 I/Os to
<path>sda</path>, then 100 to <path>sdb</path>, bouncing back and forth. Tune
this setting for each installation to maximize performance, because the data
load and the size of transfers and requests vary from company to company. The
default is <c>1000</c>, but some may prefer a smaller number in order to switch
ports more often, when possible.
</p>
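
<p>
The alternation can be modelled in a few lines of shell. This is a toy
illustration only (the function name <c>rr_schedule</c> is hypothetical, and
real path selection happens in the kernel's round-robin path selector): it
prints which path receives each batch of I/Os.
</p>

<pre caption="Sketch: how rr_min_io splits I/O between paths">
#!/bin/sh
# rr_schedule RR_MIN_IO TOTAL_IOS PATH...
# Prints "path count" for each batch sent before switching paths.
rr_schedule() {
  rr_min_io=$1
  remaining=$2
  shift 2
  # Refuse inputs that would never finish
  [ "$#" -gt 0 ] || return 1
  [ "$rr_min_io" -gt 0 ] || return 1
  while [ "$remaining" -gt 0 ]; do
    for path in "$@"; do
      [ "$remaining" -gt 0 ] || break
      batch=$rr_min_io
      # The final batch may be smaller than rr_min_io
      [ "$remaining" -ge "$batch" ] || batch=$remaining
      printf '%s %s\n' "$path" "$batch"
      remaining=$((remaining - batch))
    done
  done
}

rr_schedule 100 450 sda sdb
</pre>

<p>
With <c>rr_min_io</c> set to 100, the 450 I/Os go to <c>sda</c>, <c>sdb</c>,
<c>sda</c>, and <c>sdb</c> in batches of 100, and the last 50 go to
<c>sda</c>.
</p>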

<p>
<b>user_friendly_names</b> makes it easier to see which device you are working
with. If you set it to <c>no</c>, a multipath device without an alias is named
after its WWID; if you set it to <c>yes</c>, it gets a short generated name
instead. Aliases from the <c>multipaths</c> section, such as EVA_SAN, are used
in either case.
</p>

</body>
</section>
</chapter>
</guide>
