Contents of /xml/htdocs/doc/en/multipath.xml

Revision 1.1
Wed Sep 10 23:14:34 2008 UTC (6 years, 3 months ago) by nightmorph
Branch: MAIN
File MIME type: application/xml
Added new multipath guide from bug 233527. thanks to tsunam and all the guys from liquidustech who assembled the thing. and to rane for some xml cleanups. added to the sysadmin_specific documentation category.

<?xml version='1.0' encoding="UTF-8"?>
<!-- $Header: $ -->
<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">
<guide>
<title>Multipathing for Gentoo</title>
<author title="Author">
  <mail link="tsunam"/>
</author>
<author title="Author">
  <mail link="matthew.summers@liquidustech.com">Matthew Summers</mail>
</author>
<author title="Author">
  <mail link="richard.anderson@liquidustech.com">Richard Anderson</mail>
</author>
<author title="Author">
  <mail link="steve.rucker@liquidustech.com">Steve Rucker</mail>
</author>
<author title="Editor">
  <mail link="nightmorph"/>
</author>
<abstract>
This document teaches you how to set up multipathing services for data storage.
</abstract>
<!-- The content of this document is licensed under the CC-BY-SA license -->
<!-- See http://creativecommons.org/licenses/by-sa/2.5 -->
<license/>
<version>1</version>
<date>2008-09-10</date>
<chapter>
<title>Introduction</title>
<section>
<body>
<p>
Multipathing services, generally deployed in enterprise environments, provide a
means for high-performance, load-balanced, and fault-tolerant data storage,
either locally or via a storage area network (SAN). Multipathing allows a
single storage device to be accessed transparently across one or more paths.
For example, if there are two connections from a server Host Bus Adapter (HBA)
to two Fibre Channel switches and then to a SAN, then when the HBA module loads
and scans the bus, it will detect four paths to the SAN, counting the links
from the server HBA to each Fibre Channel switch and from each switch to the
storage device. Taking advantage of this situation, Multipath allows you to use
each path simultaneously or independently to ensure a constant and reliable
connection to the data in storage. Multipath also provides failover for all
connection points: if one path is lost, critical data remains available thanks
to the redundancy in the design and implementation.
</p>
<p>
In the most basic sense, multipathing is made of two distinct parts:
<c>device-mapper</c> and <c>multipath-tools</c>. <b>Device Mapper</b> is the
first key element. Administrators are probably familiar with Device Mapper
from LVM, EVMS, dm-crypt, or, in this case, Multipath. In short, working in
kernel space, Device Mapper takes one block device such as
<path>/dev/sda</path> (all SAN-based targets will be some type of SCSI device)
and maps it to another device.
</p>
<p>
On a lower level, Device Mapper creates a virtual block device that accepts all
of the commands of a regular block device but passes the actual data on to the
real block device. As previously stated, the mapping is handled entirely in
kernel space, not in user space.
</p>
<p>
<b>Multipath Tools</b> is a set of userspace tools that interact with the
Device Mapper tools and create structures for device handling, implementing I/O
multipathing at the OS level. In a typical SAN environment, you will have
multiple paths to the same storage device: a fiber card (or two) on your server
that connects to a switch, which then connects to the actual storage itself (as
in the scenario discussed above). Administrators could therefore see the same
device one to four times in such a situation (each card will see the LUN twice,
once for each path available to it). Thus, a single drive could be
recognized as <path>sda</path>, <path>sdb</path>, <path>sdc</path>, and
<path>sdd</path>. If you were to mount <path>/dev/sda</path> to
<path>/san1</path>, for instance, you would be going over a single path from
one fiber card to a switch and then to a port on the storage device. If any
of those points were to fail, you would suddenly lose your storage device and
have to unmount and remount using another device (<path>sdb</path>).
</p>
<p>
Consequently, this scenario is not ideal, as you are only using one of the
four possible paths. This is where the combination of Multipath Tools and Device
Mapper is beneficial. As already explained, Device Mapper creates virtual block
devices and then passes information to the real block devices.
</p>
</body>
</section>
</chapter>
<chapter>
<title>Installation and Configuration</title>
<section>
<title>Installation and Tools</title>
<body>
<p>
You need to emerge <c>multipath-tools</c> and <c>sg3_utils</c>. You will also
need the <c>wwid</c> of the disk; you can use <c>sg_vpd</c> (provided by
<c>sg3_utils</c>) to find it.
</p>
<pre caption="Installing multipath-tools and initial configuration">
# <i>emerge multipath-tools sg3_utils</i>
<comment>(Replace /dev/DEVICE with your disk to find its wwid)</comment>
# <i>/usr/bin/sg_vpd --page=di /dev/DEVICE</i>
</pre>
<p>
Here DEVICE is the sd device. The ID will come back starting with <c>0x6</c>.
Replace <c>0x</c> with <c>3</c>, and you will have the proper ID to use as the
multipath <c>wwid</c> in <path>/etc/multipath.conf</path>. More on this in
the next chapter.
</p>
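<p>
The substitution described above can be sketched in a few lines of Python. This
is only an illustration: the function name <c>vpd_id_to_wwid</c> is made up for
this example, and the sample ID is chosen so that the result matches the EVA_SAN
<c>wwid</c> used elsewhere in this guide.
</p>

<pre caption="Sketch: turning a 0x6... ID into a multipath wwid">
```python
def vpd_id_to_wwid(vpd_id):
    """Turn an ID such as 0x6005... into the 3600... form used as a wwid.

    Hypothetical helper for illustration; it is not part of sg3_utils.
    """
    vpd_id = vpd_id.strip().lower()
    if not vpd_id.startswith("0x6"):
        raise ValueError("expected an ID starting with 0x6, got %r" % vpd_id)
    # Drop the leading "0x" and put "3" in its place.
    return "3" + vpd_id[2:]


print(vpd_id_to_wwid("0x600508b4001044ee00013000031e0000"))
```
</pre>

<p>
The printed value is what you would place after <c>wwid</c> in
<path>/etc/multipath.conf</path>.
</p>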
</body>
</section>
<section>
<title>Configuring Gentoo for multipathing</title>
<body>
<p>
To configure Gentoo for multipath, your kernel needs the following settings:
</p>
<pre caption="Adding multipath support">
Device Drivers --->
  SCSI device support --->
    &lt;*&gt; SCSI target support
    &lt;*&gt; SCSI disk support
    [*] Probe all LUNs on each SCSI device
  [*] Multiple devices driver support (RAID and LVM) --->
    &lt;*&gt; Multipath I/O support
    &lt;*&gt; Device mapper support
    &lt;*&gt;   Multipath target
    <comment>(Select your device from the list)</comment>
    &lt;*&gt;     EMC CX/AX multipath support
    &lt;*&gt;     LSI/Engenio RDAC multipath support
    &lt;*&gt;     HP MSA multipath support
</pre>
<note>
<c>scsi_id</c> is done by targets. As an analogy, IDE drives have two spots to
which you can connect. An administrator can set one drive as master and another
as slave, or set them to autoselect, by changing the dip switches.
<c>scsi_id</c> is similar. Each drive or Logical Unit Number (LUN) has a unique
ID, which ranges from 0 to 254. A device that has ID 0 will be discovered before
a device that has, for example, ID 120, because the LIP (a scan of the SCSI bus
for devices that respond) starts from 0 and works its way upwards.
</note>
<p>
In the kernel configuration, make sure CONFIG_SCSI_MULTI_LUN=y is set so that
the SCSI subsystem is able to probe all Logical Unit Numbers (LUNs). This is
recommended because otherwise scanning stops at the first gap: if you have
devices on IDs <c>0</c> and <c>2</c> but none on ID <c>1</c>, you will get your
device for ID <c>0</c> but not the one for ID <c>2</c>. Also enable the driver
for whichever SCSI hardware you need, such as a QLogic 2400 card, which is in
the SCSI low-level drivers area.
</p>
<p>
For a better understanding, consider the following scenarios:
</p>
<p>
There are three drives with IDs 0, 1, and 2. Even without the "probe all LUNs"
setting, you will see IDs 0, 1, and 2 as sda, sdb, and sdc: all devices are
seen. Now suppose you delete the ID 1 drive, so that only IDs 0 and 2 remain.
It might seem to make sense that you would now see sda and sdb (the drive that
was sdc would become sdb, as there is no other device to fill that name).
However, if you don't probe all LUNs, the scan behaves as follows:
</p>
<p>
Scenario 1: Without "probe all LUNs", the scan starts and detects ID 0. ID 0 is
assigned sda, and the scan moves on to look for ID 1. If ID 1 is not detected,
scanning stops and is considered complete, as if all devices had been scanned,
even if there is a device on ID 2 or any subsequent ID. Reboot with "probe all
LUNs" enabled for scenario two.
</p>
<p>
Scenario 2: With "probe all LUNs", the scan starts and detects ID 0. This ID is
assigned sda, and the scan continues to the next device. If ID 1 is not
detected, scanning continues to look for more devices. ID 2 is located and
assigned sdb. If no devices (IDs) are detected beyond that, scanning is
considered complete.
</p>
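<p>
The two scenarios can be modelled with a toy Python function. This is a
simplified sketch of the behaviour described above, not the kernel's actual
scanning code; <c>scan_luns</c> and its parameters are names invented for this
example, and it assumes no more than 26 devices so that single-letter names
suffice.
</p>

<pre caption="Sketch: scanning with and without probe all LUNs">
```python
def scan_luns(present_ids, probe_all, max_id=254):
    """Return {device_name: lun_id} for a simplified model of the scan.

    present_ids: set of IDs that have a device attached (hypothetical input).
    probe_all:   True models "probe all LUNs"; False stops at the first gap.
    """
    names = {}
    for lun_id in range(max_id + 1):
        if lun_id in present_ids:
            # Names are handed out in discovery order: sda, sdb, sdc, ...
            names["sd" + chr(ord("a") + len(names))] = lun_id
        elif not probe_all:
            # Scenario 1: the first missing ID ends the scan.
            break
    return names


print(scan_luns({0, 2}, probe_all=False))
print(scan_luns({0, 2}, probe_all=True))
```
</pre>

<p>
With IDs 0 and 2 present, the first call only finds the device on ID 0, while
the second call also finds the device on ID 2 and names it sdb.
</p>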
<note>
Although it may seem unfeasible or even unnecessary to have devices spaced many
LUNs apart, it is still necessary to probe all LUNs to account for all options.
An administrator will encounter many reasons (business or personal) for such a
setup. Therefore, the second scenario is optimal, as it ensures that all devices
are recognized and assigned an ID in the multipath setup process.
</note>
<p>
So, once you probe all LUNs, all devices will be recognized and assigned an ID
in Multipath.
</p>
</body>
</section>
</chapter>
<chapter>
<title>Architectural Overview</title>
<section>
<body>
<p>
Multipath Tools organizes the devices mentioned earlier into priority groups.
After you have configured <c>multipath-tools</c> and started it with
<c>/etc/init.d/multipath start</c>, you can list the groups via
<c>multipath -l</c>. The output will look like the following:
</p>
<pre caption="multipath -l output">
EVA_SAN (3600508b4001044ee00013000031e0000)
[size=300 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:1 sda 8:0  [active]
\_ round-robin 0 [enabled]
 \_ 0:0:1:1 sdb 8:16 [active]

EVA_SAN2 (3600508b4001044ee0001300003880000)
[size=300 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:2 sdc 8:32 [active]
\_ round-robin 0 [enabled]
 \_ 0:0:1:2 sdd 8:48 [active]
</pre>
<p>
By default, multipath uses the first priority group (for EVA_SAN2, for
instance, this is the top round-robin group, containing <path>sdc</path>).
Within a group, round robin rotates I/O back and forth among the group's paths.
If one path fails, all I/O is pushed to the remaining paths and continues. Only
if all the paths in a group fail does the group itself fail, at which point I/O
moves to the secondary priority group.
</p>
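<p>
As a rough illustration of this failover rule, the following Python sketch
picks the group that would carry I/O, given a set of failed paths. It is a toy
model, not how multipath-tools is implemented; <c>active_group</c> is a name
made up for this example, and the groups mirror the EVA_SAN listing shown
earlier.
</p>

<pre caption="Sketch: choosing the priority group that carries I/O">
```python
def active_group(groups, failed):
    """Return the live paths of the first group that still has one.

    groups: priority groups in order of preference, each a list of paths.
    failed: set of path names currently considered down.
    """
    for group in groups:
        alive = [path for path in group if path not in failed]
        if alive:
            # Surviving paths in this group take over all of its I/O.
            return alive
    # Every path in every group is down.
    return []


eva_san = [["sda"], ["sdb"]]
print(active_group(eva_san, failed=set()))
print(active_group(eva_san, failed={"sda"}))
```
</pre>

<p>
When both paths are down, the function returns an empty list; what happens to
I/O in that case depends on <c>no_path_retry</c>, covered later in this guide.
</p>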
</body>
</section>
<section>
<title>Typical Configuration</title>
<body>
<p>
A typical Multipath configuration looks like the following:
</p>
<pre caption="A typical /etc/multipath.conf file">
defaults {
    udev_dir                /dev
    polling_interval        15
    selector                "round-robin 0"
    path_grouping_policy    group_by_prio
    failback                5
    path_checker            tur
    prio_callout            "/sbin/mpath_prio_tpc /dev/%n"
    rr_min_io               100
    rr_weight               uniform
    no_path_retry           queue
    user_friendly_names     yes
}
blacklist {
    devnode cciss
    devnode fd
    devnode hd
    devnode md
    devnode sr
    devnode scd
    devnode st
    devnode ram
    devnode raw
    devnode loop
    devnode sda
}

multipaths {
    multipath {
        wwid
        <comment>(To find your wwid, please use /usr/bin/sg_vpd --page=di /dev/DEVICE.
        The address will start with 0x6. Remove the 0x and replace it with 3.)</comment>
        alias DB_SAN
    }
}
devices {
    device {
        <comment>(White spacing is important on these two items to match the vendor specifications.)</comment>
        vendor  "IBM "
        product "1815 FAStT "
    }
}
</pre>
<impo>
For your devices, it is best to <c>cat</c>
<path>/sys/block/sd(device)/device/model</path> and <c>cat</c>
<path>/sys/block/sd(device)/device/vendor</path>, and place both values
directly into the devices section of <path>/etc/multipath.conf</path>. You might
not always see the white spacing, but it is part of the name in this case. One
reason for the devices section is that not every vendor's string follows the
kernel's naming convention, so the string is not always detected as required.
</impo>
<p>
For an EVA_SAN, where the kernel's SAN hardware detection already knows the
device information, a typical multipath configuration would look like:
</p>
<pre caption="EVA_SAN configuration">
multipaths {
    multipath {
        wwid 3600508b4001044ee00013000031e0000
        alias EVA_SAN
    }
    multipath {
        wwid 3600508b4001044ee0001300003880000
        alias EVA_SAN2
    }
}
</pre>
</body>
</section>
</chapter>
<chapter>
<title>Setting Up Your Own Configuration</title>
<section>
<body>
<p>
The multipath configuration is fairly simple to accomplish because the only file
that needs modification is <path>/etc/multipath.conf</path>.
</p>
<p>
To begin, set <b>polling_interval</b> to how often (in seconds) path checks
will be performed to ensure that the path is alive and healthy.
</p>
<p>
<b>selector</b> will be set at <c>"round-robin 0"</c>.
</p>
<note>
This round-robin value is the only selector value that will be used in this
configuration.
</note>
<p>
<b>prio_callout</b>: This one can be quite important, and there are a number of
different priorities for different devices, such as:
</p>
<ul>
  <li>mpath_prio_alua</li>
  <li>mpath_prio_emc</li>
  <li>mpath_prio_hds_modular</li>
  <li>mpath_prio_netapp</li>
  <li>mpath_prio_tpc</li>
</ul>
<note>
For most people, <c>mpath_prio_tpc</c> will suffice as it's a conservative
checker. Other callouts, such as <c>mpath_prio_netapp</c>, provide priority
grouping functionality specific to particular hardware, in this case NetApp
devices.
</note>
<p>
<b>path_grouping_policy</b> has a few different options: <c>failover</c>,
<c>multibus</c>, and <c>group_by_prio</c>. <c>failover</c> puts only one path in
each priority group. <c>multibus</c> puts all paths into one priority group.
<c>group_by_prio</c> groups by a "priority value": paths that have the same
priority value are grouped together, with the priority values being determined
by the callout.
</p>
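<p>
To make the three policies concrete, here is a small Python sketch that groups a
list of paths the way each policy is described above. It is illustrative only:
<c>group_paths</c> is an invented name, the priority values are made up, and
ordering groups from highest to lowest priority value is an assumption of this
sketch.
</p>

<pre caption="Sketch: how each path_grouping_policy groups four paths">
```python
def group_paths(paths, policy):
    """Group (device, priority) tuples into priority groups.

    A toy model of the policies described in the text, not multipath-tools code.
    """
    if policy == "failover":
        # One path per priority group.
        return [[dev] for dev, _prio in paths]
    if policy == "multibus":
        # All paths in a single priority group.
        return [[dev for dev, _prio in paths]]
    if policy == "group_by_prio":
        groups = {}
        for dev, prio in paths:
            groups.setdefault(prio, []).append(dev)
        # Assumption: groups with a higher priority value are listed first.
        return [groups[prio] for prio in sorted(groups, reverse=True)]
    raise ValueError("unknown policy: %s" % policy)


# Hypothetical priority values, as a prio_callout might report them.
paths = [("sda", 50), ("sdb", 10), ("sdc", 50), ("sdd", 10)]
for policy in ("failover", "multibus", "group_by_prio"):
    print(policy, group_paths(paths, policy))
```
</pre>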
<p>
<b>no_path_retry</b> is set to <c>queue</c> as most people don't want data to
fail to send at all. So, if all paths fail, for instance, the I/Os will queue up
until the device returns, and everything is then sent. Depending on your
transfer, this can cause load issues.
</p>
<p>
<b>rr_min_io</b> is the number of I/Os to do per path before switching to the
next path in the same group. If <path>sda</path> and <path>sdb</path> were in
the same group, an rr_min_io of 100 would do 100 I/Os to <path>sda</path>, then
100 to <path>sdb</path>, bouncing back and forth. This is a setting to tweak for
each installation to maximize performance, because the data load and size of
transfers/requests vary by company. The default is <c>1000</c>, but some may
prefer a smaller number in order to switch ports more often, when possible.
</p>
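<p>
The effect of <c>rr_min_io</c> can be pictured with a short Python simulation.
This is a simplified sketch of the round-robin behaviour described above, not
the kernel's path selector; <c>dispatch</c> and its arguments are names chosen
for this example.
</p>

<pre caption="Sketch: distributing I/Os across paths with rr_min_io">
```python
from itertools import cycle


def dispatch(paths, rr_min_io, total_ios):
    """Return how many I/Os each path receives under simple round robin."""
    counts = {path: 0 for path in paths}
    remaining = total_ios
    for path in cycle(paths):
        if remaining == 0:
            break
        # Send up to rr_min_io I/Os down this path, then move to the next one.
        batch = min(rr_min_io, remaining)
        counts[path] += batch
        remaining -= batch
    return counts


print(dispatch(["sda", "sdb"], rr_min_io=100, total_ios=250))
```
</pre>

<p>
With two paths and an <c>rr_min_io</c> of 100, 250 I/Os are split as 100 to
<path>sda</path>, 100 to <path>sdb</path>, and the last 50 to <path>sda</path>
again.
</p>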
<p>
<b>user_friendly_names</b> makes it easier to see which device you are working
with. For example, if you set user_friendly_names to <c>no</c>, then you'll see
the WWID instead of EVA_SAN for your device.
</p>
</body>
</section>
</chapter>
</guide>
