| 1 | <?xml version='1.0' encoding="UTF-8"?> |
1 | <?xml version='1.0' encoding="UTF-8"?> |
| 2 | <!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/articles/l-sed2.xml,v 1.7 2011/09/04 17:53:41 swift Exp $ --> |
2 | <!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/articles/l-sed2.xml,v 1.8 2012/06/29 16:03:34 swift Exp $ --> |
| 3 | <!DOCTYPE guide SYSTEM "/dtd/guide.dtd"> |
3 | <!DOCTYPE guide SYSTEM "/dtd/guide.dtd"> |
| 4 | |
4 | |
| 5 | <guide disclaimer="articles"> |
5 | <guide disclaimer="articles"> |
| 6 | <title>Sed by example, Part 2</title> |
6 | <title>Sed by example, Part 2</title> |
| 7 | |
7 | |
| 8 | <author title="Author"> |
8 | <author title="Author"> |
| 9 | <mail link="drobbins@gentoo.org">Daniel Robbins</mail> |
9 | <mail link="drobbins@gentoo.org">Daniel Robbins</mail> |
| 10 | </author> |
10 | </author> |
| 11 | |
11 | |
| 12 | <abstract> |
12 | <abstract> |
| 13 | Sed is a very powerful and compact text stream editor. In this article, the |
13 | Sed is a very powerful and compact text stream editor. In this article, the |
| 14 | second in the series, Daniel shows you how to use sed to perform string |
14 | second in the series, Daniel shows you how to use sed to perform string |
| 15 | substitution; create larger sed scripts; and use sed's append, insert, and |
15 | substitution; create larger sed scripts; and use sed's append, insert, and |
| 16 | change line commands. |
16 | change line commands. |
| 17 | </abstract> |
17 | </abstract> |
| 18 | |
18 | |
| 19 | <!-- The original version of this article was published on IBM developerWorks, |
19 | <!-- The original version of this article was published on IBM developerWorks, |
| 20 | and is property of Westtech Information Services. This document is an updated |
20 | and is property of Westtech Information Services. This document is an updated |
| 21 | version of the original article, and contains various improvements made by the |
21 | version of the original article, and contains various improvements made by the |
| 22 | Gentoo Linux Documentation team --> |
22 | Gentoo Linux Documentation team --> |
| 23 | |
23 | |
| 24 | <version>1.2</version> |
24 | <version>2</version> |
| 25 | <date>2005-10-09</date> |
25 | <date>2005-10-09</date> |
| 26 | |
26 | |
| 27 | <chapter> |
27 | <chapter> |
| 28 | <title>How to further take advantage of the UNIX text editor</title> |
28 | <title>How to further take advantage of the UNIX text editor</title> |
| 29 | <section> |
29 | <section> |
| 30 | <title>Substitution!</title> |
30 | <title>Substitution!</title> |
| 31 | <body> |
31 | <body> |
| 32 | |
32 | |
| 33 | <p> |
33 | <p> |
| 34 | Let's look at one of sed's most useful commands, the substitution command. |
34 | Let's look at one of sed's most useful commands, the substitution command. |
| 35 | Using it, we can replace a particular string or matched regular expression with |
35 | Using it, we can replace a particular string or matched regular expression with |
| 36 | another string. Here's an example of the most basic use of this command: |
36 | another string. Here's an example of the most basic use of this command: |
| 37 | </p> |
37 | </p> |
| 38 | |
38 | |
| 39 | <pre caption="Most basic use of substitution command"> |
39 | <pre caption="Most basic use of substitution command"> |
| 40 | $ <i>sed -e 's/foo/bar/' myfile.txt</i> |
40 | $ <i>sed -e 's/foo/bar/' myfile.txt</i> |
| 41 | </pre> |
41 | </pre> |
| 42 | |
42 | |
| 43 | <p> |
43 | <p> |
| 44 | The above command will output the contents of myfile.txt to stdout, with the |
44 | The above command will output the contents of myfile.txt to stdout, with the |
| 45 | first occurrence of 'foo' (if any) on each line replaced with the string 'bar'. |
45 | first occurrence of 'foo' (if any) on each line replaced with the string 'bar'. |
| 46 | Please note that I said first occurrence on each line, though this is normally |
46 | Please note that I said first occurrence on each line, though this is normally |
| 47 | not what you want. Normally, when I do a string replacement, I want to perform |
47 | not what you want. Normally, when I do a string replacement, I want to perform |
| 48 | it globally. That is, I want to replace all occurrences on every line, as |
48 | it globally. That is, I want to replace all occurrences on every line, as |
| 49 | follows: |
49 | follows: |
| 50 | </p> |
50 | </p> |
| 51 | |
51 | |
| 52 | <pre caption="Replacing all the occurences on every line"> |
52 | <pre caption="Replacing all the occurrences on every line"> |
| 53 | $ <i>sed -e 's/foo/bar/g' myfile.txt</i> |
53 | $ <i>sed -e 's/foo/bar/g' myfile.txt</i> |
| 54 | </pre> |
54 | </pre> |
| 55 | |
55 | |
| 56 | <p> |
56 | <p> |
| 57 | The additional 'g' option after the last slash tells sed to perform a global |
57 | The additional 'g' option after the last slash tells sed to perform a global |
| 58 | replace. |
58 | replace. |
| 59 | </p> |
59 | </p> |
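To see the difference between the two forms side by side, here is a small sketch. The sample line is made up for illustration, and the text is piped into sed rather than read from a file such as myfile.txt.

```shell
# Made-up sample line containing three occurrences of 'foo'.
line='foo foo foo'

# Without the g flag, only the first occurrence on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed -e 's/foo/bar/')

# With the g flag, every occurrence on the line is replaced.
all=$(printf '%s\n' "$line" | sed -e 's/foo/bar/g')

printf '%s\n' "$first_only"   # bar foo foo
printf '%s\n' "$all"          # bar bar bar
```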
| 60 | |
60 | |
| 61 | <p> |
61 | <p> |
| 62 | Here are a few other things you should know about the <c>s///</c> substitution |
62 | Here are a few other things you should know about the <c>s///</c> substitution |
| 63 | command. First, it is a command, and a command only; there are no addresses |
63 | command. First, it is a command, and a command only; there are no addresses |
| 64 | specified in any of the above examples. This means that the <c>s///</c> command |
64 | specified in any of the above examples. This means that the <c>s///</c> command |
| 65 | can also be used with addresses to control what lines it will be applied to, as |
65 | can also be used with addresses to control what lines it will be applied to, as |
| 66 | follows: |
66 | follows: |
| 67 | </p> |
67 | </p> |
| … | |
… | |
| 83 | <p> |
83 | <p> |
| 84 | This example will swap 'hills' for 'mountains', but only on blocks of text |
84 | This example will swap 'hills' for 'mountains', but only on blocks of text |
| 85 | beginning with a blank line, and ending with a line beginning with the three |
85 | beginning with a blank line, and ending with a line beginning with the three |
| 86 | characters 'END', inclusive. |
86 | characters 'END', inclusive. |
| 87 | </p> |
87 | </p> |
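As a rough sketch of how an address range like the one just described could be combined with <c>s///</c>, the snippet below runs against a made-up five-line input. The sample text and the variable names are assumptions for illustration only.

```shell
# Hypothetical sample text: the block to edit starts at the blank line
# and ends at the line beginning with END.
input=$(printf '%s\n' 'hills outside' '' 'green hills' 'END of block' 'hills after')

# /^$/,/^END/ selects every line from a blank line through the next line
# that begins with END; s/hills/mountains/g only runs on those lines.
result=$(printf '%s\n' "$input" | sed -e '/^$/,/^END/s/hills/mountains/g')

printf '%s\n' "$result"
```

Only 'green hills' falls inside the range here, so it is the only line that changes; the first and last lines keep their 'hills'.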
| 88 | |
88 | |
| 89 | <p> |
89 | <p> |
| 90 | Another nice thing about the <c>s///</c> command is that we have a lot of |
90 | Another nice thing about the <c>s///</c> command is that we have a lot of |
| 91 | options when it comes to those <c>/</c> separators. If we're performing string |
91 | options when it comes to those <c>/</c> separators. If we're performing string |
| 92 | substitution and the regular expression or replacement string has a lot of |
92 | substitution and the regular expression or replacement string has a lot of |
| 93 | slashes in it, we can change the separator by specifying a different character |
93 | slashes in it, we can change the separator by specifying a different character |
| 94 | after the 's'. For example, this will replace all occurrences of |
94 | after the 's'. For example, this will replace all occurrences of |
| 95 | <path>/usr/local</path> with <path>/usr</path>: |
95 | <path>/usr/local</path> with <path>/usr</path>: |
| 96 | </p> |
96 | </p> |
| 97 | |
97 | |
| 98 | <pre caption="Replacing all the occurences of one string with another one"> |
98 | <pre caption="Replacing all the occurrences of one string with another one"> |
| 99 | $ <i>sed -e 's:/usr/local:/usr:g' mylist.txt</i> |
99 | $ <i>sed -e 's:/usr/local:/usr:g' mylist.txt</i> |
| 100 | </pre> |
100 | </pre> |
| 101 | |
101 | |
| 102 | <note> |
102 | <note> |
| 103 | In this example, we're using the colon as a separator. If you ever need to |
103 | In this example, we're using the colon as a separator. If you ever need to |
| 104 | specify the separator character in the regular expression, put a backslash |
104 | specify the separator character in the regular expression, put a backslash |
| 105 | before it. |
105 | before it. |
| 106 | </note> |
106 | </note> |
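Here is a short sketch of both points from the note, using a made-up colon-separated list instead of mylist.txt. The second command shows the backslash in front of a literal colon while the colon is also acting as the separator.

```shell
# Made-up sample data: two paths joined by a colon.
list='/usr/local/bin:/usr/local/lib'

# With ':' as the separator, the slashes in the paths need no escaping.
moved=$(printf '%s\n' "$list" | sed -e 's:/usr/local:/usr:g')

# To match a literal ':' while ':' is the separator, escape it with a backslash.
spaced=$(printf '%s\n' "$list" | sed -e 's:\:: :g')

printf '%s\n' "$moved"    # /usr/bin:/usr/lib
printf '%s\n' "$spaced"   # /usr/local/bin /usr/local/lib
```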
| 107 | |
107 | |
| 108 | </body> |
108 | </body> |
| 109 | </section> |
109 | </section> |
| 110 | <section> |
110 | <section> |
| 111 | <title>Regexp snafus</title> |
111 | <title>Regexp snafus</title> |
| 112 | <body> |
112 | <body> |
| 113 | |
113 | |
| 114 | <p> |
114 | <p> |
| 115 | Up until now, we've only performed simple string substitution. While this is |
115 | Up until now, we've only performed simple string substitution. While this is |
| 116 | handy, we can also match a regular expression. For example, the following sed |
116 | handy, we can also match a regular expression. For example, the following sed |
| 117 | command will match a phrase beginning with '<' and ending with '>', and |
117 | command will match a phrase beginning with '<' and ending with '>', and |
| 118 | containing any number of characters inbetween. This phrase will be deleted |
118 | containing any number of characters in-between. This phrase will be deleted |
| 119 | (replaced with an empty string): |
119 | (replaced with an empty string): |
| 120 | </p> |
120 | </p> |
| 121 | |
121 | |
| 122 | <pre caption="Deleting specified phrase"> |
122 | <pre caption="Deleting specified phrase"> |
| 123 | $ <i>sed -e 's/<.*>//g' myfile.html</i> |
123 | $ <i>sed -e 's/<.*>//g' myfile.html</i> |
| 124 | </pre> |
124 | </pre> |
| 125 | |
125 | |
| 126 | <p> |
126 | <p> |
| 127 | This is a good first attempt at a sed script that will remove HTML tags from a |
127 | This is a good first attempt at a sed script that will remove HTML tags from a |
| 128 | file, but it won't work well, due to a regular expression quirk. The reason? |
128 | file, but it won't work well, due to a regular expression quirk. The reason? |
| 129 | When sed tries to match the regular expression on a line, it finds the longest |
129 | When sed tries to match the regular expression on a line, it finds the longest |
| 130 | match on the line. This wasn't an issue in my previous sed article, because we |
130 | match on the line. This wasn't an issue in my previous sed article, because we |
| 131 | were using the <c>d</c> and <c>p</c> commands, which would delete or print the |
131 | were using the <c>d</c> and <c>p</c> commands, which would delete or print the |
| 132 | entire line anyway. But when we use the <c>s///</c> command, it definitely makes |
132 | entire line anyway. But when we use the <c>s///</c> command, it definitely makes |
| 133 | a big difference, because the entire portion that the regular expression matches |
133 | a big difference, because the entire portion that the regular expression matches |
| … | |
… | |
| 176 | their results. |
176 | their results. |
| 177 | </p> |
177 | </p> |
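The longest-match behavior discussed in this section is easy to check on a single line. The sketch below uses a made-up line of HTML and compares the greedy pattern with a bracket expression that refuses to cross a '>' character; treat it as one possible illustration rather than a complete tag stripper.

```shell
# Made-up sample line with two pairs of tags.
html='<b>This</b> is what <b>I</b> meant.'

# '<.*>' matches from the first '<' to the last '>' on the line,
# so everything between them is removed as well.
greedy=$(printf '%s\n' "$html" | sed -e 's/<.*>//g')

# '<[^>]*>' stops at the first '>' after each '<', so only the tags go.
per_tag=$(printf '%s\n' "$html" | sed -e 's/<[^>]*>//g')

printf '%s\n' "$greedy"    # " meant." (with a leading space)
printf '%s\n' "$per_tag"   # This is what I meant.
```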
| 178 | |
178 | |
| 179 | </body> |
179 | </body> |
| 180 | </section> |
180 | </section> |
| 181 | <section> |
181 | <section> |
| 182 | <title>More character matching</title> |
182 | <title>More character matching</title> |
| 183 | <body> |
183 | <body> |
| 184 | |
184 | |
| 185 | <p> |
185 | <p> |
| 186 | The '[ ]' regular expression syntax has some additional options. To specify |
186 | The '[ ]' regular expression syntax has some additional options. To specify |
| 187 | a range of characters, you can use a '-' as long as it isn't in the first or |
187 | a range of characters, you can use a '-' as long as it isn't in the first or |
| 188 | last position, as follows: |
188 | last position, as follows: |
| 189 | </p> |
189 | </p> |
| 190 | |
190 | |
| 191 | <pre caption="Specifying a rangle of characters"> |
191 | <pre caption="Specifying a range of characters"> |
| 192 | '[a-x]*' |
192 | '[a-x]*' |
| 193 | </pre> |
193 | </pre> |
| 194 | |
194 | |
| 195 | <p> |
195 | <p> |
| 196 | This will match zero or more characters, as long as all of them are |
196 | This will match zero or more characters, as long as all of them are |
| 197 | 'a','b','c'...'v','w','x'. In addition, the '[:space:]' character class is |
197 | 'a','b','c'...'v','w','x'. In addition, the '[:space:]' character class is |
| 198 | available for matching whitespace. Here's a fairly complete list of available |
198 | available for matching whitespace. Here's a fairly complete list of available |
| 199 | character classes: |
199 | character classes: |
| 200 | </p> |
200 | </p> |
| 201 | |
201 | |
| 202 | |
202 | |
| 203 | <table> |
203 | <table> |
| 204 | <tr> |
204 | <tr> |
| 205 | <th>Character class</th> |
205 | <th>Character class</th> |
| 206 | <th>Description</th> |
206 | <th>Description</th> |
| … | |
… | |
| 245 | <ti>[:space:]</ti> |
245 | <ti>[:space:]</ti> |
| 246 | <ti>Whitespace</ti> |
246 | <ti>Whitespace</ti> |
| 247 | </tr> |
247 | </tr> |
| 248 | <tr> |
248 | <tr> |
| 249 | <ti>[:upper:]</ti> |
249 | <ti>[:upper:]</ti> |
| 250 | <ti>Upper-case [A-Z]</ti> |
250 | <ti>Upper-case [A-Z]</ti> |
| 251 | </tr> |
251 | </tr> |
| 252 | <tr> |
252 | <tr> |
| 253 | <ti>[:xdigit:]</ti> |
253 | <ti>[:xdigit:]</ti> |
| 254 | <ti>hex digits [0-9 a-f A-F]</ti> |
254 | <ti>hex digits [0-9 a-f A-F]</ti> |
| 255 | </tr> |
255 | </tr> |
| 256 | </table> |
256 | </table> |
| 257 | |
257 | |
| 258 | <p> |
258 | <p> |
| 259 | It's advantageous to use character classes whenever possible, because they adapt |
259 | It's advantageous to use character classes whenever possible, because they adapt |
| 260 | better to nonEnglish speaking locales (including accented characters when |
260 | better to non-English speaking locales (including accented characters when |
| 261 | necessary, etc.). |
261 | necessary, etc.). |
| 262 | </p> |
262 | </p> |
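One detail worth showing in practice: a character class such as [:space:] is written inside a bracket expression, so it appears with doubled brackets, as in '[[:space:]]'. The following sketch uses made-up input strings to illustrate two of the classes from the table.

```shell
# Made-up input with a mix of spaces and a tab between two words.
# printf expands \t in its format string to a real tab character.
squeezed=$(printf 'one \t  two\n' | sed -e 's/[[:space:]][[:space:]]*/ /g')

# Made-up input with trailing digits; [[:digit:]] matches each one.
letters=$(printf 'abc123\n' | sed -e 's/[[:digit:]]//g')

printf '%s\n' "$squeezed"   # one two
printf '%s\n' "$letters"    # abc
```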
| 263 | |
263 | |
| 264 | </body> |
264 | </body> |
| 265 | </section> |
265 | </section> |
| 266 | <section> |
266 | <section> |
| 267 | <title>Advanced substitution stuff</title> |
267 | <title>Advanced substitution stuff</title> |
| 268 | <body> |
268 | <body> |
| 269 | |
269 | |
| 270 | <p> |
270 | <p> |
| 271 | We've looked at how to perform simple and even reasonably complex straight |
271 | We've looked at how to perform simple and even reasonably complex straight |
| 272 | substitutions, but sed can do even more. We can actually refer to parts of, |
272 | substitutions, but sed can do even more. We can actually refer to parts of, |
| 273 | or all of, the matched regular expression, and use these pieces to construct |
273 | or all of, the matched regular expression, and use these pieces to construct |
| 274 | the replacement string. As an example, let's say you were replying to a message. |
274 | the replacement string. As an example, let's say you were replying to a message. |
| 275 | The following example would prefix each line with the phrase "ralph said: ": |
275 | The following example would prefix each line with the phrase "ralph said: ": |