Deleting the second to last character in each line - with sed

sed

6,812

Solution 1

You can do:

sed -E 's/.(.)$/\1/' file.txt

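To edit the file in place, without a backup (this form works with GNU sed; BSD/macOS sed requires an explicit, possibly empty, suffix argument after -i):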

sed -Ei 's/.(.)$/\1/' file.txt

To edit the file in place, with original file backed up with .bak extension:

sed -Ei.bak 's/.(.)$/\1/' file.txt
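A minimal sketch of the in-place edit with a backup, run against a throwaway file so nothing real is modified. The temporary directory and the sample lines are made up for illustration, and the sketch assumes a sed that accepts -E and an attached -i suffix (GNU and BSD sed both do):

```shell
# Work in a scratch directory so no real file is touched.
tmpdir=$(mktemp -d)
printf '%s\n' 'abc' 'wxyz' > "$tmpdir/file.txt"

# Edit in place; the original content is kept in file.txt.bak.
sed -Ei.bak 's/.(.)$/\1/' "$tmpdir/file.txt"

edited=$(cat "$tmpdir/file.txt")      # ac, wxz
backup=$(cat "$tmpdir/file.txt.bak")  # abc, wxyz
printf 'edited:\n%s\nbackup:\n%s\n' "$edited" "$backup"

rm -rf "$tmpdir"
```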

POSIX-ly (using basic regular expressions, so no -E is needed):

sed 's/.\(.\)$/\1/' file.txt
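To see the effect without touching any file, the same command can read from a pipe. The sample lines below are invented for illustration; note that a line with fewer than two characters does not match and passes through unchanged:

```shell
# Feed three made-up lines to sed on standard input.
result=$(printf '%s\n' 'hello' 'ab' 'x' | sed 's/.\(.\)$/\1/')
printf '%s\n' "$result"
# prints:
# helo
# b
# x
```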

Solution 2

To FULLY explain the portable command (since someone asked) so that ANYONE may understand this:

sed 's/.\(.\)$/\1/' file.txt

Firstly, the "obvious": This line consists of a command name (sed) and two separate arguments which are passed to that command by the shell. The single quotes are stripped away by the shell, so what sed "sees" as its arguments are:

s/.\(.\)$/\1/

and

file.txt
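One way to check what the shell actually hands to a command is to give the same words to printf instead of sed. This is only an illustrative sketch; the square brackets are added by the format string to mark where each argument starts and ends:

```shell
# printf repeats its format once per argument, so each argument
# the shell passes ends up on its own bracketed line.
args=$(printf '[%s]\n' 's/.\(.\)$/\1/' file.txt)
printf '%s\n' "$args"
# prints:
# [s/.\(.\)$/\1/]
# [file.txt]
```

The single quotes are gone, and the backslashes inside them reach the command untouched.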

Since none of the arguments to sed begin with a hyphen, it doesn't interpret any of them as options.

The first argument is interpreted as an editing command to be run, and any other arguments (in this case just the one, file.txt) are interpreted as names of files from which to read the text to be edited by the editing command (the first argument).

(Note that the edited text is written to sed's "standard output"—that is, back to your terminal, your command line window—it is not written back to the file.)
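A small sketch of that behaviour, using a scratch directory and file names chosen only for this example: the output is redirected to a second file, and the input file is left exactly as it was.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 'abcd' > "$tmpdir/file.txt"

# sed writes the edited text to standard output; here it is
# redirected into out.txt instead of appearing on the terminal.
sed 's/.\(.\)$/\1/' "$tmpdir/file.txt" > "$tmpdir/out.txt"

original=$(cat "$tmpdir/file.txt")  # abcd (unchanged)
output=$(cat "$tmpdir/out.txt")     # abd
printf 'file.txt: %s\nout.txt:  %s\n' "$original" "$output"

rm -rf "$tmpdir"
```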

file.txt must be the name of a file located in the directory which is the "current working directory" of your shell when you execute this sed command. (If you want the command to work on the same file regardless of what your shell's current working directory is when you run the command, read up on "absolute paths.")

Now we shall deconstruct the editing command itself:

s/.\(.\)$/\1/

The editing command begins with the letter s, which is for "substitute." From the character following the "s" (which is / in this case), up to the next instance of that same character (/ again), is the pattern which is to be substituted for. In other words, it specifies what the text should "look like" that is to be replaced—it tells sed how to "know" when it has found text which should be replaced (should be substituted for).
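Because the delimiter is simply whichever character follows the s, a different character can stand in for the slash. The sketch below uses # purely as an illustration; both commands are the same substitution:

```shell
# Same substitution, two different delimiters.
with_slash=$(printf '%s\n' 'abcd' | sed 's/.\(.\)$/\1/')
with_hash=$(printf '%s\n' 'abcd' | sed 's#.\(.\)$#\1#')
printf '%s\n%s\n' "$with_slash" "$with_hash"
# prints:
# abd
# abd
```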

The pattern in this case is:

.\(.\)$

(The proper term in place of "pattern" is actually regex, originally short for "regular expression." I won't go into the broader subject of regexes here.)

This regex begins with a dot (.), which is a "wildcard" with the meaning "any single character." It matches (describes, symbolizes) any single character of text.

The backslash (\) is used often in shell commands and regexes as an "escape" character. In general, this means that either it removes the special significance of the character which follows it, or it adds a special significance to the following character.

In this case, the parentheses (both ( and )) are escaped (which is to say, preceded by a backslash) in order to add a special meaning. The special meaning of escaped parentheses in a sed regex is that whatever text matches the part of the regex in between the parentheses is "noted" specially and can be referred back to. We'll come back to this later (when we refer back to this parentheses grouping).

The period (.) within the parentheses again matches any single character.

The dollar sign ($) is called an anchor, and it matches the end of a line of text. In the absence of this anchor, the regex would simply match any two characters (specifically, it would match the first two characters on each line of text read in from the file called file.txt), and (due to the escaped parentheses) sed would "note" the second of the two characters for referring back to later on.

Because the regex is anchored to the end of the line, the two dots must match the last two characters on each line of text (and the final character is noted for referring back to later).
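The difference the anchor makes can be seen by running the command with and without the $ on the same made-up input line:

```shell
# With $: the two dots must be the last two characters of the line.
anchored=$(printf '%s\n' 'hello' | sed 's/.\(.\)$/\1/')
# Without $: the two dots match the first two characters instead.
unanchored=$(printf '%s\n' 'hello' | sed 's/.\(.\)/\1/')
printf 'anchored:   %s\nunanchored: %s\n' "$anchored" "$unanchored"
# prints:
# anchored:   helo
# unanchored: ello
```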

The next portion of the s (substitute) command is from the second instance of the character following s (in this case a slash, /), to the third instance of the character following s. This is called the replacement pattern. It specifies what sed should put in place of the text matched by the search pattern (the regex).

In this case the replacement pattern is:

\1

Again, the backslash is used to escape the character following, and in this case it is again to add a special meaning rather than to take away a special meaning.

A backslash followed by a numeral (from 1 to 9) is called a backreference. This is what refers back to the text matched within the parentheses grouping in the search pattern. Since the numeral is 1, this refers to the first parentheses grouping. (In this case, of course, there is only one such grouping.)
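To illustrate how the numbering works when there is more than one grouping, here is a variation that is not part of the original command: it puts each of the last two characters in its own group and swaps them, rather than deleting one.

```shell
# \1 is the second to last character, \2 is the last character;
# the replacement writes them back in the opposite order.
swapped=$(printf '%s\n' 'hello' | sed 's/\(.\)\(.\)$/\2\1/')
printf '%s\n' "$swapped"
# prints:
# helol
```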

In summary, this editing command means to use the text matched within those escaped parentheses (which is the final character of the line) to replace the text matched by the entire search regex (which is the final two characters of the line).

The net effect is to remove the second to last character from each line.

Or, more precisely, sed will read in each line of text from the file called file.txt found in the current working directory; for each line it will replace the final two characters of the line with the single final character of that line; and it will print each modified line to its standard output. A line with fewer than two characters does not match the regex and is printed unchanged.


Anurag Singh

Updated on September 18, 2022

Comments

Anurag Singh over 1 year

How do I delete the character before the last character in each line in a file?

I tried sed 's/.$//' myfile1.txt which removed the last character of each line in myfile1.txt, but I am not sure how to delete the penultimate character in each line.