Remove lines based on pattern but keeping first n lines that match

text-processing awk sed

387

Solution 1

If you want to delete all lines starting with % but preserve the first two lines of input, you could do:

sed -e 1,2b -e '/^%/d'

Though the same would be more legible with awk:

awk 'NR <= 2 || !/^%/'

Or, if you're after performance:

{ head -n 2; grep -v '^%'; } < input-file

If you want to preserve the first two lines matching the pattern even when they are not the first lines of the input, awk would certainly be a better option:

awk '!/^%/ || ++n <= 2'

With sed, you could use tricks like:

sed -e '/^%/!b' -e 'x;/xx/{h;d;}' -e 's/^/x/;x'

That is, use the hold space to count the pattern matches seen so far. Not terribly efficient or legible.
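
As a rough sketch of how the counting awk command generalizes, the helper below takes the number of matches to keep and the pattern as arguments. The name keep_first_matches and its argument order are made up for illustration; they are not part of the answer above.

```shell
# keep_first_matches N REGEX  (hypothetical helper, for illustration only)
# Reads stdin and prints every line that does not match REGEX,
# plus the first N lines that do.
# Note: awk -v processes backslash escapes in REGEX, so a pattern
# containing backslashes would need them doubled.
keep_first_matches() {
    awk -v n="$1" -v re="$2" '$0 !~ re || ++seen <= n'
}

# Sample run: the % lines are not at the top of the input.
printf '%s\n' text0 '% 1' '% 2' text1 '% 3' '% 4' text2 |
    keep_first_matches 2 '^%'
```

Passing 0 as the count drops every matching line, which makes the helper behave like grep -v for that pattern.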

Solution 2

I'm afraid sed alone is a bit too simple for this (not that it would be impossible, just rather complicated; see e.g. sed sokoban for what can be done).

How about awk?

#!/bin/awk -f
BEGIN { c = 0; }
{
    if (/^%/) {
        if (c++ < 3) {    # keep only the first 3 matching lines
            print;
        }
    } else {
        print;
    }
}

If you can rely on a recent enough bash (one that supports regular expression matching), the awk script above can be translated to:

#!/bin/bash -
c=0
while IFS= read -r line; do
    if [[ $line =~ ^% ]]; then
        if ((c++ < 3)); then    # keep only the first 3 matching lines
            printf '%s\n' "$line"
        fi
    else
        printf '%s\n' "$line"
    fi
done

You can also use sed or grep to do the pattern matching instead of the =~ operator.
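
If bash cannot be assumed, the match can also be done with a plain case pattern, as suggested in the comments. The following is only a sketch for a POSIX sh: the function name filter_keep_n and its single argument are invented here, and like any shell read loop it will be slow on large files.

```shell
# filter_keep_n N  (hypothetical helper, for illustration only)
# Reads stdin; keeps every line not starting with %, plus the
# first N lines that do. Uses only POSIX sh features.
filter_keep_n() {
    n=$1
    c=0
    # The "|| [ -n "$line" ]" part also handles a last line
    # that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            %*) c=$((c + 1))
                [ "$c" -le "$n" ] || continue ;;
        esac
        printf '%s\n' "$line"
    done
}

printf '%s\n' '% 1' text1 '% 2' '% 3' text2 | filter_keep_n 2
```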

Solution 3

A Perl one-liner solution:

# in-place editing
perl -i -pe '$.>2 && s/^%.*//s' filename.txt

# print to the standard output
perl -ne '$.>2 && /^%/ || print' filename.txt

Solution 4

sed '/^%/{
3,$d}' <<< '% 1
% 2
% 3
% 4
% 5
text1
text2
text3'

One way of removing the extra lines.

Edit: my answer works under the same condition as Stéphane Chazelas's: if the % rows don't come first, it won't work.

Nerd sniping.

sed -n '/^% [^12]*$/!{
/^% [12][[:digit:]]\{1,\}/n
p}' file.txt

This works regardless of where the % number string is found in the stream. The first address matches any line that starts with % followed only by characters other than 1 or 2, and the ! negates it, so those lines are never printed. That address only covers lines of the form /% [A-Za-z3-9]*/, which leaves a blind spot: numbers between 10 and 29 would still print. So we nest a second address to match that range and skip the line.

But awk would still be better.

Solution 5

tr '\n' ';' < input | sed 's/% /##/3g' | tr ';' '\n' | sed '/##/d'

I replaced the newline characters with ';' to obtain a single-line string, then turned all but the first two occurrences of the pattern into a ## marker with sed 's/pattern/##/3g' (replace from the third to the last occurrence of the pattern in the line), changed ';' back to '\n', and finally removed the marked lines.
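
The stages can be watched on a small sample, as in the sketch below. It assumes GNU sed (the 3g flag combination is a GNU extension) and data that contains no ';' characters; the sample function is just a stand-in for the real input file.

```shell
# Stand-in for the real input; note the % lines are interspersed.
sample() { printf '%s\n' '% 1' '% 2' '% 3' text1 '% 4' text2; }

# Stages 1 and 2: join the lines with ';', then mark the third
# and later occurrences of "% " with ##. The echo only adds the
# final newline that tr removed.
sample | tr '\n' ';' | sed 's/% /##/3g'; echo

# Full pipeline: split the lines again and delete the marked ones.
sample | tr '\n' ';' | sed 's/% /##/3g' | tr ';' '\n' | sed '/##/d'
```

Because the substitution is not anchored to the start of a line, a "% " in the middle of a line is counted as an occurrence too.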

GreeneScreen

Updated on September 18, 2022

Comments

Stéphane Chazelas over 11 years

To match a line starting with % in shell, no need for regexps or ksh/bash specific features like [[, you can use case $line in %*). Doing it this way with shells, especially bash, is going to be terribly inefficient. Using loops in shells is generally considered bad practice.
Jana over 11 years

Thanks @Stephane. It worked. Thanks for the additional info as well.
Jana over 11 years

Thanks @peterph. Since my files are huge, I was really looking for something like Stephane's answer. Thanks again
Jana over 11 years

Thanks @Nykakin. The pattern replacing for my data won't be efficient. Thank you for your input
peterph over 11 years

@Jana No problem, it just wasn't really clear to me whether the lines matching the pattern were supposed to be only at the beginning of the file or interspersed with the rest. That's why I used the loops.