Remove everything after the 2nd occurrence in a string in Unix

27,451

Solution 1

Something like this would do it.

echo "After-u-math-how-however" | cut -f1,2 -d'-'

This will split up (cut) the string into fields, using a dash (-) as the delimiter. Once the string has been split into fields, cut will print the 1st and 2nd fields.
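A quick check of the edge cases from the question, as a sketch: cut prints lines that contain no delimiter unchanged (unless -s is given), so inputs with zero or one dash pass through untouched. The function name first_two_fields is made up for illustration and is not part of the original answer.

```shell
# first_two_fields is a hypothetical helper name used only for this example.
first_two_fields() {
  # Keep fields 1 and 2 of each line on stdin, using "-" as the delimiter.
  cut -d'-' -f1,2
}

for s in "After-u-math-how-however" "After-u" "After"; do
  printf '%s\n' "$s" | first_two_fields
done
# prints:
# After-u
# After-u
# After
```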

Solution 2

This might work for you (GNU sed):

sed 's/-[^-]*//2g' file
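How this works, as far as I can tell: -[^-]* matches a dash plus the text up to the next dash, and the combined 2g flag (a GNU sed extension) applies the substitution to the 2nd match and every later one, so only the first "-field" survives. Below is a small sketch that reads from stdin instead of file; the function name is hypothetical.

```shell
# Requires GNU sed: combining a number with the g flag is a GNU extension.
# keep_first_dash_field is a hypothetical name used only for this example.
keep_first_dash_field() {
  # Delete the 2nd and all later "-text" chunks on each line.
  sed 's/-[^-]*//2g'
}

for s in "After-u-math-how-however" "After-u" "After"; do
  printf '%s\n' "$s" | keep_first_dash_field
done
# prints:
# After-u
# After-u
# After
```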

Solution 3

You could use the following regex to select what you want:

^[^-]*-\?[^-]*

For example:

echo "After-u-math-how-however" | grep -o "^[^-]*-\?[^-]*"

Results:

After-u
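Because the dash is optional (-\?), the pattern should also match input with no dash at all. A sketch that checks this; note that \? in a basic regex is a GNU extension and -o is not required by POSIX, so the -E variant shown here is my assumption of a more portable spelling, and both function names are hypothetical.

```shell
# Original pattern (basic regex; \? is a GNU extension).
match_bre() {
  grep -o '^[^-]*-\?[^-]*'
}

# Assumed equivalent using an extended regex, where ? needs no backslash.
match_ere() {
  grep -oE '^[^-]*-?[^-]*'
}

printf '%s\n' "After-u-math-how-however" | match_bre   # After-u
printf '%s\n' "After" | match_bre                      # After
printf '%s\n' "After-u-math-how-however" | match_ere   # After-u
```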

Solution 4

@EvanPurkhiser's cut -f1,2 -d'-' solution is IMHO the best one, but since you asked about sed and awk:

With GNU sed for -r:

$ echo "After-u-math-how-however" | sed -r 's/([^-]+-[^-]*).*/\1/'
After-u

With GNU awk for gensub():

$ echo "After-u-math-how-however" | awk '{$0=gensub(/([^-]+-[^-]*).*/,"\\1",1)}1'
After-u

The same can be done with non-GNU sed using \(...\) and * (basic regular expressions), and with non-GNU awk using match() and substr(), if necessary.
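A sketch of what those non-GNU variants might look like. The sed command is the POSIX form given by mklement0 in the comments; the awk version is my own reading of the match()/substr() hint, not code from the answer, and both function names are hypothetical.

```shell
# POSIX sed: basic regex, with \{1,\} standing in for +.
strip_sed() {
  sed 's/\([^-]\{1,\}-[^-]*\).*/\1/'
}

# POSIX awk: match() sets RSTART and RLENGTH; substr() keeps the text
# up to the end of the match. Lines with no match are printed unchanged.
strip_awk() {
  awk '{
    if (match($0, /[^-]+-[^-]*/))
      $0 = substr($0, 1, RSTART + RLENGTH - 1)
    print
  }'
}

printf '%s\n' "After-u-math-how-however" | strip_sed   # After-u
printf '%s\n' "After-u-math-how-however" | strip_awk   # After-u
printf '%s\n' "After" | strip_awk                      # After
```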

Solution 5

awk -F - '{print $1 (NF>1? FS $2 : "")}' <<<'After-u-math-how-however'
  • Split the line into fields based on the field separator - (set with the option -F -), which is also accessible as the special variable FS inside the awk program.
  • Always print the 1st field (print $1), followed by:
    • If there's more than 1 field (NF>1), append FS (i.e., -) and the 2nd field ($2)
    • Otherwise, append "", i.e., effectively print only the 1st field (which may itself be empty, if the input is empty).
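The cases described above can be exercised with a short sketch. It pipes the input through printf rather than using a here-string, so it does not depend on bash; the function name first_two_awk is hypothetical.

```shell
# first_two_awk is a hypothetical wrapper around the awk program above.
first_two_awk() {
  awk -F - '{ print $1 (NF > 1 ? FS $2 : "") }'
}

for s in "After-u-math-how-however" "After-u" "After"; do
  printf '%s\n' "$s" | first_two_awk
done
# prints:
# After-u
# After-u
# After
```

An empty input line has NF equal to 0, so only the (empty) 1st field is printed and the output is an empty line.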
Author by

Jose

Updated on July 25, 2020

Comments

  • Jose
    Jose almost 4 years

    I would like to remove everything after the 2nd occurrence of a particular pattern in a string. What is the best way to do it in Unix? What is the most elegant and simple method to achieve this: sed, awk, or just Unix commands like cut?

    My input would be

    After-u-math-how-however
    

    Output should be

    After-u
    

    Everything after the 2nd - should be stripped out. The regex should also handle input with fewer than two occurrences of the pattern: with zero or one occurrence the input should be left unchanged, and everything from the 2nd occurrence onward should be removed.

    So if the input is as follows

    After
    

    Output should be

    After
    
  • Jose
    Jose almost 10 years
    Looks like the best! Any idea about how to get the same in sed or awk?
  • John C
    John C almost 10 years
    OK, I had another crack at it, against my better judgement, because the OP has done no research and made no attempt to solve it.
  • Evan Purkhiser
    Evan Purkhiser almost 10 years
    Good solution. You should reset the IFS after though, no?
  • mklement0
    mklement0 almost 10 years
    +1; note, however, that there appears to be a bug in FreeBSD grep 2.5.1 (as of OS X 10.9.3, for instance), causing the ^ anchor to be ignored, resulting in potentially multiple matches (and thus multiple output lines). Works fine with GNU grep.
  • kojiro
    kojiro almost 10 years
    @EvanPurkhiser no, you should use scope to manage the value. Put the above code in a function with local IFS instead of trying to manually save and restore the original IFS.
  • Ed Morton
    Ed Morton almost 10 years
    So the positive about this is that there's no fork, no external process (why do we care?) but the negatives are that you still need to write more code to manage the scope of the IFS change, plus if you want to do this on more than 1 line you need to manually write a loop to process every line (unlike sed and awk solutions), plus as written it will handle any backslashes in the input incorrectly, plus you need to think about whether there's a globbing impact, plus you need to think about whether the echo is going to behave as desired. Shell is an environment from which to call tools.
  • mklement0
    mklement0 almost 10 years
    +1 for the sed solution; using -E instead of -r would make the command work with both GNU (Linux) and BSD (OSX) sed. POSIX sed, which uses basic regexes, can emulate +, namely as \{1,\}: sed 's/\([^-]\{1,\}-[^-]*\).*/\1/'
  • kojiro
    kojiro almost 10 years
    @EdMorton All of these "negatives" start with "if". "If" you haven't clarified your requirements, then you will get a generalized answer that may be optimal in some cases and suboptimal in others. Shell is an environment from which to call tools, and often it's valuable to understand which of those tools are built into the shell, instead of always falling back on awk and sed.
  • kojiro
    kojiro almost 10 years
    @EdMorton also, what globbing impact? 1. Bash doesn't expand globs in a herestring. 2. The shell doesn't expand globs within double-quoted parameter expansions, including array expansions. The only way to have a problem with globs in this answer would be to remove the quotes, which would substantially change the answer.
  • Hussain K
    Hussain K over 3 years
    How can I achieve the same in reverse? That is, cut out just "however" from the string and print it, no matter how long the string is.
  • Amanda
    Amanda almost 3 years
    @HussainK - using stackoverflow.com/questions/22727107/…, you could do ... | rev | cut -f1 -d'-' | rev
  • Ed Morton
    Ed Morton over 2 years
    @IsinAltinkaya the way to express a preference is to upvote the answer you prefer. I upvoted potong's answer, for example.
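For Hussain K's question about printing only the last field, two alternatives to the rev | cut | rev pipeline are sketched here. Neither comes from the thread itself: one uses awk's $NF, the other POSIX shell parameter expansion, and the function name last_field is hypothetical.

```shell
# last_field is a hypothetical helper: print the last "-"-separated field.
last_field() {
  awk -F - '{ print $NF }'
}

s="After-u-math-how-however"
printf '%s\n' "$s" | last_field   # however

# Shell-only variant: strip the longest prefix that ends in "-".
printf '%s\n' "${s##*-}"          # however
```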