How to append Line to previous Line?

Solution 1

A version in perl, using negative lookaheads:

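$ perl -0777 -pe 's/\n(?!([0-9]{8}|$))//g' test.txt
20141101 server contain dump
20141101 server contain nothing    {uekdmsam ikdas jwdjamc ksadkek} ssfjddkc * kdlsdlsddsfd jfkdfk
20141101 server contain dump

-0777 makes perl read the entire file as a single record, so the regex can match across lines. (A bare -0 sets the record separator to NUL; it has the same effect only as long as the file contains no NUL bytes.) \n(?!([0-9]{8}|$)) uses a negative lookahead: it matches a newline that is not followed by 8 digits or by the end of the string, which in slurp mode is the end of the file. Note that this reads the whole file into memory.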

Solution 2

It may be a little easier with sed:

sed -e ':1 ; N ; $!b1' -e 's/\n\+\( *[^0-9]\)/\1/g'
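  • The first part, :1;N;$!b1, collects all lines of the file into one long line, with the original lines separated by \n.

  • The second part removes each newline (or run of newlines) that is followed by a non-digit character, with optional spaces in between. Note that \+ is a GNU extension; the POSIX BRE equivalent is \{1,\}.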
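The two steps can also be tried in the more portable spelling suggested in the comments, with the label in its own -e and \{1,\} instead of \+. This is only a sketch: the function name join_with_sed and the sample input are made up for illustration, and the whole input is still held in the pattern space, so it is meant for small files.

```shell
# Hypothetical helper: slurp all lines into the pattern space, then drop
# every run of newlines that is followed by optional spaces and a non-digit.
join_with_sed() {
    sed -e :1 -e 'N;$!b1' -e 's/\n\{1,\}\( *[^0-9]\)/\1/g' "$@"
}

# Made-up sample: the second record is spread over four lines, one of them blank.
printf '20141101 a\n20141101 b\n  {c\n\nd}\n20141101 e\n' | join_with_sed
```

Unlike a test for 8 digits, this treats any line that starts with a digit (after optional spaces) as the start of a new record.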

To avoid memory limitations (especially with big files) you can use:

sed -e '1{h;d}' -e '1!{/^[0-9]/!{H;d};/^[0-9]/x;$G}' -e 's/\n\+\( *[^0-9]\)/\1/g'

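Or forget the difficult sed scripts and remember that the year starts with 2. This only works if the character 2 appears nowhere else in the file, because every 2 is treated as the start of a record: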

tr '\n2' ' \n' | sed -e '1!s/^/2/' -e 1{/^$/d} -e $a

Solution 3

One way would be:

 $ perl -lne 's/^/\n/ if $.>1 && /^\d+/; printf "%s",$_' file
 20141101 server contain dump
 20141101 server contain nothing    {uekdmsam ikdas jwdjamc ksadkek} ssfjddkc * kdlsdlsddsfd jfkdfk 
 20141101 server contain dump

However, that also removes the final newline. To add it again, use:

$ { perl -lne 's/^/\n/ if $.>1 && /^\d+/; printf "%s",$_' file; echo; } > new

Explanation

The -l removes the trailing newline from each input line (and also adds one to each print call, which is why I use printf instead). Then, if the current line starts with digits (/^\d+/) and the current line number is greater than one ($.>1, needed to avoid adding an extra empty line at the beginning), a \n is added to the beginning of the line. The printf prints each line, using an explicit "%s" format so that % characters in the input are not interpreted.


Alternatively, you can change all \n characters to \0, then change those \0 that are right before a string of numbers to \n again:

$ tr '\n' '\0' < file | perl -pe 's/\0\d+ |$/\n$&/g' | tr -d '\0'
20141101 server contain dump
20141101 server contain nothing    {uekdmsam ikdas jwdjamc ksadkek} ssfjddkc * kdlsdlsddsfd jfkdfk 
20141101 server contain dump

To make it match only strings of exactly 8 digits, use this instead:

$ tr '\n' '\0' < file | perl -pe 's/\0\d{8} |$/\n$&/g' | tr -d '\0'

Solution 4

Try doing this using awk:

#!/usr/bin/awk -f

{
    # if the current line begins with 8 digits followed by
    # 'nothing' OR the current line doesn't start with 8 digits
    if (/^[0-9]{8}.*nothing/ || !/^[0-9]{8}/) {
        # print current line without newline
        printf "%s", $0
        # feeding a 'state' variable
        weird=1
    }
    else {
        # if last line was treated in the 'if' statement
        if (weird==1) {
            printf "\n%s", $0
            weird=0
        }
        else {
            print # print the current line
        }
    }
}
END{
    # add a final newline; a bare print here would output $0 again
    # in awks that keep the last record in END (gawk does)
    print ""
}

To use it:

chmod +x script.awk
./script.awk file.txt

Solution 5

A simpler way than my other answer, using awk and terdon's algorithm:

awk 'NR>1 && /^[0-9]{8}/{printf "%s","\n"$0;next}{printf "%s",$0}END{print ""}' file
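For awks without interval-expression support, the same line-by-line idea can be sketched with the digit class written out eight times. The function name join_with_awk and the sample input are made up for illustration, and END prints an empty string so that only a newline is added at the end.

```shell
# Hypothetical helper: start a new output line before every record that
# begins with 8 digits (except the first), and print everything else
# without a trailing newline so continuation lines are appended.
join_with_awk() {
    awk '
        NR > 1 && /^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/ { printf "\n" }
        { printf "%s", $0 }
        END { print "" }   # terminate the last output line
    ' "$@"
}

# Made-up sample: the second record is spread over four lines, one of them blank.
printf '20141101 a\n20141101 b\n  {c\n\nd}\n20141101 e\n' | join_with_awk
```

Because only one line is held at a time, memory use stays flat regardless of the file size.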
Author: William R

Updated on September 18, 2022

Comments

  • William R
    William R over 1 year

    I have a log file which needs to be parsed and analysed. The file contains something like the following:

    File:

    20141101 server contain dump
    20141101 server contain nothing
        {uekdmsam ikdas 
    
    jwdjamc ksadkek} ssfjddkc * kdlsdl
    sddsfd jfkdfk 
    20141101 server contain dump
    

    Based on the above scenario: if a line does not start with a date (a number), I have to append it to the previous line.

    Output file:

    20141101 server contain dump
    20141101 server contain nothing {uekdmsam ikdas jwdjamc ksadkek} ssfjddkc * kdlsdl sddsfd jfkdfk 
    20141101 server contain dump
    
  • muru
    muru over 9 years
    @terdon, updated to save last newline.
  • terdon
    terdon over 9 years
    Nice one! I'd upvote you but I'm afraid I already had :)
  • terdon
    terdon over 9 years
    Nice, +1. Could you add an explanation of how it works please?
  • mirabilos
    mirabilos over 9 years
    Aw. Nice. I always do tr '\n' $'\a' | sed $'s/\a\a*\( *[^0-9]\)/\1/g' | tr $'\a' '\n' myself.
  • mirabilos
    mirabilos over 9 years
    Sorry, have to downvote though for using things that are not POSIX BASIC REGULAR EXPRESSIONS in sed(1), which is a GNUism.
  • Costas
    Costas over 9 years
    @mirabilos Could you kindly point out the non-POSIX expression in my script?
  • mirabilos
    mirabilos over 9 years
    There is no + or \+ in POSIX basic regular expressions.
  • Costas
    Costas over 9 years
    @mirabilos From man grep, "Basic vs Extended Regular Expressions": In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
  • Stéphane Chazelas
    Stéphane Chazelas over 9 years
    No, -0 is for NUL-delimited records. Use -0777 to slurp the entire file into memory (which you don't need to do here).
  • Stéphane Chazelas
    Stéphane Chazelas over 9 years
    @Costas, that's GNU grep's man page. POSIX BRE spec are there. BRE equivalent of ERE + is \{1,\}. [\n] is not portable either. \n\{1,\} would be POSIX.
  • muru
    muru over 9 years
    @StéphaneChazelas So what's the best way to make Perl match the newline, other than reading the whole file in?
  • Costas
    Costas over 9 years
    @StéphaneChazelas OK, if you'd like to be so old-school you are welcome to change \+ to \{1,\}
  • Stéphane Chazelas
    Stéphane Chazelas over 9 years
    Also, you can't have another command after a label. : 1;x is to define the 1;x label in POSIX seds. So you need: sed -e :1 -e 'N;$!b1' -e 's/\n\{1,\}\( *[^0-9]\)/\1/g'. Also note that many sed implementations have a small limit on the size of their pattern space (POSIX only guarantees 10 x LINE_MAX IIRC).
  • Stéphane Chazelas
    Stéphane Chazelas over 9 years
    The first argument to printf is the format. Use printf "%s", $_
  • Costas
    Costas over 9 years
    @StéphaneChazelas Yes, I am worried about the space limitation too. I even tried playing with P and D, but I couldn't find an acceptable solution.
  • terdon
    terdon over 9 years
    @StéphaneChazelas why? I mean, I know it's cleaner and perhaps easier to understand but is there any danger that that would protect from?
  • Stéphane Chazelas
    Stéphane Chazelas over 9 years
    Yes, it's wrong and potentially dangerous if the input may contain % characters. Try with an input with %10000000000s for instance.
  • Stéphane Chazelas
    Stéphane Chazelas over 9 years
    In C, that's a very well known very bad practice and vulnerability source. With perl, echo %.10000000000f | perl -ne printf brings my machine to its knees.
  • Stéphane Chazelas
    Stéphane Chazelas over 9 years
    See the other answers that process the file line by line.
  • terdon
    terdon over 9 years
    @StéphaneChazelas wow, yes. Mine too. Fair enough then, answer edited and thanks.
  • Stéphane Chazelas
    Stéphane Chazelas over 9 years
    ITYM END{print ""}. Alternative: awk -v ORS= 'NR>1 && /^[0-9]{8}/{print "\n"};1;END{print "\n"}'
  • muru
    muru over 9 years
    @AvinashRaj Yes, it should be more efficient, but produces wrong results if non-log lines include blank ones?
  • muru
    muru over 9 years
    @StéphaneChazelas so there's no middle ground between "matching a newline" and "reading the whole file and the library next to it"?
  • mirabilos
    mirabilos over 9 years
    This will break if the line contains, say, a backslash and an n. It also strips whitespace. But you can use mksh to do this: while IFS= read -r L; do [[ $L = [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]* ]] && print; print -nr -- "$L"; done; print
  • rook
    rook over 9 years
    Of course it is not a general-purpose algorithm, just a solution for the requirements given in the task. The final solution will of course be more complex and less readable at a glance, as usually happens in Real Life :)
  • mirabilos
    mirabilos over 9 years
    I agree, but I’ve learned the hard way to not assume too much about the OP ☺ especially if they replace the actual text by dummy text.