How to append Line to previous Line?
Solution 1
A version in perl, using negative lookaheads:
$ perl -0pe 's/\n(?!([0-9]{8}|$))//g' test.txt
20141101 server contain dump
20141101 server contain nothing {uekdmsam ikdas jwdjamc ksadkek} ssfjddkc * kdlsdlsddsfd jfkdfk
20141101 server contain dump
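-0 sets the record separator to the NUL byte, so a text file without NUL bytes is read as a single record and the regex can match across the entire file. \n(?!([0-9]{8}|$)) uses a negative lookahead: it matches a newline that is not followed by 8 digits or by the end of the line (which, with -0, will be the end of the file).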
Solution 2
It may be a little easier with sed:
sed -e ':1 ; N ; $!b1' -e 's/\n\+\( *[^0-9]\)/\1/g'
The first part, :1;N;$!b1, collects all lines of the file into one long line, separated by \n.
The second part strips a newline if it is followed by a non-digit character, with possible spaces between them.
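The two parts can be sketched in the more portable spelling suggested in the comments: the label gets its own -e, and \{1,\} replaces the GNU-specific \+. The sample data and the function name join_sed are assumptions for illustration only.

```shell
# Hypothetical sample input: continuation lines do not start with a digit.
sample='20141101 server contain dump
20141101 server contain nothing
 {uekdmsam ikdas}
ssfjddkc
20141101 server contain dump'

join_sed() {
    # Part 1 (:1, N, $!b1): append every line to the pattern space.
    # Part 2 (s///g): delete newlines that are followed by optional
    # spaces and a non-digit, keeping the spaces and that character.
    sed -e :1 -e 'N;$!b1' -e 's/\n\{1,\}\( *[^0-9]\)/\1/g'
}

printf '%s\n' "$sample" | join_sed
```

Because the whole file ends up in the pattern space, this sketch shares the size limitation mentioned in the comments.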
To avoid memory limitations (especially for big files) you can use:
sed -e '1{h;d}' -e '1!{/^[0-9]/!{H;d};/^[0-9]/x;$G}' -e 's/\n\+\( *[^0-9]\)/\1/g'
Or forget the difficult sed scripts and remember that the year starts with 2:
tr '\n2' ' \n' | sed -e '1!s/^/2/' -e 1{/^$/d} -e $a
Solution 3
One way would be:
$ perl -lne 's/^/\n/ if $.>1 && /^\d+/; printf "%s",$_' file
20141101 server contain dump
20141101 server contain nothing {uekdmsam ikdas jwdjamc ksadkek} ssfjddkc * kdlsdlsddsfd jfkdfk
20141101 server contain dump
However, that also removes the final newline. To add it again, use:
$ { perl -lne 's/^/\n/ if $.>1 && /^\d+/; printf "%s",$_' file; echo; } > new
Explanation
The -l will remove trailing newlines (and also add one to each print call, which is why I use printf instead). Then, if the current line starts with numbers (/^\d+/) and the current line number is greater than one ($.>1, this is needed to avoid adding an extra empty line at the beginning), add a \n to the beginning of the line. The printf prints each line.
Alternatively, you can change all \n characters to \0, then change those \0 that are right before a string of numbers to \n again:
$ tr '\n' '\0' < file | perl -pe 's/\0\d+ |$/\n$&/g' | tr -d '\0'
20141101 server contain dump
20141101 server contain nothing {uekdmsam ikdas jwdjamc ksadkek} ssfjddkc * kdlsdlsddsfd jfkdfk
20141101 server contain dump
To make it match only strings of 8 numbers, use this instead:
$ tr '\n' '\0' < file | perl -pe 's/\0\d{8} |$/\n$&/g' | tr -d '\0'
Solution 4
Try doing this using awk:
#!/usr/bin/awk -f
{
    # if the current line begins with 8 digits followed by
    # 'nothing' OR the current line doesn't start with 8 digits
    if (/^[0-9]{8}.*nothing/ || !/^[0-9]{8}/) {
        # print current line without newline
        printf "%s", $0
        # feeding a 'state' variable
        weird=1
    }
    else {
        # if last line was treated in the 'if' statement
        if (weird==1) {
            printf "\n%s", $0
            weird=0
        }
        else {
            print # print the current line
        }
    }
}
END{
    print "" # add a newline when there's no more line to treat
}
To use it:
chmod +x script.awk
./script.awk file.txt
Solution 5
Another, simpler way (than my other answer) using awk and terdon's algorithm:
awk 'NR>1 && /^[0-9]{8}/{printf "%s","\n"$0;next}{printf "%s",$0}END{print ""}' file
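For readers who want the one-liner spelled out, here is a commented sketch of the same idea. It writes the 8 digits as explicit bracket expressions in case the awk in use does not support {8}; the sample data and the function name join_records are made up for illustration.

```shell
# Hypothetical sample input: continuation lines do not start with a date.
sample='20141101 server contain dump
20141101 server contain nothing
 {uekdmsam ikdas}
ssfjddkc
20141101 server contain dump'

join_records() {
    awk '
        # A line starting with 8 digits begins a new record, so end the
        # previous one first (except before the very first line).
        NR > 1 && /^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/ { printf "\n" }
        # Print every line without its trailing newline.
        { printf "%s", $0 }
        # Terminate the last record, unless the input was empty.
        END { if (NR > 0) print "" }
    ' "$@"
}

printf '%s\n' "$sample" | join_records
```

Since awk processes one line at a time, this approach does not need to hold the whole file in memory.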
William R
Updated on September 18, 2022
Comments
-
William R over 1 year
I have a log file which needs to be parsed and analysed. The file contains something like the following:
File:
20141101 server contain dump 20141101 server contain nothing {uekdmsam ikdas jwdjamc ksadkek} ssfjddkc * kdlsdl sddsfd jfkdfk 20141101 server contain dump
In this scenario, if a line does not start with a date or number, I have to append it to the previous line.
Output file:
20141101 server contain dump
20141101 server contain nothing {uekdmsam ikdas jwdjamc ksadkek} ssfjddkc * kdlsdl sddsfd jfkdfk
20141101 server contain dump
-
muru over 9 years
@terdon, updated to save last newline.
-
terdon over 9 years
Nice one! I'd upvote you but I'm afraid I already had :)
-
terdon over 9 years
Nice, +1. Could you add an explanation of how it works please?
-
mirabilos over 9 years
Aw. Nice. I always do tr '\n' $'\a' | sed $'s/\a\a*\( *[^0-9]\)/\1/g' | tr $'\a' '\n' myself.
-
mirabilos over 9 years
Sorry, have to downvote though for using things that are not POSIX BASIC REGULAR EXPRESSIONS in sed(1), which is a GNUism.
-
Costas over 9 years
@mirabilos Kindly ask you to indicate the non-POSIX expression in my script.
-
mirabilos over 9 years
There is no + or \+ in POSIX basic regular expressions.
-
Costas over 9 years
@mirabilos From man grep, "Basic vs Extended Regular Expressions": In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
-
Stéphane Chazelas over 9 years
No, -0 is for NUL-delimited records. Use -0777 to slurp the entire file in memory (which you don't need to here).
-
Stéphane Chazelas over 9 years
@Costas, that's GNU grep's man page. POSIX BRE spec are there. The BRE equivalent of ERE + is \{1,\}. [\n] is not portable either. \n\{1,\} would be POSIX.
-
muru over 9 years
@StéphaneChazelas So what's the best way to make Perl match the newline, other than reading the whole file in?
-
Costas over 9 years
@StéphaneChazelas OK, if you'd like to be so old-school you are welcome to change \+ to \{1,\}
-
Stéphane Chazelas over 9 years
Also, you can't have another command after a label. : 1;x is to define the 1;x label in POSIX seds. So you need: sed -e :1 -e 'N;$!b1' -e 's/\n\{1,\}\( *[^0-9]\)/\1/g'. Also note that many sed implementations have a small limit on the size of their pattern space (POSIX only guarantees 10 x LINE_MAX IIRC).
-
Stéphane Chazelas over 9 years
The first argument to printf is the format. Use printf "%s", $_
-
Costas over 9 years
@StéphaneChazelas Yes, I am worried about the space limitation too. I even tried to play with P and D, but I couldn't find an acceptable solution.
-
terdon over 9 years
@StéphaneChazelas why? I mean, I know it's cleaner and perhaps easier to understand but is there any danger that that would protect from?
-
Stéphane Chazelas over 9 years
Yes, it's wrong and potentially dangerous if the input may contain % characters. Try with an input with %10000000000s for instance.
-
Stéphane Chazelas over 9 years
In C, that's a very well known very bad practice and vulnerability source. With perl, echo %.10000000000f | perl -ne printf brings my machine to its knees.
-
Stéphane Chazelas over 9 years
See the other answers that process the file line by line.
-
terdon over 9 years
@StéphaneChazelas wow, yes. Mine too. Fair enough then, answer edited and thanks.
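The format-string point discussed in this exchange applies to awk's printf as well as perl's. A minimal sketch, assuming a made-up input line and a hypothetical helper name print_verbatim, of passing the data as an argument to a fixed format:

```shell
# Prints each input line verbatim: the line is an argument to a fixed
# "%s\n" format and is never interpreted as a format string itself.
print_verbatim() {
    awk '{ printf "%s\n", $0 }'
}

# The % sequences in the data come out unchanged.
printf '%s\n' 'disk 100% full, see %s' | print_verbatim
```

Writing printf $0 instead would let the % sequences in the data be parsed as conversion specifications.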
-
Stéphane Chazelas over 9 years
ITYM END{print ""}. Alternative: awk -v ORS= 'NR>1 && /^[0-9]{8}/{print "\n"};1;END{print "\n"}'
-
muru over 9 years
@AvinashRaj Yes, it should be more efficient, but produces wrong results if non-log lines include blank ones?
-
muru over 9 years
@StéphaneChazelas so there's no middle ground between "matching a newline" and "reading the whole file and the library next to it"?
-
mirabilos over 9 years
This will break if the line contains, say, a backslash and an n. It also strips whitespace. But you can use mksh to do this:
while IFS= read -r L; do [[ $L = [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]* ]] && print; print -nr -- "$L"; done; print
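For shells without mksh's print builtin, the same loop can be sketched in plain POSIX sh. This is an adaptation rather than the commenter's code: it uses printf, a case pattern instead of [[ ]], and a flag (first, an assumed name) so that no empty line is emitted before the first record. The function name join_loop and the sample data are also made up.

```shell
# Hypothetical sample input: continuation lines do not start with a date.
sample='20141101 server contain dump
20141101 server contain nothing
 {uekdmsam ikdas}
ssfjddkc
20141101 server contain dump'

join_loop() {
    first=1
    # IFS= and -r keep leading whitespace and backslashes intact; the
    # extra test also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]*)
                # A dated line starts a new record: end the previous one.
                [ "$first" -eq 1 ] || printf '\n' ;;
        esac
        printf '%s' "$line"
        first=0
    done
    printf '\n'
}

printf '%s\n' "$sample" | join_loop
```

A shell read loop is usually much slower than awk or sed on large files, so this is mainly useful when no other tool is available.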
-
rook over 9 years
Of course it is not an algorithm for everything, but a solution for the requirements provided by the task. Of course the final solution will be more complex and less readable at a glance, as usually happens in Real Life :)
-
mirabilos over 9 years
I agree, but I’ve learned the hard way to not assume too much about the OP ☺ especially if they replace the actual text by dummy text.