Combine multiple files with different count values
177
Solution 1
you can try (if, I understood correctly)
awk '!($1 in d){d[$1]=$2; next}
{d[$1]+=$2}
END{for(key in d) print key, d[key]; }' *.sam
you get:
__too_low_aQual 3 mir-671 19 mir-8 2 __not_aligned 1 Y_RNA 4 mir-10 5
Solution 2
this script will do what you need
t0=mktemp; touch t0;
for f in prefix*.csv;
do paste t0 <(cut -d" " -f2 $f) > t1 && mv t1 t0;
done;
tr '\t' ' ' <t0 && rm t0
use cut/paste to collect second columns in a temp file; when done remove the temp file after printing results.
To preserve the first column change touch t0
to cut -d" " -f1 oneofthefiles.csv > t0
Alternatively, awk to the rescue!
awk '
{a[FNR]=a[FNR]?a[FNR] OFS $2:$1}
END{for(i=1;i<=FNR;i++) print a[i]}
' prefix*.csv
combine second field from all files based on the line number, preserve the first field from first file.
Related videos on Youtube
Author by
BioMan
Updated on November 29, 2022Comments
-
BioMan over 1 year
I would like to combine 96 files by taking the second column from each files and keep the first column which is similar between all files. I tried to do this in R, but figued it would be better in the terminal. Does it work using awk?
Sample data:
DMED7013:Rfam robinm$ head Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput402R.sam Seq_../trimmed/402R.tally.fasta __not_aligned __too_low_aQual 3 mir-10 5 Y_RNA 4 __too_low_aQual 0 __too_low_aQual 0 __not_aligned 1 mir-8 2 mir-671 3 mir-671 16
The files:
DMED7013:Rfam robinm$ ls -l -rw-r--r-- 1 robinm staff 1711388 Sep 22 19:12 Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput100G.sam -rw-r--r-- 1 robinm staff 1712778 Sep 22 19:12 Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput100R.sam -rw-r--r-- 1 robinm staff 1709703 Sep 22 19:12 Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput106G.sam -rw-r--r-- 1 robinm staff 1707486 Sep 22 19:12 Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput106R.sam -rw-r--r-- 1 robinm staff 1704757 Sep 22 19:12 Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput122G.sam -rw-r--r-- 1 robinm staff 1705471 Sep 22 19:12 Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput122R.sam .....
-
sat over 8 yearswhere is
1
st and2
nd column? sample data? -
Jose Ricardo Bustos M. over 8 yearswhich is your expected output?
-
BioMan over 8 yearsColumn 2 is the counts. Columns 1 is __too_low_aQual, mir-10 etc....There is no header for column 2
-
karakfa over 8 yearsAre the number of lines and the order the same in all files? Since the first field is not unique it's not possible to join them 1 to 1 otherwise.
-
BioMan over 8 yearsyes, number of lines and the order are the same
-
Ed Morton over 8 yearsWith every question it helps us to see testable sample input and expected output. If you want us to help you combine columns from multiple files then you need to provide multiple input files so that would mean at least 2, right? Maybe 3 would be better/necessary to show various use cases? Then you'd provide the one output file that you'd want to see given those input files. THEN we can start working on helping you solve your problem because before that we're just guessing about several aspects of what you want and we can't test a solution. Get rid of the directory listing, it's irrelevant.
-
-
George Udosen over 7 yearsDoes this answer the question ?