Pandas: Concatenate files but skip the headers except the first file
Solution 1
I think you need numpy.concatenate with the DataFrame constructor:
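import numpy as np
import pandas as pd
# Stack the raw values (header names are ignored) and reuse df1's column names
df = pd.DataFrame(np.concatenate([df1.values, df2.values, df3.values]), columns=df1.columns)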
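One caveat to keep in mind with this approach: `.values` turns each frame into a single NumPy array, so if the frames mix column types (for example strings and integers), NumPy falls back to an `object` array and the rebuilt frame loses the original dtypes. A small sketch with invented frames `a` and `b` (hypothetical names, not from the question) shows the effect and one possible repair with `infer_objects()`:

```python
import numpy as np
import pandas as pd

# Hypothetical frames: same column layout, different header names, mixed types
a = pd.DataFrame({"name": ["x", "y"], "qty": [1, 2]})
b = pd.DataFrame({"label": ["z"], "count": [3]})

# .values on a str/int frame is an object array, and so is the stacked result
stacked = np.concatenate([a.values, b.values])
before = pd.DataFrame(stacked, columns=a.columns)
print(before.dtypes)

# infer_objects() re-infers better dtypes for object columns ("qty" becomes int64)
after = before.infer_objects()
print(after.dtypes)
```

If all columns share one numeric type, this caveat does not apply and the values come back unchanged.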
Another solution is to replace the column names in df2 and df3 with those of df1, so that concat can align them:
df2.columns = df1.columns
df3.columns = df1.columns
df = pd.concat([df1,df2,df3], ignore_index=True)
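For more than three frames, the renaming step can be wrapped in a helper. The sketch below is a hypothetical `concat_by_position` function (not a pandas API); it assumes every frame has the same number of columns in the same order, and uses `set_axis` so the caller's frames keep their own headers:

```python
import pandas as pd


def concat_by_position(frames):
    """Hypothetical helper: stack frames row-wise using the first frame's headers.

    Assumes every frame has the same number of columns in the same order;
    the header names of the later frames are ignored.
    """
    first_cols = frames[0].columns
    for i, frame in enumerate(frames[1:], start=2):
        if frame.shape[1] != len(first_cols):
            raise ValueError(
                f"frame {i} has {frame.shape[1]} columns, expected {len(first_cols)}"
            )
    # set_axis returns a relabelled copy, so the input frames are not modified
    renamed = [frames[0]] + [f.set_axis(first_cols, axis=1) for f in frames[1:]]
    return pd.concat(renamed, ignore_index=True)


df1 = pd.DataFrame([[8, 8, 3], [7, 7, 0]], columns=list("ABF"))
df2 = pd.DataFrame([[4, 2, 5]], columns=list("ERT"))
df3 = pd.DataFrame([[2, 2, 2], [1, 0, 8], [4, 0, 9]], columns=list("HTR"))

df = concat_by_position([df1, df2, df3])
print(df)
```

The column-count check is there because a mismatch would otherwise only surface as an error from `set_axis`, without saying which frame caused it.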
Samples:
import numpy as np
import pandas as pd
np.random.seed(100)
df1 = pd.DataFrame(np.random.randint(10, size=(2,3)), columns=list('ABF'))
print (df1)
A B F
0 8 8 3
1 7 7 0
df2 = pd.DataFrame(np.random.randint(10, size=(1,3)), columns=list('ERT'))
print (df2)
E R T
0 4 2 5
df3 = pd.DataFrame(np.random.randint(10, size=(3,3)), columns=list('HTR'))
print (df3)
H T R
0 2 2 2
1 1 0 8
2 4 0 9
print (np.concatenate([df1.values, df2.values, df3.values]))
[[8 8 3]
[7 7 0]
[4 2 5]
[2 2 2]
[1 0 8]
[4 0 9]]
df = pd.DataFrame(np.concatenate([df1.values, df2.values, df3.values]), columns=df1.columns)
print (df)
A B F
0 8 8 3
1 7 7 0
2 4 2 5
3 2 2 2
4 1 0 8
5 4 0 9
df2.columns = df1.columns
df3.columns = df1.columns
df = pd.concat([df1,df2,df3], ignore_index=True)
print (df)
A B F
0 8 8 3
1 7 7 0
2 4 2 5
3 2 2 2
4 1 0 8
5 4 0 9
Solution 2
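You can use the skiprows argument of read_csv for the second and third files. Skipping the header line on its own makes the first data row the new header (the column names then no longer match, so concat produces extra columns and one data row is lost per file), so also pass header=None and reuse df1's column names:
import pandas
df1 = pandas.read_csv('path1')
df2 = pandas.read_csv('path2', skiprows=1, header=None, names=df1.columns)
df3 = pandas.read_csv('path3', skiprows=1, header=None, names=df1.columns)
df = pandas.concat([df1,df2,df3], ignore_index=True)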
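To check the behaviour without real files, here is a sketch that uses `io.StringIO` buffers in place of `path1` to `path3` (the CSV contents are invented). It contrasts `skiprows=1` alone, which promotes the first data row to the header, with `skiprows=1, header=None, names=...`, and with a plain `read_csv`, which already treats each file's header line as column names rather than as data:

```python
import io

import pandas as pd

# Hypothetical file contents standing in for path1, path2, path3
csv1 = "a,b,c\n1,2,3\n4,5,6\n"
csv2 = "a,b,c\n7,8,9\n"
csv3 = "a,b,c\n10,11,12\n13,14,15\n"

# skiprows=1 on its own: the first data row of the file becomes the header
bad = pd.read_csv(io.StringIO(csv3), skiprows=1)
print(bad.columns.tolist())  # ['10', '11', '12']

# skiprows=1 plus header=None and explicit names keeps every data row
df1 = pd.read_csv(io.StringIO(csv1))
df2 = pd.read_csv(io.StringIO(csv2), skiprows=1, header=None, names=df1.columns)
df3 = pd.read_csv(io.StringIO(csv3), skiprows=1, header=None, names=df1.columns)
df = pd.concat([df1, df2, df3], ignore_index=True)
print(df)

# When the headers really are identical, plain read_csv + concat gives the same frame
plain = pd.concat(
    [pd.read_csv(io.StringIO(text)) for text in (csv1, csv2, csv3)],
    ignore_index=True,
)
print(plain.equals(df))  # True
```

The last comparison suggests that, with truly identical headers, no skipping is needed at all, because read_csv never stores the header line as a data row.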
Solution 3
Been working on this recently myself, here's the most compact/elegant thing I came up with:
import pandas as pd
frame_list = [df1, df2, df3]
# iloc[0:] selects every row, so this is equivalent to pd.concat(frame_list)
frame_mod = [frame.iloc[0:] for frame in frame_list]
frame_frame = pd.concat(frame_mod)
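If the files were instead read with `header=None`, each header line arrives as row 0 of its frame, and an `iloc`-based variant could take the names from the first frame and slice row 0 off every frame before concatenating. A sketch with invented in-memory CSV data (the `texts` list is hypothetical):

```python
import io

import pandas as pd

# Hypothetical file contents; header=None makes each header line an ordinary row
texts = ["a,b\n1,2\n", "a,b\n3,4\n", "a,b\n5,6\n"]
frame_list = [pd.read_csv(io.StringIO(t), header=None) for t in texts]

# Row 0 of the first frame holds the header text
header = frame_list[0].iloc[0].tolist()

# Drop row 0 (the header line) from every frame, then stack the rest
body = pd.concat([f.iloc[1:] for f in frame_list], ignore_index=True)
body.columns = header

# The header row forced every column to be read as text, so convert back to numbers
body = body.apply(pd.to_numeric)
print(body)
```

The final `to_numeric` step is needed only because mixing the header text with the data makes read_csv treat each whole column as strings.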
Author: MCG Code
Updated on July 28, 2022
Comments
-
MCG Code 10 months
I have 3 files representing the same dataset split in 3 and I need to concatenate:
import pandas
df1 = pandas.read_csv('path1')
df2 = pandas.read_csv('path2')
df3 = pandas.read_csv('path3')
df = pandas.concat([df1,df2,df3])
But this will keep the headers in the middle of the dataset. I need to remove the headers (column names) from the 2nd and 3rd files. How do I do that?
-
MCG Code almost 6 years: You're right, I checked the skipping of the line but not the concatenation. The skiprows code is definitely not right: the dataset should have 23 columns, but it has almost 3 times that.
-
MCG Code almost 6 years: I agree with Jezrael, the concatenation duplicates the columns as many times as there are files. I was a bit too fast: I was happy to see the first line disappear, but didn't check on the right that the number of columns had become huge.
-
MCG Code almost 6 years: Do you have any idea why it is necessary to add df2.columns = df1.columns if the files already have identical headers?
-
jezrael almost 6 years: If the columns are identical, then your solution should work perfectly, because concat aligns data by column names.
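jezrael's point that concat aligns data by column name can be checked with a small sketch on invented frames. The trailing-space header below is only one hypothetical way two identical-looking headers can differ:

```python
import pandas as pd

# Two frames whose headers match exactly: concat lines them up by name
same1 = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
same2 = pd.DataFrame({"x": [5], "y": [6]})
aligned = pd.concat([same1, same2], ignore_index=True)
print(aligned.shape)  # (3, 2)

# A header that differs only by a trailing space is a different label,
# so concat builds the union of the columns and fills the gaps with NaN
diff2 = pd.DataFrame({"x ": [5], "y": [6]})
widened = pd.concat([same1, diff2], ignore_index=True)
print(widened.columns.tolist())
print(widened.shape)  # (3, 3)
```

A jump in the column count after concat, as described in the comments, is the symptom to look for.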
-
MCG Code almost 6 years: Your code works perfectly. I'm just wondering why pandas requires me to set df2.columns = df1.columns before using ignore_index=True.
-
jezrael almost 6 years: I think your column names are different, so you need my solution. But if the column names are the same, you only need:
df = pd.concat([df1,df2,df3], ignore_index=True)
-
MCG Code almost 6 years: The columns are identical; I checked with all(df2.columns == df1.columns) and it returns True. But when I run df = pd.concat([df1,df2,df3], ignore_index=True) it just duplicates the columns. It only works when I use your full code (including the replacement of the columns).
-
jezrael almost 6 years: I think your columns must differ in some way, maybe something like 0 as a number and '0' as a string. I had the same issue with all some time ago and spent a very long time looking for the problem. Since concat duplicates the columns, it seems it cannot align them, so the duplicated column names must be different in some way.
-
jezrael almost 6 years: Maybe it helps to check:
print (df.columns.tolist())
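Beyond printing the list, a small comparison that shows `repr()` and the type of each mismatching label can make hidden differences visible (trailing spaces, a byte-order mark, `0` versus `0.0`, or duplicated names). The `header_report` helper and the sample frames below are hypothetical:

```python
import pandas as pd


def header_report(reference, other):
    """Hypothetical helper: describe how other's column labels differ from reference's."""
    report = []
    if len(reference.columns) != len(other.columns):
        report.append(f"column count: {len(reference.columns)} vs {len(other.columns)}")
    for pos, (ref, oth) in enumerate(zip(reference.columns, other.columns)):
        # 0 == 0.0 is True in Python, so also compare the label types
        if ref != oth or type(ref) is not type(oth):
            report.append(
                f"position {pos}: {ref!r} ({type(ref).__name__}) "
                f"vs {oth!r} ({type(oth).__name__})"
            )
    # Duplicated labels can also stop concat from aligning columns by name
    dupes = other.columns[other.columns.duplicated()].tolist()
    if dupes:
        report.append(f"duplicated labels: {dupes!r}")
    return report


df1 = pd.DataFrame([[1, 2, 3]], columns=["id", "name", "score"])
df2 = pd.DataFrame([[4, 5, 6]], columns=["id", "name ", "score"])  # trailing space
for line in header_report(df1, df2):
    print(line)
```

An empty report means the labels match in value, type and order, in which case a plain `pd.concat` should align the frames without renaming.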