Read text file data to pandas DataFrame
Yes, it is possible, but the solution depends heavily on the data:
- first, read_csv with skiprows=3 to omit the first 3 rows and skipinitialspace=True to omit leading whitespace
- remove trailing whitespace from the column names with str.strip
- create column TYPE by extracting the values between [] with str.extract, and forward fill them to the following rows
- create helper column G to distinguish each DataFrame, using str.startswith and cumsum
- finally, use str.contains to remove the rows where the first column starts with [, -- or *
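The two helper columns are easier to follow on a toy Series. This minimal sketch uses made-up values shaped like the first column of the file; the name first_col is illustrative only:

```python
import pandas as pd

# Toy stand-in for the first column of the CNC file (illustrative values)
first_col = pd.Series(['[NoValidForUse]', 'A21', '[V11]', 'A12',
                       '*BOHRKOPF', '[V11]', 'O11'])

# Text between [] where present, NaN elsewhere; ffill copies each label down
types = first_col.str.extract(r'\[(.*)\]', expand=False).ffill()

# True on header rows only; the running sum grows by 1 at every header
groups = first_col.str.startswith('*').cumsum()

print(pd.DataFrame({'first': first_col, 'TYPE': types, 'G': groups}))
```

Every row after a [..] line inherits that label, and every row after a * header gets the next group number, which is what the steps above rely on.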
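import pandas as pd
# 'file' is the path (or open handle) of the .txt export
df = pd.read_csv(file, sep="!", skiprows=3, skipinitialspace=True)
# remove leading/trailing whitespace from the column names
df.columns = df.columns.str.strip()
# text between [] becomes TYPE, forward filled to the following rows
df['TYPE'] = df['*BOHRKOPF'].str.extract(r'\[(.*)\]', expand=False).ffill()
# every header row starting with * starts a new group
df['G'] = df['*BOHRKOPF'].str.startswith('*').cumsum()
# remove the [TYPE] rows, separator rows and repeated header rows
df = df[~df['*BOHRKOPF'].str.contains(r'^\[|^--|^\*')]
print(df)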
*BOHRKOPF SPINDEL WK DELTA-X DELTA-Y DURCHMESSER KOMMENTAR \
2 A21 1 62 0.000 0.000 0.000 NaN
4 A12 -1 62 0.000 -160.000 0.000 NaN
5 A12 2 62 0.000 -128.000 3.000 70.0
6 A12 -3 62 0.000 -96.000 0.000 NaN
7 A12 4 62 0.000 -64.000 0.000 NaN
12 O11 -9 62 0.000 -96.000 0.000 NaN
13 O11 10 62 0.000 -128.000 5.000 70.0
TYPE G
2 NoValidForUse 0
4 V11 0
5 V11 0
6 V11 0
7 V11 0
12 V11 1
13 V11 1
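The snippet above reads from a variable file that is not defined. Here is a self-contained sketch of the same steps. It assumes the sample from the question is held in a string (the name text is only for illustration) and read through io.StringIO:

```python
from io import StringIO

import pandas as pd

# Sample copied from the question; for a real export pass the file path instead
text = """_MASCHINENNUMMER : >0-251-11-0950/51< SACHBEARB.: >BSTWIN32<
_PRODUKTSCHLUESSEL : >BST 500< DATUM : >05-20-2016<
---------------------------------------------------------------------------
*BOHRKOPF !SPINDEL!WK!DELTA-X !DELTA-Y !DURCHMESSER! KOMMENTAR
----------+----------+----------+----------+-----------+-------------------
[NoValidForUse]
A21 ! 1!62! 0.000! 0.000! 0.000!
[V11]
A12 ! -1!62! 0.000! -160.000! 0.000!
A12 ! 2!62! 0.000! -128.000! 3.000! 70.0
A12 ! -3!62! 0.000! -96.000! 0.000!
A12 ! 4!62! 0.000! -64.000! 0.000!
---------------------------------------------------------------------------
*BOHRKOPF !SPINDEL!WK!DELTA-X !DELTA-Y !DURCHMESSER! KOMMENTAR
----------+----------+----------+----------+-----------+-------------------
[V11]
O11 ! -9!62! 0.000! -96.000! 0.000!
O11 ! 10!62! 0.000! -128.000! 5.000! 70.0
"""

df = pd.read_csv(StringIO(text), sep="!", skiprows=3, skipinitialspace=True)
df.columns = df.columns.str.strip()

first = df['*BOHRKOPF']
df['TYPE'] = first.str.extract(r'\[(.*)\]', expand=False).ffill()  # [..] label, copied down
df['G'] = first.str.startswith('*').cumsum()                         # +1 at each repeated header
df = df[~first.str.contains(r'^\[|^--|^\*')]                         # drop label/separator/header rows

print(df)
```

The repeated header row is read as data before it is filtered out, so the value columns keep object (string) dtype. Convert them with pd.to_numeric if you need numbers for the analysis.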
Then filter by the G column:
df1 = df[df['G'] == 0].drop('G', axis=1)
print(df1)
*BOHRKOPF SPINDEL WK DELTA-X DELTA-Y DURCHMESSER KOMMENTAR \
2 A21 1 62 0.000 0.000 0.000 NaN
4 A12 -1 62 0.000 -160.000 0.000 NaN
5 A12 2 62 0.000 -128.000 3.000 70.0
6 A12 -3 62 0.000 -96.000 0.000 NaN
7 A12 4 62 0.000 -64.000 0.000 NaN
TYPE
2 NoValidForUse
4 V11
5 V11
6 V11
7 V11
df2 = df[df['G'] == 1].drop('G', axis=1)
print(df2)
*BOHRKOPF SPINDEL WK DELTA-X DELTA-Y DURCHMESSER KOMMENTAR TYPE
12 O11 -9 62 0.000 -96.000 0.000 NaN V11
13 O11 10 62 0.000 -128.000 5.000 70.0 V11
If the file contains multiple DataFrames, you can use a list comprehension over groupby to get a list of DataFrames:
dfs = [v.drop('G', axis=1) for _, v in df.groupby('G')]
print(dfs[0])
*BOHRKOPF SPINDEL WK DELTA-X DELTA-Y DURCHMESSER KOMMENTAR \
2 A21 1 62 0.000 0.000 0.000 NaN
4 A12 -1 62 0.000 -160.000 0.000 NaN
5 A12 2 62 0.000 -128.000 3.000 70.0
6 A12 -3 62 0.000 -96.000 0.000 NaN
7 A12 4 62 0.000 -64.000 0.000 NaN
TYPE
2 NoValidForUse
4 V11
5 V11
6 V11
7 V11
print(dfs[1])
*BOHRKOPF SPINDEL WK DELTA-X DELTA-Y DURCHMESSER KOMMENTAR TYPE
12 O11 -9 62 0.000 -96.000 0.000 NaN V11
13 O11 10 62 0.000 -128.000 5.000 70.0 V11
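Because a file can hold 5 to 10 tables (as mentioned in the question), a dictionary keyed by the group number may be handier than list positions. This minimal sketch runs on a small hand-made frame. The name frames and the sample values are illustrative:

```python
import pandas as pd

# Small stand-in for the filtered df with the helper column G (illustrative values)
df = pd.DataFrame({
    '*BOHRKOPF': ['A21', 'A12', 'O11', 'O11'],
    'TYPE': ['NoValidForUse', 'V11', 'V11', 'V11'],
    'G': [0, 0, 1, 1],
})

# Keep the group label as the key instead of relying on list order
frames = {g: part.drop(columns='G') for g, part in df.groupby('G')}

print(frames[1])
```

Each table can then be looked up directly, for example frames[1], without counting positions in a list.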
EDIT:
temp = """_MASCHINENNUMMER : >0-251-11-0950/51< SACHBEARB.: >BSTWIN32<
_PRODUKTSCHLUESSEL : >BST 500< DATUM : >05-20-2016<
---------------------------------------------------------------------------
*BOHRKOPF !SPINDEL!WK!DELTA-X !DELTA-Y !DURCHMESSER! KOMMENTAR
----------+----------+----------+----------+-----------+-------------------
[NoValidForUse]
A21 ! 1!62! 0.000! 0.000! 0.000!
[V11]
A12 ! -1!62! 0.000! -160.000! 0.000!
A12 ! 2!62! 0.000! -128.000! 3.000! 70.0
A12 ! -3!62! 0.000! -96.000! 0.000!
A12 ! 4!62! 0.000! -64.000! 0.000!
---------------------------------------------------------------------------
*BOHRKOPF ! !X-POS !Y-POS ! !
----------+----------+----------+----------+-----------+-------------------
[V11]
O11 ! ! 0.000! -96.000! !
O11 ! ! 0.000! -128.000! ! """
Add the parameter header=None to get default column names (0, 1, 2, ...):
from io import StringIO
import pandas as pd
# after testing, replace StringIO(temp) with 'filename.csv'
df = pd.read_csv(StringIO(temp), sep="!", skiprows=3, skipinitialspace=True, header=None)
df['TYPE'] = df[0].str.extract(r'\[(.*)\]', expand=False).ffill()
df['G'] = df[0].str.startswith('*').cumsum()
# don't remove rows starting with *; they hold the header of each table
df = df[~df[0].str.contains(r'^\[|^--')]
print(df)
0 1 2 3 4 5 \
0 *BOHRKOPF SPINDEL WK DELTA-X DELTA-Y DURCHMESSER
3 A21 1 62 0.000 0.000 0.000
5 A12 -1 62 0.000 -160.000 0.000
6 A12 2 62 0.000 -128.000 3.000
7 A12 -3 62 0.000 -96.000 0.000
8 A12 4 62 0.000 -64.000 0.000
10 *BOHRKOPF NaN X-POS Y-POS NaN NaN
13 O11 NaN 0.000 -96.000 NaN NaN
14 O11 NaN 0.000 -128.000 NaN NaN
6 TYPE G
0 KOMMENTAR NaN 1
3 NaN NoValidForUse 1
5 NaN V11 1
6 70.0 V11 1
7 NaN V11 1
8 NaN V11 1
10 NaN V11 2
13 NaN V11 2
14 NaN V11 2
In each iteration, remove column G, rename all columns except the last two using the values of the first row, remove that first row with iloc, and finally, if necessary, drop columns that contain only NaNs with dropna:
dfs = [v.drop('G', axis=1).rename(columns=v.iloc[0, :-2]).iloc[1:].dropna(axis=1, how='all') for _, v in df.groupby('G')]
print(dfs[0])
*BOHRKOPF SPINDEL WK DELTA-X DELTA-Y DURCHMESSER KOMMENTAR \
3 A21 1 62 0.000 0.000 0.000 NaN
5 A12 -1 62 0.000 -160.000 0.000 NaN
6 A12 2 62 0.000 -128.000 3.000 70.0
7 A12 -3 62 0.000 -96.000 0.000 NaN
8 A12 4 62 0.000 -64.000 0.000 NaN
TYPE
3 NoValidForUse
5 V11
6 V11
7 V11
8 V11
print(dfs[1])
*BOHRKOPF X-POS Y-POS TYPE
13 O11 0.000 -96.000 V11
14 O11 0.000 -128.000 V11
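The following sketch takes a different approach from the answer's and is offered only as an alternative. It splits the raw text at every line that starts with * and calls read_csv once per table, so each table gets its own header and pandas can infer numeric dtypes per table. read_blocks is a hypothetical helper, and it assumes the [TYPE] label lines and -- separator lines look exactly like in the sample above:

```python
from io import StringIO

import pandas as pd

# Sample from the EDIT above: the two tables have different headers
SAMPLE = """_MASCHINENNUMMER : >0-251-11-0950/51< SACHBEARB.: >BSTWIN32<
_PRODUKTSCHLUESSEL : >BST 500< DATUM : >05-20-2016<
---------------------------------------------------------------------------
*BOHRKOPF !SPINDEL!WK!DELTA-X !DELTA-Y !DURCHMESSER! KOMMENTAR
----------+----------+----------+----------+-----------+-------------------
[NoValidForUse]
A21 ! 1!62! 0.000! 0.000! 0.000!
[V11]
A12 ! -1!62! 0.000! -160.000! 0.000!
A12 ! 2!62! 0.000! -128.000! 3.000! 70.0
A12 ! -3!62! 0.000! -96.000! 0.000!
A12 ! 4!62! 0.000! -64.000! 0.000!
---------------------------------------------------------------------------
*BOHRKOPF ! !X-POS !Y-POS ! !
----------+----------+----------+----------+-----------+-------------------
[V11]
O11 ! ! 0.000! -96.000! !
O11 ! ! 0.000! -128.000! !
"""


def read_blocks(text):
    """Hypothetical helper: return one DataFrame per table that starts with a '*' header."""
    blocks = []
    for line in text.splitlines():
        if line.startswith('*'):      # a header row opens a new table
            blocks.append([line])
        elif blocks:                  # lines before the first header are ignored
            blocks[-1].append(line)

    frames = []
    for header, *body in blocks:
        rows, types, kind = [header], [], None
        for line in body:
            stripped = line.strip()
            if stripped.startswith('[') and stripped.endswith(']'):
                kind = stripped[1:-1]          # remember the current [TYPE] label
            elif stripped and not stripped.startswith('--'):
                rows.append(line)              # a data row of this table
                types.append(kind)
        df = pd.read_csv(StringIO('\n'.join(rows)), sep='!', skipinitialspace=True)
        df.columns = df.columns.str.strip()
        # drop columns that are empty in this table, then attach the labels
        frames.append(df.dropna(axis=1, how='all').assign(TYPE=types))
    return frames


frames = read_blocks(SAMPLE)
for frame in frames:
    print(frame, end='\n\n')
```

Because the header rows never end up inside the data, columns such as SPINDEL or Y-POS come back as numbers rather than strings, which can save a conversion step before analysis.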
Author: Arnoldas Bankauskas
Updated on June 04, 2022
Comments
-
Arnoldas Bankauskas almost 2 years
I have CNC (work center) data in a specific file format, saved as .txt. I want to read this table into a pandas DataFrame, but I have never seen this format before.
_MASCHINENNUMMER : >0-251-11-0950/51< SACHBEARB.: >BSTWIN32<
_PRODUKTSCHLUESSEL : >BST 500< DATUM : >05-20-2016<
---------------------------------------------------------------------------
*BOHRKOPF !SPINDEL!WK!DELTA-X !DELTA-Y !DURCHMESSER! KOMMENTAR
----------+----------+----------+----------+-----------+-------------------
[NoValidForUse]
A21 ! 1!62! 0.000! 0.000! 0.000!
[V11]
A12 ! -1!62! 0.000! -160.000! 0.000!
A12 ! 2!62! 0.000! -128.000! 3.000! 70.0
A12 ! -3!62! 0.000! -96.000! 0.000!
A12 ! 4!62! 0.000! -64.000! 0.000!
---------------------------------------------------------------------------
*BOHRKOPF !SPINDEL!WK!DELTA-X !DELTA-Y !DURCHMESSER! KOMMENTAR
----------+----------+----------+----------+-----------+-------------------
[V11]
O11 ! -9!62! 0.000! -96.000! 0.000!
O11 ! 10!62! 0.000! -128.000! 5.000! 70.0
Questions:
1. Is it possible to read this and convert it to a pandas DataFrame?
2. How do I do this?
- Why a pandas DataFrame? I want to use this data for analysis based on the characteristics of each item, and I always use pandas for analysis. Or maybe a different approach is needed for this?
Expected output:
Two pandas DataFrames. The first:
---------------------------------------------------------------------------------------
*BOHRKOPF !SPINDEL!WK!DELTA-X !DELTA-Y !DURCHMESSER! KOMMENTAR ! TYPE
----------+----------+----------+----------+-----------+-------------------------------
A21 ! 1!62! 0.000! 0.000! 0.000! !NoValidForUse
A12 ! -1!62! 0.000! -160.000! 0.000! !V11
A12 ! 2!62! 0.000! -128.000! 3.000! 70.0 !V11
A12 ! -3!62! 0.000! -96.000! 0.000! !V11
A12 ! 4!62! 0.000! -64.000! 0.000! !V11
And the second:
---------------------------------------------------------------------------------------
*BOHRKOPF !SPINDEL!WK!DELTA-X !DELTA-Y !DURCHMESSER! KOMMENTAR ! TYPE
----------+----------+----------+----------+-----------+-------------------------------
O11 ! -9!62! 0.000! -96.000! 0.000! !V11
O11 ! 10!62! 0.000! -128.000! 5.000! 70.0 !V11
The headers of DataFrame 1 and DataFrame 2 can also be different:
_MASCHINENNUMMER : >0-251-11-0950/51< SACHBEARB.: >BSTWIN32<
_PRODUKTSCHLUESSEL : >BST 500< DATUM : >05-20-2016<
---------------------------------------------------------------------------
*BOHRKOPF !SPINDEL!WK!DELTA-X !DELTA-Y !DURCHMESSER! KOMMENTAR
----------+----------+----------+----------+-----------+-------------------
[NoValidForUse]
A21 ! 1!62! 0.000! 0.000! 0.000!
[V11]
A12 ! -1!62! 0.000! -160.000! 0.000!
A12 ! 2!62! 0.000! -128.000! 3.000! 70.0
A12 ! -3!62! 0.000! -96.000! 0.000!
---------------------------------------------------------------------------
*BOHRKOPF ! !X-POS !Y-POS ! !
----------+----------+----------+----------+-----------+-------------------
[V11]
O11 ! ! 0.000! -96.000! !
O11 ! ! 0.000! -128.000! !
- A file can contain a different number of DataFrames (between 5 and 10), but the file structure is always the same: the separator is "!" and header rows start with "*".
-
jezrael about 6 years: What is the expected output?
-
Arnoldas Bankauskas about 6 years: I added more info to the post. :)
-
Arnoldas Bankauskas about 6 years: This solution works if the headers are always the same as in line 4.
-
jezrael about 6 years: @ArnoldasBankauskas - I see the edit. Does each file contain only 2 DataFrames, or more?
-
Arnoldas Bankauskas about 6 years: A file can contain a different number of DataFrames (between 5 and 10), but the file structure is always the same: the separator is "!" and header rows start with "*".
-
Arnoldas Bankauskas about 6 years: Brilliant, now I understand how to deal with this.