Pandas query function not working with spaces in column names
Solution 1
From pandas 0.25 onward, you can escape column names with backticks, so you can do:
a.query('`a b` == 5')
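A minimal, self-contained sketch of the backtick form, assuming pandas 0.25 or later is installed. It reuses the data from the question; the comparison value 3 and the name `result` are chosen here only so that the filter returns a row.

```python
import pandas as pd

# DataFrame from the question, with a space in one column name
a = pd.DataFrame({"a b": [1, 2, 3, 4], "c": [5, 6, 7, 8]})

# Backticks let query() refer to the column "a b" (needs pandas >= 0.25)
result = a.query("`a b` == 3")
print(result)
```

With the question's original value of 5 the same call runs without error but returns an empty frame, because no row of "a b" equals 5.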
Solution 2
Pandas 0.25+
As described here:
DataFrame.query() and DataFrame.eval() now supports quoting column names with backticks to refer to names with spaces (GH6508)
So you can use:
a.query('`a b`==5')
Pandas pre-0.25
You cannot use pd.DataFrame.query if you have whitespace in your column name. Consider what would happen if you had columns named a, b and a b; there would be ambiguity as to what you require.
Instead, you can use pd.DataFrame.loc:
df = df.loc[df['a b'] == 5]
Since you are only filtering rows, you can omit the .loc accessor altogether:
df = df[df['a b'] == 5]
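As a quick check that the two forms select the same rows, here is a small sketch. The sample data and the names `mask`, `via_loc` and `via_brackets` are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Illustrative data: two rows where "a b" equals 5
df = pd.DataFrame({"a b": [1, 2, 5, 5], "c": [5, 6, 7, 8]})

mask = df["a b"] == 5        # boolean Series, one entry per row
via_loc = df.loc[mask]       # explicit .loc accessor
via_brackets = df[mask]      # plain boolean indexing

# Both selections contain the same rows and columns
print(via_loc.equals(via_brackets))
```

The .loc form is mainly useful when you also want to select columns in the same step; for row filtering alone, either works.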
Solution 3
It is not possible yet. Check github issue #6508:
Note that in reality .query is just a nice-to-have interface, in fact it has very specific guarantees, meaning its meant to parse like a query language, and not a fully general interface.
The reason is that query needs the string to be a valid Python expression, so column names must be valid Python identifiers.
The solution is boolean indexing:
df = df[df['a b'] == 5]
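The identifier rule can be checked with the standard library alone. This sketch uses Python's own `ast` module to illustrate the point; it is not a description of how pandas parses query strings internally, and `parses_as_expression` is a hypothetical helper written for this example.

```python
import ast

# A column name is usable unquoted only if it is a valid identifier
for name in ["a", "c", "a b"]:
    print(repr(name), name.isidentifier())


def parses_as_expression(expr):
    """Return True if expr is a syntactically valid Python expression."""
    try:
        ast.parse(expr, mode="eval")
        return True
    except SyntaxError:
        return False


print(parses_as_expression("c == 5"))    # valid expression
print(parses_as_expression("a b == 5"))  # two names side by side: not valid
```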
Solution 4
I am afraid that the query method does not accept column names containing spaces. In any case, you can query the dataframe this way:
import pandas as pd
a = pd.DataFrame({'a b':[1,2,3,4], 'c':[5,6,7,8]})
a[a['a b']==1]
Bhushan Pant
Love building exciting products. Currently working as Research Scientist-Natural Language Processing in a Fintech Startup.
Updated on July 09, 2022
Comments
-
Bhushan Pant almost 2 years
I have a dataframe with spaces in column names. I am trying to use the query method to get the results. It works fine with the 'c' column, but I get an error for 'a b':
import pandas as pd
a = pd.DataFrame(columns=["a b", "c"])
a["a b"] = [1,2,3,4]
a["c"] = [5,6,7,8]
a.query('a b==5')
For this I am getting this error:
a b ==5
  ^
SyntaxError: invalid syntax
I don't want to replace the spaces with other characters like '_'.
There is one hack using pandasql: put the variable name inside brackets, for example [a b].
-
Pablo Adames almost 4 years
Simple and easy, great solution. Thanks!
-
Alexey K. over 3 years
Also, if a column is named with a numeric string such as '1', backticks are needed: a.query('`1` == 5')