Creating a pandas DataFrame from two NumPy arrays, then drawing a scatter plot


Solution 1

There are a number of ways to create DataFrames. Given 1-dimensional column vectors, you can create a DataFrame by passing it a dict whose keys are column names and whose values are the 1-dimensional column vectors:

import numpy as np
import pandas as pd
x = np.random.randn(5)
y = np.sin(x)
df = pd.DataFrame({'x':x, 'y':y})
df.plot('x', 'y', kind='scatter')
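The key difference from the question's code is the shape of x. np.random.randn(1, 5) returns a 2-D array with one row and five columns, while np.random.randn(5) returns a 1-D vector of length 5, which is what each DataFrame column needs. Here is a small sketch that checks the shapes. It assumes you already have a (1, 5) array and flattens it with ravel():

```python
import numpy as np
import pandas as pd

x_2d = np.random.randn(1, 5)   # shape (1, 5): one row, five columns
x = x_2d.ravel()               # shape (5,): a 1-D column vector
y = np.sin(x)

df = pd.DataFrame({'x': x, 'y': y})
print(x_2d.shape, x.shape)  # (1, 5) (5,)
print(df.shape)             # (5, 2)
```

If you control how x is generated, calling np.random.randn(5) directly, as in the answer above, avoids the extra step.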

Solution 2

To complement that, you can also add columns as pandas Series, but the DataFrame must be created first.

import numpy as np
import pandas as pd

x = np.linspace(0,2*np.pi)
y = np.sin(x)

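# Alternative: start from an empty DataFrame and add each column as a Series
#df = pd.DataFrame()
#df['X'] = pd.Series(x)
#df['Y'] = pd.Series(y)

# Or mix both: build the DataFrame from a dict, then add a Series column
df = pd.DataFrame({'X':x})
df['Y'] = pd.Series(y)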

df.plot('X', 'Y', kind='scatter')
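Here is a short sketch comparing the commented-out approach (empty DataFrame plus Series columns) with the mixed one. It also shows a caveat worth knowing: when you assign a Series to a column, pandas aligns it on the index labels, not by position. The names df_a, df_b and df_c are only for illustration.

```python
import numpy as np
import pandas as pd

x = np.linspace(0, 2 * np.pi)  # 50 points by default
y = np.sin(x)

# Approach A: empty DataFrame, then one Series per column
df_a = pd.DataFrame()
df_a['X'] = pd.Series(x)
df_a['Y'] = pd.Series(y)

# Approach B: build from a dict, then add the second column as a Series
df_b = pd.DataFrame({'X': x})
df_b['Y'] = pd.Series(y)

print(df_a.shape)                                        # (50, 2)
print(np.array_equal(df_a.to_numpy(), df_b.to_numpy())) # True

# Caveat: the Series below has index 0, 1, 2, which matches none of the
# DataFrame's labels 10, 11, 12, so every value in 'Y' becomes NaN
df_c = pd.DataFrame({'X': [1.0, 2.0, 3.0]}, index=[10, 11, 12])
df_c['Y'] = pd.Series([4.0, 5.0, 6.0])
print(df_c['Y'].isna().all())                            # True
```

With the default integer index on both sides, as in the answer above, the labels line up and the values land where you expect.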

Another way that might help is to stack the arrays column by column with np.column_stack:

import numpy as np
import pandas as pd

x = np.linspace(0,2*np.pi)
y = np.sin(x)

df = pd.DataFrame(data=np.column_stack((x,y)),columns=['X','Y'])
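np.column_stack turns each 1-D array into one column of a 2-D array, so two arrays of length 50 become a (50, 2) array. A quick check of that:

```python
import numpy as np
import pandas as pd

x = np.linspace(0, 2 * np.pi)  # 50 samples by default
y = np.sin(x)

stacked = np.column_stack((x, y))  # each 1-D array becomes one column
print(stacked.shape)  # (50, 2)

df = pd.DataFrame(data=stacked, columns=['X', 'Y'])
print(np.array_equal(df['Y'].to_numpy(), y))  # True
```

One design point to keep in mind is that a 2-D NumPy array has a single dtype. Stacking an integer array with a float array therefore upcasts both to float. The dict-based constructor keeps each column's own dtype.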

I also find the examples from karlijn (DataCamp) very helpful:

import numpy as np
import pandas as pd

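TAB = np.array([[''     ,'Col1','Col2'],
                 ['Row1' ,   1  ,   2  ],
                 ['Row2' ,   3  ,   4  ],
                 ['Row3' ,   5  ,   6  ]])

# NumPy stores this mixed table as strings, so convert the numeric part back to int
dados = TAB[1:,1:].astype(int)   # data
linhas = TAB[1:,0]               # row labels
colunas = TAB[0,1:]              # column labels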

DF = pd.DataFrame(
    data=dados,
    index=linhas,
    columns=colunas
)

print('\nDataFrame:', DF)
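Because a NumPy array holds a single dtype, mixing labels and numbers in one array makes every entry a string. That includes the values 1 to 6 in TAB. Here is a sketch of an alternative that builds the same labeled table while keeping the numbers numeric from the start. It assumes you can write the labels out separately:

```python
import numpy as np
import pandas as pd

TAB = np.array([['', 'Col1', 'Col2'],
                ['Row1', 1, 2],
                ['Row2', 3, 4],
                ['Row3', 5, 6]])
print(TAB.dtype.kind)  # 'U': all entries, including the numbers, are strings

# Keep the numeric values and the labels in separate objects
values = np.array([[1, 2], [3, 4], [5, 6]])
DF = pd.DataFrame(values, index=['Row1', 'Row2', 'Row3'], columns=['Col1', 'Col2'])

print(DF)
print(DF.loc['Row2', 'Col2'])  # 4
```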

Solution 3

To do what you want, I wouldn't use the DataFrame plotting methods. I'm also a former experimental physicist, and based on my experience with ROOT, I think the Python analog you want is best accomplished with matplotlib. matplotlib.pyplot has a function, hist2d(), that draws a 2D histogram, which is the kind of heat map you're looking for.
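Here is a minimal sketch of that idea using the Axes.hist2d method, the object-oriented counterpart of pyplot.hist2d(). The data are synthetic, correlated Gaussian values chosen only for illustration. The bin count (50), the colormap and the output filename are also arbitrary choices.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; remove this line to display plots interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
y = x + 0.5 * rng.standard_normal(10_000)  # correlated with x, for illustration only
df = pd.DataFrame({'x': x, 'y': y})

fig, ax = plt.subplots()
# hist2d returns the bin counts, the bin edges along x and y, and the drawn image
counts, xedges, yedges, image = ax.hist2d(df['x'], df['y'], bins=50, cmap='viridis')
fig.colorbar(image, ax=ax, label='counts')
ax.set_xlabel('x')
ax.set_ylabel('y')
fig.savefig('hist2d.png')  # or plt.show() in an interactive session
```

By default the bins span the full range of the data, so every point is counted once. In that respect counts plays roughly the role of the contents of a filled 2D histogram.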

As for creating the dataframe, an easy way to do it is:

df = pd.DataFrame({'x': x, 'y': y})

Author: n3utrino

Updated on July 26, 2020

Comments

  • n3utrino

    I'm relatively new to NumPy and pandas (I'm an experimental physicist, so I've been using ROOT for years...). A common plot in ROOT is a 2D scatter plot that, given lists of x- and y-values, makes a "heatmap"-type scatter plot of one variable versus the other.

    How is this best accomplished with NumPy and pandas? I'm trying to use the DataFrame.plot() function, but I'm struggling to even create the DataFrame.

    import numpy as np
    import pandas as pd
    x = np.random.randn(1,5)
    y = np.sin(x)
    df = pd.DataFrame(d)
    

    First off, this DataFrame has shape (1,2), but I would like it to have shape (5,2). If I can get the DataFrame into the right shape, I'm sure I can figure out how to use DataFrame.plot() to draw what I want.