How to make a matrix out of existing xyz data

Depending on whether you're generating z or not, you have at least two different options.

If you're generating z (e.g. you know the formula for it) it's very easy (see method_1() below).

If you just have a list of (x,y,z) tuples, it's harder (see method_2() below, and maybe method_3()).

Constants

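import math
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
# min_? is the minimum bound, max_? is the maximum bound,
#   dim_? is the number of sample points in that direction
min_x, max_x, dim_x = (-10, 10, 100)
min_y, max_y, dim_y = (-10, 10, 100)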

Method 1: Generating z

# Method 1:
#   This works if you are generating z, given (x,y)
def method_1():
    x = np.linspace(min_x, max_x, dim_x)
    y = np.linspace(min_y, max_y, dim_y)

    X,Y = np.meshgrid(x,y)

    def z_function(x,y):
        return math.sqrt(x**2 + y**2)

    z = np.array([z_function(x,y) for (x,y) in zip(np.ravel(X), np.ravel(Y))])
    Z = z.reshape(X.shape)

    plt.pcolormesh(X,Y,Z)
    plt.show()

Which generates the following graph:

[Figure: method_1 output]

This is relatively easy, since you can generate z at whatever points you want.
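
When z is a simple formula like the one above, the per-point loop is not strictly needed, because NumPy can evaluate it on the whole grid at once. A minimal sketch, assuming the same bounds as the constants above (the function name make_grid_vectorized is made up for illustration):

```python
import numpy as np

# Same bounds and sample counts as the constants above
min_x, max_x, dim_x = (-10, 10, 100)
min_y, max_y, dim_y = (-10, 10, 100)

def make_grid_vectorized():
    x = np.linspace(min_x, max_x, dim_x)
    y = np.linspace(min_y, max_y, dim_y)
    X, Y = np.meshgrid(x, y)
    # np.hypot computes sqrt(X**2 + Y**2) element-wise on the full grid
    Z = np.hypot(X, Y)
    return X, Y, Z

X, Y, Z = make_grid_vectorized()
print(X.shape, Y.shape, Z.shape)
```

The resulting X, Y, Z can be passed to plt.pcolormesh(X, Y, Z) exactly as in method_1().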

If you don't have that ability and are given fixed (x,y,z) data, you could do the following. First, I define a function that generates fake data:

def gen_fake_data():
    # First we generate the (x,y,z) tuples to imitate "real" data
    # Maximum relative error applied to x and y; half of it will be in
    # the + direction, half in the - direction.
    xy_max_error = 0.2

    # Generate the "real" x,y vectors
    x = np.linspace(min_x, max_x, dim_x)
    y = np.linspace(min_y, max_y, dim_y)

    # Apply an error to x,y
    x_err = (np.random.rand(*x.shape) - 0.5) * xy_max_error
    y_err = (np.random.rand(*y.shape) - 0.5) * xy_max_error
    x *= (1 + x_err)
    y *= (1 + y_err)

    # Generate fake z
    rows = []
    for ix in x:
        for iy in y:
            z = math.sqrt(ix**2 + iy**2)
            rows.append([ix,iy,z])

    mat = np.array(rows)
    return mat

Here, the returned matrix looks like:

mat = [[x_0, y_0, z_0],
       [x_1, y_1, z_1],
       [x_2, y_2, z_2],
       ...
       [x_n, y_n, z_n]]
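
With real data, the same layout can usually be built directly from a file or from three separate columns. A minimal sketch, assuming a whitespace-separated text file with one "x y z" triple per line (the sample values and the names x_data, y_data, z_data are invented for illustration; an in-memory buffer stands in for the file):

```python
import io
import numpy as np

# Stand-in for a real xyz file: one "x y z" triple per line
sample_file = io.StringIO(
    "0.0 0.0 1.5\n"
    "0.0 1.0 2.5\n"
    "1.0 0.0 3.5\n"
    "1.0 1.0 4.5\n"
)

# np.loadtxt returns an (n, 3) array: column 0 is x, 1 is y, 2 is z
mat = np.loadtxt(sample_file)

# If the columns are already separate 1-D arrays, stack them side by side
x_data, y_data, z_data = mat[:, 0], mat[:, 1], mat[:, 2]
mat_from_columns = np.column_stack((x_data, y_data, z_data))

print(mat.shape)
```

For a file on disk, the buffer would be replaced by the file path.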

Method 2: Interpolating given z points over a regular grid

# Method 2:
#   This works if you have (x,y,z) tuples that you're *not* generating, and (x,y) points 
#   may not fall evenly on a grid.
def method_2():
    mat = gen_fake_data()

    x = np.linspace(min_x, max_x, dim_x)
    y = np.linspace(min_y, max_y, dim_y)

    X,Y = np.meshgrid(x, y)

    # Interpolate (x,y,z) points [mat] over a normal (x,y) grid [X,Y]
    #   Depending on your "error", you may be able to use other methods
    Z = interpolate.griddata((mat[:,0], mat[:,1]), mat[:,2], (X,Y), method='nearest')

    plt.pcolormesh(X,Y,Z)
    plt.show()

This method produces the following graphs:

[Figure: method_2 output, error = 0.2]

[Figure: method_2 output, error = 0.8]
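
As the comment in method_2() hints, other interpolation methods may suit your data better. One possible variation, sketched here under the assumption that a smoother result is wanted: use method='linear' and fall back to 'nearest' only where linear interpolation returns NaN (grid points outside the convex hull of the samples). The helper name grid_linear_with_fallback and the random test data are invented for illustration.

```python
import numpy as np
from scipy import interpolate

def grid_linear_with_fallback(mat, X, Y):
    points = (mat[:, 0], mat[:, 1])
    values = mat[:, 2]
    # Linear interpolation leaves NaN outside the convex hull of the points
    Z = interpolate.griddata(points, values, (X, Y), method='linear')
    holes = np.isnan(Z)
    if holes.any():
        # Fill only the undefined cells with nearest-neighbour values
        Z_nearest = interpolate.griddata(points, values, (X, Y), method='nearest')
        Z[holes] = Z_nearest[holes]
    return Z

# Scattered fake samples of z = sqrt(x**2 + y**2)
rng = np.random.default_rng(0)
xy = rng.uniform(-10, 10, size=(500, 2))
mat = np.column_stack((xy, np.hypot(xy[:, 0], xy[:, 1])))

X, Y = np.meshgrid(np.linspace(-10, 10, 50), np.linspace(-10, 10, 50))
Z = grid_linear_with_fallback(mat, X, Y)
print(Z.shape)
```

Whether linear or nearest is the better choice depends on how dense and how noisy the samples are.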

Method 3: No Interpolation (constraints on sampled data)

There's a third option, depending on how your (x,y,z) is set up. This option requires two things:

  1. The number of different x sample positions equals the number of different y sample positions.
  2. For every possible unique (x,y) pair, there is a corresponding (x,y,z) in your data.

From this, it follows that the number of (x,y,z) triples must equal the square of the number of unique x points (since the number of unique x positions equals the number of unique y positions).

In general, with sampled data, this will not be true. But if it is, you can avoid having to interpolate:

def method_3():
    mat = gen_fake_data()

    x = np.unique(mat[:,0])
    y = np.unique(mat[:,1])

    X,Y = np.meshgrid(x, y)

    # I'm fairly sure there's a more efficient way of doing this...
    def get_z(mat, x, y):
        ind = (mat[:,(0,1)] == (x,y)).all(axis=1)
        row = mat[ind,:]
        return row[0,2]

    z = np.array([get_z(mat,x,y) for (x,y) in zip(np.ravel(X), np.ravel(Y))])
    Z = z.reshape(X.shape)

    plt.pcolormesh(X,Y,Z)
    plt.xlim(min(x), max(x))
    plt.ylim(min(y), max(y))
    plt.show()

[Figure: method_3 output, error = 0.2]

[Figure: method_3 output, error = 0.8]
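
The per-point lookup in get_z() scans the whole matrix for every grid cell. A possibly faster alternative, sketched here rather than taken from the original answer, uses the inverse indices returned by np.unique to scatter each z straight into its cell. The function name xyz_to_grid is made up for illustration; it raises an error when the data does not cover a full grid, and it does not need the x and y counts to be equal.

```python
import numpy as np

def xyz_to_grid(mat):
    # ix[k] / iy[k] give the column / row of sample k in the grid
    x, ix = np.unique(mat[:, 0], return_inverse=True)
    y, iy = np.unique(mat[:, 1], return_inverse=True)

    # A full grid has exactly one sample per (x, y) combination
    filled = np.zeros((len(y), len(x)), dtype=bool)
    filled[iy, ix] = True
    if len(mat) != len(x) * len(y) or not filled.all():
        raise ValueError("data does not cover a full (x, y) grid")

    # Rows follow y and columns follow x, matching np.meshgrid(x, y)
    Z = np.empty((len(y), len(x)))
    Z[iy, ix] = mat[:, 2]
    X, Y = np.meshgrid(x, y)
    return X, Y, Z

# Small example: 2 x positions, 3 y positions, rows in arbitrary order
example = np.array([
    [1.0, 2.0, 6.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 2.0],
    [0.0, 2.0, 5.0],
    [1.0, 1.0, 4.0],
    [0.0, 1.0, 3.0],
])
X, Y, Z = xyz_to_grid(example)
print(Z)
```

The returned X, Y, Z have matching shapes and can be passed to plt.pcolormesh(X, Y, Z) as in method_3().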

Author by

J.A.Cado

Marine ecologist working at a knowledge institute in the Netherlands. Our focus and expertise revolve around water management and water protection. I use QGIS to make B-E-A tiful maps of coastal areas... or at least, I try.

Updated on June 04, 2022

Comments

  • J.A.Cado
    J.A.Cado about 2 years

    I want to use matplotlib.pyplot.pcolormesh to plot a depth plot.

    What I have is an xyz file with three columns, i.e. x (lat), y (lon), z (dep).

    All columns are of equal length.

    pcolormesh requires matrices as input. So using numpy.meshgrid I can transform the x and y into matrices:

    xx,yy = numpy.meshgrid(x_data,y_data)
    

    This works great... However, I don't know how to create a matrix of my depth (z) data. How do I create a matrix for my z_data that corresponds to my x_data and y_data matrices?

  • J.A.Cado
    J.A.Cado almost 8 years
    Thanks! I'll take a look at it. To answer your question, all my data is fixed.
  • J.A.Cado
    J.A.Cado almost 8 years
    Hey, thanks again so much. I'm trying out option two... but I'm not sure how to generate fake data. I mean, I know how the syntax works, but should I generate it in any specific way? Could you elaborate just a bit more please?
  • jedwards
    jedwards almost 8 years
    @J.A.Cado sure, and sorry for the confusion. I generated fake data because I didn't have yours. You shouldn't need to, just use your real data. The only thing you need to look out for is that your data is arranged similarly (see the few lines right under "Here, the returned matrix looks like:").
  • J.A.Cado
    J.A.Cado almost 8 years
    Holy cow, I did it! That's amazing! Thanks so much :D Just a side question: I was working with pandas dataframes and passing the dataframe columns straight into scipy's griddata method: griddata((df['x'],df['y']),df['z'],(X,Y),method='linear'). But that never worked; it gave me errors about dimensions not being correct. Any idea why that wouldn't work? Cheers!
  • J.A.Cado
    J.A.Cado almost 8 years
    Option 2 worked for me. I have my own data set, with actual data from the field. The columns (x,y,z) are all of equal length, and each (x,y,z) triple is unique.