How to standard scale a 3D matrix?


Solution 1

You'll have to fit and store a scaler for each channel:

from sklearn.preprocessing import StandardScaler

# Fit one scaler per slice along axis 1 and store it for reuse on the test set
scalers = {}
for i in range(X_train.shape[1]):
    scalers[i] = StandardScaler()
    X_train[:, i, :] = scalers[i].fit_transform(X_train[:, i, :])

for i in range(X_test.shape[1]):
    X_test[:, i, :] = scalers[i].transform(X_test[:, i, :])
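As the comments below point out, this loop scales along axis 1; if your channels sit on the last axis, as in the question's (batch, length, channels) layout, index the last axis instead. A minimal sketch of that variant, with hypothetical array shapes:

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: (batch, length, channels)
X_train = np.random.randn(100, 64, 3)
X_test = np.random.randn(20, 64, 3)

# Fit one scaler per channel on the training data, reuse it on the test data
scalers = {}
for i in range(X_train.shape[2]):
    scalers[i] = StandardScaler()
    X_train[:, :, i] = scalers[i].fit_transform(X_train[:, :, i])

for i in range(X_test.shape[2]):
    X_test[:, :, i] = scalers[i].transform(X_test[:, :, i])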

Solution 2

With only 3 lines of code...

from sklearn.preprocessing import StandardScaler

# Flatten to (n_samples * n_timesteps, n_features), scale, then restore the shape
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train.reshape(-1, X_train.shape[-1])).reshape(X_train.shape)
X_test = scaler.transform(X_test.reshape(-1, X_test.shape[-1])).reshape(X_test.shape)
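Flattening to (-1, n_features) pools the statistics over the batch and time axes, so each feature on the last axis gets a single mean and standard deviation. The same reshape trick also undoes the scaling; a minimal sketch with hypothetical data:

import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.random.randn(100, 64, 3)  # hypothetical (batch, length, channels)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_train.reshape(-1, X_train.shape[-1])).reshape(X_train.shape)

# Invert with the same flatten/restore pattern
X_restored = scaler.inverse_transform(X_scaled.reshape(-1, X_scaled.shape[-1])).reshape(X_train.shape)
print(np.allclose(X_restored, X_train))  # True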

Solution 3

If you want to scale each feature differently, like StandardScaler does, you can use this:

import numpy as np
from sklearn.base import TransformerMixin
from sklearn.preprocessing import StandardScaler


class NDStandardScaler(TransformerMixin):
    def __init__(self, **kwargs):
        self._scaler = StandardScaler(copy=True, **kwargs)
        self._orig_shape = None

    def fit(self, X, **kwargs):
        X = np.array(X)
        # Save the original shape to reshape the flattened X later
        # back to its original shape
        if len(X.shape) > 1:
            self._orig_shape = X.shape[1:]
        X = self._flatten(X)
        self._scaler.fit(X, **kwargs)
        return self

    def transform(self, X, **kwargs):
        X = np.array(X)
        X = self._flatten(X)
        X = self._scaler.transform(X, **kwargs)
        X = self._reshape(X)
        return X

    def _flatten(self, X):
        # Reshape X to <= 2 dimensions
        if len(X.shape) > 2:
            n_dims = np.prod(self._orig_shape)
            X = X.reshape(-1, n_dims)
        return X

    def _reshape(self, X):
        # Reshape X back to its original shape
        if len(X.shape) >= 2:
            X = X.reshape(-1, *self._orig_shape)
        return X

It simply flattens the features of the input before giving it to sklearn's StandardScaler. Then, it reshapes them back. The usage is the same as for the StandardScaler:

data = [[[0, 1], [2, 3]], [[1, 5], [2, 9]]]
scaler = NDStandardScaler()
print(scaler.fit_transform(data))

prints

[[[-1. -1.]
  [ 0. -1.]]

 [[ 1.  1.]
  [ 0.  1.]]]

The arguments with_mean and with_std are passed directly to StandardScaler and thus work as expected. copy=False won't work, since the reshaping does not happen in place. For 2-D inputs, NDStandardScaler works just like StandardScaler:

data = [[0, 0], [0, 0], [1, 1], [1, 1]]
scaler = NDStandardScaler()
scaler.fit(data)
print(scaler.transform(data))
print(scaler.transform([[2, 2]]))

prints

[[-1. -1.]
 [-1. -1.]
 [ 1.  1.]
 [ 1.  1.]]
[[3. 3.]]

just like in the sklearn example for StandardScaler.
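Because NDStandardScaler follows the usual fit/transform convention, you can fit it on training data only and reuse the learned statistics on held-out data; a minimal sketch with hypothetical 3-D arrays:

import numpy as np

X_train = np.random.randn(100, 64, 3)  # hypothetical (batch, length, channels)
X_test = np.random.randn(20, 64, 3)

scaler = NDStandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics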

Solution 4

An elegant way of doing this is to use class inheritance, as follows:


from sklearn.preprocessing import MinMaxScaler
import numpy as np

class MinMaxScaler3D(MinMaxScaler):

    def fit_transform(self, X, y=None):
        # Flatten (batch, length, features) to 2-D, scale, then restore the shape
        x = np.reshape(X, newshape=(X.shape[0]*X.shape[1], X.shape[2]))
        return np.reshape(super().fit_transform(x, y=y), newshape=X.shape)

Usage:


scaler = MinMaxScaler3D()
X = scaler.fit_transform(X)
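One caveat: only fit_transform is overridden here, so calling transform on held-out 3-D data would still trip the parent's 2-D shape check ("Found array with dim 3"). A sketch of a matching transform override, following the same reshape convention:

    def transform(self, X):
        # Same flatten/restore trick, reusing the statistics learned during fit
        x = np.reshape(X, newshape=(X.shape[0]*X.shape[1], X.shape[2]))
        return np.reshape(super().transform(x), newshape=X.shape)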

Solution 5

I used a normalization scheme for my spatio-temporal data of shape (2500, 512, 642) --> (samples, timesteps, features/spatial locations). The following code can be used for the normalization and also its inverse; it min-max scales each (sample, timestep) slice independently and stores the per-slice minima and maxima so the transform can be undone.

import numpy as np

def Normalize_data(data):
    scaled_data = []
    max_values  = []
    min_values  = []
    for N in range(data.shape[0]):
        temp = []
        t1   = []
        t2   = []
        for i in range(data.shape[1]):
            max_val = np.max(data[N,i])
            min_val = np.min(data[N,i])
            norm = (data[N,i] - min_val)/(max_val - min_val)
            temp.append(norm)
            t1.append(max_val)
            t2.append(min_val)

        scaled_data.append(temp)
        max_values.append(t1)
        min_values.append(t2)
    return (np.array(scaled_data), np.array(max_values), np.array(min_values))

def InverseNormalize_data(scaled_data, max_values, min_values):
    res_data = []
    for N in range(scaled_data.shape[0]):
        temp = []
        for i in range(scaled_data.shape[1]):
            max_val = max_values[N,i]
            min_val = min_values[N,i]
            orig = (scaled_data[N,i] * (max_val - min_val)) + min_val
            temp.append(orig)
        res_data.append(temp)
    return np.array(res_data)
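
A quick round-trip check of these two functions, with hypothetical data in the (samples, timesteps, features) layout described above:

import numpy as np

data = np.random.rand(5, 10, 8)  # hypothetical (samples, timesteps, features)

scaled, max_vals, min_vals = Normalize_data(data)
restored = InverseNormalize_data(scaled, max_vals, min_vals)
print(np.allclose(restored, data))  # True: the scaling is invertible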


Comments

  • JPM almost 3 years

    I am working on a signal classification problem and would like to scale the dataset matrix first, but my data is in a 3D format (batch, length, channels).
    I tried to use scikit-learn's StandardScaler:

    from sklearn.preprocessing import StandardScaler
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)
    

    But I've got this error message:

    Found array with dim 3. StandardScaler expected <= 2

    I think one solution would be to split the matrix by channel into multiple 2D matrices, scale them separately, and then put them back into 3D format, but I wonder if there is a better solution.
    Thank you very much.

  • DmytroSytro almost 5 years
    It doesn't work. Shouldn't it be like this: for i in range(X_train.shape[1]):
  • Victor Sonck almost 5 years
    No, I think it should be X_train[:, :, i] = scalers[i].fit_transform(X_train[:, :, i]). At least for me when my data is structured as (batch, samples, rows, columns)
  • Avv almost 3 years
    Thank you. Does this work on pandas DataFrame columns? I have over 291 columns, so how can we apply the same thing to a pandas DataFrame?
  • sugab over 2 years
    True! This is elegant, the shortest, and simplest.