How can I cleanly normalize data and then "unnormalize" it later?
All the scalers in sklearn.preprocessing have an inverse_transform method designed just for that.
For example, to scale and un-scale your DataFrame with MinMaxScaler you could do:
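from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()  # scales each column to the range 0.0-1.0 by default
scaled = scaler.fit_transform(df)  # learns each column's min/max, then scales
unscaled = scaler.inverse_transform(scaled)  # maps the values back to the original range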
Just bear in mind that transform (and fit_transform as well) returns a numpy.ndarray by default, not a pandas.DataFrame.
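If you want to keep working with labelled data, one option is to wrap the returned array back into a DataFrame. The following is only a minimal sketch: the example data, the column names and the variable names `scaled_df` and `unscaled_df` are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical example data; any all-numeric DataFrame would do.
df = pd.DataFrame(
    {"price": [100.0, 150.0, 200.0], "volume": [10.0, 30.0, 20.0]},
    index=pd.date_range("2020-01-01", periods=3),
)

scaler = MinMaxScaler()

# fit_transform returns a plain array, so reattach the column and index labels.
scaled_df = pd.DataFrame(
    scaler.fit_transform(df), columns=df.columns, index=df.index
)

# inverse_transform also returns an array; wrap it the same way.
unscaled_df = pd.DataFrame(
    scaler.inverse_transform(scaled_df.to_numpy()),
    columns=df.columns,
    index=df.index,
)
```

Reusing `df.columns` and `df.index` only works while the scaled data has the same shape and row order as the original.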
maxbfuer
Updated on June 16, 2022
Comments
-
maxbfuer almost 2 years
I am using Anaconda with a TensorFlow neural network. Most of my data is stored with pandas.
I am attempting to predict cryptocurrency markets. I am aware that lots of people are probably doing this and that it is most likely not going to be very effective; I'm mostly doing it to familiarize myself with TensorFlow and the Anaconda tools.
I am fairly new to this, so if I am doing something wrong or suboptimally, please let me know.
Here is how I acquire and handle the data:
- Download datasets from quandl.com into pandas DataFrames
- Select the desired columns from each downloaded dataset
- Concatenate the DataFrames
- Drop all NaNs from the new, merged DataFrame
- Normalize each column (independently) to 0.0-1.0 in the new DataFrame using the code df = (df - df.min()) / (df.max() - df.min())
- Feed the normalized data into my neural network
- Unnormalize the data (this is the part that I haven't implemented)
Now, my question is, how can I cleanly normalize and then unnormalize this data? I realize that if I want to unnormalize data, I'm going to need to store the initial df.min() and df.max() values, but this looks ugly and feels cumbersome.
I am aware that I can normalize data with sklearn.preprocessing.MinMaxScaler, but as far as I know I can't unnormalize data using this.
It might be that I'm doing something fundamentally wrong here, but I'll be very surprised if there isn't a clean way to normalize and unnormalize data with Anaconda or other libraries.
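For reference, a fitted MinMaxScaler keeps the per-column minimum and maximum itself, in its `data_min_` and `data_max_` attributes, so nothing has to be stored by hand. The sketch below uses made-up data in place of the merged DataFrame and checks that, with the default settings, the scaler gives the same result as the manual formula from the list above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data standing in for the merged DataFrame.
df = pd.DataFrame({"a": [1.0, 2.0, 4.0], "b": [10.0, 30.0, 20.0]})

# The manual normalization from the question.
manual = (df - df.min()) / (df.max() - df.min())

# The same scaling done by MinMaxScaler (default feature_range is (0, 1)).
scaler = MinMaxScaler()
scaled = scaler.fit_transform(df)

# Both approaches agree up to floating-point rounding.
same_result = np.allclose(manual.to_numpy(), scaled)

# The fitted scaler remembers each column's min and max.
stored_min = scaler.data_min_
stored_max = scaler.data_max_
```

Because the scaler object holds these values, keeping the fitted `scaler` around is enough to call `inverse_transform` later.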
-
Robbie about 7 years
It's impossible to unnormalise without storing the minimum and maximum values. I'd wrap up the normalisation in a function and return the max and min (as well as normalised data) to use later.
-
maxbfuer about 7 years
@Robbie That's what I was planning on doing, it just seems strange that something like this isn't implemented. Am I approaching this wrong? Should I even be normalizing? I am using this network for cryptocurrency market analysis.
-
Robbie about 7 years
You don't have to normalise data to use it in a neural network, though it is done for various reasons (see faqs.org/faqs/ai-faq/neural-nets/part2).
-
maxbfuer about 7 years
@Robbie Thanks, lots of valuable information there.
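The function-based approach suggested in the comments could look roughly like this in plain pandas. It is only a sketch: the names `normalize` and `unnormalize` and the example data are hypothetical, and it assumes no column is constant (a constant column would divide by zero and produce NaN).

```python
import pandas as pd


def normalize(df):
    """Scale each column to 0.0-1.0 and return the values needed to undo it."""
    col_min = df.min()
    col_max = df.max()
    normalized = (df - col_min) / (col_max - col_min)
    return normalized, col_min, col_max


def unnormalize(normalized, col_min, col_max):
    """Reverse normalize() using the stored per-column min and max."""
    return normalized * (col_max - col_min) + col_min


# Hypothetical example data.
df = pd.DataFrame({"a": [1.0, 2.0, 4.0], "b": [10.0, 30.0, 20.0]})

normalized, col_min, col_max = normalize(df)
restored = unnormalize(normalized, col_min, col_max)
```

Unlike the scikit-learn scalers, this keeps everything as DataFrames, at the cost of passing `col_min` and `col_max` around yourself.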