concat pandas DataFrame along timeseries indexes

12,383

Solution 1

It is possible to read the data with pandas and to concatenate it.

First import the data

In [449]: import pandas.io.data as web

In [450]: nab = web.get_data_yahoo('NAB.AX', start='2009-05-25',
                                   end='2009-06-05')[['Close', 'Volume']]

In [451]: cba = web.get_data_yahoo('CBA.AX', start='2009-05-26',
                                   end='2009-06-08')[['Close', 'Volume']]

In [453]: nab
Out[453]: 
            Close    Volume
Date                       
2009-05-25  21.15   9685100
2009-05-26  21.64   8541900
2009-05-27  21.74   9042900
2009-05-28  21.63   9701000
2009-05-29  22.02  14665700
2009-06-01  22.52   6782000
2009-06-02  22.80  10473400
2009-06-03  23.11   9931400
2009-06-04  22.21  17869000
2009-06-05  21.95   8214300

In [454]: cba
Out[454]: 
            Close    Volume
Date                       
2009-05-26  35.45   4529600
2009-05-27  35.13   4521500
2009-05-28  33.95   7945400
2009-05-29  35.14  12548500
2009-06-01  36.16   4509400
2009-06-02  36.33   4304900
2009-06-03  36.80   4845400
2009-06-04  36.79   4592300
2009-06-05  36.51   4417500
2009-06-08  36.51         0

Than concatenate it:

In [455]: keys = ['CBA.AX','NAB.AX']

In [456]: pd.concat([cba, nab], axis=1, keys=keys)
Out[456]: 
            CBA.AX            NAB.AX          
             Close    Volume   Close    Volume
Date                                          
2009-05-25     NaN       NaN   21.15   9685100
2009-05-26   35.45   4529600   21.64   8541900
2009-05-27   35.13   4521500   21.74   9042900
2009-05-28   33.95   7945400   21.63   9701000
2009-05-29   35.14  12548500   22.02  14665700
2009-06-01   36.16   4509400   22.52   6782000
2009-06-02   36.33   4304900   22.80  10473400
2009-06-03   36.80   4845400   23.11   9931400
2009-06-04   36.79   4592300   22.21  17869000
2009-06-05   36.51   4417500   21.95   8214300
2009-06-08   36.51         0     NaN       NaN

Solution 2

Try to join on outer.

When I am working with a number of stocks, I would usually have a frame titled "open high,low,close,etc" with column as a ticker. If you want one data structure, I would use Panels for this.

for Yahoo data, you can use pandas:

import pandas.io.data as data
spy = data.DataReader("SPY","yahoo","1991/1/1")
Share:
12,383

Related videos on Youtube

Matthew Brown
Author by

Matthew Brown

Internationally minded, independent and driven student at QUT Business School and part-time analyst at Innovare Group looking to develop his career in the Information Technology or Management Consulting space. I'd love to chat: twitter.com/mnbbrown, [email protected] or matthewbrown.io

Updated on September 14, 2022

Comments

  • Matthew Brown
    Matthew Brown over 1 year

    I have two largish (snippets provided) pandas DateFrames with unequal dates as indexes that I wish to concat into one:

               NAB.AX                                  CBA.AX
                Close    Volume                         Close    Volume
    Date                                    Date
    2009-06-05  36.51   4962900             2009-06-08  21.95         0
    2009-06-04  36.79   5528800             2009-06-05  21.95   8917000
    2009-06-03  36.80   5116500             2009-06-04  22.21  18723600
    2009-06-02  36.33   5303700             2009-06-03  23.11  11643800
    2009-06-01  36.16   5625500             2009-06-02  22.80  14249900
    2009-05-29  35.14  13038600   --AND--   2009-06-01  22.52  11687200
    2009-05-28  33.95   7917600             2009-05-29  22.02  22350700
    2009-05-27  35.13   4701100             2009-05-28  21.63   9679800
    2009-05-26  35.45   4572700             2009-05-27  21.74   9338200
    2009-05-25  34.80   3652500             2009-05-26  21.64   8502900
    

    Problem is, if I run this:

    keys = ['CBA.AX','NAB.AX']
    mv = pandas.concat([data['CBA.AX'][650:660],data['NAB.AX'][650:660]], axis=1, keys=stocks,) 
    

    the following DateFrame is produced:

                                     CBA.AX          NAB.AX        
                                  Close  Volume   Close  Volume
    Date                                                      
    2200-08-16 04:24:21.460041     NaN     NaN     NaN     NaN
    2203-05-13 04:24:21.460041     NaN     NaN     NaN     NaN
    2206-02-06 04:24:21.460041     NaN     NaN     NaN     NaN
    2208-11-02 04:24:21.460041     NaN     NaN     NaN     NaN
    2211-07-30 04:24:21.460041     NaN     NaN     NaN     NaN
    2219-10-16 04:24:21.460041     NaN     NaN     NaN     NaN
    2222-07-12 04:24:21.460041     NaN     NaN     NaN     NaN
    2225-04-07 04:24:21.460041     NaN     NaN     NaN     NaN
    2228-01-02 04:24:21.460041     NaN     NaN     NaN     NaN
    2230-09-28 04:24:21.460041     NaN     NaN     NaN     NaN
    2238-12-15 04:24:21.460041     NaN     NaN     NaN     NaN
    

    Does anybody have any idea why this might be the case?

    On another point: is there any python libraries around that pull data from yahoo and normalise it?

    Cheers.

    EDIT: For reference:

    data = {
    'CBA.AX': <class 'pandas.core.frame.DataFrame'>
        DatetimeIndex: 2313 entries, 2011-12-29 00:00:00 to 2003-01-01 00:00:00
        Data columns:
            Close     2313  non-null values
            Volume    2313  non-null values
        dtypes: float64(1), int64(1),
    
     'NAB.AX': <class 'pandas.core.frame.DataFrame'>
        DatetimeIndex: 2329 entries, 2011-12-29 00:00:00 to 2003-01-01 00:00:00
        Data columns:
            Close     2329  non-null values
            Volume    2329  non-null values
        dtypes: float64(1), int64(1)
    }
    
    • Wes McKinney
      Wes McKinney over 11 years
      What is your pandas version? This looks like a bug that was fixed in 0.8.1
    • Wes McKinney
      Wes McKinney over 11 years
      Can you e-mail me pickled versions of those DataFrames (wesmckinn AT gmail)? I'm not able to reproduce the issue here. Also check that you're using either NumPy 1.6.1 or a development version after 6/5/2012. Maybe best to move this discussion to GitHub