Python How to Import xlsx file using numpy

Solution 1

import numpy as np
import pandas as pd
WS = pd.read_excel('ur.xlsx')
WS_np = np.array(WS)

Using pandas is simpler

Solution 2

The method's name is loadtxt, rather than loadtext. That explains the error that you report.

However, loadtxt won't be able to read an OpenXML .xlsx file. An .xlsx file is not plain text: it is a ZIP archive of XML parts, and a rather complex format at that. You will need a module dedicated to reading such files. For instance, openpyxl reads .xlsx files, as does xlrd in versions before 2.0 (later xlrd versions read only .xls).
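
To make the point concrete: an .xlsx file is a ZIP container holding XML parts, so a line-oriented text parser such as loadtxt has nothing sensible to parse. The following is a minimal sketch using only the standard library. The helper name looks_like_xlsx is hypothetical, and the check (a ZIP archive containing xl/workbook.xml) is a heuristic, not a full validation.

```python
import zipfile


def looks_like_xlsx(path):
    """Heuristic check: is `path` a ZIP archive with a workbook part?

    Hypothetical helper for illustration; it does not validate the file.
    """
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # OpenXML workbooks conventionally store the workbook at this path.
        return "xl/workbook.xml" in archive.namelist()
```

A CSV export fails this check, while a workbook saved by Excel should normally pass it.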

Depending on what your requirements are, it may be easier to supply a text file rather than a .xlsx file.
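
If the data can be exported from Excel as plain text (for example with Save As CSV), loadtxt reads it directly. A minimal sketch, assuming a single column of numbers with no header row; the helper name and the file name in the usage comment are hypothetical.

```python
import numpy as np


def load_single_column(path):
    """Read a one-column text file of numbers into a 1-D NumPy array."""
    # ndmin=1 keeps the result 1-D even if the file holds a single value.
    return np.loadtxt(path, delimiter=",", usecols=0, ndmin=1)


# Hypothetical usage, with the sheet saved from Excel as CSV:
# x2 = load_single_column("First_Persons_PT.csv")
```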

Solution 3

NumPy does not have any commands to read Excel documents. Instead use openpyxl for OpenXML (Excel >= 2007) or xlrd for xls and xlsx, as @David Heffernan suggests. You can use pip to install either. From the openpyxl documentation example:

>>> import numpy as np
>>> from openpyxl import load_workbook
>>> wb = load_workbook('First_Persons_PT.xlsx', read_only=True)
>>> print(wb.sheetnames)
['Sheet1', 'Sheet2', 'Sheet3']
>>> ws = wb['Sheet1']  # get_sheet_by_name() is deprecated
>>> use_col = 0  # column index from each row to get value of
>>> x2 = np.array([r[use_col].value for r in ws.iter_rows()])

See my posts on reading Excel in Python.

Solution 4

Note that, as of pandas version 1.2.0, the top answer raises an exception for xlsx files because the default reader engine (xlrd) only supports xls files (see https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html). A simple solution is to set the engine to openpyxl (you'll have to pip/conda install it first):

import pandas as pd
import numpy as np
data = pd.read_excel('ur.xlsx', engine='openpyxl')
data_ar = np.array(data)
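
For the sheet described in the question (a single column of numbers), two defaults are worth knowing: read_excel treats the first row as a header unless header=None is passed, and np.array on a DataFrame gives a 2-D array. Below is a hedged sketch that returns a flat 1-D array instead. The helper names are hypothetical, and it assumes the sheet has no header row and that openpyxl is installed.

```python
import pandas as pd


def column_to_array(frame, col=0):
    """Return column number `col` of a DataFrame as a 1-D float array."""
    return frame.iloc[:, col].to_numpy(dtype=float)


def load_first_column(path):
    """Read the first sheet of an .xlsx file and return its first column.

    Assumes no header row; header=None stops pandas from consuming
    the first value as a column name.
    """
    frame = pd.read_excel(path, header=None, engine="openpyxl")
    return column_to_array(frame)


# Hypothetical usage with the workbook from the question:
# x2 = load_first_column("First_Persons_PT.xlsx")
```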
Author: godofamerica

Updated on January 14, 2022

Comments

  • godofamerica over 2 years

    I am having no trouble importing csv data using numpy, but keep getting an error for my xlsx file. How do I convert the xlsx file to csv, or how do I import the xlsx file into the x2 variable?

    from matplotlib import pyplot as pp
    import numpy as np
    
    #this creates a line graph comparing flight arrival time, arrival in queue, and processing time
    
    x,y = np.loadtxt ('LAX_flights.csv',
                    unpack = True,
                    usecols = (1,2),
                    delimiter = ',')
    
    print("Imported data set arrival time")
    
    x2 = np.loadtext ('First_Persons_PT.xlsx',
                   unpack = True,
                   usecols=(0))
    
    print ("Imported start of processing time")
    
    
    #y2=
    #print ("Imported final time when processed")
    
    pp.plot(x,y, 'g', linewidth = 1)
    #pp.plot(x2,y, 'y', linewidth = 1)
    pp.grid(b=True, which = 'major', color='0', linestyle='-')
    
    pp.title('Comparing Time of Arrival vs. Queue Arrival Time, Queue Finish Time')
    pp.ylabel('Arrival in queue (Green),Process Time (Yellow)')
    pp.xlabel('Time of arrival')
    
    pp.savefig('line_graph_comparison.png')
    

    Here is the error

    Imported data set arrival time
    Traceback (most recent call last):
      File "C:\Users\fkrueg1\Dropbox\forest_python_test\Graph_time_of_arrival.py", line 13, in <module>
        x2 = np.loadtext ('First_Persons_PT.xlsx',
    AttributeError: 'module' object has no attribute 'loadtext'
    

    The xlsx is just a single column of about 100 numbers.