Read Excel file from a URL using the readxl package


Solution 1

This works for me on Windows:

library(readxl)
library(httr)
packageVersion("readxl")
# [1] ‘0.1.1’

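url1 <- "https://evs.nci.nih.gov/ftp1/CDISC/SDTM/SDTM%20Terminology.xls"
# Download to a temp file that keeps the .xls extension so readxl can detect the format
GET(url1, write_disk(tf <- tempfile(fileext = ".xls")))
# Read the second sheet of the downloaded workbook
df <- read_excel(tf, 2L)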
str(df)
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 20131 obs. of  8 variables:
# $ Code                        : chr  "C115388" "C115800" "C115801" "C115802" ...
# $ Codelist Code               : chr  NA "C115388" "C115388" "C115388" ...
# $ Codelist Extensible (Yes/No): chr  "No" NA NA NA ...
# $ Codelist Name               : chr  "6 Minute Walk Functional Test Test Code" "6 Minute Walk Functional Test Test Code" "6 Minute Walk Functional Test Test Code" "6 Minute Walk Functional Test Test Code" ...
# $ CDISC Submission Value      : chr  "SIXMW1TC" "SIXMW101" "SIXMW102" "SIXMW103" ...
# $ CDISC Synonym(s)            : chr  "6 Minute Walk Functional Test Test Code" "SIXMW1-Distance at 1 Minute" "SIXMW1-Distance at 2 Minutes" "SIXMW1-Distance at 3 Minutes" ...
# $ CDISC Definition            : chr  "6 Minute Walk Test test code." "6 Minute Walk Test - Distance at 1 minute." "6 Minute Walk Test - Distance at 2 minutes." "6 Minute Walk Test - Distance at 3 minutes." ...
# $ NCI Preferred Term          : chr  "CDISC Functional Test 6MWT Test Code Terminology" "6MWT - Distance at 1 Minute" "6MWT - Distance at 2 Minutes" "6MWT - Distance at 3 Minutes" ...
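The same download-then-read pattern can be wrapped in a small helper. This is a sketch that assumes only base R plus readxl; `read_excel_url()` is a hypothetical name, not a readxl function. As with `fileext = ".xls"` above, it keeps the URL's extension on the temporary file so that readxl sees a normal `.xls`/`.xlsx` path.

```r
library(readxl)

# Hypothetical helper (not part of readxl): download a workbook from `url`
# into a temporary file that keeps the original extension, then read a sheet.
read_excel_url <- function(url, sheet = 1L, ...) {
  # Take the extension from the URL path, ignoring any query string or fragment
  ext <- tools::file_ext(sub("[?#].*$", "", url))
  tf <- tempfile(fileext = if (nzchar(ext)) paste0(".", ext) else "")
  # Delete the temporary copy once the function returns
  on.exit(unlink(tf), add = TRUE)
  # mode = "wb" keeps the binary workbook intact on Windows
  utils::download.file(url, tf, mode = "wb", quiet = TRUE)
  read_excel(tf, sheet = sheet, ...)
}
```

Usage, for the file in the question: `df <- read_excel_url("https://evs.nci.nih.gov/ftp1/CDISC/SDTM/SDTM%20Terminology.xls", sheet = 2L)`. Any extra arguments (for example `skip` or `col_types`) are passed through to `read_excel()`.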

Solution 2

From this issue on GitHub (#278):

some functionality for supporting more general inputs will be pulled out of readr, at which point readxl can exploit that.

So we should be able to pass URLs directly to read_excel() in the (hopefully near) future.

Solution 3

Use the rio R package. Here is a reprex:

library(tidyverse)
library(rio)
url <- 'https://evs.nci.nih.gov/ftp1/CDISC/SDTM/SDTM%20Terminology.xls'
rio::import(file = url, which = 2) %>% 
  glimpse()
#> 
#> Rows: 30,995
#> Columns: 8
#> $ Code                           <chr> "C141663", "C141706", "C141707"...
#> $ `Codelist Code`                <chr> NA, "C141663", "C141663", "C141...
#> $ `Codelist Extensible (Yes/No)` <chr> "No", NA, NA, NA, "No", NA, NA,...
#> $ `Codelist Name`                <chr> "4 Stair Ascend Functional Test...
#> $ `CDISC Submission Value`       <chr> "A4STR1TC", "A4STR101", "A4STR1...
#> $ `CDISC Synonym(s)`             <chr> "4 Stair Ascend Functional Test...
#> $ `CDISC Definition`             <chr> "4 Stair Ascend test code.", "4...
#> $ `NCI Preferred Term`           <chr> "CDISC Functional Test 4 Stair ...
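If more than one sheet is needed, `rio::import()` can simply be called once per sheet, since its `which` argument takes a sheet number or name. This is a sketch; `read_sheets()` is a hypothetical helper, not part of rio.

```r
library(rio)

# Hypothetical helper: read several worksheets of one workbook into a named
# list of data frames. `sheets` may hold sheet numbers or names; each value
# is passed to rio::import() as `which`.
read_sheets <- function(file, sheets) {
  out <- lapply(sheets, function(s) import(file, which = s))
  names(out) <- as.character(sheets)
  out
}
```

For example, `read_sheets('https://evs.nci.nih.gov/ftp1/CDISC/SDTM/SDTM%20Terminology.xls', c(1, 2))`. When `file` is a URL, each `import()` call fetches the workbook again, so for many sheets it is cheaper to download the file once and pass the local path.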

Solution 4

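A simpler solution is the openxlsx package, whose read.xlsx() accepts a URL directly. Note that openxlsx reads only .xlsx workbooks, so it cannot open the .xls file from the question. Here is an example, which can be adapted to your needs: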

library(openxlsx)
df = read.xlsx("https://archive.ics.uci.edu/ml/machine-learning-databases/00242/ENB2012_data.xlsx",sheet=1)
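Because read.xlsx() handles only .xlsx, one option is to choose the reader from the file extension. This is a sketch under that assumption; `read_workbook()` is a hypothetical name. It sends .xlsx URLs straight to openxlsx and downloads anything else (such as the question's .xls file) before reading it with readxl.

```r
library(openxlsx)
library(readxl)

# Hypothetical dispatcher: openxlsx::read.xlsx() accepts an .xlsx URL directly
# but cannot read legacy .xls workbooks, so those are downloaded first and
# handed to readxl::read_excel() instead.
read_workbook <- function(url, sheet = 1) {
  ext <- tolower(tools::file_ext(sub("[?#].*$", "", url)))
  if (ext == "xlsx") {
    return(read.xlsx(url, sheet = sheet))
  }
  tf <- tempfile(fileext = if (nzchar(ext)) paste0(".", ext) else "")
  # Remove the temporary download when the function exits
  on.exit(unlink(tf), add = TRUE)
  utils::download.file(url, tf, mode = "wb", quiet = TRUE)
  read_excel(tf, sheet = sheet)
}
```

The two branches return slightly different objects (a plain data.frame from openxlsx, a tibble from readxl), and column names may be adjusted differently, so downstream code should not rely on the exact class or names.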

Author: userJT (user of SE)

Updated on September 16, 2022

Comments

  • userJT, over 1 year

    Consider a file on the internet, such as this one (note the s in https): https://evs.nci.nih.gov/ftp1/CDISC/SDTM/SDTM%20Terminology.xls

    How can sheet 2 of the file be read into R?

    The following code approximates what is desired (but fails):

    library(readxl)

    url1 <- 'https://evs.nci.nih.gov/ftp1/CDISC/SDTM/SDTM%20Terminology.xls'
    p1f <- tempfile()
    download.file(url1, p1f, mode = "wb")
    p1 <- read_excel(path = p1f, sheet = 2)
    
    • userJT, over 7 years
      Per this link, even download.file() should not be necessary, but I can't make it work: github.com/hadley/readxl/pull/77
    • IRTFM, over 7 years
      I don't think read_excel is capable of handling Excel workbook files that do not have a .xls extension.
  • Gabriel J. Odom, over 3 years
    As of August 2020, this issue is still open. Consequently, read_excel() cannot yet read .xls files directly from the web.
  • panuffel, over 3 years
    Works like a dream! Thanks :)