How to open an .xlsb file in R?
Solution 1
Use the RODBC
package:
library(RODBC)
wb <- "D:\\Data\\Masked Data.xlsb" # Give the file name
con2 <- odbcConnectExcel2007(wb)
data <- sqlFetch(con2, "Sheet1$") # Provide name of sheet
nrow(data)
Solution 2
One way could be to use ODBC:
require(RODBC)
if (any(grepl("*.xlsb", odbcDataSources(), fixed = TRUE))) {
download.file(url = "http://phx.corporate-ir.net/External.File?item=UGFyZW50SUQ9NTcwMjI1fENoaWxkSUQ9MjcxMjIxfFR5cGU9MQ==&t=1",
destfile = file.path(tempdir(), "test.xlsb"),
mode = "wb")
conn <- odbcConnectExcel2007( file.path(tempdir(), "test.xlsb"))
df <- sqlFetch(conn, sub("'(.*)\\$'", "\\1", sqlTables(conn)$TABLE_NAME)[4]) # read 4th sheet in the table name list
head(df, 10)
# F1 F2 F3 F4 F5 F6
# 1 <NA> <NA> <NA> <NA> <NA> NA
# 2 <NA> <NA> <NA> <NA> <NA> NA
# 3 <NA> <NA> <NA> <NA> <NA> NA
# 4 <NA> <NA> <NA> <NA> <NA> NA
# 5 Baker Hughes Gulf of Mexico Oil / Gas Split <NA> <NA> <NA> <NA> NA
# 6 <NA> <NA> <NA> <NA> <NA> NA
# 7 <NA> US Offshore Total\nGoM Gas\nGoM Oil \nGoM NA
# 8 1/7/00 127 123 116 7 NA
# 9 1/14/00 125 121 116 5 NA
# 10 1/21/00 125 121 116 5 NA
close(conn)
}
Solution 3
readxlsb
package can read Excel binary (.xlsb) files into R. Here are some info taken from the package vignettes:
read_xlsb(path, sheet, range, col_names, col_types, na, trim_ws, skip, ...)
sheet:
Either a name, or the index of the sheet to read. Index of the first sheet is 1. If the sheet name is embedded in the range argument, or implied if range is a named range, then this argument is ignored
range:
range can be specified as
- A named range. Named ranges are not case sensitive
- In Sheet!A1 notation
- In Sheet!R1C1 notation
- As a cellranger::cell_limits object
col_names
- TRUE: The first row is used for column names. Empty cells result in a column name of the form ‘column.i’
- FALSE: Column names will be ‘column.i’
- Character vector: vector containing column names.
col_types
Can be implied from the spreadsheet or specified in advanced. When specifying types, options are
- “logical” (or “boolean”), “numeric” (or “double”), “integer”, “date” and “string” (or “character”)
- Use “skip” (or “ignore”) to skip a column
na
A character string that is interpret as NA. This does not effect the implied data type for a column.
trim_ws
Should leading and trailing whitespaces be trimmed from character strings?
skip
The number of rows to skip before reading data.
library(readxlsb)
res = read_xlsb(path = system.file("extdata", "TestBook.xlsb", package = "readxlsb"),
range = "PORTFOLIO",
debug = TRUE)
ls(res$env)
#> [1] "content" "named_ranges" "sheets" "stream"
res$env$named_ranges
#> name range sheet_idx first_column first_row
#> 1 INFO_RELEASE FirstSheet!$A$11 0 1 11
#> 2 OUTLOOK 'My SecondTab'!$A$1:$C$13 1 1 1
#> 3 PORTFOLIO FirstSheet!$A$3:$C$9 0 1 3
#> 4 SAVED_DATETIME FirstSheet!$C$13 0 3 13
#> 5 TITLE FirstSheet!$A$1 0 1 1
#> last_column last_row
#> 1 1 11
#> 2 3 13
#> 3 3 9
#> 4 3 13
#> 5 1 1
Created on 2020-07-07 by the reprex package (v0.3.0)
Solution 4
If you get the following error in R trying to connect to .xlsb
:
[RODBC] ERROR: state IM002, code 0, message [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified
then, you are probably are missing to install the AccessDatabaseEngine_X64.exe
from Microsoft. I had this problem today, and after installing this file I've got no more error messages.
Related videos on Youtube
user2946746
Updated on March 13, 2021Comments
-
user2946746 over 3 years
I'm trying to open an .xlsb file in R and keep getting similar errors.
Any recommendations on how to solve this issue without having to download the data and save it in a different formate?
read.xlsx(paste(OutputLocation,FileName, sep=""), sheetName = "Master Data")
Error messages: Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, : org.apache.poi.xssf.XLSBUnsupportedException: .XLSB Binary Workbooks are not supported
rigDataWB<-loadWorkbook(paste(OutputLocation,FileName, sep=""))
Error messages: Error: XLSBUnsupportedException (Java): .XLSB Binary Workbooks are not supported
Please note:
I cannot install Perl libraries.
I'm running 64bit R.
Reference: http://www.milanor.net/blog/?p=779
My data is from: http://phx.corporate-ir.net/phoenix.zhtml?c=79687&p=irol-reportsother
-
hubbs5 over 7 yearsJust a quick note, if you're getting an error, update your Windows drivers here and the solution works.
-
tchakravarty almost 4 yearsThis package is eye-wateringly slow for XLSB files of any meaningful size, and often does not complete successfully. Not that there is a real packaged alternative, but users should be mindful of this when they write code that depends on
readxlsb
. -
Adnan Hajizada almost 4 yearsThis package doesn't work and freezes the R Studio.
-
Ruli over 3 yearsCould you add an explanation on how your code works?
-
user2957945 almost 3 yearsHi. Is this answer obsolete now? There doesn't seem to be a
odbcConnectExcel2007
inRODBC
. Or is this Windows only? -
user2957945 almost 3 yearsThis is the only answer that worked for me on linux. Very slow for 100k records but got there.
-
Duffer over 2 yearsThis package has very limited features, for instance I cant skip NA or Blank cells while reading.
-
Tung over 2 years@Duffer: you can try Python packages (pyxlsb or pyxlsb2 via reticulate) to work around some of the limitations of the
readxlsb
package. I haven't tested them though.