R: Painfully slow read performance using RODBC & SQL Server
Solution 1
I would try RJDBC http://cran.r-project.org/web/packages/RJDBC/RJDBC.pdf
with these drivers https://msdn.microsoft.com/en-us/sqlserver/aa937724.aspx
library(RJDBC)
drv <- JDBC("com.microsoft.sqlserver.jdbc.SQLServerDriver","/sqljdbc4.jar")
con <- dbConnect(drv, "jdbc:sqlserver://server.location", "username", "password")
dbGetQuery(con, "select column_name from table")
Solution 2
I would make sure that your R timezone - Sys.setenv(TZ='GMT')
set to GMT for example - is same as the time zone of the SQL server from where you are pulling data. It could be that the date column is taking a long time to be interpreted especially if it has a timestamp.
RJDBC will run quicker because it converts date to character and everything else to numeric. RODBC will try to preserve the data type of the SQL table.
Related videos on Youtube
Jayhawk
Updated on September 16, 2022Comments
-
Jayhawk over 1 year
I am new to R but am interested in using Shiny to create dynamic charts using data stored in a SQL Server database. To enable interactivity, I want to bring in the raw data from the database and perform calculations within
R
rather than have the database summarize the data.I am able to connect to the database using
RODBC
, execute a query, and receive results in adata.frame
. However, the read time inR
is about 12x longer than than the same query executed in SQL Server Management Studio (SSMS). SSMS takes ~600 ms, whereasR
takes about 7.6 seconds. My question is whether I am doing something wrong, or isR
just really slow with database access? And if so, are there faster alternatives (e.g. writing the database output to a file and reading the file)?Some information about the query that may help: The query retrieves about 250K rows with 4 columns. The first column is a date and the other three are numeric values. The machine running
R
and SSMS is a high-end Win 7 workstation with 32GB of memory. TheR
command that I am running is:system.time(df <- sqlQuery(cn, query))
which returns:
user system elapsed 7.17 0.01 7.58
Interestingly, it appears that the data transfer from SQL to my machine is fast, but that
R
is busy doing things internally for several seconds before returning thedata.frame
. I see this because network utilization spikes in the first second and almost immediately returns to near 0. Then several seconds later, theR
data.frame
returns. -
Jayhawk almost 9 yearsThank you! This absolutely solved the problem. Elapsed time dropped to 0.84 seconds as a result.
-
Rishi over 5 yearsCan we use RJDBC in windows to speed up the process?
-
Ken Yeoh over 5 yearsYes. JDBC (Java Database Connectivity) can be used on OSX/Linux/Windows
-
Dorian Grv over 5 yearsJDBC(driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver", classPath = "~/HTS/Data/R/Functions/Driver/sqljdbc41.jar") Error in .jfindClass(as.character(driverClass)[1]) : class not found Any idea what is causing the problem ?
-
Andres Mora about 2 yearsWhat if both R and SQL server have the same TZ but queries are still performing really slow?
-
Andres Mora about 2 yearsIm having issues trying to use RJDBC as well. stackoverflow.com/questions/71578602/…