R and SPSS difference


Solution 1

I work at a company that uses SPSS for the majority of our data analysis, and for a variety of reasons I have started using R for more and more of my own analysis. Some of the biggest differences I have run into include:

  1. Output of tables - SPSS has basic tables, general tables, custom tables, etc. that are all output to that nifty data viewer or whatever they call it. These can relatively easily be transported to Word documents or Excel sheets for further analysis / presentation. The equivalent in R involves learning LaTeX or using odfWeave or LyX or something of that nature.
  2. Labeling of data - SPSS does a pretty good job with variable labels and value labels. I haven't found a robust solution for R to accomplish this same task.
  3. You mention that you are going to be scripting most of your work, and personally I find SPSS's scripting syntax absolutely horrendous, to the point that I've stopped working with SPSS whenever possible. R syntax seems much more logical and follows programming standards more closely AND there is a very active community to rely on should you run into trouble (SO for instance). I haven't found a good SPSS community to ask questions of when I run into problems.
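As a rough illustration of points 1 and 2, here is a minimal sketch using only base R. The data frame, column names, labels and file name are all made up for the example. It is not a full replacement for SPSS's label handling: value labels are approximated with factors, and variable labels are simply stored as an attribute.

```r
# Hypothetical survey data; the column names and values are invented for illustration.
survey <- data.frame(
  gender = c(1, 2, 2, 1, 2),
  satisfaction = c(3, 1, 2, 3, 3)
)

# Value labels: in base R these are expressed as factor levels.
survey$gender <- factor(survey$gender, levels = c(1, 2),
                        labels = c("Male", "Female"))
survey$satisfaction <- factor(survey$satisfaction, levels = 1:3,
                              labels = c("Low", "Medium", "High"))

# Variable labels: base R has no dedicated slot, so keep them as an attribute.
attr(survey$gender, "label") <- "Respondent gender"
attr(survey$satisfaction, "label") <- "Overall satisfaction"

# Cross-tabulate, then write the table to a CSV file that Excel can open.
tab <- table(survey$gender, survey$satisfaction)
out_file <- file.path(tempdir(), "gender_by_satisfaction.csv")
write.csv(as.data.frame.matrix(tab), out_file)
```

The resulting CSV has one row per gender and one column per satisfaction level, which can then be formatted in Excel or Word.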

Others have pointed out some of the big differences in terms of cost and functionality of the programs. If you have to collaborate with others, their comfort level with SPSS or R should be a factor, as you don't want to be the only one in your group who can work on or edit a script that you wrote.

If you are going to be learning R, this post on the stats exchange website has a bunch of great resources for learning R: https://stats.stackexchange.com/questions/138/resources-for-learning-r

Solution 2

Here is something that I posted to the R-help mailing list a while back, but I think that it gives a good high-level overview of the general difference between R and SPSS:

When talking about user friendliness of computer software I like the analogy of cars vs. buses:

Buses are very easy to use: you just need to know which bus to get on, where to get on, and where to get off (and you need to pay your fare). Cars, on the other hand, require much more work: you need some type of map or directions (even if the map is in your head), you need to put gas in every now and then, and you need to know the rules of the road (and have some type of driver's licence). The big advantage of the car is that it can take you to a bunch of places that the bus does not go, and it is quicker for some trips that would require transferring between buses.

Using this analogy, programs like SPSS are buses: easy to use for the standard things, but very frustrating if you want to do something that is not already preprogrammed.

R is a 4-wheel drive SUV (though environmentally friendly) with a bike on the back, a kayak on top, good walking and running shoes in the passenger seat, and mountain climbing and spelunking gear in the back.

R can take you anywhere you want to go if you take the time to learn how to use the equipment, but that is going to take longer than learning where the bus stops are in SPSS.

There are GUIs for R that make it a bit easier to use, but they also limit the functionality that can be used that easily. SPSS does have scripting, which takes it beyond being a mere bus, but the general philosophy of SPSS steers people towards the GUI rather than the scripts.

Solution 3

The initial workflow for SPSS involves justifying writing a big fat cheque. R is freely available.

R has a single language for 'scripting', but don't think of it like that: R is really a programming language with great data manipulation, statistics, and graphics functionality built in. SPSS has 'Syntax', 'Scripts' and is also scriptable in Python.

Another biggie is that SPSS squeezes its data into a spreadsheety table structure. Dealing with other data structures is probably very hard, but comes naturally to R. I wouldn't know where to start handling network graph type data in SPSS, but there's a package to do it for R.

Also, with R you can integrate your workflow with your reporting by using Sweave: you write a document with embedded bits of R code that generate plots or tables, run the file through the system, and out comes the report as a PDF. Great for when you want to do a weekly report, or when you do a body of work and then the boss gives you an updated data set. Re-run, read it over, it's done.

But you know, your call...

Solution 4

Well, are you a decent programmer? If you are, then it's worthwhile to learn R. You can do more with your data, both in terms of manipulation and statistical modeling, than you can with SPSS, and your graphs will likely be better too. On the other hand, if you've never really programmed before, or find the idea of spending several months becoming a programmer intimidating, you'll probably get more value out of SPSS. The level of stuff that you can do with R without diving into its power as a full-fledged programming language probably doesn't justify the effort.

There's another option -- collaborate. Do you know someone you can work with on your project (you don't say whether it's academic or industry, but either way...), who knows R well?

Solution 5

There's an interesting (and reasonably fair) comparison between a number of stats tools here

http://anyall.org/blog/2009/02/comparison-of-data-analysis-packages-r-matlab-scipy-excel-sas-spss-stata/

Author by

sfactor

Dreamer, Analyst, Engineer, Programmer, Photographer.

Updated on January 26, 2021

Comments

  • sfactor
    sfactor over 3 years

    I will be analysing a vast amount of network traffic related data shortly, and will pre-process the data in order to analyse it. I have found that R and SPSS are among the most popular tools for statistical analysis. I will also be generating quite a lot of graphs and charts. Therefore, I was wondering what the basic difference between these two software packages is.

    I am not asking which one is better, but just wanted to know what the differences are in terms of workflow between the two (besides the fact that SPSS has a GUI). I will be mostly working with scripts in either case anyway, so I wanted to know about the other differences.

  • eyjo
    eyjo over 13 years
    1. For small tables I usually just copy-paste the screen output in R directly into Excel, then call 'text-to-column'; alternatively you might use write.csv (or write.csv2) on the table (or perhaps you were referring to some automatic reporting?). 2. The Hmisc package has variable labels, while value labels are handled by factors. This is nicely done in the foreign package: if you import an SPSS (or Stata) dataset, the resulting R data keeps the labeling information from the original.
  • Chase
    Chase over 13 years
    @eyjo - "automatic reporting" is a relative term. Our current work flow entails: 1. Pull data from SQL into SPSS, 2. Use a VB script that goes through our surveys and pulls out variable and value labels automatically, edit them and apply to SPSS dataset. 3. Use another script that generates SPSS tables in the format we like. 4. Export to Word & Excel for further post processing that SPSS can't handle. 5. Make a "client ready" appendix as .DOC or .PDF. I would LOVE for R to replace the SPSS --> Word part of that. Ideally, workflow could be SQL --> R / Sweave --> Final product.
  • Chase
    Chase over 13 years
    It appears that we have been approaching and learning R for many of the same reasons, I'd be interested in hearing some more of your thoughts about this SPSS --> R transition that you guys are doing. I also noticed that you are down in Boston, I'm only a few hours away in Hanover. Have you done anything with the New England R Users group? Looks like they meet in Boston...
  • Harlan
    Harlan over 13 years
    Yes, there are some groups in my company that have scheduled R scripts that run, pull data from SQL databases, process it, generate Sweave PDF files, and email the results to relevant people. There are some issues with R and some databases on some architectures, but there's no way you're getting to that level of automaticity with SPSS alone!
  • eyjo
    eyjo over 13 years
    @Chase: I see, that's quite a few more automated steps than I am used to. My work is mainly academic, mostly a few unique outputs that I deal with. But what consulting I have done has made the work more automated with R compared to the SPSS workflow it superseded.
  • Btibert3
    Btibert3 over 13 years
    I have been getting bogged down at work but have been dying to go. I am just starting out with R and trying to identify ways that my team and I can leverage the tool. My industry uses SPSS extensively, but as my exposure to different tools/methods increases, I see the need to explore other opportunities, if for nothing else other than effectively handling ad-hoc data requests. Feel free to contact me for my thoughts and experience on the transition.
  • daroczig
    daroczig over 13 years
    @Chase: I do not see why this could not be done using only R. I have developed some custom tools for companies which do exactly the same: get the data from SPSS or automatically fetch data from MySQL, apply labels/variable names (from another MySQL table or from the body of an online HTML survey) to the columns, generate tables in the required format and export them as an odt file, which can be opened in any MS Word (2007+) or OOWriter. The output can be themed (header, colors, images, font, table margins etc.) easily. It can be a lot of work (even more so with a GUI) but may be worth it in the long run.
  • Joris Meys
    Joris Meys over 13 years
    If you want to compare, you should compare sensible things. That "benchmark" is not really the way to go about it. For-loops are pretty much avoidable in R, and should be avoided too. My experience is like Henrik's for most tasks. Plus, from a statistical point of view both SAS and R perform better. Ever tried to do a one-sided t-test in SPSS?
  • Chase
    Chase over 13 years
    @daroczig - You just outlined the exact work-flow I'm trying to produce using R and related tools! I just need to find a chunk of time when I can sit down and hammer out the details. It's really the odfWeave / Sweave part of the equation I don't have a great grasp of presently. We pay an exorbitant amount of money in SPSS license fees for very trivial uses of SPSS...and SPSS isn't that good at what we are trying to ask it to do. It's good to know there are existing and working solutions out there!
  • daroczig
    daroczig over 13 years
    @Chase: the odfWeave package is very well documented; look for formatting.odt and its output in the sources of the package (odfWeave/inst/examples). Also, odfWeave might be a better choice than Sweave, as clients usually want to get an editable version of the reports. Let me know if you get stuck somewhere in the outline/realization.
  • djhurio
    djhurio over 13 years
    @Joris, I agree completely with you. I was just curious to try the same test on SPSS.
  • richiemorrisroe
    richiemorrisroe over 12 years
    I don't know about that. I moved from SPSS to R without any programming experience, and although it took a while, I am orders of magnitude more productive than I was. Sweave alone has saved me at least two months' worth of formatting for papers.
  • richiemorrisroe
    richiemorrisroe over 12 years
    Do you have data for this? I'd love to compare this kind of thing, as I always found SPSS faster than R for the same processes.
  • naught101
    naught101 almost 12 years
    There's a free and open source SPSS-style package called PSPP... Of course, it'd suffer from all your other comments, I suppose.
  • Jefferey Cave
    Jefferey Cave almost 9 years
    This is the most brilliant analogy I have ever read. I am using it for a lot of different programming environments from now on. Thank-you.
  • Rasmus Larsen
    Rasmus Larsen almost 9 years
    FWIW, 4 years later, export to MS Word is now very easy with one click of a button in RStudio: blog.rstudio.org/2014/06/18/r-markdown-v2 and rmarkdown.rstudio.com
  • SmallChess
    SmallChess almost 9 years
    PSPP is pretty basic in functionality.
  • KarthikS
    KarthikS about 8 years
    I have found SPSS to be a lot faster than R (a lot), when it comes to standard procedures. For instance, try mixed-effects modeling in R and SPSS.