What programming languages are good for statistics?


Solution 1

No contest -- R as the main implementation of S (and one that happens to be proper Open Source and a GNU project as well).

Not only was the S language designed precisely for this purpose (see the books by John Chambers), but the rich support of domain-specific packages at CRAN is second to none: over 2000 packages with proper quality control, often authored by experts in the field.

The ACM saw it the same way when it gave the ACM Software System Award to John Chambers in 1998, with the following citation:

John M. Chambers

For The S system, which has forever altered how people analyze, visualize, and manipulate data.

For reference, other winners of this award were TeX, Smalltalk, PostScript, RPC, 'the web', Mosaic, Tcl/Tk, Java, Make, ... Not bad company to be in.

Now, if you 'only' want to collect and summarize some data, just about any procedural or functional language will do. But if you want something that was designed for programming with data, then R as the main S implementation it is.
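As a rough illustration of the "just about any language will do" case, here is a minimal sketch in Python using only the standard-library `statistics` module. The data values and the `summarize` helper are made up for this example; it is one possible way to get basic summaries, not a recommendation over R.

```python
import statistics

# Hypothetical sample data, invented for this example.
samples = [2, 4, 4, 4, 5, 5, 7, 9]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
    }


summary = summarize(samples)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Anything beyond this kind of summary (model fitting, specialised tests, plotting) is where a language designed for data starts to pay off.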

Solution 2

No question that R is the best language for statistics, as Dirk says. I just want to add a few points to this:

First, I think that the primary reason you should use R is the community. It is used so heavily by experts in academia and industry at this stage that no other language even comes close to rivalling the wealth of packages on CRAN.

Second, it should be acknowledged that R the language is a joy to work with. It is my primary language, and having tried alternatives, I have no intention of abandoning it any time soon. But it also doesn't have a monopoly on its strength for programming with data, and this claim can be taken too far. All the Lisp and functional languages are strong at data programming. Lisp, after all, takes its name from "list processing", and it is Lisp's influence on R that makes the language what it is.

There are members of the R community (e.g. Ross Ihaka) who actually view Lisp as the statistical language of the future (see the "back to the future" paper for a reference) due to some deep design problems in the R language (e.g. no multithreading).

So while R is undoubtedly the best language for statistical computing, I see some value in being familiar with another language like OCaml, Haskell, or (possibly) Clojure/Incanter.

Solution 3

Have a look at Incanter, based on Clojure. "Incanter is a Clojure-based, R-like platform for statistical computing and graphics." Clojure is a Lisp-based language implemented on top of the JVM. It has easy access to Java libraries. Can't get more general purpose than that.

Solution 4

From my experience, R is an exceptionally powerful language in these areas:

  1. Manipulation and transformation of data.

  2. Statistical analysis.

  3. Graphics.

But R is by no means a three-trick pony. I have also applied the language to tasks that do not fit entirely into the above categories. Some examples are:

  • A script to assist in the creation of OS X universal binaries by identifying and matching static and dynamic libraries of different architectures and then running the resulting groups through lipo.

  • Scripts to scrape information from web pages.

  • A set of scripts to create georeferenced imagery, cut the images into tilesets using GDAL, form a JSON manifest that describes the output and upload the result to a website for immediate display by OpenLayers.

My favorite part of using R is the frequency with which I get to say:

WHOA! There's a package that does THAT?!

Solution 5

You can have a look at Sage, a Python-based mathematics system that lets you call different languages and programs for statistics (R, MATLAB, Octave, etc.) using Python syntax.

One of the major issues when writing programs for statistics is that you can end up with many small scripts, each doing a separate task, which leads to messy folders and confusion about your results.

So, apart from choosing a programming language (I think other people have already answered that part of your question), you also need a way to define pipelines of scripts: you can do that with GNU make (e.g. read this), with Sage, or with other solutions.
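To make the pipeline idea concrete, here is a minimal sketch in Python of the timestamp rule that make-style tools rely on: a step is rerun only when its output is missing or older than its inputs. This is not how make or Sage is implemented; `is_stale`, `run_pipeline`, the two step functions and the file names are all hypothetical names invented for the illustration.

```python
import os
import tempfile


def is_stale(target, sources):
    """True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_time for src in sources)


def run_pipeline(steps):
    """Run (target, sources, action) steps in order, skipping up-to-date ones."""
    ran = []
    for target, sources, action in steps:
        if is_stale(target, sources):
            action(target, sources)
            ran.append(target)
    return ran


def clean(target, sources):
    """Hypothetical step: drop blank lines and surrounding whitespace."""
    with open(sources[0]) as src, open(target, "w") as out:
        for line in src:
            if line.strip():
                out.write(line.strip() + "\n")


def total(target, sources):
    """Hypothetical step: write the sum of the cleaned values."""
    with open(sources[0]) as src:
        values = [float(line) for line in src]
    with open(target, "w") as out:
        out.write(f"{sum(values)}\n")


# Demo in a temporary directory with made-up data.
workdir = tempfile.mkdtemp()
raw = os.path.join(workdir, "raw.txt")
cleaned = os.path.join(workdir, "clean.txt")
summary = os.path.join(workdir, "summary.txt")

with open(raw, "w") as f:
    f.write("1\n\n2\n 3 \n")

steps = [(cleaned, [raw], clean), (summary, [cleaned], total)]
first_run = run_pipeline(steps)   # both steps run: outputs do not exist yet
second_run = run_pipeline(steps)  # nothing runs: outputs are up to date

print([os.path.basename(path) for path in first_run])
print(second_run)
```

Declaring each script's inputs and outputs in one place like this is what keeps a folder of small scripts from turning into a mess; a real tool such as GNU make adds dependency ordering, parallelism and error handling on top.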

Author: Jason Baker

I'm a developer on Google's Cloud Console.

Updated on June 03, 2022

Comments

  • Jason Baker, about 2 years

    I'm doing a bit more statistical analysis on some things lately, and I'm curious if there are any programming languages that are particularly good for this purpose. I know about R, but I'd kind of prefer something a bit more general-purpose (or is R pretty general-purpose?).

    What suggestions do you guys have? Are there any languages out there whose syntax/semantics are particularly oriented towards this? Or are there any languages that have exceptionally good libraries?