"Why do we use R rather than Excel?"


I recently had cause to take a beginners course in R - a language I'm fairly familiar with.

One of the other students had never used it before, so we were buddied up in order for me to show them the ropes.

The first lesson of R is always the same. Read a CSV, manipulate it a bit, draw a graph. We did it all without much fuss - and a graph appeared on screen. Nifty!

"I don't get it," the student said, "Why wouldn't you just use Excel for this?"

To a programmer, it seems obvious - but it's a fair question! If you have a static set of data, you can drag your mouse over it, hit a few buttons, and a graph appears. Much easier than wrestling with esoteric syntax in a text-based interface. This isn't a Koan - where the student becomes enlightened at the end. It's a tricky question to answer. Here are the reasons I gave - feel free to add your own.

Visibility

How do you see the code inside an Excel document? How do you tell exactly what is going on? You have to go clicking through cells, or reverse engineer what settings a graph has.

With something like R, you automatically have all the code visible in front of you. Reading through the code in a linear fashion is possible. You can trace exactly what the code is doing without having to worry about whether there's some code hidden in cell Z44.

Track Changes

Related to the above, it's hard to visualise what changes have been made to an Excel document. I don't know any way to easily see how a formula has changed. With R and Git (or any other version control system) you can see exactly what has changed from one version to the next.

Repeatability

Typically, a user draws a graph on a single Excel document. If you want the same graph of a different data set, you're out of luck. You can copy the data from one Excel sheet into another - but there's no way to easily copy a bunch of manipulations and graph configurations to another document.

With R, you just change read.csv("1.csv") to read.csv("2.csv") and the exact same calculations are run on two different data sets.

Batch processing

Related to the above, you can read every CSV in a directory and produce a graph for each of them. You can read data from an API and run the same process on it that you did yesterday.

Extensibility

Excel has a wide range of graphs available - but R has more. Excel can do basic analysis - but R can do extensive, complex analysis. Excel has some decent tooling - but R has thousands of libraries which can do a bewildering array of clever stuff.

What else?

Those were the advantages that I could think of on the spur of the moment. Perhaps you can think of more.

But what I learned was that it is decidedly non-obvious why a user would want to use something like R or Python when Excel (seemingly) covers all the basics.


Share this post on…

  • Mastodon
  • Facebook
  • LinkedIn
  • BlueSky
  • Threads
  • Reddit
  • HackerNews
  • Lobsters
  • WhatsApp
  • Telegram

25 thoughts on “"Why do we use R rather than Excel?"”

  1. keithb says:

    http://www.eusprig.org/stories.htm

    The news stories above will provide some examples of why a script producing a report is safer than sending a spreadsheet around! The same organisation did a paper some time ago claiming a 5% error rate on spreadsheet entries.

    I see the spreadsheet as a sketchbook and a script as a finished work (possibly in oils:-)

    Reply
  2. Joseph E Dante says:

    All the things you describe R can do can also be done easier in excel using VB.

    Reply
    1. Waphle_Stomp says:

      So, in order to replicate R’s built-in, native functionality, you’d need to go out of the way to build, test, debug, patch, repair, retest with VB all the same functionality; meaning you need Excel plus VB. Firstly, any object or program, even R, can be enhanced using additional programming. Secondly, there’s no way to achieve any semblance of efficient work practice, thirdly, that’s outside of the (purist) scope of comparing why to use R vs Excel in their native states. By default, a Lamborghini will beat a Yugo every single time … unless you spend $200K to modify the Yugo so it looks the same and has the same set of capabilities as the Lambo.

      Reply
    2. Bayonett Bayonaro says:

      Go with custom C# extension - send it to your coworkers. Do a nice GUI. Everyone can now use your tools for their data without bothering you or learning R. Yes I know with R you can do the same.

      Reply
  3. I've led courses on spatial data analysis and mapping using Excel+QGIS, as well as using R, sometimes covering both in one course. Its definitely hard to convince students the value of R when they are used to being able to see and interact more directly with data.

    I usually explain that using Excel + QGIS is like baking a cake yourself. You do all the steps yourself, and if you realize at the end you forgot to add eggs, you have to throw the cake out and start all over again.

    Using R, on the other hand, is like writing a recipe. If you forgot a key ingredient, fixing the recipe is as simple as adding it to the list of ingredients. The catch is that you hand the recipe to a very stupid and literal chef, so you must specify everything in the recipe perfectly.

    Reply
    1. Joe Schmoe says:

      value of R when they are used to being able to see and interact more directly with data.

      Does R not have an easy to visualize repl like Jupyter Notebooks for python? It's pretty direct interaction with great visibility, at least with pandas and matplot.

      Reply
  4. Jane says:

    My one takeaway from your reasoning is that you don really know Excel. For example, changes can easily be tracked in Excel. And batch commands can be run in Excel, as well as proper scripts. Etc.

    Reply
    1. Duc Ottoman says:

      I agree, I was a programmer and had nothing but disdain for excel until for the last 10 years I had to do a lot of data and analytics and so I had to learn excel to things fast.

      excel is no longer just a tool and have all sorts of extensions to handle so much data and API integration and web services and complex mathematical derivation.

      I do not know R very well as a programming language ( need to know R inside out at architectural level not just as a coder consumer) but I think R or SAS or Python is not superior.

      Can R allow a team of 20 people working at different places collaborating code together? I have never been in a situation that R does something that ecxel cannot achieve (probably a little bit longer)

      i spent 2 or 3 months taking all excel and data courses from lynda.com later acquired by linkedin which in turn acquired by msft and it was my best investment of time and money. Felt like I should have taken it several years earlier rather than spendinf time with sql oracle and java c++. Unless you are in a real production house excel is just as fine

      Reply
  5. el timbero says:

    As said above, all things described here can be easily done with Excel VBA

    Reply
  6. Joshua Lee says:

    The article completely misses the point of R. Go ahead. Try and run a mixed effects model with random slopes in Excel, and plot the random effects and the fixed effects in a spaghetti plot. Let me how that went.

    Reply
  7. orion78fr says:

    The one million lines limit of excel is a pretty big reason for me. That’s why some covid cases got dropped at the beggining of lockdown (:

    Reply
    1. Tim says:

      The covid cases were lost due to using the outdated .xls format with its row limit of 65,536 rows. It's hard to see how a 1m line limit could possibly impact a daily covid case figure.

      Reply
  8. Joshua Johnson says:

    I hear what you're saying, but sometimes (e.g., trading/finance) it pays to be closer to the data rather than the things that operate on that data.

    It has to do with accountability. If you're being asked to put your job on the line every time that you (for example) run a model based on some data that's provided to you, then you're going to want to personally inspect the data for correctness before running your model.

    Often-times a user of an Excel-based financial model in the Banking industry will perform numerous ad-hoc checks (charts, pivot tables, random calculations, etc.) on the data before officially running their model. That's tough to do when the data isn't sitting in a software package like Excel.

    PS. I've never posted before -- your blog is excellent

    Reply
  9. Ope says:

    The amount of data. I work on epidemiological data sets, GWAS, thousands of rows, there is no chance to do it in Excel. Just to change the full stop to coma in such data set takes 20 min if I am lucky to get the descriptiv data not hang Excel completly.

    Reply
  10. Tim says:

    I can see problems with both the question and the proposed answer. The question is a bit like someone asking why they can't just write "Hello world!" on a piece of paper.

    In the answers, on the other hand, there is an inherent assumption that Excel is always going to be the wrong tool for any task, and has been entirely superceded by R. Almost any ad hoc use case is going to favour Excel, and anything more stable and repetitive is going to favour some form of programming solution. Excel also works well in hybrid situations where both creator and user of the data and analysis both want access to the source data and the ability to create their own analysis and do their own checks, or further ad hoc analysis.

    I see nothing wrong with an answer of, "In this case, you're right, Excel could do the job easier and quicker, but this is just an exercise to help you learn R. You will soon be doing things that Excel can't."

    Reply
  11. Shrinivas Iyer says:

    I am sorry to say but this is a very poor article.

    With Excel and it's in built VBA capabilities you can achive everything that R can do and even better.

    You have 100s of shortcuts in Excel that help you do these things.

    R is fantastic language but comparing Excel vs R like ypu have done is totally not justified.

    Reply
    1. anthony lambert says:

      I've worked in Banks for the last 20 years and spreadsheets are powerful, no doubt, but for any large bank they are also a nightmare. Spreadsheet are next too impossible to audit. They contain processes that can be changed by the user and can easily contain errors that no one knows about. That easy to create function that take a few clicks can cost Banks millions in or billions of dollars in errors, and sometime fraud.

      Reply
      1. anthony lambert says:

        looks like I have a problem with intermittent keyboard when typing "s" orry!

        Reply
  12. anthony lambert says:

    Excel spread sheets are hard to Unit test and code is hard to review, bug can creep in easily. Also when you make changes it is hard to review them in any auditable way like you can with text base languages and source code control systems. It's a no no for this kind of thing with millions of users and high stakes like people's time and lives.

    Reply

What are your reckons?

All comments are moderated and may not be published immediately. Your email address will not be published.

Allowed HTML: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> <p> <pre> <br> <img src="" alt="" title="" srcset="">