3. Speed
R, MATLAB and Python are interpreted languages, which by nature incur more processing time. While all now offer just-in-time (JIT) compilation, it may not always help much. Iterative loops are especially slow. The idea behind MATLAB is that this should not really matter, because it was designed for linear algebra, functioning as a front-end to numerical libraries programmed in FORTRAN or C. The same applies to R to a lesser extent.
Both languages use a variety of tricks to speed up computation, offloading common calculations to libraries in C or FORTRAN. If that fails, one can just code up C/C++/FORTRAN within these languages. However, from an implementation point of view, the problem is that all these tricks make the languages more complicated.
The same applies to Python. Cython is commonly used to speed up code considerably by running portions of it in C. One can also use Numba, a JIT compiler that requires minimal additional code. However, it can only be used in certain simple cases. For example, it does not support class definitions and exceptions.
Julia, with just-in-time compiling, promises to be as fast as FORTRAN or C. The user does not have to implement tricks to speed up the code, so the language becomes simpler and easier to program.
To compare the speed of these languages, we implemented a simple iterative calculation in each. For reference, an implementation in C was also included. The calculation is the iterative loop for log-likelihood computation in a GARCH(1,1) model for a dataset of length 10,000. We repeated the calculation 1,000 times and recorded the best runtime in the following figure.
R is the slowest and Python the fastest, with Julia not far behind.
An expanded discussion of the speed comparison is available in our web appendix. For an alternative comparison, see Aruoba and Fernandez-Villaverde’s performance comparison.
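As a rough illustration of the kind of loop being timed, the GARCH(1,1) log-likelihood recursion can be sketched in plain Python as follows. This is not the code behind the figure: the parameter values, the simulated placeholder data, the choice to start the recursion at the sample variance, and the helper names garch_loglik and best_runtime are all assumptions made for the example. The demo uses 10 repetitions rather than the 1,000 used in the comparison to keep it quick.

```python
import math
import random
import time


def garch_loglik(y, omega, alpha, beta):
    """Gaussian log-likelihood of a GARCH(1,1) model, computed with an explicit loop."""
    n = len(y)
    mean = sum(y) / n
    # Assumption: start the recursion at the sample variance of y.
    h = sum((v - mean) ** 2 for v in y) / n
    loglik = 0.0
    for t in range(1, n):
        # Conditional variance: h_t = omega + alpha * y_{t-1}^2 + beta * h_{t-1}
        h = omega + alpha * y[t - 1] ** 2 + beta * h
        loglik -= 0.5 * (math.log(2.0 * math.pi) + math.log(h) + y[t] ** 2 / h)
    return loglik


def best_runtime(func, repeats, *args):
    """Return the shortest wall-clock time over `repeats` calls of func(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    random.seed(1)
    # Placeholder data: 10,000 standard normal draws (hypothetical, not the article's dataset).
    y = [random.gauss(0.0, 1.0) for _ in range(10_000)]
    seconds = best_runtime(garch_loglik, 10, y, 0.1, 0.1, 0.8)
    print(f"best of 10 runs: {seconds:.6f} s")
```

Because each step depends on the previous value of h, the loop cannot be replaced by a single vectorised call, which is why this calculation is a useful test of raw loop speed.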
4. Data handling
A lot of research involves large data sets, often in a variety of different data types such as integers, strings, reals, dates, logicals or lists. Processing such data may require filtering and transformation operations.
Data is often read from and written to a number of formats, including text files, CSV files, Excel, SQL databases, noSQL databases and proprietary data formats, either local or remote.
This is where R absolutely shines. It was designed for scientific data, and it shows. It can handle complicated data structures with a variety of formats and origins, with many packages that provide a variety of ways to access and process the data. It can handle data sets that are much bigger than what can fit into memory.
Python is also quite good at this, with its pandas and NumPy libraries able to do many of the same things, including some which R cannot do. But it does not seem as fluid as R. NumPy arrays lack column names, which makes data retrieval less convenient. Numerical programming requires subsetting and changing elements in data structures quickly and efficiently. When using pandas, accessing and changing elements require special syntax like .iloc/.loc and often explicit type conversion from pandas Series to NumPy arrays and back.
For example, to access an element in DataFrame M, one may have to use
M.iloc[1,:].values[1]
It is much simpler in R
M[2,2]
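To make the pandas side of this comparison concrete, here is a minimal sketch. It assumes a small, purely illustrative 3x3 DataFrame M with hypothetical column names a, b and c. It shows the form quoted above next to other pandas accessors for the same element, followed by the round trip through a NumPy array mentioned in the text. Note that Python indexes from 0, so position [1, 1] corresponds to M[2,2] in R.

```python
import pandas as pd

# Illustrative data only; the column names are hypothetical.
M = pd.DataFrame(
    [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]],
    columns=["a", "b", "c"],
)

# Second row, second column, read four different ways.
via_values = M.iloc[1, :].values[1]  # the form quoted in the text
via_iloc = M.iloc[1, 1]              # positional indexing
via_iat = M.iat[1, 1]                # positional scalar accessor
via_loc = M.loc[1, "b"]              # label-based indexing (default integer row labels)

# Round trip for numerical work: DataFrame -> NumPy array -> DataFrame.
A = M.to_numpy(copy=True)  # explicit copy, so M itself is left unchanged
A[1, 1] = 50.0
M2 = pd.DataFrame(A, columns=M.columns)

print(via_values, via_iloc, via_iat, via_loc, M2.iat[1, 1])
```

All four accessors return the same element. The number of spellings available, and the need to move between pandas and NumPy objects, is the kind of overhead the text has in mind.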
MATLAB has improved its support for different data types in recent updates, with different table types for heterogeneous data and categorical arrays. It has import functions for most common file types.
Julia's handling of data is lacking in terms of file types and options supported at present. Moreover, some packages are still going through reorganisation, like the CSV and DataFrames packages for importing CSV files.
So, when it comes to data handling, Julia is the worst, followed by MATLAB and Python, with R being the winner.
5. Libraries
Each of these four languages provides a basic infrastructure, but a lot of specialised functionality is offloaded to external libraries.
MATLAB was designed as a numerical language and has a lot of useful functions built in. Moreover, its available libraries are very rich, especially for numerous engineering applications (e.g. signal processing).
R is even better: there is probably a library for almost any statistical functionality one could possibly use. The downside is that some of these are of low quality or are badly documented, and there might be multiple libraries for the same functionality, often with different argument specifications and output types.
Python has a lot of libraries available, but not nearly as many as either R or MATLAB.
Julia, being the newcomer, has the fewest libraries by far.
So in terms of libraries, Julia is worst, followed by Python and MATLAB, with R the best.
That said, Python, Julia and R can all call functions from each other. Thus, libraries in one can be used in all, mitigating the problem somewhat. While this can be useful in special circumstances, it is more natural and stable to just work in one language.
6. Licensing
Three of these languages (Julia, Python and R) are open source, while MATLAB is commercial. For pricing see here. This means that the first three are available on almost any platform and one can install them without paying or getting permission.
Heavy computations often get outsourced to either high performance computing clusters or the cloud. We can rent a 72-core machine on Amazon Cloud for $1.16 an hour, which is 20 times faster than most desktops. For MATLAB, one needs to purchase the Parallel Computing Toolbox and pay $0.18 ($0.07 educational) per core per hour (see here). Moreover, this takes considerable time to set up.
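The hourly figures above can be combined into a back-of-the-envelope comparison. The sketch below assumes that the MATLAB per-core fee applies to all 72 cores and is paid on top of the machine rental. It leaves out the purchase price of the Parallel Computing Toolbox, which is not quoted here. The function name hourly_cost is hypothetical.

```python
from decimal import Decimal

# Figures quoted in the text.
CORES = 72
MACHINE_PER_HOUR = Decimal("1.16")          # 72-core cloud machine, USD per hour
MATLAB_PER_CORE_HOUR = Decimal("0.18")      # standard per-core fee
MATLAB_EDU_PER_CORE_HOUR = Decimal("0.07")  # educational per-core fee


def hourly_cost(per_core_fee=Decimal("0")):
    """Machine rental plus a per-core licence fee (assumed to apply to every core)."""
    return MACHINE_PER_HOUR + CORES * per_core_fee


open_source = hourly_cost()                       # Julia, Python or R: rental only
matlab = hourly_cost(MATLAB_PER_CORE_HOUR)        # rental + 72 * 0.18
matlab_edu = hourly_cost(MATLAB_EDU_PER_CORE_HOUR)  # rental + 72 * 0.07

print(f"open source: ${open_source}/hour")
print(f"MATLAB: ${matlab}/hour")
print(f"MATLAB (educational): ${matlab_edu}/hour")
```

Under these assumptions, the per-core fee rather than the machine rental dominates the hourly cost of running MATLAB on such a machine.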
Hence in terms of licensing and cost, MATLAB is worst, and the other three equal.
7. Ease of use
However, when it comes to ease of use, MATLAB has a good integrated development environment (IDE), the MATLAB desktop, with very good documentation.
R has come a long way, with the RStudio IDE even better than the MATLAB desktop. Shiny allows interactive web apps and dashboards to be built directly from R, providing online-friendly means of data presentation.
R has good plotting functionality, with MATLAB not far behind.
Python's Anaconda distribution bundles a good IDE, Spyder. Plots are mainly done through Matplotlib, with an interface similar to MATLAB's.
Juno for Julia is an IDE integrated with the Atom editor which looks and functions like Spyder. That said, we occasionally experienced teething issues, like error feedback failing to identify the exact source of error. Plots.jl is used for plotting, often relying on packages from other languages.
Because Julia is rather new, its commonly used packages are still undergoing changes from time to time. This has resulted in incomplete or sparse documentation. On many occasions, while translating code from R/MATLAB to Julia, we had to look up the source code to figure out the required settings (if they even existed in the first place). Moreover, many packages still use deprecated subroutines, with frequent warnings popping up when the code is executed.
All four can be used in Jupyter notebooks. A Jupyter notebook implementation of the code from Financial Risk Forecasting is available here. However, while Jupyter notebooks are certainly useful for demonstration and pedagogical purposes, we do not think they are the best environment for day-to-day programming.
Thus, in terms of ease of use, especially for novice users, MATLAB is the best. R and Python trail behind slightly, with Julia having some way to go.
Summary
None of these four languages leads on all evaluation criteria.
R and MATLAB benefit from being the veterans: one can do almost anything one wants with them. However, their age shows: the languages are outdated, with considerable baggage and inefficiencies.
Python is more modern, but its libraries are lacking in comparison and numerical programming is clumsy.
So, what about Julia? It is a modern language, very elegant and fast. What it lacks at present is comprehensive library support for data handling and numerical calculations. It is also under heavy development, with core features breaking between releases. However, version 1.0 is expected later this summer, bringing with it feature stability.
So is there a clear ranking?
Recognising that this assessment is highly subjective, for our purposes R is the best numerical language, with Julia the one to look out for.