By Peng Zhao | January 28, 2019
Weeks ago, I gave a short training course at one of the top institutes in the world. The course was called 'R, Open Science and Reproducible Research', abbreviated as ROSS. It was given to the academic researchers who were interested in R and reproducible research. The R markdown family, including 'rticles', 'bookdown', 'xaringan' etc., were introduced. The audience were excited in the course. They felt, however, confused after the course by using these packages on their own work. It is not easy to connect their own work with the R Markdown ecosystem.
Figure 1. Academic work elements and R (adapted from the book Learning R)
Researchers often struggle with some annoying problems. In a conventional workflow, people get results from datasets, and thy have to copy these results (numbers, figures, tables, etc.) into their productions such as journal manuscripts, dissertations, slides, or posters. If the datasets or the calculations are updated, they have to repeat copying, which causes many potential problems. Can we avoid doing these stupid stuff?
Yes, if using R Markdown.
R Markdown is a document format that embeds code chunks (of R or other languages) in Markdown documents for literate programming. With years of development and continuous contributions from the open-source community, users nowadays can use R Markdown not only for writing documents and books, but also for producing reproducible academic articles, dissertations, slides, posters, and even websites. Although it has been seven years since the first idea of R Markdown came out, many people in academia nowadays are unaware of this handy tool yet, probably because of the difficulty at the beginning of learning R.
After the ROSS course, I decided to develop a package which integrates various academic elements, including data, codes, images, manuscripts, dissertations, slides and so on, into a single reproducible academic project. The datasets, codes and documents should be well connected so that they can be easily synchronized and updated. users don't have to repeat copying and pasting their results and figures from time to time. It will be easy for the scientific researchers to use, even if they are R beginners, or even non-R-users.
I chose the name 'rosr' rather than 'ross' so that users could easily use a search engine to find it, rather than the well-know scientist Ross Geller.
Figure 2. Dr. Ross Geller from Friends
What are the current features?
Briefly speaking, the current features of 'rosr' are as follows.
- Users can create a brand new project with demo files in user-selected directories. The project has a well-organized structure. The demo files can be demo datasets, images, codes, journal manuscripts, books, slides, posters, mind maps, or websites. They are well connected, which means that all the documents will update automatically if the data or the codes change.
Figure 3. The connections in a rosr project.
Users can add new sub-project to an existing project according the sub-project type they choose.
Users can create a skeleton project with no demo files when they get used to the way how 'rosr' works.
Users can use a single R markdown file for the mathematic equations they use, and cross refer them in different documents. The advantage is that users do not have to copy and paste the same equation to different documents. If they want to update a certain equation, they only have to update the equation file.
Figure 4. An equation sheet in a rosr project.
The detailed usage of 'rosr' is out of the scope of this post. A tutorial of 'rosr' will be available soon.
In the future, more features will be brought into 'rosr' as follows.
- The insertion of equations will be improved. Currently, equations are inserted via the function
eq()and users have to specify the number or label of an equation. A friendly GUI, which could be an RStudio addin or a shiny app, would be expected for code-haters.
- More choices of the sub-projects with demos will be added to the arguments of the functions.
- A GUI for creating or maintaining a project is expected for, again, code-haters.