By Peng Zhao | June 12, 2018
When I became active on steemit.com in July 2017, the first thing I did was look for an R package to process Steem data. As an R user and package developer, I would have loved to easily download my posts, my operations, and my communications with others on Steem, and to analyze the data with the power of the R language.
Unfortunately, I found nothing.
Then I started to write R functions on my own based on Steemdb, SteemData and SteemSQL. Some of these functions were used in a series of data analysis posts with the title 'Steem Watch', and others were used to build the site steemr.org.
Logo for steemr. Designed by @maiyude
steemr.org gained an impressive reputation in the community. However, as most of its components were based on SteemData, its heart was broken when SteemData went down. It was a pity that this code was never really shared with the community, and now it is useless unless SteemData comes back.
Visitor map of steemr.org
Luckily, the rest of steemr.org still works. So I decided to open the source of steemr.org and develop a brand new R package. It is like an organ transplant: steemr.org is dead, but the steemr package can reuse some of its useful parts.
One of steemr.org's features: display all the posts of a user
Meanwhile, my book Learning R: for Rookies (in Chinese) was published last month, and readers have begun to give me exciting feedback. It would be helpful to show them how to use R in the Steem community. I hope that some of them will join me in developing steemr, which would be a valuable gift to the Steem community.
What is the project about?
steemr is an open source R package for playing with Steem data in the R language. It is used to download, post-process, analyze, and visualize Steem data, building on the statistical power of R. Currently, version 0.0.0 can:
- obtain the complete post list for a given account,
- obtain the complete account information from steemdb.com,
- organize the 'follower' and 'following' information for a given account,
- obtain the following history of a given account,
- download the vote records of a given post, and
- download the complete data of the latest 100 posts of a given account.
A screenshot of running the current version (v.0.0.0) of steemr with RStudio
steemr is developed in the R language with the support of the RCurl package and the XML package.
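To give a feel for how RCurl and XML can be combined for this kind of work, here is a minimal sketch that parses a small HTML fragment and extracts post titles and links. The HTML string, the `post` class, and the `extract_posts()` helper are invented for illustration; they are not steemr functions and do not reflect the real markup of steemdb.com. With network access, the page text would typically come from `RCurl::getURL()` instead of a hard-coded string.

```r
library(XML)

# Hypothetical HTML fragment standing in for a downloaded page.
# In real use it could be fetched with RCurl::getURL(url).
page <- '<html><body>
  <a class="post" href="/@alice/first-post">First post</a>
  <a class="post" href="/@alice/second-post">Second post</a>
  <a class="other" href="/about">About</a>
</body></html>'

# Hypothetical helper (not part of steemr): return a data frame
# with the title and link of every <a class="post"> element.
extract_posts <- function(html_text) {
  doc <- htmlParse(html_text, asText = TRUE)
  nodes <- getNodeSet(doc, "//a[@class='post']")
  data.frame(
    title = vapply(nodes, xmlValue, character(1)),
    link  = vapply(nodes, xmlGetAttr, character(1), name = "href"),
    stringsAsFactors = FALSE
  )
}

posts <- extract_posts(page)
print(posts)
```

The XPath expression does the filtering, so the same helper can be pointed at a different element or class by changing one string.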
In the future, I am going to add many more features to steemr, such as
- building a personal blog website from the posts of a given account,
- building a book (in html, pdf, epub, word) from the posts of a given account,
- visualizing Steem data in a variety of ways, such as statistical distributions, time series, and word clouds,
- supporting SteemSQL queries and data analysis, and
- building a user-friendly web UI for those who know nothing of the R language.
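As an example of the kind of processing behind a word cloud, the sketch below counts word frequencies across a few post texts using base R only. The sample texts and the `word_freq()` helper are hypothetical and are not part of steemr; real input would be the bodies of downloaded posts.

```r
# Hypothetical post texts standing in for downloaded Steem posts.
posts <- c(
  "Steem data analysis with R",
  "R makes data analysis fun",
  "Plotting Steem data"
)

# Hypothetical helper (not part of steemr): count how often each
# word appears across all texts, most frequent first.
word_freq <- function(texts) {
  # Lower-case the texts and split on anything that is not a letter.
  words <- unlist(strsplit(tolower(texts), "[^a-z]+"))
  # Drop empty strings left over from the split.
  words <- words[nchar(words) > 0]
  sort(table(words), decreasing = TRUE)
}

freq <- word_freq(posts)
print(head(freq, 3))
```

The resulting table could then be handed to a plotting package; for instance, assuming the wordcloud package is installed, `wordcloud::wordcloud(names(freq), as.vector(freq))` would draw the cloud.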
One of steemr's future features: easily plot a word cloud from an ID's posts. Image taken from one of my Steem Watch posts.
How to contribute?
Anyone can get in touch with me by leaving a reply to @dapeng on Steem or by visiting https://github.com/pzhaonet/steemr.