Yesterday, I had the honour of presenting at The Data Science Conference in Chicago. My topic was Reproducible Data Science with R, and while the specific practices in the talk are aimed at R users, my intent was to make a general argument for doing data science within a reproducible workflow. Whatever your tools, a reproducible process:
- Saves time,
- Produces better science,
- Creates more trusted research,
- Reduces the risk of errors, and
- Encourages collaboration.
Sadly, there's no recording of this presentation, but my hope is that the slides are sufficiently self-contained. Some of the images also link to further references. You can browse the slides below, or download them (CC-BY) from the SlideShare page.
Thanks to all who attended for the interesting questions and discussion during the panel session!
David,
Very useful post on why publishing experimental results is critical. As an analogy, Ben Franklin did not become a world-renowned scientist because he discovered that lightning was an electrical phenomenon. This was already known. His fame came from publishing a reproducible method for verifying it, so that scientists at the court of King Louis XVI could fly kites in stormy weather.
Best,
Randy
Posted by: Randy Betancourt | April 21, 2017 at 19:15