
February 08, 2018



The aircraft examples in the vignette are a bit misleading, I think, because the data needs more cleanup first. Airbus appears in the list under two different strings, McDonnell Douglas under at least three, and Canada under two. If each of those were merged into a single level before the long tail is lumped into an "other" bin, it could make a big difference in further modeling: Airbus would jump from third place to the largest group by far, since about half of the Airbus data currently ends up in "other". #oops
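The ordering issue raised in the comment above (merge duplicate spellings first, then lump the long tail) can be illustrated with a minimal sketch. The manufacturer strings, counts, alias table, and the `min_count` threshold below are all invented for illustration and are not taken from the vignette data; `canonicalize` and `lump_rare` are hypothetical helpers, not DataExplorer functions.

```python
from collections import Counter

# Hypothetical alias table: maps a raw spelling to one canonical name.
ALIASES = {
    "AIRBUS INDUSTRIE": "AIRBUS",
}


def canonicalize(name, aliases=ALIASES):
    """Normalize whitespace and case, then collapse known aliases."""
    key = name.strip().upper()
    return aliases.get(key, key)


def lump_rare(values, min_count=4, other="OTHER"):
    """Replace every level seen fewer than min_count times with `other`."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


# Invented toy data: Airbus is split across two spellings.
raw = (
    ["BOEING"] * 6
    + ["EMBRAER"] * 5
    + ["AIRBUS INDUSTRIE"] * 4
    + ["AIRBUS"] * 3
    + ["CESSNA"] * 2
)

# Lumping the raw strings hides part of Airbus inside OTHER.
lumped_raw = Counter(lump_rare(raw))

# Canonicalizing first keeps all Airbus rows together.
lumped_clean = Counter(lump_rare([canonicalize(v) for v in raw]))

print(lumped_raw.most_common())
# [('BOEING', 6), ('EMBRAER', 5), ('OTHER', 5), ('AIRBUS INDUSTRIE', 4)]
print(lumped_clean.most_common())
# [('AIRBUS', 7), ('BOEING', 6), ('EMBRAER', 5), ('OTHER', 2)]
```

In this toy case, lumping first leaves Airbus as the smallest named group with three of its seven rows hidden in the "other" bin, while cleaning first makes it the largest group.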

Great stuff! I really like the report. Is it possible to add the dataset name and the boxplots?

@JarnoPeschier, thanks for identifying that! I totally overlooked it. I have created issue #55 for tracking and will update it with the next release!

@btadams Yes, I plan to improve the reporting functionality in the next release. See issue #41. Thanks for using DataExplorer!

I like the package, but why the inconsistent ggplot theming? The boxplots use the defaults, but the barplots and histograms get odd semi-transparent bars with not really pretty black outlines. Sticking to ggplot standards would have been nicer imho.

plot_str just gives:
Error: C stack usage 7970280 is too close to the limit

Do you know of a similar tool for Python?

The comments to this entry are closed.
