In snooping around the US Government’s open data sets a few months back, I found out that NASA has an entire web site dedicated to their publicly available data: https://data.nasa.gov/
Surely, you understand why that would excite me!
I dug around a bit and pulled out some information on meteor landings in the United States, with tons of information, mass, date, lots of stuff.
To simplify the data set and make things tidy for R, I wrote a quick Python script to strip out some columns and clean up the dates. Here’s the gist if you want to have a go at the data as well.
I ended up looking to see if there was a trend between date and meteor mass, to see if maybe there were obvious cycles or other interesting stuff, but some super-massive meteors ended up shoving the data into pretty uninteresting visualizations, which is too bad.
We can do some simpler stuff, even with some super-massive meteors. For instance, here’s a log(mass) histogram of all of the meteors:
Check it out! It results in a somewhat normal, slightly right-skewed distribution. That means we can use inferential statistics on it, although I am not sure why you would want to! The R code is a super quick ggplot2 script.
It’s pretty amazing how easily we can access so, so much information. The trouble is figuring out how to use it in an actionable and simply explained way. The above histogram is accurate, and looks pretty (steelblue, the preferred default color of data folks everywhere), but it isn’t actually helpful in any way.
Just because we can transform a dense .csv into a readable chart doesn’t mean it’s going to be useful.