Monday, September 30, 2019

Lies, Damned Lies, and Statistics


Mark Twain may not have invented this distrustful phrase about data, but he did popularize it. If the phrase sounds a bit harsh to you, perhaps we should simply say that one should always exercise an appropriate amount of professional skepticism before accepting any metric at face value.

Why is this relevant to SILC? Nathan Kwan of RPS Group, a friend of SILC, responded to a previous blog post by noting that Statistica estimated total 2015 oil, gas, and petrochemical employment in the United States to be 1.39 million. That’s a far cry from the 187,000 employees cited in our previous blog post entitled “Making The Case For Solar.”

Why the discrepancy? The 187,000 figure in the blog post came from a 2017 Forbes article entitled “Solar Employs More People In U.S. Electricity Generation Than Oil, Coal And Gas Combined.” The author of that article relied on an earlier report from the U.S. Department of Energy.

Can we reconcile the 1.39 million employment metric and the 187,000 metric? A perfect reconciliation isn’t possible with publicly available data; nevertheless, we can gain some insight into the difference. The 1.39 million estimate, for instance, includes what Statistica calls “all broad related occupations.” The 187,000 estimate, on the other hand, is limited to “coal, gas and oil power generation.”

SILC Co-Founder Stanley Goldstein has noted that one might wonder which employment categories are included in each estimate. For instance, what about the employees of gasoline stations and vehicle repair shops? Unfortunately, it can be impossible to fully understand what each metric contains without paying the publishers of the data substantial sums for access to that information.

So, with this in mind, which metric is true? Which can we trust?

They are both likely to be true. The Statistica metric is simply far broader in scope than the Department of Energy metric.

Even Mark Twain wouldn’t call either metric a “lie.” But if he were alive, he’d likely embrace the differential as an illustration of the need to maintain a healthy sense of skepticism about data.

