Software Nerd

Tuesday, June 10, 2008

The Limitations of Correlation-coefficients

Warning: All the IQ data-values below are fictional. They're contrived to demonstrate the underlying point about statistics and about inheriting IQ.

Hypothetical Study: Suppose we get a list of all the adopted kids in a small geographical area: say one suburban city. We choose a single age -- say 10 years old. From this list, we randomly select a sample. Assume we remove from our sample only those kids who are now orphaned, but every other kid in our sample agrees to participate in our study. We administer a standard IQ test to every child in the study, and to the adoptive parents of the child and to the biological parents.

Hypothetical Findings: Next, suppose we measure the correlations and find the following:

... that the IQs correlate perfectly between kids and their biological parents (coefficient = 1.0), but only weakly (low positive, 0.3) between kids and their adoptive parents. Here's the graph of the data, for 5 kids:

The data points for 5 kids are shown. For example, Kid-1 has an IQ of 105 (X-axis), his biological parents have an average IQ of 92 (Y-axis), and his adoptive parents have an IQ of about 113. Kid-2 has an IQ of 110, and so on.

The straight lines show "best fit". One can see the perfect correlation (going through all the blue data points) between IQ of kids and their biological parents. On the other hand, the red data points are all over the page, with a noticeable positive correlation, but a lot of dispersion from the "best fit".

What conclusions could we draw from such a (hypothetical) result?

  • Could we say that inherited ability (e.g., genes) is a strong causal factor in IQ (as measured by standard tests)? Wouldn't the correlation of 1.0 indicate this?

  • Secondly, can we conclude that "environment factors" do not have a large impact on a child's IQ? Wouldn't the low positive correlation indicate this?

Neither of these conclusions is warranted by the data.

Scale is abstracted away: A correlation-coefficient does not reflect the absolute numbers of the two series. It abstracts away the particular unit of measure. For instance, suppose I have two series: people's weight and height. The correlation between these two will be the same, no matter what units I use for weight (pounds, kilograms or ounces).

Similarly, when we are calculating the correlations between the IQs, though we use the same scale, we are ignoring the absolute values of each series. Imagine a hypothetical, where something about the process of adoption gives kids a higher IQ. Imagine that adopted kids end up with 25% higher IQs than we have assumed above. Still, the correlations would remain the same: the kids would still have perfect correlation with their biological parents and a low positive with their adoptive parents.
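Here is a minimal sketch of that point in Python. Every number in it is invented for illustration (just like the IQ values in the rest of this post), and the `pearson` helper is my own small implementation of the standard Pearson coefficient, not anything taken from a real study. It boosts each kid's IQ by 25% and shows that both correlations stay where they were.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Fictional values, made up for this illustration only.
kid_iq = [105, 110, 115, 120, 125]
bio_parent_iq = [92, 97, 102, 107, 112]        # exactly kid_iq - 13: a perfect line
adoptive_parent_iq = [113, 101, 120, 104, 120]  # scattered, weak positive relation

# Hypothetical "adoption effect": every kid's IQ ends up 25% higher.
boosted_kid_iq = [iq * 1.25 for iq in kid_iq]

r_bio = pearson(kid_iq, bio_parent_iq)
r_adoptive = pearson(kid_iq, adoptive_parent_iq)
r_bio_boosted = pearson(boosted_kid_iq, bio_parent_iq)
r_adoptive_boosted = pearson(boosted_kid_iq, adoptive_parent_iq)

print(f"biological: {r_bio:.3f} before, {r_bio_boosted:.3f} after the boost")
print(f"adoptive:   {r_adoptive:.3f} before, {r_adoptive_boosted:.3f} after the boost")
print(f"mean kid IQ: {sum(kid_iq) / 5:.1f} before, {sum(boosted_kid_iq) / 5:.2f} after")
```

The coefficients do not move, even though the average IQ of the kids jumps considerably. The coefficient only sees how the two series vary together, not how large either one is.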

Meaning: If we did get a perfect correlation in an IQ experiment like the one above, it would hint strongly at inheritance being a factor. However, it says nothing about how important a factor it is. What if further research found that the reasons behind that perfect correlation did indeed represent causation? Even so, it would not speak to the importance of inheritance in the final IQ.

Another example: Suppose we look at the wealth of five men at the start and end of various years. Suppose we find that they grow their wealth at about 5% each year. Suppose we also find a strong positive correlation between the starting and ending wealth in any year.

Now, instead, suppose that during the last year each man placed a large bet on a game; but each bet was "large" only in relation to that man's wealth. Suppose each man won, and ended the year with nearly twice the wealth he began with. Now, at the end of the year, their wealth is still positively correlated to their wealth at the beginning. Based on their "inheritance" of that last year, we would have expected their wealth to grow 5% each, but instead it grew by 100%. The betting was responsible for that, not the inheritance. Yet, the inheritance still demonstrates the same high positive correlation. Only by looking at the actual scale do we find the relative importance.
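The wealth example can be sketched the same way. The starting amounts and growth rates below are invented, chosen only so that an ordinary year grows wealth by about 5% and the betting year nearly doubles it; `pearson` is again my own small helper.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Fictional starting wealth of five men (assumed figures).
start = [50_000, 120_000, 300_000, 750_000, 2_000_000]

# Assumed growth rates: about 5% in an ordinary year, nearly 100% in the betting year.
ordinary_rates = [0.04, 0.05, 0.06, 0.05, 0.05]
betting_rates = [0.95, 0.98, 0.90, 0.97, 0.96]

end_ordinary = [w * (1 + r) for w, r in zip(start, ordinary_rates)]
end_betting = [w * (1 + r) for w, r in zip(start, betting_rates)]

r_ordinary = pearson(start, end_ordinary)
r_betting = pearson(start, end_betting)

# Overall growth factor: total ending wealth divided by total starting wealth.
growth_ordinary = sum(end_ordinary) / sum(start)
growth_betting = sum(end_betting) / sum(start)

print(f"ordinary year: r = {r_ordinary:.4f}, growth factor = {growth_ordinary:.2f}")
print(f"betting year:  r = {r_betting:.4f}, growth factor = {growth_betting:.2f}")
```

Both years show a correlation very close to 1 between starting and ending wealth. Only the growth factor, which the coefficient ignores, reveals that something much bigger than the starting wealth was at work in the betting year.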

Summary: The take-away is this: be careful about gleaning more information from a correlation coefficient than it is designed to convey. Many experiments are designed to ask something like this: "if we assume that all other things are equal, does varying this single factor have an impact?" Well and good. However, the other factors -- that have been abstracted away -- may be the crucial ones that vary in the relevant real-world situation.

Wednesday, June 04, 2008


On TV, pundits debate whether we're already in a recession, going to be there soon, or will avoid one. Meanwhile, my state of Michigan has been lagging the nation for a while. Our recent unemployment numbers (7%) show us at the bottom of the list. Michigan home prices did not rise like those in California and Florida; yet, we're experiencing a relatively bad housing market.

Consider this, though.... Today, while driving to the mall, I was watching the sprinklers water the wide, well-mowed median of our middle-class subdivision. In up-and-coming economies like India and China, this would be unthinkable luxury, reserved for rich enclaves. At the store, the sign said they were out of Wii's, and when they arrived they'd be limited to 1 per customer.

Even though I immigrated over a decade ago, the relative wealth of the West still blows me away.

Here's a related, interesting read, from the WSJ.

Unintended Consequences

Sometimes actions have unintended consequences. However, many unintended consequences should have been anticipated by reasonable men, with a reasonable grasp of the particular area of action. Next time you hear a politician say his actions resulted in "unintended consequences", translate that to mean he's saying he is ignorant and negligent -- the odds favor that explanation. Here are some examples:

Congress creates incentives that divert corn from food to fuel. The unintended consequence: higher corn prices!

Rising food prices mean that farmers in India can get higher revenue. Instead, the Indian government bans export of rice, limiting the demand to the poorer Indian market. However, the farmers are still paying higher prices for inputs like gas, fertilizer (and seed). The unintended consequence: farmers planting less rice!

Or take the government trying to address economic problems (in part) by sending out "stimulus checks". The unintended consequence: future inflation that takes back what it gave.


Sunday, June 01, 2008

Targeted Marketing of Objectivism

Can the marketing of Objectivism be more narrowly targeted? Can specific messages be aimed at specific groups? I think the ARI does target businesspeople. School kids and their teachers are targets of the essay-competition. At the college level, I think MBAs were targeted (along with broader marketing) for the Atlas essay. How about other groups? Doctors might be one relevant group. Some other industry-specific material might work as well: e.g. something aimed at energy industry executives. I think it should be possible to target pamphleteering and lectures as well.

I got to thinking about this on reading "To Young Scientists" by Ayn Rand (remarks delivered to MIT, reproduced in "Voice of Reason"). Something in that vein -- updated with contemporary concretes -- might work better than more generalized pamphlets and lectures. A narrowly targeted message allows the author (aka salesman) to (say) describe the importance of philosophy in the life of that specific student (aka customer). One might have "To Young Artists", "To Young Writers", "To Young MBAs", "To Young CPAs", and so on.