## Applied Nate Silver – the noise is really LOUD!

Yesterday, in my post about preliminary results on the 5.2 diet, I posted this graph:

This graph shows the connection (correlation, maybe) between a day’s calorie consumption and Δweight the next day. The trendline seems both logical and somewhat statistically valid. The crossover (no gain/loss) occurs at about the same place other calculators would claim.

So it all made sense – UNTIL it didn’t!

Yesterday was my Sunday at Starbucks, which involves eating more than usual, a pattern I’ve had for months and one that is usually followed by an uptick at the Monday weigh-in. Since I wanted to break the 187 barrier before going on a camping trip, I was more cautious than usual yesterday, but still, according to the graph above, I expected a modest uptick. IOW, that was my prediction based on what I thought the signal showed.

What did I get?

-1.8lbs

IOW, 185.2 absolute and my lowest sample value (which I’ve found can be a surrogate for average, plus dropping the lowest is more indicative of weight loss than dropping the highest sample) moved all the way down to 184.2. When I saw that I thought the scale was broken, but, no, the low values held up for the average of 185.2.

So while this is a move in the right direction, and therefore delightful, it’s just as crazy and anomalous as the upticks I’ve agonized over before. In short, once again the noise is overwhelming the signal.

Now let me expand on that with two very similar graphs:

Here’s a graph I made two days ago:

Note the r^2 value. And here’s a small update on the graph with a bit more data:

Can you spot the difference?

Well, the second graph shows a significant downward sloping (weight loss) trendline with a substantially higher r^2 value. But what’s causing that difference?

Well, the second graph has two more “new” points (x≥33.0), but I also added a couple of older points (x<30.7). IOW, I added more data on the left side that has high values and more data on the right side that has low values, and guess what – now instead of a flat slope through the data I see a significant downtrend. So even with considerable data (over 2 weeks of daily weigh-ins), linear regression is a lousy way to detect signal.
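To see how little it takes to tilt a trendline, here is a minimal sketch in Python with NumPy. The numbers are made up for illustration and are not the weigh-ins behind the two graphs; `fit_trend` is just a hypothetical helper name. It fits a line to ten days that bounce around a flat 186, then refits after adding two higher points on the left and two lower points on the right.

```python
import numpy as np

def fit_trend(days, weights):
    """Least-squares line through (days, weights); returns (slope, r_squared)."""
    days = np.asarray(days, dtype=float)
    weights = np.asarray(weights, dtype=float)
    slope, _intercept = np.polyfit(days, weights, 1)
    r = np.corrcoef(days, weights)[0, 1]
    return slope, r ** 2

# Made-up data (an assumption, not the real weigh-ins):
# ten days alternating 0.8 lb above and below a flat 186.
core_days = np.arange(10.0)
core_weights = 186.0 + 0.8 * np.where(core_days % 2 == 0, 1.0, -1.0)

# Same ten days, plus two higher early points and two lower late points.
ext_days = np.concatenate(([-2.0, -1.0], core_days, [10.0, 11.0]))
ext_weights = np.concatenate(([187.5, 187.3], core_weights, [184.8, 184.5]))

core_slope, core_r2 = fit_trend(core_days, core_weights)
ext_slope, ext_r2 = fit_trend(ext_days, ext_weights)

print(f"core:     slope={core_slope:+.3f} lb/day, r^2={core_r2:.2f}")
print(f"extended: slope={ext_slope:+.3f} lb/day, r^2={ext_r2:.2f}")
```

With these invented values, four extra points at the ends are enough to turn a nearly flat fit with a tiny r^2 into a clear downtrend with a much larger r^2, even though the ten points in the middle did not change at all.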

In fact, what really is the “signal” here? And what is the noise?

Can I only answer that question with much larger amounts of data? If so, how is any measurement and statistical approach at all useful for providing meaningful feedback on weight loss, which inherently is a slow-motion activity where getting more data isn’t really feasible?
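One rough way to put a number on "how much data" is the textbook standard error of a fitted slope. The sketch below assumes one weigh-in per day and independent day-to-day noise with a constant standard deviation, which real weigh-ins may not satisfy. The 1.0 lb noise figure and the function name `slope_standard_error` are assumptions for illustration only.

```python
import math

def slope_standard_error(noise_sd, n_days):
    """Standard error of a least-squares slope for n_days daily points.

    Assumes one reading per day (x = 0, 1, ..., n_days - 1) and independent
    noise with standard deviation noise_sd; both are simplifying assumptions.
    """
    if n_days < 2:
        raise ValueError("need at least two days to fit a slope")
    # Sum of squared deviations of x = 0..n-1 from its mean is n(n^2 - 1)/12.
    sxx = n_days * (n_days ** 2 - 1) / 12.0
    return noise_sd / math.sqrt(sxx)

# Assumed noise of 1.0 lb per weigh-in (hypothetical value).
for n in (14, 28, 56):
    se = slope_standard_error(1.0, n)
    print(f"{n:2d} days: slope uncertainty about +/- {se:.4f} lb/day")
```

Under these assumptions the uncertainty shrinks roughly with the number of days to the power 1.5, so doubling the stretch of daily weigh-ins cuts it to a bit more than a third. Whether that is small enough depends on how fast the real weight loss is.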

The two graphs I showed are similar to the lies financial advisors put out. They tell you stocks beat other investments. But they deliberately cherry-pick the data. They use data, like in the second graph, to make their point. Pick a different time interval, do the analysis, and you get an entirely different result. In short, it’s very easy to lie with statistics and people do it all the time, to cheat you.

But I’m worried about cheating myself. I don’t want to cook the books and get the answer I like. I want the truth. Because it’s the truth that will tell me whether I’m achieving my goal or not. I’m not trying to impress anyone or sell something to someone – I want to know what’s really happening. Of course I like results where I have big decreases in weight; it’s just that now I don’t believe them. So when it upticks back again tomorrow or the next day I won’t be all bummed.

But I really need some way, with all this variability I’m up against, to get that truth, the signal.

And this experience of mine leads me to believe that other people on their own weight plans face the same variability. Unlike me, they’re not fanatics about data, so when they get anomalous results they don’t have the background to show those results are just noise.

So if I could pass on to others what I’ve learned:

1. Eat less (anything, it doesn’t matter what, despite nutrition “advice”), exercise more
2. Get lots of data and analyze it because whatever measurements you make are mostly noise
3. Thus, stick with your plan and only look at the data (the more the better) over the longer term
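One simple way to look at the data over a longer term is a rolling average, sketched below in Python with NumPy. This is only one possible smoothing choice, not a method taken from the graphs above. The weigh-ins are made up, and the 7-day window and the name `rolling_mean` are assumptions.

```python
import numpy as np

def rolling_mean(weights, window=7):
    """Average of each run of `window` consecutive weigh-ins."""
    weights = np.asarray(weights, dtype=float)
    if window < 1 or window > len(weights):
        raise ValueError("window must be between 1 and the number of weigh-ins")
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the full window fits.
    return np.convolve(weights, kernel, mode="valid")

# Made-up daily weigh-ins in lb, for illustration only.
daily = [187.0, 185.2, 186.6, 185.8, 187.2, 185.0, 186.4, 185.6, 186.8, 184.8]
smoothed = rolling_mean(daily, window=7)

print("largest day-to-day move, raw:     ", np.abs(np.diff(daily)).max())
print("largest day-to-day move, smoothed:", np.abs(np.diff(smoothed)).max())
```

The smoothed series moves far less from one day to the next than the raw readings, so a single surprising weigh-in (up or down) barely shifts it. The trade-off is that it also reacts slowly to real changes.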