Episode 7: We’ve been due for a discussion about probability

With a finite set of topics that I could discuss, it’s inevitable that I’d eventually have to talk about probability, isn’t it?  As I write more articles that don’t feature probability you have to think that the next one will feature it.  The longer I go not writing about probability, the more I’m due, obviously.

Are you sure about that?  Is there anything else going on here that could lead us to the wrong conclusion?  Probably.

We’re going to have to go back to the roots of probability.  Of course, I’m talking about toast.  Let’s assume that we have a piece of fair toast with a burned image of Stephen Hawking on one side.  A piece of fair toast is one that if tossed in the air has an equal chance to land on either side.  I toss that piece of toast and it lands Stephen Hawking down.  I toss it again and it’s Hawking down again.  One more toss and the late great professor is still staring at the floor.  Surely on the next toss it’ll land with the renowned physicist looking to the heavens.  We’re due to see him!

Probabilities are sometimes viewed as hard to understand, but that’s not true.  The basics are pretty simple.  When we temporarily levitate our bread over and over, each toss is independent of the one before it.  That means the results of the first toss do not affect the second, the second does not affect the third, and so on.  These are known as independent events (I’ll also call them classic probabilities).  Hawking’s toast, when tossed, has a 50/50 chance of landing physicist up.  If we try it 5 times and dear Stephen always stares at the floor, we may feel like there is a greater chance he’ll be face up on the next try.  But that’s not the case, as it’s still just 50/50.
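If you’d rather check that than take my word for it, here is a minimal Python sketch.  It assumes the toast really is fair, lists every equally likely result of four tosses, keeps only the ones that start with three Hawking-down landings, and counts how often the fourth toss comes up Hawking.  The names (outcomes, after_three_downs and so on) are just mine for illustration.

```python
from fractions import Fraction
from itertools import product

# Every equally likely result of four tosses of a fair piece of toast:
# "U" = Hawking up, "D" = Hawking down.
outcomes = list(product("UD", repeat=4))

# Keep only the sequences that begin with three Hawking-down landings.
after_three_downs = [seq for seq in outcomes if seq[:3] == ("D", "D", "D")]

# Of those, which ones show Hawking up on the fourth toss?
up_on_fourth = [seq for seq in after_three_downs if seq[3] == "U"]

p_up_given_streak = Fraction(len(up_on_fourth), len(after_three_downs))
print(p_up_given_streak)  # 1/2
```

The streak leaves exactly two possibilities for the fourth toss, one up and one down, so the chance is still one in two.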

Our trusty human brains are always trying to make sense out of nonsense, which is very helpful in keeping us alive.  Sometimes it can lead us astray though.  These classic, or independent, probabilities make us want to think of them as dependent.  We know the toast can’t always land astrophysicist floorward, and any time we see too many examples like that we start to think the next can’t be the same, that we’re more likely to get a Hawking.  This is what we have to be on guard for.  Our brains want us to think the independent actions are actually dependent.

Lots of things that we think are dependent probabilities aren’t.  We say a sports team with a losing streak is due for a win simply because they’ve lost a bunch.  But really, the number of games lost doesn’t affect the next game’s outcome.  It’s more likely that player health, caliber of the opponent, venue or other factors would affect the outcome.  Weather is another example – it’s been so dry lately we’re due for some rain.  However, the number of dry days does not affect the next day’s weather.  I know that weather is extremely complex and we could argue that there is some effect, but it’s so tenuous and difficult that the simpler answer is that one day’s weather does not affect the next.  Ask anyone in a long-term drought.  A poker player who’s had some bad hands may feel they are due for a good one, but if the deck is shuffled between hands they are no more likely to get a better hand than they were on the last one.

There are conditional, or dependent, probabilities as well.  And we have to go back to toast for this, or more generally, breakfast.  Have you ever gone out for breakfast and had the server bring syrup only if you ordered pancakes?  If so, then what is the probability you’d get syrup without ordering pancakes?  Maybe it was in error, or it’s just standard practice if you want more maple-y bacon.  The probability of getting syrup is based on the type of breakfast ordered.  Syrup is dependent on pancakes.  This doesn’t mean you won’t get syrup if you don’t order pancakes, but that you are less likely to get your distilled tree blood without first ordering height-challenged cake.
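To put some shape on that, here is a small Python sketch.  The tally of 100 breakfast orders is completely made up for illustration, as is the helper name p_syrup_given.  All it does is work out the chance of syrup separately for pancake and non-pancake orders.

```python
from fractions import Fraction

# Made-up tally of 100 breakfast orders, for illustration only.
orders = {
    ("pancakes", "syrup"): 38,
    ("pancakes", "no syrup"): 2,
    ("no pancakes", "syrup"): 6,
    ("no pancakes", "no syrup"): 54,
}

def p_syrup_given(meal):
    """P(syrup | meal): syrup orders for that meal over all orders of that meal."""
    with_syrup = orders[(meal, "syrup")]
    total = with_syrup + orders[(meal, "no syrup")]
    return Fraction(with_syrup, total)

print(float(p_syrup_given("pancakes")))
print(float(p_syrup_given("no pancakes")))
```

With these invented counts the syrup odds swing a lot depending on what was ordered, which is exactly what dependent means.  If the two numbers had come out the same, syrup and pancakes would be independent.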

Was I really due to talk about probability?  No, there is a very large set of topics I could discuss.  With the passing of each article, the likelihood of my delving into probability would not change overly much.  If you hear a probability brought up, just ask yourself – is it independent or dependent?  If it’s independent, then the odds never change.  Only dependent occurrences can affect the odds.  With each toast flip we may wish we could see more Hawking, but alas we’re no more likely to than the last time.

Episode 6 : This is just an average article

Here’s an oddity I ran across that I bet you didn’t know – people on average will do 3 silly walks a day.  A silly walk is defined as any walk other than the standard one foot in front of the other, such as skipping, hopping, galloping, shuffling, moonwalking, sliding, crab walking, levitating, etc.  Pretty weird, right, that we do this?  I mean, who would have thought?

Speaking of mean, that brings up a question.  What does average even mean?  Well, doing some research we find that it’s just the sum of the values divided by the number of values.  In our silly walking case we have some number of people asked and how many ambulatory variations in total they performed.   Sounds pretty simple to do. 

Our study included six participants (I know, kind of a small sample set but this is just for demonstration).  Five of those asked performed zero silly walks (wait, what?).  The sixth person, who it turns out is a huge Monty Python fan, does an astounding 18 silly walks per day.  If we do the quick math the total number of silly walks is 18 (just the one person since the rest did zero) divided by the 6 people which gives us 3.  You should be thinking, um wait, only one person did the walks, but the average person does 3 yet the rest did zero.  Something doesn’t seem right.

You are totally correct.  Averages are simple yet they hide a dangerous flaw.  They like evenly distributed data.  Now I’m trying to keep these articles math and statistics lite, so we’ll keep the definition of distribution pretty simple.  In evenly distributed numbers, there isn’t a lot of variation from one number to the next when they are sorted (stat people, I know this is super simplified).  Our data would look like: 0,0,0,0,0,18.  Everything is good until those last two, when we jump from 0 to 18 – it’s a big change.  That’s what’s called an outlier – that means it lies outside the range of the rest of the numbers.  Sets of data like this can really mess with the average and render it pretty meaningless.  Obviously, most people do no silly walks.

If the data was better, we might have seen it look like this: 1,2,2,4,4,5.  There are still six observations and they total to 18.  So, while no one asked performed an exact 3 silly walks per day, 3 is the average.  We can see that there is no large change from one number to the next in the ordered set, so they are pretty evenly distributed.  In this case, the average is probably much more trustworthy.
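Here is a quick Python sketch of the two data sets side by side.  The largest_jump helper is my own rough stand-in for the super simplified “how big is the change from one sorted number to the next” check, not a real statistical test.

```python
from statistics import mean

lopsided = [0, 0, 0, 0, 0, 18]   # five non-walkers and one Monty Python fan
even = [1, 2, 2, 4, 4, 5]        # the better-behaved data

def largest_jump(values):
    """Biggest gap between neighbours once the values are sorted."""
    ordered = sorted(values)
    return max(b - a for a, b in zip(ordered, ordered[1:]))

for name, data in (("lopsided", lopsided), ("even", even)):
    print(name, "mean:", mean(data), "largest jump:", largest_jump(data))
```

Both sets have a mean of 3, but the biggest jump is 18 in the first and only 2 in the second.  Same average, very different stories.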

The important question to ask when you see an average, then, is what the underlying numbers looked like.  Were they evenly distributed?  Were there any outliers?  If there were outliers, were they removed (sometimes it makes sense to remove them, other times they are important, but that’s maybe another article)?  Averages surround us.  If you start looking you will likely see them everywhere.  On average, how many do you see a day?  The danger is that you probably know nothing about the underlying data and you can’t assume the person making the number did either.  Averages are easy to abuse to make something come out the way you want even if it’s not correct, just like I did with the silly walks data.

Averages are just one way of checking where the center of a set of data sits.  You may often hear average also called “mean” – that’s the more accurate name.  Re-read those first few paragraphs for some foreshadowing.  The other two common measures are the median (the number that is actually at the center once the data is sorted) and the mode (the number that occurs most often).  I give this bit of reference as we’ll probably see them later.
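For the curious, Python’s standard statistics module will compute all three.  This is just a sketch run on the two silly walk data sets from earlier; the summarize helper is my own name for illustration.

```python
from statistics import mean, median, multimode

lopsided = [0, 0, 0, 0, 0, 18]
even = [1, 2, 2, 4, 4, 5]

def summarize(data):
    """Return the three common measures of center for a list of numbers."""
    return {
        "mean": mean(data),
        "median": median(data),    # middle of the sorted values
        "modes": multimode(data),  # every value tied for most frequent
    }

print(summarize(lopsided))
print(summarize(even))
```

For the lopsided data the median and the mode are both 0, which matches what we saw (most people do no silly walks), while the mean still says 3.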

Like most critical thinking exercises, if you can see the underlying data or at least know the sample population and logic used, you might find the number is just fine.  If you can’t find the data or no logic is given, there’s that flag to say it might be suspect.  This is especially frustrating when you hear averages thrown around on a news program and you have no way to check it.  If you find yourself with the chance to question the data, go ahead, on average you might find the answers enlightening.

Episode 5: Can you confirm that negation?

I’m still a bit new to this whole writing (and maybe educating a bit too, no?) thing.  That leads me to look for information on how to write better, get into the flow more quickly, keep my thoughts aligned, engage the reader, etc.  And there are a whole host of helpful articles and speeches out there, such as: 5 best socks an aspiring writer should wear, dental habits of successful authors, hat choices to make the most of your next blogging experience, and so on.

What does reading articles like these make me do?  If you said “think”, then you’d be right.  If you said “eat chocolate chip cookies”, well, you’re also right.  I might find out that 80% of bloggers that were successful always wrote while wearing a beret.  What none of those pieces ever tell me is what 80% of the unsuccessful writers wore.  It’s entirely possible that they also wore berets.  These types of articles never tell us the opposite.

Learning to ask such questions is known as “testing the negative” and it’s a really important concept.  Plus, it’s one of the best ways to help avoid the dreaded “confirmation bias”.  We inherently like things that agree with what we think.  A recent study might show that employees who nap at work are more productive.  What?  I like naps, why can’t I do this?  But what is the flip side of the study?  How did those employees who didn’t nap fare?  Perhaps they had just average productivity but accomplished the same amount because they weren’t sleeping.

We are inundated with positive-only articles.  Why?  Probably because we’re supposed to buy something.  Or maybe they just want us to click on them (see episode 1).  The trick is to learn to ask yourself what the inverse of the statement is.  Did you know that 75% of Fortune 500 CEOs have two cups of coffee every morning (numbers for illustration only)?  Well, what do non-Fortune 500 CEOs drink?  Could they have two cups of coffee too?  Or, how many people who drink two cups of coffee are Fortune 500 CEOs?  There are often several different ways to look at a number or results of a study, but we’re usually just given the one that confirms the statement.
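Here is a rough Python sketch of flipping the coffee claim around.  The 75% is the illustrative figure from above.  The comparison group of a million non-CEOs and their coffee rate are numbers I made up purely so there is something to calculate.

```python
from fractions import Fraction

ceos = 500                         # Fortune 500 CEOs
non_ceos = 1_000_000               # hypothetical comparison group
ceo_rate = Fraction(75, 100)       # the illustrative 75% from the claim
non_ceo_rate = Fraction(75, 100)   # hypothetical: everyone else drinks just as much

ceo_drinkers = ceos * ceo_rate
non_ceo_drinkers = non_ceos * non_ceo_rate

# Flip the question: of all the two-cup drinkers, what share are CEOs?
p_ceo_given_coffee = ceo_drinkers / (ceo_drinkers + non_ceo_drinkers)
# Compare with the share of CEOs among everyone, coffee or not.
p_ceo_overall = Fraction(ceos, ceos + non_ceos)

print(float(p_ceo_given_coffee), float(p_ceo_overall))
```

If everyone else drinks coffee at the same rate, knowing someone has two cups tells you nothing about whether they run a big company.  The 75% only starts to mean something once you know the number for the other group.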

This type of thinking is useful in a wide variety of situations to keep us from blindly doing or agreeing with things.  As someone who is aspiring to do a bit of writing, I’ll hear that a good way to get better is simply to get up early and write for at least 1 hour every day.  OK, sure, sounds plausible.  I’ll bet there are people who did this and got better.  But how many people did this and it made no difference, or perhaps even made them worse (maybe stress and schedule actually reduced their creativity)?

You can also learn to counter anecdotal wisdom with this kind of thinking.  Putting on my pants left leg first has made me the ink connoisseur that I am.  OK, but how many ink connoisseurs put their pants on right leg first?  How many don’t wear pants at all?

Testing the negative isn’t just a way to throw out bad information.  It can also serve to reinforce something.  Say 70% of people who engaged in at least 20 minutes of strenuous exercise each day had better overall lung capacity.  Well, we could ask what percentage engaged in the activity and saw no increase or even a decrease.  If we found that 25% saw no increase and 5% saw a decrease, then we know the 70% probably isn’t too bad.  Exercise is generally good for you.  We could also ask how many people engaged in no strenuous activity each day and saw an increase in lung capacity.  If that is small, maybe 5%, then we know that the inverse is helping to show the validity of the main claim.  Again, these numbers are for illustration only.

So, go ahead, be a little more negative.  Testing the inverse is a skill and like most it takes practice (how many got better without practice, you should ask).  A bit of the negative might just make you more positive. 

Episode 4: Why County Fairs Need Free Ice Cream

I was recently told there’s been a resurgence of interest in county fairs.  You know the kind: local vendors, tons of awesomely bad-for-you food, a Ferris wheel, some animals, tasting hot sauces and these days wine sampling.  There really is something for everyone and I can enjoy the heck out of them (urge for funnel cakes rising).  Seems totally reasonable that their popularity is on the rise.

Of course, as a good denizen of the data world, I can’t leave well enough alone and I just have to ask “How do you know there’s a resurgence?  Heck, what does resurgence even mean?”.  Believe it or not, I got some numbers.  As we know, I live in East Kintertownsylvania (home to the world famous smallest ball of yarn!), and there is one fair held every year.  In 2017 the attendance of the 3-day fair was 12,295 people.  In 2000 the attendance was 8,374.  That’s nearly a 47% increase, not bad at all.

Is it the whole story, though?  I feel like we’re missing something.  What was it?  Yes, that’s right, what were the East Kintertownsylvania populations at these two periods in time?  Just knowing that some number got bigger over time usually isn’t enough to say it’s better.  I headed to the local courthouse, went into the basement and pulled out all the microfiche (the year 2000 was before the internet, right?) and discovered that the population was 54,291.  Which means, in 2000, about 15% of the population attended the fair.

Jumping forward, I googled the population of East Kintertownsylvania in 2017 and found out it was 76,397.  That’s a fair increase (see what I did there!) over 17 years, but everyone knows a yarn processing center opened up there in 2011 bringing in a lot of new people and businesses.  In 2017 then, the fair attracted 16% of the population, which is just one percentage point more than in 2000.  Hmmm, not as big of a rise as it seems.

Wait, there’s more.  We know the population went from 54,291 up to 76,397, meaning there was a nearly 41% increase in people over that time frame.  We know that the share of the population attending the fair only went up by one percentage point (15% vs 16%).  That means fair attendance is doing little more than keeping pace with the population.  Even if we say attendance should have grown at half the population increase, that’s still an additional 8% on top of our 16% (meaning a total of 24%) which means attendance should have been about 18,000.
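If you want to redo the sums yourself, here is a small Python sketch using the attendance and population figures quoted above.  The pct_change helper is my own, and it measures every change from the 2000 value.  The 24% target is the stated assumption, not a fact about fairs.

```python
# Figures quoted in the article.
attendance = {2000: 8_374, 2017: 12_295}
population = {2000: 54_291, 2017: 76_397}

def pct_change(old, new):
    """Percentage change measured from the old value."""
    return (new - old) / old * 100

# Share of the population that attended in each year, in percent.
share = {year: attendance[year] / population[year] * 100 for year in attendance}

attendance_growth = pct_change(attendance[2000], attendance[2017])
population_growth = pct_change(population[2000], population[2017])

# Stated assumption: 24% of the 2017 population "should" have attended.
expected_2017 = 0.24 * population[2017]

print(f"share attending: {share[2000]:.1f}% -> {share[2017]:.1f}%")
print(f"attendance growth: {attendance_growth:.1f}%")
print(f"population growth: {population_growth:.1f}%")
print(f"expected 2017 attendance at 24%: {expected_2017:,.0f}")
```

The share attending comes out at roughly 15.4% and 16.1%, and the 24% target works out to a little over 18,000 people against the 12,295 who actually showed up.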

Unfortunately, then, fair attendance isn’t really growing.  Measured against that expectation, it’s actually falling short.  By quite a bit too.  The fair organizers might want to look at ways to get more people.  I’m thinking free ice cream would do it.  Would certainly keep me attending.  Especially if it’s cookie dough flavored.

This article has a bunch of numbers, but there’s no need to fear them or get frazzled.  I did some simple things, basic percentages and the like.  The important takeaway is not to just trust some number you are given that might (or might not) show some increase or decrease over time.  If someone says they did checks like these and can provide them, that might be all you need to know the number is sound.  If you see no supporting info, and can’t find any other evidence yourself, then perhaps it’s not worth trusting.  Percentages can be easy to “sniff check” too – you don’t need to do major math (even though all of us have high powered calculators in our pockets at all times), it’s pretty simple to just halve or quarter a number and see if things look correct-ish.  I also made an assumption that attendance should have grown at half the rate of the population increase – and I stated it.  Some assumptions aren’t stated.

Definitions are important and I know I said we should find out what “resurgence” even means.  In this case, looks like we don’t have to bother though.  I’m sure it’ll come up in another article.  For now, I need a funnel cake with cookie dough ice cream on top!

Episode 3 : Quick! Hide in that percentage!

Did you know that an astounding 92% of survey respondents said that chocolate chip cookie dough ice cream was the best flavor?  Or that a mere 11% of people prefer sausage over pepperoni as their primary pizza topping?  If you are a fan of cookie dough and pepperoni then these numbers look great.  Shout them from the rooftops!

You already know what’s coming next, don’t you?  You’re at least 80% sure I’m about to say “but”.  Well, congratulations, you’re right.

But wait, what do those percentages even mean?  We often see numbers that look fine and seem like they were based on something good.  They’re numbers, right?  Numbers are good, scientific and mathematical things.  Aren’t they?

Percentages are tricky because they often hide what went into producing them.  On the cookie dough survey, just how many respondents were there?  What if there were only 13 respondents and 11 of those liked cookie dough?  While it’s a small group, still most liked cookie dough.  Then I tell you the survey was done at the International Chocolate Chip Cookie Festival (wouldn’t that be awesome, by the way!) and you wonder if there was a bit of bias.  You should also be saying, um, 11 divided by 13 is 85%, not 92%, and you’d be right.  I also didn’t mention that one of the respondents said anchovies was the best ice cream flavor – because that person got the surveys confused and I excluded the response as an invalid outlier.  But should I have excluded it?  I never told you what my survey criteria were, or what the possible responses were, or what I considered valid.  That 92% isn’t looking so credible anymore, is it?
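Here is one way those numbers could fit together, as a short Python sketch.  I’m assuming the 92% came from quietly dropping the one excluded anchovy response from the 13 before dividing.  The variable names are just for illustration.

```python
respondents = 13
cookie_dough_votes = 11
excluded = 1  # the anchovy answer thrown out as an "invalid outlier"

# Percentage if every respondent counts.
pct_all = cookie_dough_votes / respondents * 100

# Percentage if the excluded response is removed from the total first.
pct_after_exclusion = cookie_dough_votes / (respondents - excluded) * 100

print(round(pct_all), round(pct_after_exclusion))  # 85 92
```

Same 11 cookie dough fans both times.  The only thing that changed is who got counted, and nobody reading the headline number would ever know.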

The pepperoni numbers shared above - let’s say there were 100 respondents (a little bit bigger population) and the question was do you prefer pepperoni or sausage on your pizza.  That leaves a nice binary answer, so nothing weird can show up.  And we know that just 11 people checked sausage, so the math works out.  But who did this survey?  When you find out it was sponsored by the Mid Atlantic Pepperoni Foundation, then maybe even those nice clean numbers are suspect.  Perhaps they had a bit of bias in who was surveyed.

When you see a nice clean percentage given to you, the first thing you should ask is what the underlying numbers are.  If those aren’t given, then right away it’s a suspect number.  If you see the underlying numbers, then ask if those seem valid.  If it was a survey, were there enough respondents, what was the methodology, were values removed or corrected?  The last big piece is who created the number or commissioned the study.  Who created it might reveal a bias that wasn’t apparent.

Is every percentage you see untrustworthy?  Well, no.  However, you shouldn’t just blindly accept a number without seeing what’s behind it.  There might be more lurking in there than you thought.

Episode 2 : Dr. Evil, Quiz Master

I’ve been working on a new online quiz that will let me guess your favorite brand of headlight fluid.  It’s just a few questions, it’s easy, and you’ll love it. 

Question 1: What is your mother’s maiden name?

Question 2: What hospital were you born in?

Question 3: What is your social security #?

With answers to those 3 simple questions, I can accurately predict that you just love using Brando’s Headlight Fluid.  And rightly so, it is the only brand used by true headlight aficionados.

But wait, you say, I thought I wasn’t supposed to give out my social security #.  Well, you’d be right.  That’s a really bad thing to give out.  Mother’s maiden name, hospital of birth, those are OK, right?  No, definitely not.  Absolutely not, no, no, NO!

You see, routine queries for personal bits of information might seem familiar to you.  These little pearls of personal stuff are often the answers to all the secret questions you created so you can reset your password.  This quiz was never meant to provide you with amusement, and heck, very few of them ever are.  They exist only to get you to willingly give up exactly the information needed to hack your accounts.

Some are really nefarious.  All those “Only people from Timbuktwo will know these facts” posts are just looking for information on you.  Yeah, we often use hometown things as secret questions and answers.  When someone comments on the post, they might mention “Dirty Pig’s Foot was my absolute favorite restaurant growing up” and readily volunteer something the post didn’t even ask for.  I’ve seen another that wants to guess your first car, because that’s a regular security question!  And the kicker is it doesn’t even need to guess correctly: many people will comment it was nowhere close, and then write in what their first car was.  That’s some Bugs Bunny level social engineering right there.

All those seemingly silly quizzes where you can find out your fantasy hand model name … they want your birth date, street you grew up on, first pet’s name, best use of a bob haircut, etc.  They all just gather a nice little bundle of information on everyone who uses them.

What can someone do with this?  Pretty simple: go to any kind of service that has a password reset function (especially one that doesn’t send you an automated link or have two-factor authentication), look at the security questions that come up and see if those answers were already provided somewhere, enter them, reset the password and boom, that person now has your account.

Best thing to do when you see some online quiz?  Ignore it.  Simply ignore it.  Maybe they’ll eventually go away and stop plaguing us all.

Episode 1 : Don't Share Your Pizza

You’ve just seen the perfect post, the absolute best news article that aligns precisely with what you believe.  The share button is calling out to you.  You must give this to everyone you know.  But stop.  That’s right, just stop.  I know, that button is so easy to click, but it won’t go anywhere, you’ve got some time to think.

Why is that article you want to share so badly just so perfect?  If you had a checklist in your head of things you believe are true about a certain subject, would it check all of them?  If it did and the article gave you nothing to question there might be a reason.  The article may want you to share it.  OK, really, the writer of the article wants you to share, the article itself isn’t conscious and aware (hopefully).  Many articles, posts, blogs, news items and such can get more money for more likes, shares, upvotes, etc.  People are more likely to share things they like and agree with.  And since you likely have friends with similar views and attitudes, that little article can just get all kinds of likes and shares.

So, you ask, what’s the problem?  Can I share it yet, please?  That article might not have been entirely truthful, or at all truthful in fact.  The author might have written whatever was necessary just to get the piece shared.  For instance, let’s say you like pizza, wish it was a truly healthy, nutritious meal to eat every day and then you come across a report from a very professional science type of place that says they did a study and found that people who ate pizza every day had no greater incidence of health problems than those who didn’t.  Plus, it says those who ate thin crust with pepperoni were actually healthier than the control group – I mean, wow, this confirms everything I’ve ever wanted.  I better tell the world!

But was the article real, or are you about to share something either partially or totally fabricated?  Who did the study – is it someone you can look up?  Who paid for it (was it “Big Pizza”)?  Were the results independently verified?  How many people were in the study (we’ll talk about study sizes later, but for now just think small numbers = bad)?  If some or all of these questions are hard to answer, you might be looking at an article that just wants you to share it, with little to no truth or facts to be found.  It might have been perfectly crafted to tick all the right boxes just so it could propagate itself.  But now that you know, you paused, and maybe won’t hit share.  You might even feel like washing your hands.

We’re surrounded by news and media and all manner of things wanting to get our attention.  The people crafting these things are looking for any way to get into our conscious minds.  But telling the real from the fake is getting harder and harder.  Luckily, we can learn, adapt and apply some thinking to help us out and not let these things spread so far.  Don’t worry, it doesn’t involve researching citations or learning statistics!

First tip – if you hear something that perfectly aligns with your own thinking, assume that was intended and ask yourself why.  Someone may just be playing on your own likes to, well, get some likes.