Sunday, February 8, 2009

Statistically Significant fluff and Testing 101

Back to multivariate testing, or to be honest, any testing... How often, when results are presented, is the first or even the only question asked whether they are "statistically significant"? It's all well if it's not the only, and hopefully not the first, question someone asks you about your test, but if it is - you know they know very little, if anything, about testing! Well, they've given themselves away :)

Statistical significance is an important factor in considering the results, but it has nothing to do with whether you are looking at a 'good' test. You can run a test that is terribly designed and produces absolutely meaningless results, and still ignorantly claim that because you did the sample size calculation, your results are statistically significant. Well, so what? You still haven't learned anything, and if you act on those results you have taken a step back, not forward, and wasted your time and resources.
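For reference, the sample size calculation people lean on usually looks something like the sketch below. It assumes a two-variant test comparing conversion rates with a two-sided z-test. The function name and the default 5% significance level and 80% power are illustrative choices of mine, not something taken from a particular tool. Notice that nothing in it knows, or cares, whether the test design makes any sense.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(p_base, p_variant, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant to tell p_base from p_variant.

    Uses the standard normal-approximation formula for comparing two
    proportions. The defaults (alpha, power) are assumptions for illustration.
    """
    if p_base == p_variant:
        raise ValueError("the two rates must differ")
    # Critical values for the two-sided significance level and the power.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # Sum of the binomial variances of the two variants.
    variance = p_base * (1 - p_base) + p_variant * (1 - p_variant)
    n = (z_alpha + z_power) ** 2 * variance / (p_base - p_variant) ** 2
    return math.ceil(n)
```

Calling `sample_size_per_variant(0.10, 0.12)` gives the number of visitors each variant would need to detect a move from a 10% to a 12% conversion rate. The number answers "how many?", never "is this the right test?".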

So, here is a crash course on testing. Typically a test is conceived when you have 'specific' questions you would like to answer. It doesn't matter whether you are testing a page of your website or a wing for a new aircraft, testing is testing. The differences are in the complexity and type of the test, the variables that define it, the conditions under which it is conducted, and, very importantly, the assumptions made and the constraints built into your particular test. These last two are the most critical for interpreting your results.

Here are some questions you should ask yourself when designing each test, and this is just a sample:

What is the most crucial question I would like to answer by conducting this test?
What are my constraints?
Can I work within the given constraints?
- What are some possible ways I can still design a valid test to answer my questions with these constraints?
- How will these constraints impact result interpretation, i.e. will I still be able to answer my question?
What assumptions can I make in order to be able to conduct this test?
Are these valid assumptions?
How will these assumptions affect my result interpretation, and will my answer to the question, i.e. the result, still be valid?

If any of the answers to these questions is 'no', you need to do more brainstorming, and perhaps run a pre-test or do thorough research to answer the open question, or to see whether someone has already answered it, before you begin your original test.

Once these main 'sample' questions are answered, you dig into the details and decide on the test structure, location / conditions, variables, inputs, duration / time period, initial conditions / starting point, etc.
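One way to keep yourself honest here is to write the design down before the test starts. The sketch below is only an illustration: `ExperimentPlan`, its field names and `open_items` are hypothetical names of mine, not part of any testing tool. It simply records the question, constraints, assumptions and design details, and lists which details are still blank.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentPlan:
    """Hypothetical record of a test design; all field names are illustrative."""
    question: str
    constraints: list = field(default_factory=list)
    assumptions: list = field(default_factory=list)
    variables: list = field(default_factory=list)
    conditions: str = ""
    duration_days: int = 0
    initial_conditions: str = ""

    def open_items(self):
        """Return the names of design details that are still unspecified."""
        checks = {
            "question": self.question.strip(),
            "variables": self.variables,
            "conditions": self.conditions.strip(),
            "duration_days": self.duration_days > 0,
            "initial_conditions": self.initial_conditions.strip(),
        }
        return [name for name, value in checks.items() if not value]
```

A plan created with only a question reports every other detail as open. The recorded constraints and assumptions are what you go back to when it is time to interpret the results.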

So when it comes to result interpretation, knowing how the test was conducted (i.e. at least some of the things the questions above answer), what assumptions were made, under what conditions the test was run, what the initial conditions were, etc. will give you the greatest insight into how to interpret the results. Once you're comfortable with how to interpret the results, then you ask about statistical significance, just to be sure that the results also make sense from a statistical point of view.
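When you do get to that last step, the check itself is the easy part. Here is a minimal sketch, assuming the simplest case of two variants compared on conversion rate with a pooled two-proportion z-test. The function and parameter names are mine, and a real multivariate test would need a method suited to its design.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / conv_b are conversion counts, n_a / n_b are visitor counts.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both variants need at least one visitor")
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # No variation at all (every visitor or no visitor converted).
        return 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value
```

For example, `two_proportion_z_test(100, 1000, 150, 1000)` returns the z statistic and p-value for 10% versus 15% conversion on 1,000 visitors each. A small p-value tells you the difference is unlikely to be noise. It tells you nothing about whether the assumptions and constraints behind the test were sound.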