One sample & Two Sample t-tests in Python

I am new to statistics and Python and am unsure how to work through this exercise. A student is trying to decide between two Processing Units. He wants to use the Processing Unit for his research to run high-performance algorithms, so the only thing he is concerned with is speed. He picks a high-performance algorithm and a large data set and runs it on both Processing Units 10 times, timing each run in hours. The results are given in the lists TestSample1 and TestSample2 below.

from scipy import stats
import numpy as nupy
TestSample1 = nupy.array([11,9,10,11,10,12,9,11,12,9])
TestSample2 = nupy.array([11,13,10,13,12,9,11,12,12,11])

Assumption: Both the dataset samples above are random, independent, parametric & normally distributed

Hint: You can import the t-test functions from scipy.stats to perform t-tests.

Question 1 - One-sample t-test: Check whether the mean of TestSample1 is equal to zero.

Null Hypothesis: the mean is equal to zero.
Alternate Hypothesis: the mean is not equal to zero.

Question 2 - Given:
1. Null Hypothesis: There is no significant difference between the datasets
2. Alternate Hypothesis: There is a significant difference

Do two-sample testing and check whether to reject the Null Hypothesis or not.

Question 3 - Do two-sample testing and check whether there is a significant difference between the speeds of two samples: TestSample1 & TestSample3

He is trying a third Processing Unit, with results in TestSample3.

TestSample3 = nupy.array([9,10,9,11,10,13,12,9,12,12])

Assumption: Both the datasets (TestSample1 & TestSample3) are random, independent, parametric & normally distributed

1 Answer

Question 1

The way to do this with SciPy would be this:

stats.ttest_1samp(TestSample1, popmean=0)

It is not a useful test to perform in this context though, because we already know that the null hypothesis must be false. Negative times are impossible, so the only way for the population mean of times to be zero would be if every time measured were always zero, which is clearly not the case.
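
To see why the test is uninformative here, it can help to look at what the statistic measures. The following is a minimal sketch, assuming the standard one-sample formula t = (mean - popmean) / (s / sqrt(n)) with the sample standard deviation s. The variable names are my own, and the manual calculation is only there to cross-check SciPy.

```python
from scipy import stats
import numpy as nupy

TestSample1 = nupy.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9])

# Manual one-sample t statistic against a hypothesised mean of zero
n = TestSample1.size
sample_mean = TestSample1.mean()
sample_sd = TestSample1.std(ddof=1)  # ddof=1 gives the sample standard deviation
t_manual = (sample_mean - 0) / (sample_sd / nupy.sqrt(n))

# The same test through SciPy
result = stats.ttest_1samp(TestSample1, popmean=0)

print(t_manual, result.statistic, result.pvalue)
```

With a sample mean of 10.4 hours and little spread, the statistic is very large and the p-value is tiny, which only confirms what was already known.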

Question 2

Here's how to do a two-sample t-test for independent samples with SciPy:

stats.ttest_ind(TestSample1, TestSample2)

Output:

Ttest_indResult(statistic=-1.8325416653445783, pvalue=0.08346710398411555)

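So the t-statistic is about -1.83, but its deviation from zero is not significant at the conventional 0.05 level (p = 0.08), so the null hypothesis is not rejected. The result is inconclusive rather than evidence that the two units are equally fast. It would also help to have more precise measurements, not rounded to whole hours.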
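
If you want to see where these numbers come from, here is a rough sketch that reproduces them by hand. It assumes the pooled-variance (equal variance) form of the test, which is what ttest_ind uses by default, and a significance level of 0.05, which the question does not actually state. The variable names are mine.

```python
from scipy import stats
import numpy as nupy

TestSample1 = nupy.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9])
TestSample2 = nupy.array([11, 13, 10, 13, 12, 9, 11, 12, 12, 11])

n1, n2 = TestSample1.size, TestSample2.size
var1 = TestSample1.var(ddof=1)  # sample variances
var2 = TestSample2.var(ddof=1)

# Pooled variance, valid under the equal-variance assumption
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
std_err = nupy.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_manual = (TestSample1.mean() - TestSample2.mean()) / std_err
df = n1 + n2 - 2
p_manual = 2 * stats.t.sf(abs(t_manual), df)  # two-sided p-value

alpha = 0.05  # assumed significance level, not given in the question
reject_null = p_manual < alpha

print(t_manual, p_manual, reject_null)
```

If the equal-variance assumption seems doubtful, passing equal_var=False to stats.ttest_ind runs Welch's t-test instead.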

In any case, I would argue that given your stated setting, you do not really need this test either. It is highly unlikely that two different CPUs perform exactly the same, and you just want to decide which one to go with. Simply choosing the one with the lower average time, regardless of significance test results, is clearly the right decision here.
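
As a small illustration of that decision rule, the sketch below just compares the two sample means. The unit labels are hypothetical names I made up for printing, and a tie would fall through to the second label.

```python
import numpy as nupy

TestSample1 = nupy.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9])
TestSample2 = nupy.array([11, 13, 10, 13, 12, 9, 11, 12, 12, 11])

mean1 = TestSample1.mean()  # average run time in hours
mean2 = TestSample2.mean()

# Hypothetical labels for the two Processing Units, used only for printing
faster = "Processing Unit 1" if mean1 < mean2 else "Processing Unit 2"

print(mean1, mean2, faster)
```

Here the averages are 10.4 and 11.4 hours, so the first unit is the one to pick under this rule.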

Question 3

This is analogous to Question 2, with TestSample3 in place of TestSample2.
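
For completeness, a minimal sketch of the same call on the third sample. The 0.05 threshold is again my assumption, since the question does not give a significance level.

```python
from scipy import stats
import numpy as nupy

TestSample1 = nupy.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9])
TestSample3 = nupy.array([9, 10, 9, 11, 10, 13, 12, 9, 12, 12])

# Two-sample t-test for independent samples (pooled variance by default)
result = stats.ttest_ind(TestSample1, TestSample3)

alpha = 0.05  # assumed significance level, not given in the question
reject_null = result.pvalue < alpha

print(result.statistic, result.pvalue, reject_null)
```

The sample means are 10.4 and 10.7 hours, a smaller gap than in Question 2, and the test does not come close to rejecting the null hypothesis at that level.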

