Suppose there is a bus with a capacity of 50 passengers. It makes numerous trips every day. Suppose we collected data on a given day for 60 trips and recorded the number of passengers on the bus for each trip. The data is attached along with this lecture.
Now, suppose we want to conclude from the collected data whether the average bus occupancy is more than 30. How do we do that?
We first check whether the data is approximately normally distributed. We can do so visually by plotting a histogram. The chart is as follows.
The chart is roughly bell-shaped, which suggests that the data is approximately normally distributed.
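As a rough illustration of the visual check, the sketch below prints a text histogram in Python. The passenger counts are made up for the example and are not the data attached to this lecture; the function name text_histogram and the bin width of 10 are likewise assumptions.

```python
from collections import Counter


def text_histogram(values, bin_width=10):
    """Return one line per bin, with one '#' per observation in that bin."""
    # Map each value to the lower edge of its bin, e.g. 27 -> 20 for width 10.
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        end = start + bin_width - 1
        lines.append(f"{start:2d}-{end:2d} | {'#' * counts[start]}")
    return lines


# Hypothetical passenger counts for a few trips (NOT the lecture's data).
passengers = [12, 18, 22, 25, 25, 27, 31, 40]

for line in text_histogram(passengers):
    print(line)
```

A bell shape shows up as short bars at both ends and the longest bars in the middle. With only a handful of values the shape is crude; the 60 recorded trips would give a clearer picture.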
Before running the test, we need two summary statistics. First, let's find the mean of the data.
We can do it in Excel by using the AVERAGE() function.
The average comes out to be 25.
Next, let's find the standard deviation of the data. We get 8.8.
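For readers working outside Excel, the same two statistics can be computed with Python's standard library. This is a minimal sketch on made-up passenger counts, not the lecture's data, so its standard deviation differs from 8.8. It assumes the sample standard deviation (dividing by n - 1) is the one wanted, which is what statistics.stdev computes.

```python
from statistics import mean, stdev

# Hypothetical passenger counts for a few trips (NOT the lecture's data).
passengers = [18, 25, 31, 22, 40, 12, 27, 25]

average = mean(passengers)   # arithmetic mean, like AVERAGE() in Excel
spread = stdev(passengers)   # sample standard deviation (n - 1 denominator)

print(f"mean = {average}, standard deviation = {spread:.2f}")
```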
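Now, we do something known as a hypothesis test. Our null hypothesis is that the average occupancy of the bus is less than or equal to 30. The alternative hypothesis is that the average occupancy is greater than 30.
We would like to show that the alternative hypothesis is true. To do so, we assume the null hypothesis holds and find the probability of seeing a sample mean at least as large as ours. If that probability is small (commonly below 0.05), we reject the null hypothesis in favor of the alternative. If it is large, we cannot reject the null hypothesis.
Because we are testing the average of 60 trips, we measure how far the sample mean is from 30 in units of the standard error, which is the standard deviation divided by the square root of the number of trips: 8.8/sqrt(60) = 1.136. The Z-score is therefore (25-30)/1.136 = -4.40.
From the Z-table, the area under the standard normal curve to the left of -4.40 is less than 0.0001, so the area to the right, which is the probability we need, is greater than 0.9999.
This is very high. Hence, we can't reject our null hypothesis that the average bus occupancy is less than or equal to 30. This is no surprise, since the sample mean of 25 is itself below 30. Note that dividing by the standard deviation alone gives (30-25)/8.8 = 0.568, and the Z-table value for it, 0.7157, is the probability that a single trip carries 30 or fewer passengers. That number describes individual trips, not the average.
Failing to reject the null hypothesis does not prove that the alternative hypothesis is false, but the data gives no evidence that the average bus occupancy is greater than 30.
P-Value: The probability computed in the test is called the p-value. For instance, in the above example, the p-value was greater than 0.9999.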
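The calculation can be sketched in Python using only the summary figures quoted in the text (mean 25, standard deviation 8.8, 60 trips, threshold 30). The sketch assumes a right-tailed z-test on the sample mean, which divides by the standard error (the standard deviation over the square root of the number of trips). The function name one_sided_z_test is hypothetical. For comparison, it also computes the probability that a single trip carries 30 or fewer passengers, which is what the Z-score (30-25)/8.8 measures.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_test(sample_mean, sample_sd, n, mu0):
    """Right-tailed z-test of H0: mean <= mu0 against H1: mean > mu0."""
    standard_error = sample_sd / sqrt(n)       # spread of the sample mean
    z = (sample_mean - mu0) / standard_error   # distance from mu0 in standard errors
    p_value = 1.0 - NormalDist().cdf(z)        # area to the right of z
    return z, p_value


# Summary statistics taken from the text.
z, p_value = one_sided_z_test(sample_mean=25, sample_sd=8.8, n=60, mu0=30)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")

# Probability that ONE trip has 30 or fewer passengers, assuming occupancy
# is normal with mean 25 and standard deviation 8.8. This is not a p-value.
p_single_trip = NormalDist(mu=25, sigma=8.8).cdf(30)
print(f"P(single trip <= 30) = {p_single_trip:.4f}")
```

A p-value this close to 1 means the sample gives no support for an average occupancy above 30, so the null hypothesis is not rejected.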