To illustrate what a stem-and-leaf plot is, let's go back to the data set from the previous lecture, where we identified a few outlier values by visual inspection. First, let's load the data in R.
Next, let's create a variable for each of the two columns: building area and parking duration.
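The loading step itself is not shown here, so the following is only a minimal sketch of what it might look like. It assumes the data is available as CSV with columns named building_area and parking_duration; the column names and the three rows are made up for illustration and are not the lecture's actual data.

```r
# Illustrative only: the CSV layout, the column names, and the three rows
# below are assumptions, not the lecture's actual data set.
csv_text <- "building_area,parking_duration
12000,1020
25000,880
30000,1210"

# With a real file this would be read.csv("path/to/your_file.csv").
parking <- read.csv(text = csv_text)

# One variable per column, as described above.
building_area    <- parking$building_area
parking_duration <- parking$parking_duration

str(building_area)
str(parking_duration)
```

Keeping each column in its own variable means later calls such as stem() can take the variable name directly.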
Now we'll use R's stem() function. To draw the stem-and-leaf plot of building area (stored as "building_area"), we use the following syntax:
stem(building_area)
We get the following output:
The decimal point is 4 digit(s) to the right of the |
0 | 8
1 | 23356777899
2 | 44555677
3 | 05
What does this mean?
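The digit to the left of the | is the stem, and each digit to the right of it is a leaf, one per observation. The header line says the decimal point is 4 digits to the right of the |, so the stem is the ten-thousands digit and the leaf is the thousands digit; the remaining digits are not shown. The stems here are 0, 1, 2 and 3. For instance, there are two values with stem 3: one has leaf 0 (about 30,000) and the other has leaf 5 (about 35,000). Likewise, the single value with stem 0 has leaf 8, so it is about 8,000.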
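To make the reading rule concrete, here is a small sketch that turns the printed stems and leaves back into approximate values. The helper decode_row is a hypothetical name introduced for this illustration, and the results are approximations only, because the display keeps just two digits of each value.

```r
# Stems and leaves copied from the stem(building_area) output above.
stems  <- c(0, 1, 2, 3)
leaves <- c("8", "23356777899", "44555677", "05")

# "The decimal point is 4 digit(s) to the right of the |" means the stem
# counts ten-thousands and each leaf digit counts thousands.
# decode_row is a hypothetical helper, not part of base R.
decode_row <- function(stem, leaf_string) {
  leaf_digits <- as.integer(strsplit(leaf_string, "")[[1]])
  stem * 10000 + leaf_digits * 1000
}

# One approximate value per leaf, in the order they appear in the plot.
approx_area <- unlist(Map(decode_row, stems, leaves))
print(approx_area)
```

Each leaf yields one value, so the length of approx_area equals the number of observations shown in the plot.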
Thus, from the diagram, we can easily tell that the row labeled 1 holds the most values (11) and the row labeled 0 holds the fewest (just 1). The likely outliers are therefore the values in the sparse rows at the two ends, labeled 0 and 3.
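The same comparison can be made by counting the leaves in each row. This is a rough sketch that works from the printed output rather than the original data; the variable names are chosen for illustration.

```r
# Row labels and leaves copied from the stem(building_area) output above.
stems  <- c("0", "1", "2", "3")
leaves <- c("8", "23356777899", "44555677", "05")

# Each leaf digit is one observation, so the string length is the row count.
counts <- setNames(nchar(leaves), stems)
print(counts)

# Rows with the most and the fewest observations.
fullest  <- names(counts)[counts == max(counts)]
sparsest <- names(counts)[counts == min(counts)]
print(fullest)
print(sparsest)
```

A sparse row is only a hint that its values deserve a closer look; it does not by itself prove that they are outliers.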
Similarly, we can create a stem-and-leaf plot for parking duration. The output is shown below.
The decimal point is 2 digit(s) to the right of the |
6 | 1
8 | 8826
10 | 247900349
12 | 114059
14 | 5
16 | 0
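Here, the outlying values are the ones in the rows labeled 6, 14 and 16, each of which holds a single value. Note that the row labels go up in steps of 2, so each row covers two stems: the row labeled 8, for example, holds values in the 800s and the 900s, which is why its leaves (8826) are not in ascending order.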
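As a rough check on this reading, the sketch below lists the approximate range each row covers and how many values it holds. It works from the printed output only, and the column names of the rows data frame are chosen for illustration.

```r
# Row labels and leaves copied from the parking duration output above.
stems  <- c(6, 8, 10, 12, 14, 16)
leaves <- c("1", "8826", "247900349", "114059", "5", "0")

# The decimal point is 2 digits to the right of the |, so one stem unit
# is 100. Labels go up in steps of 2, so each row spans about 200 units.
rows <- data.frame(
  stem  = stems,
  lower = stems * 100,        # approximate lower bound of the row
  upper = (stems + 2) * 100,  # approximate upper bound (exclusive)
  count = nchar(leaves)       # one leaf digit per observation
)
print(rows)

# Rows holding a single observation.
print(rows$stem[rows$count == 1])
```

R's stem() also takes a scale argument that controls the plot length, so a call such as stem(parking_duration, scale = 2) may split these combined rows into one row per stem.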