Stem and Leaf Plot for Strongly Skewed Data

Writer Sophia Terry
$\begingroup$

How do I draw a stem-and-leaf plot for the following data? The data are for different EU states, so the values differ greatly in size. I have to draw a stem-and-leaf plot for each variable, i.e. area, population, and length of motorway. How do I arrange the data in a stem-and-leaf plot? How many stems should there be (say, for motorway length)? Do I have to split the data? Do I have to round or truncate the data?

I have arranged the lengths of motorway in ascending order:

0, 0, 11, 140, 152, 257, 309, 392, 419, 541, 644, 751, 770, 810, 897, 1295, 1340, 1419, 1482, 1515, 1719, 1763, 1891, 2005, 2127, 2631, 2988, 3686, 6726, 11465, 12917, 14701

$\endgroup$

1 Answer

$\begingroup$

I doubt that a stemplot is the best way to visualize these data, so I would not recommend one way of setting up the stems over another. This is particularly so because the data are strongly skewed to the right: the nonzero values run from 11 to 14701, about three orders of magnitude.
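As a rough check on the skewness, one can compare the mean with the median and look at the ratio of the largest to the smallest nonzero value. The Python sketch below is only an illustration; the helper name `skew_summary` is made up here and is not part of the original answer.

```python
from statistics import mean, median

# Motorway lengths from the question (32 EU states).
motorway = [0, 0, 11, 140, 152, 257, 309, 392, 419, 541, 644, 751, 770, 810,
            897, 1295, 1340, 1419, 1482, 1515, 1719, 1763, 1891, 2005, 2127,
            2631, 2988, 3686, 6726, 11465, 12917, 14701]

def skew_summary(values):
    """Return (median, mean, largest / smallest positive value).

    Hypothetical helper for illustration only.
    """
    positive = [v for v in values if v > 0]
    return median(values), mean(values), max(positive) / min(positive)

med, avg, ratio = skew_summary(motorway)
print(f"median = {med}, mean = {avg:.1f}, max/min positive = {ratio:.0f}")
```

A mean well above the median, together with a max/min ratio in the thousands, is what one would expect for data with a long right tail.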

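Nevertheless, stemplots can be made. Here is the default stemplot of these 32 observations from R, followed by the one from Minitab. In the Minitab plot, the line beginning (8) holds eight observations and contains the median, 1317.5 (the average of 1295 and 1340). Note that R rounds to the leaf unit while Minitab truncates: 2988 appears as 3.0 in the R plot but as 2.9 in the Minitab plot.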

R:

      The decimal point is 3 digit(s) to the right of the |

       0 | 00012334456888933455789
       2 | 01607
       4 |
       6 | 7
       8 |
      10 | 5
      12 | 9
      14 | 7

Minitab:

    Stem-and-leaf of Motorway  N = 32
    Leaf Unit = 100

     15    0  000112334567788
     (8)   1  23445778
      9    2  0169
      5    3  6
      4    4
      4    5
      4    6  7
      3    7
      3    8
      3    9
      3   10
      3   11  4
      2   12  9
      1   13
      1   14  7
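To make the construction concrete, here is a minimal Python sketch of a truncating stemplot with leaf unit 100, which is how the Minitab leaves above appear to be formed. The function name `stem_leaf` and its `leaf_unit` parameter are assumptions made for this illustration, and the depth column that Minitab prints is not reproduced.

```python
from collections import defaultdict

# Motorway lengths from the question (32 EU states).
motorway = [0, 0, 11, 140, 152, 257, 309, 392, 419, 541, 644, 751, 770, 810,
            897, 1295, 1340, 1419, 1482, 1515, 1719, 1763, 1891, 2005, 2127,
            2631, 2988, 3686, 6726, 11465, 12917, 14701]

def stem_leaf(values, leaf_unit=100):
    """Map each stem to its string of leaves, truncating (not rounding).

    Hypothetical helper for illustration only.
    """
    rows = defaultdict(str)
    for v in sorted(values):
        units = v // leaf_unit          # drop the digits below the leaf unit
        stem, leaf = divmod(units, 10)  # the last kept digit is the leaf
        rows[stem] += str(leaf)
    # Include empty stems so that gaps in the data stay visible.
    return {s: rows.get(s, "") for s in range(min(rows), max(rows) + 1)}

plot = stem_leaf(motorway)
for stem, leaves in plot.items():
    print(f"{stem:>3} | {leaves}")
```

With `leaf_unit=100`, a value such as 2988 becomes stem 2, leaf 9; the tens and units digits are simply discarded, which is the loss of detail that any stemplot of these data involves.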

With the parameter scale=.5 the R function stem returns the abbreviated stemplot below.

      The decimal point is 4 digit(s) to the right of the |

      0 | 0000000001111111111222222334
      0 | 7
      1 | 13
      1 | 5

I do not see how to make a stemplot on any scale without losing some of the detail of the data. If information is to be lost in making a graphical presentation of the data, perhaps a histogram is a better choice. Below is a Minitab histogram of these data.

[Figure: Minitab histogram of the motorway lengths]

For some purposes, it might be better to make a histogram of $\log_{10}$ of motorway lengths for the 30 EU states that have motorways.

[Figure: histogram of $\log_{10}$ of motorway lengths for the 30 EU states that have motorways]
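For readers without Minitab, the log-scale idea can be sketched in Python by counting the nonzero lengths in bins of $\log_{10}$ length. The bin width of 0.5 and the helper name `log10_bins` are assumptions made for this illustration; the bins in the original figure may differ.

```python
import math
from collections import Counter

# Motorway lengths from the question (32 EU states).
motorway = [0, 0, 11, 140, 152, 257, 309, 392, 419, 541, 644, 751, 770, 810,
            897, 1295, 1340, 1419, 1482, 1515, 1719, 1763, 1891, 2005, 2127,
            2631, 2988, 3686, 6726, 11465, 12917, 14701]

WIDTH = 0.5  # assumed bin width on the log10 scale

def log10_bins(values, width=WIDTH):
    """Count positive values by log10 bin; keys are the lower bin edges.

    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for v in values:
        if v > 0:  # log10 is undefined at 0, so states with no motorway drop out
            counts[math.floor(math.log10(v) / width) * width] += 1
    return dict(sorted(counts.items()))

bins = log10_bins(motorway)
for edge, n in bins.items():
    # Text histogram: one '*' per state in the bin.
    print(f"[{edge:.1f}, {edge + WIDTH:.1f}) {'*' * n}")
```

Only bins that contain at least one state are printed. On the log scale the bulk of the data spreads over several bins instead of piling up in the first one or two.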

$\endgroup$
