Why plot on a log scale

The second plot sacrifices some immediate interpretability, particularly for non-statisticians, but gives a lot more visual differentiation. (Because of this, I would now normally use a logarithmic scale on the axes, rather than transforming the data and having the scale show the logarithmic values.) For example, you can clearly spot the few outliers, which turned out to be data editing errors, where total spend was less than spend in New Zealand.

Perhaps more importantly, you could use this graph with different colors or faceting to distinguish different market countries or purposes of visit (e.g. holiday v. ...). Turning this plot into something useful would involve somehow dealing with the high-density data (e.g. by adding some transparency to the points, or replacing points with hexagonal bins colored according to density), but any useful visual solution will almost certainly involve logarithmic axes.

Here is another plot to illustrate what I meant by the hexagonal bins, using color to represent density when there is a large dataset (in this case, respondents to a survey about Rugby World Cup experiences in New Zealand).
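A minimal sketch of the hexagonal-bin idea, using matplotlib's `hexbin` with logarithmic axes. The data are synthetic log-normal values standing in for the survey expenditure, and the names `spend_nz` and `spend_total` are hypothetical; none of this comes from the original analysis.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Synthetic, strictly positive "expenditure" values (assumption, not survey data).
spend_nz = rng.lognormal(mean=7.0, sigma=1.0, size=10_000)
spend_total = spend_nz * rng.lognormal(mean=0.3, sigma=0.4, size=10_000)

fig, ax = plt.subplots()
# xscale/yscale="log" bins the points on log10 of the data and draws log axes;
# bins="log" colors each hexagon on a logarithmic count scale.
hb = ax.hexbin(spend_nz, spend_total, gridsize=40,
               xscale="log", yscale="log", bins="log")
fig.colorbar(hb, ax=ax, label="points per hexagon (log color scale)")
ax.set_xlabel("Spend in New Zealand")
ax.set_ylabel("Total spend")
# Call fig.savefig("hexbin_log.png") or plt.show() to view the result.
```

Coloring by density keeps the crowded region readable where overlapping points would merge into a blob, which is the problem the transparency suggestion is also trying to solve.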

Note again this is another example where I've used a logarithmic scale for expenditure. One other nifty thing about log scales is that they make ratios appear symmetric.
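The symmetry of ratios on a log scale can be checked numerically. The sketch below uses an arbitrary set of illustrative ratios (not data from the plots) and compares how far each sits from 1 on a linear scale and on a base-2 log scale.

```python
import math

# Illustrative ratios: two "halvings", no change, two "doublings".
ratios = [1 / 4, 1 / 2, 1, 2, 4]

# Offset of each ratio from 1 on a base-2 log scale.
log2_offsets = {r: math.log2(r) for r in ratios}

for r in ratios:
    print(f"ratio {r:>4}: linear offset {r - 1:+.2f}, "
          f"log2 offset {log2_offsets[r]:+.2f}")
```

On the linear scale a halving sits 0.5 below 1 while a doubling sits 1 above it; on the log scale both are exactly one unit away, in opposite directions.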

In terms of charting, you may fruitfully interpret "dependent variable" as "y axis". Then take a look at the many closely related questions which have appeared here.

I had seen some of those, but not all, and I'm working my way through them now.

If we graph that on a normal scale, we might start with 0, then increase the scale in increments of 50,... until we reach ..., which would capture our entire shrew-to-moose range.

What happens if we change to a log scale? This time, instead of equal increments, we are going to increase by an order of magnitude (a factor of 10) for each step along the axis.
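To make the contrast between equal increments and order-of-magnitude steps concrete, here is a small sketch that lists the tick values for both kinds of axis. The range (1 to 1,000,000) and the linear step (200,000) are assumptions chosen for illustration, not the values from the original example.

```python
# Hypothetical axis range (assumption): values from 1 up to 1,000,000.
axis_max = 1_000_000

# Linear axis: every tick adds the same amount to the previous one.
linear_step = 200_000
linear_ticks = list(range(0, axis_max + 1, linear_step))

# Log axis: every tick is 10 times the previous one.
log_ticks = [10 ** k for k in range(0, 7)]

print("linear ticks:", linear_ticks)
print("log ticks:   ", log_ticks)

# Fraction of each axis devoted to values at or below 1,000.
linear_share = 1_000 / axis_max   # a sliver of the linear axis
log_share = 3 / 6                 # 3 of the 6 decades on the log axis
print("share of axis at or below 1,000:", linear_share, "vs", log_share)
```

Small values that are squeezed into a sliver of the linear axis get half of the log axis in this setup, which is why the log version separates small items so much better.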

Notice how this drastically changes where the actual numbers occur: in the normal graph, the majority of the scale is taken up by the range between 50,... and ...

Some of this code may be difficult to understand without a familiarity with data science because it uses Pandas and NumPy.

This is ironic, as Pandas was created particularly to make working with table-type data easier. The first step takes the data we have created as a dictionary and converts it to a Pandas dataframe. The index for this data will be the company name. Next, we must convert the revenue strings to numbers. Otherwise matplotlib will treat them as text labels rather than placing them by numeric value.
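The first two steps might look like the following sketch. The company names and revenue figures are placeholders invented for illustration, and the column names `company` and `revenue` are assumptions rather than the names used in the original code.

```python
import pandas as pd

# Hypothetical data: revenue stored as strings, in billions of dollars.
data = {
    "company": ["Company A", "Company B", "Company C"],
    "revenue": ["64", "128", "9.5"],
}

# Dictionary -> dataframe, indexed by company name.
df = pd.DataFrame(data).set_index("company")

# Convert the revenue strings to numbers so they can be placed on a numeric axis.
df["revenue"] = pd.to_numeric(df["revenue"])

print(df.dtypes)
print(df.sort_values("revenue"))
```

After the conversion, sorting puts 9.5 before 64 and 128; sorted as strings, "128" would come first.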

Now we want to create a series, which is like a dataframe with only one column. The values will be the index of the previous dataframe. Next, we add the values of the series we just created as another column in the dat dataframe. Then we plot the scatter chart, giving it the series for the x and y values. Shape is a sometimes difficult NumPy concept. It basically means the size of the array along each of its dimensions.
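A sketch of these steps, assuming a small dataframe called `dat` indexed by company name. The values, the `company` column name, and the choice of a logarithmic x axis are illustrative assumptions, not a reproduction of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical dataframe named `dat`, indexed by company name.
dat = pd.DataFrame(
    {"revenue": [64.0, 128.0, 9.5]},
    index=["Company A", "Company B", "Company C"],
)

# A series whose values are the index labels of the dataframe.
names = pd.Series(dat.index, index=dat.index)

# Add the values of that series as an ordinary column.
dat["company"] = names.values

# x and y must have the same shape for a scatter plot.
x = dat["revenue"]
y = dat["company"]
assert x.shape == y.shape  # both are (3,): one dimension, three entries

fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set_xscale("log")
ax.set_xlabel("Revenue (billions of dollars, log scale)")
```

Here `x.shape` is `(3,)`, a one-dimensional array with three entries, which is all that "shape" means in this context.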

In an xy plot the shapes of the x and y values must be the same.

Using the base 2 avoids this problem. Next week we will discuss alternative ways of labeling log scales.

Dot plot of data of Figure 2 shown on a log scale with base of ...

A point in a dot plot is judged by its position along an axis; in this case, the horizontal or x axis. A bar in a bar chart is judged by its length. That is a second reason that I prefer dot plots over bar charts for these data.

In Figure 2, the value of each tick mark is double the value of the preceding one. The top axis emphasizes the fact that the data are logs. The bottom axis shows the values in the original scale.
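One way to get this kind of double labeling in matplotlib is sketched below. It plots the base-2 logs on an ordinary linear axis, labels the bottom ticks with the original values, and adds a twin axis on top labeled with the logs. The company names and revenues are placeholders, and this is only one possible construction, not necessarily how the original figure was made.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical revenues in billions of dollars (assumption).
labels = ["Company A", "Company B", "Company C", "Company D"]
revenue = np.array([6.0, 20.0, 64.0, 128.0])

fig, ax = plt.subplots()
ax.plot(np.log2(revenue), labels, "o")  # dot plot: position along the x axis

# Tick positions in log2 units; each step doubles the original value.
exponents = np.arange(2, 8)
bottom_labels = [str(2 ** int(e)) for e in exponents]

# Bottom axis: same positions, labeled in the original scale.
ax.set_xticks(exponents)
ax.set_xticklabels(bottom_labels)
ax.set_xlabel("Revenue (billions of dollars)")

# Top axis: same positions, labeled with the logs themselves.
top = ax.twiny()
top.set_xlim(ax.get_xlim())
top.set_xticks(exponents)
top.set_xlabel("log2(revenue in billions of dollars)")
```

Because both axes share the same limits and tick positions, each bottom label (4, 8, 16, ...) lines up with its exponent (2, 3, 4, ...) on top.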

This labeling follows the advice of William Cleveland with the top and bottom axes interchanged. The data values are spread out better with the logarithmic scale. This is what I mean by responding to skewness of large values. The revenue for Boeing is about 2^6 billion dollars while the revenue for Ford Motor is about 2^7 billion dollars. In Figure 1, the linear scale, the revenue for Ford is the revenue for Boeing plus the difference between these two revenues.

We call this additive. In Figure 2, the log scale, the difference is multiplicative: the revenue for Ford is about twice the revenue for Boeing.
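The additive and multiplicative readings can be worked through with the approximate figures from the text (two to the power 6 and two to the power 7, in billions of dollars). The variable names are illustrative.

```python
import math

boeing = 2 ** 6  # about 64 billion dollars, per the text
ford = 2 ** 7    # about 128 billion dollars, per the text

# Linear scale: the gap is a difference (additive).
additive_gap = ford - boeing

# Base-2 log scale: the gap is a difference of logs ...
log_gap = math.log2(ford) - math.log2(boeing)
# ... which corresponds to a ratio in the original units (multiplicative).
multiplicative_factor = 2 ** log_gap

print(additive_gap, log_gap, multiplicative_factor)  # 64 1.0 2.0
```

One tick apart on the base-2 axis therefore means "twice as large", whatever the starting value, whereas a fixed distance on the linear axis means "this many billions more".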


