R program for Finance: Multiple Histograms in R

We previously talked about how to draw a histogram in R.
[Drawing histogram in R]

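What if there are two or more variables that need to be drawn on the same chart? We can solve this just by adding the "add=TRUE" option to every hist() call after the first one. It draws the new histogram on top of the existing plot instead of starting a new one. Please keep in mind that TRUE and FALSE have to be written in capital letters in R. R is case-sensitive, so true or True will not work.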

[Code]
iris <- read.csv(url("http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"), header = FALSE) #There are great sample data offered by UCI. Let's use this!

#This function makes a color transparent, so overlapping bars stay visible!
t_col <- function(color, percent = 50, name = NULL) {
# color = color name
# percent = % transparency
# name = an optional name for the color
## Get RGB values for named color
rgb.val <- col2rgb(color)
## Make new color using input color as base and alpha set by transparency
t.col <- rgb(rgb.val[1], rgb.val[2], rgb.val[3],
maxColorValue = 255,
alpha = (100-percent)*255/100,
names = name)
## Return the new color
return(t.col)
}
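If you'd rather not define a helper, base R's grDevices package already provides adjustcolor(), which sets the alpha channel directly. Here is a minimal sketch that assumes the same 50% transparency as the t_col() default. Note that adjustcolor()'s alpha.f is the opacity (1 = solid), not the transparency, so alpha.f = 0.5 roughly corresponds to percent = 50. The variable names are only for illustration.

```r
# adjustcolor() ships with base R (grDevices); alpha.f is the opacity from 0 to 1.
# alpha.f = 0.5 roughly matches t_col(color, percent = 50).
half_red <- adjustcolor("red", alpha.f = 0.5)

# Inspect the red, green, blue and alpha channels on the 0-255 scale.
channels <- col2rgb(half_red, alpha = TRUE)
print(half_red)
print(channels)

# The same call works on a whole vector, e.g. the rainbow(3) palette used below.
palette3 <- adjustcolor(rainbow(3), alpha.f = 0.5)
print(palette3)
```

Both approaches produce an "#RRGGBBAA" hex string that hist() accepts through its col argument.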

#Unfortunately, this file has no header row, so we name the columns ourselves.
names(iris) <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species")

#Iris-setosa, Iris-virginica, Iris-versicolor

setosa <- iris[iris$Species=="Iris-setosa",]
virginica <- iris[iris$Species=="Iris-virginica",]
versicolor <- iris[iris$Species=="Iris-versicolor",]

mycolor<-rainbow(3)

hist(setosa$Sepal.Width, col=t_col(mycolor[1]), xlim=c(1.7, 5))
hist(virginica$Sepal.Width, col=t_col(mycolor[2]), add=TRUE)
hist(versicolor$Sepal.Width, col=t_col(mycolor[3]), add=TRUE)

[Result]

[Are we done?]

As you can see above, the graph we've drawn is not pretty. First, the bin width is not consistent across the three histograms, because hist() picks the breaks separately for each call. Second, the y-axis label "Frequency" means the bars show raw counts, which are not normalized by the size of each group or the width of its bins. Here's how to get around these issues: we are going to use the "freq" option and the "breaks" option.
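To see the bin mismatch concretely, we can ask hist() for its breaks without drawing anything (plot = FALSE). This sketch uses R's built-in datasets::iris instead of the UCI download, so the species labels are "setosa", "versicolor" and "virginica" rather than "Iris-setosa" and so on. The helper name default_breaks is hypothetical, made up for this example.

```r
# Built-in copy of the iris data (no download needed).
iris_builtin <- datasets::iris

# Hypothetical helper: the breaks hist() picks by default for one species.
default_breaks <- function(species) {
  x <- iris_builtin$Sepal.Width[iris_builtin$Species == species]
  hist(x, plot = FALSE)$breaks
}

species <- c("setosa", "virginica", "versicolor")
breaks_by_species <- lapply(species, default_breaks)
names(breaks_by_species) <- species

# Each species gets its own bin edges, and therefore its own bin width.
print(breaks_by_species)
print(sapply(breaks_by_species, function(b) diff(b)[1]))
```

The default breaks argument is "Sturges", which derives the bins from each group's own size and range. That is why three separate calls end up with three different bin widths.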

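"freq" option: with freq=FALSE, hist() plots densities instead of counts. Each bar's height becomes count / (n × bin width), where n is the number of data points in that group, so the total area of every histogram equals 1, as if all groups had the same number of data points. In the case above, the setosa (red) bars dominate the graph in terms of height. Each species in this data set actually has 50 rows, so this is not caused by a larger sample. It is partly because setosa's bins are wider and therefore collect more points. From the data visualization perspective, that's not good. We can do better.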

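The density formula can be checked by hand. The following sketch recomputes count / (n × bin width) for setosa's sepal width and confirms that the bar areas add up to 1. As before, it assumes the built-in datasets::iris copy to keep the example self-contained.

```r
iris_builtin <- datasets::iris
x <- iris_builtin$Sepal.Width[iris_builtin$Species == "setosa"]

h <- hist(x, plot = FALSE)   # compute the histogram without drawing it
width <- diff(h$breaks)      # width of every bin

# Density by hand: counts divided by (number of points * bin width).
manual_density <- h$counts / (length(x) * width)
print(all.equal(manual_density, h$density))

# Total bar area is 1, whatever the sample size or bin width.
print(sum(h$density * width))
```

Because the bin width sits in the denominator, density also corrects for bars of different widths, not just for groups of different sizes.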
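"breaks" option: passing the same vector of bin edges to every hist() call makes all histograms share the same bins. In the graph above, the red (setosa) bars are wider than the others. We are going to fix this with breaks=seq(4, 8, by=0.25), i.e. bins of width 0.25 from 4 to 8. Note that this time we plot Sepal.Length instead of Sepal.Width, so the x-axis limits change as well. Again, seeing is believing.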

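One caveat with shared breaks is that the edges must cover the full range of every group, or hist() stops with an error. Here is a minimal sketch, again assuming the built-in datasets::iris data and the 0.25 bin width used here. The names shared_bins and sepal_hist are illustrative, not part of any library.

```r
iris_builtin <- datasets::iris
shared_bins <- seq(4, 8, by = 0.25)   # same edges for every species

# Hypothetical helper: histogram of Sepal.Length for one species.
sepal_hist <- function(species, breaks) {
  x <- iris_builtin$Sepal.Length[iris_builtin$Species == species]
  hist(x, breaks = breaks, plot = FALSE)
}

species <- c("setosa", "virginica", "versicolor")
hists <- lapply(species, sepal_hist, breaks = shared_bins)

# All three histograms now use the same bins.
print(all(sapply(hists, function(h) isTRUE(all.equal(h$breaks, shared_bins)))))

# Edges starting above setosa's smallest value (4.3) leave points uncounted,
# so hist() raises an error; we capture its message instead of stopping.
too_narrow <- tryCatch(sepal_hist("setosa", seq(4.5, 8, by = 0.25)),
                       error = function(e) conditionMessage(e))
print(too_narrow)
```

If you reuse this pattern on other data, building the edges from range() of the combined data is a simple way to make sure nothing falls outside.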
# Use the same bin edges (width 0.25) for every species
bins <- seq(4, 8, by=0.25)
# freq=FALSE plots densities; ylim leaves room for the tallest (setosa) bar
hist(setosa$Sepal.Length, col=t_col(mycolor[1]), xlim=c(4.0, 8.5), ylim=c(0, 1.4), breaks=bins, freq=FALSE)
hist(virginica$Sepal.Length, col=t_col(mycolor[2]), add=TRUE, breaks=bins, freq=FALSE)
hist(versicolor$Sepal.Length, col=t_col(mycolor[3]), add=TRUE, breaks=bins, freq=FALSE)

Now all three histograms have the same bar width and share a common density scale on the y-axis. That looks much better.
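As a further tidy-up, the three calls can be folded into a loop, with a legend added so the colors are labeled. This is a sketch, not the code used for the graph above. It uses the built-in datasets::iris data and base R's adjustcolor() in place of t_col(), and the "topright" legend position is just one choice. A fixed ylim can clip the tallest bar, so this version computes the histograms first with plot = FALSE and sets ylim from the largest density.

```r
iris_builtin <- datasets::iris                     # built-in copy, no download
species <- c("setosa", "virginica", "versicolor")
cols <- adjustcolor(rainbow(3), alpha.f = 0.5)     # 50% transparent fills
bins <- seq(4, 8, by = 0.25)                       # shared bin edges

# Compute every histogram first, without drawing.
hists <- lapply(species, function(s) {
  hist(iris_builtin$Sepal.Length[iris_builtin$Species == s],
       breaks = bins, plot = FALSE)
})
ymax <- max(sapply(hists, function(h) max(h$density)))

# Draw: the first call sets up the axes, later calls overlay with add = TRUE.
for (i in seq_along(hists)) {
  plot(hists[[i]], freq = FALSE, col = cols[i], add = i > 1,
       xlim = c(4, 8.5), ylim = c(0, ymax),
       main = "Sepal length by species", xlab = "Sepal length (cm)")
}
legend("topright", legend = species, fill = cols, bty = "n")
```

Separating the computation (hist(..., plot = FALSE)) from the drawing (plot() on the stored histogram objects) makes it easy to pick axis limits that fit every group.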

Monday, April 11, 2016
