[Drawing histograms in R]
What if two variables need to be drawn on the same plot? We can solve this just by adding the "add=TRUE" option to the later hist() calls. Please keep in mind that the logical values TRUE and FALSE have to be written in all capitals in R.
[Codes]
iris <- read.csv(url("http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"), header = FALSE) #UCI offers great sample data sets. Let's use this one!
#This function makes your bar color transparent!
t_col <- function(color, percent = 50, name = NULL) {
  # color = color name
  # percent = % transparency
  # name = an optional name for the color
  ## Get RGB values for named color
  rgb.val <- col2rgb(color)
  ## Make new color using input color as base and alpha set by transparency
  t.col <- rgb(rgb.val[1], rgb.val[2], rgb.val[3],
               maxColorValue = 255,
               alpha = (100 - percent) * 255 / 100,
               names = name)
  ## Return the color
  return(t.col)
}
#Unfortunately, this data file has no header row, so we set the column names ourselves.
names(iris) <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species")
#Iris-setosa, Iris-virginica, Iris-versicolor
setosa <- iris[iris$Species=="Iris-setosa",]
virginica <- iris[iris$Species=="Iris-virginica",]
versicolor <- iris[iris$Species=="Iris-versicolor",]
mycolor<-rainbow(3)
hist(setosa$Sepal.Width, col=t_col(mycolor[1]), xlim=c(1.7, 5))
hist(virginica$Sepal.Width, col=t_col(mycolor[2]), add=TRUE)
hist(versicolor$Sepal.Width, col=t_col(mycolor[3]), add=TRUE)
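To see what the t_col helper actually produces, it can help to apply it to a single color and decode the result. The sketch below repeats the helper so that it runs on its own, then uses col2rgb with alpha = TRUE to read the channels back. The variable names half_red and channels are only illustrative.

```r
# Standalone copy of the t_col helper from the listing above
t_col <- function(color, percent = 50, name = NULL) {
  rgb.val <- col2rgb(color)
  rgb(rgb.val[1], rgb.val[2], rgb.val[3],
      maxColorValue = 255,
      alpha = (100 - percent) * 255 / 100,
      names = name)
}

# 50% transparent red, returned as a hex string with an alpha component
half_red <- t_col("red", percent = 50)
print(half_red)

# Decode the hex string back into red/green/blue/alpha on a 0-255 scale
channels <- col2rgb(half_red, alpha = TRUE)
print(channels)

# percent = 0 is fully opaque, percent = 100 is fully transparent
opaque <- col2rgb(t_col("red", percent = 0), alpha = TRUE)
invisible_red <- col2rgb(t_col("red", percent = 100), alpha = TRUE)
```

The alpha row of the decoded matrix should sit near the middle of the 0-255 range for percent = 50, which is why the overlapping bars stay visible through each other.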
[Result]
[Are we done?]
As you can see above, the graph that we've drawn is not beautiful. The bin size is not consistent across the histograms. The y-axis label "Frequency" means that the bars show raw counts, so the data are not normalized in terms of the data size. Here's the way to get around these issues. We are going to use the "freq" option and the "breaks" option.
Setting "freq=FALSE" normalizes the data: each histogram is drawn as a density whose total area is 1, so groups can be compared even when they have different numbers of data points. In the case above, setosa dominates the graph in terms of height. Note that each species in this data set has 50 data points, so the taller setosa bars come from how the values fall into the bins rather than from a larger sample. From the data visualization perspective, that's not good. We can do better.
The "breaks" option ensures that each histogram has the same bin size. In the graph above, the red bars are wider than the others. We are going to fix this issue by passing the same breaks to every hist() call. Again, seeing is believing.
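The effect of the two options can be checked without drawing anything, because hist() returns its bin data when called with plot = FALSE. The following is a minimal sketch on synthetic data, not on the iris file: the two groups, their sizes, and the names small_group, large_group and shared_breaks are assumptions made only for illustration.

```r
set.seed(1)

# Hypothetical groups of different sizes, standing in for the species subsets
small_group <- rnorm(30, mean = 5, sd = 0.4)
large_group <- rnorm(300, mean = 6, sd = 0.6)

# One set of breaks that spans both groups, so the bins are identical
shared_breaks <- seq(floor(min(small_group, large_group)),
                     ceiling(max(small_group, large_group)),
                     by = 0.25)

h_small <- hist(small_group, breaks = shared_breaks, plot = FALSE)
h_large <- hist(large_group, breaks = shared_breaks, plot = FALSE)

# Raw counts scale with the group size ...
total_small <- sum(h_small$counts)
total_large <- sum(h_large$counts)

# ... while density times bin width adds up to 1 for each group
area_small <- sum(h_small$density * diff(h_small$breaks))
area_large <- sum(h_large$density * diff(h_large$breaks))

print(c(total_small, total_large))
print(c(area_small, area_large))
```

The counts differ by a factor of ten, but both areas come out as 1, which is what makes the density scale suitable for overlaying groups of unequal size.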
bins <- seq(4, 8, by = 0.25) #Shared breaks, so every histogram uses the same bins
hist(setosa$Sepal.Length, col=t_col(mycolor[1]), xlim=c(4.0, 8.5), ylim=c(0, 1.05), breaks=bins, freq=FALSE)
hist(virginica$Sepal.Length, col=t_col(mycolor[2]), add=TRUE, breaks=bins, freq=FALSE)
hist(versicolor$Sepal.Length, col=t_col(mycolor[3]), add=TRUE, breaks=bins, freq=FALSE)
Now we have a graph in which every histogram has the same bar width and is drawn on the same density scale. That looks much better.
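The upper y limit of 1.05 in the calls above is hard-coded, and bars taller than the limit would be cut off. One possible way to pick the limit from the data is sketched below. It assumes the copy of iris that ships with R (datasets::iris), where the species labels have no "Iris-" prefix, instead of the downloaded file; the names lengths_by_species, peak_density and y_top are illustrative.

```r
# Same shared breaks as in the plotting calls
bins <- seq(4, 8, by = 0.25)

# Built-in copy of the data; the species are "setosa", "versicolor", "virginica"
lengths_by_species <- split(datasets::iris$Sepal.Length, datasets::iris$Species)

# Tallest density bar for each species, computed without plotting
peak_density <- sapply(lengths_by_species, function(x) {
  max(hist(x, breaks = bins, plot = FALSE)$density)
})
print(peak_density)

# Leave 5% headroom above the tallest bar
y_top <- max(peak_density) * 1.05
print(y_top)
```

Passing ylim=c(0, y_top) to the first hist() call should then leave room for every bar, whichever species happens to have the tallest one.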