Histogram with "negative" logarithmic scale in R
Solution 1
I am not sure I understand your goal, but when you want a log-like transformation yet have zeroes or negative values, the inverse hyperbolic sine transformation asinh()
is often a good option. It is log-like for large values and is defined for all real values. See Rob Hyndman's blog and this question on stats.stackexchange.com for discussion, details, and other options.
If this is an acceptable approach, you can create a custom scale for ggplot. The code below demonstrates how to create and use a custom scale (with custom breaks), along with a visualization of the asinh() transformation.
library(ggplot2)
library(scales)
limits <- 100
step <- 0.005
demo <- data.frame(x=seq(from=-1*limits,to=limits,by=step))
asinh_trans <- function(){
trans_new(name = 'asinh', transform = function(x) asinh(x),
inverse = function(x) sinh(x))
}
ggplot(demo,aes(x,x))+geom_point(size=2)+
scale_y_continuous(trans = 'asinh',breaks=c(-100,-50,-10,-1,0,1,10,50,100))+
theme_bw()
ggplot(demo,aes(x,x))+geom_point(size=2)+
scale_x_continuous(trans = 'asinh',breaks=c(0,1,10,50,100))+
scale_y_log10(breaks=c(0,1,10,50,100))+ # zero won't plot
xlab("asinh() scale")+ylab("log10 scale")+
theme_bw()
Solution 2
Realizing that the question is fairly old, I decided to answer it anyway since I ran into exactly the same problem.
I see that some answers above misunderstood your original question. I think it is a valid visualization question and I outline below my solution that will hopefully be useful for others as well.
My approach was to use ggplot
and create custom log transform for x
and y
axis (as well as custom break generators)
library(ggplot2)
library(scales)
# Create custom log-style x axis transformer (...,-10,-3,-1,0,1,3,10,...)
custom_log_x_trans <- function()
trans_new("custom_log_x",
transform = function (x) ( sign(x)*log(abs(x)+1) ),
inverse = function (y) ( sign(y)*( exp(abs(y))-1) ),
domain = c(-Inf,Inf))
# Custom log x breaker (...,-10,-3,-1,0,1,3,10,...)
custom_x_breaks <- function(x)
{
range <- max(abs(x), na.rm=TRUE)
return (sort( c(0,
sapply(0:log10(range), function(z) (10^z) ),
sapply(0:log10(range/3), function(z) (3*10^z) ),
sapply(0:log10(range), function(z) (-10^z) ),
sapply(0:log10(range/3), function(z) (-3*10^z) )
)))
}
# Create custom log-style y axis transformer (0,1,3,10,...)
custom_log_y_trans <- function()
trans_new("custom_log_y",
transform = function (x) ( log(abs(x)+1) ),
inverse = function (y) ( exp(abs(y))-1 ),
domain = c(0,Inf))
# Custom log y breaker (0,1,3,10,...)
custom_y_breaks <- function(x)
{
max_y <- length(x)
range <- max(abs(max_y), na.rm=TRUE)
return (sort( c(0,
sapply(0:log10(range), function(z) (10^z) ),
sapply(0:log10(range/3), function(z) (3*10^z) )
)))
}
ggplot(data=mat) +
geom_histogram(aes(x=x,fill=..count..),
binwidth = 1, color="black", size=0.1) +
scale_fill_gradient("Count", low = "steelblue", high = "red") +
coord_trans(x="custom_log_x",y="custom_log_y") +
scale_x_continuous(breaks = custom_x_breaks(mat$x)) +
scale_y_continuous(breaks = custom_y_breaks(mat$x)) +
theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5)) +
theme_bw()
which gives me the following plot.
Note that:
- the plot also includes coloring scheme to show visually the absolute value of each bar.
- the bins become increasingly thinner as
x
increases (side effect of log-transform)
In either case, the two outliers are clearly visible.
Solution 3
I found a way to cheat on it. I say "cheat", because it actually plot negative and positive parts of the data separately. Thus you can not compare the negative and positive data. But only can show the distribution of negative and positive parts separately.
And one of the problem is if there is zero values in your data, it will not be shown in the plot.
reverselog_trans <- function(base = exp(1)) {
trans <- function(x) -log(x, base)
inv <- function(x) base^(-x)
trans_new(paste0("reverselog-", format(base)), trans, inv,
log_breaks(base = base),
domain = c(1e-100, Inf))
}
quartz();
dist1 <- ggplot(data=df.meltFUAC) +
geom_point(alpha=1,aes(x=deltaU.deltaUltrasensitivity,y=deltaF.deltaFitness,
colour=deltaF.w_c)) +
scale_x_continuous(name = expression(Delta * S[ult]),
limits=c(1e-7,1),trans = "log10",breaks=c(1e-01,1e-03,1e-05),
labels=c("1e-01","1e-03","1e-05")) +
scale_y_continuous(name = expression(paste(Delta, " Fitness")),trans = "log10",
limits = c(1e-7,1), breaks=c(1e-01,1e-03,1e-05),
labels=c("1e-01","1e-03","1e-05")) +
theme_bw() +
theme(legend.position = "none", axis.title.x=element_blank(),strip.background=element_blank(),
panel.border=element_rect(colour = "black"),panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),plot.background=element_blank(),
plot.margin=unit(c(0,0,0,-11),"mm"))
dist2 <- ggplot(data=df.meltFUAC, aes(x=-deltaU.deltaUltrasensitivity,y=deltaF.deltaFitness,
colour=deltaF.w_c)) +
geom_point(alpha=1) +
scale_x_continuous(name = expression(Delta * sqrt(S[ult] %.% S[amp])),limits=c(1,1e-7),
trans = reverselog_trans(10),breaks=c(1e-01,1e-03,1e-05),
labels=c("-1e-01","-1e-03","-1e-05")) +
scale_y_continuous(name = expression(paste(Delta, " Fitness")),trans = "log10",
limits = c(1e-7,1), breaks=c(1e-01,1e-03,1e-05),
labels=c("1e-01","1e-03","1e-05")) +
theme_bw() +
theme(legend.position = "none",strip.background=element_blank(),panel.border=element_rect(colour = "black"),
axis.text.y=element_blank(), axis.ticks.y=element_blank(), axis.title.y=element_blank(),
axis.line.y=element_line(colour="black",size=1,linetype="solid"),axis.title.x=element_blank(),
panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),
plot.margin=unit(c(0,-8,0,2.5),"mm"))
hist0 <- ggplot(data=df.meltFUAC, aes(deltaF.deltaFitness,fill=deltaF.w_c)) +
#geom_histogram(alpha = 0.5, aes(y=..density..),position = 'identity') +
geom_density(alpha = 0.5, aes(colour=deltaF.w_c)) +
scale_x_continuous(name = expression(paste(Delta, " Fitness")),
limits=c(1e-7,1),trans = "log10",breaks=c(1e-01,1e-03,1e-05),
labels=c("1e-01","1e-03","1e-05")) +
scale_y_continuous(name = "Density", limits=c(0,0.6)) +
theme_bw() +
theme(legend.position = "none", axis.title.x=element_blank(),strip.background=element_blank(),
axis.text.y=element_blank(), axis.ticks.y=element_blank(), axis.title.y=element_blank(),
axis.text.x=element_blank(), axis.ticks.x=element_blank(), axis.title.x=element_blank(),
panel.border=element_rect(colour = "black"),panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),plot.background=element_blank(),
plot.margin=unit(c(0,5,2.5,-2.5),"mm")) +
coord_flip()
hist1 <- ggplot(data=df.meltFUAC, aes(deltaU.deltaUltrasensitivity,fill=deltaF.w_c)) +
#geom_histogram(alpha = 0.5, aes(y=..density..),position = 'identity') +
geom_density(alpha = 0.5, aes(colour=deltaF.w_c)) +
scale_x_continuous(name = expression(Delta * S[ult]),
limits=c(1e-7,1),trans = "log10",breaks=c(1e-01,1e-03,1e-05),
labels=c("1e-01","1e-03","1e-05")) +
scale_y_continuous(name = "Density", limits=c(0,0.6)) +
theme_bw() +
theme(legend.position = "none", axis.title.x=element_blank(),strip.background=element_blank(),
axis.text.y=element_blank(), axis.ticks.y=element_blank(), axis.title.y=element_blank(),
axis.text.x=element_blank(), axis.ticks.x=element_blank(), axis.title.x=element_blank(),
axis.line.x=element_line(colour="black",size=1,linetype="solid"),
panel.border=element_rect(colour = "black"),panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),plot.background=element_blank(),
plot.margin=unit(c(5,0,-2.5,2),"mm"))
hist2 <- ggplot(data=df.meltFUAC, aes(-deltaU.deltaUltrasensitivity,fill=deltaF.w_c)) +
#geom_histogram(alpha = 0.5, aes(y=..density..),position = 'identity') +
geom_density(alpha = 0.5, aes(colour=deltaF.w_c)) +
scale_x_continuous(name = expression(Delta * S[ult]),limits=c(1,1e-7),
trans = reverselog_trans(10),breaks=c(1e-01,1e-03,1e-05),
labels=c("-1e-01","-1e-03","-1e-05")) +
scale_y_continuous(name = "Density", limits=c(0,0.6)) +
theme_bw() +
theme(legend.position = "none",strip.background=element_blank(),panel.border=element_rect(colour = "black"),
axis.text.y=element_blank(), axis.ticks.y=element_blank(), axis.title.y=element_blank(),
axis.text.x=element_blank(), axis.ticks.x=element_blank(), axis.title.x=element_blank(),
axis.line.y=element_line(colour="black",size=1,linetype="solid"),
axis.line.x=element_line(colour="black",size=1,linetype="solid"),
panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),
plot.margin=unit(c(5,-8,-2.5,2.5),"mm"))
grid.newpage();
pushViewport(viewport(layout = grid.layout(3, 3, widths = unit(c(4,4,2),"null"),
heights=unit(c(2,7.5,0.5),"null"))));
vplayout <- function(x, y) viewport(layout.pos.row = x, layout.pos.col = y);
print(dist2, vp = vplayout(2, 1));
print(dist1, vp = vplayout(2, 2));
print(hist2, vp = vplayout(1, 1));
print(hist1, vp = vplayout(1, 2));
print(hist0, vp = vplayout(2, 3));
grid.text(expression(Delta * Ultrasensitivity),vp = vplayout(3,1:2),x = unit(0.55, "npc"),
y = unit(0.9, "npc"),gp=gpar(fontsize=12, col="black"));
dev.copy2pdf(file=sprintf("%s/_dist/dist_hist_deltaF_deltaU_wc_01vs10.pdf", resultDir));
dev.off();
Here is the graph it got (but you need to manually to put the legend on):
Or a simpler one:
reverselog_trans <- function(base = exp(1)) {
trans <- function(x) -log(x, base)
inv <- function(x) base^(-x)
trans_new(paste0("reverselog-", format(base)), trans, inv,
log_breaks(base = base),
domain = c(1e-100, Inf))
}
quartz();
hist1 <- ggplot(deltaF, aes(deltaFitness,fill=w_c)) + guides(fill=guide_legend(title=expression(omega[c]))) + geom_histogram(alpha = 0.5, aes(y=..density..),position = 'identity') + geom_density(alpha = 0.05, aes(colour=w_c)) + scale_x_continuous(name = expression(paste(Delta, " Fitness")),trans = "log10");
hist1 <- hist1 + scale_y_continuous(name = "Density", limits=c(0,1));
#hist1 <- hist1 + theme(panel.background=element_blank(),panel.border=element_blank(),axis.line.x=element_blank(),axis.line.y=element_line(colour="black",linetype="solid",size=1),axis.title.x=element_blank(),panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),plot.margin=unit(c(5,5,0,5),"mm"));
hist1 <- hist1 + theme_bw();
hist1 <- hist1 + theme(strip.background=element_blank(),panel.border=element_rect(colour = "black"),axis.title.x=element_blank(),panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),plot.margin=unit(c(5,5,0,5),"mm"));
hist1 <- hist1 + scale_color_discrete(name=expression(omega[c]));# + geom_vline(xintercept=0, colour="grey", size = 1);# + geom_hline(yintercept=0, colour="grey", size = 0.5);
hist2 <- ggplot(deltaU, aes(deltaUltrasensitivity,fill=w_c)) + geom_histogram(alpha = 0.5, aes(y=..density..),position = 'identity') + geom_density(alpha = 0.05, aes(colour=w_c)) + scale_x_continuous(name = expression(paste(Delta, " Ultrasensitivity")), limits=c(1e-7,1),trans = "log10",breaks=c(1e-01,1e-03,1e-05),labels=c("1e-01","1e-03","1e-05"));
hist2 <- hist2 + scale_y_continuous(name = "Density",limits=c(0,1)) ;#+ geom_vline(xintercept=0, colour="grey", size = 1);# + geom_hline(yintercept=0, colour="grey", size = 0.5);
#hist2 <- hist2 + theme(legend.position = "none", axis.title.x=element_blank(),panel.background=element_blank(),panel.border=element_blank(),panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),plot.margin=unit(c(0,5,0,-7.5),"mm"));
hist2 <- hist2 + theme_bw();
hist2 <- hist2 + theme(legend.position = "none", axis.title.x=element_blank(),strip.background=element_blank(),panel.border=element_rect(colour = "black"),panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),plot.margin=unit(c(0,5,0,-7.5),"mm"));
# + ggtitle("Positive part")
hist3 <- ggplot(deltaU, aes(-deltaUltrasensitivity,fill=w_c)) + geom_histogram(alpha = 0.5, aes(y=..density..),position = 'identity') + geom_density(alpha = 0.05, aes(colour=w_c)) + scale_x_continuous(name = expression(paste(Delta, " Ultrasensitivity")), limits=c(1,1e-7),trans = reverselog_trans(10),breaks=c(1e-01,1e-03,1e-05),labels=c("-1e-01","-1e-03","-1e-05"));
hist3 <- hist3 + scale_y_continuous(name = "Density", limits=c(0,1));# + geom_hline(yintercept=0, colour="black", size = 0.5);
#hist3 <- hist3 + theme(legend.position = "none",panel.background=element_blank(),axis.text.y=element_blank(), axis.ticks.y=element_blank(), axis.title.y=element_blank(),axis.line.y=element_line(colour="black",size=1,linetype="solid"),axis.title.x=element_blank(),panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),plot.margin=unit(c(0,-7.5,0,5),"mm"));
hist3 <- hist3 + theme_bw();
hist3 <- hist3 + theme(legend.position = "none",strip.background=element_blank(),panel.border=element_rect(colour = "black"),axis.text.y=element_blank(), axis.ticks.y=element_blank(), axis.title.y=element_blank(),axis.line.y=element_line(colour="black",size=1,linetype="solid"),axis.title.x=element_blank(),panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),plot.margin=unit(c(0,-7.5,0,5),"mm"));
# + ggtitle("Negative part")
grid.newpage();
pushViewport(viewport(layout = grid.layout(4, 2, widths = unit(c(5,5),"null"),heights=unit(c(4.6,0.4,4.6,0.4),"null"))));
vplayout <- function(x, y) viewport(layout.pos.row = x, layout.pos.col = y);
print(hist1, vp = vplayout(1, 1:2)); # key is to define vplayout
grid.text(expression(paste(Delta, " Fitness")),vp = vplayout(2,1:2),x = unit(0.5, "npc"), y = unit(0.9, "npc"),gp=gpar(fontsize=12, col="black"));
print(hist3, vp = vplayout(3, 1));
print(hist2, vp = vplayout(3, 2));
grid.text(expression(paste(Delta, " Ultrasensitivity")),vp = vplayout(4,1:2),x = unit(0.5, "npc"), y = unit(0.9, "npc"),gp=gpar(fontsize=12, col="black"));
dev.copy2pdf(file=sprintf("%s/deltaF_deltaU_wc_01vs10.pdf", resultDir));
dev.off();
Here is the graph I got:
Jim Bo
Updated on July 28, 2022Comments
-
Jim Bo almost 2 years
I have a dataset with some outliers, such as the following
x <- rnorm(1000,0,20) x <- c(x, 500, -500)
If we plot this on a linear x axis scale at this we see
histogram(x)
I worked out a nice way to put it on a log scale using this useful thread: how to use a log scale for y-axis of histogram in R? :
mat <- data.frame(x) ggplot(ee, aes(x = xx)) + geom_histogram(colour="darkblue", size=1, fill="blue") + scale_x_log10()
However, I would like the x axis labels from this 2nd example to match that of the first example, except with a kind of "negative log" - i.e. first tick (moving from the centre to the left) could be -1, then the next could be -10, the next -100, but all equidistant. Does that make sense?
-
Jim Bo over 11 yearsThis looks good, I just need to get my head around it and make sure I have the xaxis in the right places!
-
Jim Bo over 11 yearsHmm - it seems to cause a "dip" in the histogram around 0.. If it helps clarify my question, I am looking for: something like the first histogram, but with the xaxis brought in - so it's plotting the same data, but instead of the xaxis marks being -600 -400 -200 0 200 400 600, i would like the marks to be something like -1000 -100 -10 -1 0 - 10 100 1000, with the numbers equal distance apart. As I understand it, this should lead to a more stretched out histogram, and bring in the outliers, but the peak would be around 0? Though I may be wrong.
-
Jim Bo over 11 yearsHmm - what if I perform 10^x and then plot on the log scale - then I just need to rejig the axes.. does that make sense?
-
MattBagg over 11 yearsNo the peak will not be around 0. You will get a dip because you are making unequal width bins and there are fewer items in smaller bins. Even if you make a custom mirrored log scale like I demo'd using ifelse() or another logical test to do different things for positive, zero, and negative values (i.e. -log(-x) for negative and 0 for 0), you will still have a histogram that has two peaks.
-
MattBagg over 11 yearsYou could calculate the bins and their contents outside of ggplot, maybe using cut() or Hmisc's cut2() and make sure that the bins around zero are wider so you get one peak. I think you would need to use geom_bar() and explicitly pass the x,y, and width rather than having ggplot calculate.
-
Jim Bo over 11 yearsOK great. this has worked out harder than I expected! I will have a play around and hopefully figure it out, in which case I will post back here!
-
Jim Bo over 11 yearsThe problem is, the data should be peaking around 0, and by doing log10 you lose all the negative data. What I want is something like the first histogram, but with the axis on a log scale, both negative and positive (I am aware this may not make sense...) basically I want the axis on such a scale so you can see the outliers (the +500/-500) but also you can see the distribution around 0 a bit better. This makes sense in my head but it seems to be causing confusion so it's quite possible that what I'm saying doesn't make sense!
-
aliocee over 9 yearsw_c which color is that i want to use your colors please?
-
LifeWorks over 9 years@aliocee I just used the ggplot. when you choose colour for two data, it will automatic generate such different colours. You can emulate this colour palette by referring: stackoverflow.com/questions/8197559/…