Histogram with "negative" logarithmic scale in R

13,345

Solution 1

I am not sure I understand your goal, but when you want a log-like transformation yet have zeroes or negative values, the inverse hyperbolic sine transformation asinh() is often a good option. It is log-like for large values and is defined for all real values. See Rob Hyndman's blog and this question on stats.stackexchange.com for discussion, details, and other options.

If this is an acceptable approach, you can create a custom scale for ggplot. The code below demonstrates how to create and use a custom scale (with custom breaks), along with a visualization of the asinh() transformation.

library(ggplot2)
library(scales)

limits <- 100
step <- 0.005
demo <- data.frame(x=seq(from=-1*limits,to=limits,by=step))

asinh_trans <- function(){
  trans_new(name = 'asinh', transform = function(x) asinh(x), 
            inverse = function(x) sinh(x))
}

ggplot(demo,aes(x,x))+geom_point(size=2)+
     scale_y_continuous(trans = 'asinh',breaks=c(-100,-50,-10,-1,0,1,10,50,100))+
     theme_bw()

enter image description here

ggplot(demo,aes(x,x))+geom_point(size=2)+
     scale_x_continuous(trans = 'asinh',breaks=c(0,1,10,50,100))+
     scale_y_log10(breaks=c(0,1,10,50,100))+ # zero won't plot
     xlab("asinh() scale")+ylab("log10 scale")+
     theme_bw()

enter image description here

Solution 2

Realizing that the question is fairly old, I decided to answer it anyway since I ran into exactly the same problem.

I see that some answers above misunderstood your original question. I think it is a valid visualization question and I outline below my solution that will hopefully be useful for others as well.

My approach was to use ggplot and create custom log transform for x and y axis (as well as custom break generators)

library(ggplot2)
library(scales)

# Create custom log-style x axis transformer (...,-10,-3,-1,0,1,3,10,...)
custom_log_x_trans <- function()
  trans_new("custom_log_x",
            transform = function (x) ( sign(x)*log(abs(x)+1) ),
            inverse = function (y) ( sign(y)*( exp(abs(y))-1) ),
            domain = c(-Inf,Inf))

# Custom log x breaker (...,-10,-3,-1,0,1,3,10,...)
custom_x_breaks <- function(x)
{ 
  range <- max(abs(x), na.rm=TRUE)

  return (sort( c(0,
                  sapply(0:log10(range), function(z) (10^z) ),
                  sapply(0:log10(range/3), function(z) (3*10^z) ),
                  sapply(0:log10(range), function(z) (-10^z) ),
                  sapply(0:log10(range/3), function(z) (-3*10^z) )
  )))
}

# Create custom log-style y axis transformer (0,1,3,10,...)
custom_log_y_trans <- function()
  trans_new("custom_log_y",
            transform = function (x) ( log(abs(x)+1) ),
            inverse = function (y) ( exp(abs(y))-1 ),
            domain = c(0,Inf))

# Custom log y breaker (0,1,3,10,...)
custom_y_breaks <- function(x)
{ 
  max_y <- length(x)

  range <- max(abs(max_y), na.rm=TRUE)

  return (sort( c(0,
                  sapply(0:log10(range), function(z) (10^z) ),
                  sapply(0:log10(range/3), function(z) (3*10^z) )
  )))
}

ggplot(data=mat) +
  geom_histogram(aes(x=x,fill=..count..), 
                 binwidth = 1, color="black", size=0.1) +
  scale_fill_gradient("Count", low = "steelblue", high = "red") +
  coord_trans(x="custom_log_x",y="custom_log_y") +
  scale_x_continuous(breaks = custom_x_breaks(mat$x)) +
  scale_y_continuous(breaks = custom_y_breaks(mat$x)) +
  theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))  + 
  theme_bw()

which gives me the following plot.

enter image description here

Note that:

  • the plot also includes coloring scheme to show visually the absolute value of each bar.
  • the bins become increasingly thinner as x increases (side effect of log-transform)

In either case, the two outliers are clearly visible.

Solution 3

I found a way to cheat on it. I say "cheat", because it actually plot negative and positive parts of the data separately. Thus you can not compare the negative and positive data. But only can show the distribution of negative and positive parts separately.

And one of the problem is if there is zero values in your data, it will not be shown in the plot.

reverselog_trans <- function(base = exp(1)) {
  trans <- function(x) -log(x, base)
  inv <- function(x) base^(-x)
  trans_new(paste0("reverselog-", format(base)), trans, inv, 
            log_breaks(base = base), 
            domain = c(1e-100, Inf))
}

quartz();


dist1 <- ggplot(data=df.meltFUAC) +
  geom_point(alpha=1,aes(x=deltaU.deltaUltrasensitivity,y=deltaF.deltaFitness, 
                         colour=deltaF.w_c)) + 
  scale_x_continuous(name = expression(Delta * S[ult]), 
                     limits=c(1e-7,1),trans = "log10",breaks=c(1e-01,1e-03,1e-05),
                     labels=c("1e-01","1e-03","1e-05")) + 
  scale_y_continuous(name = expression(paste(Delta, " Fitness")),trans = "log10", 
                     limits = c(1e-7,1), breaks=c(1e-01,1e-03,1e-05),
                     labels=c("1e-01","1e-03","1e-05")) +
  theme_bw() +
  theme(legend.position = "none", axis.title.x=element_blank(),strip.background=element_blank(),
        panel.border=element_rect(colour = "black"),panel.grid.major=element_blank(),
        panel.grid.minor=element_blank(),plot.background=element_blank(),
        plot.margin=unit(c(0,0,0,-11),"mm"))

dist2 <- ggplot(data=df.meltFUAC, aes(x=-deltaU.deltaUltrasensitivity,y=deltaF.deltaFitness, 
                                      colour=deltaF.w_c)) +
  geom_point(alpha=1) + 
  scale_x_continuous(name = expression(Delta * sqrt(S[ult] %.% S[amp])),limits=c(1,1e-7),
                     trans = reverselog_trans(10),breaks=c(1e-01,1e-03,1e-05),
                     labels=c("-1e-01","-1e-03","-1e-05")) +
  scale_y_continuous(name = expression(paste(Delta, " Fitness")),trans = "log10", 
                     limits = c(1e-7,1), breaks=c(1e-01,1e-03,1e-05),
                     labels=c("1e-01","1e-03","1e-05")) +
  theme_bw() +
  theme(legend.position = "none",strip.background=element_blank(),panel.border=element_rect(colour = "black"),
        axis.text.y=element_blank(), axis.ticks.y=element_blank(), axis.title.y=element_blank(),
        axis.line.y=element_line(colour="black",size=1,linetype="solid"),axis.title.x=element_blank(),
        panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),
        plot.margin=unit(c(0,-8,0,2.5),"mm"))

hist0 <- ggplot(data=df.meltFUAC, aes(deltaF.deltaFitness,fill=deltaF.w_c)) +
  #geom_histogram(alpha = 0.5, aes(y=..density..),position = 'identity') +
  geom_density(alpha = 0.5, aes(colour=deltaF.w_c)) +
  scale_x_continuous(name = expression(paste(Delta, " Fitness")), 
                     limits=c(1e-7,1),trans = "log10",breaks=c(1e-01,1e-03,1e-05),
                     labels=c("1e-01","1e-03","1e-05")) + 
  scale_y_continuous(name = "Density", limits=c(0,0.6)) + 
  theme_bw() +
  theme(legend.position = "none", axis.title.x=element_blank(),strip.background=element_blank(),
        axis.text.y=element_blank(), axis.ticks.y=element_blank(), axis.title.y=element_blank(),
        axis.text.x=element_blank(), axis.ticks.x=element_blank(), axis.title.x=element_blank(),
        panel.border=element_rect(colour = "black"),panel.grid.major=element_blank(),
        panel.grid.minor=element_blank(),plot.background=element_blank(),
        plot.margin=unit(c(0,5,2.5,-2.5),"mm")) +
  coord_flip()

hist1 <- ggplot(data=df.meltFUAC, aes(deltaU.deltaUltrasensitivity,fill=deltaF.w_c)) +
  #geom_histogram(alpha = 0.5, aes(y=..density..),position = 'identity') +
  geom_density(alpha = 0.5, aes(colour=deltaF.w_c)) +
  scale_x_continuous(name = expression(Delta * S[ult]), 
                     limits=c(1e-7,1),trans = "log10",breaks=c(1e-01,1e-03,1e-05),
                     labels=c("1e-01","1e-03","1e-05")) + 
  scale_y_continuous(name = "Density", limits=c(0,0.6)) + 
  theme_bw() +
  theme(legend.position = "none", axis.title.x=element_blank(),strip.background=element_blank(),
        axis.text.y=element_blank(), axis.ticks.y=element_blank(), axis.title.y=element_blank(),
        axis.text.x=element_blank(), axis.ticks.x=element_blank(), axis.title.x=element_blank(),
        axis.line.x=element_line(colour="black",size=1,linetype="solid"),
        panel.border=element_rect(colour = "black"),panel.grid.major=element_blank(),
        panel.grid.minor=element_blank(),plot.background=element_blank(),
        plot.margin=unit(c(5,0,-2.5,2),"mm"))

hist2 <- ggplot(data=df.meltFUAC, aes(-deltaU.deltaUltrasensitivity,fill=deltaF.w_c)) +
  #geom_histogram(alpha = 0.5, aes(y=..density..),position = 'identity') +
  geom_density(alpha = 0.5, aes(colour=deltaF.w_c)) +
  scale_x_continuous(name = expression(Delta * S[ult]),limits=c(1,1e-7),
                     trans = reverselog_trans(10),breaks=c(1e-01,1e-03,1e-05),
                     labels=c("-1e-01","-1e-03","-1e-05")) +
  scale_y_continuous(name = "Density", limits=c(0,0.6)) + 
  theme_bw() +
  theme(legend.position = "none",strip.background=element_blank(),panel.border=element_rect(colour = "black"),
        axis.text.y=element_blank(), axis.ticks.y=element_blank(), axis.title.y=element_blank(),
        axis.text.x=element_blank(), axis.ticks.x=element_blank(), axis.title.x=element_blank(),
        axis.line.y=element_line(colour="black",size=1,linetype="solid"),
        axis.line.x=element_line(colour="black",size=1,linetype="solid"),
        panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),
        plot.margin=unit(c(5,-8,-2.5,2.5),"mm"))



grid.newpage();
pushViewport(viewport(layout = grid.layout(3, 3, widths = unit(c(4,4,2),"null"),
                                           heights=unit(c(2,7.5,0.5),"null"))));
vplayout <- function(x, y) viewport(layout.pos.row = x, layout.pos.col = y);

print(dist2, vp = vplayout(2, 1));
print(dist1, vp = vplayout(2, 2));
print(hist2, vp = vplayout(1, 1));
print(hist1, vp = vplayout(1, 2));
print(hist0, vp = vplayout(2, 3));
grid.text(expression(Delta * Ultrasensitivity),vp = vplayout(3,1:2),x = unit(0.55, "npc"), 
          y = unit(0.9, "npc"),gp=gpar(fontsize=12, col="black"));

dev.copy2pdf(file=sprintf("%s/_dist/dist_hist_deltaF_deltaU_wc_01vs10.pdf", resultDir));
dev.off();

Here is the graph it got (but you need to manually to put the legend on):

enter image description here

Or a simpler one:

reverselog_trans <- function(base = exp(1)) {
  trans <- function(x) -log(x, base)
  inv <- function(x) base^(-x)
  trans_new(paste0("reverselog-", format(base)), trans, inv, 
            log_breaks(base = base), 
            domain = c(1e-100, Inf))
}

quartz();

hist1 <- ggplot(deltaF, aes(deltaFitness,fill=w_c)) + guides(fill=guide_legend(title=expression(omega[c]))) + geom_histogram(alpha = 0.5, aes(y=..density..),position = 'identity') + geom_density(alpha = 0.05, aes(colour=w_c)) + scale_x_continuous(name = expression(paste(Delta, " Fitness")),trans = "log10");
hist1 <- hist1 + scale_y_continuous(name = "Density", limits=c(0,1));
#hist1 <- hist1 + theme(panel.background=element_blank(),panel.border=element_blank(),axis.line.x=element_blank(),axis.line.y=element_line(colour="black",linetype="solid",size=1),axis.title.x=element_blank(),panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),plot.margin=unit(c(5,5,0,5),"mm"));
hist1 <- hist1 + theme_bw();
hist1 <- hist1 + theme(strip.background=element_blank(),panel.border=element_rect(colour = "black"),axis.title.x=element_blank(),panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),plot.margin=unit(c(5,5,0,5),"mm"));
hist1 <- hist1 + scale_color_discrete(name=expression(omega[c]));# + geom_vline(xintercept=0, colour="grey", size = 1);# + geom_hline(yintercept=0, colour="grey", size = 0.5);

hist2 <- ggplot(deltaU, aes(deltaUltrasensitivity,fill=w_c)) + geom_histogram(alpha = 0.5, aes(y=..density..),position = 'identity') + geom_density(alpha = 0.05, aes(colour=w_c))  + scale_x_continuous(name = expression(paste(Delta, " Ultrasensitivity")), limits=c(1e-7,1),trans = "log10",breaks=c(1e-01,1e-03,1e-05),labels=c("1e-01","1e-03","1e-05"));
hist2 <- hist2 + scale_y_continuous(name = "Density",limits=c(0,1)) ;#+ geom_vline(xintercept=0, colour="grey", size = 1);# + geom_hline(yintercept=0, colour="grey", size = 0.5);
#hist2 <- hist2 + theme(legend.position = "none", axis.title.x=element_blank(),panel.background=element_blank(),panel.border=element_blank(),panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),plot.margin=unit(c(0,5,0,-7.5),"mm"));
hist2 <- hist2 + theme_bw();
hist2 <- hist2 + theme(legend.position = "none", axis.title.x=element_blank(),strip.background=element_blank(),panel.border=element_rect(colour = "black"),panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),plot.margin=unit(c(0,5,0,-7.5),"mm"));
# + ggtitle("Positive part")

hist3 <- ggplot(deltaU, aes(-deltaUltrasensitivity,fill=w_c)) + geom_histogram(alpha = 0.5, aes(y=..density..),position = 'identity') + geom_density(alpha = 0.05, aes(colour=w_c)) + scale_x_continuous(name = expression(paste(Delta, " Ultrasensitivity")), limits=c(1,1e-7),trans = reverselog_trans(10),breaks=c(1e-01,1e-03,1e-05),labels=c("-1e-01","-1e-03","-1e-05"));
hist3 <- hist3 + scale_y_continuous(name = "Density", limits=c(0,1));# + geom_hline(yintercept=0, colour="black", size = 0.5);
#hist3 <- hist3 + theme(legend.position = "none",panel.background=element_blank(),axis.text.y=element_blank(), axis.ticks.y=element_blank(), axis.title.y=element_blank(),axis.line.y=element_line(colour="black",size=1,linetype="solid"),axis.title.x=element_blank(),panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),plot.margin=unit(c(0,-7.5,0,5),"mm"));
hist3 <- hist3 + theme_bw();
hist3 <- hist3 + theme(legend.position = "none",strip.background=element_blank(),panel.border=element_rect(colour = "black"),axis.text.y=element_blank(), axis.ticks.y=element_blank(), axis.title.y=element_blank(),axis.line.y=element_line(colour="black",size=1,linetype="solid"),axis.title.x=element_blank(),panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),plot.margin=unit(c(0,-7.5,0,5),"mm"));
# + ggtitle("Negative part")

grid.newpage();
pushViewport(viewport(layout = grid.layout(4, 2, widths = unit(c(5,5),"null"),heights=unit(c(4.6,0.4,4.6,0.4),"null"))));
vplayout <- function(x, y) viewport(layout.pos.row = x, layout.pos.col = y);
print(hist1, vp = vplayout(1, 1:2));  # key is to define vplayout
grid.text(expression(paste(Delta, " Fitness")),vp = vplayout(2,1:2),x = unit(0.5, "npc"), y = unit(0.9, "npc"),gp=gpar(fontsize=12, col="black"));
print(hist3, vp = vplayout(3, 1));
print(hist2, vp = vplayout(3, 2));
grid.text(expression(paste(Delta, " Ultrasensitivity")),vp = vplayout(4,1:2),x = unit(0.5, "npc"), y = unit(0.9, "npc"),gp=gpar(fontsize=12, col="black"));


dev.copy2pdf(file=sprintf("%s/deltaF_deltaU_wc_01vs10.pdf", resultDir));
dev.off();

Here is the graph I got:

enter image description here

Share:
13,345
Jim Bo
Author by

Jim Bo

Updated on July 28, 2022

Comments

  • Jim Bo
    Jim Bo almost 2 years

    I have a dataset with some outliers, such as the following

    x <- rnorm(1000,0,20)
    x <- c(x, 500, -500)
    

    If we plot this on a linear x axis scale at this we see

    histogram(x)
    

    non log x-axis

    I worked out a nice way to put it on a log scale using this useful thread: how to use a log scale for y-axis of histogram in R? :

    mat <- data.frame(x)
    ggplot(ee, aes(x = xx)) + geom_histogram(colour="darkblue", size=1, fill="blue") + scale_x_log10()
    

    log x-axis

    However, I would like the x axis labels from this 2nd example to match that of the first example, except with a kind of "negative log" - i.e. first tick (moving from the centre to the left) could be -1, then the next could be -10, the next -100, but all equidistant. Does that make sense?

  • Jim Bo
    Jim Bo over 11 years
    This looks good, I just need to get my head around it and make sure I have the xaxis in the right places!
  • Jim Bo
    Jim Bo over 11 years
    Hmm - it seems to cause a "dip" in the histogram around 0.. If it helps clarify my question, I am looking for: something like the first histogram, but with the xaxis brought in - so it's plotting the same data, but instead of the xaxis marks being -600 -400 -200 0 200 400 600, i would like the marks to be something like -1000 -100 -10 -1 0 - 10 100 1000, with the numbers equal distance apart. As I understand it, this should lead to a more stretched out histogram, and bring in the outliers, but the peak would be around 0? Though I may be wrong.
  • Jim Bo
    Jim Bo over 11 years
    Hmm - what if I perform 10^x and then plot on the log scale - then I just need to rejig the axes.. does that make sense?
  • MattBagg
    MattBagg over 11 years
    No the peak will not be around 0. You will get a dip because you are making unequal width bins and there are fewer items in smaller bins. Even if you make a custom mirrored log scale like I demo'd using ifelse() or another logical test to do different things for positive, zero, and negative values (i.e. -log(-x) for negative and 0 for 0), you will still have a histogram that has two peaks.
  • MattBagg
    MattBagg over 11 years
    You could calculate the bins and their contents outside of ggplot, maybe using cut() or Hmisc's cut2() and make sure that the bins around zero are wider so you get one peak. I think you would need to use geom_bar() and explicitly pass the x,y, and width rather than having ggplot calculate.
  • Jim Bo
    Jim Bo over 11 years
    OK great. this has worked out harder than I expected! I will have a play around and hopefully figure it out, in which case I will post back here!
  • Jim Bo
    Jim Bo over 11 years
    The problem is, the data should be peaking around 0, and by doing log10 you lose all the negative data. What I want is something like the first histogram, but with the axis on a log scale, both negative and positive (I am aware this may not make sense...) basically I want the axis on such a scale so you can see the outliers (the +500/-500) but also you can see the distribution around 0 a bit better. This makes sense in my head but it seems to be causing confusion so it's quite possible that what I'm saying doesn't make sense!
  • aliocee
    aliocee over 9 years
    w_c which color is that i want to use your colors please?
  • LifeWorks
    LifeWorks over 9 years
    @aliocee I just used the ggplot. when you choose colour for two data, it will automatic generate such different colours. You can emulate this colour palette by referring: stackoverflow.com/questions/8197559/…