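I need to create a QQ plot of -log10 p-values in ggplot2 in which a subset of 137 points ("targets") is highlighted in gold, using a colorblind-friendly palette I'm using called cbbPalette. I can't do this in an alternative package because I ultimately need to combine multiple QQ plots into one grid using grid.arrange from the gridExtra package, which works with ggplot2.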


cbbPalette <- c("#E69F00", "#000000") #part of my palette; gold & black


p_values = c(
  runif(100000, min = 0, max = 1),
  runif(132, min = 1e-7, max = 1),
  c(6e-20, 6e-19, 7e-9, 7.5e-9, 4e-8)
)

#labels for the p-values
names_letters <-
  do.call(paste0, replicate(2, sample(LETTERS, 100137, TRUE), FALSE))
names = paste0(names_letters, sprintf("%04d", sample(9999, 100137, TRUE)))
targets = names[100001:100137] #last 137 are targets

df = as.data.frame(p_values)
df$names = names
df <-
  df[sample(nrow(df)), ] #shuffles the df to place targets randomly w/in
df$Category = ifelse(df$names %in% targets, "Target", "Non-Target")


head(df, 4) 
           p_values  names   Category
89863 0.4821147 NZ3385 Non-Target
20209 0.3998835 SQ3793 Non-Target
29200 0.7893478 ZT5497 Non-Target
71623 0.3459360 QF5311 Non-Target

Melting df with reshape2 and adding observed (o) & expected (e) -log10 p-values:

library(reshape2) # provides melt()

df.m = melt(df)
df.m$o = -log10(sort(df.m$value, decreasing = FALSE))
df.m$e = -log10(1:nrow(df.m) / nrow(df.m))


   names   Category variable     value         o        e
1 NZ3385 Non-Target p_values 0.4821147 19.221849 5.000595
2 SQ3793 Non-Target p_values 0.3998835 18.221849 4.699565
3 ZT5497 Non-Target p_values 0.7893478  8.154902 4.523473
4 QF5311 Non-Target p_values 0.3459360  8.124939 4.398535
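
One thing worth checking in the output above: sort() reorders only the values, not the rows, so o in a given row no longer belongs to that row's names and Category (row 1 has p = 0.48 but o = 19.2). A minimal base R sketch of one way to keep them aligned is to reorder the whole rows by p-value before computing o and e. The data frame toy and its values are made up for illustration and are not from the question.

```r
# Toy data (hypothetical values, not from the question)
toy <- data.frame(
  value    = c(0.50, 1e-8, 0.20, 0.90, 0.03),
  Category = c("Non-Target", "Target", "Non-Target", "Non-Target", "Target"),
  stringsAsFactors = FALSE
)

# Reorder whole rows by p-value so the labels travel with the values
toy <- toy[order(toy$value), ]

# Observed and expected -log10 p-values, now row-aligned
toy$o <- -log10(toy$value)
toy$e <- -log10(seq_len(nrow(toy)) / nrow(toy))

print(toy)
```

With the rows sorted this way, each point's color reflects its own Category rather than whichever row happened to sit at that position.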


library(ggplot2)

df_qq = ggplot(df.m, aes(e, o)) +
  geom_point(aes(color = Category)) +
  scale_colour_manual(values = cbbPalette) +
  geom_abline(intercept = 0, slope = 1) +
  ylab("Observed -log[10](p)") +
  xlab("Theoretical -log[10](p)")


QQ-plot I get w/ no highlighting of 137 targets

2 answers:

Answer 0 (score: 1)

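You can draw the targets in a separate geom call after the non-targets. Geoms are drawn in the order they are added, so the targets end up on top: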


cbbPalette <- c(Target = "#E69F00", `Non-Target` = "#000000")

df_qq = ggplot(df.m, aes(e, o)) +
  geom_abline(intercept = 0, slope = 1) +
  geom_point(aes(color = Category), data = df.m[df.m$Category == "Non-Target", ]) +
  geom_point(aes(color = Category), data = df.m[df.m$Category == "Target", ]) +
  scale_colour_manual(values = cbbPalette) +
  ylab("Observed -log[10](p)") +
  xlab("Theoretical -log[10](p)")

I've also added names to the palette, to make sure the correct color gets attached to each category when the order of the geom_point calls changes; otherwise the colors get mixed up.



Answer 1 (score: 1)


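df.m %>%
    arrange(Category) %>%
    ggplot(aes(e, o)) +
    geom_point(aes(color = Category)) # plus the remaining layers from the question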



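df.m %>%
    mutate(Category = as.factor(Category) %>% fct_relevel("Target")) %>%
    arrange(desc(Category)) %>%
    ggplot(aes(e, o)) +
    geom_point(aes(color = Category)) # plus the remaining layers from the question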

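I'm using fct_relevel from the forcats package because it's a very simple way to manipulate factor levels; you could also order the levels with base R. fct_relevel puts the Target level first, so when I arrange by Category I do it in descending order, so that the targets are again drawn last.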
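
The base R route mentioned above could look roughly like this. It is only a sketch on a made-up data frame (toy and its values are assumptions for illustration): factor() with an explicit levels argument stands in for fct_relevel, and order() with decreasing = TRUE stands in for arrange(desc(Category)).

```r
# Toy data (hypothetical values, not from the question)
toy <- data.frame(
  e = c(0.1, 0.7, 0.2, 0.4),
  o = c(0.1, 8.0, 0.2, 1.5),
  Category = c("Non-Target", "Target", "Non-Target", "Target"),
  stringsAsFactors = FALSE
)

# Put "Target" first in the level order, as fct_relevel does in the answer
toy$Category <- factor(toy$Category, levels = c("Target", "Non-Target"))

# Sort by descending level so the "Target" rows come last and are drawn last
toy <- toy[order(toy$Category, decreasing = TRUE), ]

print(toy)
```

Passing the reordered data frame to ggplot() then draws the target points after the non-targets within a single geom_point layer.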

