Plotting time series data (wide format)

Time: 2017-09-27 15:07:08

Tags: r ggplot2

I have a merged data frame in RStudio that looks like this:

head(my_df)
     Year        Month.x      val1       Month.y        val2
     2005-01     January     22.43099    January        26.3339814993271
     2005-02     February    26.62969    February       30.8841743766816
     2005-03     March       31.67926    March          27.9245803297443
     2005-04     April       23.65202    April          30.9088206490251
     2005-05     May         25.39969    May            26.494307897712
     2005-06     June        20.30036    June           18.9395527997218

The Year column covers January 2005 through December 2015. The val2 column contained a single NA, which I replaced with:

my_df[is.na(my_df)] <- ""

I need to plot this time series data (val1 and val2 against Year) in R. Initially, I tried to plot val1 against Year using R base graphics:

plot(my_df$Year, my_df$val1, type = "b", col = "blue", xlim=c(2005, 2015),
     lwd=1, pch = 1, cex = 0.2, xlab="Year", ylab="Value")

But I see a blank plot with this warning:

Warning message: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion

Can anyone help me figure out what went wrong, and how I can fix it?

EDIT1:

As suggested by @Santosh, the output of dput(my_df) is here:

structure(list(Year = c("2005-01", "2005-02", "2005-03", "2005-04", "2005-05", "2005-06", "2005-07", "2005-08", "2005-09", "2005-10", "2005-11", "2005-12", "2006-01", "2006-02", "2006-03", "2006-04", "2006-05", "2006-06", "2006-07", "2006-08", "2006-09", "2006-10", "2006-11", "2006-12", "2007-01", "2007-02", "2007-03", "2007-04", ............ "2014-11", "2014-12", "2015-01", "2015-02", "2015-03", "2015-04", "2015-05", "2015-06", "2015-07", "2015-08", "2015-09", "2015-10", "2015-11", "2015-12"), Month.x = c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", ............. "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"), val1 = c(22.4309863561828, 26.629689578869, 31.6792564287634, 23.6520192347222, 25.3996868508065, 20.3003602638889, 20.2621707795699, 22.3685403172043, 30.7087719888889, .......... 13.8973171652778, 11.3131837150538, 13.2869582405914, 16.4443315347222, 17.5448029758065, 22.8475848819444, 15.2890522220727), Month.y = c("January", "February", "March", "April", "May", "June", "July", "August", ............ "July", "August", "September", "October", "November", "December" ), val2 = c("26.3339814993271", "30.8841743766816", "27.9245803297443", "30.9088206490251", "26.494307897712", "18.9395527997218", "21.9441695597826", .......... "18.3722117002688", "17.8116471652778", "19.684253344086", "25.0107780152778", "20.6051117175464")), .Names = c("Year", "Month.x", "val1", "Month.y", "val2"), row.names = c(NA, -132L), class = "data.frame")

EDIT2:

Removing xlim=c(2005, 2015) from the plot call also produces an error.

1 Answer:

Answer 0 (score: 1)

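Here is a workaround that filters out the NA cases and plots the data:

  1. Select the columns you want to use
  2. Convert the Year string to a date
  3. Reshape the data from wide to long format
  4. Filter out the NA cases
  5. Plot with ggplot2

The tidyverse package loads all the packages you need (e.g. dplyr, ggplot2).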

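    library(tidyverse)

    my_df %>%
        # keep only the columns needed for the plot
        select(Year, val1, val2) %>%
        # "2005-01" -> Date (first of the month); val2 is character in the
        # dput output, so convert it back to numeric ("" becomes NA)
        mutate(Year = as.Date(paste0(Year, "-01")),
               val2 = as.numeric(val2)) %>%
        # wide -> long: one row per (Year, series) pair
        gather(val, value, -Year) %>%
        # drop rows with missing values
        filter(complete.cases(.)) %>%
        ggplot(aes(Year, value, color = val)) +
            geom_line()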
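
If you would rather stay with base graphics, the same two conversions should be enough. The following is only a minimal sketch, not a tested solution for the full data: it rebuilds a toy my_df from the six rows shown in head(my_df), and the helper names bad_x and Date are made up for illustration. It assumes the blank plot comes from Year being a character vector such as "2005-01", which plot() cannot coerce to a number.

```r
# Toy data copied from the head(my_df) output in the question (first six rows only).
my_df <- data.frame(
  Year = c("2005-01", "2005-02", "2005-03", "2005-04", "2005-05", "2005-06"),
  val1 = c(22.43099, 26.62969, 31.67926, 23.65202, 25.39969, 20.30036),
  val2 = c("26.3339814993271", "30.8841743766816", "27.9245803297443",
           "30.9088206490251", "26.494307897712", "18.9395527997218"),
  stringsAsFactors = FALSE
)

# "2005-01" is not a number, so numeric coercion yields NA for every row.
# bad_x is a hypothetical name, used here only to show the effect.
bad_x <- suppressWarnings(as.numeric(my_df$Year))

# Append a day so as.Date() can parse the string, and make val2 numeric again.
my_df$Date <- as.Date(paste0(my_df$Year, "-01"))
my_df$val2 <- as.numeric(my_df$val2)

# Plot val1, then overlay val2; ylim spans both series so neither is clipped.
plot(my_df$Date, my_df$val1, type = "b", col = "blue",
     ylim = range(my_df$val1, my_df$val2, na.rm = TRUE),
     xlab = "Year", ylab = "Value")
lines(my_df$Date, my_df$val2, type = "b", col = "red")
legend("topright", legend = c("val1", "val2"), col = c("blue", "red"), lty = 1)
```

With a Date on the x axis there is no need for xlim=c(2005, 2015); if you do want to restrict the range, the limits would have to be dates as well, for example as.Date(c("2005-01-01", "2015-12-01")).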