Question

这里我有温度时间序列面板数据，我打算对它进行分段回归或三次样条回归。首先，我在SO中快速查看了分段回归概念及其在R中的基本实现，初步了解了如何继续我的工作流程。在我的第一次尝试中，我尝试使用splines::ns包中的splines来运行样条回归，但我没有得到正确的条形图。对我来说，使用基线回归或分段回归或样条回归可以起作用。

以下是我的面板数据规范的一般情况：在下面显示的第一行是我的因变量，以自然对数项和自变量表示：平均温度，总降水量和11个温度箱和每个箱宽（AKA），bin的窗口）是3摄氏度。（＆lt; -6，-6~-3，-3~0，...> 21）。

可重复的示例：

以下是使用实际温度时间序列面板数据模拟的可重现数据：

set.seed(1) # make following random data same for everyone
dat <- data.frame(index=rep(c("dex111", "dex112", "dex113", "dex114", "dex115"), 
                          each=30),
                year=1980:2009,
                region= rep(c("Berlin", "Stuttgart", "Böblingen", 
                              "Wartburgkreis", "Eisenach"), each=30),
                ln_gdp_percapita=rep(sample.int(40, 30), 5), 
                ln_gva_agr_perworker=rep(sample.int(45, 30), 5),
                temperature=rep(sample.int(50, 30), 5), 
                precipitation=rep(sample.int(60, 30), 5), 
                bin1=rep(sample.int(32, 30), 5), 
                bin2=rep(sample.int(34, 30), 5), 
                bin3=rep(sample.int(36, 30), 5),
                bin4=rep(sample.int(38, 30), 5), 
                bin5=rep(sample.int(40, 30), 5), 
                bin6=rep(sample.int(42, 30), 5),
                bin7=rep(sample.int(44, 30), 5), 
                bin8=rep(sample.int(46, 30), 5), 
                bin9=rep(sample.int(48, 30), 5),
                bin10=rep(sample.int(50, 30), 5), 
                bin11=rep(sample.int(52, 30), 5))

请注意，除了极端温度值之外，每个箱子的温度间隔均等，因此每个箱子给出了相应温度间隔内的天数。

更新2：回归规范：

这是我的回归规范：

地区按i编制索引，年份由t编制索引。 y_it衡量产出， y_it∈ {ln GDP per capita, ln GVA per capita (by six sectors respectively)}，μ_i是一组区域固定效应，可解释各地区之间未观察到的常数差异。 θ_t是一组年度固定效应，可以灵活地解释共同趋势。 T_it ^ m is the number of days in the district i and year t`，其在第m个温度箱中具有一天的平均温度。每个室内温度箱宽3℃。当我对其进行样条曲线回归时，我需要添加两个固定的方式（由年份固定并按地区固定）。

新更新1 ：

我想完全重新定义我的意图。最近我发现了非常有趣的R包plm，它适用于面板数据。这是我的新解决方案，使用plm，效果很好：

library(plm)
pdf <- pdata.frame(dat, index = c("region", "year"))
model.b <- plm(ln_gdp_percapita ~ bin1+bin2+bin3+bin4+bin5+bin6+bin7+bin8+bin9+bin10+bin11, data = pdf, model = "pooling", effect = "twoways")

library(lmtest)    
coeftest(model.b)
res <- summary(model.b, cluster=c("c"))  ## add standard clustered error on it

新更新3 ：

summary(model.b, cluster=c("c"))$coefficients  # only render coefficient estimates table

新更新2：我的输出：

    > coeftest(model.b)

t test of coefficients:

         Estimate  Std. Error t value  Pr(>|t|)    
bin1   1.7773e-04  4.8242e-04  0.3684 0.7125716    
bin2   2.4031e-03  4.3999e-04  5.4617 4.823e-08 ***
bin3   7.9238e-04  3.9733e-04  1.9943 0.0461478 *  
bin4  -2.0406e-05  3.7496e-04 -0.0544 0.9566001    
bin5   9.9911e-04  3.6386e-04  2.7459 0.0060451 ** 
bin6   6.0026e-05  3.4915e-04  0.1719 0.8635032    
bin7   2.5621e-04  3.0243e-04  0.8472 0.3969170    
bin8  -9.5919e-04  2.7136e-04 -3.5347 0.0004099 ***
bin9  -1.8195e-04  2.5906e-04 -0.7023 0.4824958    
bin10 -5.2064e-04  2.7006e-04 -1.9279 0.0538948 .  
---
Signif. codes:  
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

所需的散点图：

下面是我想要实现的散点图。这只是一个模拟的散点图，其灵感来自NBER工作文件第32页，标题为Temperature Effects on Productivity and Factor Reallocation: Evidence from a Half Million Chinese Manufacturing Plants - 一个非门户版本here，并且可以通过从命令行运行以下内容来修复整个文件中的页面方向：
pdftk w23991.pdf cat 1-31 32-37east 38-40 41east 42-44 45east 46 output w23991-oriented.pdf

所需的散点图：

在该图中，黑点线是估计回归（基线或限制样条回归）系数，点蓝线是基于聚类标准误差的95％置信区间。

我刚与纸质作者联系过，他们只是简单地使用Excel来获得该情节。基本上，他们只使用Estimate，95％置信区间数据的左右两侧来产生一个图。我知道Excel中的那种情节非常容易，但我有兴趣在R中这样做。那可行吗？有什么想法吗？

我希望使用R代替使用Excel来渲染绘图的程序化方法。任何聪明的举动？

Answer 1

前言：我对这个问题的统计数据一点都不熟悉。接下来的内容可能对ggplot2入门很有帮助。让我知道你的想法。

set.seed(1) # make following random data same for everyone
dat <- data.frame(index=rep(c("dex111", "dex112", "dex113", "dex114", "dex115"), 
                              each=30),
                    year=1980:2009,
                    region= rep(c("Berlin", "Stuttgart", "Böblingen", 
                                  "Wartburgkreis", "Eisenach"), each=30),
                    ln_gdp_percapita=rep(sample.int(40, 30), 5), 
                    ln_gva_agr_perworker=rep(sample.int(45, 30), 5),
                    temperature=rep(sample.int(50, 30), 5), 
                    precipitation=rep(sample.int(60, 30), 5), 
                    bin1=rep(sample.int(32, 30), 5), 
                    bin2=rep(sample.int(34, 30), 5), 
                    bin3=rep(sample.int(36, 30), 5),
                    bin4=rep(sample.int(38, 30), 5), 
                    bin5=rep(sample.int(40, 30), 5), 
                    bin6=rep(sample.int(42, 30), 5),
                    bin7=rep(sample.int(44, 30), 5), 
                    bin8=rep(sample.int(46, 30), 5), 
                    bin9=rep(sample.int(48, 30), 5),
                    bin10=rep(sample.int(50, 30), 5), 
                    bin11=rep(sample.int(52, 30), 5))

library(plm)
pdf <- pdata.frame(dat, index=c("region", "year"))
model.b <- plm(ln_gdp_percapita ~ 
               bin1+bin2+bin3+bin4+bin5+bin6+bin7+bin8+bin9+bin10+bin11,
                   data=pdf, model="pooling", effect="twoways")
pdf$ln_gdp_percapita_predicted <- plm:::predict.plm(model.b, pdf)

library(ggplot2)
x <- ggplot(pdf, aes(y=ln_gdp_percapita_predicted, x=temperature))+
            geom_point()+
            geom_smooth(method=lm, formula=y~x, se=TRUE, level=.95)+ # see ?geom_smooth
            ylab("ln_gdp_percapita_predicted")+
            ggtitle("ln_gdp_percapita modeled as temperature")

ggsave("scatter_plot_2.png")
x

参考：R: Plotting panel model predictions using plm & pglm

更新：

从res进行绘图（有关更多信息，请参见??coefplot）：

res <- plm:::summary.plm(model.b, cluster=c("c"))

library(coefplot)
coefplot::coefplot(res)
ggsave("model.b.coefplot.png")

如何在R（新更新）中对纵向温度序列执行分段/样条回归？

1 个答案: