Long to wide, repeating columns when a row has data

Date: 2016-11-02 17:50:36

Tags: r vba r-table

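I'm curious how others would tackle this challenge.

Background

The data is for vegetation monitoring. It includes basic per-plot information and identifies which species are present in each plot along with their percent cover.

There are several rows of plot-specific information (date, location, distance) followed by the species rows. Within a species row, each value is the % cover of that species in the plot represented by that column.

A simplified view would be a grid like this: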

plot        1            4             5
date        5/3/2016     6/20/2016     6/22/2016
location    A            F             K
sp1                      15            30
sp2         5                          100
sp3         T            3             5

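What I'd like to end up with is a grid like the one below, which makes it easy to import a CSV into the database (the species % cover needs to reference the plot information in the RMDB). Leftmost column = table field names.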

plot        1          1          4           4           5           5           5
date        5/3/2016   5/3/2016   6/20/2016   6/20/2016   6/22/2016   6/22/2016   6/22/2016
location    A          A          F           F           K           K           K
species     sp2        sp3        sp1         sp3         sp1         sp2         sp3
cover %     5          T          15          3           30          100         5

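The database can easily "digest" this wide format and correctly populate the two tables (Plot & CoverPercent).

Approach?

I've thought of a few approaches, but I suspect there is a better way that I'm missing.

Here is what I've come up with so far: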

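  • Take the data from long to wide
  • Transpose
  • Add species and cover rows
  • Count the number of species for a given plot
  • Repeat the plot column based on the species count
  • Fill in the "species" and "cover" rows for the plot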
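
The steps in this list can be sketched in base R roughly as follows. This is only an illustration, not a tested solution for the real data: it assumes the grid has already been read into a character matrix with the field names as row names and blank cells stored as empty strings. The names `grid`, `plot_rows`, `species_rows`, and `wide` are made up for the example.

```r
# Hypothetical input: the simplified grid from the question as a character
# matrix, one column per plot, blank cells stored as "".
grid <- matrix(
  c("1", "5/3/2016",  "A", "",   "5",   "T",
    "4", "6/20/2016", "F", "15", "",    "3",
    "5", "6/22/2016", "K", "30", "100", "5"),
  nrow = 6,
  dimnames = list(c("plot", "date", "location", "sp1", "sp2", "sp3"), NULL)
)

plot_rows    <- c("plot", "date", "location")        # plot-specific rows (assumed)
species_rows <- setdiff(rownames(grid), plot_rows)   # everything else is a species

# TRUE wherever a species has a cover value in a plot (species x plot).
has_cover <- grid[species_rows, , drop = FALSE] != ""

# Count the species recorded in each plot.
n_species <- colSums(has_cover)

# Repeat each plot column once per recorded species.
plot_idx <- rep(seq_len(ncol(grid)), times = n_species)

# which() scans the matrix column by column, so the hits come out grouped by
# plot, in the same order as plot_idx.
hits <- which(has_cover, arr.ind = TRUE)

# Stack the repeated plot rows on top of the new species and cover rows.
wide <- rbind(
  grid[plot_rows, plot_idx, drop = FALSE],
  species   = species_rows[hits[, "row"]],
  `cover %` = grid[species_rows, , drop = FALSE][hits]
)
print(wide, quote = FALSE)
```

Keeping everything as character avoids having to decide what a trace value such as "T" means numerically; that decision can be left to the database import.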

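Initially I thought I could do this in VBA, but it looks like R would be the better/faster/cleaner way. The question is "how"?

I've recently done some R work with the table package, but I've spent much of the past year on VBA/SQL projects.

I'm curious how others would approach this transformation. Any ideas?

2 Answers:

Answer 0: (score: 1)

I would use an OO approach. Define a simple class that holds the plot-level information (ID, date, location) and has a dictionary of species and cover percentages: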

'Plot.cls
'Requires a project reference to "Microsoft Scripting Runtime" (for Scripting.Dictionary).
Option Explicit

Private Type PlotMembers
    PlotId As Long
    DataDate As Date
    Location As String
End Type

Private this As PlotMembers
Private mCover As Scripting.Dictionary

Private Sub Class_Initialize()
    Set mCover = New Scripting.Dictionary
End Sub

Public Property Get PlotId() As Long
    PlotId = this.PlotId
End Property

Public Property Let PlotId(inValue As Long)
    this.PlotId = inValue
End Property

Public Property Get DataDate() As Date
    DataDate = this.DataDate
End Property

Public Property Let DataDate(inValue As Date)
    this.DataDate = inValue
End Property

Public Property Get Location() As String
    Location = this.Location
End Property

Public Property Let Location(inValue As String)
    this.Location = inValue
End Property

Public Sub AddSpeciesCover(species As String, cover As String)
    mCover.Add species, cover
End Sub

Then give it a property that returns its data as a list of CSV rows:

'Also in Plot.cls
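Public Property Get CsvRows() As String
    'A plot with no species would make ReDim fail on an upper bound of -1,
    'so return an empty string in that case.
    If mCover.Count = 0 Then Exit Property
    Dim key As Variant
    Dim output() As String
    ReDim output(mCover.Count - 1)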
    Dim i As Long
    For Each key In mCover.Keys
        Dim temp(4) As String
        temp(0) = this.PlotId
        temp(1) = this.DataDate
        temp(2) = this.Location
        temp(3) = key
        temp(4) = mCover(key)
        output(i) = Join(temp, ",")
        i = i + 1
    Next key
    CsvRows = Join(output, vbCrLf)
End Property

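Then all you need to do is populate the objects from your input data. Note that the sample usage here assumes the active worksheet looks essentially like the top grid in your question, with its upper-left corner in A1. It should be fairly easy to change this to match however you need to gather the data: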

Public Sub SampleUsage()
    Dim plots As New Collection

    With ActiveSheet
        Dim col As Long
        For col = 2 To 4
            Dim current As Plot
            Set current = New Plot
            current.PlotId = .Cells(1, col).Value
            current.DataDate = .Cells(2, col).Value
            current.Location = .Cells(3, col).Value
            Dim r As Long
            For r = 4 To 6
                Dim cover As String
                cover = .Cells(r, col).Value
                If cover <> vbNullString Then
                    current.AddSpeciesCover .Cells(r, 1).Value, cover
                End If
            Next
            plots.Add current
        Next

    End With

    For Each current In plots
        Debug.Print current.CsvRows
    Next
End Sub

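Note that this is just a skeleton to demonstrate the gist of the approach - it needs error handling, more robust formatting, and so on to be production ready.

Answer 1: (score: 1)

Simply reshape the data frame in R using the melt() function from the reshape2 package. The code below assumes that the transposed view of your posted data is the actual format, as you mentioned in the comments: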

library(reshape2)

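# Fields are tab-separated; "\t" is written out so the tabs survive copy/paste.
data <- paste(
  "plot\tdate\tlocation\tsp1\tsp2\tsp3",
  "1\t5/3/2016\tA\t\t5\tT",
  "4\t6/20/2016\tF\t15\t\t3",
  "5\t6/22/2016\tK\t30\t100\t5",
  sep = "\n")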

df <- read.table(text=data, header=TRUE, sep="\t", stringsAsFactors = FALSE)
df    
#   plot      date location sp1 sp2 sp3
# 1    1  5/3/2016        A  NA   5   T
# 2    4 6/20/2016        F  15  NA   3
# 3    5 6/22/2016        K  30 100   5

mdf <- melt(df, id.vars=c("plot", "date", "location"),
            variable.name="species", na.rm = TRUE, value.name="cover %")
mdf <- mdf[order(as.Date(mdf$date, format = "%m/%d/%Y")), ]   # ORDER BY DATE (parsed as a date, not as text)
rownames(mdf) <- seq_len(nrow(mdf))               # RESET ROW NAMES
mdf

#   plot      date location species cover %
# 1    1  5/3/2016        A     sp2       5
# 2    1  5/3/2016        A     sp3       T
# 3    4 6/20/2016        F     sp1      15
# 4    4 6/20/2016        F     sp3       3
# 5    5 6/22/2016        K     sp1      30
# 6    5 6/22/2016        K     sp2     100
# 7    5 6/22/2016        K     sp3       5
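
If the import really needs the field names down the left-hand column, as in the target grid in the question, the long result can be transposed before it is written out. A rough base R sketch follows. Here `long` is a hand-built stand-in for `mdf`, and the lines are printed rather than written to a real file; both are assumptions made only to keep the example self-contained.

```r
# Stand-in for the melted data frame (same values as the mdf output).
long <- data.frame(
  plot     = c(1, 1, 4, 4, 5, 5, 5),
  date     = c("5/3/2016", "5/3/2016", "6/20/2016", "6/20/2016",
               "6/22/2016", "6/22/2016", "6/22/2016"),
  location = c("A", "A", "F", "F", "K", "K", "K"),
  species  = c("sp2", "sp3", "sp1", "sp3", "sp1", "sp2", "sp3"),
  cover    = c("5", "T", "15", "3", "30", "100", "5"),
  stringsAsFactors = FALSE
)
names(long)[names(long) == "cover"] <- "cover %"

# Convert every column to character, then transpose so that each field
# becomes a row and each plot/species record becomes a column.
wide <- t(sapply(long, as.character))

# Build one CSV line per field: field name first, then the values.
csv_lines <- paste(rownames(wide),
                   apply(wide, 1, paste, collapse = ","),
                   sep = ",")
writeLines(csv_lines)
```

This does no quoting, so it is only safe while none of the values contain commas. `write.table(wide, sep = ",", col.names = FALSE)` handles quoting if that becomes a concern.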