Merging multiple datasets into one using Stata

Time: 2018-05-17 20:15:56

Tags: dataframe stata

I am trying to merge several smaller datasets into one complete dataset:

cd "\\files"
use "\\files\Creatinine.dta"

*merging with report data for baseline demographics *
merge m:1 id using "Archive\Report.dta"
* keeping only those transplanted 2002-2015 *
drop if tx1 <= date("01/01/2002", "DMY") | tx1 >= date("31/12/2015", "DMY")
drop _merge
* labelling variables *
label define org 1 "Heart" 2 "Lung" 3 "Liver" 5 "Multiple" 6 "Small Bowel" 7 "Pancreas" 8 "Stomach"
label values organ1 organ2 organ3 org  
label values multi1 multi2 multi3 multi4 org
label variable organ1 "First Organ"
label variable organ2 "Second Organ"
label variable organ3 "Third Organ"
label variable donor_type1 "First Donor Type"
label variable tx1 "Date of First Transplant"
label variable tx2 "Date of Second Transplant"
label variable tx3 "Date of Third Transplant"
label variable dob "Date of Birth"
label variable tx1_loc "First Transplant Location"
label variable multi1 "Multiple Organ 1"
label variable multi2 "Multiple Organ 2"
label variable multi3 "Multiple Organ 3"
label variable multi4 "Multiple Organ 4"
label variable censor_date "Censor Date"
label define loc 1 "Hospital" 
label values tx1_loc loc
label define sex1 1 "Male" 2 "Female"
label values sex sex1
label variable sex "Sex of Child"
label define donor 1 "Living" 2 "Deceased" 
label values donor_type1 donor
order dob sex tx1 tx1_loc organ1 donor_type1 multi1 multi2 multi3 multi4 organ2 tx2_date organ3 tx3_date censor_date DeathDate, after(id)


*Data Cleaning *
generate dateCollected = date(DateCollected, "DMY")
format %tdCCYY/NN/DD dateCollected
codebook dateCollected
drop DateCollected
rename dateCollected DateCollected
order DateCollected TimeCollected, after(Test)

*dropping duplicates *
sort id DateCollected TimeCollected Result
quietly by id DateCollected TimeCollected Result: gen dup=cond(_N==1,0,_n)
drop if dup > 1
drop dup

*save *
 save "\\files\Injury.dta"

My code runs as far as this line:

generate dateCollected = date(DateCollected, "DMY")

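However, it gives me a type mismatch error.

I think this is caused by the date formats differing between the creatinine file and the report file.

Please take a look and advise. Many thanks.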

Data

Creatinine.dta (only one result shown; each ID has multiple results)

id      dob         sex     tx1         tx1_loc     organ1  donor_type1  censor_date    DeathDate   DateCollected   Test               Result   Units
2010003 15-Apr-07   Female  29-Jan-09   Hospital    Heart   Deceased     30-Jun-16                  12/5/2007       Creatinine,blood   25       umol/L


Report.dta (only one ID shown)

id      dob         sex  tx1        tx1_loc organ1  donor_type1 multi1  multi2  multi3  multi4  organ2  tx2_date    organ3  tx3_date censor_date DeathDate
2010003 15-Apr-07   2    29-Jan-09  1       1       2                                                                                30-Jun-16  


1 answer:

Answer 0: (score: 3)

Note that the problem surfaces after the merge has been performed.

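You get the r(109) error because you are trying to generate a new variable by applying the date() function to a numeric variable. This function requires a string (variable) as input.

I am not sure why you want to do this, but if you just want to create dateCollected and use it for further work while keeping DateCollected as a backup, you can simply clone it: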

clonevar dateCollected = DateCollected
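
If the storage type of DateCollected is not known in advance, one possible approach is to test it first and only call date() when the variable is a string. The following is a minimal sketch rather than a definitive fix. The one-observation toy dataset is an assumption made so that the example runs on its own; on real data the first three commands would be replaced by the dataset already in memory.

```stata
* Toy data (assumption): DateCollected is already a numeric daily date
clear
set obs 1
generate DateCollected = date("12/05/2007", "DMY")

* Inspect the storage type before deciding how to proceed
describe DateCollected

* confirm sets _rc to 0 only when the variable is a string
capture confirm string variable DateCollected
if _rc == 0 {
    * String input: convert it to a Stata daily date
    generate dateCollected = date(DateCollected, "DMY")
}
else {
    * Numeric input: copy it instead of passing it to date()
    clonevar dateCollected = DateCollected
}
format %tdCCYY/NN/DD dateCollected
```

With the toy data above the else branch runs, so dateCollected ends up as a numeric copy of DateCollected and no r(109) error is raised.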

Edit

To elaborate on my comment:

. clear
. set obs 1
number of observations (_N) was 0, now 1

. generate DateCollected_String = "12/05/2007"

. generate DateCollected = date(DateCollected_String, "DMY")
. format %tdDD/NN/CCYY DateCollected

. browse

. generate dateCollected = date(DateCollected, "DMY")
type mismatch
r(109);
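
The log above suggests a possible shortcut: if DateCollected already holds a numeric Stata daily date after the merge (an assumption that should be checked with describe first), the generate, drop and rename steps may be unnecessary, because only the display format needs to change. A minimal sketch, again using a one-observation toy dataset as a stand-in for the real data:

```stata
* Toy data (assumption): a numeric daily date, as in the log above
clear
set obs 1
generate DateCollected = date("12/05/2007", "DMY")

* Change only how the date is displayed; the stored value is untouched
format %tdCCYY/NN/DD DateCollected
list DateCollected
```

The format command alters the display only, so sorting and comparisons on DateCollected behave exactly as before.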