Merging/matching two data frames

Posted: 2016-12-12 19:22:27

Tags: r

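I want to merge two data frames, y$genes and symbol_annotations, using the row names of y and the "hgnc_symbol" column of symbol_annotations, and create a column labeled "Symbol" (y$genes$Symbol) that lists all the matches. Where there is no match between "hgnc_symbol" and the row names, I want the cell filled with NA rather than left empty. I keep getting an error because the two data frames do not have the same dimensions and contain NAs, and I am not sure how to fix it.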
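What is described here is a keyed lookup: for each row name, find the matching annotation row and return its symbol, or NA when there is none. Below is a minimal base-R sketch using match() on small stand-in data frames built from rows shown in the head() output further down. The ID "ENSG_NOT_ANNOTATED" is hypothetical. The sketch also assumes that the row names (Ensembl gene IDs) should be matched against ensembl_gene_id, since that column holds the same kind of ID, with hgnc_symbol as the value returned.

```r
# Stand-in for y$genes: only the row names (Ensembl IDs) matter for the lookup.
# "ENSG_NOT_ANNOTATED" is a hypothetical ID that has no annotation row.
genes <- data.frame(SM01 = c(0L, 0L, 0L),
                    row.names = c("ENSG00000210049", "ENSG00000223795", "ENSG_NOT_ANNOTATED"))

# Stand-in for symbol_annotations, using rows copied from its head() output.
symbol_annotations <- data.frame(
  ensembl_gene_id = c("ENSG00000210049", "ENSG00000211459", "ENSG00000223795"),
  hgnc_symbol     = c("MT-TF", "MT-RNR1", NA),
  stringsAsFactors = FALSE
)

# For each row name, match() gives the position of its first hit in
# ensembl_gene_id, or NA if the ID is absent (assumed key column).
idx <- match(rownames(genes), symbol_annotations$ensembl_gene_id)

# Indexing with NA yields NA, so unmatched genes get NA instead of an empty cell.
genes$Symbol <- symbol_annotations$hgnc_symbol[idx]
genes
```

The second gene matches but carries a missing symbol, and the third has no annotation row at all. Both end up as NA in Symbol.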

> read.counts <- read.table("gene_counts.txt", header=TRUE)
> row.names(read.counts) <- read.counts$Geneid
> treatment <- factor(treatment)
> head(treatment)
[1] T0          IL2         IL2.ZA      IL2.OKT3    IL2.OKT3.ZA T0         
Levels: T0 IL2 IL2.OKT3 IL2.OKT3.ZA IL2.ZA
> y <- DGEList(read.counts, group=treatment, genes=read.counts)
> head(y$genes)
                SM01 SM02 SM03 SM04 SM05 SM06 SM07 SM08 SM09 SM10 SM11 SM12 SM13 SM14 SM15 SM16 SM17 SM18 SM19
ENSG00000223972    0    1    1    1    0    0    1    0    0    3    0    0    1    2    0    0    0    0    1
ENSG00000227232   33   31   13   15   20   43   36   32   43   43   61   42   92   73   80   64   33   25   28
ENSG00000278267    1    0    1    0    0    5    3    1    1    2    1    0    2    4    6    0    2    2    1
ENSG00000243485    0    0    0    0    0    0    0    0    0    0    0    0    0    0    2    0    0    0    0
ENSG00000237613    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
ENSG00000268020    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
                SM20 SM21 SM22 SM23 SM24 SM25 SM26 SM27 SM28 SM29 SM30
ENSG00000223972    0    0    0    0    1    0    0    0    0    0    0
ENSG00000227232   15   60   13   29   22   28   87   42   61   67   74
ENSG00000278267    2    3    5    1    3    4    4    3    2    4    3
ENSG00000243485    0    0    0    0    0    1    0    0    0    0    1
ENSG00000237613    0    0    0    0    0    0    0    0    0    0    0
ENSG00000268020    0    0    0    0    0    0    0    0    0    0    0
> head(symbol_annotations, n=10)
   ensembl_gene_id hgnc_symbol
1  ENSG00000210049       MT-TF
2  ENSG00000211459     MT-RNR1
3  ENSG00000210077       MT-TV
4  ENSG00000210082     MT-RNR2
5  ENSG00000209082      MT-TL1
6  ENSG00000198888      MT-ND1
7  ENSG00000210100       MT-TI
8  ENSG00000223795        <NA>
9  ENSG00000210107       MT-TQ
10 ENSG00000210112       MT-TM
> dim(symbol_annotations)
[1] 58069     2
> dim(y$genes)
[1] 58051    30
> y$genes$Symbol <- merge((rownames(y)), symbol_annotations[,c(2)])
Error in if (n > 0) c(NA_integer_, -n) else integer() : 
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In rep.fac * nx : NAs produced by integer overflow
2: In .set_row_names(as.integer(prod(d))) :
  NAs introduced by coercion to integer range
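The warnings (rep.fac * nx, .set_row_names) suggest that merge() is not matching values at all. When given two plain vectors, merge() wraps them in data frames whose columns have different names, finds no shared column to join on, and falls back to a Cartesian product, pairing every row name with every symbol. The sketch below checks that arithmetic using the sizes reported by dim(). It then uses hypothetical toy tables (g1, g2, SYM_A and so on are made up) to show a match()-based lookup and why the tables' different row counts do not affect it. As before, it assumes the key is ensembl_gene_id, because the row names of y are Ensembl IDs. The 18 extra annotation rows (58069 vs 58051) might be duplicated IDs or IDs absent from the count table. Running duplicated() on the real data would show which.

```r
# Sizes taken from the dim() calls in the session above.
n_genes <- 58051  # nrow(y$genes)
n_annot <- 58069  # nrow(symbol_annotations)

# A Cartesian product has one row per (gene, annotation) pair.
n_pairs <- n_genes * n_annot      # computed as a double: about 3.37e9
n_pairs > .Machine$integer.max    # TRUE: more rows than R's integer limit allows

# Tiny demonstration: the two vectors share no column name to join on,
# so merge() returns every combination (2 x 3 rows) instead of matching values.
cross <- merge(c("g1", "g2"), c("SYM_A", "SYM_B", "SYM_C"))
nrow(cross)  # 6

# Hypothetical toy tables: three genes, four annotation rows, one ID ("g2") listed twice.
genes <- data.frame(SM01 = c(0L, 33L, 1L), row.names = c("g1", "g2", "g3"))
annot <- data.frame(ensembl_gene_id = c("g2", "g2", "g1", "g9"),
                    hgnc_symbol     = c("SYM_A", "SYM_B", NA, "SYM_Z"),
                    stringsAsFactors = FALSE)

# IDs that appear more than once in the annotation table.
dup_ids <- unique(annot$ensembl_gene_id[duplicated(annot$ensembl_gene_id)])

# match() returns exactly one index per gene (the first hit, or NA), so the
# result lines up with rownames(genes) no matter how many rows annot has.
genes$Symbol <- annot$hgnc_symbol[match(rownames(genes), annot$ensembl_gene_id)]

# A keyed merge() keeps unmatched genes with all.x = TRUE, but repeats "g2"
# once per symbol, so it returns 4 rows for 3 genes.
keyed <- merge(data.frame(ensembl_gene_id = rownames(genes), stringsAsFactors = FALSE),
               annot, by = "ensembl_gene_id", all.x = TRUE)
nrow(keyed)  # 4

# In the original session the same lookup would be (requires edgeR and y):
# y$genes$Symbol <- symbol_annotations$hgnc_symbol[
#   match(rownames(y), symbol_annotations$ensembl_gene_id)]
```

The match() route keeps the order of rownames(y), which keeps y$genes aligned with y$counts. By contrast, merge() sorts its output by the key by default and can repeat genes, so its result cannot simply be assigned back into y$genes.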
