How do I index outliers?

Date: 2012-12-30 11:34:00

Tags: r dataframe which outliers

I have the following data. How can I determine which author has the most publications?

I tried this:

   which(status$researchers == max(status$publications))

but it does not seem to work.

#PUBLICATIONS

researchers = c("Smith", "Johnson", "Williams", "Brown", "Jones", "Miller", "Davis", "García", "Rodriguez", "Wilson", "Martinez", "Anderson", "Taylor", "Thomas", "Hernandez", "Moore", "Martin", "Jackson", "Thompson", "White", "Lopez", "Lee", "Gonzalez", "Harris", "Clark", "Lewis", "Robinson", "Walker", "Perez", "Hall", "Young", "Allen", "Sanchez", "Wright", "King", "Scott", "Green", "Baker", "Adams", "Nelson", "Hill", "Ramirez", "Campbell", "Mitchell", "Roberts", "Carter", "Phillips", "Evans", "Turner", "Stapel", "Torres", "Parker", "Collins", "Edwards", "Stewart", "Flores", "Morris", "Nguyen", "Murphy", "Rivera", "Cook", "Rogers", "Morgan", "Peterson", "Cooper", "Reed", "Bailey", "Bell", "Gomez", "Kelly", "Howard", "Ward", "Cox", "Diaz", "Richardson", "Wood", "Watson", "Brooks", "Bennett", "Gray", "James", "Reyes", "Cruz", "Hughes", "Price", "Myers", "Long", "Foster ", "Sanders", "Ross", "Morales", "Powell", "Sullivan", "Russell", "Ortiz", "Jenkins", "Gutierrez", "Perry", "Butler", "Barnes", "Fisher", "De Jong", "Jansen", "De Vries", "vd Berg", "Van Dijk", "Bakker", "Janssen", "Visser", "Smit", "Meijer", "De Boer", "Mulder", "De Groot", "Bos", "Smeesters", "Vos", "Peters", "Hendriks", "Van Leeuwen", "Dekker", "Brouwer", "De Wit", "Dijkstra", "Smits", "De Graaf", "Van der Meer", "Muller", "Schmidt", "Schneider", "Fischer", "Meyer", "Weber", "Schulz", "Wagner", "Becker", "Hoffmann", "Wagemakers",  "Molenaar", "Jansen", "White", "Bargh", "Dijksterhuis", "Poldermans", "Kanazawa", "Lynne", "Ling", "Vorst", "Borsboom", "Wicherts")

articles = data.frame(cbind(researchers, publications))
write.table(articles, file = "scientific status.txt", sep = " ")

status = read.table("scientific status.txt", header = TRUE, sep = "", quote = "\"'")     

2 Answers:

Answer 0 (score: 2)

This is not a general answer, but in this case you only need to extract the duplicated entries.

researchers[duplicated(researchers)]
[1] "Jansen" "White"  ## these 2 authors each have 1 more publication than the others!
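
A minimal sketch of what duplicated() returns, assuming a small made-up vector (the names in `authors` are hypothetical and not the question's data). It flags every occurrence after the first, so a name that appears three times is returned twice; wrapping the result in unique() gives one entry per repeated name.

```r
# Hypothetical toy vector, not the data from the question
authors <- c("Smith", "Jansen", "White", "Jansen", "White", "Jansen")

# duplicated() is TRUE for every occurrence after the first one
dup <- authors[duplicated(authors)]
print(dup)

# unique() collapses the result to one entry per repeated name
repeated <- unique(dup)
print(repeated)
```

This only separates "appears once" from "appears more than once"; it does not rank authors by how often they appear.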

To see the outliers, you can do this, for example:

plot(table(researchers))


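Answer 1 (score: 2)

It is not clear what your data represents. If it is already aggregated by author, i.e. there is one row per author and the publications column contains the number of publications, do: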

status$researchers[which.max(status$publications)]
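
As a rough illustration of the aggregated case, here is a sketch on a made-up data frame (the three rows and their counts are assumptions, not the question's data). It also shows why the attempt in the question returns nothing: it compares author names with the maximum count, so no element matches.

```r
# Hypothetical aggregated data: one row per author
status <- data.frame(
  researchers  = c("Smith", "Jansen", "White"),
  publications = c(12, 40, 7),
  stringsAsFactors = FALSE
)

# The question's attempt: names are compared with a number, so nothing matches
attempt <- which(status$researchers == max(status$publications))
print(length(attempt))

# which.max() gives the row index of the (first) largest count
top <- status$researchers[which.max(status$publications)]
print(top)

# If ties matter, compare the counts themselves against the maximum
all_top <- status$researchers[status$publications == max(status$publications)]
print(all_top)
```

Note that which.max() returns only the first maximum, so the last expression is the safer choice when several authors may share the top count.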

If instead your data is not aggregated, i.e. there is one row per article, you can do:

tail(sort(table(status$researchers)), 1)
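
A small sketch of the unaggregated case, again on made-up rows (the `status` data frame below is hypothetical). table() counts the rows per author, sort() orders the counts ascending, and tail(..., 1) keeps the last, i.e. largest, one.

```r
# Hypothetical unaggregated data: one row per article
status <- data.frame(
  researchers = c("Smith", "Jansen", "White", "Jansen", "White", "Jansen"),
  stringsAsFactors = FALSE
)

# Count the articles per author
counts <- table(status$researchers)

# Largest count, kept together with the author's name
top <- tail(sort(counts), 1)
top_name  <- names(top)
top_count <- as.integer(top)
print(top)

# All authors sharing the maximum, in case of ties
all_top <- names(counts)[counts == max(counts)]
print(all_top)
```

tail(..., 1) reports a single author even when several are tied, which is why the last line compares every count against the maximum.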