How do I compute the most frequently used CloseReasonTypes for each post in dataexplorer?

Time: 2014-05-12 12:25:49

Tags: sql sql-server join sql-server-2014 dataexplorer

I started writing this query, and I find it hard to see from its results why each question should be closed.

select
   TOP ##Limit:int?38369## -- The maximum value the hardware can handle.
   Posts.Id as [Post Link], -- Question title.
   Count(PendingFlags.PostId) as [Number of pending flags], -- Number of pending flags per question.
   Posts.OwnerUserId as [User Link], -- Click the column to see whether the same user often asks off-topic questions.
   Users.Reputation as [User Reputation], -- Interesting to see that such questions are sometimes asked by high-rep users.
   Posts.Score as [Votes], -- Interesting to see that some questions have more than 100 upvotes.
   Posts.AnswerCount as [Number of Answers], -- I thought we shouldn't answer off-topic posts.
   Posts.FavoriteCount as [Number of Stars], -- Some questions seem to be very helpful :) .
   Posts.CreationDate as [Asked on], -- The older the question, the higher the chance that its flags never get reviewed.
   Posts.LastActivityDate as [last activity], -- Similar effect as with Posts.CreationDate.
   Posts.LastEditDate as [modified on],
   Posts.ViewCount
from Posts
   LEFT OUTER JOIN Users on Users.Id = Posts.OwnerUserId
   INNER JOIN PendingFlags on PendingFlags.PostId = Posts.Id
where Posts.ClosedDate IS NULL -- The question is not closed.
group by Posts.Id, Posts.OwnerUserId, Users.Reputation, Posts.Score, Posts.FavoriteCount, Posts.AnswerCount, Posts.CreationDate, Posts.LastActivityDate, Posts.LastEditDate, Posts.ViewCount
order by Count(PendingFlags.PostId) desc; -- Questions with more flags are more likely to get them handled, and more likely to be off-topic (since several users have already reviewed the question).

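Since each question has several flags, I can't use a simple table to show the reason given for each flag, but I think it would be relevant to show the most common value of CloseReasonTypes.Id for each post. This raises two problems: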

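  • First: after looking at this query, I think I should join CloseReasonTypes to PendingFlags so that reason names are shown instead of their numbers. But I can't see a common field between Posts and PendingFlags to use for this, and since I use from posts as the base of the joined tables, I don't know how to write the JOIN.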

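  • Second: I don't know how to select the most common close reason on each row. Several questions seem to discuss similar cases, but they ask how to find the most common value across a whole table, which yields a table with a single column and a single row. I can't use their answers, because I need this computed over the flags of each post.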

1 answer:

Answer 0 (score: 1)

While it is not exactly what you asked for, I believe this query will give you a good start.

select
    PostId as [Post Link], 
    duplicate = sum(case when closereasontypeid = 101 then 1 else 0 end), 
    offtopic = sum(case when closereasontypeid = 102 then 1 else 0 end),
    unclear = sum(case when closereasontypeid = 103 then 1 else 0 end),
    toobroad = sum(case when closereasontypeid = 104 then 1 else 0 end),
    opinion = sum(case when closereasontypeid = 105 then 1 else 0 end),
    ot_superuser = sum(case when CloseAsOffTopicReasonTypeId = 4 then 1 else 0 end),
    ot_findexternal = sum(case when CloseAsOffTopicReasonTypeId = 8 then 1 else 0 end),
    ot_serverfault = sum(case when CloseAsOffTopicReasonTypeId = 7 then 1 else 0 end),
    ot_lackinfo = sum(case when CloseAsOffTopicReasonTypeId = 12 then 1 else 0 end),
    ot_typo = sum(case when CloseAsOffTopicReasonTypeId = 11 then 1 else 0 end)
from pendingflags
where 
    flagtypeid in (13,14)   -- Close flags
    and creationdate > '2014-04-15'
group by PostId

This only looks at close flags raised since April 15 of this year, and it returns about 23,500 records.
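To go from per-reason counts to a single most common reason per post, one option is to count flags per (post, reason) pair and rank the reasons within each post. The following is only a sketch: it uses the PendingFlags columns already seen above, the names ReasonCounts, FlagCount and ReasonRank are made up for illustration, and breaking ties by the lower CloseReasonTypeId is an arbitrary assumption.

```sql
-- Sketch: most frequent close reason per post among pending close flags.
-- Assumption: ties are broken by the lower CloseReasonTypeId.
WITH ReasonCounts AS (
    SELECT
        PostId,
        CloseReasonTypeId,
        COUNT(*) AS FlagCount,
        -- Rank reasons inside each post, most frequent first.
        ROW_NUMBER() OVER (
            PARTITION BY PostId
            ORDER BY COUNT(*) DESC, CloseReasonTypeId
        ) AS ReasonRank
    FROM PendingFlags
    WHERE FlagTypeId IN (13, 14)            -- Close flags
      AND CloseReasonTypeId IS NOT NULL
    GROUP BY PostId, CloseReasonTypeId
)
SELECT
    PostId AS [Post Link],
    CloseReasonTypeId AS [Most common reason id],
    FlagCount AS [Flags with that reason]
FROM ReasonCounts
WHERE ReasonRank = 1                         -- keep only the top reason per post
ORDER BY FlagCount DESC;
```

Replacing ROW_NUMBER with RANK would keep every tied reason instead of picking one.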

I believe the Data Explorer does not contain deleted posts, so those are not included in the results.

The query will need to be modified if or when close reasons are added or removed.
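One way to avoid hard-coding the reason ids might be to return one row per (post, reason) and take the reason name from CloseReasonTypes, which also covers the JOIN asked about in the question. This is a rough sketch rather than a tested query: it assumes CloseReasonTypes has Id and Name columns, and the aliases pf and crt are only illustrative.

```sql
-- Sketch: one row per (post, close reason) instead of one column per reason,
-- so added or removed close reasons need no change to the query.
-- Assumption: CloseReasonTypes exposes Id and Name columns.
SELECT
    pf.PostId AS [Post Link],
    crt.Name AS [Close reason],
    COUNT(*) AS [Number of flags]
FROM PendingFlags AS pf
    INNER JOIN CloseReasonTypes AS crt
        ON crt.Id = pf.CloseReasonTypeId     -- PendingFlags carries the reason id
WHERE pf.FlagTypeId IN (13, 14)              -- Close flags
GROUP BY pf.PostId, crt.Name
ORDER BY pf.PostId, COUNT(*) DESC;
```

The trade-off is a long result (several rows per post) rather than a wide one, so comparing reasons for one post means reading adjacent rows.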