四分位数范围 - 下限,上限和中位数

时间:2015-04-29 13:52:01

标签: sql-server vb.net excel

我试图根据一系列可以是任意长度的数字来计算四分位数范围,例如。

1,  1,  5,  6,  7,  8,  2,  4,  7,  9,  9,  9,  9

我需要从这个四分位数范围内得到的值是:

  • 上四分位
  • 中值
  • 下四分位

如果我将上述数字数组转储到Microsoft Excel(A:M列),那么我可以使用以下公式:

  • =QUARTILE.INC(A1:M1,1)
  • =QUARTILE.INC(A1:M1,2)
  • =QUARTILE.INC(A1:M1,3)

得到我的答案:

  • 4
  • 7
  • 9

我现在需要在SQL Server或VB.NET中计算出这3个值。我可以使用这些语言中的任何一种格式或对象获取数组值,但是我找不到任何存在的函数,例如Excel具有的QUARTILE.INC函数。

有谁知道如何在SQL Server或VB.NET中实现这一目标?

3 个答案:

答案 0 :(得分:4)

可能有一种更简单的方法,但要获得Quartiles,您可以使用NTILE (Transact-SQL)

  

将有序分区中的行分配到指定数量的组中。这些组从一开始编号。对于每一行,NTILE返回该行所属的组的编号。

所以对于你的数据:

SELECT  1 Val
INTO    #temp
UNION ALL
SELECT  1
UNION ALL
SELECT  5
UNION ALL
SELECT  6
UNION ALL
SELECT  7
UNION ALL
SELECT  8
UNION ALL
SELECT  2
UNION ALL
SELECT  4
UNION ALL
SELECT  7
UNION ALL
SELECT  9
UNION ALL
SELECT  9
UNION ALL
SELECT  9
UNION ALL
SELECT  9

-- NTILE(4) specifies you require 4 partitions (quartiles)
SELECT  NTILE(4) OVER ( ORDER BY Val ) AS Quartile ,
        Val
INTO #tempQuartiles
FROM    #temp

SELECT * 
FROM #tempQuartiles

DROP TABLE #temp
DROP TABLE #tempQuartiles

这会产生:

Quartile    Val
1           1
1           1
1           2
1           4
2           5
2           6
2           7
3           7
3           8
3           9
4           9
4           9
4           9

通过这个你可以找出你想要的东西。

所以修改SELECT你可以这样做:

SELECT Quartile, MAX(Val) MaxVal
FROM #tempQuartiles
WHERE Quartile <= 3
GROUP BY Quartile

生产:

Quartile    MaxVal
1           4
2           7
3           9

答案 1 :(得分:3)

我们创建了一个User-Defined-Type,将其用作函数参数,然后以这种方式使用它。

我们的实现使用与Excel Percentile函数相同的计算。

CREATE TYPE [dbo].[floatListType] AS TABLE (
    [value] FLOAT NOT NULL
);

GO

CREATE FUNCTION [dbo].[getPercentile]
(
    @data floatListType readonly,
    @percentile float
)
RETURNS float
AS
BEGIN
    declare @values table
    (
        value float,
        idx   int
    );

    insert into @values
    select value, ROW_NUMBER() OVER (order by value) - 1 as idx
    from @data;

    declare @cnt int = (select count(*) from @values)
        , @n float = (@cnt - 1) * @percentile + 1
        , @k int = FLOOR(@n)
        , @d float = @n - @k;

    if (@k = 0)
        return (select value from @values where idx = 0)
    if (@k = @cnt)
        return (select value from @values where idx = @cnt - 1)
    if (@k > 0 AND @k < @cnt)
        return (select value from @values where idx = @k - 1)
            + @d * ((select value from @values where idx = @k)
            - (select value from @values where idx = @k - 1))

    return null;
END

您可以像这样使用它来获得中位数和四分位数(因为Q1仅为0.25百分位数),例如:

declare @values floatListType;

insert into @values
select value from #mytable

select getPercentile(@values, 0.25) as Q1,
    getPercentile(@values, 0.5) as median,
    getPercentile(@values, 0.75) as Q3

答案 2 :(得分:1)

如果您想要一个SQL Server解决方案,几年前I posted an Interquartile Range procedure on my blog。它基于动态SQL,因此您可以将有权访问的任何列插入其中。它还没有经过充分测试,我当时还在学习绳索,现在代码有点旧了,但它可以满足您的开箱即用需求,或至少提供代码的起点你自己的解决方案以下是代码的要点 - 请点击我的博客链接进行深入讨论。

CREATE PROCEDURE [Calculations].[InterquartileRangeSP]
@DatabaseName as nvarchar(128) = NULL, @SchemaName as nvarchar(128), @TableName as nvarchar(128),@ColumnName AS nvarchar(128), @PrimaryKeyName as nvarchar(400), @OrderByCode as tinyint = 1, @DecimalPrecision AS nvarchar(50)
AS
SET @DatabaseName = @DatabaseName + ‘.’
DECLARE @SchemaAndTableName nvarchar(400)
SET @SchemaAndTableName = ISNull(@DatabaseName, ”) + @SchemaName + ‘.’ + @TableName
DECLARE @SQLString nvarchar(max)

SET @SQLString = ‘DECLARE @OrderByCode tinyint,
@Count bigint,
@LowerPoint bigint,
@UpperPoint bigint,
@LowerRemainder decimal(38,37), — use the maximum precision and scale for these two variables to make the
 procedure flexible enough to handle large datasets; I suppose I could use a float
@UpperRemainder decimal(38,37),
@LowerQuartile decimal(‘ + @DecimalPrecision + ‘),
@UpperQuartile decimal(‘ + @DecimalPrecision + ‘),
@InterquartileRange decimal(‘ + @DecimalPrecision + ‘),
@LowerInnerFence decimal(‘ + @DecimalPrecision + ‘),
@UpperInnerFence decimal(‘ + @DecimalPrecision + ‘),
@LowerOuterFence decimal(‘ + @DecimalPrecision + ‘),
@UpperOuterFence decimal(‘ + @DecimalPrecision + ‘) 

SET @OrderByCode = ‘ + CAST(@OrderByCode AS nvarchar(50)) + ‘ SELECT @Count=Count(‘ + @ColumnName + ‘)
FROM ‘ + @SchemaAndTableName +
‘ WHERE ‘ + @ColumnName + ‘ IS NOT NULL

SELECT @LowerPoint = (@Count + 1) / 4, @LowerRemainder =  ((CAST(@Count AS decimal(‘ + @DecimalPrecision + ‘)) + 1) % 4) /4,
@UpperPoint = ((@Count + 1) *3) / 4, @UpperRemainder =  (((CAST(@Count AS decimal(‘ + @DecimalPrecision + ‘)) + 1) *3) % 4) / 4; –multiply by 3 for the left s’ + @PrimaryKeyName + ‘e on the upper point to get 75 percent

WITH TempCTE
(‘ + @PrimaryKeyName + ‘, RN, ‘ + @ColumnName + ‘)
AS (SELECT ‘ + @PrimaryKeyName + ‘, ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY ‘ + @ColumnName + ‘ ASC) AS RN, ‘ + @ColumnName + ‘
FROM ‘ + @SchemaAndTableName + ‘
WHERE ‘ + @ColumnName + ‘ IS NOT NULL),
TempCTE2 (QuartileValue)
AS (SELECT TOP 1 ‘ + @ColumnName + ‘ + ((Lead(‘ + @ColumnName + ‘, 1) OVER (ORDER BY ‘ + @ColumnName + ‘) – ‘ + @ColumnName + ‘) * @LowerRemainder) AS QuartileValue
FROM TempCTE
WHERE RN BETWEEN @LowerPoint AND @LowerPoint + 1 

UNION


SELECT TOP 1 ‘ + @ColumnName + ‘ + ((Lead(‘ + @ColumnName + ‘, 1) OVER (ORDER BY ‘ + @ColumnName + ‘) – ‘ + @ColumnName + ‘) * @UpperRemainder) AS QuartileValue
FROM TempCTE
WHERE RN BETWEEN @UpperPoint AND @UpperPoint + 1)

SELECT @LowerQuartile = (SELECT TOP 1 QuartileValue
 FROM TempCTE2 ORDER BY QuartileValue ASC), @UpperQuartile = (SELECT TOP 1 QuartileValue
 FROM TempCTE2 ORDER BY QuartileValue DESC)

SELECT @InterquartileRange = @UpperQuartile – @LowerQuartile
SELECT @LowerInnerFence = @LowerQuartile – (1.5 * @InterquartileRange), @UpperInnerFence = @UpperQuartile + (1.5 * @InterquartileRange), @LowerOuterFence = @LowerQuartile – (3 * @InterquartileRange), @UpperOuterFence = @UpperQuartile + (3 * @InterquartileRange)

–SELECT @LowerPoint AS LowerPoint, @LowerRemainder AS LowerRemainder, @UpperPoint AS UpperPoint, @UpperRemainder AS UpperRemainder
— uncomment this line to debug the inner calculations

SELECT @LowerQuartile AS LowerQuartile, @UpperQuartile AS UpperQuartile, @InterquartileRange AS InterQuartileRange,@LowerInnerFence AS LowerInnerFence, @UpperInnerFence AS UpperInnerFence,@LowerOuterFence AS LowerOuterFence, @UpperOuterFence AS UpperOuterFence


SELECT ‘ + @PrimaryKeyName + ‘, ‘ + @ColumnName + ‘, OutlierDegree
FROM  (SELECT ‘ + @PrimaryKeyName + ‘, ‘ + @ColumnName + ‘,
       ”OutlierDegree” =  CASE WHEN (‘ + @ColumnName + ‘ < @LowerInnerFence AND ‘ + @ColumnName + ‘ >= @LowerOuterFence) OR (‘ +
@ColumnName + ‘ > @UpperInnerFence
 AND ‘ + @ColumnName + ‘ <= @UpperOuterFence) THEN 1
       WHEN ‘ + @ColumnName + ‘ < @LowerOuterFence OR ‘ + @ColumnName + ‘ > @UpperOuterFence THEN 2
       ELSE 0 END
       FROM ‘ + @SchemaAndTableName + ‘
       WHERE ‘ + @ColumnName + ‘ IS NOT NULL) AS T1
      ORDER BY CASE WHEN @OrderByCode = 1 THEN ‘ + @PrimaryKeyName + ‘ END ASC,
CASE WHEN @OrderByCode = 2 THEN ‘ + @PrimaryKeyName + ‘ END DESC,
CASE WHEN @OrderByCode = 3 THEN ‘ + @ColumnName + ‘ END ASC,
CASE WHEN @OrderByCode = 4 THEN ‘ + @ColumnName + ‘ END DESC,
CASE WHEN @OrderByCode = 5 THEN OutlierDegree END ASC,
CASE WHEN @OrderByCode = 6 THEN OutlierDegree END DESC‘

–SELECT @SQLString — uncomment this to debug string errors
EXEC (@SQLString)
相关问题