Question

I have data in this format. This is just an example, for n = 2:

X      Y      info

2      1      good
2      4      bad

3      2      good

4      1      bad
4      4      good

6      2      good
6      3      good

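The data above is sorted (7 rows in total). I need to form groups of 2, 3, or 4 rows, separately for each group size, and generate a graph. In the data above I formed groups of 2 rows. The third row (X = 3) is left on its own because there is no other row with the same X value to group it with. A group can only be formed from rows that share the same X value, not across different X values.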

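Next, I check whether both rows have "good" in the info column. If both rows are "good", the group they form is also good; otherwise it is bad. In the example above, the 3rd (last) group is a "good" group. The rest are all bad groups. Once I have gone through all the rows, I calculate the total number of good groups formed divided by the total number of groups.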

In the above example, the output will be: total no. of good groups / total no. of groups => 1/3.

This is the case for n = 2 (the group size).

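Now, for n = 3 we form groups of 3 rows, and for n = 4 we form groups of 4 rows, and find the good/bad groups in the same way. If all the rows in a group have "good" in the info column, the group is good; otherwise it is bad.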

Example: n = 3

2      1      good
2      4      bad
2      6      good

3      2      good

4      1      good
4      4      good
4      6      good

6      2      good
6      3      good

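In the example above, I left out the 4th row and the last 2 rows because I cannot form a group of 3 rows from them. The first group's result is "bad" and the last group's result is "good". Output: 1/2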

For n = 4:

2      1      good
2      4      good
2      6      good
2      7      good

3      2      good

4      1      good
4      4      good
4      6      good

6      2      good
6      3      good
6      4      good
6      5      good

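In this case I form groups of 4 and find the result. The 5th, 6th, 7th, and 8th rows are left out and ignored. I formed 2 groups of 4 rows, and both are "good" groups. Output: 2/2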

So, after getting the 3 output values for n = 2, n = 3, and n = 4, I will plot a graph of these values.

Answer 1

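Below is code that I think does what you are looking for. It assumes that the data you described is stored in three separate datasets named data_2, data_3, and data_4. Each of these datasets is processed by the %FIND_GOOD_GROUPS macro, which determines which X groups have all "GOOD" values in INFO and then appends this summary information as a new row to the BASE dataset. I did not add the code for it, but you could calculate the ratio of GOOD_COUNT to _FREQ_ in a separate data step and then use a procedure to plot the N values against the ratios. Hopefully this is close to what you are trying to accomplish.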
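Before the macros below can run, the input datasets have to exist. As a minimal sketch, assuming the n = 2 example from the question is typed in directly rather than read from a file, data_2 could be created like this (data_3 and data_4 would follow the same pattern with their own rows):

```sas
/* Sketch only: load the n = 2 example from the question into DATA_2. */
/* The blank lines that separate X groups in the question are left    */
/* out; the grouping is recovered later from the X values.            */
data data_2;
   input x y info $;   /* X and Y numeric, INFO character */
   datalines;
2 1 good
2 4 bad
3 2 good
4 1 bad
4 4 good
6 2 good
6 3 good
;
run;
```

With this input, the call for n = 2 should append one row to BASE with N = 2, _FREQ_ = 3, and GOOD_COUNT = 1, which matches the 1/3 given in the question.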

%******************************************************************************;
%macro main;

   %find_good_groups(dsn=data_2, n=2);
   %find_good_groups(dsn=data_3, n=3);
   %find_good_groups(dsn=data_4, n=4);

   proc print data=base uniform noobs;
   run;

%mend main;
%******************************************************************************;
%******************************************************************************;
%macro find_good_groups(dsn=,n=);

   %***************************************************************************;
   %* Sort data by X and Y so that you can use FIRST.X variable in Data step. *;
   %***************************************************************************;
   proc sort data=&dsn;
      by x y;
   run;

   %***************************************************************************;
   %* TEMP dataset uses the FIRST.X variable to reset COUNT and GOOD_COUNT to *;
   %* initial values for each row where X changes. Each row in the X groups   *;
   %* adds 1 to COUNT and sets GOOD_COUNT to 0 (zero) if INFO is ever "BAD".  *;
   %* A record is output if COUNT is equal to the macro parameter &N.         *;
   %***************************************************************************;
   data temp;
      keep good_count n;
      retain count 0 good_count 1 n &n;
      set &dsn;
      by x y;
      if first.x then do;
         count = 0;
         good_count = 1;
      end;
      count = count + 1;
      if good_count eq 1 then do;
         if trim(left(upcase(info))) eq "BAD" then do;
            good_count = 0;
         end;
      end;
      if count eq &n then output;
   run;

   %***************************************************************************;
   %* Summarize the TEMP data to find the number of times that all of the     *;
   %* rows had "GOOD" in the INFO column for each value of X.                 *;
   %***************************************************************************;
   proc summary data=temp;
      id n;
      var good_count;
      output out=n_&n (drop=_type_) sum=;
   run;

   %***************************************************************************;
   %* Append to BASE dataset to retain the sums and frequencies from all of   *;
   %* the datasets. BASE can be used to plot the N / number of Good records.  *;
   %***************************************************************************;
   proc append data=n_&n base=base force; run;

%mend find_good_groups;
%******************************************************************************;
%main
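
The ratio and plot step that the answer mentions but leaves out could look roughly like the following. This is a sketch under assumptions: the dataset name RATIOS and the variable RATIO are made up for illustration, and PROC SGPLOT is only one possible choice, since the answer does not say which plotting procedure to use.

```sas
/* Sketch only: turn the sums in BASE into ratios and plot them by N. */
/* BASE has one row per call to %find_good_groups, holding N,         */
/* _FREQ_ (number of groups found) and GOOD_COUNT (all-good groups).  */
data ratios;                       /* RATIOS is a hypothetical name */
   set base;
   /* Skip the division when no groups of size N were found. */
   if _freq_ > 0 then ratio = good_count / _freq_;
run;

proc sgplot data=ratios;
   series x=n y=ratio / markers;
   xaxis integer label="Group size (n)";
   yaxis min=0 max=1 label="Good groups / total groups";
run;
```

Because PROC APPEND adds rows every time it runs, calling %main again in the same session adds a second set of rows to BASE. Deleting BASE before a rerun keeps the plot to one point per value of N.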
