SAS - transpose multiple variables in rows to columns

sas
10,307

Solution 1

Ok... you have to keep in mind that SAS arrays are tipically not real vectors or arrays of memory variables like in most other languages. They are just placeholders for dataset variables.

That said, the ARRAY statement is just there to define the mapping between the dataset variables and the shorthand array reference. It is useless to put it where you placed it as it does not initialize anything.

Another thing you should consider is that you need to accumulate the values and write a single row. You cannot do that on the FIRST row as you have not read all the data yet. You have to use the LAST.

A third issue is that the new variables you are defining must preserve the value when you read another row. SAS typically resets all variables that are not in the input dataset to MISSING. To avoid this and accumulate sums, min, max or other values on a group of rows, you must use the RETAIN statement.

Last... you may want to drop any variables you have just used to do your calculations, but are not desired in the ouput dataset... "i" in this case.

I think this should get you what you want:

DATA want (DROP=i);
  SET have; 
  BY acct seq1 seq2;
  /* define variables */
  ARRAY la_array {7} la1-la7;
  ARRAY ln_array {7} ln1-ln7;
  RETAIN i la1-la7 ln1-ln7;
  IF FIRST.seq2 then do;
    /* initialize group variables */
    DO i = 1 to 7;
      la_array(i)=.;
      ln_array(i)=.;
    END;
    /* initialize index variable */
    i = 1
  END;
  /* read input row values into array variables */
  la_array(i)=la;
  ln_array(i)=ln;
  i = i + 1;
  /* write output row if group is finished */
  IF LAST.seq2;
RUN;

Unfortunately I do not have a SAS installation available at the moment... you'll have to try it and let me know how it goes.

Solution 2

I'd go down a different route and take advantage of a little used feature in PROC SUMMARY (or MEANS) to achieve this. The output table is sorted by ACCT as this is the first variable in the CLASS statement, obviously you can easily resort by SEQ1 order afterwards.

The IDGROUP statement in conjunction with OUT[7] creates 7 variables for each of the variables listed (LA, LN) and populates them in sequence as the data is read. The AUTONAME option adds sequential numbers to the variable names (e.g. LN1, LN2,...,LN7). The only problem I've come across with this method is an out of memory error on the server where the source dataset is very large, this being due to PROC SUMMARY being run in memory. Hopefully it will work for you in this instance.

proc summary data=have nway;
class acct seq1 seq2;
output out=want (drop=_:) 
        idgroup(out[7] (la ln)=) / autoname;
run;
Share:
10,307
user2941280
Author by

user2941280

Updated on June 07, 2022

Comments

  • user2941280
    user2941280 almost 2 years

    I'm trying to use an array to accomplish the following. However, I've never used one before and I'm having a bit of trouble.

    Basically I want to take the following data and create one row(observation) for each acct, seq1, seq2 combination:

    acct  seq1 seq2  la      ln
    9999    1   10   20.01   100
    9999    1   10   19.05   1
    9999    2   11   26.77   100
    9999    2   11   24.86   1
    8888    3   20   38.43   218
    8888    3   20   37.53   1
    

    This is what I want to end up with. Note, I'm only showing la1 through la3 and ln1 through ln3 for the sake of space. It would actually to go to la7 and ln7:

    acct  seq1  seq2     la1     la2    la3  ln1    ln2 ln3
    9999    1   10      20.01   19.05   .    100     1   .
    9999    2   11      26.77   24.86   .    100     1   .
    8888    3   20      38.43   37.53   .    218     1   .
    

    Here is the code that I've attempted so far. Any assistance would be greatly appreciated:

    data want;
    set have; 
    by acct seq1 seq2;
    if first.seq2 then do;
    array la_array {7} la1-la7;
    array ln_array {7} ln1-ln7;
    end;
    do i = 1 to 7;
    la_array(i)=la;
    ln_array(i)=ln;
    end;
    run;
    
  • Joe
    Joe over 9 years
    PROC TRANSPOSE works as well, but requires an extra data step which this doesn't. Definitely a good choice.
  • user2941280
    user2941280 over 9 years
    This works great. Thank you for explaining in detail how it works. That was very helpful.
  • user2941280
    user2941280 over 9 years
    This works as well. I've very rarely used proc summary, and I had no idea it could be used for something like this. Very good to know. Thank you.