SAS variable concatenation through data step

Solution 1

Greetings. This seems simple enough; here is my solution:

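data a;
set test end=eof;
length cat $100;   /* room for the concatenated result */
retain cat;        /* keep the value across iterations */
if AddToStringYN = "Y" then do;
   cat=trim(left(cat))||trim(left(value));
end;
if eof then do;
   /* on the last row, copy the result to macro variable VAR */
   call symput("VAR",trim(cat));
   output;
end;
run;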

%put VAR=&VAR;

In this example, the concatenated values end up in the column CAT of dataset A (written as a single row at end of file), and the macro variable VAR holds the same string.
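The same retain pattern can be sketched with cats(), which strips leading and trailing blanks by itself. This is only an illustration under assumptions: the work.test data step rebuilds the example table from the question, and the names REV and VAR_REV are hypothetical, added to show the reversed order (FourTwoOne) the question also asks about.

```sas
/* Rebuild the example table from the question */
data work.test;
  input AddToStringYN $ Value $;
  datalines;
Y One
Y Two
N Three
Y Four
N Five
;
run;

data a;
  set work.test end=eof;
  length cat rev $100;
  retain cat rev;                 /* keep both strings across iterations */
  if AddToStringYN = "Y" then do;
    cat = cats(cat, value);       /* append:  One, OneTwo, OneTwoFour */
    rev = cats(value, rev);       /* prepend: One, TwoOne, FourTwoOne */
  end;
  if eof then do;
    call symputx("VAR", cat);     /* symputx trims blanks before storing */
    call symputx("VAR_REV", rev); /* VAR_REV is a hypothetical name */
    output;
  end;
run;

%put VAR=&VAR VAR_REV=&VAR_REV;
```

Prepending instead of appending is what reverses the order, so no extra sort is needed.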

Solution 2

The following is my answer to your identical question on RunSubmit.com. I think you and @Fabio may be over-engineering the solution: it doesn't need any iterating data step code at all...

First, the easy way to do what you're trying to do is like this:

proc sql;
  select Value into :StringVar separated by ''
    from work.test
    where AddToStringYN='Y'
    ;
quit;

Here, you can take advantage of the SQL interface with SAS/MACRO, using the select into syntax. You could even add an order by clause to get a particular order you're looking for.
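As a rough sketch of the order by idea, assuming the work.test table from the question: the example table has no column that records the original row order, so this version simply sorts alphabetically by Value. Getting a true reverse of the input order would need a sequence column, which the example data does not have.

```sas
proc sql noprint;
  /* Concatenate the flagged values in alphabetical order */
  select Value into :StringVar separated by ''
    from work.test
    where AddToStringYN='Y'
    order by Value
    ;
quit;

/* With the example data this should give FourOneTwo */
%put StringVar=&StringVar;
```

The noprint option only suppresses the listing output; the macro variable is still populated.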

Second, since you've happened upon something about the way SAS macro works and you're keen to understand it: in your first example, the macro reference &stringvar is resolved once, while the data step is being compiled and before any row is read, and at that point it is empty. So after compilation, with this token replaced, your code looks like this to SAS...

%let stringvar=;
Data _null_;
  set work.test;
  if AddToStringYN = "Y" then do;
    call symput('stringvar',"" || strip(value));
  end;
Run;

...then SAS goes ahead and runs that code (which happens to be valid code, but is concatenating an empty string to the start of something). And because of the way the data step works, each iteration of the data step is in fact replacing the value of StringVar, which is why at the end of the data step, it's left with the last value that was read in.
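If you do want to build the macro variable row by row, one possible approach is to read it with symget(), which fetches the current value at execution time instead of at compile time. This is a sketch assuming the work.test table from the question, not the only way to do it.

```sas
%let stringvar=;
data _null_;
  set work.test;
  if AddToStringYN = "Y" then do;
    /* symget() reads the value as it stands on this iteration, */
    /* so each call appends to what the previous row stored     */
    call symputx('stringvar', cats(symget('stringvar'), value));
  end;
run;

%put stringvar=&stringvar;
```

Swapping the arguments to cats(value, symget('stringvar')) prepends each value instead, which yields the reversed order.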

Author by

Yoh

Updated on June 15, 2022

Comments

  • Yoh
    Yoh almost 2 years

    I am looking for a way to create a string variable containing certain values of the dataset while going through the data step.

    Example data set work.test:

    AddToStringYN    Value
         Y           One
         Y           Two
         N           Three
         Y           Four
         N           Five
    

    So in the end, the variable would look like: OneTwoFour (or even better FourTwoOne). This looks so simple, but I can't seem to find a way to do it. I also tried to work with macro variables like this:

    %let stringvar=;
    Data _null_;
      set work.test;
      if AddToStringYN = "Y" then do;
        call symput('stringvar',"&stringvar" || strip(value));
      end;
    Run;
    

    But this gives:

    GLOBAL STRINGVAR Four
    

    So I only get the last value. I get that this must be because of some misunderstanding of mine about this macro facility, but I don't understand why there is only the last value in the variable. I thought it was only the last time the symput was called that it was actually executed or something, but then when I adjust the code to:

    %let stringvar=;
    Data _null_;
      set work.test;
      if AddToStringYN = "Y" then do;
        call symput('stringvar'||strip(value),"&stringvar" || strip(value));
      end;
    Run;
    

    Then I do get them all:

    GLOBAL STRINGVARONE  One
    GLOBAL STRINGVARTWO  Two
    GLOBAL STRINGVARFOUR  Four
    

    So my last guess is that going through the data step, the 'call symput...' line is actually added to the macro processor where the "&stringvar" is already replaced and only after the final statement are they all executed.
    Is this a good assumption or is there another explanation? And back to the original question: is there an easy way to achieve this (having the desired variable)?

  • Yoh
    Yoh about 13 years
    Perfect! I recall trying to use some construction like this (combination of retain and sas variable reuse) but it didn't work. Can't find yet what is different now, but IT WORKS ! And indeed simple enough....if you know where to look ;) Thank you! (any idea on the additional question, if my logic is right with the macro variable?)
  • Robert Penridge
    Robert Penridge about 13 years
    Also, try replacing the trim(left()) functions with the cat(), catt(), cats(), catx() functions. They are very flexible, easier to use, and do automatic type conversion for you! Available in SAS9 onwards...
  • Yoh
    Yoh about 13 years
    Saw your answer on RunSubmit, didn't have time to reply yet. First, many thanks, especially for the explanation about the SAS macro. Indeed, I was overthinking the way it worked: the macro variable is resolved only once, not on every iteration of the data step. About the use of proc sql: I always learned that SAS code takes precedence over SQL (i.e. on processing time). I don't know if this holds up for this example though.
  • sasfrog
    sasfrog about 13 years
    @Yohsoog: there's no strict rule about performance of SAS data step code vs. PROC SQL - which one is faster would depend on all sorts of factors. You could always try the 2 solutions out and check the log to see what's fastest in this case, with your data. But there are other considerations for you too, such as code readability, maintainability, clarity to others, coding standards etc. As for me, I would tend towards the PROC SQL method unless you're also doing a whole lot of other stuff in the data step that makes the I/O worthwhile for the result.
  • Yoh
    Yoh almost 13 years
    Yes indeed, in this case I am doing a lot of other things in the data step. The data step actually loops through a table generating a complete xml. And the string I put together here, is also cut apart elsewhere in the same data step, so using the PROC SQL method would not even be possible indeed. But it is good to consider as you say for readability etc.