Creating and using temporary/volatile database tables In Stata

10,281

Solution 1

This answer is not longer correct. Stata now allows multiple SQL statements as long as the multistatement option is added to the odbc command.


Stata's odbc command does not allow combining multiple SQL statements into a single odbc command and altering TD's mode. It also creates a separate connection for each odbc command issued, so the volatile table goes poof by the time you want to use it to do something. This makes it impossible to use volatile tables directly.

However, there is a way to use R through Stata to produce a Stata data file. You need to install rsource from SSC and the foreign and RODBC packages in R. The 2 globals Rterm_path and Rterm_options for rsource can be defined in sysprofile.ado or in your own profile.ado. As far as I can determine, R does not allow exporting timestamps, so I had to do some conversion of dates and timestamps by hand. These conversions are somewhat at odds with the suggestions in the Stata manuals and the Stata blog.

rsource, terminator(END_OF_R)
  library("RODBC")
  library("foreign")
  db <- odbcConnect("mydsn")
  sqlQuery(db,"CREATE VOLATILE MULTISET TABLE vol_tab AS (SELECT ...) WITH DATA PRIMARY INDEX(...) ON COMMIT PRESERVE ROWS;")
  data<- sqlQuery(db,"SELECT * FROM vol_tab;",rows_at_time=1)
  write.dta(data,"mydata.dta",convert.dates = FALSE)
  close(db)
END_OF_R

use "mydata.dta", replace
/* convert dates and timestamps to Stata format */
gen stata_date = rdate + td(01jan1970)
format stata_date %td
gen double stata_timestamp = (rtimestamp + 315594000)*1000
format stata_timestamp %tc

Solution 2

I'm not familiar with Stata, but I'm guessing that your ODBC is connecting in "ANSI" mode. Try adding this between the create volatile table and the select statements:

commit work;

If that doesn't work, you may need to make two separate calls somehow.

UPDATE: Thinking a bit more about this, perhaps you can try this:

odbc load, exec("select distinct user_id from dw_users where cast(date_confirm as
date) > '2011-09-15'") clear dsn("mozart") lowercase;

In other words, just execute the query in one step; don't try to create a volatile table.

Solution 3

Put

BT; --UR LOGIC-- ET;

IF any thing fails in between.it rolls back

got from here

Solution 4

What if you try the following with your connection mode as TERADATA (which is more often then not the default):

odbc load, exec("BT; create volatile table new_usr as
(select top 10 user_id from dw_users) with data primary index(user_id) on commit
preserve rows; 
ET;

select * from new_usr;") clear dsn("mozart") lowercase;

The BT; and ET; statements wrap the SQL contained between in an explicit transaction. This SQL has been tested in SQL Assistant as I don't have access to the tool you are using. Typically, BT and ET are used to enforce logical transactions (or units of work) that must be completed successfully or everything is rolled back. This may allow you to get around the issue you are having in your tool.

EDIT

Failing the ability to wrap the Volatile Table creation in a BT and ET do you have the ability to create a stored procedure or macro that can embed all the logic necessary to complete the task then call the stored procedure or macro from Stata?

Share:
10,281

Related videos on Youtube

dimitriy
Author by

dimitriy

Into data before it was big. Keeping the world safe from folks armed with pivot tables and people trained to detect cats at scale. A great man for things that do not yet exist. Contact me if you are working on interesting problems and want to talk to an economist. Shallow learning for the win. Please do not e-mail me with questions unless we drank at least one beer together. I am happy to help, but I won't do it over e-mail, LinkedIn, Facebook, Twitter, etc., as I much prefer to have my answer be a public good, accessible to everyone in the future. Moreover, if I am wrong, someone will usually come along to correct me, which is much better for everyone involved. Create a simpler reproducible example on one of the toy data sets that demonstrates the problem you are having, put that up on SO, and then tag me. I will take a look when I get the chance.

Updated on June 04, 2022

Comments

  • dimitriy
    dimitriy almost 2 years

    Addendum: As of Stata 14, volatile tables work without any hacks.

    Is there a way to tweak Stata to work with temporary volatile tables? These tables and the data are deleted after a user logs off the session.

    Here's an example of a simple toy SQL query that I am using in Stata and Teradata:

    odbc load,  exec("
        BEGIN TRANSACTION;
        CREATE VOLATILE MULTISET TABLE vol_tab AS (
            SELECT TOP 10 user_id
            FROM dw_users
        ) WITH DATA
        PRIMARY INDEX(user_id)
        ON COMMIT PRESERVE ROWS;
    
        SELECT * FROM vol_tab;
        END TRANSACTION;
    ") dsn("mozart");
    

    This is the error message I am getting:

    The ODBC driver reported the following diagnostics
    [Teradata][ODBC Teradata Driver][Teradata Database] Only an ET or null statement is legal after a DDL Statement.
    SQLSTATE=25000
    r(682);
    

    The Stata error code means:

    error . . . . . . . . . . . . . . . . . . . . . . . . Return code 682 could not connect to odbc dsn; This typically occurs because of incorrect permissions, such as a bad User Name or Password. Use set debug on to display the actual error message generated by the ODBC driver.

    As far as I can tell permission are fine since I can pull data if I just execute the "SELECT TOP 10..." query. I set debug on, but it did not produce any additional information.

    Session mode is Teradata. ODBC manager is set to unixODBC. I am using Stata 13.1 on an Ubuntu server.

    I believe the underlying issue may be that separate connections are established for each SQL statement, so the volatile table evaporates by the time the select is issued. I am waiting on tech support to verify this.

    I tried using the odbc sqlfile command well, but this approach does not work unless I create a permanent table at the end of it. There's no load option with odbc sqlfile.

    Volatile tables seem to work just fine in SAS and R. For example, this works perfectly:

    library("RODBC")
    db <- odbcConnect("mozart")
    sqlQuery(db,"CREATE VOLATILE MULTISET TABLE vol_tab AS (
             SELECT TOP 10 user_id
             FROM dw_users
         ) WITH DATA
         PRIMARY INDEX(user_id)
         ON COMMIT PRESERVE ROWS;
    ")
    data<- sqlQuery(db,"select * from vol_tab;",rows_at_time=1)
    

    Perhaps this is because the connection to the DB remains open until close(db).

  • dimitriy
    dimitriy about 11 years
    That produces "Syntax error: COMMIT WORK not allowed for a DBC/SQL session" error message. Any idea what that means? I am pretty inexperienced with Teradata.
  • BellevueBob
    BellevueBob about 11 years
    Sorry, I understand the Teradata part but I don't know this Stata thing. The SQL looks correct to me (although normally I'd expect to see a fully-qualified table name). Volatile tables only exist for the duration of one session so maybe you need to execute the statements separately. But again, I don't know how it works for you; that clear option might break the connection.
  • BellevueBob
    BellevueBob about 11 years
    Look at the ODBC driver options and try changing the "Session Mode". Try different settings and see if they make a difference. Again, I'm just guessing here. I don't use ODBC at all myself (I use .NET).
  • dimitriy
    dimitriy about 11 years
    This query runs perfectly in SQL Assistant, but I would like to avoid the hassle of having to import a csv file into Stata for analysis. The clear tells Stata to clear any data currently loaded into memory to make room for the second select. Is there anything I can change in .odbc.ini file to set the mode to ANSI? Does the commit work; go right before the second select?
  • BellevueBob
    BellevueBob about 11 years
    If your connection is in ANSI mode, the COMMIT is needed. If the connection is in TERADATA mode, the COMMIT is NOT needed. And sorry, I don't know the parameters for the odbc.ini file; I was referring to the driver settings in the Windows ODBC Administrator.
  • dimitriy
    dimitriy about 11 years
    This is just a simple example with a volatile table. I have to do something much more complicated, which is why the volatile tables are needed.
  • dimitriy
    dimitriy about 11 years
    Thanks for helping me again. I changed the mode in the .odbc.ini file. I am getting the following error: Only an ET or null statement is legal after a DDL Statement. SQLSTATE=25000
  • dimitriy
    dimitriy about 11 years
    I have never tried a stored procedure before. I will try to figure out how to do that.
  • dimitriy
    dimitriy about 11 years
    Just to make sure I am doing things correctly, is connection mode the same thing as session mode?
  • Rob Paller
    Rob Paller about 11 years
    Should be. What is it set to?