How to open a .dat file (ASCII)?

33,958

Solution 1

There is a Java app, DataFerrett, that lets you get CPS and other data sets, but it is not very efficient.

Here is how to open one of the files yourself (the same procedure works for any year from 1989 to 2012).

  1. Download the .dat file
  2. Save it in a Desktop folder (C:\Users\Owner...)
  3. Download corresponding .do and .dct files from here
  4. Save them in the same folder
  5. Open the .dat file in Stata the same way you did in your question
  6. Save it as a Stata .dta file in the same folder (C:\Users\Owner...)
  7. Open the .do file that is in your folder (C:\Users\Owner...) in a text editor such as Notepad++
  8. At the very beginning you will see that the author defines local variables for the paths of the .dta, .dat and .dct files. Change the paths so that they point to the saved .dta, .dat and .dct files in your folder (C:\Users\Owner...) on your Desktop
  9. Reopen Stata, and run the .do file from your folder (C:\Users\Owner...)
  10. Done! Save the .dta file

For the years 1962 to 1988 you can follow the same 10 steps, but unfortunately NBER does not provide the .do and .dct files, so you have to write them yourself. Take the .do and .dct files from one of the available years (1989 to 2012) as a template, then adjust the variable names, column positions and widths until they match the .pdf documentation for the year you need. I know it is very tedious, but this is the only way you can handle it.
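If you do end up writing dictionaries by hand, a small script can at least take care of the repetitive formatting. The following is only a minimal sketch in Python, not NBER's method: it turns a list of fields into the text of a Stata fixed-format dictionary. The two age fields and their positions come from the comments on this question (0-based chars 33-34 and 35, which are Stata columns 34-35 and 36). The variable names `age` and `agerec`, the `byte` storage type and the helper `make_dct` are assumptions made up for illustration, and the sketch ignores the fact that the old files appear to mix two kinds of lines.

```python
# Hypothetical layout entries: (1-based start column, width, Stata type, name, label).
# Positions are taken from the comment thread; names and types are invented here.
LAYOUT = [
    (34, 2, "byte", "age", "Age by Single Years"),
    (36, 1, "byte", "agerec", "Recoded Age"),
]


def make_dct(layout):
    """Return the text of a Stata fixed-format dictionary for the given layout."""
    lines = ["infile dictionary {"]
    for start, width, stype, name, label in layout:
        # _column() is 1-based; %<width>f reads <width> characters as a number.
        lines.append(f'    _column({start}) {stype} {name} %{width}f "{label}"')
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the dictionary; redirect to a .dct file once the layout is complete.
    print(make_dct(LAYOUT), end="")
```

You would still have to type the layout from the .pdf for each year, but the dictionary syntax itself then stays consistent across years.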

Solution 2

We need more information.

".dat" is not an extension that is special so far as Stata is concerned. Perhaps you meant .dta.

Even if so, what file was it, what command did you use and what was wrong?

The page you linked to leads to numerous files. We have not a hope of guessing which you mean.

Spelling is "Stata".

Solution 3

this might not save you from spending days digging into that data, but here are some ideas:

  1. the file contains 2 completely different kinds of lines. this might be the reason why you can't import them. you can see this by opening the unzipped file in a text editor. you have to find out what that means.
  2. what do you want to obtain from this file? according to the pdf it contains 85 different values per record. do you need them all? if you're only interested in a few values you could extract them in a unix shell.
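A quick way to check the first point without a text editor is to count how many lines of each length the file contains. This is a minimal sketch in Python; the helper names are made up, and it assumes the two kinds of lines differ in length, which may not be true for every year (if they do not, counting the first character of each line is the next thing to try).

```python
from collections import Counter


def line_length_counts(lines):
    """Count how many lines have each length, ignoring the line terminator."""
    return Counter(len(line.rstrip("\r\n")) for line in lines)


def counts_for_file(path):
    """Same count for a file on disk; latin-1 decodes any byte without error."""
    with open(path, encoding="latin-1") as handle:
        return line_length_counts(handle)


if __name__ == "__main__":
    # Made-up sample lines standing in for the unzipped .dat file.
    sample = ["A0001\n", "B000123456\n", "B000198765\n"]
    for length, count in sorted(line_length_counts(sample).items()):
        print(f"length {length}: {count} line(s)")
```

If the counts fall into two clear groups, each group is probably one record type and needs its own layout.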
Author by

Buras

Oracle DBA

Updated on July 02, 2022

Comments

  • Buras
    Buras about 2 years

    I tried to open a .dat file using Stata, and it actually opened, but the data set was a complete mess. I took the file from NBER (CPS data)... click on the A icon of the year 1964 March.

    I tried the regular Stata procedure for .dat files: File->Import->ASCII data created by a spreadsheet (delimiter " ") as recommended in the Stata manual for .dat files.

    But it is still not working. Are there any other ways to open a .dat file? Can I convert it to .csv somehow?

    (All the data files are ASCII files compressed with the Unix compress command.)

  • Buras
    Buras about 11 years
    Thank you for the reply... this is the link to the .zip file that contains the .dat (not .dta): nber.org/data/current-population-survey-data.html Click on the A in the year 1964 March. I opened it according to the rules in the Stata tutorial eui.eu/Personal/Franklin/Tutorial%20session1.pdf , i.e. File->Import->ASCII->delimiter ""
  • Nick Cox
    Nick Cox about 11 years
    It seems that you must use the .do and .dct files given elsewhere on that site. Attempting direct import of .dat files will, as you report, lead nowhere useful.
  • Buras
    Buras about 11 years
    I tried to use the .do files...I am just confused...opening a file should not be such a big deal...
  • Maarten Buis
    Maarten Buis about 11 years
    Consider this from the perspective of NBER: they are trying to make data available for more than a couple of years, so they will have to deal with the fact that formats change, some programs become less popular, and new programs will emerge. One way to work around that is to make the data available with only minimal formatting. This means that it becomes harder to open a file in a given software package (as you noticed), but formatting that is not there cannot become outdated, so the data will remain useful for a longer period of time. That is a fair trade-off.
  • Buras
    Buras about 11 years
    THANK YOU! I have tried this. It works for 1989 to 2012... But still, what about the years 1962 to 1988? I opened those .pdf files. Each of them is about 200 pages, and the files are non-editable. Is there any other way? It looks like a Herculean task to write my own .dct and .do for each of the years 1962 to 1988!?
  • Buras
    Buras about 11 years
    Why would CPS post .dat files together with super long .pdf documentation? Do they think people would read those .pdfs and write their own .do and .dct files? It is impossibly tedious!!! I think they MUST provide the dictionary.
  • Maarten Buis
    Maarten Buis about 11 years
    Imagine what computers looked like between 1962 and 1988. We should be glad the data from those years are no longer stored on punch cards. The first version of Stata was released in 1985 and its popularity grew only gradually, so it comes as no surprise that the earlier versions of the data did not come with Stata support. Writing post hoc support for Stata is tedious, and obviously we all would like other people to do the tedious work for us, but it does not always work that way...
  • Nick Cox
    Nick Cox about 11 years
    @Buras: Your most recent report is no more than "I tried to use the .do files". That is no detail at all to comment on. The documentation does look very complicated, but unfortunately only those who also want to use those data are likely to be motivated to read it.
  • Buras
    Buras about 11 years
    Thank you for the answer... I need all variables for all years. Kamil explained how to deal with the files from 1989 to 2012. However, I also need the 1962 to 1989 files, so s/he recommends writing the .do and .dct files for those years. I read the .pdfs but I am still confused about how to write the .do and .dct files... Does each line of the .dat contain encoded values for each variable or ready-to-use values? What is the delimiter? etc...
  • user829755
    user829755 about 11 years
    delimiter is "", i.e. it was omitted to save space (1964 was just different, you know). Instead each field has a fixed length (the Digits column in the pdf), and the Positions column defines the range of characters for each field. Example: item 10, "Age by Single Years", is a 2-digit number found in chars 33 and 34 (0=first) of each line of type B (as I said, there are two kinds of lines; I call them A and B). Item 11, "Recoded Age", is another column dealing with age, and the digit found as char 35 is identical to what you get when looking up "Age by Single Years" in the legend found in the description of item 11. got it?
  • Buras
    Buras about 11 years
    Thank you, I get it. Do you think it is worth trying to write the .do and .dct files for the years 1962 to 1989, taking the 1989 to 2012 ones as a benchmark? I tried to open 1962 using the 2012 .do and .dct files, but it did not work...
  • user829755
    user829755 about 11 years
    no idea. know nothing about .do, .dct, not even stata. depends on your salary per hour and on what you can achieve with the result. be prepared for additional problems like glitches in the data, or unreadable format descriptions. did you see those pages that are rotated or even upside down? certainly looks like a lot of work. is this one of these crazy PhD thesis ideas where cheap students are burned?
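The fixed-width layout described in the comments (no delimiter, each field at a fixed character position) can be read with plain string slicing. The sketch below, in Python, extracts only the two age fields mentioned above: chars 33-34 and char 35, counting from 0. The function names are made up for illustration, and the code assumes the caller has already filtered the file down to type-B lines, since how to tell the two kinds of lines apart is not stated in this thread.

```python
def to_int(text):
    """Convert a fixed-width field to int; return None if blank or non-numeric."""
    stripped = text.strip()
    return int(stripped) if stripped.isdigit() else None


def parse_age_fields(line):
    """Extract the two age fields from one type-B line (0-based positions)."""
    return {
        "age": to_int(line[33:35]),          # "Age by Single Years", chars 33-34
        "recoded_age": to_int(line[35:36]),  # "Recoded Age", char 35
    }


if __name__ == "__main__":
    # Made-up line: 33 filler characters, then age "27" and recoded age "4".
    sample_line = "x" * 33 + "27" + "4" + "0000"
    print(parse_age_fields(sample_line))
```

The same slicing pattern extends to any other field once its position and width have been read off the .pdf for that year.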