Bring ASCII Data into Stata

Using Stata to Convert ASCII Data

This document assumes that you have access to Stata software and some basic Stata experience. The procedure is based on Roper Center datasets only, and we recommend using a Stata “do file” for the best results.

The Best Way to Approach This Task

Review the study documentation (also referred to as the codebook) first to identify the questions you want to analyze in Stata. The codebook fully describes the dataset. It includes information regarding the survey such as the study number, title, name of the survey organization that conducted the study, the sponsor (if applicable), the field dates, type of sample(s), sample size, type of interview, weight information, and number of records per respondent. The study methodology and any usage notes may also appear. The codebook/questionnaire includes the question numbers, question text, responses including the codes and labels, as well as the card and column locations for each variable. Some studies may include a list of variables and the card and column location for each variable. The back of the codebook includes a dump of the data referred to as an x-ray which shows the un-weighted number of cases in each punch for each column.
Download both the documentation and the data file at the same time.
Based on the number of records per respondent listed on the codebook cover page, determine which type of Stata “do file” is needed (single record or multi-record).
Use the examples and rules provided to create the required Stata “do file.”
Weighting – If the codebook indicates there is a weight variable, run the analysis weighted so that the responses are representative of the population surveyed and replicate the results published by the survey organization.
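As a minimal sketch of the weighting step: the variable name weight, the column locations, and the three data rows below are hypothetical, invented only so that the example runs on its own. With a real study, take the column locations from the codebook and point infix at the downloaded data file instead.

```stata
* Hypothetical example: write a three-respondent fixed-format file
* to a temporary location so this sketch runs without a real dataset
clear
set more off
tempfile raw
tempname fh
file open `fh' using "`raw'", write text replace
file write `fh' "1 150" _n
file write `fh' "2 075" _n
file write `fh' "1 075" _n
file close `fh'

* Made-up layout: column 1 holds the answer, columns 3-5 hold the weight
infix Q08 1 weight 3-5 using "`raw'"

* Unweighted counts, comparable to the x-ray at the back of a codebook
tabulate Q08

* Weighted percentages, comparable to results published by the survey organization
tabulate Q08 [aweight = weight]
```

The unweighted table is useful for checking that the columns were read correctly; the weighted table is the one to report.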

We will focus on determining the appropriate Stata “do file” needed and creating the required Stata “do file.”

ASCII Data Files

ASCII data files are often referred to as “text” files or “plain text” files. They contain no formatting information, just rows of characters. The “mapping” information that gives those characters meaning comes from the codebook discussed in the first step above.

Two types

Single Record – each respondent’s data is recorded on one line (row)
Multi-Record – each respondent’s data is recorded on more than one line (row)

Is the ASCII Single Record or Multi-Record?

Refer to the codebook cover page where the number of records per respondent is listed. If this number is “1” the file is a single record file, otherwise it is a multi-record file.

Bringing ASCII Data into Stata

Stata “do file”

Since no structure is included in an ASCII data file, a Stata “do file” must be created to instruct the software on where to go to get particular variables (questions).

Stata Commands That Every Stata “do file” Should Have

clear – Ensures Stata’s memory is clear before new data are read
set more off – Makes sure Stata runs all commands in the “do file” without pausing
infix – Tells Stata to read a fixed-format ASCII data file
Variable names – Assign a name to each variable; each name is followed by the column location of the variable in the raw text (ASCII) file
using – A keyword at the end of the infix command that gives the path to the raw data (ASCII) file
Variable labels (label var) – Assign descriptive labels to variables in the dataset
Value labels (label define, label values) – Assign response labels to the codes of each variable

Example of a Single Record Stata “do file”

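*this program reads a single record data file into Stata

*start with empty memory and let the output scroll without pausing
clear
set more off

*read Q08 from columns 50-51 and Sex from columns 98-99 of each row
infix Q08 50-51 Sex 98-99 using "c:\temp\abcw887.dat"

*variable labels
label var Q08 "FBI Monitoring"
label var Sex "Gender of Respondent"

*value labels: define each set of labels, then attach it to its variable
label define Q08l 1 "Support" 2 "Oppose" 8 "DK" 9 "NA-Refused"
label values Q08 Q08l
label define Sexl 1 "Male" 2 "Female"
label values Sex Sexl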
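Once a file has been read, it is worth confirming that the values match the codes in the codebook before going further. The sketch below shows one way to do that. It uses a small, made-up file with a hypothetical layout (Q08 in columns 1-2, Sex in columns 3-4) so that it runs on its own; with a real study, the same checks would follow the infix command of your own “do file.”

```stata
* Hypothetical three-respondent file written to a temporary location
clear
set more off
tempfile raw
tempname fh
file open `fh' using "`raw'", write text replace
file write `fh' " 1 1" _n
file write `fh' " 2 2" _n
file write `fh' " 9 1" _n
file close `fh'

* Made-up layout: Q08 in columns 1-2, Sex in columns 3-4
infix Q08 1-2 Sex 3-4 using "`raw'"

* Every value should be a code listed in the codebook;
* assert stops with an error if any observation fails the check
assert inlist(Q08, 1, 2, 8, 9)
assert inlist(Sex, 1, 2)

* Unweighted counts, for comparison with the x-ray in the codebook
tabulate Q08
tabulate Sex, missing

* Keep a Stata-format copy so the ASCII file need not be read again
* (a temporary file here; use a permanent path for real work)
tempfile cleaned
save "`cleaned'"
```

If an assert fails, the usual cause is a column location that is off by one or more positions.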

Stata Syntax Rules

Command syntax in Stata is case sensitive; commands are written in lowercase
Comments should start with an asterisk (*)
The Stata “do file” extension is .do
Stata programs can be written and edited in a basic text editor (such as Notepad or WordPad)
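The short “do file” below illustrates these rules. It is a sketch that creates its own one-row dataset, so the variables and values are purely illustrative. It also shows two comment forms that Stata accepts inside a “do file” in addition to the asterisk: two slashes for a comment at the end of a line, and three slashes to continue a long command on the next line.

```stata
* A full-line comment starts with an asterisk
clear
set obs 1                // two slashes start a comment after a command
generate Sex = 1
generate sex = 2         // a different variable: names are case sensitive
list Sex sex

* Three slashes continue a long command on the next line
label define Sexl 1 "Male" ///
                  2 "Female"
label values Sex Sexl

* Commands must be typed in lowercase; "Clear" would not be recognized
```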

Run the “do file”

There are two ways to run a “do file.” One is to open the file in Stata’s “do file” editor, highlight all the commands, and click “Do.” This is helpful for checking that the “do file” contains no errors. Once you have a “do file” free of errors, the most straightforward way is to open Stata, select File and then Do from the pull-down menu, and choose the name of your “do file.” This runs the commands in your “do file” and reads your dataset into Stata.
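A “do file” can also be run by typing Stata’s do command, followed by the path to the file, in the Command window. The sketch below writes a three-line “do file” to a temporary location and runs it, so that the example works on its own; the contents of that file are made up for illustration. With a real “do file” you would skip the file-writing lines and type only do followed by the path in quotes.

```stata
* Write a tiny, hypothetical "do file" to a temporary location
tempfile prog
tempname fh
file open `fh' using "`prog'", write text replace
file write `fh' "clear" _n
file write `fh' "set obs 2" _n
file write `fh' "generate Sex = _n" _n
file close `fh'

* do runs the file and echoes each command and its output
do "`prog'"

* run executes the same file without displaying the output
run "`prog'"
```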

Example of Multi-Record Stata “do file”

*this program reads a multi-record data file into Stata

clear
set more off

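*the path after "using" is a placeholder; point it at your own multi-record data file
infix 4 lines 1: str Q01a 6 str Q01b 7 2: Q22 8-9 3: Sex 28 4: PartyId 57 using "c:\temp\yourfile.dat"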
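To see how a multi-record specification turns several rows of the raw file into one observation, the sketch below builds a small file with two respondents and four records each, then reads it. The data rows and the column locations are hypothetical and were chosen to keep the file short; only the variable names follow the example above.

```stata
* Hypothetical file: two respondents, four records (lines) per respondent
clear
set more off
tempfile raw
tempname fh
file open `fh' using "`raw'", write text replace
file write `fh' "YN" _n "07" _n "1" _n "2" _n
file write `fh' "NY" _n "12" _n "2" _n "3" _n
file close `fh'

* "4 lines" tells infix that each observation spans four rows;
* "1:", "2:", ... say which row the variables that follow come from
infix 4 lines                    ///
    1: str Q01a 1 str Q01b 2     ///
    2: Q22 1-2                   ///
    3: Sex 1                     ///
    4: PartyId 1                 ///
    using "`raw'"

* Eight rows of raw text become two observations
list
```

If the number of observations read is not the sample size given in the codebook, check the number of records per respondent first.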

Tips & Troubleshooting

If your “do file” contains a mistake, Stata executes every command up to the mistake, then stops and displays an error message for the command that failed.
If this is your first attempt at writing a Stata “do file”, run the file after entering 1-2 questions to make error identification easier. Once the file is error-free, add more questions and run the file again, repeating the process until all questions have been included.
The general principles outlined here for Stata apply to SAS and SPSS as well. The syntax will be different, but the principles are the same.
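Sometimes you may run out of memory when using a large data file. In Stata 11 and earlier, add the command “set mem 1000m” near the top of the “do file” to allow Stata to use about 1 gigabyte of memory (how much you can request depends on how much memory your system has). Stata 12 and later manage memory automatically, so the command is not needed there.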

Complete Stata “do files” with ASCII Data File and Codebook

Single Record Stata “do file” and Dataset Abstract: ABC News/Washington Post Poll # 1991-9142: Thomas Vote Delay Poll #1, October 8, 1991; Study # USABCWASH1991-9142

Multi-Record Stata “do file” and Dataset Abstract: Gallup/CNN/USA Today Poll # 1998-9808026: Anti-Terrorist Air Strikes, August 20, 1998; Study # USAIPOCNUS1998-9808026

Additional Stata Resources

Research Technologies at Indiana University (https://kb.iu.edu/d/aflyl)
UCLA Academic Technology Services (https://stats.idre.ucla.edu/stata/)
