Importing and Exporting Files in Stata
Ministry of Health
Importing From Excel
Data files that are not in Stata format, such as Excel spreadsheets or text data files, must be imported into Stata before you can work with them.
Importing From Excel
To import an Excel file (e.g. “Example_Dataset.xlsx”), click File > Import > Excel spreadsheet. In the window that opens, click Browse, navigate to the folder where the data file is stored, and click Open. A preview of the data appears in the “Import Excel” window. If the first row of your data file contains the variable names, check the box next to “Import first row as variable names”, then click OK.
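The same import can be run as a command with import excel, which is handy in do-files. This is a minimal sketch: so that it runs on its own, it first writes Stata's bundled auto dataset to a hypothetical file, auto_example.xlsx, and then imports it. The firstrow option corresponds to the “Import first row as variable names” checkbox.

```stata
* Create a sample Excel file from the bundled auto dataset
* (auto_example.xlsx is a hypothetical file name)
sysuse auto, clear
export excel using "auto_example.xlsx", firstrow(variables) replace

* Import it: firstrow reads row 1 as variable names,
* clear replaces any data already in memory
import excel using "auto_example.xlsx", firstrow clear
describe, short
```

For your own data, replace the file name with your file, e.g. "Example_Dataset.xlsx".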
Importing From CSV
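CSV files can be imported from the File > Import menu (the delimited text data option) or with the import delimited command. This minimal sketch writes the bundled auto dataset to a hypothetical file, auto_example.csv, and reads it back; varnames(1) tells Stata that row 1 holds the variable names.

```stata
* Create a sample CSV file (auto_example.csv is a hypothetical file name)
sysuse auto, clear
export delimited using "auto_example.csv", replace

* Import it, using the first row as variable names
import delimited using "auto_example.csv", varnames(1) clear
describe, short
```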
Opening Stata Files
Once you have imported your data file into Stata, save it as a Stata dataset (.dta) so it is easier to read into Stata for future analysis. To open a .dta file, click File > Open and navigate to your file, or use the command:
use mydata.dta
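A minimal sketch of the save-then-reopen workflow, using the bundled auto dataset as a stand-in for imported data; mydata.dta is a hypothetical file name. The replace option lets save overwrite an existing file of that name.

```stata
* Stand-in for a dataset you have just imported
sysuse auto, clear

* Save as a Stata dataset (.dta)
save "mydata.dta", replace

* Later: reopen it; clear discards whatever is currently in memory
use "mydata.dta", clear
describe, short
```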
Examining Datasets: clear
The clear command removes the dataset currently in memory (its variables, observations, and labels) so you can load a new data file. You can run clear on its own or use it as an option of the use command (use mydata.dta, clear). It does not delete any data saved on the hard drive.
Example command: clear
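The two ways of clearing memory described above are equivalent. This sketch first saves the bundled auto dataset as a hypothetical mydata.dta so it can run on its own.

```stata
sysuse auto, clear
save "mydata.dta", replace      // hypothetical file name

* Option 1: clear memory, then load the file
clear
use "mydata.dta"

* Option 2: clear and load in one step
use "mydata.dta", clear
```

Either way, the copy of mydata.dta on disk is untouched; only the data in memory is replaced.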
Examining Datasets: edit/browse
Once you have loaded your data or opened a .dta file, you can open the Data Editor with either of these commands:
edit (you can change values)
browse (read-only, so you cannot change data by accident)
You can also browse or edit specific variables, e.g. browse sex
Examining Datasets: browse
In the Data Editor, numeric variables appear in black, string (text) variables appear in red, and numeric variables with value labels appear in blue.
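browse and edit accept a list of variables and an if condition, which helps when checking a few columns or suspicious records. This sketch uses the bundled auto dataset and is meant to be run from the Stata window, since it opens the Data Editor.

```stata
sysuse auto, clear
browse                              // all variables
browse make price foreign           // selected variables only
browse make price if foreign == 1   // selected variables and observations
```

In this dataset make is a string (red), price is numeric (black), and foreign is numeric with value labels (blue).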
Examining Datasets: describe
The describe command provides a brief description of the data file. You can use des or d as shorthand for describe. The output includes:
the number of observations (records)
the number of variables
the size of the dataset
the list of variables and their characteristics (storage type, display format, value label, and variable label)
Command: describe
Examining Datasets: Describe example
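A short sketch of describe on Stata's bundled auto dataset: the full listing, a subset of variables using the d shorthand, and the short option, which reports only the counts and size.

```stata
sysuse auto, clear
describe              // all variables and their characteristics
d price mpg           // shorthand, selected variables only
describe, short       // number of observations, variables, and size only
```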
Examining Datasets: codebook
The codebook command is a great tool for getting a quick overview of the variables in the data file. It produces an electronic codebook from the data file, showing each variable's name, label, and values (type, range, number of unique values, and number of missing values).
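A minimal sketch using the bundled auto dataset: codebook for selected variables gives a detailed entry for each, while the compact option prints one summary line per variable.

```stata
sysuse auto, clear
codebook foreign rep78     // detailed entries for two variables
codebook, compact          // one line per variable for the whole file
```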
Examining Datasets: inspect
inspect is another useful command for a quick overview of a data file. It displays information about the values of each variable (how many are negative, zero, positive, integer, or missing, with a small histogram), which is useful for checking data accuracy.
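A minimal sketch on the bundled auto dataset. rep78 is a useful variable to inspect because it contains missing values, which inspect reports separately from the valid ones.

```stata
sysuse auto, clear
inspect rep78          // one variable
inspect price mpg      // several variables, one report each
```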
tabulate
tabulate (abbreviated tab) produces frequency tables (the number of observations per value). It works with string or numeric variables, but it is not suited to continuous numeric variables, which have too many distinct values.
Questions to ask with tab:
Is this the number of categories I expect?
Are the categories numbered and labeled as I expect?
Do these frequencies look right?
Important data management option:
missing (abbreviated miss): treat missing values as another category
Example commands:
*table of frequencies of sex, displaying missing values
tab sex, missing
*2-way table of frequencies
tab facility sex
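The sketch below runs the same kinds of tables on Stata's bundled auto dataset, where rep78 (repair record) has some missing values. Comparing the last two tables shows what the missing option adds.

```stata
sysuse auto, clear
tab foreign                 // one-way frequency table
tab rep78 foreign           // two-way table
tab rep78                   // missing values are left out by default
tab rep78, missing          // missing values shown as their own category
```

Without missing, the rep78 table covers 69 of the 74 cars; with it, the 5 missing values appear as a sixth row.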
summarize
summarize (abbreviated summ) provides summary statistics for numeric variables, which may be useful for data management:
Number of non-missing observations: does this variable have the correct number of missing observations?
Mean and standard deviation: are there any data errors or missing-data codes (e.g. 999) that make the mean or standard deviation seem implausible?
Minimum and maximum: are the min and max within the plausible range for this variable?
Example commands:
*summary stats for variable age
summarize age
*summary stats for all numeric variables
summarize
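A short sketch on the bundled auto dataset. The Obs column for rep78 is lower than the total number of cars because of its missing values, and the detail option adds percentiles and the four smallest and largest values, which help spot out-of-range entries.

```stata
sysuse auto, clear
summarize                  // all numeric variables
summarize rep78            // Obs counts non-missing values only
summarize mpg, detail      // percentiles, smallest/largest values, skewness
```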
Examining Datasets: Logical operators used in Stata
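Stata's relational operators are == (equal to), != or ~= (not equal to), >, <, >= and <=; its logical operators are & (and), | (or), and ! or ~ (not). A single = assigns a value (as in generate or replace), so comparisons in if conditions need ==. The sketch below uses the bundled auto dataset with count if. It also shows a common pitfall: missing values (.) are treated as larger than any number, so a condition such as rep78 >= 4 also picks up the missing observations unless you exclude them.

```stata
sysuse auto, clear
count if foreign == 1                  // equal to
count if foreign != 1                  // not equal to
count if price > 10000 & mpg >= 20     // and
count if rep78 == 1 | rep78 == 2       // or
count if !missing(rep78)               // not

* Pitfall: this count includes observations where rep78 is missing
count if rep78 >= 4
* Exclude missing values explicitly
count if rep78 >= 4 & !missing(rep78)
```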