Data transformation commands in Stata. Covers commands and functions such as compress, bysort, real, mvdecode, substr, length,
lower, upper, trim, round, format, regex, date, and export excel as used in Stata. It discusses regular expressions
Added: Mar 11, 2025
Slides: 20 pages
Slide Content
Training on Data Processing with Stata by Anshuman Bhattacharjee. Day 2, Session II. Commands in Stata (compress, bysort, real, mvdecode, substr, length, lower, upper, trim, round, format, regex, date, export excel)
compress
It reduces the size of the dataset in memory by demoting variables to smaller storage types where no information is lost:
doubles (8 bytes) to longs (4 bytes), ints (2 bytes), or bytes (1 byte)
floats (4 bytes) to ints (2 bytes) or bytes (1 byte)
longs to ints or bytes
ints to bytes
str#s to shorter str#s
strLs to str#s
It also considers coalescing strLs within each strL variable. If a strL variable takes on the same value in multiple observations, compress can link those values to a single memory location to save memory.
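As a rough illustration of the demotion described above, the sketch below uses Stata's bundled auto dataset (an assumption; any dataset in memory would do) and deliberately stores a small integer variable as a double before compressing.

```stata
* Sketch: load the example dataset shipped with Stata (assumed for illustration)
sysuse auto, clear

* Deliberately store rep78 (values 1 to 5) in an oversized type
recast double rep78
describe rep78

* compress demotes each variable to the smallest type that holds its values
compress
describe rep78
```

After compress, describe should report rep78 as a byte, since all of its values fit in one byte.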
bysort
It sorts the data by varlist and then repeats the same command on each group of observations defined by varlist.
Syntax: bysort varlist: stata_cmd
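A minimal sketch of the bysort prefix, again assuming the bundled auto dataset. The new variable names (mean_price, rank_in_group) are hypothetical and chosen only for this example.

```stata
* Sketch: example dataset shipped with Stata (assumed for illustration)
sysuse auto, clear

* Mean price within each level of foreign
bysort foreign: egen mean_price = mean(price)

* Number the observations within each group, ordered by price
bysort foreign (price): generate rank_in_group = _n

* Run a command separately for each group
bysort foreign: summarize mpg
```

Variables in parentheses, such as (price) here, control the sort order within groups but do not define the groups.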
real
real(s) is a function that converts a number stored as a string into a numeric value. It returns missing (.) if s cannot be converted to a number.
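The conversion can be sketched on a small made-up dataset. The variable names income_str and income are hypothetical.

```stata
* Sketch: a tiny made-up dataset with numbers stored as strings
clear
input str8 income_str
"1200"
"350.5"
"n/a"
end

* real() returns the numeric value, or missing when the string is not a number
generate double income = real(income_str)
list
```

The third observation ends up missing because "n/a" is not a number.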
mvdecode
mvdecode changes occurrences of the values in a numlist to a missing-value code in the specified varlist. It cannot be used on string variables.
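A short sketch of mvdecode, assuming a made-up survey extract in which -99 and -98 were used as placeholder codes. The variable names and codes are hypothetical.

```stata
* Sketch: made-up data where -99 and -98 stand for "not recorded"
clear
input age income
25 42000
-99 38000
40 -98
end

* Recode -99 to system missing (.) and -98 to extended missing (.a)
mvdecode age income, mv(-99 = . \ -98 = .a)
list
```

Using different missing-value codes keeps the two reasons for missingness distinguishable later on.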
substr
substr(s,n1,n2): this function returns the substring of the string s that starts at position n1 and has length n2. If n1 < 0, n1 is interpreted as the distance from the end of the string; if n2 = . (missing), the remaining portion of the string is returned.
substr("abcdef",2,3) = "bcd"
substr("abcdef",-3,2) = "de"
substr("abcdef",2,.) = "bcdef"
substr("abcdef",-3,.) = "def"
substr("abcdef",2,0) = ""
substr("abcdef",15,2) = ""
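The examples above can be checked interactively with display. The second part of this sketch assumes a hypothetical identifier of the form "KA-2021-0042", from which the year is extracted.

```stata
* Evaluate substr() directly
display substr("abcdef", 2, 3)
display substr("abcdef", -3, .)

* Sketch: pull a 4-character year out of a made-up ID string
clear
input str12 id
"KA-2021-0042"
end
generate year = real(substr(id, 4, 4))
list
```

Wrapping substr() in real() turns the extracted text into a numeric variable.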
strlen, strlower, strupper, strtrim
strlen(s): this function returns the length of the string s in bytes. strlen("STATA") = 5. strlen("ab") = 2
strlower(s): this function converts the string s to lowercase. strlower("THIS") = "this". strlower("Ab") = "ab"
strupper(s): this function converts the string s to uppercase. strupper("this") = "THIS". strupper("Ab") = "AB"
strtrim(s): this function removes all leading and trailing blanks from a string. strtrim(" this ") = "this".
NOTE: strlen(), strlower(), and strupper() are not Unicode-aware. strlen() counts bytes, and strlower() and strupper() change only ASCII letters. For Unicode text, use ustrlen(), ustrlower(), and ustrupper().
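A minimal sketch that chains these functions to clean a text variable. The data and the variable names (name, name_clean, and so on) are made up for illustration.

```stata
* Sketch: made-up names with stray blanks and inconsistent case
clear
input str20 name
"  Asha Rao  "
"RAVI kumar"
end

* Trim first, then derive the other variables from the cleaned text
generate name_clean = strtrim(name)
generate name_lower = strlower(name_clean)
generate name_upper = strupper(name_clean)
generate name_len   = strlen(name_clean)
list
```

Trimming before measuring the length avoids counting leading or trailing blanks.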
round and format
round(x,y) or round(x): x rounded in units of y, or x rounded to the nearest integer if the argument y is omitted.
round(83.67,0.1) = 83.7
round(83.67,0.01) = 83.67
round(83.67,1.0) or round(83.67,1) = 84
round(-5.2,1) = -5
round(-83.67,1.0) = -84
format: it is used to set the display format associated with the specified variables.
Default formats:
byte      %8.0g
int       %8.0g
long      %12.0g
float     %9.0g
double    %10.0g
str#      %#s
strL      %9s
Syntax:
format varlist %fmt
format %fmt varlist
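The rounding rules can be tried out with display and then applied to a variable. The variable names score and score_int below are hypothetical.

```stata
* Evaluate round() directly
display round(83.67, 0.1)
display round(83.67)
display round(-5.2, 1)

* Sketch: round a made-up variable to the nearest integer
clear
input score
83.67
-5.2
end
generate score_int = round(score)
list
```

Note that round() changes the stored value, whereas format only changes how a value is displayed.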
round and format
Numerical format    Description    Example
right-justified
%#.#g               general        %9.0g
%#.#f               fixed          %9.2f
%#.#e               exponential    %10.7e
%21x                hexadecimal    %21x
right-justified with commas
%#.#gc              general        %9.0gc
%#.#fc              fixed          %9.2fc
right-justified with leading zeros
%0#.#f              fixed          %09.2f
left-justified
%-#.#g              general        %-9.0g
%-#.#f              fixed          %-9.2f
%-#.#e              exponential    %-10.7e
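A few of the formats in the table above, applied to the bundled auto dataset (an assumption made only so the sketch is runnable).

```stata
* Sketch: example dataset shipped with Stata (assumed for illustration)
sysuse auto, clear

* Fixed format with commas and two decimals
format price %9.2fc

* Left-justified general format (the %fmt may come before the varlist)
format %-9.0g mpg

* Fixed format padded with leading zeros
format gear_ratio %09.2f

list price mpg gear_ratio in 1/3
```

The stored values are unchanged; only the way list, browse, and similar commands show them is affected.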
date
date(s1,s2[,Y]): this function returns the date (days since 01jan1960) corresponding to s1, based on s2 and Y.
s1 contains the date, recorded as a string, in virtually any format.
s2 is any permutation of M, D, and [##]Y, with their order defining the order in which month, day, and year occur in s1.
Y provides an alternate way of handling two-digit years. When a two-digit year is encountered, the largest year, topyear, that does not exceed Y is returned.
date("1/15/19","MD20Y") - date("1/15/18","MD20Y") = 365
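A minimal sketch of converting string dates into Stata dates. The variable names visit and visit_date are hypothetical, and the %td format is applied so that the day counts display as readable dates.

```stata
* Sketch: made-up visit dates stored as month/day/year strings
clear
input str10 visit
"1/15/2019"
"12/31/2018"
end

* Convert to days since 01jan1960, then attach a daily date format
generate visit_date = date(visit, "MDY")
format visit_date %td
list

* Two-digit years: a century prefix in the mask, or a topyear argument
display date("1/15/19", "MD20Y") - date("1/15/18", "MD20Y")
display %td date("1/15/08", "MDY", 2019)
```

Without the format command, visit_date would be listed as raw day counts rather than dates.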
regex (Regular Expressions)
What are regular expressions? A regular expression is a sequence of characters that specifies a match pattern in text. Regular expressions are a relatively easy, flexible method of searching strings, and you can use them to search any string (e.g. variables, macros). They are not the solution to every problem involving strings. In most cases the built-in string functions in Stata will do at least as good a job, with less effort and a lower probability of error. In Stata, three functions use regular expressions.
regex
regexm(s,re) allows you to search for the string described by your regular expression. It evaluates to 1 if the string matches the expression.
regexs(n) returns the nth substring within an expression matched by regexm (hence, regexm must always be run before regexs).
regexr(s1,re,s2) searches for re within the string s1 and replaces the matching portion with a new string, s2.
*  The asterisk means "match zero or more" of the preceding expression.
+  The plus sign means "match one or more" of the preceding expression.
?  The question mark means "match either zero or one" of the preceding expression.
^  When placed at the beginning of a regular expression, the caret means "match expression at beginning of string". This character can be thought of as an "anchor" character, since it does not directly match a character, only the location of the match.
$  When the dollar sign is placed at the end of a regular expression, it means "match expression at end of string". This is the other anchor character.
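The three functions can be sketched together on a made-up address variable. The data and variable names (address, has_pin, pin, address_nopin) are hypothetical; the parentheses in the second pattern mark the substring that regexs(1) returns.

```stata
* Sketch: made-up addresses, one ending in a six-digit PIN code
clear
input str30 address
"12 MG Road, Pune 411001"
"Sector 5, Noida"
end

* regexm(): 1 if the string ends with six digits, 0 otherwise
generate has_pin = regexm(address, "[0-9][0-9][0-9][0-9][0-9][0-9]$")

* regexs(): return the digits captured by the parentheses in regexm()
generate pin = regexs(1) if regexm(address, "([0-9]+)$")

* regexr(): remove a trailing blank-plus-digits portion, if present
generate address_nopin = regexr(address, " [0-9]+$", "")
list
```

The if regexm(...) qualifier is what makes regexs(1) valid for each observation, since regexs() refers to the most recent regexm() match.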
export excel
export excel saves the data in memory, or a subset of its variables and observations, to an Excel file.
Syntax: export excel [varlist] using filename [if] [in] [, export_excel_options]
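A minimal sketch of the syntax above, assuming the bundled auto dataset and a hypothetical output file named auto_subset.xlsx in the current working directory.

```stata
* Sketch: example dataset shipped with Stata (assumed for illustration)
sysuse auto, clear

* Export three variables for foreign cars only; write variable names
* in the first row and overwrite the file if it already exists
export excel make price mpg using "auto_subset.xlsx" if foreign == 1, firstrow(variables) replace
```

Without the replace option, the command stops with an error when the file already exists.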