Presentation on Data transformation in Stata.

anshukgec1599 9 views 20 slides Mar 11, 2025

About This Presentation

Data transformation commands in Stata: compress, bysort, real, mvdecode, substr, length, lower, upper, trim, round, format, regex, date, and export excel.
It discusses regular expressions.


Slide Content

Training on Data Processing with Stata by Anshuman Bhattacharjee
Day 2, Session II
Commands in Stata (compress, bysort, real, mvdecode, substr, length, lower, upper, trim, round, format, regex, date, export excel)

compress
compress reduces the amount of memory used by the dataset. It demotes variables to smaller storage types wherever this can be done without losing information:
  doubles (8 bytes) to longs (4 bytes), ints (2 bytes), or bytes (1 byte)
  floats (4 bytes) to ints (2 bytes) or bytes (1 byte)
  longs to ints or bytes
  ints to bytes
  str#s to shorter str#s
  strLs to str#s
It also considers coalescing strLs within each strL variable. If a strL variable takes on the same value in multiple observations, compress can link those values to a single memory location to save memory.
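As a minimal sketch of the demotion described above, the do-file below builds a small made-up dataset (the variable names id and code are illustrative only, not from the training material) and compares storage types before and after compress.

```stata
* Illustrative only: a small made-up dataset
clear
set obs 100
generate double id = _n        // integers 1..100 stored as 8-byte doubles
generate str20 code = "A"      // one character stored in a wide str20
describe                       // storage types before compress
compress                       // id can be demoted to byte, code to str1
describe                       // storage types after compress
```

The values themselves are unchanged; only the storage types shrink.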


bysort
bysort sorts the data by varlist and then repeats the same command on each group of observations defined by varlist.
Syntax: bysort varlist: stata_cmd
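A short sketch of group-wise processing with bysort follows. It assumes the auto example dataset that ships with Stata; the new variable names mean_price and rank_in_group are hypothetical.

```stata
* Assumes the auto example dataset shipped with Stata
sysuse auto, clear
bysort foreign: summarize price                       // separate summary for each group
bysort foreign: egen mean_price = mean(price)         // group mean attached to every row
bysort foreign (price): generate rank_in_group = _n   // 1 = cheapest car within its group
list make foreign price mean_price rank_in_group in 1/5
```

Putting price in parentheses sorts by it within each group without making it part of the group definition.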

real
real(s) converts a number stored as a string to a numeric value. It returns missing (.) if s cannot be converted to a number.
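The sketch below applies real() to a made-up string variable (amount_str is a hypothetical name) so that the conversion to a number or to missing can be seen side by side.

```stata
* Illustrative data: numbers typed in as text
clear
input str8 amount_str
"12.5"
"300"
"abc"
end
generate amount = real(amount_str)   // "abc" is not a number, so it becomes missing (.)
list
```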

mvdecode
mvdecode changes occurrences of a numlist in the specified varlist to a missing-value code. It cannot be used on string variables.
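As an illustration, the sketch below assumes a made-up survey file in which -99 was entered as a placeholder for "no answer"; the variable names and values are hypothetical.

```stata
* Illustrative data: -99 was keyed in wherever the answer was not recorded
clear
input id age income
1 34 52000
2 -99 48000
3 41 -99
end
mvdecode age income, mv(-99)   // every -99 in age and income becomes system missing (.)
list
```

Writing mv(-99=.a) instead would store the extended missing value .a rather than the default system missing value.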

substr
substr(s,n1,n2): This function returns the substring of the string s starting at position n1 and of length n2. If n1 < 0, n1 is interpreted as the distance from the end of the string; if n2 = . (missing), the remaining portion of the string is returned.
  substr("abcdef",2,3) = "bcd"
  substr("abcdef",-3,2) = "de"
  substr("abcdef",2,.) = "bcdef"
  substr("abcdef",-3,.) = "def"
  substr("abcdef",2,0) = ""
  substr("abcdef",15,2) = ""
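To show substr() on a variable rather than a literal, the sketch below splits a hypothetical 10-character record code into parts. The layout (2-letter prefix, 4-digit year, 3-digit serial at the end) is an assumption made only for this example.

```stata
* Made-up record codes with a fixed layout
clear
input str10 record
"AS2025M001"
"ML2024F017"
end
generate prefix = substr(record, 1, 2)          // first two characters
generate year   = real(substr(record, 3, 4))    // characters 3 to 6, converted to a number
generate serial = substr(record, -3, .)         // last three characters
list
```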

strlen, strlower, strupper, strtrim
strlen(s): This function returns the length of the string s in bytes. strlen("STATA") = 5. strlen("ab") = 2.
strlower(s): This function converts the string s into lowercase. strlower("THIS") = "this". strlower("Ab") = "ab".
strupper(s): This function converts the string s into uppercase. strupper("this") = "THIS". strupper("Ab") = "AB".
strtrim(s): This function removes all leading and trailing blanks in a string. strtrim(" this ") = "this".
NOTE: strlen() counts bytes, and strlower() and strupper() convert only ASCII letters, so they do not handle Unicode characters correctly. Use ustrlen(), ustrlower(), and ustrupper() for Unicode text.
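These functions are often chained when cleaning text. The sketch below uses made-up names (the variables name, clean, shout, and len are hypothetical) to trim, change case, and measure strings.

```stata
* Illustrative data: inconsistent spacing and case
clear
input str20 name
"  Alice "
"BOB"
end
generate clean = strlower(strtrim(name))   // trim blanks first, then lowercase
generate shout = strupper(clean)           // uppercase version of the cleaned text
generate len   = strlen(clean)             // length in bytes after cleaning
list
```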

round and format
round(x,y) or round(x): x rounded in units of y, or x rounded to the nearest integer if the argument y is omitted.
  round(83.67,0.1) = 83.7
  round(83.67,0.01) = 83.67
  round(83.67,1.0) = round(83.67,1) = 84
  round(-5.2,1) = -5
  round(-83.67,1.0) = -84
format: It sets the display format associated with the specified variables. It changes only how values are displayed, not the stored values.
  format varlist %fmt
  format %fmt varlist
Default formats:
  byte     %8.0g
  int      %8.0g
  long     %12.0g
  float    %9.0g
  double   %10.0g
  str#     %#s
  strL     %9s
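The difference between rounding a value and merely changing how it is displayed can be sketched as follows. The data and the variable names price and price_int are made up for illustration.

```stata
* Illustrative data
clear
input double price
83.67
1234.4
end
generate double price_int = round(price)   // changes the stored value: 84 and 1234
format price %9.2f                          // display with two decimals, value unchanged
format price_int %9.0fc                     // display with a comma as thousands separator
list
display round(-5.2, 1)                      // -5
```

round() creates new values, whereas format leaves the data untouched and affects only how list and similar commands print them.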

round and format
Numerical format    Description    Example
right-justified
  %#.#g             general        %9.0g
  %#.#f             fixed          %9.2f
  %#.#e             exponential    %10.7e
  %21x              hexadecimal    %21x
right-justified with commas
  %#.#gc            general        %9.0gc
  %#.#fc            fixed          %9.2fc
right-justified with leading zeros
  %0#.#f            fixed          %09.2f
left-justified
  %-#.#g            general        %-9.0g
  %-#.#f            fixed          %-9.2f
  %-#.#e            exponential    %-10.7e

date
date(s1,s2[,Y]): the date (days since 01jan1960) corresponding to s1 based on s2 and Y.
  s1 contains the date, recorded as a string, in virtually any format.
  s2 is any permutation of M, D, and [##]Y, with their order defining the order that month, day, and year occur in s1.
  Y provides an alternate way of handling two-digit years. When a two-digit year is encountered, the largest year, topyear, that does not exceed Y is returned.
Example: date("1/15/19","MD20Y") - date("1/15/18","MD20Y") = 365
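A common use of date() is converting a string variable into a Stata date and then giving it a readable display format. The sketch below assumes made-up visit dates recorded as month/day/year; visit_str and visit are hypothetical names.

```stata
* Illustrative data: dates stored as text in month/day/year order
clear
input str10 visit_str
"1/15/2019"
"3/11/2025"
end
generate visit = date(visit_str, "MDY")   // number of days since 01jan1960
format visit %td                          // display as 15jan2019 instead of a raw number
list
display date("1/15/19", "MD20Y") - date("1/15/18", "MD20Y")   // 365 days apart
```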

regex (Regular Expression)
What are regular expressions? A regular expression is a sequence of characters that specifies a match pattern in text. It is a relatively easy, flexible method of searching strings, and it can be used to search any string (e.g. variables, macros). Regular expressions are not the solution to every problem involving strings. In most cases the built-in string functions in Stata will do at least as good a job, with less effort and a lower probability of error. In Stata, there are three functions that use regular expressions.

regex
regexm(s,re) allows you to search for the string described in your regular expression. It evaluates to 1 if the string matches the expression.
regexs(n) returns the nth substring within an expression matched by regexm (hence, regexm must always be run before regexs).
regexr(s1,re,s2) searches for re within the string (s1) and replaces the matching portion with a new string (s2).
  *  Asterisk means "match zero or more" of the preceding expression.
  +  Plus sign means "match one or more" of the preceding expression.
  ?  Question mark means "match either zero or one" of the preceding expression.
  ^  When placed at the beginning of a regular expression, the caret means "match expression at beginning of string". This character can be thought of as an "anchor" character since it does not directly match a character, only the location of the match.
  $  When the dollar sign is placed at the end of a regular expression, it means "match expression at end of string". This is the other anchor character.
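The three functions can be sketched together on a made-up address variable. The data, the variable names, and the assumption that a pin code is a run of digits at the end of the string are illustrative only.

```stata
* Illustrative data: one address ends in a pin code, one does not
clear
input str30 address
"Plot 12, Main Road 781001"
"No pin code here"
end
* regexm: 1 if the address ends in one or more digits, 0 otherwise
generate has_pin = regexm(address, "[0-9]+$")
* regexs(1): the text captured by the first pair of parentheses
generate pin = regexs(1) if regexm(address, "([0-9]+)$")
* regexr: replace the trailing digits with a mask
generate masked = regexr(address, "[0-9]+$", "XXXXXX")
list
```

The $ anchor is what stops the pattern from matching the house number 12, which is not at the end of the string.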


export excel
export excel saves the data in memory, or a subset of its variables, to an Excel file.
Syntax: export excel [varlist] using filename [if] [in] [, export_excel_options]
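A minimal sketch of the syntax above, assuming the auto example dataset that ships with Stata and a hypothetical output file name auto_subset.xlsx written to the current working directory:

```stata
* Assumes the auto example dataset shipped with Stata
sysuse auto, clear
* Export three variables for foreign cars only; variable names go in the first row
export excel make price mpg using "auto_subset.xlsx" if foreign == 1, firstrow(variables) replace
```

The replace option overwrites an existing file of the same name, so it is worth omitting when an accidental overwrite would be costly.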


THANK YOU