Presentation on Data transformation in Stata.

anshukgec1599 9 views 20 slides Mar 11, 2025

About This Presentation

Data transformation commands in Stata, including compress, bysort, real, mvdecode, substr, length, lower, upper, trim, round, format, regex, date, and export excel. It also discusses regular expressions.

Slide Content

Training on Data Processing with Stata, by Anshuman Bhattacharjee. Day 2, Session II. Commands in Stata: compress, bysort, real, mvdecode, substr, length, lower, upper, trim, round, format, regex, date, export excel.

compress
compress reduces the size of the dataset in memory by demoting variables to the smallest storage type that holds their values without loss of information:
  doubles (8 bytes) to longs (4 bytes), ints (2 bytes), or bytes (1 byte)
  floats (4 bytes) to ints (2 bytes) or bytes (1 byte)
  longs to ints or bytes
  ints to bytes
  str#s to shorter str#s
  strLs to str#s
compress also considers coalescing strLs within each strL variable: if a strL variable takes on the same value in multiple observations, compress can link those values to a single memory location to save memory.
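As a minimal sketch of the behaviour described above, the following uses the auto dataset that ships with Stata (an assumption for illustration; any dataset in memory would do) and compares storage types before and after compress.

```stata
* Load an example dataset shipped with Stata (illustrative choice)
sysuse auto, clear

* Inspect the storage type of each variable before compressing
describe

* Demote each variable to the smallest type that loses no information
compress

* Inspect the storage types again to see what changed
describe
```

compress reports which variables it changed, so the second describe is only a convenience for comparing the two states side by side.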

bysort
bysort repeats the same command on each group of observations defined by varlist, sorting the data by varlist first.
Syntax: bysort varlist: stata_cmd
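A short sketch of bysort in use, assuming the auto dataset that ships with Stata; the new variable names (rank_in_group, mean_price) are hypothetical and chosen only for this example.

```stata
sysuse auto, clear

* Run summarize separately for domestic and foreign cars
bysort foreign: summarize price

* Sort by price within each group (variables in parentheses are used
* for sorting only, not for grouping) and number the observations
bysort foreign (price): generate rank_in_group = _n

* Attach the group mean of price to every observation in the group
bysort foreign: egen mean_price = mean(price)

list foreign price rank_in_group mean_price in 1/5
```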

real
real(s) converts a number stored as a string s to a numeric value. It returns missing (.) if s cannot be interpreted as a number.
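The conversion can be sketched on a small made-up dataset; the variable names amount_str and amount are hypothetical.

```stata
clear
input str6 amount_str
"12.5"
"100"
"abc"
""
end

* Numeric strings are converted; "abc" and the empty string become missing (.)
generate amount = real(amount_str)

list
```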

mvdecode
mvdecode changes occurrences of the values in a numlist to a missing-value code in the specified varlist. It cannot be used on string variables.
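A minimal sketch of mvdecode, assuming a made-up survey extract in which 9999 and 9998 were used as numeric codes for missing answers (the data and the codes are assumptions for illustration).

```stata
clear
input id income age
1 25000 34
2 9999  41
3 31000 9999
4 9998  29
end

* Recode 9999 to system missing (.) in both variables
mvdecode income age, mv(9999)

* Recode 9998 to the extended missing value .a in income only
mvdecode income, mv(9998 = .a)

list
```

Using a distinct extended missing value such as .a keeps the reason for missingness distinguishable after recoding.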

substr
substr(s,n1,n2) is a function that returns the substring of the string s starting at position n1 and of length n2. If n1 < 0, n1 is interpreted as the distance from the end of the string; if n2 = . (missing), the remaining portion of the string is returned.
  substr("abcdef",2,3) = "bcd"
  substr("abcdef",-3,2) = "de"
  substr("abcdef",2,.) = "bcdef"
  substr("abcdef",-3,.) = "def"
  substr("abcdef",2,0) = ""
  substr("abcdef",15,2) = ""
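substr is typically used inside generate or replace. The sketch below assumes a made-up string variable phone; the variable names are hypothetical.

```stata
clear
input str12 phone
"080-5551234"
"011-5559876"
end

* First three characters of each string
generate area_code = substr(phone, 1, 3)

* Last four characters: negative start counts from the end,
* and . as the length means "the rest of the string"
generate last4 = substr(phone, -4, .)

list
```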

strlen, strlower, strupper, strtrim
strlen(s) returns the length of the string s in bytes. strlen("STATA") = 5. strlen("ab") = 2
strlower(s) converts the string s to lowercase. strlower("THIS") = "this". strlower("Ab") = "ab"
strupper(s) converts the string s to uppercase. strupper("this") = "THIS". strupper("Ab") = "AB"
strtrim(s) removes all leading and trailing blanks from the string s. strtrim(" this ") = "this"
NOTE: strlen counts bytes, not characters, and strlower and strupper convert only ASCII letters, so these functions do not handle Unicode characters correctly. Use ustrlen, ustrlower, and ustrupper for Unicode text.
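These functions can be tried directly with display. The sketch below also contrasts the byte-based functions with their Unicode-aware counterparts; it assumes Stata 14 or later and a do-file saved as UTF-8.

```stata
* ASCII examples from the slide
display strlen("STATA")
display strlower("THIS")
display strupper("Ab")

* Brackets make the effect of trimming visible
display "[" + strtrim("  this  ") + "]"

* Bytes versus characters: the accented letter takes two bytes in UTF-8
display strlen("café")
display ustrlen("café")

* strupper leaves the non-ASCII letter unchanged; ustrupper converts it
display strupper("café")
display ustrupper("café")
```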

round and format
round(x,y) or round(x): x rounded in units of y, or x rounded to the nearest integer if the argument y is omitted.
  round(83.67,0.1) = 83.7
  round(83.67,0.01) = 83.67
  round(83.67,1.0) = round(83.67,1) = 84
  round(-5.2,1) = -5
  round(-83.67,1.0) = -84
format: sets the display format associated with the specified variables.
  format varlist %fmt
  format %fmt varlist
Default formats:
  byte     %8.0g
  int      %8.0g
  long     %12.0g
  float    %9.0g
  double   %10.0g
  str#     %#s
  strL     %9s
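As a sketch of how rounding and display formats differ, the example below assumes the auto dataset that ships with Stata; price_k is a hypothetical variable name. round changes the stored value, while format changes only how a value is displayed.

```stata
* Rounding examples from the slide
display round(83.67, 0.1)
display round(-5.2, 1)

sysuse auto, clear

* Store price in thousands, rounded to one decimal place
generate price_k = round(price/1000, 0.1)

* Show price_k with exactly one decimal place
format price_k %9.1f

* Show price with comma separators; the stored values are unchanged
format price %9.0gc

list make price price_k in 1/5
```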

round and format (numerical display formats)
  Numerical format   Description    Example
  Right-justified
    %#.#g            general        %9.0g
    %#.#f            fixed          %9.2f
    %#.#e            exponential    %10.7e
    %21x             hexadecimal    %21x
  Right-justified with commas
    %#.#gc           general        %9.0gc
    %#.#fc           fixed          %9.2fc
  Right-justified with leading zeros
    %0#.#f           fixed          %09.2f
  Left-justified
    %-#.#g           general        %-9.0g
    %-#.#f           fixed          %-9.2f
    %-#.#e           exponential    %-10.7e
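The formats in the table can be previewed without touching any variable, because display accepts a format before an expression. The numbers used here are arbitrary illustrations.

```stata
* General, fixed, and exponential formats
display %9.0g  1234.5678
display %9.2f  1234.5678
display %10.7e 1234.5678

* Fixed format with comma separators
display %9.2fc 1234.5678

* Fixed format padded with leading zeros
display %09.2f 12.5

* Left-justified fixed format; brackets show where the padding goes
display "[" %-9.2f 12.5 "]"
```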

date
date(s1,s2[,Y]) returns the date (days since 01jan1960) corresponding to s1 based on s2 and Y.
s1 contains the date, recorded as a string, in virtually any format.
s2 is any permutation of M, D, and [##]Y, with their order defining the order in which month, day, and year occur in s1.
Y provides an alternate way of handling two-digit years. When a two-digit year is encountered, the largest year, topyear, that does not exceed Y is returned.
  date("1/15/19","MD20Y") - date("1/15/18","MD20Y") = 365
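A minimal sketch of converting string dates, using a made-up variable visit_str (a hypothetical name). The result of date() is a number of days, so a %td format is applied to make it readable.

```stata
clear
input str10 visit_str
"1/15/2019"
"12/31/2018"
end

* Month, day, four-digit year
generate visit = date(visit_str, "MDY")
format visit %td
list

* Difference in days between two dates (the slide's example)
display date("1/15/19", "MD20Y") - date("1/15/18", "MD20Y")

* Two-digit year resolved through the optional third argument:
* the largest matching year that does not exceed it is used
display %td date("1/15/08", "MDY", 1999)
display %td date("1/15/08", "MDY", 2019)
```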

regex (Regular Expressions)
What are regular expressions? A regular expression is a sequence of characters that specifies a match pattern in text. Regular expressions are a relatively easy, flexible method of searching strings, and you can use them to search any string (e.g. variables, macros).
Regular expressions are not the solution to every problem involving strings. In most cases the built-in string functions in Stata will do at least as good a job, with less effort and a lower probability of error.
In Stata, there are three functions that use regular expressions.

regex
regexm(s,re) searches the string s for the pattern described by the regular expression re. It evaluates to 1 if the string matches the expression and to 0 otherwise.
regexs(n) returns the nth subexpression (the part of the pattern enclosed in the nth pair of parentheses) from the most recent regexm match; regexs(0) returns the entire matched string. regexm must therefore always be run before regexs.
regexr(s1,re,s2) searches for re within the string s1 and replaces the matching portion with a new string s2.
Special characters:
  *  Asterisk means "match zero or more" of the preceding expression.
  +  Plus sign means "match one or more" of the preceding expression.
  ?  Question mark means "match either zero or one" of the preceding expression.
  ^  When placed at the beginning of a regular expression, the caret means "match expression at beginning of string". This character can be thought of as an "anchor" character, since it does not directly match a character, only the location of the match.
  $  When placed at the end of a regular expression, the dollar sign means "match expression at end of string". This is the other anchor character.
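The three functions can be sketched together on a made-up address variable; the data and the variable names (address, has_zip, zip, masked) are hypothetical. The pattern looks for five digits at the end of the string.

```stata
clear
input str30 address
"12 Main St 02134"
"PO Box 9 90210"
"no zip here"
end

* regexm: 1 if the string ends in five digits, 0 otherwise
generate has_zip = regexm(address, "[0-9][0-9][0-9][0-9][0-9]$")

* regexs: extract the parenthesized subexpression from the match;
* the if qualifier runs regexm first for each observation
generate zip = regexs(1) if regexm(address, "([0-9][0-9][0-9][0-9][0-9])$")

* regexr: replace the trailing run of digits with a mask
generate masked = regexr(address, "[0-9]+$", "#####")

list
```

Putting regexm in the if qualifier is a common idiom because it guarantees that regexs refers to a match made on the same observation.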

export excel
export excel saves a subset of the variables in memory to an Excel file.
Syntax: export excel [varlist] using filename [if] [in] [, export_excel_options]
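As a sketch of the syntax above, the following exports three variables for a subset of observations from the auto dataset that ships with Stata. The file name auto_subset.xlsx is a hypothetical choice; the file is written to the current working directory.

```stata
sysuse auto, clear

* Export selected variables for foreign cars only.
* firstrow(variables) writes variable names in the first row;
* replace overwrites the file if it already exists.
export excel make price mpg using "auto_subset.xlsx" ///
    if foreign == 1, firstrow(variables) replace
```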
