ANURAG UNIVERSITY Pre-Ph.D course work Research Methodology-II (April, 2025 Batch) 23-08-2025
Topics to be covered: Introduction to Jamovi, installation steps, navigation (features) in Jamovi; mean, SD, skewness, kurtosis, outliers, normal distribution; correlation analysis
JAMOVI
Jamovi is a free, open-source, and user-friendly statistical software package designed to simplify statistical analysis with a point-and-click interface. It can perform tasks like t-tests, ANOVA, regression, and data visualization. It offers both desktop and cloud-based versions, making statistical analysis accessible, intuitive, and a cost-effective alternative to proprietary software like SPSS.
Key Features & Benefits
Free and Open-Source: Jamovi is completely free to use and open-source, making it accessible to everyone.
User-Friendly Interface: It features a spreadsheet-like data entry system and a point-and-click graphical user interface (GUI) that is intuitive and simplifies statistical analysis.
Dynamic Output: All calculations and analyses update automatically when data changes, and users can easily edit, add, or remove elements from the output tables and plots.
Wide Range of Analyses: Jamovi supports various statistical analyses, including descriptive statistics, t-tests, ANOVA, linear regression, and various model types.
1. Data Input: Data can be entered directly into the spreadsheet or imported from various file formats.
2. Data Management: You can change data types (e.g., nominal, continuous), compute new variables, and filter data within the software.
3. Analysis Selection: Users select statistical analyses from a menu-driven system.
4. Results Generation: The results, including APA-formatted tables and attractive plots, are generated in a separate results panel and update dynamically.
Two broad types of statistics
Descriptive statistics
Descriptive statistics summarize and organize the characteristics of a dataset: they describe the data you have, presenting raw data in a meaningful way that makes it easier to understand.
Common Tools:
Measures of central tendency: Mean, Median, Mode
Measures of variability: Range, Variance, Standard Deviation
Frequency distributions, charts, and graphs (like histograms, pie charts)
Inferential Statistics
Inferential statistics use a sample of data to make inferences or predictions about a larger population. Their purpose is to make generalizations, test hypotheses, and determine relationships between variables.
Common Tools:
Hypothesis testing (e.g., t-tests, chi-square tests)
Confidence intervals
Regression analysis
ANOVA (Analysis of Variance)
Measures of Central Tendency: These are values that represent the center or typical value in a dataset.
Measures of Variability (Dispersion): These describe how spread out the data is. They show the extent to which data points differ from the central value.
Mean Mean is a measure of central tendency, which is a way to find the "average" value in a set of numbers. It gives us a single value that represents the typical value in the data set.
You have the following test scores: 70, 80, 90, 100, 85 Calculate the mean: Sum = 70 + 80 + 90 + 100 + 85 = 425 Number of scores = 5 Mean = 425 ÷ 5 = 85 Interpretation : The average score is 85, so on average, students scored 85 on this test.
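The worked example above can be reproduced in a few lines of Python (a sketch outside jamovi, using only the standard library):

```python
# Mean of the five test scores from the example above
from statistics import mean

scores = [70, 80, 90, 100, 85]
print(sum(scores))   # 425
print(mean(scores))  # 425 / 5 = 85
```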
Standard Deviation
Standard deviation measures how spread out the numbers in a data set are from the mean (average). If the standard deviation is small, the data points are close to the mean. If the standard deviation is large, the data points are spread out over a wider range.
Why is it Important? While the mean tells you the average, the standard deviation tells you how much variation or consistency there is in the data.
For example, if you had two classes of students with the same average test score, the class with a low standard deviation had scores very close to the average, while the class with a high standard deviation had a wider range of scores, with some students doing very well and others doing poorly.
Test scores of students: 70, 80, 90, 100, 110. Standard deviation = 14.14. Interpretation: Scores are spread out about 14 points from the average score of 90.
Test scores: 85, 87, 90, 92, 93. Standard deviation = 3.01. Interpretation: Scores vary by about 3 points from the average of 89.4, much more consistent than the previous example.
Standard deviation measures spread or variability. Low standard deviation = data points are close to the mean. High standard deviation = data points are spread out over a wide range.
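The two example values above come from the population standard deviation (dividing by N), which is what Python's statistics.pstdev computes; a short sketch reproducing them (note that the sample standard deviation, statistics.stdev, divides by N - 1 and would give slightly larger values):

```python
# Population standard deviation of the two example score sets
from statistics import pstdev

spread = [70, 80, 90, 100, 110]  # mean 90
tight = [85, 87, 90, 92, 93]     # mean 89.4

print(round(pstdev(spread), 2))  # 14.14 -> widely spread scores
print(round(pstdev(tight), 2))   # 3.01  -> much more consistent scores
```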
Skewness and Kurtosis
Skewness and kurtosis are both important statistical concepts that help describe the shape of a distribution. They go beyond basics like the mean and standard deviation.
An asymmetrical distribution is one in which the values of variables occur at irregular frequencies and the mean, median, and mode occur at different points.
Types of Skewness
Zero skewness: Data is perfectly symmetrical (like a normal distribution).
Positive skew (right-skewed): Tail is longer on the right; most values are on the left; Mean > Median.
Negative skew (left-skewed): Tail is longer on the left; most values are on the right; Mean < Median.
Normal distribution
How to Interpret Skewness Numerically
Skewness > 0: Positive skew (right tail longer)
Skewness < 0: Negative skew (left tail longer)
Skewness ≈ 0: Symmetrical distribution
Acceptable values: skewness between -2 and +2 is generally considered acceptable for assuming a normal distribution.
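To show where the skewness number comes from, here is a minimal sketch of the moment-based (Fisher-Pearson, population) skewness coefficient using only Python's standard library; the skewness function and the sample data are illustrative assumptions, not from the slides:

```python
# Skewness: g1 = m3 / m2**1.5, where m2 and m3 are the second and
# third central moments (population form)
from statistics import fmean

def skewness(data):
    mean = fmean(data)
    m2 = fmean([(x - mean) ** 2 for x in data])  # population variance
    m3 = fmean([(x - mean) ** 3 for x in data])  # third central moment
    return m3 / m2 ** 1.5

print(skewness([1, 2, 3, 4, 5]))      # symmetric data -> 0.0
print(skewness([1, 1, 2, 2, 3, 10]))  # long right tail -> positive value
```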
Kurtosis
Kurtosis measures the "tailedness" or peakedness of a data distribution. It tells us how heavy or light the tails of a distribution are compared to a normal distribution. Essentially, kurtosis shows the likelihood of extreme values (outliers).
Types of Kurtosis
Mesokurtic (normal kurtosis): Distribution has a kurtosis similar to the normal distribution; moderate tails and peak. Example: heights of people, IQ scores.
Leptokurtic (high kurtosis): Distribution has heavy tails and a sharp peak; more prone to producing outliers. Example: financial returns during a market crash (extreme highs and lows).
Platykurtic (low kurtosis): Distribution has light tails and a flatter peak; fewer outliers than the normal distribution. Example: a uniform distribution, or test scores where most students score similarly.
Type          Kurtosis value   Peak         Tails        Outliers
Mesokurtic    ≈ 3              Moderate     Moderate     Normal
Leptokurtic   > 3              Sharp/High   Heavy/Fat    Many
Platykurtic   < 3              Flat/Low     Light/Thin   Few
Why Kurtosis Matters: It helps identify whether your data has more or fewer extreme values than a normal distribution, and it influences which statistical methods to use.
Acceptable values: kurtosis between -7 and +7 is generally considered acceptable (this range refers to excess kurtosis, i.e., kurtosis minus 3, on which the normal distribution scores 0).
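A matching sketch for kurtosis, again standard library only; the table above uses the non-excess convention (normal ≈ 3), and subtracting 3 gives excess kurtosis (normal ≈ 0). The function and data below are illustrative assumptions:

```python
# Kurtosis (non-excess): m4 / m2**2; ~3 for a normal distribution
from statistics import fmean

def kurtosis(data):
    mean = fmean(data)
    m2 = fmean([(x - mean) ** 2 for x in data])  # population variance
    m4 = fmean([(x - mean) ** 4 for x in data])  # fourth central moment
    return m4 / m2 ** 2

# Flat, uniform-like data -> 1.7, i.e. platykurtic (< 3)
print(kurtosis([1, 2, 3, 4, 5]))
```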
Outlier
An outlier is a data point that is significantly different from the rest of the data: it is much higher or lower than most of the values in the dataset.
Common Causes of Outliers
Data entry errors (e.g., typing 1000 instead of 100)
Measurement errors
Genuine rare events (e.g., a millionaire in income data)
Natural variation in data
Test Scores
Test scores of 10 students: 55, 58, 60, 62, 63, 65, 67, 68, 69, 100
Most scores are around 55-69. The score 100 is much higher than the rest: it's an outlier.
Daily temperatures (°C): 22, 23, 22, 21, 22, 40, 23. Which is the outlier?
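One common way to answer this numerically is the 1.5 x IQR rule (an assumed method, not something the slides define): values more than 1.5 interquartile ranges beyond the quartiles are flagged as outliers.

```python
# Flag outliers with the 1.5 * IQR rule
from statistics import quantiles

temps = [22, 23, 22, 21, 22, 40, 23]
q1, _, q3 = quantiles(temps, n=4)  # quartiles (default "exclusive" method)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [t for t in temps if t < low or t > high]
print(outliers)  # [40]
```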
Handling Outliers
Remove outliers: if they are errors or clearly irrelevant. Use with caution: don't remove valid data points just to "clean" the data.
Transform the data: apply transformations (e.g., a log transform) to reduce the influence of outliers; especially useful for right-skewed data.
Cap/floor outliers: replace extreme values with a threshold value. Example: values above the 95th percentile are set equal to the 95th percentile.
Normality or Normal Distribution
Normality refers to whether or not a dataset follows a normal distribution, often called a bell curve.
In a normal distribution:
The mean = median = mode
The data is symmetrical around the mean
Most values are clustered around the center
The shape looks like a bell curve
Why Normality Matters Statistical Tests Many statistical tests (like t-tests, ANOVA, regression) assume that the data is normally distributed. If it isn’t, the results may not be valid.
How to check normality: visually, with histograms and Q-Q plots; numerically, with skewness and kurtosis values; and statistically, with tests such as Shapiro-Wilk (all available in jamovi's Exploration → Descriptives analysis).
Correlation
Correlation analysis measures the strength and direction of the linear relationship between two variables. It tells us if, and how strongly, two variables move together. The result is called the correlation coefficient, usually denoted as r.
The Correlation Coefficient (r)
Values range from -1 to +1.
r = +1 means perfect positive correlation (both variables increase together).
r = -1 means perfect negative correlation (one variable increases while the other decreases).
r = 0 means no linear correlation.
Positive Correlation
Variables: Hours studied vs. exam scores. Usually, as hours studied increase, exam scores also increase. Suppose r = +0.85 → strong positive correlation.
Negative Correlation
Variables: Number of hours using Instagram vs. exam scores. More Instagram time might relate to lower exam scores. Suppose r = -0.60 → moderate negative correlation.
No Correlation
Variables: Shoe size vs. exam scores. No logical connection, so r ≈ 0.