A showcase of what I have learn in Datacamp to become a Data Analyst !
Mostly SQL and PBI (with a lil help from Chat GPT)
Size: 1.66 MB
Language: en
Added: Oct 17, 2024
Slides: 16 pages
Slide Content
The game reviews db – from the dbpublic in Projects Data
PBI Summary Overall game analysis : It’s a booming industry ! AGR: nb_games = 185% total_games_sold = 115% The leader seems to be nintendo but competition is growing fast
PBI Summary Quantity vs Quality : - critics have been more generous since the 2010’s -Is there a corelation between best selling games and best critics scores? I made up a comparing method using ranks and the answer is : no! I also asked chat GPT to evaluate : -Pearson correlation = 0.072//0.062 -Spearman cor = 0.17 // 0.114
PBI Summary And the winner is Nintendo! The company started in 1985 with Super Mario Bros 2nd best revenue game of all time! The GOAT selling game is the Wii sports in 2006 and the best seller console is the Wii! But they made a mistake when they betrayed Sony in the early 90’s because…
PBI Summary …Trends are looking good for PS More devs, more (exclusive) games , more revenues! And who knows maybe they will have exclusive games that sale better than Nintendo games ;)
The Data Data available in datacamp for students to practice their skills
How many games in game reviews: SELECT COUNT(DISTINCT name) AS nb_games FROM public.game_sales Null values case + left join : SELECT SUM(CASE WHEN gs.name IS NULL THEN 1 ELSE 0 END) AS gs_name_null_value , SUM(CASE WHEN r.critic_score IS NULL THEN 1 ELSE 0 END) AS critic_score_null_values , SUM(CASE WHEN r.user_score IS NULL THEN 1 ELSE 0 END) AS user_score_null_values FROM public.game_sales AS gs LEFT JOIN reviews AS r ON gs.name = r.name; Join to export raw data for an Excel analysis SELECT gs.name, gs.platform , gs.publisher , gs.year , gs.games_sold , r.critic_score FROM public.game_sales AS gs LEFT JOIN public.reviews AS r ON gs.name = r.name WHERE r.critic_score IS NOT NULL - nb_games_gs : 400 games - nb_games_r : 400 games - Null values in gs.name : 0 - Null values in cs : 1 / LJ : 31 - Null values in cs : 212 / LF : 222 - When Joined +30 Null values that we can’t fill => only use cs ! - Many redundant values in name ( same company nut different names ) - Join to export csv or export both tables to work on PBI ( join col = name ) We can work with 369 rows ! Exploring game_sales & reviews
ANALYZING game_sales & reviews game_sales basis stats: SELECT COUNT(DISTINCT name) AS nb_games , SUM( games_sold ) AS tot_sold , MAX ( games_sold ) AS max_sold , MIN( games_sold ) AS min_sold , AVG( games_sold ) AS avg_sold , STDDEV( games_sold ) AS std_sold FROM public.game_sales ; game_sales period : SELECT MIN(year) AS first_year , MAX(year) AS last_year FROM public.game_sales ; Reviews basic stats: SELECT AVG( critic_score ) AS avg_cs , MAX( critic_score ) AS max_cs , MIN( critic_score ) AS min_cs , STDDEV( public.reviews.critic_score ) FROM public.reviews WHERE critic_score IS NOT NULL - gs overview => Between 1981 and 2020 400 games generated 3 478.55M $, range 3,98-82,90$, avg 8,70 (big gaps) -r overview => AVG cs is 8,57! From 2 to 10 are critics too generous ? AVG us is 7,72, from1,10 to 10, still pretty high (but 212 null values!)
SALES Analysis in game_sales -In 2010, $219,30 total sales -In 1981, $4,31 total sales - Year with most games made ≠ Year most selling - Year most games 2011 with 26 - Year less games 1981 with 1 -AVG sales by year => 94,01$ (code in notes) -8/10 best selling games were made by Nintendo ! Best and Worst selling games years : WITH yearly_sales AS ( SELECT year, SUM( games_sold ) AS total_sales FROM public.game_sales GROUP BY year), max_min_sales AS ( SELECT year, total_sales , 'max' AS sales_type FROM yearly_sales WHERE total_sales = (SELECT MAX( total_sales ) FROM yearly_sales ) UNION ALL SELECT year, total_sales , 'min' AS sales_type FROM yearly_sales WHERE total_sales = (SELECT MIN( total_sales ) FROM yearly_sales ) ) SELECT * FROM max_min_sales ; How many games per year? LIMIT 5: SELECT year, COUNT(name) AS nb_games , SUM( games_sold ) AS tot_sold FROM public.game_sales GROUP BY year ORDER BY nb_games DESC LIMIT 5; Years Game industry sold x3 the avg_games_sold SELECT year, name, publisher, SUM( games_sold ) AS tot_sold FROM public.game_sales GROUP BY year,name , games_sold HAVING games_sold > (SELECT AVG( games_sold )*3 FROM public.game_sales ) ORDER BY tot_sold DESC;
SALES Analysis in game_sales by year -In 2011 made 26 games , 16 more than the avg / year (10,81) -Wii sports ( from 2006) is the best selling game ever , with 82.90M$ which is 74M$ more than avg - Worst selling game : Namco Museum with 3.98M$ in 2005 Nb of published game by year vs the avg_year -- Step 1: Count the number of games per year WITH games_per_year AS ( SELECT year, COUNT(*) AS nb_games FROM game_sales -- Using the correct table name GROUP BY year ) --Final query: Calculate the difference SELECT gpy.year , gpy.nb_games , gpy.nb_games - (SELECT COUNT(*) / COUNT(DISTINCT year) FROM game_sales ) AS diff_nb_games_avg FROM games_per_year gpy ORDER BY nb_games DESC; Game sales compared to avg SELECT year, name, SUM( games_sold ) AS tot_sold , SUM( games_sold ) - AVG( games_sold ) OVER () AS diff_tot_avg_sales FROM public.game_sales GROUP BY year, name, games_sold ORDER BY diff_tot_avg_sales DESC; Best and Worst selling games SELECT name, games_sold , publisher, year FROM public.game_sales WHERE games_sold = (SELECT MAX( games_sold ) FROM public.game_sales ) OR games_sold = (SELECT MIN( games_sold ) FROM public.game_sales )
SALES Analysis in game_sales by platform & console -4/5 best selling platforms (consoles) is Wii and the other one is NES (Nintendo) -In 2006 with 6 games , NINTENDO sold 151,78M$ -In 2014 PlayStation (PS) managed to reach top 5 selling 84,51M$, with 12 games -PS tends to make more games 7/10 in the top 10, and 151 games made tot since 1996 => avg of 6,3 per year (vs nintendo (4,02) Nb of games , and sales by platform over the years SELECT year, platform, COUNT(name) AS nb_games , SUM( games_sold ) AS tot_sold FROM public.game_sales GROUP by public.game_sales.platform , year ORDER BY tot_sold DESC, year DESC LIMIT 5; Console sales and nb of published games SELECT CASE WHEN platform LIKE 'PS%' THEN 'Playstation’ WHEN platform LIKE 'X%' THEN 'Xbox’ WHEN platform = 'PC' THEN 'PC’ WHEN platform IN ('GEN','2600') THEN ' Other consoles’ ELSE 'NINTENDO’ END AS p_category , COUNT( name ) AS nb_games , SUM( games_sold ) AS tot_sold FROM public.game_sales GROUP BY p_category ORDER BY tot_sold DESC, nb_games DESC;
SALES Analysis in game_sales by decade -AVG growth rate : nb_games = 185% total_games_sold = 115% Nb of games , sales and critic score by decade SELECT CASE WHEN gs.year BETWEEN 1980 AND 1989 THEN '1980s’ WHEN gs.year BETWEEN 1990 AND 1999 THEN '1990s’ WHEN gs.year BETWEEN 2000 AND 2009 THEN '2000s’ WHEN gs.year BETWEEN 2010 AND 2019 THEN '2010s’ ELSE NULL END AS decade, COUNT(gs.name) AS nb_games , SUM( gs.games_sold ) AS total_games_sold , AVG( r.critic_score ) AS avg_critic_score FROM public.game_sales AS gs LEFT JOIN public.reviews AS r ON gs.name = r.name WHERE r.critic_score IS NOT NULL GROUP BY CASE WHEN gs.year BETWEEN 1980 AND 1989 THEN '1980s’ WHEN gs.year BETWEEN 1990 AND 1999 THEN '1990s’ WHEN gs.year BETWEEN 2000 AND 2009 THEN '2000s’ WHEN gs.year BETWEEN 2010 AND 2019 THEN '2010s’ ELSE NULL END HAVING CASE WHEN gs.year BETWEEN 1980 AND 1989 THEN '1980s’ WHEN gs.year BETWEEN 1990 AND 1999 THEN '1990s’ WHEN gs.year BETWEEN 2000 AND 2009 THEN '2000s’ WHEN gs.year BETWEEN 2010 AND 2019 THEN '2010s’ ELSE NULL END IS NOT NULL ORDER BY decade;
Quality Analysis in reviews -10/10 games for the critics : Super Mario, Minecraft, GTA V , IV, Mario Kart SNES, ZELDA-Link to the Past (3 NINTENDO!) -10/10 games for users : Zelda - BOTW, God of War , FF X, Zelda – Ocarina, RDR (2 NIN & 2 PS) Comparing critics vs users max and mins WITH max_cte AS ( SELECT name,critic_score , user_score FROM public.reviews WHERE critic_score = (SELECT MAX( critic_score ) FROM reviews ) OR user_score = (SELECT MAX( user_score ) FROM reviews ) ), min_cte AS ( SELECT name,critic_score , user_score FROM public.reviews WHERE critic_score = (SELECT MIN( critic_score ) FROM reviews ) OR user_score = (SELECT MIN( user_score ) FROM reviews ) ) SELECT name , critic_score , user_score FROM max_cte UNION ALL SELECT name , critic_score , user_score FROM min_cte
Quality Analysis in reviews -XBOX has the best critic score avg ! But only 48 games - Mojang is the best developer according to cs 1 game a10/10. -Rockstar Games is the best publisher with 16 games and 9.5 critic score Critic score by console SELECT CASE WHEN gs.platform LIKE 'PS%' THEN 'Playstation’ WHEN gs.platform LIKE 'X%' THEN 'Xbox’ WHEN gs.platform = 'PC' THEN 'PC’ WHEN gs.platform IN ('GEN','2600') THEN ' Other consoles’ ELSE 'NINTENDO’ END AS p_category , COUNT( r.critic_score ), AVG( r.critic_score ) AS avg_cs FROM public.game_sales AS gs LEFT JOIN public.reviews AS r ON gs.name = r.name GROUP BY p_category ORDER BY avg_cs DESC Critic score avg and nb_games by publisher SELECT gs.publisher , AVG( r.critic_score ) AS avg_c_s , COUNT( gs.publisher ) AS nb_games , SUM ( gs.games_sold ) AS tot_sold FROM public.game_sales AS gs LEFT JOIN public.reviews AS r ON gs.name=r.name GROUP BY gs.publisher HAVING AVG( r.critic_score ) IS NOT NULL ORDER BY avg_c_s DESC, nb_games DESC;
Quality Analysis in reviews – overall_score Overall score = cs + us -> remember many null values in us -Ocarina/BOTW/ GoW /RDR/ Mario Galawy -Nintendo, Sony Rockstar and Activision have the best os Best games overall_score SELECT gs.name, ( r.critic_score + r.user_score ) AS overall_score , gs.publisher FROM public.reviews AS r LEFT JOIN public.game_sales AS gs ON r.name = gs.name WHERE ( r.critic_score + r.user_score ) IS NOT NULL ORDER BY overall_score DESC LIMIT 5; Publishers by overall_score SELECT gs.publisher , ( r.critic_score + r.user_score ) AS overall_score , RANK() OVER(ORDER BY ( r.critic_score + r.user_score ) DESC) FROM public.game_sales AS gs LEFT JOIN public.reviews As r ON gs.name = r.name GROUP BY gs.publisher , ( r.critic_score + r.user_score ) HAVING ( r.critic_score + r.user_score ) IS NOT NULL;
Is there a correlation between sales and quality ? I decided to rank the critic score and sales. Then I compared the diff between both rankings and if => -diff less than 10 ‘close’ -diff between 10-30 ‘medium’ -diff more than 30 ‘ huge ’ Huge = 81% Medium =13% Close = 6% Nothing suggest that biggest sales make for better cs for this dataset Best games overall_score WITH gap_cte AS ( SELECT subquery.year , subquery.rank_sold , subquery.rank_cs , CASE WHEN ABS( subquery.rank_sold - subquery.rank_cs ) <= 10 THEN 'close’ WHEN ABS( subquery.rank_sold - subquery.rank_cs ) <= 30 THEN 'medium’ ELSE ' huge ’ END AS gap_sold_cs FROM ( SELECT gs.year , RANK() OVER(ORDER BY gs.games_sold DESC) AS rank_sold , RANK() OVER(ORDER BY r.critic_score DESC) AS rank_cs FROM public.game_sales AS gs LEFT JOIN public.reviews AS r ON gs.name = r.name WHERE r.critic_score IS NOT NULL ) AS subquery ) SELECT gap_sold_cs , COUNT( gap_sold_cs ) FROM gap_cte GROUP BY gap_sold_cs