Beginner Data_Science_Interview_Questions.pdf

AvinashYadav112625 4 views 48 slides Feb 27, 2025

About This Presentation

A collection of questions for data science interviews.


Slide Content

DATA SCIENCE INTERVIEW QUESTIONS

FAQ at interviews
It might calm your nerves to know that almost every job seeker struggles. That’s because data science interview questions cover a bunch of different topics (data science is an interdisciplinary field, after all) and those cheeky interviewers love to throw you the odd curveball. So here are some of the FAQ at interviews…
The problem:

1. What does data science mean?
2. What are the assumptions of a linear regression?
3. What is the difference between factor analysis and cluster analysis?
4. What is an iterator generator?
5. Write down an SQL script to return data from two tables.
6. Draw graphs relevant to pay-per-click adverts and ticket purchases.
7. How would you explain Random Forest to a non-technical person?
8. How can you prove an improvement you introduced to a model is actually working?
9. What is root cause analysis?

10. Explain K-means.
11. What kind of RDBMS software do you have experience with? What about non-relational databases?
12. Supervised learning vs unsupervised learning.
13. What is overfitting and how to fix it?
14. What is the difference between SQL, MySQL and SQL Server?
15. How would you start cleaning a big dataset?
16. Give examples where a false negative is more important than a false positive, and vice versa.
17. State some biases that you are likely to encounter when cleaning a database.
18. What is a logistic regression?

It might calm your nerves to know that almost every job seeker struggles. That’s because data science interview questions cover a bunch of different topics (data science is an interdisciplinary field, after all) and those cheeky interviewers love to throw you the odd curveball.
The first step to hitting those curveballs out of the park is to see them coming, and to see them coming you’ve got to be confident about the rest of your game.
So, you must do your homework! An interviewer can spot someone who hasn’t from a mile away, but you wouldn’t be here if you didn’t know that already, would you?

There are plenty of articles out there that will give you all the example answers you could hope for and yes, technical questions will come up. But trying to remember one hundred-odd different examples would only serve to confuse you more. Plus, what if a question comes up that you didn’t study for?
We want to take you through the interview typology and show you what data science interview questions are made of and what the interviewers are looking for.
ANSWERING
DATA SCIENCE
QUESTIONS

1. Technical questions
1.1 Mathematics
1.2 Statistics
1.3 Coding
1.4 Machine Learning
2. Practical experience questions
3. Behavioral questions
4. Scenarios (case study questions)
CONTENTS
LET’S BREAK THINGS DOWN

TECHNICAL
QUESTIONS
A strong grasp of mathematics, statistics, coding, and machine learning is a must for a data scientist.
You are likely to be asked to demonstrate your hands-on technical skills, but prepare to show off your theoretical techniques, too!

MATHEMATICS

Mathematics underpins the study of machine learning, statistics, algorithms, and computer architecture, among others.
So, applied maths is at the heart of the matter. Showing a good grasp of mathematics signals to the interviewer that you could quickly adapt to those other fields.
Questions like these are to check you have basic maths skills and shouldn’t be too tricky for you.

Be prepared to answer some quick (mental) maths questions, such as:
• What is the sum of numbers from 1 to 100?
• A snail falls down a well 50 ft deep. Each day it climbs up 3 ft, and each night it slides down 1 ft. How many days does it take the snail to get out?
• You have a 10x10x10 cube, made of one thousand 1x1x1 cubes. If you remove the outer layer of this structure, how many cubes will you have left?
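As a way to check your mental arithmetic on the three questions above, here is a minimal Python sketch. The function names are made up for illustration, and the snail version assumes the snail starts at the bottom and is out the moment a daytime climb reaches the rim.

```python
def sum_to(n):
    """Sum of the integers 1..n using the closed form n(n+1)/2."""
    return n * (n + 1) // 2


def snail_days(depth, up, down):
    """Days needed to leave the well.

    Assumption: the snail starts at height 0 and escapes as soon as a
    daytime climb reaches the top, so it does not slide back that night.
    """
    height = 0
    days = 0
    while True:
        days += 1
        height += up          # daytime climb
        if height >= depth:
            return days
        height -= down        # night-time slide


def inner_cubes(side):
    """Unit cubes left after stripping one outer layer from a side^3 cube."""
    return max(side - 2, 0) ** 3


print(sum_to(100))            # 5050
print(snail_days(50, 3, 1))   # 25
print(inner_cubes(10))        # 512
```

The snail is the one that trips people up: the net gain is 2 ft per full day, but on the last day the climb alone gets it out, which is why the answer is 25 rather than 50.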

Here are some real-life data science interview questions:
• A racetrack has 5 lanes. There are 25 horses and one would like to find out the 3 fastest horses of those 25. What is the minimum number of races one would need to conduct to determine the 3 fastest horses?
Things become a little more interesting when encountering puzzle questions that test your lateral thinking.

• Four people need to cross a rickety bridge at night. Unfortunately, they have a single torch and the bridge is too dangerous to cross without one. The bridge is only strong enough to support two people at a time. Not all people take the same time to cross the bridge. Times for each person: 1 min, 2 mins, 7 mins and 10 mins. What is the shortest time needed for all four of them to cross the bridge?
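If you want to verify your answer to the bridge puzzle, one option is an exhaustive shortest-path search over who is on which side. The sketch below is only an illustration: the function name and the state encoding are assumptions, and it takes the usual reading that a pair crosses at the slower person's pace and that someone must carry the torch back.

```python
import heapq
from itertools import combinations


def min_crossing_time(times, capacity=2):
    """Shortest total time for everyone to cross, via Dijkstra's algorithm.

    A state is (set of people still on the near side, torch on near side?).
    Assumption: a group walks at the pace of its slowest member.
    """
    everyone = frozenset(range(len(times)))
    start = (everyone, True)
    best = {start: 0}
    heap = [(0, 0, start)]    # (cost, tie-breaker, state)
    tie = 1
    while heap:
        cost, _, state = heapq.heappop(heap)
        near, torch_near = state
        if not near:
            return cost       # everybody is on the far side
        if cost > best[state]:
            continue          # stale heap entry
        # Only people standing next to the torch can move.
        side = near if torch_near else everyone - near
        for size in range(1, capacity + 1):
            for group in combinations(side, size):
                moved = frozenset(group)
                new_near = near - moved if torch_near else near | moved
                new_state = (new_near, not torch_near)
                new_cost = cost + max(times[i] for i in group)
                if new_cost < best.get(new_state, float("inf")):
                    best[new_state] = new_cost
                    heapq.heappush(heap, (new_cost, tie, new_state))
                    tie += 1
    return None


print(min_crossing_time([1, 2, 7, 10]))  # 17
```

The search confirms the lateral-thinking trick: sending the two slowest people across together (1 and 2 cross, 1 returns, 7 and 10 cross, 2 returns, 1 and 2 cross) beats having the fastest person escort everyone, which would take 21 minutes.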

Finally, there are those hard maths problems.
It is unlikely that you’ll be given an equation to solve; rather, you’ll be asked a simply worded question which requires conceptual preparation to answer.
Furthermore, it may intertwine with probability theory, even if it seems it doesn’t.

Some examples are:
• Consider an extension of rock, paper, scissors where there are N options instead of 3 options. For what values of N is it possible to construct a fair game, where by ‘fair’ we mean that for any move that a player plays there are an equal number of moves that beat it or lose to it?
• In a country in which people only want boys, every family continues to have children until they have a boy. If they have a girl, they have another child. If they have a boy, they stop. What is the proportion of boys to girls in the country?
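The boys-and-girls question is a good example of a problem that quietly turns into probability theory. A small simulation can build intuition before you attempt the argument on paper. This is a sketch only: it assumes every birth is independent with a 50% chance of a boy, and the function name and seed are arbitrary choices.

```python
import random


def simulate_families(n_families, p_boy=0.5, seed=0):
    """Simulate families that keep having children until the first boy.

    Assumptions: births are independent and each is a boy with
    probability p_boy. Returns the total (boys, girls) across families.
    """
    rng = random.Random(seed)
    boys = girls = 0
    for _ in range(n_families):
        while rng.random() >= p_boy:   # a girl is born, so the family tries again
            girls += 1
        boys += 1                      # the first boy ends the sequence
    return boys, girls


boys, girls = simulate_families(100_000)
print(boys, girls, girls / boys)
```

Under these assumptions the ratio stays close to 1:1. The stopping rule changes family sizes, but it cannot change the odds of any individual birth.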

STATISTICS

Did you know that data scientists were once called statisticians? The two professions aren’t one and the same, but many data scientists have finished a statistics degree.
And that’s no wonder! Statistics is one of the ‘founding fathers’ of data science.
Logically, you will be tested on your ability to reason statistically. Even if theoretical knowledge isn’t your strongest suit, you need to use precise technical language.

Consider the following question:
What is the difference between false positive and false negative?
It seems that you need to provide some textbook definitions… Got you! Nobody wants to hear generic theory; it’s boring and you will blend in with the crowd.
Employers will want you to identify situations where you can implement the theory.
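One way to ground the answer in a situation is to count both kinds of error on a small example. The sketch below uses made-up labels for a hypothetical screening test, where True means "has the condition". In that setting a false negative (a missed case) is usually the costlier mistake, whereas in something like a spam filter a false positive (a real email thrown away) tends to hurt more.

```python
def count_errors(actual, predicted):
    """Return (false_positives, false_negatives) for boolean labels."""
    pairs = list(zip(actual, predicted))
    false_positives = sum(1 for a, p in pairs if not a and p)
    false_negatives = sum(1 for a, p in pairs if a and not p)
    return false_positives, false_negatives


# Hypothetical screening results, invented purely for illustration.
actual = [True, True, False, False, False, True]
predicted = [True, False, True, False, False, False]

fp, fn = count_errors(actual, predicted)
print(f"false positives: {fp}, false negatives: {fn}")
# false positives: 1, false negatives: 2
```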

While still talking statistics, what are some other questions that may pop up?
• What is the null hypothesis and how do we state it?
• How would you explain a linear regression to a business executive?
• Tell me what heteroskedasticity is and how to solve it.

• What’s the Central Limit Theorem and what are its practical implications?
• How do you find the correlation between a categorical variable and a continuous variable?
• Explain p-value. Present it as if talking to a client.
• What do you understand by statistical power and how do you calculate it?
• Please explain the differences between overfitting and underfitting.
• Explain what cross-validation is. How and why is it used?

Did you think those last two are machine learning
questions? Well spotted, now we see that ML
overlaps with statistical concepts!
•Could you give examples of data that does not
have a Gaussian distribution, nor log-normal?
•Explain bootstrapping as if you’re talking to a
non-technical person.
•State some biases that you are likely to
encounter when cleaning a database.

CODING

Every data scientist needs a certain amount of programming knowledge. You don’t have to be a pro, but employers will want to see that you have a decent grip on it and have the potential for rapid improvement.
Python, R, and SQL are the bread-and-butter programming languages in data science. Questions about these three staples should not come as a surprise.

R
• How are missing values and impossible values represented in R?
• What is the difference between lapply and sapply?
• How do you merge two data frames in R?
• What is the command used to store R objects in a file?
• How can you split a continuous variable into different groups/ranks in R?
• Please explain three key differences between Python and R.

Python
• Which Python library would you prefer to use for data wrangling?
• How can you build a simple logistic regression in Python?
• What’s the shortest way to open a text file in Python?
• Have you done web scraping in Python? How can you do that?
• Please explain what ‘pass’ is in Python.
• Please explain how one can perform pattern matching in Python.
• What tool would you use to find bugs?
• What’s your preferred library for plotting in Python: Seaborn or Matplotlib?
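Two of the Python questions above lend themselves to a very short demonstration. The sketch below is one possible illustration, not a model answer: the class name, function name and date format are invented for the example. It shows `pass` as a do-nothing placeholder and pattern matching with regular expressions from the standard `re` module.

```python
import re


class ReportBuilder:
    """Hypothetical class whose body has not been written yet."""
    pass  # 'pass' is a no-op that keeps the empty block syntactically valid


def find_iso_dates(text):
    """Return every YYYY-MM-DD style date found in the text."""
    return re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)


print(find_iso_dates("Orders arrived on 2024-01-15 and 2024-06-01."))
# ['2024-01-15', '2024-06-01']
```

In an interview it is worth adding that the pattern only checks the shape of the string, so something like 2024-99-99 would also match.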

SQL
• You have a table called with Cust_ID, Order_Date, Order_ID, Tran_Amt. How would you select the top 100 customers with the highest spend over a year-long period?
• Describe the different parts of an SQL query.
• What is the difference between UNION and UNION ALL?
• Write down a SQL script to return data from two tables.
• Tell me the difference between a primary key and a unique key.
• What is the difference between SQL, MySQL and SQL Server?
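For the top-customers question, it can help to rehearse against a throwaway database. The sketch below runs a query through Python's built-in `sqlite3` module. The table name `orders` is an assumption (the question does not give one), the sample rows are invented, and dates are assumed to be stored as ISO-formatted text so that string comparison matches date order.

```python
import sqlite3


def top_customers(conn, start, end, limit=100):
    """Customers ranked by total spend for start <= Order_Date < end."""
    query = """
        SELECT Cust_ID, SUM(Tran_Amt) AS total_spend
        FROM orders                      -- hypothetical table name
        WHERE Order_Date >= ? AND Order_Date < ?
        GROUP BY Cust_ID
        ORDER BY total_spend DESC, Cust_ID
        LIMIT ?
    """
    return conn.execute(query, (start, end, limit)).fetchall()


# Tiny in-memory data set, made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (Cust_ID INTEGER, Order_Date TEXT, "
    "Order_ID INTEGER, Tran_Amt REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "2024-01-15", 101, 50.0),
        (1, "2024-06-01", 102, 25.0),
        (2, "2024-03-10", 103, 100.0),
        (3, "2023-12-31", 104, 500.0),   # falls outside the 2024 window
    ],
)

print(top_customers(conn, "2024-01-01", "2025-01-01", limit=2))
# [(2, 100.0), (1, 75.0)]
```

`LIMIT` is SQLite and MySQL syntax; SQL Server uses `TOP` instead, which is a handy detail to mention when asked about the differences between dialects.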

MACHINE
LEARNING

A familiarity with machine learning methodologies is essential for every aspiring data scientist.
You should be prepared to explain key concepts in a nutshell.
It’s quite possible that the interviewer will outline a prediction problem and ask you to come up with algorithms.
With the algorithms, expect to touch upon commonly observed problems and their fixes.

Check out the following machine learning questions we’ve picked for you:
• What is the difference between supervised and unsupervised machine learning?
• How would you deal with an imbalanced dataset?
• How do you ensure you are not overfitting with a model?
• What approaches would you use to evaluate the prediction accuracy of a logistic regression model?
• How do you deal with sparse data?
• Could you explain the Bias-Variance trade-off?

Additionally, you may stumble upon way too specific or way too vague questions such as:
• Explain the difference between Gaussian Mixture Model and K-Means.
• Tell me about a machine learning project you admire.

PRACTICAL EXPERIENCE
QUESTIONS

Technical questions are important, and a data scientist needs to know the answers and how to put them into practice.
There are countless data science questions and an interviewer is not going to waste time asking dozens of questions to gauge whether you are the candidate for them. Instead, why not ask you about your experience?
These are practical experience questions, designed to shed light on your pace of work, experiences, and habits.
To avoid having to sift through your back catalogue of experiences on the spot, have in mind a few experiences that are versatile: ones that exemplify different skills based on the question.
Let’s give you a taste of those:
• Summarize your experience.
• Tell me about your first data science pet project.
• How do you keep up with the news about politics, economics, and business? What about data science?
• So, Python is your preferred programming language. What experience do you have with R? Tell me what you have done with that.
Of course, you can get it vice-versa:
• So, R is your preferred programming language. What experience do you have with Python? Tell me what you have done with that.
• Do you have experience in Tableau?
• What kind of RDBMS software do you have experience with?

BEHAVIORAL
QUESTIONS

As in any other job interview, employers are interested in how you handle workplace situations, how you work in a team and whether you are a good fit for the company.
Behavioral questions can be asked indirectly; for example, the interviewer may pose broad questions about your motivation or the tasks you enjoy.
Certainly, there is not a right answer here. The intent is to judge your past responses as they can accurately predict future behavior.
Let’s see an example: Describe a situation when you faced a conflict while working on a team project.

Instead of asking hypothetical questions (“How will you deal with…”), the interviewer is hoping to elicit a more meaningful response by pushing you to chat about a real-life past event. The interviewer will be looking for four things in your story:
Situation: What was the context? (devote around 10% of the answer time)
Task: What needed to be done? (devote around 10% of the answer time)
Action: What did you do? (devote around 70% of the answer time)
Results: What were the accomplishments? (devote around 10% of the answer time)
Also known as the STAR technique, these steps will help you present your answers in a clear and succinct fashion.

Dying for examples? Here you go:
• Please describe a data science project you worked on. (Yes! It overlaps with the ‘practical experience’ category!)
• Tell me about a situation when you had to balance competing priorities.
• Describe a time when you managed to persuade someone to see things your way.
• Describe a time when you were bored at work. How did you motivate yourself?
• Describe a time when you failed to meet a deadline.
• Our team is brand new and is under-financed. We have no standard procedures or training, and everything is ad-hoc. How would you go about this situation?

CASE STUDY
QUESTIONS

The purpose of scenarios (case study questions) is to test your experience in various data science fields.
Case study questions will likely look for skills outside of the technical toolkit.
For instance, they may be looking for logical reasoning or business understanding. It’s important for you to demonstrate structured thinking, reasoning, and problem-solving skills.
After all, you can’t be a good data scientist if you cannot identify the underlying problems.

Let’s see how this works:
• The sales department has increased the selling price of all items by 5%. There are 10 items, all with different price tags. Before the price increase, gross revenue was $500,000 with an average selling price of $1. After the price increase, gross revenue was $505,000, with an average selling price of $0.95. Why hasn’t the price increase had the desired impact of increasing revenue and average selling price?
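A structured way into this case is to work out what the numbers imply about unit sales, then test a hypothesis such as a shift towards cheaper items. The sketch below does both. The revenue and average-price figures come from the question, but the two-item catalogue and its unit counts are entirely hypothetical (the real case has 10 items) and are there only to show the direction of the effect.

```python
def implied_units(revenue, average_price):
    """Units sold implied by gross revenue and average selling price."""
    return revenue / average_price


def average_selling_price(prices, units):
    """Revenue-weighted average price across a set of items."""
    revenue = sum(p * u for p, u in zip(prices, units))
    return revenue / sum(units)


# Figures taken from the case question.
units_before = implied_units(500_000, 1.00)
units_after = implied_units(505_000, 0.95)
print(round(units_before), round(units_after))  # 500000 531579

# Hypothetical two-item catalogue: every price rises 5%, but buyers
# move towards the cheaper item, so the average price still falls.
old_prices = [0.50, 1.50]
new_prices = [p * 1.05 for p in old_prices]
asp_before = average_selling_price(old_prices, [250_000, 250_000])
asp_after = average_selling_price(new_prices, [400_000, 100_000])
print(asp_before, asp_after)
```

More units were sold after the increase even though the average price dropped, which points to a change in the product mix rather than a failed price rise on each item. That is the kind of underlying problem the interviewer wants you to surface.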

You can also be given market sizing questions, called guesstimates by some, a term that sounds like you just need to take a stab in the dark, which is just not the case.
While reaching a conclusion does require a degree of guesswork and estimation, the process of how you use them is difficult and requires rigid logic.

There is not a single correct answer to questions like these and chances are that the interviewer doesn’t know the exact answer, either. Here are some examples:
• How many SUVs are in the parking lot downstairs?
• How many ping-pong balls can fit into this room?

An interview is a dialogue, not a written test!
Excellent, now consider our typology as the starting point in your interview prep.
However, we have only scratched the surface when it comes to examples of data science interview questions you may encounter. The industry is booming and as such, companies are constantly adapting their interview sessions (what may be a common question today may be one hardly asked in 2 years).
Data science interview questions vary in their peculiarities, but the types of questions remain the same, so having a base knowledge of these types with a good amount of preparation will allow you to logically tackle any question the interviewer has up her sleeve.

ABOUT THE AUTHORS
365 Data Science is an online educational career website that offers the incredible opportunity to find your way into the data science world no matter your previous knowledge and experience. We have comprehensive programs that suit the needs of aspiring BI analysts, Data analysts, and Data scientists.
What we do
We, the authors, are committed educators who believe that curiosity should not be hindered by inability to access good learning resources. This is why we focus all our efforts on creating high-quality educational content which anyone can access online.
Our courses cover all the necessary topics to build up data science skills from the ground up, including Mathematics and Statistics, to Python, R, SQL, data visualization, and Machine and Deep learning.
Who we are

THE COMPREHENSIVE DATA SCIENCE CURRICULUM TO GROW YOUR DATA SCIENCE SKILLSET
ABOUT OUR TRAINING
The 365 Data Science Program is a comprehensive set of courses that work together to help any student learn everything they need to become an expert data scientist in months. The training includes all of the most sought-after skills, including:
• The fundamentals of Mathematics
• Probability
• Intro to Data & Data Science
• Tableau
• SQL
• R
• Python
• Machine Learning
The program consists of 45 hours of on-demand video, split into 12 courses, with real-life business examples, and over 300 exercises.

Good luck!