SREENIVASA INSTITUTE OF TECHNOLOGY AND MANAGEMENT STUDIES (Autonomous)
(Approved by AICTE, New Delhi & Affiliated to JNTUA, Ananthapuramu)
Accredited by NBA
Murukambattu Post, Chittoor-517127.
Model Review
WEB SCRAPING THE INFORMATION OF RESEARCH PAPERS FROM GOOGLE SCHOLAR
MCA Department
Presented by M SAI PRASANTH 21751F0031
Under the Guidance of Mrs. R. PADMAJA, Asst. Professor
CONTENTS
ABSTRACT
INTRODUCTION
EXISTING SYSTEM
DEMERITS FOR EXISTING SYSTEM
PROPOSED SYSTEM
MERITS FOR PROPOSED SYSTEM
ARCHITECTURE DIAGRAM
MODULES OF THE SYSTEM
SOFTWARE REQUIREMENTS
HARDWARE REQUIREMENTS
TEST CASES
SCREENSHOTS
CONCLUSION
REFERENCES
ABSTRACT
Researchers need a reliable platform for freely accessing research articles across many areas. Google Scholar is a widely regarded, freely accessible search engine offering a vast collection of published literature, including articles and research papers. Browsing Google Scholar for literature reviews or research updates can be confusing because of the overwhelming amount of information available. To overcome this limitation and obtain information in a structured manner, the proposed system applies web scraping techniques to Google Scholar. Using tools like Beautiful Soup and Scrapy, the system extracts data from Google Scholar and presents it to researchers in an organized, structured form.
INTRODUCTION
Google Scholar is acknowledged as one of the best search engines for accessing a wide variety of published literature, including articles and research papers, in various research areas. The abundance of information on Google Scholar can confuse researchers who need a more structured way to reach relevant papers. To address this issue, this work proposes using web scraping to retrieve information from Google Scholar's search results based on specific keywords provided by the researcher. Web scraping lets researchers collect data from the website in a structured and meaningful form, tailored to their research interests. The proposed approach uses Python libraries like Requests and Beautiful Soup for web scraping, giving researchers an automated and efficient way to gather relevant research papers and stay updated with the latest publications in their fields.
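The Requests and Beautiful Soup approach described above might look roughly like the following sketch. The function names, the User-Agent header, and the CSS class names (gs_ri, gs_rt, gs_a, gs_rs) are assumptions made for illustration and are not taken from the project; Google Scholar's page markup can change, so the selectors may need adjusting.

```python
import requests
from bs4 import BeautifulSoup

# Assumed search endpoint; "q" is the keyword query, "start" the result offset.
SCHOLAR_URL = "https://scholar.google.com/scholar"


def fetch_results_page(query, start=0, timeout=10):
    """Download one page of search results and return its HTML text."""
    response = requests.get(
        SCHOLAR_URL,
        params={"q": query, "start": start},
        headers={"User-Agent": "Mozilla/5.0"},  # assumed header value
        timeout=timeout,
    )
    response.raise_for_status()
    return response.text


def parse_results(html):
    """Turn a results page into a list of dictionaries, one per paper."""
    soup = BeautifulSoup(html, "html.parser")
    papers = []
    # Each result is assumed to sit in a <div class="gs_ri"> container.
    for block in soup.select("div.gs_ri"):
        title_tag = block.select_one("h3.gs_rt")
        link_tag = title_tag.find("a") if title_tag else None
        meta_tag = block.select_one("div.gs_a")
        snippet_tag = block.select_one("div.gs_rs")
        papers.append({
            "title": title_tag.get_text(" ", strip=True) if title_tag else "",
            "url": link_tag.get("href", "") if link_tag else "",
            "authors_venue_year": meta_tag.get_text(" ", strip=True) if meta_tag else "",
            "snippet": snippet_tag.get_text(" ", strip=True) if snippet_tag else "",
        })
    return papers
```

A call such as parse_results(fetch_results_page("machine learning")) would return the structured records for the first results page. Keeping the download and the parsing in separate functions lets the parser be checked against saved HTML without touching the network.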
EXISTING SYSTEM
Google Scholar is a widely used platform that offers access to a vast collection of academic literature, including scholarly articles, research papers, theses, and other publications from various academic disciplines. It serves as a valuable resource for researchers, students, and academics seeking reliable and relevant information for their research and academic pursuits.
DEMERITS
PROPOSED SYSTEM
MERITS
ARCHITECTURE DIAGRAM
MODULES OF SYSTEM
Searching and Querying
Retrieving Search Results
Parsing HTML Content
Handling Pagination
Storing and Analysing Data
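The querying, pagination, and storage modules listed above could be sketched as follows, using only the Python standard library. The base URL, the page size of 10, the start offset parameter, and all function names are assumptions for illustration; the actual download step (for example with Requests) is left out here.

```python
import csv
import re
from urllib.parse import urlencode

BASE_URL = "https://scholar.google.com/scholar"  # assumed search endpoint
RESULTS_PER_PAGE = 10  # assumed page size


def page_urls(query, pages):
    """Yield one search URL per results page, advancing the start offset."""
    for page in range(pages):
        params = {"q": query, "start": page * RESULTS_PER_PAGE}
        yield f"{BASE_URL}?{urlencode(params)}"


def extract_year(meta_line):
    """Return the first four-digit year (19xx or 20xx) in a metadata line."""
    match = re.search(r"\b(?:19|20)\d{2}\b", meta_line)
    return int(match.group(0)) if match else None


def write_csv(records, handle):
    """Store the parsed records as CSV rows for later analysis."""
    writer = csv.DictWriter(handle, fieldnames=["title", "year", "url"])
    writer.writeheader()
    writer.writerows(records)
```

Each URL produced by page_urls would be fetched and parsed in turn, and the resulting records written out with write_csv, for instance to a file opened with newline="" as the csv module recommends.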
SOFTWARE REQUIREMENTS
Operating System : Windows 10
Server-side script : Python 3.9
IDE : PyCharm
Libraries used : Beautiful Soup, Requests
HARDWARE REQUIREMENTS
Processor : Intel Core i3
RAM : 8 GB
Hard Disk : 500 GB
TEST CASES
Test case 1:
Input: The search query "machine learning".
Process: The web scraper queries Google Scholar for the search query and extracts the first 10 results.
Output: The web scraper should successfully extract the title, author(s), year of publication, journal, abstract, and URL link for each of the extracted research papers.
Test case 2:
Input: A faculty name and year.
Process: The system fetches the publication details for that faculty member.
Output: The faculty member's publication details are displayed year-wise.
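The two test cases above can be expressed as small checks over already-scraped records. This is only a sketch: the record layout (a dictionary with the six fields named in test case 1) and the function names are assumptions, not the project's actual code.

```python
from collections import defaultdict

# Fields that test case 1 expects for every extracted paper.
REQUIRED_FIELDS = ("title", "authors", "year", "journal", "abstract", "url")


def missing_fields(record):
    """List the required fields that are absent or empty in one record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def publications_by_year(records, first_year, last_year):
    """Group paper titles by year within an inclusive year range (test case 2)."""
    grouped = defaultdict(list)
    for record in records:
        year = record.get("year")
        if year is not None and first_year <= year <= last_year:
            grouped[year].append(record["title"])
    # Sort by year so the output reads year-wise.
    return dict(sorted(grouped.items()))
```

Test case 1 passes when missing_fields returns an empty list for every extracted record; test case 2 can compare the output of publications_by_year with the publications known for the chosen faculty member.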
SCREENSHOTS
Fig No. B1 Description: The page for entering the details of faculty members.
Fig No. B2 Description: Faculty details entered to fetch the faculty-wise publication data.
Fig No. B3 Description: Analysing faculty-wise publications in Scopus SCI journals for a specified time period.
Fig No. B4 Description: Analysing faculty-wise publications for a specified time period.
Fig No. B5 Description: Analysing faculty-wise publications across all time.
CONCLUSION
REFERENCES
[1] Cloudflare. (2020). Retrieved from https://www.cloudflare.com/learning/bots/what-is-data-scraping/
[2] Cole, J. R. (1992). Citations, citation indicators, and research quality: An overview of basic concepts and theories. Making science: between nature and society. Harvard University Press.
[3] David, C. P. C., & Geronia, M. C. M. (2019). Insights on the Scientific Publications of the Faculty of the College of Science, UP Diliman: 1998-2017. Science Diliman, 31(2).
[4] Fatima, A. Y. (2020). The Use of Grey Literature and Google Scholar in Software Engineering Systematic Literature Reviews. In 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC), 2020, 1099-1100.
[5] Fernández-Villamor, J. I., Blasco-García, J., Iglesias, C. A., & Garijo, M. (2011, January). A semantic scraping model for web resources-Applying linked data to web page screen scraping. In International Conference on Agents and Artificial Intelligence (Vol. 2, pp. 451-456). SciTePress.