CONTENTS
- INTRODUCTION
- DATA MINING VS. TEXT MINING
- MOTIVATION FOR TEXT MINING
- STEPS FOR TEXT MINING
- KEY TERMS IN TEXT MINING
- MERITS OF TEXT MINING
- APPLICATIONS OF TEXT MINING
- DEMERITS OF TEXT MINING
- REFERENCES
INTRODUCTION
1) Text Mining is a discovery process.
2) Text Mining is also referred to as Text Data Mining (TDM) and Knowledge Discovery in Textual Databases (KDT).
3) Text Mining is used to extract relevant information, knowledge, or patterns from different sources that are in unstructured or semi-structured form.
4) It extracts and discovers knowledge hidden in text automatically.
5) It aids domain experts by automatically:
   - identifying concepts
   - extracting facts/relations
   - discovering implicit links
   - generating hypotheses
DATA MINING VS. TEXT MINING
Data Mining | Text Mining
Processes data directly | Requires linguistic processing or natural language processing (NLP)
Identifies causal relationships | Discovers heretofore unknown information
Structured data | Semi-structured and unstructured data (text)
Structured numeric transaction data residing in a relational data warehouse | Applications deal with much more diverse and eclectic collections of systems and formats
MOTIVATION FOR TEXT MINING
- Approximately 90% of the world’s data is held in unstructured formats (source: Oracle Corporation).
- Information-intensive business processes demand that we move beyond simple document retrieval to “knowledge” discovery.
STEPS FOR TEXT MINING
1) Pre-Processing the Text
2) Applying Text Mining Techniques
   - Summarization
   - Classification
   - Clustering
   - Visualization
   - Information Extraction
3) Analyzing the Text
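The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not a prescribed method: the stop-word list, the function names (`preprocess`, `summarize`, `top_terms`), and the sample text are all assumptions, and frequency-based extractive summarization is used as one simple stand-in for the techniques listed in step 2.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real systems use larger ones.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "from"}

def preprocess(text):
    """Step 1: lowercase, tokenize on letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def summarize(text, n_sentences=1):
    """Step 2: frequency-based extractive summarization (one possible technique)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(preprocess(text))
    # Score each sentence by the total frequency of its content words.
    ranked = sorted(
        sentences,
        key=lambda s: sum(freq[t] for t in preprocess(s)),
        reverse=True,
    )
    top = set(ranked[:n_sentences])
    # Return the selected sentences in their original order.
    return [s for s in sentences if s in top]

def top_terms(text, k=3):
    """Step 3: analyze the text, here by listing its most frequent terms."""
    return Counter(preprocess(text)).most_common(k)

sample = (
    "Text mining extracts knowledge from text. "
    "The weather is nice. "
    "Mining text reveals hidden knowledge in unstructured text documents."
)
summary = summarize(sample)
terms = top_terms(sample)
print(summary)
print(terms)
```

Scoring sentences by word frequency keeps the sketch dependency-free; classification or clustering could be substituted in step 2 without changing steps 1 and 3.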
KEY TERMS IN TEXT MINING
1) Information Retrieval (IR)
   - The science of searching for:
     - information in documents
     - the documents themselves
     - metadata which describes documents
     - text, sound, images, or data within databases, whether relational stand-alone databases or hypertext networked databases such as the Internet or intranets
2) Artificial Intelligence (AI)
   - A branch of computer science and engineering that deals with intelligent behavior, learning, and adaptation in machines
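Searching for information in documents, as described in the first key term, is commonly built on an inverted index that maps each term to the documents containing it. The sketch below is a minimal illustration under assumed names (`build_index`, `search`) and an invented three-document collection; it is not taken from any particular system.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z]+", text.lower()):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = re.findall(r"[a-z]+", query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical document collection used only for illustration.
docs = {
    "d1": "Text mining discovers knowledge in documents",
    "d2": "Data mining works on structured data",
    "d3": "Documents on the intranet hold unstructured text",
}
index = build_index(docs)
hits = search(index, "text documents")
print(sorted(hits))
```

The query is treated as a conjunction of terms; ranking the hits by relevance would be a natural next step but is left out to keep the sketch short.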
MERITS OF TEXT MINING
- A database is limited to storing less information, whereas text mining overcomes this limitation
- Extraction of relevant information and relationships from natural-language documents
- Extraction of information from unstructured or semi-structured documents
APPLICATIONS OF TEXT MINING
1) Analysis of Market Trends
   - Classification technique
   - Information extraction technique
2) Analysis and Screening of Junk Emails
   - Classification on the basis of pre-defined, frequently occurring items
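Screening junk emails on the basis of pre-defined, frequently occurring items can be illustrated with a simple keyword classifier. The term list, the 0.4 threshold, and the function names below are assumptions chosen for the example; practical filters typically learn such terms and weights from labeled mail.

```python
import re

# Hypothetical pre-defined list of terms that occur frequently in junk mail.
JUNK_TERMS = {"free", "winner", "prize", "offer", "urgent"}

def junk_score(message):
    """Fraction of the pre-defined junk terms that appear in the message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    return len(words & JUNK_TERMS) / len(JUNK_TERMS)

def classify(message, threshold=0.4):
    """Label a message 'junk' when enough pre-defined terms occur in it."""
    return "junk" if junk_score(message) >= threshold else "ok"

label_a = classify("URGENT: you are a winner, claim your free prize now")
label_b = classify("Meeting notes for the text mining project are attached")
print(label_a, label_b)
```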
DEMERITS OF TEXT MINING
- Requires an initial learned information system for the initial extraction
- Suitable programs have not yet been defined to analyze text for mining knowledge or information
- Risk of misguided interpretations or misuse of information