Big dataappliance hadoopworld_final

Oracle Big Data Appliance and Solutions
Jean-Pierre Dijcks
Hadoop World – Nov 8th, 2012

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remain at the sole discretion of Oracle.

Case: On-line Ads and Content
Real-time: Determine best ad to place on page for this user
  Lookup user profile
  Add user if not present
[Diagram labels: Low Latency; Batch; Expert System; NoSQL DB; Profiles; Web logs; HDFS; High scale data reductions; BI and Analytics; Billing; Predictions on browsing; Actual ads served; Input into]

Agenda
  Big Data Technology
  Oracle Big Data Appliance
  Big Data Applications
  Summary
  Q&A

Big Data Technology

Big Data: Infrastructure Requirements
  Deep Analytics
  Agile Development
  Massive Scalability
  Real Time Results
  High Throughput
  In-Place Preparation
  All Data Sources/Structures
  Low, predictable Latency
  High Transaction Volume
  Flexible Data Structures

Divided Solution Spectrum
Phases: Acquire, Organize, Analyze
Data Variety: Dynamic Schema to Schema
NoSQL: Flexible, Specialized, Developer Centric
SQL: Trusted, Secure, Administered
Technologies: Distributed File Systems; Transaction (Key-Value) Stores; MapReduce Solutions; ETL; DBMS (OLTP); DBMS (DW); Advanced Analytics

Oracle Integrated Software Solution Stack
Phases: Acquire, Organize, Analyze
Data Variety: Dynamic Schema to Schema
Components: Oracle NoSQL DB; HDFS; Hadoop; Oracle Data Integrator; Oracle Loader for Hadoop; Oracle Database (OLTP); Oracle Database (DW); In-DB Analytics (“R”, Mining, Text, Graph, Spatial); Oracle BI EE

Oracle Engineered Solutions
Phases: Acquire, Organize, Analyze
Data Variety: Dynamic Schema to Schema
Big Data Appliance: Hadoop; NoSQL Database; Oracle Loader for Hadoop; Oracle Data Integrator
Oracle Exadata: OLTP & DW; Data Mining & Oracle R; Semantics; Spatial
Exalytics: Speed of Thought Analytics
Software: Oracle NoSQL DB; HDFS; Hadoop; Oracle Data Integrator; Oracle Loader for Hadoop; Oracle Database (OLTP); Oracle Database (DW); In-DB Analytics (“R”, Mining, Text, Graph, Spatial); Oracle BI EE

Big Data Appliance Batch Usage Model
Systems: Oracle Big Data Appliance, Oracle Exadata, Oracle Exalytics, connected by InfiniBand
Phases: Acquire, Organize, Analyze

Why build a Hadoop Appliance? Time to Build? Required Expertise? Cost and Difficulty Maintaining?

Oracle Big Data Appliance Hardware
  18 Sun X4270 M2 Servers
  48 GB memory per node = 864 GB memory
  12 Intel cores per node = 216 cores
  24 TB storage per node = 432 TB storage
  40 Gb/s InfiniBand
  10 Gb/s Ethernet
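
The rack totals on the hardware slide follow directly from the per-node figures. The sketch below reproduces that arithmetic and adds one rough estimate that is not on the slide: the space left for unique data under Hadoop triple replication. That estimate is a simplifying assumption (it ignores space reserved for the operating system, NoSQL Database, and temporary files), and the function names are illustrative only.

```python
# Per-node figures taken from the hardware slide.
NODES = 18
MEMORY_GB_PER_NODE = 48
CORES_PER_NODE = 12
STORAGE_TB_PER_NODE = 24

# Assumption for illustration: every block is stored three times
# (triple replication) and nothing else consumes disk space.
REPLICATION_FACTOR = 3


def rack_totals(nodes=NODES):
    """Multiply the per-node figures by the number of nodes in the rack."""
    return {
        "memory_gb": nodes * MEMORY_GB_PER_NODE,
        "cores": nodes * CORES_PER_NODE,
        "raw_storage_tb": nodes * STORAGE_TB_PER_NODE,
    }


def replicated_capacity_tb(raw_tb, replication=REPLICATION_FACTOR):
    """Rough upper bound on unique data that fits once replicated."""
    return raw_tb / replication


totals = rack_totals()
print(totals)  # 864 GB memory, 216 cores, 432 TB raw storage
print(replicated_capacity_tb(totals["raw_storage_tb"]))  # 144.0
```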

Big Data Appliance
Cluster of industry standard servers for Hadoop and NoSQL Database
Focus on Scalability and Availability at low cost
Compute and Storage
  18 high-performance low-cost servers acting as Hadoop nodes
  24 TB capacity per node
  Two 6-core CPUs per node
  Hadoop triple replication
  NoSQL Database triple replication
10GigE Network
  8 10GigE ports
  Datacenter connectivity
InfiniBand Network
  Redundant 40 Gb/s switches
  IB connectivity to Exadata

Scale Out to Infinity
  Scale out by connecting racks to each other using InfiniBand
  Expand up to eight racks without additional switches
  Scale beyond eight racks by adding an additional switch

Oracle Big Data Appliance Software
  Oracle Linux 5.6
  Java HotSpot VM
  Apache Hadoop Distribution v0.20.x
  R Distribution
  Oracle NoSQL Database Enterprise Edition
  Oracle Data Integrator Application Adapter for Hadoop
  Oracle Loader for Hadoop

Why Open-Source Apache Hadoop?
Fast evolution in critical features
  Built by the Hadoop experts in the community
Practical instead of esoteric
  Focus on what is needed for large clusters
Proven at very large scale
  In production at all the large consumers of Hadoop
  Extremely stable in those environments
  Well-understood by practitioners

Software Layout
Node 1
  M: Name Node, Balancer & HBase Master
  S: HDFS Data Node, NoSQL DB Storage Node
Node 2
  M: Secondary Name Node, Management, ZooKeeper, MySQL Slave
  S: HDFS Data Node, NoSQL DB Storage Node
Node 3
  M: JobTracker, MySQL Master, ODI Agent, Hive Server
  S: HDFS Data Node, NoSQL DB Storage Node
Node 4 – 18
  S: HDFS Data Nodes, Task Tracker, HBase Region Server, NoSQL DB Storage Nodes
  Your MapReduce runs here!

Big Data Appliance
Big Data for the Enterprise
Optimized and Complete
  Everything you need to store and integrate your lower information density data
Integrated with Oracle Exadata
  Analyze all your data
Easy to Deploy
  Risk Free, Quick Installation and Setup
Single Vendor Support
  Full Oracle support for the entire system and software set

Oracle NoSQL Database

Key-Value Store Workloads
Large dynamic schema based data repositories
Data capture
  Web applications
  Online retail
  Sensor/statistics/network capture/Mobile Devices
Data services
  Scalable authentication
  Real-time communication (MMS, SMS, routing)
  Personalization / Localization
  Social Networks

Oracle NoSQL DB
A distributed, scalable key-value database
Simple Data Model
  Key-value pair with major+sub-key paradigm
  Read/insert/update/delete operations
Scalability
  Dynamic data partitioning and distribution
  Optimized data access via intelligent driver
High availability
  One or more replicas
  Disaster recovery through location of replicas
  Resilient to partition master failures
  No single point of failure
Transparent load balancing
  Reads from master or replicas
  Driver is network topology & latency aware
[Diagram: Applications with NoSQLDB Driver connect to Storage Nodes in Data Center A and Data Center B]
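
The major+sub-key paradigm can be pictured with a small in-memory model. This is a sketch under stated assumptions, not the Oracle NoSQL Database API: the class `TinyKVStore`, its method names, and the choice of hashing the major key to pick a partition are all hypothetical, chosen to show how records that share a major key can be kept together while sub-keys distinguish the individual values.

```python
import hashlib
from collections import defaultdict


class TinyKVStore:
    """Hypothetical in-memory model of a major/sub-key store."""

    def __init__(self, num_partitions=4):
        self.num_partitions = num_partitions
        # One dict per partition: major key -> {sub key -> value}.
        self.partitions = [defaultdict(dict) for _ in range(num_partitions)]

    def _partition_for(self, major):
        # Assumption: only the major key decides the partition, so all
        # sub-keys under one major key land in the same place.
        digest = hashlib.md5("/".join(major).encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % self.num_partitions

    def put(self, major, sub, value):
        self.partitions[self._partition_for(major)][major][sub] = value

    def get(self, major, sub):
        records = self.partitions[self._partition_for(major)].get(major, {})
        return records.get(sub)

    def delete(self, major, sub):
        records = self.partitions[self._partition_for(major)].get(major, {})
        return records.pop(sub, None) is not None

    def multi_get(self, major):
        """Return every sub-key/value pair stored under one major key."""
        return dict(self.partitions[self._partition_for(major)].get(major, {}))


store = TinyKVStore()
user = ("users", "alice")
store.put(user, ("profile", "email"), "alice@example.com")
store.put(user, ("profile", "city"), "Paris")
print(store.multi_get(user))
```

Keys are modelled as tuples of strings so that both the major and the sub part can have several components.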

Resolving a Request
[Diagram]
Client sends: Operation + Key[M,m] + Value + Transaction Policy
Client receives: Operation result; New Partition Map; RepNodeStorageTable information

ACID Transactions
Transaction Policy: Write Durability
  Configurable per-operation, application can set defaults
  Write Transaction Durability consists of both
    Sync policy (on Master and Replica)
      Sync: force to disk
      Write No Sync: force to OS buffer
      No Sync: write to local log buffer, flush when convenient
    Replica Acknowledgement Policy
      All
      Simple Majority
      None
Transaction Policy: Read Consistency
  Configurable per-operation, application can set defaults
  Read Consistency specified as Absolute, Time-based, Version or None
    Absolute: Read from the master
    Time-based: Read from any replica that is within <time-interval> of master or better
    Version: Read from any replica that is current with <transaction-token> or higher
    None: Read from any replica
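
The two policies above can be modelled in a few lines. The following is a minimal sketch, not Oracle NoSQL Database code: the function names, the `Node` type, and the policy strings are hypothetical. It also assumes that "Simple Majority" counts the master toward the majority and that a transaction token can be modelled as an increasing integer.

```python
from dataclasses import dataclass


def required_replica_acks(policy, num_replicas):
    """How many replica acknowledgements a write waits for."""
    if policy == "ALL":
        return num_replicas
    if policy == "SIMPLE_MAJORITY":
        group_size = num_replicas + 1      # master plus replicas
        majority = group_size // 2 + 1
        return majority - 1                # assumption: master counts itself
    if policy == "NONE":
        return 0
    raise ValueError(f"unknown acknowledgement policy: {policy}")


@dataclass
class Node:
    name: str
    is_master: bool
    lag_seconds: float  # how far this node trails the master
    version: int        # last applied transaction token, modelled as an int


def eligible_readers(nodes, consistency, max_lag_seconds=None, min_version=None):
    """Nodes that may serve a read under the given consistency policy."""
    if consistency == "ABSOLUTE":
        return [n for n in nodes if n.is_master]
    if consistency == "TIME":
        return [n for n in nodes if n.is_master or n.lag_seconds <= max_lag_seconds]
    if consistency == "VERSION":
        return [n for n in nodes if n.version >= min_version]
    if consistency == "NONE":
        return list(nodes)
    raise ValueError(f"unknown consistency policy: {consistency}")


nodes = [
    Node("master", True, 0.0, 10),
    Node("replica-1", False, 0.5, 10),
    Node("replica-2", False, 5.0, 8),
]
print(required_replica_acks("SIMPLE_MAJORITY", num_replicas=2))
print([n.name for n in eligible_readers(nodes, "TIME", max_lag_seconds=1.0)])
```

Stricter settings shrink the set of nodes that can answer a read or lengthen the wait on a write, which is the trade-off the per-operation configuration exposes.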

Oracle NoSQL DB Differentiation
Commercial Grade Software and Support
  General-purpose
  Reliable: based on proven Berkeley DB JE HA
  Easy to install and configure
Scalable throughput, bounded latency
Simple Programming and Operational Model
  Simple Major + Sub key and Value data structure
  ACID transactions
  Configurable consistency & durability
Easy Management
  Web-based console, API accessible
  Manages and Monitors: Topology; Load; Performance; Events; Alerts
Completes Oracle large scale data storage offerings

Try NoSQL Database on OTN
Oracle NoSQL Database:
  Community Edition is available as a software only distribution
  Enterprise Edition is available as a separately licensable product or as part of Big Data Appliance

Oracle Loader for Hadoop

Oracle Loader for Hadoop Features
  Load data into a partitioned or non-partitioned table
    Single level, composite or interval partitioned table
    Support for scalar datatypes of Oracle Database
    Load into Oracle Database 11g Release 2
  Runs as a Hadoop job and supports standard options
  Pre-partitions and sorts data on Hadoop
  Online and offline load modes

Oracle Loader for Hadoop
[Diagram: MapReduce pipeline. Input 1 and Input 2 pass through MAP, Shuffle/Sort, and Reduce stages, ending in Oracle Loader for Hadoop.]

Oracle Loader for Hadoop: Online Option
[Diagram: MAP, Shuffle/Sort, and Reduce stages of the Oracle Loader for Hadoop job]
  Read target table metadata from the database
  Perform partitioning, sorting, and data conversion
  Connect to the database from reducer nodes, load into database partitions in parallel

Oracle Loader for Hadoop: Offline Option
[Diagram: MAP, Shuffle/Sort, and Reduce stages of the Oracle Loader for Hadoop job]
  Read target table metadata from the database
  Perform partitioning, sorting, and data conversion
  Write from reducer nodes to Oracle Data Pump files
  Import into the database in parallel using external table mechanism

Oracle Loader for Hadoop Advantages
Offload database server processing to Hadoop:
  Convert input data to final database format
  Compute table partition for row
  Sort rows by primary key within a table partition
  Generate binary Data Pump files
  Balance partition groups across reducers
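
Three of the offloaded steps (computing the table partition for a row, sorting by primary key within a partition, and balancing partitions across reducers) can be illustrated in plain Python. This is a sketch only and not how Oracle Loader for Hadoop is implemented: it assumes a range-partitioned table keyed on an integer `id`, and it uses a simple greedy assignment as a stand-in for whatever balancing the loader actually performs. All function names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def table_partition(key, upper_bounds):
    """Index of the range partition for a key.

    Partition i holds keys below upper_bounds[i]; keys at or above the
    last bound fall into one extra catch-all partition.
    """
    return bisect_right(upper_bounds, key)


def prepare_rows(rows, upper_bounds):
    """Group rows by table partition, then sort each group by primary key."""
    by_partition = defaultdict(list)
    for row in rows:
        by_partition[table_partition(row["id"], upper_bounds)].append(row)
    return {p: sorted(rs, key=lambda r: r["id"]) for p, rs in by_partition.items()}


def balance(partition_sizes, num_reducers):
    """Greedy stand-in: give the largest partitions to the least loaded reducer."""
    loads = [0] * num_reducers
    assignment = defaultdict(list)
    for part, size in sorted(partition_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        reducer = loads.index(min(loads))
        assignment[reducer].append(part)
        loads[reducer] += size
    return dict(assignment)


rows = [{"id": i} for i in (150, 20, 250, 110, 5)]
prepared = prepare_rows(rows, upper_bounds=[100, 200])
sizes = {p: len(rs) for p, rs in prepared.items()}
print(prepared)
print(balance(sizes, num_reducers=2))
```

Doing this work on the Hadoop side means the database receives rows that are already grouped and ordered for each target partition.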

Input and Output Formats
Input Formats
  Delimited text
  Hive tables
    Managed and external tables
    Native and non-native tables
  Write your own input format
Output Formats
  Online Mode: load directly from Hadoop nodes to Oracle database
    JDBC
    Parallel direct path
  Offline Mode
    Data Pump format
      Create binary files for external tables
      Import data into the database from the external table with a SQL statement
    CSV, delimited text
      Load through SQL*Loader or external table mechanism

Selecting Output Option for Use Case
OUTPUT OPTION | USE CASE CHARACTERISTICS
Online load with JDBC | The simplest use case for non-partitioned tables
Online load with Direct Path | Fast online load for partitioned tables
Offline load with Data Pump files | Fastest load method for external tables
On Oracle Big Data Appliance: Direct HDFS | Leave data on HDFS; parallel access from database; import into database when needed

Invoking Oracle Loader for Hadoop
Command line:
    $ hadoop jar oraloader.jar oracle.hadoop.loader.OraLoader -libjars <library jar files> -D <configuration properties>
Example:
    $HADOOP_HOME/bin/hadoop jar oraloader.jar oracle.hadoop.loader.OraLoader \
        -libjars avro-1.4.1.jar,commons-math-2.2.jar \
        -conf connection.xml \
        -D mapreduce.inputformat.class=oracle.hadoop.loader.lib.input.DelimitedTextInputFormat \
        -D mapreduce.outputformat.class=oracle.hadoop.loader.lib.output.JDBCOutputFormat

Sample Configuration Properties
CONFIGURATION PROPERTY | DESCRIPTION
mapreduce.inputformat.class | Specify input format class (delimited text, Hive, or write your own)
mapreduce.outputformat.class | Specify output format class, e.g. JDBCOutputFormat, OCIOutputFormat, DPOutputFormat
oracle.hadoop.loader.oracle_home | Path to ORACLE_HOME
oracle.hadoop.loader.targetTable | Table in the database to be loaded
oracle.hadoop.loader.connection.url | Database connection string
oracle.hadoop.loader.sampler.enableSampling | Balance load across reducer nodes
oracle.hadoop.loader.sampler.enableSorting | Perform secondary sort by primary key of the table
… | …
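
When the loader is launched from a script, the properties in the table can be turned into `-D name=value` arguments programmatically. The helper below is a hypothetical convenience, not part of Oracle Loader for Hadoop. The property names, jar name, and main class come from the slides; the table name `SALES` and the connection URL are made-up placeholder values.

```python
import shlex


def build_olh_command(properties, libjars=(), conf_file=None):
    """Assemble an argument list for launching the loader with `hadoop jar`."""
    cmd = ["hadoop", "jar", "oraloader.jar", "oracle.hadoop.loader.OraLoader"]
    if libjars:
        # -libjars expects one comma-separated value with no spaces.
        cmd += ["-libjars", ",".join(libjars)]
    if conf_file:
        cmd += ["-conf", conf_file]
    for name, value in properties.items():
        # Each property becomes its own -D name=value pair, no spaces around "=".
        cmd += ["-D", f"{name}={value}"]
    return cmd


# Placeholder values for illustration only.
properties = {
    "oracle.hadoop.loader.targetTable": "SALES",
    "oracle.hadoop.loader.connection.url": "jdbc:oracle:thin:@//dbhost:1521/orcl",
    "oracle.hadoop.loader.sampler.enableSampling": "true",
}
command = build_olh_command(
    properties,
    libjars=["avro-1.4.1.jar", "commons-math-2.2.jar"],
    conf_file="connection.xml",
)
print(shlex.join(command))
```

Building the command as a list keeps each argument intact if it is later passed to `subprocess.run` without a shell.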

Automate Usage of Oracle Loader for Hadoop
Oracle Data Integrator (ODI) has knowledge modules to
  Generate data transformation code to run on Hive/Hadoop
  Invoke Oracle Loader for Hadoop
Use the drag-and-drop interface in ODI to
  Include invocation of Oracle Loader for Hadoop in any ODI packaged flow

Summary

Big Data Appliance
Big Data for the Enterprise
Optimized and Complete
  Everything you need to store and integrate your lower information density data
Integrated with Oracle Exadata
  Analyze all your data
Easy to Deploy
  Risk Free, Quick Installation and Setup
Single Vendor Support
  Full Oracle support for the entire system and software set

Big Data Appliance and Exadata
Big Data for the Enterprise
  NoSQL DB
  HDFS
  Hadoop
  RDBMS

Questions
