Setting up Sumo Logic - June 2017


About This Presentation

Webinar: https://www.sumologic.com/online-training/#SettingUpSumo
Designed for administrators, this course shows you how to set up data collection for your organization's data sources. Best practices around deployment options ensure you choose a deployment that scales as your organization grows.


Slide Content

Setting up Sumo Logic: Data Collection and System Optimization. Welcome! Note that you are currently muted; we will get started shortly. Mario Sánchez, June 2017.

At the completion of this webinar, you will be able to: deploy a data collection strategy that best fits your environment; implement best practices around data collection; develop a robust naming convention for your metadata; and use optimization tools to enhance search performance.

What is Sumo Logic?

Continuous Intelligence with the Sumo Logic Cloud Analytics Service. DevOps: streamline continuous delivery, monitor KPIs and metrics, accelerate troubleshooting. IT Infrastructure and Operations: monitor all workloads, troubleshoot and increase uptime, simplify, modernize, and save costs. Compliance and Security: automate and demonstrate compliance, audit all systems, think beyond rules.

Unified Logs and Metrics – Troubleshooting Demo: an ALERT notifies you of a critical event, METRICS identify what is going on, and LOGS identify why it is happening.

High-Level Data Flow

Sumo Logic Data Flow: (1) Data Collection (Collectors and Sources), (2) Search & Analyze (operators), (3) Visualize & Monitor (dashboards, alerts, detection).

Data Collection Strategy

Enterprise Logs are Everywhere: custom app code, server/OS, virtual infrastructure, databases, network, open source, middleware, content delivery, IaaS, PaaS, SaaS, and security.

Designing Your Deployment. Sumo Logic data collection is highly flexible; design a deployment that is right for your organization. The fundamental choice is between Installed and Hosted Collectors.

Collector and Deployment Options. Three models: Cloud Data Collection (Hosted Collectors), Centralized Data Collection (Installed Collectors on dedicated machines), and Local Data Collection (an Installed Collector on each host). Best practices on designing your deployment follow.

Collector Considerations. Consider an Installed Collector on a dedicated machine if: you are running a very high-bandwidth network with high logging levels, or you want a central collection point for many Sources. Consider more than one Installed Collector if: you expect the combined number of files coming into one Collector to exceed 500; your hardware has memory or CPU limitations; you expect combined logging traffic for one Collector to be higher than 15,000 events per second; your network clusters or regions are geographically separated; or you prefer to install many Collectors, for example one per machine to collect local files. For system requirement details, see Installed Collector Requirements.

Local Data Collection. The Sumo Logic Collector is installed on every target host and, where possible, sends log data produced on that host directly to the Sumo Logic backend over HTTPS. Source types: Local Files (operating systems, middleware, custom apps, etc.); Windows Events (local Windows events); Docker logs and stats; Syslog on a dedicated Collector (network devices, Snare, etc.); Script on a dedicated Collector (cloud APIs, database content, binary data). Typical scenarios: customers with large numbers of similar servers, using orchestration/automation, mostly OS and application logs; on-premise datacenters; cloud instances. Benefits: no hardware requirement; easy to automate (Chef/Puppet/scripting). Drawbacks: outbound internet access required; resource usage on the target.
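
As an illustration, a Local File Source on an Installed Collector can be described in a local source-configuration JSON file. This is a minimal sketch: the log path and Source Category are hypothetical, and field names follow Sumo Logic's local file configuration format, so verify against the current docs for your Collector version:

    {
      "api.version": "v1",
      "sources": [
        {
          "sourceType": "LocalFile",
          "name": "apache-access",
          "pathExpression": "/var/log/apache2/access_log",
          "category": "Prod/MyApp1/Apache/Access"
        }
      ]
    }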

Centralized Data Collection. The Sumo Logic Collector is installed on a set of dedicated machines that collect log data from the target hosts via various remote mechanisms and forward it to the Sumo Logic backend. This can be accomplished either with the Sumo Logic Syslog source type or by running syslog servers (syslog-ng, rsyslog) that write to file, with the Collector reading from there. Source types: Syslog (operating systems, middleware, custom applications, etc.); Windows Events (remote Windows events); Script (cloud APIs, database content, binary data). Typical scenarios: customers with mostly Windows environments or an existing logging infrastructure (syslog/logstash); on-premise datacenters. Benefits: no outbound internet access needed from target hosts; leverages existing logging infrastructure. Drawbacks: scale, dedicated hardware, and complexity (failover, syslog rules).
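
For the Syslog source type, a sketch of a source-configuration entry on the dedicated Collector follows; the name, protocol, port, and Source Category are hypothetical, and the field names follow Sumo Logic's Syslog source configuration format:

    {
      "api.version": "v1",
      "sources": [
        {
          "sourceType": "Syslog",
          "name": "network-syslog",
          "protocol": "UDP",
          "port": 514,
          "category": "Prod/Network/Syslog"
        }
      ]
    }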

Cloud Data Collection. Most data generated in the cloud and by cloud services is collected via Sumo Logic's cloud integrations on Hosted Collectors. Source types: S3 bucket (any data written to S3 buckets, such as AWS audit logs); HTTPS (Lambda scripts, Akamai, OneLogin, log appender libraries, etc.); Google / O365 (Google API and O365 API). Typical scenarios: customers using cloud infrastructure; while it is possible to rely on cloud data collection entirely, this is not typical, and these source types are normally just one part of the overall collection strategy. Benefits: no software installation. Drawbacks: S3 latency issues; caching needs for HTTPS POST senders.
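
As a sketch, an HTTP source on a Hosted Collector can be created through the Collector Management API; the name and category below are hypothetical. Sumo Logic then returns a unique endpoint URL to which events are POSTed:

    {
      "source": {
        "sourceType": "HTTP",
        "name": "lambda-logs",
        "category": "Prod/MyApp1/Lambda",
        "messagePerRequest": false
      }
    }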

Metadata Design

What is Metadata? Metadata tags are associated with each log message that is collected; values are set through Collector and Source configuration.
_collector: name of the Collector (defaults to the hostname).
_source: name of the Source this data came through.
_sourceHost: hostname of the server (defaults to the hostname).
_sourceName: name and path of the log file.
_sourceCategory: can be freely configured; the main metadata tag.
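
Any of these tags can scope a search. A small sketch in the Sumo Logic query language (the tag values are hypothetical):

    _sourceCategory=Prod/MyApp1/Apache/Access _sourceHost=web-01*
    | count by _sourceName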

Source Category Best Practices. Recommended nomenclature for Source Categories: Component1/Component2/Component3..., ordered from least descriptive to most descriptive. Note: not all types of logs need the same number of levels. Examples following the convention: Prod/MyApp1/Apache/Access, Prod/MyApp1/Apache/Error, Prod/MyApp1/CloudTrail, Dev/MyApp1/Apache/Access, Dev/MyApp1/Apache/Error, Dev/MyApp1/CloudTrail, Prod/MyApp2/Nginx/Access, Prod/MyApp2/Tomcat/Access, Prod/MyApp2/Tomcat/Catalina/Out, Prod/MyApp2/MySQL/SlowQueries, Dev/MyApp2/Nginx/Access, Dev/MyApp2/Tomcat/Access, Dev/MyApp2/Tomcat/Catalina/Out, Dev/MyApp2/MySQL/SlowQueries.

Metadata: Source Category Best Practices and Benefits. Simple search scoping: _sourceCategory=Prod/MyApp1/Apache* (all Apache logs for Prod); _sourceCategory=*/MyApp1/Apache* (all Apache logs for all environments); simple, intuitive, and self-maintaining. Partitions/indexes: _sourceCategory=Prod/MyApp1*; _sourceCategory=Prod/MyApp2* (note: the first, or first and second, components are used for partitioning); simple and self-maintaining. RBAC roles: _sourceCategory=Prod/MyApp1*.
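
For example, a single wildcard-scoped query can report across all environments at once; the Source Category and parse pattern below are hypothetical:

    _sourceCategory=*/MyApp1/Apache/Error
    | parse "[*]" as log_level
    | count by log_level, _sourceCategory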

Metadata: Source Category Best Practices. Common components (in any combination): environment (Prod/UAT/Dev), application name, geographic information (East vs. West datacenter, office location, etc.), AWS region, business unit. The highest-level components should group the data the way it is most often searched together. Good (environment first): Prod/Web/Apache/Access, Dev/Web/Apache/Access, Prod/DB/MySQL/Error, Dev/DB/MySQL/Error. Weaker (environment last): Web/Apache/Access/Prod, Web/Apache/Access/Dev, DB/MySQL/Error/Prod, DB/MySQL/Error/Dev.

Ingesting Metrics. Supported sources: host metrics, AWS (including ECS), and Graphite-compatible metrics. See the webinars: Setting up Host Metrics, Setting up AWS Metrics, and Setting up Graphite Metrics.

Sending Metrics to Sumo Logic: metrics reach a Collector by one of three paths: (1) host metrics gathered by the Collector on the server, device, or container; (2) custom code using a metrics library that emits StatsD to a StatsD server, which forwards to the Collector; (3) Graphite-format metrics from collectd or other Graphite-compatible sources sent to the Collector.
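
On the Graphite path, the Collector accepts the standard Graphite plaintext protocol, one metric per line in the form "path value timestamp". A sketch with made-up metric names, values, and Unix timestamps:

    prod.web-01.cpu.idle 87.3 1498521600
    prod.web-01.mem.used_pct 42.1 1498521600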

Optimization Tools

Partitions. Partitions are indexes for subsets of your data: segregate your data into smaller, logical chunks that are mostly searched in isolation from other Partitions. Best practices: no overlap; fewer than 20 Partitions; each ideally between 1% and 30% of total volume; group data that is searched together most often. Examples: _sourceCategory=Prod/MyApp1* and _sourceCategory=Prod/MyApp2*, or _sourceCategory=Prod/* and _sourceCategory=Dev/*.
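
Once a Partition is built, searches can be scoped to it with _index; the Partition name below is hypothetical:

    _index=prod_myapp1 _sourceCategory=Prod/MyApp1/Apache/Error
    | count by _sourceHost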

Field Extraction Rules. Apply parse logic to a dataset at ingest time, as opposed to at search time. Benefits: better performance, standardized field names, simplified searches. Best practices: build simple, specific rules; test parse and other operations thoroughly (use nodrop and isEmpty for testing). Limitations: 50 rules / 200 fields (a limit expected to be removed soon); not all operators are supported.
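
A sketch of testing parse logic at search time before promoting it to a rule; the Source Category and pattern are hypothetical. With nodrop, messages that fail the parse are kept with an empty field, so the isEmpty filter surfaces exactly the lines the rule would miss:

    _sourceCategory=Prod/MyApp1/Apache/Access
    | parse "status=*," as status_code nodrop
    | where isEmpty(status_code)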

Scheduled Views. Copies of subsets of data, similar to a relational database materialized view. Use cases: pre-aggregated data (e.g., for long-term trends); finding the needle in the haystack. Best practices: we recommend selectivity of > 1:10000. How they work: the view is updated by the service roughly once a minute; backfilling is allowed; search the view using _view=[viewname]; the data does count against ingest volume.
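
A sketch with hypothetical names: a view defined over a narrow query such as _sourceCategory=Prod/*/Apache/Error and named apache_errors is then searched like any other scope:

    _view=apache_errors
    | count by _sourceCategory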

Review: Search Optimization Tools.
Run queries against a certain set of data: choose a Partition if the amount of data is between 1% and 30%; choose a Scheduled View if the amount of data you'd like to segregate is 1% or less; choose Field Extraction if you want to pre-extract fields that you search against frequently.
Extract fields from logs and make them available to all users: Field Extraction.
Use data to identify long-term trends: Scheduled View.
Segregate data by metadata: Partition.
Pre-computed or aggregated data ready to query: Scheduled View.
Use RBAC to deny or grant access to the data: Partition or Scheduled View.

In summary, you can: ingest any type of logs (structured and unstructured); select a deployment option that best fits your sources; develop a robust naming convention for your metadata; and take advantage of the optimization tools. Call to action: set up the deployment option (or hybrid option) that best fits your environment; ensure you have a robust _sourceCategory naming convention; and, at the very least, set up Field Extraction Rules for your popular data sources.

Questions?