Webinar: https://www.sumologic.com/online-training/#SettingUpSumo
Designed for Administrators, this course shows you how to set up data collection according to your organization's data sources. Best practices around deployment options ensure you choose a deployment that scales as your organization grows. Because metadata is so important to a healthy environment, learn how to design and set up a naming convention that works best for your teams. Use Chef, Puppet, or the like? Learn how to automate your deployment. Test your deployment with simple searches, and take advantage of optimization tools that can help you stay on top of your deployment.
Size: 5.51 MB
Language: en
Added: Jun 27, 2017
Slides: 29 pages
Slide Content
Setting up Sumo Logic Data Collection and System Optimization
Welcome! Note you are currently muted. We will get started shortly.
Mario Sánchez, June 2017
At the completion of this webinar, you will be able to:
- Deploy a data collection strategy that best fits your environment
- Implement best practices around data collection
- Develop a robust naming convention for your metadata
- Utilize optimization tools to enhance search performance
What is Sumo Logic?
Continuous Intelligence
- DevOps: streamline continuous delivery; monitor KPIs and metrics; accelerate troubleshooting
- IT Infrastructure and Operations: monitor all workloads; troubleshoot and increase uptime; simplify, modernize, and save costs
- Compliance and Security: automate and demonstrate compliance; audit all systems; think beyond rules
The Sumo Logic Cloud Analytics Service
Unified Logs and Metrics – Troubleshooting Demo
- ALERT notifies of a critical event
- METRICS to identify what's going on
- LOGS to identify why it's happening
Enterprise Logs are Everywhere
Custom App Code, Server/OS, Virtual, Databases, Network, Open Source, Middleware, Content Delivery, IaaS/PaaS, SaaS, Security
Designing Your Deployment
Sumo Logic data collection is highly flexible. Design a Sumo Logic deployment that's right for your organization: Installed versus Hosted Collectors.
Collector and Deployment Options
[Diagram: three deployment models: Cloud Data Collection via Hosted Collectors, and Centralized or Local Data Collection via Installed Collectors]
Best Practices on Designing Your Deployment
Collector Considerations
Consider having an Installed Collector on a dedicated machine if:
- You are running a very high-bandwidth network with high logging levels.
- You want a central collection point for many Sources.
Consider having more than one Installed Collector if:
- You expect the combined number of files coming into one Collector to exceed 500.
- Your hardware has memory or CPU limitations.
- You expect combined logging traffic for one Collector to be higher than 15,000 events per second.
- Your network clusters or regions are geographically separated.
- You prefer to install many Collectors, for example, one per machine to collect local files.
For system requirement details, see Installed Collector Requirements.
Local Data Collection
The Sumo Logic Collector is installed on all target hosts and, where possible, sends log data produced on those hosts directly to the Sumo Logic backend over an HTTPS connection.
Source Types:
- Local Files: operating systems, middleware, custom apps, etc.
- Windows Events: local Windows Events
- Docker: logs and stats
- Syslog (dedicated Collector): network devices, Snare, etc.
- Script (dedicated Collector): cloud APIs, database content, binary data
Typical Scenarios: customers with large numbers of (similar) servers, using orchestration/automation, mostly OS and application logs; on-premise datacenters; cloud instances. See the configuration sketch below.
Benefits/Drawbacks:
- No hardware requirement
- Automation (Chef/Puppet/scripting)
- Outbound internet access required
- Resource usage on target hosts
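For Installed Collectors managed through local configuration file management, Sources can be declared in a sources.json file, which makes the automated deployments mentioned above straightforward. A minimal sketch of a Local File Source; the log path, Source name, and Source Category here are hypothetical:

    {
      "api.version": "v1",
      "sources": [
        {
          "sourceType": "LocalFile",
          "name": "apache-access",
          "pathExpression": "/var/log/apache2/access_log",
          "category": "Prod/MyApp1/Apache/Access"
        }
      ]
    }

Tools like Chef or Puppet can then template this file per host, so each machine registers its Sources with consistent metadata at install time.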
Centralized Data Collection
The Sumo Logic Collector is installed on a set of dedicated machines; these collect log data from the target hosts via various remote mechanisms and forward the data to the Sumo Logic backend. This can be accomplished either by using the Sumo Logic Syslog Source type, or by running syslog servers (syslog-ng, rsyslog), writing to file, and collecting from there.
Source Types:
- Syslog: operating systems, middleware, custom applications, etc.
- Windows Events: remote Windows Events
- Script: cloud APIs, database content, binary data
Typical Scenarios: customers with mostly Windows environments or existing logging infrastructure (syslog/Logstash); on-premise datacenters.
Benefits/Drawbacks:
- No outbound internet access required on target hosts
- Leverages existing logging infrastructure
- Scale: dedicated hardware
- Complexity (failover, syslog rules)
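A Syslog Source on a dedicated Installed Collector can be declared the same way as the Local File Source above. A minimal sketch, again assuming local configuration file management; the listening port and Source Category are hypothetical:

    {
      "api.version": "v1",
      "sources": [
        {
          "sourceType": "Syslog",
          "name": "network-syslog",
          "protocol": "TCP",
          "port": 1514,
          "category": "Prod/Network/Syslog"
        }
      ]
    }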
Cloud Data Collection
Most data is generated in the cloud and by cloud services, and is collected via Sumo Logic's cloud integrations.
Source Types:
- S3 Bucket: any data written to S3 buckets (AWS audit or other)
- HTTPS: Lambda scripts, Akamai, OneLogin, log appender libraries, etc.
- Google / O365: Google API and O365 API
Typical Scenarios: customers using cloud infrastructure. While it's possible to rely on cloud data collection entirely, this is not typical; these Source types are normally just part of the overall collection strategy.
Benefits/Drawbacks:
- No software installation
- S3 latency issues
- HTTPS POST caching needs
Metadata Design
What is Metadata?
Metadata tags are associated with each log message that is collected. Values are set through Collector and Source configuration.
- _collector: name of the Collector (defaults to hostname)
- _source: name of the Source this data came through
- _sourceHost: hostname of the server (defaults to hostname)
- _sourceName: name and path of the log file
- _sourceCategory: can be freely configured; the main metadata tag
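Because every message carries these tags, they can be combined to scope any search. A sketch of a query using metadata tags; the host and category values are hypothetical:

    _sourceCategory=Prod/MyApp1/Apache/Error AND _sourceHost=web-01
    | count by _sourceName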
Source Category Best Practices
Recommended nomenclature for Source Categories: Component1/Component2/Component3..., ordered from least descriptive to most descriptive.
Note: Not all types of logs need to have the same number of levels.
Best-practice examples of a consistent Source Category scheme:
- Prod/MyApp1/Apache/Access
- Prod/MyApp1/Apache/Error
- Prod/MyApp1/CloudTrail
- Dev/MyApp1/Apache/Access
- Dev/MyApp1/Apache/Error
- Dev/MyApp1/CloudTrail
- Prod/MyApp2/Nginx/Access
- Prod/MyApp2/Tomcat/Access
- Prod/MyApp2/Tomcat/Catalina/Out
- Prod/MyApp2/MySQL/SlowQueries
- Dev/MyApp2/Nginx/Access
- Dev/MyApp2/Tomcat/Access
- Dev/MyApp2/Tomcat/Catalina/Out
- Dev/MyApp2/MySQL/SlowQueries
Metadata: Source Category Best Practices and Benefits
- Simple search scoping:
  _sourceCategory=Prod/MyApp1/Apache* (all Apache logs for Prod)
  _sourceCategory=*/MyApp1/Apache* (all Apache logs for all environments)
  Simple, intuitive, and self-maintaining.
- Partitions/Indexes:
  _sourceCategory=Prod/MyApp1*
  _sourceCategory=Prod/MyApp2*
  Note: the first (or first and second) components are used for partitioning. Simple and self-maintaining.
- RBAC roles:
  _sourceCategory=Prod/MyApp1*
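For example, a wildcard scope lets you compare the same log type across environments in one query. A sketch, assuming Apache access logs live under the hypothetical categories above:

    _sourceCategory=*/MyApp1/Apache/Access
    | parse "HTTP/1.1\" * " as status_code
    | where status_code = "503"
    | count by _sourceCategory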
Metadata: Source Category Best Practices
Common components (in any combination):
- Environment (Prod/UAT/Dev)
- Application name
- Geographic information (East vs. West datacenter, office location, etc.)
- AWS region
- Business unit
The highest-level components should group the data the way it is most often searched together. For example, if environments are usually searched separately, lead with the environment:
  Prod/Web/Apache/Access
  Dev/Web/Apache/Access
  Prod/DB/MySQL/Error
  Dev/DB/MySQL/Error
rather than:
  Web/Apache/Access/Prod
  Web/Apache/Access/Dev
  DB/MySQL/Error/Prod
  DB/MySQL/Error/Dev
Ingesting Metrics
- Host metrics (Webinar: Setting up Host Metrics)
- AWS and AWS ECS metrics (Webinar: Setting up AWS Metrics)
- Graphite-compatible metrics (Webinar: Setting up Graphite Metrics)
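For reference, Graphite-compatible sources emit metrics in the Graphite plaintext protocol: one metric path, a value, and an epoch timestamp per line. A sketch with hypothetical metric names:

    prod.myapp1.web-01.cpu.user 42.5 1498586400
    prod.myapp1.web-01.memory.used_pct 71.2 1498586400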
About Partitions
Partitions are indexes for subsets of your data. They segregate your data into smaller, logical chunks that are mostly searched in isolation from other partitions.
Best Practices:
- No overlap between partitions
- Fewer than 20 partitions
- Each ideally between 1% and 30% of total volume
- Group data that is searched together most often
Examples:
  _sourceCategory=Prod/MyApp1*
  _sourceCategory=Prod/MyApp2*
or
  _sourceCategory=Prod/*
  _sourceCategory=Dev/*
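Once a partition exists, scope searches to it with _index so only that index is scanned. A sketch, assuming a hypothetical partition named prod_myapp1 routed on _sourceCategory=Prod/MyApp1*:

    _index=prod_myapp1 error
    | count by _sourceHost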
Field Extraction Rules
Apply parse logic to a dataset at ingest time, as opposed to at search time.
Benefits:
- Better performance
- Standardized field names
- Simplified searches
Best Practices:
- Build simple, specific rules
- Test parse and other operations thoroughly (use nodrop and isEmpty for testing)
Limitations:
- 50 rules / 200 fields (limit to be removed soon)
- Not all operators are supported
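A Field Extraction Rule pairs a scope with a parse expression. A sketch for Apache access logs in common log format; the scope and field names are hypothetical:

    Scope: _sourceCategory=Prod/MyApp1/Apache/Access
    Parse expression:
    parse "* - - [*] \"* * HTTP/*\" * *" as client_ip, req_time, method, url, http_version, status_code, bytes

After ingest, every matching message carries these fields, so searches can filter on status_code or url directly without repeating the parse.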
Scheduled Views
Copies of subsets of data, similar to a materialized view in a relational database.
Use Cases:
- Pre-aggregated data (e.g., for long-term trends)
- Finding the needle in the haystack
Best Practices:
- We recommend a selectivity of > 1:10000
How They Work:
- The view is updated by the service roughly once a minute
- Backfilling is supported
- Search a view using _view=[viewname]
- Data in views does count against ingest volume
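A sketch of how a Scheduled View might be defined and queried; the view name, scope, and parse logic are hypothetical. The view's query selects the subset to index, for example server errors only:

    _sourceCategory=Prod/MyApp1/Apache/Access
    | parse "HTTP/1.1\" * " as status_code
    | where status_code matches "5*"

Searches then reference the view by name and aggregate at query time, which stays fast even over long time ranges:

    _view=myapp1_server_errors
    | timeslice 1d
    | count by _timeslice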
Review: Search Optimization Tools
- Run queries against a certain set of data: choose a Partition if the data is between 1% and 30% of total volume; choose a Scheduled View if the data you'd like to segregate is 1% or less; choose Field Extraction if you want to pre-extract fields that you search against frequently.
- Extract fields from logs and make them available to all users: Field Extraction
- Use data to identify long-term trends: Scheduled View
- Segregate data by metadata: Partition
- Pre-computed or aggregate data ready to query: Scheduled View
- Use RBAC to deny or grant access to the data: Partition or Scheduled View
In Summary, you can:
- Ingest any type of logs (structured and unstructured)
- Select a deployment option that best fits your sources
- Develop a robust naming convention for your metadata
- Take advantage of optimization tools
Call to Action:
- Set up the deployment option (or hybrid option) that best fits your environment
- Ensure you have a robust _sourceCategory naming convention
- At the very least, set up Field Extraction Rules for your popular data sources