SoT unit-4-BACKUP, ARCHIVE AND REPLICATION

UNIT-IV BACKUP, ARCHIVE AND REPLICATION

UNIT-IV Introduction to Business Continuity, Backup architecture, Backup targets and methods, Data deduplication, Cloud-based and mobile device backup. Data archive, Uses of replication and its characteristics, Compute-based, storage-based, and network-based replication. Data migration, Disaster Recovery as a Service (DRaaS)

INTRODUCTION TO BUSINESS CONTINUITY Business continuity (BC) is an integrated and enterprise-wide process that includes all activities (internal and external to IT) that a business must perform to mitigate the impact of planned and unplanned occurrences. There are many threats to information availability: natural disasters (e.g., flood, fire, earthquake), unplanned occurrences (e.g., cybercrime, human error, network and computer failure), and planned occurrences (e.g., upgrades, backup, restore), all of which can result in the inaccessibility of information.

INFORMATION AVAILABILITY Information availability can be defined in terms of reliability, accessibility, and timeliness. Reliability: This reflects a component's ability to function without failure, under stated conditions, for a specified amount of time. Accessibility: The required information is accessible at the right place, to the right user. The period of time during which the system is in an accessible state is termed system uptime; when it is not accessible, it is termed system downtime. (In computing, uptime is a measure of how long a computer or service is on and available. Downtime is a measure of how long it is not available.)

Timeliness: Defines the exact time window (a particular time of the day, week, month, and/or year as specified) during which information must be accessible. For example, if online access to an application is required between 8:00 AM and 10:00 PM each day, any disruptions to data availability outside of this time slot are not considered to affect timeliness.

Causes of Information Unavailability Various planned and unplanned incidents result in data unavailability. Planned outages include installation/integration/maintenance of new hardware, software upgrades or patches, taking backups, application and data restores, facility operations (renovation and construction), and refresh/migration of applications from the test environment to the production environment. Unplanned outages include failures caused by database corruption, component failure, and human errors. Another type of incident that may cause data unavailability is a natural or man-made disaster such as flood, fire, earthquake, or contamination.

Measuring Information Availability Information availability depends on the availability of the hardware and software components of a data center. Failure of these components might disrupt information availability. A failure is the termination of a component's ability to perform a required function. The component's ability can be restored by an external corrective action, such as a manual reboot, a repair, or replacement of the failed component(s). Repair involves restoring a component to a condition in which it can perform its required function within a specified time. Information availability is measured by MTBF and MTTR. Mean Time Between Failures (MTBF): MTBF is a key performance indicator (KPI) that represents the average time between two consecutive failures of a system or product. Mean Time To Repair (MTTR): The average time required to repair a failed component.

MTTR includes the time required to detect the fault, mobilize the maintenance team, diagnose the fault, obtain the spare parts, repair the component, test it, and resume normal operations.

Information availability (IA) can be expressed in terms of system uptime and downtime and measured as the amount or percentage of system uptime:
IA = system uptime / (system uptime + system downtime)
In terms of MTBF and MTTR, IA can also be expressed as:
IA = MTBF / (MTBF + MTTR)
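
To make the two availability formulas concrete, here is a minimal Python sketch (not part of the original slides) that evaluates both expressions; the uptime, MTBF, and MTTR figures in the example are illustrative assumptions only.

    # Evaluate the two information availability (IA) formulas above.

    def availability_from_uptime(uptime_hours: float, downtime_hours: float) -> float:
        """IA = uptime / (uptime + downtime)."""
        return uptime_hours / (uptime_hours + downtime_hours)

    def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
        """IA = MTBF / (MTBF + MTTR)."""
        return mtbf_hours / (mtbf_hours + mttr_hours)

    # Example: 8,750 hours up and 10 hours down in a year -> ~99.886% availability.
    print(f"{availability_from_uptime(8750, 10):.5f}")
    # Example: MTBF of 1,000 hours and MTTR of 2 hours -> ~99.8% availability.
    print(f"{availability_from_mtbf(1000, 2):.5f}")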

Consequences of Downtime Data unavailability, or downtime, results in loss of productivity, loss of revenue, poor financial performance, and damage to reputation. Loss of productivity reduces the output per unit of labor, equipment, and capital. Loss of revenue includes direct loss, compensatory payments, future revenue losses, billing losses, and investment losses. Poor financial performance affects revenue recognition, cash flow, discounts, payment guarantees, credit rating, and stock price. The average cost of downtime per hour is calculated as follows:
Average productivity loss per hour = (total salaries and benefits of all employees per week) / (average number of working hours per week)
Average revenue loss per hour = (total revenue of the organization per week) / (average number of hours per week that the organization is open for business)
Average cost of downtime per hour = average productivity loss per hour + average revenue loss per hour
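
The downtime-cost formulas above can be turned into a small worked example. The following Python sketch is an illustration only; the payroll, hours, and revenue figures are assumed values, not data from the slides.

    # Average cost of downtime per hour = productivity loss/hour + revenue loss/hour.

    def average_cost_of_downtime_per_hour(
        weekly_salaries_and_benefits: float,
        weekly_working_hours: float,
        weekly_revenue: float,
        weekly_open_hours: float,
    ) -> float:
        productivity_loss_per_hour = weekly_salaries_and_benefits / weekly_working_hours
        revenue_loss_per_hour = weekly_revenue / weekly_open_hours
        return productivity_loss_per_hour + revenue_loss_per_hour

    # Example: $200,000 weekly payroll over 40 working hours and $1,000,000
    # weekly revenue over 100 open hours -> $5,000 + $10,000 = $15,000 per hour.
    print(average_cost_of_downtime_per_hour(200_000, 40, 1_000_000, 100))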

Common terms of BC Disaster recovery: This is the coordinated process of restoring systems, data, and the infrastructure required to support key ongoing business operations in the event of a disaster. Disaster restart: This is the process of restarting business operations with mirrored consistent copies of data and applications. Recovery-Point Objective (RPO): This is the point in time to which systems and data must be recovered after an outage. It defines the amount of data loss that a business can endure. For example, if the RPO is six hours, backups or replicas must be made at least once every 6 hours. Further examples: RPO of 24 hours: backups are created on an offsite tape drive every midnight. RPO of 1 hour: database logs are shipped to the remote site every hour. RPO of zero: mission-critical data is mirrored synchronously to a remote site.

Recovery-Time Objective (RTO): The time within which systems, applications, or functions must be recovered after an outage. It defines the amount of downtime that a business can endure and survive. For example, if the RTO is two hours, then use a disk backup because it enables a faster restore than a tape backup. Some examples of RTOs and the recovery strategies to ensure data availability are listed below RTO of 72 hours: Restore from backup tapes at a cold site. RTO of 12 hours: Restore from tapes at a hot site. RTO of 4 hours: Use a data vault to a hot site

PLANNING BC LIFECYCLE BC planning must follow a disciplined approach like any other planning process. Organizations today dedicate specialized resources to develop and maintain BC plans. The BC planning life cycle includes five stages (see Figure 11-3): Establishing objectives Analyzing Designing and developing Implementing Training, testing, assessing, and maintaining

1. Establishing objectives Determine BC requirements. Estimate the scope and budget to achieve requirements. Select a BC team by considering subject matter experts from all areas of the business, whether internal or external. Create BC policies. 2. Analyzing Collect information on data profiles, business processes, infrastructure support, dependencies, and frequency of using business infrastructure. Identify critical business needs and assign recovery priorities. Create a risk analysis for critical areas and mitigation strategies. Conduct a Business Impact Analysis (BIA). Create a cost and benefit analysis based on the consequences of data unavailability. Evaluate options.

3. Designing and developing Define the team structure and assign individual roles and responsibilities. For example, different teams are formed for activities such as emergency response, damage assessment, and infrastructure and application recovery. Design data protection strategies and develop infrastructure. Develop contingency scenarios. Develop emergency response procedures. Detail recovery and restart procedures. 4. Implementing Implement risk management and mitigation procedures that include backup, replication, and management of resources. Prepare the disaster recovery sites that can be utilized if a disaster affects the primary data center. Implement redundancy for every resource in a data center to avoid single points of failure.

5. Training, testing, assessing, and maintaining Train the employees who are responsible for backup and replication of business critical data on a regular basis or whenever there is a modification in the BC plan. Train employees on emergency response procedures when disasters are declared. Train the recovery team on recovery procedures based on contingency scenarios. Perform damage assessment processes and review recovery plans. Test the BC plan regularly to evaluate its performance and identify its limitations. Assess the performance reports and identify limitations. Update the BC plans and recovery/restart procedures to reflect regular changes within the data center.

BACKUP PROCESS (BACKUP ARCHITECTURE)

The common and widely used backup architecture is based on the server-client model. Any backup architecture is composed of the following four components: backup servers, backup clients, media servers, and backup destinations/targets. The backup server manages the backup operations and maintains the backup database, which contains information about the backup configuration and backup metadata. The backup configuration contains information about when to run backups and which client data is to be backed up. The backup metadata contains information about the backed-up data.

The role of a backup client is to gather the data that is to be backed up and send it to the backup server. The backup client can be installed on application servers, mobile clients, and desktops. It also sends tracking information to the backup server. Media servers connect to the backup destinations and make them available to backup clients so that they can send data to the backup target. A media server controls one or more backup devices, which may be attached directly or through a network. The media server sends tracking information about the data written to the backup device to the backup server.

A wide range of backup destinations/targets are currently available, such as tape, disk, and virtual tape library. Traditional backup solutions primarily used tape as the backup destination. Modern backups use disk-based targets, which are shared over a SAN or LAN. Disk arrays can also be used as virtual tape libraries to combine the benefits of disk and tape. Many organizations now also back up their data to cloud storage (backup as a service), which enables an organization to reduce its backup management overhead.

BACKUP TYPES & METHODS Types of Backups Full backup: The most basic and comprehensive backup method, where all data is sent to another location. Incremental backup: Backs up all files that have changed since the last backup occurred. Differential backup: Backs up copies of all files that have changed since the last full backup. METHODS OF BACKUP In a hot backup, the application is running, with users accessing their data during the backup process. In a cold backup, the application is not active during the backup process.
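
To illustrate the difference between full, incremental, and differential backups, the sketch below shows one plausible way a backup tool could select files by comparing modification times against the last full backup and the last backup of any kind. The function name, paths, and timestamps are hypothetical; real backup software typically relies on its backup catalog rather than raw file timestamps.

    import os
    import time

    def select_files(paths, backup_type, last_full_time, last_backup_time):
        """Return the files to copy for 'full', 'incremental', or 'differential'."""
        if backup_type == "full":
            return list(paths)                 # everything, every time
        if backup_type == "incremental":
            cutoff = last_backup_time          # changed since the last backup of any kind
        elif backup_type == "differential":
            cutoff = last_full_time            # changed since the last full backup
        else:
            raise ValueError(backup_type)
        return [p for p in paths if os.path.getmtime(p) > cutoff]

    # Usage (hypothetical paths and timestamps):
    # files = select_files(["/data/a.txt", "/data/b.txt"], "differential",
    #                      last_full_time=time.time() - 7 * 86400,
    #                      last_backup_time=time.time() - 86400)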

Challenges in Data Backup The backup of online (hot backup) production data is more challenging because the data is actively being used and changed. An open file is locked by the operating system and is not copied during the backup process until the user closes it. The backup application can back up open files by retrying the operation on files that were open earlier in the backup process. The maximum number of retries can be configured depending on the backup application. However, this method is not considered robust because in some environments certain files are always open. The backup application can instead provide open file agents. These agents interact directly with the operating system and enable the creation of consistent copies of open files. In some environments, the use of open file agents is not enough.

For example, a database is composed of many files of varying sizes, occupying several file systems. To ensure a consistent database backup, all files need to be backed up in the same state, so a consistent database backup is not always possible with a hot backup. Consistent backups of databases can also be done by using a cold backup, which requires the database to remain inactive during the backup. The disadvantage of a hot backup is that the agents usually affect overall application performance. The disadvantage of a cold backup is that the database is inaccessible to users during the backup process. A point-in-time (PiT) copy is a copy of the original data as it appeared at a specific point in time. In a disaster recovery environment, bare-metal recovery (BMR) refers to a backup in which all metadata, system information, and application configurations are appropriately backed up for a full system recovery.

DATA DEDUPLICATION DEFINITION: Data deduplication is a process that eliminates redundant copies of data and reduces storage overhead. Data deduplication techniques ensure that only one unique instance of the data is retained on the storage media, such as disk, flash, or tape; redundant data blocks are replaced with references to the unique copy. ADVANTAGE: It reduces the amount of space and the cost associated with storing large amounts of data. In this way, data deduplication complements incremental backup, which copies only the data that has changed since the previous backup. IBM has solutions with technology that addresses this problem: IBM ProtecTIER Gateway and Appliance, IBM System Storage N series Deduplication, and IBM Tivoli Storage Manager.

Compressed, encrypted, or otherwise scrambled workloads typically do not benefit from data deduplication. Good candidates for data deduplication are text files, log files, uncompressed and non-encrypted database files, email files (PST, DBX, and IBM Domino®), and Snapshots (Filer Snaps, BCVs, and VMware images).

Target vs. source deduplication Data deduplication can occur at the source or the target. Source-based dedupe removes redundant blocks at the client or server level, before transmitting data to a backup target; no additional hardware is required. Deduplicating at the source reduces bandwidth and storage use. In target-based dedupe, backups are transmitted across a network to disk-based hardware in a remote location. Although this deduplication increases costs, it provides a performance advantage compared with source dedupe, particularly for petabyte-scale data sets.

Techniques to deduplicate data There are two main methods to deduplicate redundant data: inline deduplication and post-processing deduplication. Inline deduplication Inline processing is a widely used method of implementing deduplication wherein data reduction happens before the incoming data is written to the storage media. It can cause bottlenecks. Note: Inline deduplication is also known as source deduplication in the backup context.

Post-processing dedupe is an  asynchronous  backup process that removes redundant data after it is written to storage. The post-processing approach gives users the flexibility to dedupe specific workloads and quickly recover the most recent backup without hydration.

Types of data deduplication and IBM HyperFactor The following three methods are used frequently for data deduplication: hash-based, content-aware, and IBM HyperFactor. Hash-based data deduplication uses a hashing algorithm, such as Secure Hash Algorithm 1 (SHA-1) or Message-Digest Algorithm 5 (MD5), to identify chunks of data. Content-aware data deduplication methods are aware of the structure and common patterns of the data used by applications. This type of deduplication is generally called byte-level deduplication, because deduplication happens at the deepest level, that is, at the byte level. When a file match is identified, a bit-by-bit comparison is performed to determine whether the data changed, and only the changed data is saved.
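
As a rough illustration of hash-based deduplication, the following Python sketch splits data into fixed-size chunks, hashes each chunk with SHA-1, and stores a chunk's bytes only the first time its hash is seen. The chunk size, store, and function names are assumptions for demonstration; production systems use far more sophisticated chunking and indexing.

    import hashlib

    CHUNK_SIZE = 4096          # assumed chunk size; real products vary
    chunk_store = {}           # hash -> unique chunk bytes

    def deduplicate(data: bytes) -> list:
        """Return the data as a list of chunk hashes (its 'recipe')."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha1(chunk).hexdigest()
            chunk_store.setdefault(digest, chunk)   # keep only one copy per hash
            recipe.append(digest)
        return recipe

    def rehydrate(recipe: list) -> bytes:
        """Rebuild the original data from its chunk hashes."""
        return b"".join(chunk_store[d] for d in recipe)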

IBM HyperFactor® IBM HyperFactor is a patented technology that is used in IBM System Storage ProtecTIER Enterprise Edition and higher software. HyperFactor provides a more efficient process for data deduplication. HyperFactor deduplication uses 4 GB of memory to track similarities for up to 1 petabyte (PB; one petabyte equals 1,000 terabytes) of physical disk in a single repository.

CLOUD-BASED AND MOBILE DEVICE BACKUP CLOUD BACKUP Cloud backup, also known as online backup or remote backup, is a strategy for sending a copy of a physical or virtual file or database to a cloud-based location for preservation in case of equipment failure or human malfeasance. The backup server and data storage systems are usually hosted by a third-party cloud or SaaS provider, which charges the backup customer based on storage space or capacity used, data transmission bandwidth, number of users, number of servers, or number of times data is retrieved. There are a variety of approaches to cloud backup, with available services that can easily fit into an organization's existing data protection process. Varieties of cloud backup include the following:

Backing up directly to the public cloud. Backing up to a service provider. Choosing a cloud-to-cloud (C2C) backup. How data is backed up and restored: if the customer has contracted for daily backups, the application collects, compresses, encrypts, and transfers data to the cloud service provider's servers every 24 hours. To reduce the amount of bandwidth consumed and the time it takes to transfer files, the service provider might only perform incremental backups after the initial full backup. Whether a customer uses its own backup application or the software the cloud backup service provides, the organization uses that same application to restore backed-up data.
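
A simplified, assumed sketch of the client-side flow described above: read a file, compress it, and hand it to an upload step. The upload_to_provider function is a placeholder rather than any real provider's API, and encryption is omitted for brevity.

    import gzip

    def upload_to_provider(name: str, payload: bytes) -> None:
        # Placeholder: a real client would call the provider's API or agent here.
        print(f"uploading {name}: {len(payload)} bytes")

    def backup_file(path: str) -> None:
        with open(path, "rb") as f:
            data = f.read()
        compressed = gzip.compress(data)     # reduce bandwidth before transfer
        upload_to_provider(path + ".gz", compressed)

    # backup_file("/var/log/app.log")        # hypothetical path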

DATA ARCHIVE Data archiving is the practice of identifying data that is no longer active and moving it out of production systems into long-term storage systems. Archival data is stored so that at any time it can be brought back into service.

Types of Archives It can be implemented as online, nearline, or offline based on the means of access: Online archive : The storage device is directly connected to the host to make the data immediately available. This is best suited for active archives. Nearline archive : The storage device is connected to the host and information is local, but the device must be mounted or loaded to access the information. Offline archive : The storage device is not directly connected, mounted, or loaded. Manual intervention is required to provide this service before information can be accessed.

PROTECT THE DATA ARCHIVE An archive is often stored on a write once, read many (WORM) device, such as a CD-ROM. These devices protect the original file from being overwritten. Some tape devices also provide this functionality by implementing file-locking capabilities in the hardware or software. Disadvantages: Although these devices (tapes) are inexpensive, they involve operational, management, and maintenance overhead. Archives implemented using tape devices and optical disks involve many hidden costs. The traditional archival process using optical disks and tapes is not optimized to recognize the content, so the same content could be archived several times. In addition, government agencies and industry regulators are establishing new laws and regulations that affect all business activities.

REPLICATION Replication is the process of creating an exact copy of data. Creating one or more replicas of the production data is one of the ways to provide business continuity (BC). In data replication, the same data is stored on multiple storage devices. Benefits of Data Replication Data replication can be a cost-demanding process in terms of computing power and storage requirements, but it provides an immense set of benefits that overshadow the cost aspect. Some of the benefits of data replication are as follows: High Data Availability: Data replication mechanisms ensure high availability and accessibility of the data by allowing users or applications to access the data from numerous nodes or sites even during an unforeseen failure or technical glitch. Storing data across multiple locations enhances the reliability of systems. Enhanced Data Retrieval: With data replication in place, users can access data from a diverse set of regions/locations. With data available across different storage locations, data replication reduces latency and allows users to access data from a nearby replica. Enhanced Server Performance: Data replication helps reduce the load on the primary server by distributing data across numerous storage regions/locations, thereby boosting network performance. Fault Tolerance & Disaster Recovery: With the rapid growth in the number of cyberattacks, data breaches, and similar incidents, most organizations face the risk of unexpected losses; maintaining replicas provides fault tolerance and a way to recover from such events.

Uses of Data Replication Common uses of data replication include disaster recovery, planned outages, and unplanned outages.

COMPUTE-BASED, STORAGE-BASED, AND NETWORK-BASED REPLICATION STORAGE-BASED REPLICATION (REPLICATION STORAGE) Storage-based replication is an approach to replicating data available over a network to numerous distinct storage locations/regions. It enhances the availability, accessibility, and retrieval speed of data by allowing users to access data in real time from various storage locations when unexpected failures occur at the source storage location. Storage system-based replication supports both local and remote replication. Storage System-Based Local Replication Techniques In storage system-based local replication, the replication is performed within the storage system. In other words, the source and the target LUNs reside on the same storage system. The techniques are full volume replication (cloning) and pointer-based virtual replication (snapshot).

Storage System-Based Remote Replication Techniques In storage system-based remote replication, the replication is performed between storage systems. Typically, one of the storage systems is at the source site and the other is at a remote site for DR purposes. Data can be transmitted from the source storage system to the target system over a shared or a dedicated network. The techniques are synchronous replication, asynchronous replication, and multi-site replication.

Full volume replication Full volume replication creates full copies of LUNs within a storage system. When the replication session is started, an initial synchronization is performed between the source LUN and the replica (clone). Synchronization is the process of copying data from the source LUN to the clone. During the synchronization process, the replica is not available for any server access. Once the synchronization is completed, the replica is exactly the same as the source LUN. The replica can then be detached from the source LUN and made available to another server for business operations. Typically, after detachment, changes made to both the source and the replica are tracked at predefined points. This enables incremental resynchronization (source to target) and incremental restore (target to source), as sketched below. The clone must be the same size as the source LUN.
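
The following sketch illustrates the changed-block tracking idea behind incremental resynchronization. It is an assumed, simplified model (tracking writes to the source only) rather than how any particular storage system implements cloning.

    class ClonePair:
        def __init__(self, source_blocks):
            self.source = list(source_blocks)
            self.clone = list(source_blocks)     # initial full synchronization
            self.changed = set()                 # block numbers dirtied after detach

        def write_source(self, block_no, data):
            self.source[block_no] = data
            self.changed.add(block_no)           # track the change for later resync

        def incremental_resync(self):
            for block_no in self.changed:        # copy only the changed blocks
                self.clone[block_no] = self.source[block_no]
            self.changed.clear()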

POINTER-BASED VIRTUAL REPLICATION Pointer-based virtual replication (also referred to as a storage system-based snapshot) is a space-optimal solution compared with a full volume replica. At the time of replication session activation, the target (snapshot) contains pointers to the location of the data on the source. The snapshot does not contain data at any time; therefore, the snapshot is known as a virtual replica. The snapshot is immediately accessible after the replication session activation. This replication method uses either a Copy on First Write (CoFW) or a Redirect on Write (RoW) mechanism. Data on the target is a combined view of unchanged data on the source and data in the save location.

Disadvantages The unavailability of the source device invalidates the data on the target, because the target contains only pointers to the data. Calculation of Target Storage Size The physical capacity required for the target is a fraction of the source device. The capacity required for the save location depends on the amount of expected data change. Copy on First Access (CoFA) At the time of activation, a protection bitmap is created for all data on the source devices. Pointers are initialized to map the (currently) empty data blocks on the target to the corresponding original data blocks on the source. The granularity can range from 512-byte blocks to 64 KB blocks or higher. Data is then copied from the source to the target, based on the mode of activation.

CoFA (Copy on First Access) - Write to Source In CoFA, after the replication session is initiated, data is copied from the source to the target when the following occurs: a write operation is issued to a specific address on the source for the first time (see Figure 13-7), or a read or write operation is issued to a specific address on the target for the first time (see Figure 13-8 and Figure 13-9). When a write is issued to the source for the first time after session activation, the original data at that address is copied to the target. After this operation, the new data is updated on the source. This ensures that the original data at the point in time of activation is preserved on the target. This is illustrated in Figure 13-7.
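
The copy-on-first-write behaviour described above can be sketched as follows. This is an assumed, simplified model: blocks are list elements, the save location is a dictionary, and only the "write to source" path is shown.

    class Snapshot:
        def __init__(self, source):
            self.source = source                 # list of data blocks
            self.saved = {}                      # block_no -> original block (save location)

        def write_source(self, block_no, data):
            if block_no not in self.saved:       # first write since activation?
                self.saved[block_no] = self.source[block_no]   # copy original first
            self.source[block_no] = data         # then apply the new write

        def read_snapshot(self, block_no):
            # Point-in-time view: saved copy if the block changed, else the source block.
            return self.saved.get(block_no, self.source[block_no])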

Redirect on Write (RoW) In Redirect on Write replication, at the time of session activation, the target contains pointers to the location of data on the source. The target does not contain data at any time; hence, the target is known as a virtual replica. A protection bitmap is created for all data on the source device, and the target is immediately accessible.

Storage-Based Remote Replication Techniques Synchronous Replication Writes are committed to the source and the remote target before a "write complete" acknowledgment is sent to the production/primary server. No additional writes happen until every earlier write has been successfully completed and acknowledged. This is required to ensure that data is always the same at the source and the target. To ensure that remote data is consistent, writes happen in the same sequence as they were received.
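
As a minimal, assumed illustration of the synchronous write path, the sketch below acknowledges a write only after both the source and the remote target have committed it; in a real system the remote commit would travel over a replication link rather than a local assignment.

    def synchronous_write(source, target, block_no, data):
        source[block_no] = data        # commit to the local (source) system
        target[block_no] = data        # commit to the remote target over the link
        return "write complete"        # acknowledge only after both commits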

Asynchronous Remote Replication It is important for an organization to replicate data across geographical locations in order to mitigate the risk involved during a disaster. If data is replicated synchronously between nearby sites and a disaster strikes, there is a chance that both sites may be impacted, leading to data loss and service outage. In asynchronous replication, the server writes are collected into a buffer (delta set) at the source. This delta set is transferred to the remote site at regular intervals. Therefore, adequate buffer capacity should be provisioned to perform asynchronous replication.
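
The delta-set idea can be sketched as follows. This is an assumed, simplified model in which writes are acknowledged immediately, buffered, and applied to the remote target on a periodic transfer cycle; real implementations also preserve write ordering and consistency across the delta set.

    class AsyncReplicator:
        def __init__(self, source, target):
            self.source = source
            self.target = target
            self.delta_set = {}                    # buffered writes since the last cycle

        def write(self, block_no, data):
            self.source[block_no] = data
            self.delta_set[block_no] = data        # buffer for the next transfer
            return "write complete"                # acknowledged before replication

        def transfer_cycle(self):
            for block_no, data in self.delta_set.items():
                self.target[block_no] = data       # apply the delta set remotely
            self.delta_set.clear()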

Multi-site Replication Source data is replicated to multiple remote sites. Data at the source is replicated to two different locations on two different storage systems. Replication from the source to the nearby target (Site 1) is synchronous, and replication from the source to the remote target (Site 2) is asynchronous. In the normal scenario, all three sites are available and operations continue from the primary site.

In a two-site synchronous replication, the source and target sites are usually within a short distance. Therefore, if a regional disaster occurs, both the source and the target sites might become unavailable. A regional disaster will not affect the target site in a two-site asynchronous replication because the sites are typically several hundred or several thousand kilometers apart. If the source site fails, production can be shifted to the target site, but there is no further remote protection of data until the failure is resolved.

Host-based (COMPUTE-BASED) data replication Host-based data replication uses the servers to copy data from one site to another. Host-based replication software usually includes options such as compression, encryption, and error checking and correction.

Advantages of Host-Based Replication Flexible: it can leverage existing IP networks. Can be customized to your business' needs: you can choose what data to replicate, create a schedule for sending data, and use any combination of storage devices on each end. Disadvantages of Host-Based Replication Difficult to manage with a large group of servers if there is no centralized management console. Consumes host resources during replication. Both storage devices on each end need to be active, which means you will need to purchase dedicated hardware and an OS. Not all applications can support this type of data replication. Can be affected by viruses or application failure.

NETWORK-BASED DATA REPLICATION Network-based replication is a data replication technique that operates at the network layer. It involves replicating data between source and target systems over a network infrastructure. Network-based replication is not tightly coupled to storage arrays or hosts but focuses on replicating data at the network level. (Tightly coupled systems are characterized by high interdependence among the components, which makes it difficult to replace or modify one component without affecting the rest.) In network-based replication, data is captured and replicated at the application or file system level: the replication layer intercepts the input/output (I/O) operations at the network layer, captures the changes made to data, and replicates them to the target system.

Network-based replication can be synchronous or asynchronous. With synchronous replication, data is written to the primary storage and the replica simultaneously. This method provides a higher level of data integrity but may introduce some latency due to the delay in acknowledging the write operation until the data is replicated. In contrast, with asynchronous replication, data is first written to primary storage and then to the replica. Asynchronous replication is suitable when minimal data loss is acceptable and the focus is on optimizing performance and network utilization.

Advantages of network-based data replication Effective in large, heterogeneous storage and server environments Supports any host platform and works with any array Works separately from the servers and the storage devices Allows replication between multi-vendor products Disadvantages of network-based data replication Higher initial set-up cost because it requires proprietary hardware, as well as ongoing operational and management costs Requires implementation of a storage area network (SAN)

DATA MIGRATION Data migration is the process of selecting, preparing, extracting, and transforming data and permanently transferring it from one computer storage system to another. Data migration is a common IT activity. Purposes of data migration include replacing or upgrading servers or storage equipment; moving data from on-premises infrastructure to cloud-based services; performing infrastructure maintenance; installing software upgrades; and moving data during a company merger or data center relocation.

Storage migration transfers data from one storage device to another. This involves moving blocks of storage and files from storage systems, whether they're on disk, tape or in the cloud. Database migration moves database files to a new device. This is done when an organization changes database vendors, upgrades the database software or moves a database to the cloud. Databases must be backed up before migrating. Application migration moves an application or program from one environment to another. Application migration typically occurs when an organization switches to another vendor, application or platform. This process is complex because applications interact with other applications, and each one has its own data model. Successful application migration may require using middleware products to bridge technology gaps.

Cloud migration moves data or applications from an on-premises location to the cloud or from one cloud service to another. Cloud migration is a common form of data migration. Business process migration moves business applications -- including customer, product and operational data -- and processes to a new environment or new business model.

Data migration challenges Source data: not preparing the source data being moved might lead to data duplicates, gaps, or errors when it is brought into the new system or application. Wrong data formats: data must be in a format that works with the new system; files might not have access controls on a new system if they are not properly formatted before migration. Mapping data: when stored in a new database, data should be mapped in a sensible way to minimize confusion. Sustainable governance: having a data governance plan in place can help organizations track and report on data quality, which helps them understand the integrity of their data. Security: controlling who can access, edit, or remove data is a must for security.

Data migration strategies There are two main strategies organizations use: big bang and trickle migrations. Big Bang Approach Under this approach, you migrate all your data at one time from the source system to the target system. One of the benefits of big bang data migration is fast implementation. However, it requires you to stop your operations and dedicate time to the data migration process: all systems involved in the migration are down until the process ends. This means that these systems will be unavailable to your end users and employees. As a solution, you can schedule the big bang migration during public holidays, weekends, or times when your customers do not expect you to offer services. The approach is frequently applied by small businesses that do not collect vast amounts of data.

Trickle migration Trickle migration is the process of transferring data that involves breaking the migration down into several phases. Each phase is performed according to its own plan, scope, and timeline. Breaking the migration into smaller sub-processes enables a business to keep functioning without pausing operations and systems. The data is also subdivided into parts, and these data pieces are migrated one by one. Compared with the big bang approach, the drawback of the trickle method is that it takes more time, as the migration is done gradually. In conclusion: trickle migration completes a data migration in phases. During the migration, both the old and new systems run at the same time, so there is no downtime, which means there is less risk of losing data. However, trickle migrations are more complicated and need more planning and time to implement properly.

The planning stage can be divided into four steps. Step 1: Refine the scope, to filter out any excess data and define the smallest amount of information required to run the system effectively. Step 2: Assess the source and target systems, evaluating the current system's operational requirements and how they can be adapted to the new environment. Step 3: Set data standards, to spot problem areas across each phase of the migration process and avoid unexpected issues at the post-migration stage. Step 4: Estimate the budget and set realistic timelines.

2. Data auditing and profiling: employ digital tools. This stage examines and cleanses the full scope of data to be migrated. It aims at detecting possible conflicts, identifying data quality issues, and eliminating duplications prior to the migration. 3. Data backup: protect your content before moving it. Technically, this stage is not compulsory, but data migration best practices suggest creating a full backup of the content you plan to move before executing the actual migration. 4. Migration design: hire an ETL specialist. The migration design specifies migration testing rules, clarifies acceptance criteria, and assigns roles and responsibilities across the migration team members.

5. Execution This includes the data extraction, transformation, and loading (ETL) process. In the big bang scenario, it lasts no more than a couple of days. If data is transferred in trickles, execution takes much longer but, as mentioned before, with zero downtime and the lowest possible risk of critical failures. 6. Data migration testing: check data quality across phases. 7. Post-migration audit: validate results with key clients. Examples of data migration tools: Microsoft SQL, AWS Data Migration Service, Varonis DatAdvantage, and Varonis Data Transport Engine.
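
As a minimal illustration of the extract-transform-load work done during execution, the sketch below moves rows from a legacy table into a new schema using SQLite. The database files, table names, and column names are entirely hypothetical; real migrations rely on dedicated tools such as those listed above.

    import sqlite3

    def migrate(source_db: str, target_db: str) -> None:
        src = sqlite3.connect(source_db)
        tgt = sqlite3.connect(target_db)
        tgt.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER, full_name TEXT)")
        # Extract from the legacy schema, transform to the new one, then load.
        rows = src.execute("SELECT id, first_name || ' ' || last_name FROM legacy_customers")
        tgt.executemany("INSERT INTO customers (id, full_name) VALUES (?, ?)", rows)
        tgt.commit()
        src.close()
        tgt.close()

    # migrate("legacy.db", "new_system.db")   # hypothetical database files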

DISASTER RECOVERY AS A SERVICE (DRAAS) Disaster recovery as a service (DRaaS) is a third-party cloud computing service model that allows an organization to back up its data and IT infrastructure in a third-party cloud computing environment. Disaster recovery planning is critical to business continuity. Many disasters that have the potential to wreak havoc on an IT organization have become more frequent in recent years: natural disasters such as hurricanes, floods, wildfires, and earthquakes; equipment failures and power outages; and cyberattacks.

Managed DRaaS: In a managed DRaaS model, a third party takes over all responsibility for disaster recovery. Choosing this option requires an organization to stay in close contact with its DRaaS provider to ensure that the provider stays up to date on all infrastructure, application, and services changes. Assisted DRaaS: If you prefer to maintain responsibility for some aspects of your disaster recovery plan, or if you have unique or customized applications that might be challenging for a third party to take over, assisted DRaaS might be a better option. Self-service DRaaS: The least expensive option is self-service DRaaS, where the customer is responsible for the planning, testing, and management of disaster recovery, and the customer hosts its own infrastructure backup on virtual machines in a remote location. Careful planning and testing are required to make sure that processing can fail over to the virtual servers instantly in the event of a disaster. This option is best for organizations that have experienced disaster recovery experts on staff.