Active File Management (AFM): Bringing data together across clusters
Unit objectives
After completing this unit, you should be able to:
- Describe home and cache features
- List the various AFM modes
- Create and manage an AFM relationship
Evolution of the global namespace: AFM
- GPFS introduced concurrent file system access from multiple nodes.
- Multi-cluster expanded the global namespace by connecting multiple sites.
- AFM takes the global namespace truly global by automatically managing asynchronous replication of data.
[Figure: timeline of the GPFS global namespace evolution, 1993 / 2005 / 2011]
Synchronous operations (cache validate/miss)
- On a cache miss, attributes are pulled and files are created on demand (lookup, open, ...).
  - If the cache is set up against an empty home, there should be no synchronous operations.
- On a later data read:
  - The whole file is fetched over NFS and written locally.
  - The data read is done in parallel across multiple nodes.
  - The application can continue once the required data is in cache, while the remaining file is still being fetched.
- On a cache hit:
  - Attributes are revalidated based on the revalidation delay.
  - If the data hasn't changed, it is read locally.
- On access in disconnected mode:
  - Access to cached data is served from the local copy.
  - Files that are not cached are returned as not existing (ENOENT).
The current state of a cache fileset can be inspected as sketched below.
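A quick way to see whether a cache fileset is connected, dirty, or disconnected is the mmafmctl getstate action. A minimal sketch, assuming a hypothetical file system fs1 and a cache fileset named cacheFileset:

    # Show the AFM state of one cache fileset (fs1 and cacheFileset are placeholder names)
    mmafmctl fs1 getstate -j cacheFileset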
Asynchronous updates (write, create, remove)
- Updates at the cache site are pushed back lazily, masking the latency of the WAN.
- Data is written to GPFS at the cache site synchronously; writeback to home is asynchronous.
- The asynchronous delay is configurable.
- Writeback coalesces updates and accommodates out-of-order and parallel writes, filtering I/O as needed (for example, rewrites to the same blocks).
- The administrator can force a sync if needed with the mmafmctl flushPending action (see the sketch below).
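To push everything queued for home immediately, for instance before a planned outage, flushPending can be invoked per fileset. A sketch, again under the hypothetical fs1/cacheFileset names:

    # Push all queued (pending) updates from the cache fileset to home now
    mmafmctl fs1 flushPending -j cacheFileset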
AFM modes
- Single Writer (SW): only the cache can write data; home cannot change. Peer caches need to be set up as read-only.
- Read Only (RO): the cache can only read data; no data changes are allowed.
- Local Update (LU): data is cached from home and changes are allowed, as in SW mode, but changes are not pushed to home. Once a file is changed, the relationship is broken, i.e., cache and home are no longer in sync for that file.
- Independent Writer (IW): data can change at home and at any cache; different caches can change different files.
Changing modes:
- An SW, IW, or RO mode cache can be changed to any other mode.
- An LU cache can't be changed.
Creating a cache fileset in a given mode, and changing it later, is sketched below.
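An AFM cache fileset binds a mode and a home target at creation time; the mode can later be changed with mmchfileset on the unlinked fileset. A minimal sketch, assuming hypothetical names (fs1, cacheFileset, homeServer, and an NFS-exported home path):

    # Create a single-writer cache fileset pointing at an NFS-exported home
    mmcrfileset fs1 cacheFileset --inode-space new \
        -p afmTarget=nfs://homeServer/gpfs/homefs/homeFileset \
        -p afmMode=single-writer

    # Link it into the namespace
    mmlinkfileset fs1 cacheFileset -J /gpfs/fs1/cacheFileset

    # Later, change the mode (SW/IW/RO only; unlink first)
    mmunlinkfileset fs1 cacheFileset
    mmchfileset fs1 cacheFileset -p afmMode=read-only
    mmlinkfileset fs1 cacheFileset -J /gpfs/fs1/cacheFileset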
Pre-fetching
- Prefetch files selectively from home to cache; runs asynchronously in the background.
- Parallel multi-node prefetch (4.1).
- Metadata-only prefetch, without fetching file data (4.1).
- A user exit can be invoked when the prefetch completes.
- Files to prefetch can be chosen based on policy (see the sketch below).
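Prefetch takes a list of files, which can be written by hand or generated by a policy scan. A sketch under the same hypothetical fs1/cacheFileset assumptions, with illustrative file names:

    # List of files to pull from home (contents are placeholders)
    cat > /tmp/prefetch.list <<EOF
    /gpfs/fs1/cacheFileset/projectA/data1
    /gpfs/fs1/cacheFileset/projectA/data2
    EOF

    # Fetch the listed files in the background
    mmafmctl fs1 prefetch -j cacheFileset --list-file /tmp/prefetch.list

    # Or populate metadata only (4.1), without moving file data
    mmafmctl fs1 prefetch -j cacheFileset --list-file /tmp/prefetch.list --metadata-only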
Cache eviction
Use when:
- The cache is smaller than home.
- Data fills up the cache faster than it can be pushed to home.
- Space is needed for caching other files or for incoming writes.
Behavior:
- Eviction is linked with fileset quotas. For an RO fileset, cache eviction is triggered automatically when fileset usage goes above the fileset soft quota limit.
- Files are chosen based on LRU; files with unsynchronized data are not evicted.
- Eviction can be disabled.
- It can be triggered manually: mmafmctl Device evict -j FilesetName (see the sketch below).
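Because automatic eviction keys off the fileset soft quota, a typical setup puts block quotas on the cache fileset and falls back to manual eviction when needed. A sketch with hypothetical limits:

    # Set a soft/hard block quota on the cache fileset (placeholder limits)
    mmsetquota fs1:cacheFileset --block 80G:100G

    # Manually evict cached file data to reclaim space
    mmafmctl fs1 evict -j cacheFileset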
Non-POSIX data over NFS
- NFS does not carry POSIX attributes such as ACLs, extended attributes (xattrs), or sparse-file information.
- These can be transferred using the AFM protocol over NFS.
- To enable this, run mmafmconfig on the exported fileset root directory at home (see the sketch below).
- At the cache site, the first access checks for the existence of this special file and logs a message if it is not accessible.
- User IDs should be kept consistent across sites, as AFM does no ID mapping.
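Enabling the AFM extended-attribute support is a one-time step on the export path at home. A sketch assuming a hypothetical home export path:

    # At the home cluster: enable AFM metadata support on the exported fileset root
    mmafmconfig enable /gpfs/homefs/homeFileset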
Expiration of data: staleness control
- Expiration is defined based on the time since disconnection.
- Once the cache has expired, no access to the cache is allowed.
- A manual expire/unexpire option is available to the administrator via mmafmctl (see the sketch below).
- Allowed only for RO mode caches.
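Manual expiry is exercised per fileset; the automatic timeout comes from the afmExpirationTimeout tunable covered at the end of the unit. A sketch with a hypothetical RO cache fileset:

    # Mark a read-only cache fileset as expired (no access until unexpired)
    mmafmctl fs1 expire -j roCacheFileset

    # Make it accessible again
    mmafmctl fs1 unexpire -j roCacheFileset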
Independent Writer
- Multiple cache filesets can write to a single home, as long as each cache writes to different files.
- The multiple cache sites revalidate periodically and pull new data from home.
- If multiple cache filesets write to the same file, the sequence of updates is non-deterministic: writes are pushed to home as they come in, independently, because there is no locking between clusters.
IW cache/home: moving applications
- An application can be moved from cache to home only if its semantics allow it to work with incomplete data.
- A special procedure needs to be followed for moving an application from cache to home and back to cache, whether the movement is planned or unplanned.
Unplanned application movement
The cache site goes down while pending changes are not yet synced:
1. Note the time at home; this time is required during failback, when the application is moved back to cache.
2. Unlink the fileset at the cache.
3. Move the application to home.
4. Once the cache is back, run the failback command to resolve differences, prefetch new data, and open the fileset up for the application (see the sketch below).
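In 4.1 the failback step is driven by mmafmctl, using the time recorded at home as the failover point. A hedged sketch; the timestamp and all names are illustrative, and the exact option syntax may differ by release:

    # After the cache site returns: reconcile against changes made at home
    # (the failover time is the moment noted at home when the application moved)
    mmafmctl fs1 failback -j cacheFileset --start --failover-time "Mar 25 10:00:00 2015"

    # When reconciliation is complete, stop failback and resume normal caching
    mmafmctl fs1 failback -j cacheFileset --stop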
New in GPFS 4.1
- GPFS backend, using GPFS multi-cluster (see the sketch below).
- Parallel I/O, using multiple threads and multiple nodes per file.
- Better handling of gateway (GW) node failures.
- Various usability improvements.
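With the GPFS backend, home is reached over a multi-cluster remote mount instead of NFS, which shows up only in the afmTarget protocol. A sketch, assuming the remote file system is already mounted at a hypothetical /gpfs/remotefs on the cache cluster:

    # Cache fileset whose home is reached via GPFS multi-cluster (note the gpfs:// target)
    mmcrfileset fs1 cacheFileset --inode-space new \
        -p afmTarget=gpfs:///gpfs/remotefs/homeFileset \
        -p afmMode=independent-writer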
Restrictions
- Hard links: hard links at home are not detected; hard links created in the cache are maintained.
- The following are NOT supported/cached/replicated:
  - Clones
  - Special files such as sockets or device files
  - Fileset metadata such as quotas, replication parameters, snapshots, etc.
- Renames at home result in a remove/create in the cache.
- Locking is restricted to the cache cluster only.
- Independent filesets only (no file-system-level AFM setup); dependent filesets can't be linked into AFM filesets.
- Peer snapshots are supported only in SW mode.
Disaster recovery: coming in GPFS 4.1.1 (aka TL2)
AFM-based DR (4.1++)
[Figure: NAS client with AFM configured as primary and AFM configured as secondary; all updates are pushed asynchronously, and the client switches to the secondary on failure]
Supported in TL2:
- Replicate data from the primary to the secondary site. The relationship is active-passive (primary RW, secondary RO).
- The primary continues to operate actively, with no interruption, when the relationship with the secondary fails.
- Automatic failback when the primary comes back.
- Granularity at the fileset level.
- RPO and RTO support, from minutes to hours (depends on the rate of data change, link bandwidth, etc.).
Not supported in TL2:
- No cascading mode, i.e., no tertiary site; only one secondary is allowed per relationship.
- POSIX-only operations; no appendOnly support.
- No file-system-level support; the present limitation of not allowing dependent filesets to be linked inside an AFM (panache) fileset continues.
- No metadata replication (dependent filesets, user snapshots, fileset quotas, user quotas, replication factor, other fileset attributes, direct I/O setting).
Psnap: consistent replication
Continuous replication with snapshot support, so that snapshots at cache and home correspond to the same point in time:
1. Take a fileset snapshot at the master and mark the point in time in the write-back queue.
2. Push all updates up to the point-in-time marker.
3. Take a snapshot of the fileset at the replica, then update the management tool's state with the last snapshot time and IDs.
[Figure: multi-site snapshot management tool (SONAS-SPARK) coordinating an AFM home-master and home-replica, with all updates pushed asynchronously]
DR configuration
- Establish the primary-secondary relationship (see the sketch below):
  - Create an AFM fileset at the primary and associate it with the DR secondary fileset.
  - This provides a DR primary ID that should be used when setting up the DR secondary.
- Initialization phase:
  - Truck the data from the primary to the secondary if necessary.
  - Initial trucking can be done via AFM or out of band (a customer-chosen method such as tape, etc.).
- Normal operation:
  - Async replication continuously pushes data to the secondary, based on asyncDelay.
  - Psnap support provides common consistency points between primary and secondary, taken periodically based on the RPO.
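In shipped 4.1.1 code this maps onto primary/secondary AFM modes; exact option names may differ by level, so treat the following as a hedged sketch with hypothetical names (fs1, fs2, drFileset, secondaryServer):

    # At the primary: create the DR primary fileset
    mmcrfileset fs1 drFileset --inode-space new \
        -p afmMode=primary \
        -p afmTarget=nfs://secondaryServer/gpfs/fs2/drFileset

    # Obtain the primary ID needed when creating the secondary
    mmafmctl fs1 getPrimaryId -j drFileset

    # At the secondary: create the matching secondary fileset with that ID
    mmcrfileset fs2 drFileset --inode-space new \
        -p afmMode=secondary \
        -p afmPrimaryID=<primary-id-from-above>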
On a DR event
Primary failure:
- Promote the secondary to DR primary.
- Restore data from the last consistency point (RPO snapshot).
Secondary failure:
- Establish a new secondary: mmafmctl --setNewSecondary (see the sketch below).
- This takes an initial snapshot and pushes data to the new secondary in the background; RPO snapshots resume after the initial sync.
Failback to the old primary:
- Restore to the last RPO snapshot (similar to what is done on the secondary during its promotion to primary).
- Find the changes made at the secondary and apply them back to the original primary, incrementally or all at once.
- Downtime is needed in the last iteration to avoid any more changes.
- Revert the primary/secondary modes.
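The setNewSecondary operation named on the slide surfaced in shipped releases as a changeSecondary action; the verb and all names below are assumptions and may differ by level. A hedged sketch:

    # Point the surviving primary at a replacement secondary site
    mmafmctl fs1 changeSecondary -j drFileset \
        --new-target nfs://newSecondaryServer/gpfs/fs3/drFileset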
Tunable parameters
- Revalidation timeouts:
  - afmFileLookupRefreshInterval = <0, MAXINT>
  - afmDirLookupRefreshInterval = <0, MAXINT>
- Async delay: set per fileset to push updates to home at a later time
  - afmAsyncDelay = <0, MAXINT>
- Expiration timeout: after disconnection, data in cache expires after this time
  - afmExpirationTimeout = <0, MAXINT>
- Disconnect timeout: pingThread timeout for a gateway node to go into disconnected mode
  - afmDisconnectTimeout = <0, MAXINT>
- Parallel read threshold:
  - afmParallelReadThreshold = <1 GB, maxfilesize>
- Operating mode (caching behavior; read-only by default):
  - afmMode = [read-only, single-writer, local-update]
Setting these tunables is sketched below.
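Cluster-wide defaults are set with mmchconfig, while per-fileset values go through mmchfileset. A hedged sketch; the numeric values and fileset names are placeholders:

    # Cluster-wide: revalidate file attributes every 60 seconds (placeholder value)
    mmchconfig afmFileLookupRefreshInterval=60

    # Per fileset: delay writeback to home by 300 seconds
    mmchfileset fs1 cacheFileset -p afmAsyncDelay=300

    # Per fileset: cached data expires 500 seconds after disconnection (RO caches)
    mmchfileset fs1 roCacheFileset -p afmExpirationTimeout=500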