DATA SERIALIZATION IN BIG DATA ANALYSIS. S. Subhalakshmi, II M.Sc. (CS), Nadar Saraswathi College of Arts and Science, Theni.
CONTENT: Serialization; Uses of serialization; Drawbacks; Serialization formats; Programming language support.
SERIALIZATION: Serialization is the process of translating data structures or object state into a format that can be stored or transmitted and reconstructed later. Serializing an object is also called marshalling the object. The opposite operation, extracting a data structure from a series of bytes, is deserialization, also called unmarshalling.
The resulting series of bytes can be used in several ways: i) send it to another process; ii) send it to the clipboard, to be browsed or used by another application; iii) send it to another machine; iv) save it to a file on disk.
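A minimal Python sketch of the round trip described above, assuming JSON as the byte format (any serialization format would do): a dictionary is marshalled into a sequence of bytes, saved to a file on disk, read back and unmarshalled. The record contents and the file name are hypothetical.

```python
import json
import os
import tempfile

# A hypothetical in-memory data structure to be serialized.
record = {"id": 42, "name": "sensor-7", "readings": [20.5, 21.0, 19.75]}

# Serialization (marshalling): data structure -> sequence of bytes.
payload = json.dumps(record).encode("utf-8")

# One possible use of the bytes: store them in a file on disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "record.json")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(payload)

    # Deserialization (unmarshalling): bytes -> equivalent data structure.
    with open(path, "rb") as f:
        restored = json.loads(f.read().decode("utf-8"))

print(type(payload).__name__, restored == record)  # bytes True
```

The same `payload` bytes could just as well be written to a socket or a pipe to reach another process or machine; only the destination changes, not the serialization step.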
USES OF SERIALIZATION: A method of transferring data over the wire (messaging). A method of storing data (in databases, on hard disk drives). A method of remote procedure calls, e.g., as in SOAP.
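A method for detecting changes in time-varying data. The serialization process includes a step called unswizzling or pointer unswizzling, in which direct pointer references are converted into references based on name or position. The deserialization process includes an inverse step called pointer swizzling, which turns those references back into pointers.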
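To make pointer unswizzling concrete, here is a small Python sketch under illustrative assumptions: objects are instances of a hypothetical `Node` class that reference each other, and each reference is replaced by the target's position in a list of nodes (one possible position-based scheme). Deserialization swizzles the positions back into real object references, so shared references and cycles survive the round trip.

```python
import json


class Node:
    """A hypothetical graph node holding references to other nodes."""

    def __init__(self, label):
        self.label = label
        self.links = []  # direct object references ("pointers")


def serialize(nodes):
    # Unswizzling: replace each object reference with its position in `nodes`.
    # Assumes every node reachable via `links` is included in `nodes`.
    index = {id(node): i for i, node in enumerate(nodes)}
    flat = [{"label": n.label, "links": [index[id(t)] for t in n.links]}
            for n in nodes]
    return json.dumps(flat)


def deserialize(text):
    flat = json.loads(text)
    # First recreate every object, then swizzle positions back into references.
    nodes = [Node(entry["label"]) for entry in flat]
    for node, entry in zip(nodes, flat):
        node.links = [nodes[i] for i in entry["links"]]
    return nodes


# A small cyclic structure: a -> b, b -> a, and both point to the same c.
a, b, c = Node("a"), Node("b"), Node("c")
a.links = [b, c]
b.links = [a, c]

text = serialize([a, b, c])
a2, b2, c2 = deserialize(text)
print(text)
print(a2.links[0] is b2, b2.links[0] is a2, a2.links[1] is b2.links[1])  # True True True
```

Writing raw memory addresses instead would be meaningless in another process, which is why the references are converted to positions before the bytes leave the program.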
DRAWBACKS: Serialization breaks the opacity of an abstract data type by potentially exposing private implementation details. Trivial implementations that serialize all data members may violate encapsulation. Format choice also matters for longevity: many institutions, such as archives and libraries, attempt to future-proof their backup archives, in particular database dumps, by storing them in a relatively human-readable serialized format.
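The encapsulation problem can be seen with Python's pickle module: by default it serializes an object's whole instance dictionary, private fields included. A class can restrict what is exposed by defining the `__getstate__` / `__setstate__` hooks. The `NaiveAccount` and `Account` classes and their `_pin` field are hypothetical examples.

```python
import pickle


class NaiveAccount:
    def __init__(self, owner, pin):
        self.owner = owner
        self._pin = pin  # private implementation detail


class Account:
    def __init__(self, owner, pin):
        self.owner = owner
        self._pin = pin  # private implementation detail

    def __getstate__(self):
        # Serialize only the public state; the PIN never reaches the bytes.
        return {"owner": self.owner}

    def __setstate__(self, state):
        # Restore public state; the private field gets a safe default.
        self.owner = state["owner"]
        self._pin = None


naive = pickle.loads(pickle.dumps(NaiveAccount("alice", "1234")))
safe = pickle.loads(pickle.dumps(Account("alice", "1234")))
print(naive._pin)  # 1234: the private field was written into the bytes
print(safe._pin)   # None: the private field was never serialized
```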
SERIALIZATION FORMATS: The Xerox Network Systems Courier technology of the early 1980s influenced the first widely adopted standard. Sun Microsystems published the External Data Representation (XDR) in 1987. In the late 1990s, XML was used to produce a human-readable, text-based encoding.
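As an illustration of how a binary standard such as XDR lays out data, the following hand-written Python sketch (not a library API; the helper names `xdr_int` and `xdr_string` are hypothetical) encodes an integer and a string following the XDR rules: an integer becomes 4 big-endian bytes, and a string becomes a 4-byte length followed by its bytes, zero-padded to a multiple of four. The values are arbitrary.

```python
import struct


def xdr_int(value: int) -> bytes:
    # XDR integer: 32-bit two's complement, big-endian (most significant byte first).
    return struct.pack(">i", value)


def xdr_string(text: str) -> bytes:
    data = text.encode("ascii")
    # Length prefix (unsigned 32-bit, big-endian), then the bytes,
    # then zero padding so the encoded length is a multiple of 4.
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


encoded = xdr_int(1987) + xdr_string("XDR")
print(encoded.hex())  # 000007c30000000358445200
```

Fixing the byte order and alignment in the standard is what lets machines with different native representations exchange the same bytes.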
Binary XML had been proposed as a compromise that was not readable by plain-text editors but was more compact than regular XML. In the 2000s, XML was often used for asynchronous transfer of structured data between client and server in Ajax web applications. JSON is a lighter plain-text alternative to XML that is also commonly used for client-server communication in web applications.
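To compare the two text formats, this Python sketch serializes the same hypothetical record with the standard library's `xml.etree.ElementTree` and `json` modules; the element names are chosen only for illustration.

```python
import json
import xml.etree.ElementTree as ET

record = {"user": "alice", "score": 17}

# XML: each field becomes a child element under a root element.
root = ET.Element("record")
for key, value in record.items():
    ET.SubElement(root, key).text = str(value)
xml_text = ET.tostring(root, encoding="unicode")

# JSON: the dictionary maps directly onto a JSON object.
json_text = json.dumps(record)

print(xml_text)   # <record><user>alice</user><score>17</score></record>
print(json_text)  # {"user": "alice", "score": 17}

# Reading the XML back yields strings only: the number's type is not encoded.
parsed = {child.tag: child.text for child in ET.fromstring(xml_text)}
print(parsed)     # {'user': 'alice', 'score': '17'}
```

Besides being shorter, the JSON version keeps `17` as a number, whereas the XML reader needs extra schema knowledge to convert it back.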
YAML is similar to JSON and includes features that make it more powerful for serialization, more "human friendly" and potentially more compact. For large-volume scientific datasets, such as satellite data and the output of numerical climate, weather or ocean models, specific binary serialization standards have been developed, e.g. HDF, netCDF and the older GRIB.
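HDF, netCDF and GRIB are substantial libraries in their own right, so the Python sketch below only illustrates the underlying motivation: a fixed-width binary encoding of numeric arrays is compact and decodes to exactly the same values, compared with writing every number out as text. It uses the standard `array` module, not any of those formats, and the temperature values are made up.

```python
import array
import json

# Hypothetical gridded temperatures (kelvin) from a model run.
values = [273.15 + i / 7 for i in range(10_000)]

# Text encoding: every number is written out as decimal digits.
as_text = json.dumps(values).encode("ascii")

# Binary encoding: each value is an 8-byte IEEE 754 double.
as_binary = array.array("d", values).tobytes()

# Decoding the binary form restores the exact same floats.
decoded = array.array("d")
decoded.frombytes(as_binary)

print(len(as_text), len(as_binary))
print(decoded.tolist() == values)  # True
```

Real scientific formats add metadata, compression and chunked access on top of this basic idea.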
PROGRAMMING LANGUAGE SUPPORT: Several object-oriented programming languages directly support object serialization. The languages which do so include Ruby, Smalltalk, Python, PHP, Objective-C, Delphi, Java, and the .NET family of languages. There are also libraries available that add serialization support to languages that lack native support for it.
CFML: CFML allows data structures to be serialized to WDDX. OCAML: OCaml's standard library provides marshalling through the Marshal module.
PERL: Several Perl modules available from CPAN provide serialization mechanisms, including Storable, JSON::XS and FreezeThaw. DELPHI: Delphi provides a built-in mechanism for serialization of components which is fully integrated with its IDE.
C and C++: C and C++ do not provide serialization as any sort of high-level construct, but both languages support writing any of the built-in data types, as well as plain old data structs, as binary data.
SWIFT: The Swift standard library provides two protocols, Encodable and Decodable (combined as Codable). JAVASCRIPT: JavaScript includes the built-in JSON object with its methods JSON.stringify() and JSON.parse().
JAVA: Java provides automatic serialization, which requires that the object be marked by implementing the java.io.Serializable interface. .NET FRAMEWORK: .NET Framework has several serializers designed by Microsoft.
PYTHON: The core general serialization mechanism is the pickle standard library module. PHP: PHP originally implemented serialization through the built-in serialize() and unserialize() functions.
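A minimal pickle round trip, for illustration: `pickle.dumps` and `pickle.loads` turn a Python object into bytes and back, including types that plain JSON cannot represent directly, such as sets, tuples and datetimes. The job description below is hypothetical. Because unpickling can execute arbitrary code, pickle data should only be loaded from trusted sources.

```python
import pickle
from datetime import datetime

# A hypothetical object mixing types that plain JSON cannot express.
job = {
    "name": "nightly-etl",
    "tags": {"batch", "hdfs"},                # set
    "window": (0, 6),                         # tuple
    "created": datetime(2024, 1, 31, 2, 0),   # datetime
}

blob = pickle.dumps(job, protocol=pickle.HIGHEST_PROTOCOL)  # object -> bytes
restored = pickle.loads(blob)                               # bytes -> object

print(restored == job, type(restored["tags"]).__name__,
      type(restored["window"]).__name__)  # True set tuple
# Only unpickle data from trusted sources: loading can run arbitrary code.
```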
R: R has the function dput, which writes an ASCII text representation of an R object to a file or connection. REBOL: REBOL will serialize to a file (save/all) or to a string! (mold/all).
RUBY: Ruby includes the standard module Marshal. SMALLTALK: In general, non-recursive and non-sharing objects can be stored and retrieved in a human-readable form using the storeOn:/readFrom: protocol.
LISP: Generally, a Lisp data structure can be serialized with the function "print" and read back with the function "read". HASKELL: In Haskell, serialization is supported for types that are members of the Read and Show type classes.
WINDOWS POWERSHELL: Windows PowerShell implements serialization through the built-in cmdlet Export-CliXML. JULIA: Julia implements serialization through the serialize() and deserialize() functions of its Serialization standard library module.