Big Data Storage in Hadoop
As mainstream enterprises become interested in Hadoop, the number of storage alternatives to HDFS is increasing.
Optimized for high-volume, sequential processing of large files, HDFS lacks many of the data protection, security, access, and performance optimization features associated with mature commercial file and data storage subsystems, although many of these features are not essential for existing Hadoop analytic processing patterns.
Because Big Data analytics is a moving target, however, the Hadoop platform may have to adopt features that more mature file and storage systems already offer, such as support for snapshots, if it is to be accepted as an enterprise analytics platform.
Storage system incumbents are at an advantage, but they must prove that their data protection and performance optimization features are right for Hadoop. Providers of pooled storage must prove that such architectures can hold their own despite the additional network overhead they incur compared with the direct-attached topologies that are established practice for Hadoop.
Features and Benefits
Compares the strategies of different storage vendors.
Assesses the viability of NAS architectures.
Key Questions Answered
Is HDFS the definitive architecture for storing Hadoop data? Why or why not?
Are there other viable alternatives for storing Hadoop data aside from HDFS?
Does pooled storage (e.g., NAS) make sense for Hadoop/Big Data analytics?