Azure Data Lake

 


Intro

Azure Data Lake Storage Gen1 is an enterprise-wide, hyper-scale repository for big data analytics workloads. It lets you capture data of any size, type, and ingestion speed in a single place for operational and exploratory analytics.


Documentation

 


Tips and Tidbits

  • Data Lake Storage Gen1 can be accessed from Hadoop (available with an HDInsight cluster) using the WebHDFS-compatible REST APIs.

  • Data Lake Storage Gen1 is an Apache Hadoop file system that's compatible with Hadoop Distributed File System (HDFS), and works with the Hadoop ecosystem.

  • Your existing HDInsight applications or services that use the WebHDFS API can easily integrate with Data Lake Storage Gen1.

  • Azure Data Lake Storage Gen2 hierarchical namespace

  • The hierarchical namespace optimizes Azure Storage accounts for big data analytics scenarios without forcing you to create a separate silo of data.

  • The hierarchical namespace allows you to define ACLs and POSIX permissions on directories, subdirectories, or individual files. You can also use role-based access control (RBAC) and Azure Active Directory (Azure AD) to support resource management and data operations.

  • Blob storage supports Azure Data Lake Storage Gen2, Microsoft's enterprise big data analytics solution for the cloud. Azure Data Lake Storage Gen2 offers a hierarchical file system as well as the advantages of Blob storage, including:

    • Low-cost, tiered storage

    • High availability

    • Strong consistency

    • Disaster recovery capabilities

 
