Azure Event Hubs



Intro

Azure Event Hubs is a big data streaming platform and event ingestion service. It can receive and process millions of events per second. 


Documentation

 


Tips and Tidbits

  • Azure Event Hubs is a big data streaming platform and event ingestion service that can receive and process millions of events per second.

    • It can receive events but not send responses back; that is, it is not bi-directional.

    • IoT Hub, by contrast, can both receive events and let a back-end service send commands to your IoT devices.

  • It facilitates the capture, retention, and replay of telemetry and event stream data.

  • The data can come from many concurrent sources.

  • Event Hubs allows telemetry and event data to be made available to various stream-processing infrastructures and analytics services.

  • It's available either as data streams or bundled event batches.

  • This service provides a single solution that enables rapid data retrieval for real-time processing, and repeated replay of stored raw data.

  • It can capture the streaming data into a file for processing and analysis.

  • You can publish an event via AMQP 1.0, the Kafka protocol, or HTTPS. 

  • It has the following characteristics:

    • Low latency

    • Can receive and process millions of events per second

    • At-least-once delivery of events

  • Tutorial: Stream big data into a data warehouse
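
At-least-once delivery means a consumer may see the same event more than once, so downstream processing should be idempotent. A minimal sketch of a deduplicating consumer (the "id" field on each event is a hypothetical example, not an Event Hubs requirement):

```python
# At-least-once delivery means the same event can arrive more than once.
# A consumer can deduplicate by tracking the IDs it has already processed.
class IdempotentConsumer:
    def __init__(self):
        self.seen_ids = set()
        self.processed = []

    def handle(self, event: dict) -> bool:
        """Process the event once; return False for a duplicate delivery."""
        if event["id"] in self.seen_ids:
            return False            # duplicate delivery, skip
        self.seen_ids.add(event["id"])
        self.processed.append(event["payload"])
        return True

consumer = IdempotentConsumer()
# The broker redelivers event 1 (at-least-once semantics):
for event in [{"id": 1, "payload": "a"}, {"id": 2, "payload": "b"},
              {"id": 1, "payload": "a"}]:
    consumer.handle(event)

print(consumer.processed)  # ['a', 'b']
```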


  • In most cases, the most effective method to stream monitoring data to external tools is using Azure Event Hubs

  • Before you configure streaming for any data source, you need to create an Event Hubs namespace and event hub.

  • This namespace and event hub are the destination for all of your monitoring data.

  • An Event Hubs namespace is a logical grouping of event hubs that share the same access policy.

 

  • What is the maximum retention period for events?

  • Event Hubs Standard tier currently supports a maximum retention period of seven days.

  • Event hubs aren't intended as a permanent data store.

    • Retention periods greater than 24 hours are intended for scenarios in which it's convenient to replay an event stream into the same systems.

      • For example, to train or verify a new machine learning model on existing data.

    • If you need message retention beyond seven days, enabling Event Hubs Capture on your event hub pulls the data from your event hub into the Storage account or Azure Data Lake Service account of your choosing.

 


Discover Azure Event Hubs

  • Azure Event Hubs represents the "front door" for an event pipeline, often called an event ingestor in solution architectures.

  • An event ingestor is a component or service that sits between event publishers and event consumers to decouple the production of an event stream from the consumption of those events.

  • Event Hubs provides a unified streaming platform with a time-retention buffer, decoupling event producers from event consumers.

  • A partition is an ordered sequence of events that is held in an Event Hub.

    • Partitions are a means of data organization associated with the parallelism required by event consumers.

  • A consumer group is a view of an entire Event Hub.

    • Consumer groups enable multiple consuming applications to each have a separate view of the event stream, and to read the stream independently at their own pace and from their own position.

    • There can be at most 5 concurrent readers on a partition per consumer group;

      • however it is recommended that there is only one active consumer for a given partition and consumer group pairing.
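
The key point above is that each consumer group tracks its own position per partition. A toy in-memory sketch of that model (not the Event Hubs client API) shows two groups reading the same partition independently:

```python
# Each consumer group keeps its own read position (offset) per partition,
# so two groups can read the same event stream at different paces.
class Partition:
    def __init__(self):
        self.events = []            # ordered, append-only sequence

    def append(self, event):
        self.events.append(event)

class ConsumerGroup:
    def __init__(self, partition):
        self.partition = partition
        self.offset = 0             # this group's own position

    def read(self, count):
        batch = self.partition.events[self.offset:self.offset + count]
        self.offset += len(batch)
        return batch

p = Partition()
for e in ["e0", "e1", "e2", "e3"]:
    p.append(e)

dashboard = ConsumerGroup(p)   # fast reader
archiver = ConsumerGroup(p)    # slow reader

print(dashboard.read(4))  # ['e0', 'e1', 'e2', 'e3']
print(archiver.read(2))   # ['e0', 'e1'] -- independent position
```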

 

  • Azure Event Hub – Understanding & Designing of Partitions and Throughput Unit

  • Partitions are a data organization mechanism that relates to the downstream parallelism required in consuming applications.

    • The number of partitions in an event hub directly relates to the number of concurrent readers you expect to have.

    • For more information on partitions, see Partitions.

  • You can use a partition key to map incoming event data into specific partitions for the purpose of data organization.

  • The partition key is a sender-supplied value passed into an event hub. It is processed through a static hashing function, which creates the partition assignment.

  • If you don't specify a partition key when publishing an event, a round-robin assignment is used.
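
The two assignment rules above can be sketched in a few lines. The service's actual hash function is internal; `crc32` here is an illustrative stand-in, and the partition count is an assumed example value:

```python
import itertools
import zlib

# Sketch: a sender-supplied partition key maps to a partition via a static
# hash; events published without a key are assigned round-robin.
PARTITION_COUNT = 4
_round_robin = itertools.cycle(range(PARTITION_COUNT))

def assign_partition(partition_key=None):
    if partition_key is None:
        return next(_round_robin)          # no key: round-robin
    digest = zlib.crc32(partition_key.encode("utf-8"))
    return digest % PARTITION_COUNT        # same key -> same partition

# Events with the same key always land in the same partition:
assert assign_partition("device-42") == assign_partition("device-42")
```

This is why a partition key preserves per-key ordering: everything from "device-42" lands in one partition, where events form an ordered sequence.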

 


Explore Event Hubs Capture

  • Azure Event Hubs enables you to automatically capture the streaming data in Event Hubs in an Azure Blob storage or Azure Data Lake Storage account of your choice, with the added flexibility of specifying a time or size interval.

 

  • Event Hubs is a time-retention durable buffer for telemetry ingress, similar to a distributed log.

  • The key to scaling in Event Hubs is the partitioned consumer model.

    • Each partition is an independent segment of data and is consumed independently.

    • Over time this data ages off, based on the configurable retention period.

    • As a result, a given event hub never gets "too full."

  • Event Hubs Capture enables you to specify your own Azure Blob storage account and container, or Azure Data Lake Store account, which are used to store the captured data.

  • These accounts can be in the same region as your event hub or in another region.

  • Captured data is written in Apache Avro format: a compact, fast, binary format that provides rich data structures with inline schema.

    • This format is widely used in the Hadoop ecosystem, Stream Analytics, and Azure Data Factory.

  • Event Hubs Capture enables you to set up a window to control capturing.

    • This window is a minimum size and time configuration with a "first wins policy," meaning that the first trigger encountered causes a capture operation.

    • Each partition captures independently and writes a completed block blob at the time of capture, named for the time at which the capture interval was encountered.

    • The storage naming convention is as follows:

    • {Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}

      • Example: https://mystorageaccount.blob.core.windows.net/mycontainer/mynamespace/myeventhub/0/2017/12/08/03/03/17.avro
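
The "first wins" policy above can be sketched as a window that fires on whichever threshold (size or time) is reached first. The thresholds below are illustrative values, not service defaults:

```python
import time

# "First wins" capture window: a capture fires when EITHER the size
# threshold or the time threshold is reached, whichever happens first.
class CaptureWindow:
    def __init__(self, max_bytes, max_seconds, now=time.monotonic):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.now = now                      # injectable clock for testing
        self.buffered_bytes = 0
        self.window_start = now()

    def add(self, event_bytes):
        self.buffered_bytes += event_bytes

    def should_capture(self):
        size_hit = self.buffered_bytes >= self.max_bytes
        time_hit = (self.now() - self.window_start) >= self.max_seconds
        return size_hit or time_hit        # first trigger wins

    def capture(self):
        captured = self.buffered_bytes
        self.buffered_bytes = 0            # start a new window
        self.window_start = self.now()
        return captured

w = CaptureWindow(max_bytes=1024, max_seconds=300)
w.add(600)
w.add(600)                 # size threshold reached before the timer
print(w.should_capture())  # True
```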



Comparison of services

Service     | Purpose                         | Type                          | When to use
Event Grid  | Reactive programming            | Event distribution (discrete) | React to status changes
Event Hubs  | Big data pipeline               | Event streaming (series)      | Telemetry and distributed data streaming
Service Bus | High-value enterprise messaging | Message                       | Order processing and financial transactions

 


Protocols

 

  • AMQP 1.0 in Azure Service Bus and Event Hubs protocol guide

  • The Advanced Message Queuing Protocol (AMQP) 1.0 is a standardized framing and transfer protocol for asynchronously, securely, and reliably transferring messages between two parties.

  • It is the primary protocol of Azure Service Bus Messaging and Azure Event Hubs.

  • AMQP is a framing and transfer protocol.

    • Framing means that it provides structure for binary data streams that flow in either direction of a network connection.

    • The structure provides delineation for distinct blocks of data, called frames, to be exchanged between the connected parties.

  • Publishing an event

  • You can publish an event via AMQP 1.0, the Kafka protocol, or HTTPS. 

  • The choice to use AMQP or HTTPS is specific to the usage scenario.

    • AMQP requires the establishment of a persistent bidirectional socket in addition to transport-level security (TLS).

    • AMQP has higher network costs when initializing the session; HTTPS, however, requires additional TLS overhead for every request.

    • AMQP has significantly higher performance for frequent publishers and can achieve much lower latencies when used with asynchronous publishing code.
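
Publishing over HTTPS requires a Shared Access Signature (SAS) token in the request's Authorization header. A sketch of the documented token format (the resource URI, policy name, and key below are placeholders for your own namespace and shared access policy):

```python
import base64
import hashlib
import hmac
import time
import urllib.parse

def generate_sas_token(resource_uri, key_name, key, ttl_seconds=3600):
    """Build a SAS token of the form documented for Event Hubs:
    SharedAccessSignature sr={uri}&sig={signature}&se={expiry}&skn={policy}
    """
    expiry = str(int(time.time()) + ttl_seconds)
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    # Sign "<url-encoded-uri>\n<expiry>" with the policy key (HMAC-SHA256).
    string_to_sign = (encoded_uri + "\n" + expiry).encode("utf-8")
    signature = base64.b64encode(
        hmac.new(key.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    )
    return "SharedAccessSignature sr={}&sig={}&se={}&skn={}".format(
        encoded_uri, urllib.parse.quote_plus(signature), expiry, key_name
    )

# Placeholder namespace/hub/policy values for illustration only:
token = generate_sas_token(
    "https://mynamespace.servicebus.windows.net/myeventhub",
    "mypolicy", "my-shared-access-key",
)
print(token[:30])  # SharedAccessSignature sr=https
```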

 

  • This article has some good performance experiments and recommendations: Event Hubs ingestion performance and throughput

    • Taming the fire hose: Azure Stream Analytics

    • If we send many events: always reuse connections, i.e. do not create a connection for just one event. This is valid for both AMQP and HTTP. A simple Connection Pool pattern makes this easy.

    • If we send many events & throughput is a concern:  use AMQP.

    • If we send few events and latency is a concern:  use HTTP / REST.

    • If events naturally come in batches of many events: use the batch API.

    • If events do not naturally come in batches: simply stream events. Do not try to batch them unless network I/O is constrained.

    • If a latency of 0.1 seconds is a concern:  move the call to Event Hubs away from your critical performance path.
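
The "use the batch API" advice boils down to grouping events into batches capped by a byte budget, since the service rejects batches over its maximum message size. A plain-Python sketch of that grouping (not the SDK's EventDataBatch; the 8-byte limit in the example is deliberately tiny for illustration):

```python
# Group string events into batches whose total UTF-8 size stays within a
# byte budget, flushing a batch whenever the next event would overflow it.
def batch_events(events, max_batch_bytes):
    batches, current, current_size = [], [], 0
    for event in events:
        size = len(event.encode("utf-8"))
        if size > max_batch_bytes:
            raise ValueError("single event exceeds the batch limit")
        if current and current_size + size > max_batch_bytes:
            batches.append(current)        # flush the full batch
            current, current_size = [], 0
        current.append(event)
        current_size += size
    if current:
        batches.append(current)            # flush the final partial batch
    return batches

print(batch_events(["aaaa", "bbbb", "cc"], max_batch_bytes=8))
# [['aaaa', 'bbbb'], ['cc']]
```

The real SDKs expose the same shape: try to add an event to the current batch, and when it no longer fits, send the batch and start a new one.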