The HBase Architecture consists of servers in a Master-Slave relationship as shown below. Typically, an HBase cluster has one Master node, called HMaster, and multiple Region Servers, called HRegionServers. Each Region Server contains multiple Regions (HRegions).

Just like in a Relational Database, data in HBase is stored in Tables and these Tables are stored in Regions. When a Table becomes too big, the Table is partitioned into multiple Regions. These Regions are assigned to Region Servers across the cluster. Each Region Server hosts roughly the same number of Regions.


The HMaster in HBase is responsible for:

  • Performing Administration

On the other hand, the HRegionServer performs the following work:

  • Hosting and managing Regions

Each Region Server contains a Write-Ahead Log (called HLog) and multiple Regions. Each Region in turn is made up of Stores, one per Column Family, and each Store consists of a MemStore and multiple StoreFiles (HFiles). The data lives in these StoreFiles, grouped by Column Family (explained below). The MemStore holds in-memory modifications to the Store (data) that have not yet been flushed to disk.

The mapping of Regions to Region Servers is kept in a system table called .META. (named hbase:meta in HBase 0.96 and later). When reading or writing data, clients read the required Region information from the .META. table and then communicate directly with the appropriate Region Server. Each Region is identified by its start key (inclusive) and its end key (exclusive).
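
The lookup described above can be sketched as a toy model in Python. This is not the HBase client API: the `REGIONS` list, the server names, and the `locate_region` function are hypothetical, and an empty string stands in for an open-ended start or end key.

```python
from bisect import bisect_right

# Hypothetical catalog rows: (start_key, end_key, region_server), sorted by start_key.
# An empty start key means "from the beginning"; an empty end key means "to the end".
REGIONS = [
    ("", "g", "rs1.example.com"),
    ("g", "p", "rs2.example.com"),
    ("p", "", "rs3.example.com"),
]
START_KEYS = [start for start, _, _ in REGIONS]


def locate_region(row_key):
    """Return the (start, end, server) entry whose key range contains row_key."""
    # The owning region is the last one whose start key is <= row_key.
    index = bisect_right(START_KEYS, row_key) - 1
    start, end, server = REGIONS[index]
    # The start key is inclusive and the end key is exclusive.
    assert start <= row_key and (end == "" or row_key < end)
    return start, end, server
```

In this sketch, a row key equal to a region's start key (for example "g") belongs to that region, while a row key equal to its end key belongs to the next one.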

HBase Tables and Regions

A Table is made up of any number of Regions.

A Region is specified by its startKey and endKey.

  • Empty table: (Table, NULL, NULL)

Each Region may live on a different node and is made up of several HDFS files and blocks, each of which is replicated by Hadoop. HBase uses HDFS as its reliable storage layer. HDFS handles checksums, replication, and failover.

HBase Tables:

  • Tables are sorted by Row in lexicographical order
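
Lexicographical ordering compares keys byte by byte, which can surprise anyone expecting numeric order. The short Python sketch below illustrates this with made-up row keys; it only demonstrates the ordering rule and does not touch HBase itself.

```python
# Row keys as bytes; the values are made up for illustration.
row_keys = [b"row2", b"row10", b"row1"]

# Python compares bytes objects byte by byte, which is lexicographic order,
# so b"row10" sorts before b"row2".
ordered = sorted(row_keys)

# Zero-padding the numeric part makes byte order agree with numeric order.
padded = sorted(b"row%05d" % n for n in (2, 10, 1))
```

Here `ordered` places `b"row10"` between `b"row1"` and `b"row2"`, while the zero-padded keys in `padded` come out in numeric order.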

HBase consists of:

  • Java API, Gateway for REST, Thrift, Avro

Data is stored in memory and flushed to disk at regular intervals or based on size

  • Small flushes are merged in the background to keep the number of files small

MemStores:

After data is written to the WAL, the RegionServer saves the KeyValues in the MemStore

  • Flush to disk is based on size, set by hbase.hregion.memstore.flush.size
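
The write path above can be sketched as a toy model in Python. The `ToyRegionServer` class and its attributes are hypothetical and much simpler than a real Region Server; `flush_size_bytes` stands in for hbase.hregion.memstore.flush.size, and the tiny threshold used in the usage lines is chosen only to make a flush easy to trigger.

```python
class ToyRegionServer:
    """Toy model of the write path: WAL append, MemStore update, size-based flush."""

    def __init__(self, flush_size_bytes):
        # Stands in for hbase.hregion.memstore.flush.size.
        self.flush_size_bytes = flush_size_bytes
        self.wal = []           # append-only log of every edit
        self.memstore = []      # in-memory edits not yet written to a store file
        self.memstore_bytes = 0
        self.store_files = []   # each flush produces one sorted, immutable file

    def put(self, row_key, value):
        self.wal.append((row_key, value))       # 1. record the edit in the WAL first
        self.memstore.append((row_key, value))  # 2. then keep it in memory
        self.memstore_bytes += len(row_key) + len(value)
        if self.memstore_bytes >= self.flush_size_bytes:
            self.flush()                        # 3. flush once the threshold is reached

    def flush(self):
        # Write the in-memory edits out as one sorted file and start afresh.
        self.store_files.append(sorted(self.memstore))
        self.memstore = []
        self.memstore_bytes = 0


server = ToyRegionServer(flush_size_bytes=10)
server.put(b"r2", b"aaa")  # 5 bytes buffered, below the threshold
server.put(b"r1", b"bbb")  # 10 bytes buffered, triggers a flush
```

After the second put, the MemStore is empty again and `server.store_files` holds a single file with the two edits sorted by row key.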

Compactions:

Two types: Minor and Major Compactions

Minor Compactions

  • Combine last “few” flushes

Major Compactions

  • Rewrite all storage files
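
The difference between the two compaction types can be sketched in Python as follows. This is a simplified model under stated assumptions: store files are plain sorted lists of (row key, timestamp) tuples, the `few` parameter is a hypothetical stand-in for however many recent files get selected, and the real selection policy is more involved.

```python
from heapq import merge


def compact(files):
    """Merge several sorted store files into one sorted file."""
    return list(merge(*files))


def minor_compaction(store_files, few=2):
    """Combine only the most recent `few` files (few >= 1); older files stay as they are."""
    return store_files[:-few] + [compact(store_files[-few:])]


def major_compaction(store_files):
    """Rewrite every store file into a single file."""
    return [compact(store_files)]


# Three flushes, oldest first; the contents are made up for illustration.
files = [[("a", 1), ("d", 1)], [("b", 2)], [("c", 3), ("e", 3)]]
after_minor = minor_compaction(files)
after_major = major_compaction(files)
```

With these inputs, the minor compaction leaves two files (the oldest file plus one merged file), and the major compaction leaves exactly one.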

Key Cardinality:

The best performance is gained from using row keys

  • Time range bound reads can skip store files
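
To see why a time range lets a read skip store files, consider the Python sketch below. It assumes each store file records the oldest and newest timestamp it contains; the `StoreFile` tuple, the file names, and the half-open range `[start_ts, end_ts)` are illustrative choices, not HBase's actual data structures.

```python
from collections import namedtuple

# Hypothetical per-file metadata: the oldest and newest cell timestamp in the file.
StoreFile = namedtuple("StoreFile", ["name", "min_ts", "max_ts"])


def files_to_scan(store_files, start_ts, end_ts):
    """Keep only files whose timestamp range overlaps [start_ts, end_ts)."""
    return [f for f in store_files if f.min_ts < end_ts and f.max_ts >= start_ts]


files = [
    StoreFile("f1", 100, 199),
    StoreFile("f2", 200, 299),
    StoreFile("f3", 300, 399),
]
narrow = [f.name for f in files_to_scan(files, 250, 300)]
wide = [f.name for f in files_to_scan(files, 150, 250)]
```

The narrow read only needs to open `f2`; the other two files cannot contain a matching cell and are skipped without being read.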

Fold, Store, and Shift:

All values are stored with their full coordinates, including: Row Key, Column Family, Column Qualifier, and Timestamp

  • Folds columns into “row per column”
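
The folding step can be illustrated with a small Python sketch. The nested-dictionary layout of `row` and the `fold` function are hypothetical; the point is only that each value ends up as its own entry carrying the full coordinates, sorted here with newer timestamps first within a column.

```python
def fold(row_key, row):
    """Flatten one logical row into cells that each carry the full coordinates.

    `row` maps column family -> column qualifier -> {timestamp: value}.
    """
    cells = [
        (row_key, family, qualifier, timestamp, value)
        for family, columns in row.items()
        for qualifier, versions in columns.items()
        for timestamp, value in versions.items()
    ]
    # Sort by row key, family, and qualifier ascending, then newest timestamp first.
    cells.sort(key=lambda c: (c[0], c[1], c[2], -c[3]))
    return cells


# A made-up logical row with two column families and two versions of one column.
cells = fold("row1", {
    "info": {"name": {1: "Ann", 2: "Anna"}},
    "stats": {"visits": {1: "7"}},
})
```

One logical row with three values becomes three separate cells, which is the "row per column" layout described above.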
