The HBase full guide

HBase is part of the Hadoop platform and can be considered a NoSQL store, meaning a database that is not built on the relational SQL model. More specifically, HBase is a column-oriented store. It still holds rows, but the columns of each row are grouped into column families that are stored together, which makes it efficient to read or write only the columns you need.

The main strength of HBase is its ability to read and write data in real time. In use cases where realtime access to data is imperative, HBase is a solution to consider.

Install HBase on your Linux machine

Creating an HBase table to play around with is rather easy. The easiest way to get started is a virtual machine on which Java is installed. You can install HBase in standalone mode, meaning you do not need a Hadoop cluster to install and run it. This guide uses standalone mode, as we only want to explain how to create an HBase table on your system for learning purposes.

If Java is not yet installed on your virtual machine, install it with the commands below.

sudo apt update

sudo apt install default-jre

sudo apt install default-jdk

Once Java is installed, you can check the version with the command below. If it prints the Java version, Java is successfully installed on your system.

java -version

Now let’s proceed with installing HBase so that we can subsequently create an HBase table. Select the latest stable version of HBase and download the package onto your machine with the following command.

wget https://dlcdn.apache.org/hbase/stable/hbase-2.4.9-bin.tar.gz

Make sure you select the “bin” version and not the “src” version of the file. Once the download is complete, extract the tar.gz file with the command below. Change the version number to the one you downloaded. Always use the stable folder in the link above to find and download the latest stable version of HBase.

tar xzvf hbase-2.4.9-bin.tar.gz

Once the tar.gz file has been extracted successfully, change into the HBase directory that the extraction created.

cd hbase-2.4.9
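Because the version number appears in the download link, the archive name and the folder name, it can help to keep it in one variable. The following is a minimal sketch of the three steps above; the helper names hbase_archive, hbase_url and fetch_hbase are made up for this example, and it assumes the chosen version is actually listed in the stable folder.

```shell
# Assumption: HBASE_VERSION must match a release that is currently
# listed under https://dlcdn.apache.org/hbase/stable/.
HBASE_VERSION="2.4.9"

# Build the archive file name for a given version (hypothetical helper).
hbase_archive() {
    printf 'hbase-%s-bin.tar.gz\n' "$1"
}

# Build the download URL for a given version (hypothetical helper).
hbase_url() {
    printf 'https://dlcdn.apache.org/hbase/stable/%s\n' "$(hbase_archive "$1")"
}

# Download, extract and enter the HBase folder (hypothetical helper).
# Each step only runs if the previous one succeeded.
fetch_hbase() {
    wget "$(hbase_url "$1")" &&
    tar xzvf "$(hbase_archive "$1")" &&
    cd "hbase-$1"
}

# Usage: fetch_hbase "$HBASE_VERSION"
```

With this in place, moving to a newer stable release only requires changing HBASE_VERSION.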

Now we need to make sure the JAVA_HOME environment variable is set correctly. Without it, HBase cannot start. First check which Java version is on your machine with the java -version command used above. Once you know the version, set JAVA_HOME to the matching directory with the command below (the path shown is for Java 11 on an Ubuntu machine).

export JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64
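If you are unsure which directory to use, the path can usually be derived from the java binary itself. This is a sketch that assumes a Linux machine with GNU readlink and a Java layout where the binary lives in a bin folder directly under the Java home directory; the helper name java_home_from_bin is made up for this example.

```shell
# Strip the trailing /bin/java from a resolved java path (hypothetical helper).
java_home_from_bin() {
    dirname "$(dirname "$1")"
}

# Resolve the real java binary behind the symlinks, if java is on the PATH,
# and export the directory two levels above it as JAVA_HOME.
if command -v java >/dev/null 2>&1; then
    JAVA_BIN="$(readlink -f "$(command -v java)")"
    JAVA_HOME="$(java_home_from_bin "$JAVA_BIN")"
    export JAVA_HOME
    echo "JAVA_HOME=$JAVA_HOME"
fi
```

Check the printed path before relying on it, since some Java packages use a different folder layout.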

The export command prints nothing when it succeeds, so no output means JAVA_HOME is set for the current terminal session. Adjust the path to the Java version that you installed on your machine. To keep the setting across sessions, you can also set JAVA_HOME in conf/hbase-env.sh.

HBase in standalone mode needs no separate installation step: the start script supplied with the HBase package launches it. Run the command below from the main HBase folder which you extracted in an earlier step.

bin/start-hbase.sh

If no error occurs, HBase is running. You can test this in the browser by going to the IP address of the machine where you installed HBase, followed by port 16010 (use localhost when the browser runs on the same machine). This loads the HBase Web UI, where you can view the master, the running servers and the tables that have been created in HBase.

http://localhost:16010
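On a virtual machine without a browser, the same check can be done from the command line. The sketch below assumes that jps (shipped with the JDK) and curl are available; the three helper names are made up for this example.

```shell
# Build the Web UI address for a host; 16010 is the port named above.
hbase_ui_url() {
    printf 'http://%s:16010\n' "$1"
}

# Succeed if jps lists an HMaster process (hypothetical helper).
hbase_master_running() {
    jps 2>/dev/null | grep -q 'HMaster'
}

# Succeed if the Web UI answers an HTTP request within 5 seconds
# (hypothetical helper, needs curl).
hbase_ui_reachable() {
    curl -s -o /dev/null --max-time 5 "$(hbase_ui_url "$1")"
}

# Usage:
#   hbase_master_running && echo "HMaster is running"
#   hbase_ui_reachable localhost && echo "Web UI is reachable"
```

Right after starting HBase, the Web UI can take a few seconds to come up, so a failed check is worth repeating once.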

How to create an HBase table

As this is a standalone HBase installation, only one server will be visible in the Web UI: the one where you installed HBase. In this section we are going to create our first HBase table.

In order to do this, we need to open the HBase shell on our Linux machine. This can be done with the command below.

./bin/hbase shell

Once the HBase shell is open, the prompt changes to something like hbase:001:0> (the exact form depends on the HBase version). With the create command we can start creating an HBase table. An HBase table always needs a table name and at least one column family.

A column family groups related columns. Columns in the same column family are stored together in the same files on your file system, whereas a different column family is stored in separate files. Queries that only touch one column family therefore only need to read the files of that family, which improves performance.

Note that all databases use files for storing data, whether it is a relational SQL database or HBase.

In this tutorial we will start simply with the creation of a single table and column family. The syntax is create 'table_name', 'column_family', thus:

create 'test_table', 'cf_test'
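The same step can also be scripted from the Linux command line by piping commands into the HBase shell in non-interactive mode (the -n option). The sketch below creates the table and then runs list and describe to inspect it; the helper name create_table_commands is made up for this example, and the guard assumes you are in the main HBase folder. Note that create fails if the table already exists.

```shell
# Print the shell commands that create a table and then inspect it.
# Arguments: table name, column family name (hypothetical helper).
create_table_commands() {
    printf "create '%s', '%s'\n" "$1" "$2"
    printf "list\n"
    printf "describe '%s'\n" "$1"
}

# Feed the commands to HBase only when run from the main HBase folder.
if [ -x ./bin/hbase ]; then
    create_table_commands test_table cf_test | ./bin/hbase shell -n
fi
```

The list command prints all tables, and describe prints the column families of the table together with their settings.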

When you open the HBase Web UI again, you will now see the newly created HBase table. Great! We have now demonstrated how to create an HBase table. In the next section we are going to populate the table with data which you can query afterwards.

Load data in your HBase table

In this section I will explain how you can easily populate the HBase table with data. Obviously, this is only for experimentation purposes, as loading data via the HBase shell line by line is not suitable for large-scale use. In those cases an Apache NiFi workflow would make more sense. Apache NiFi is generally used to process and distribute data.

For now, load a few values with the put command in the HBase shell. Each put takes the table name, the row key, the column (written as column_family:column_name) and the value.

put 'test_table', 'row1', 'cf_test:column1', 'value1'
put 'test_table', 'row2', 'cf_test:column2', 'value2'
put 'test_table', 'row3', 'cf_test:column3', 'value3'
put 'test_table', 'row4', 'cf_test:column4', 'value4'
put 'test_table', 'row5', 'cf_test:column1', 'value5'

As you can see, we loaded 5 rows into this table. Each row uses a new column, apart from row 5, which reuses column1. This shows that columns are not fixed per table: you define per row which column is created or reused. You can also update a row or add a new column to it with another put.
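Updating a row or adding a column to it uses the same put command as the initial load. The following is a minimal sketch, assuming the table and rows created above and the main HBase folder as working directory; the column name column5, the new values and the helper name update_row_commands are made up for illustration.

```shell
# Print puts that update an existing cell and add a new column to row1
# (hypothetical helper).
update_row_commands() {
    # Same row and column as before: reads now return the new value.
    printf "put 'test_table', 'row1', 'cf_test:column1', 'value1_updated'\n"
    # Same row, new column name: row1 now has two columns.
    printf "put 'test_table', 'row1', 'cf_test:column5', 'value6'\n"
    # Show the resulting row.
    printf "get 'test_table', 'row1'\n"
}

# Feed the commands to HBase only when run from the main HBase folder.
if [ -x ./bin/hbase ]; then
    update_row_commands | ./bin/hbase shell -n
fi
```

No change to the table definition is needed for the new column, because only the column family is fixed when the table is created.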

To verify the data you just loaded, you can use the scan command.
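For slightly larger experiments, typing each put by hand gets tedious. One option is to generate the put lines with a small script and pipe them into the HBase shell. This is only a sketch for test data, not a replacement for a proper ingestion workflow; the helper name generate_puts and the row and value naming scheme are made up for this example.

```shell
# Print one put per row number in the range first..last (hypothetical helper).
# Each row gets a single column cf_test:column1 with the value value<N>.
generate_puts() {
    i="$1"
    while [ "$i" -le "$2" ]; do
        printf "put 'test_table', 'row%s', 'cf_test:column1', 'value%s'\n" "$i" "$i"
        i=$((i + 1))
    done
}

# Usage (from the main HBase folder), adding row6 up to row20:
#   generate_puts 6 20 | ./bin/hbase shell -n
```

Running generate_puts on its own first lets you inspect the generated commands before they touch the table.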

Query data in your HBase table

Querying data in your HBase table can be done in several ways. First I will show how you can quickly scan the data you loaded in the previous section. Simply use the scan command.

scan 'test_table'

This retrieves all records you just loaded into the HBase table and shows them in the shell.

If you want to select a single row and display its column values, you can use the get command in the shell, as shown below.

get 'test_table','row5'

This shows the column1 value of row5 (value5) that you loaded in the earlier step.

In a new article I will show how you can query data from an HBase table via Apache Impala. Apache Impala allows executing analytics and queries on top of data stored in Hadoop. Apache Hive, with its HiveQL query language, is a similar component in the Hadoop platform that also allows querying data from an HBase database. When considering the Hadoop platform, data can be stored in three storage types: the HBase storage described in this article, HDFS (Hadoop Distributed File System) and Kudu.
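Both scan and get accept extra arguments to narrow down what is returned. The sketch below prints a few such commands and pipes them into the HBase shell; it assumes the table and rows from this tutorial and the main HBase folder as working directory, and the helper name query_commands is made up for this example.

```shell
# Print a few narrower queries against test_table (hypothetical helper).
query_commands() {
    # Only the cell cf_test:column1 of row5.
    printf "get 'test_table', 'row5', 'cf_test:column1'\n"
    # Only the first two rows of the table.
    printf "scan 'test_table', {LIMIT => 2}\n"
    # Only the column cf_test:column1, across all rows that have it.
    printf "scan 'test_table', {COLUMNS => ['cf_test:column1']}\n"
    # The number of rows in the table.
    printf "count 'test_table'\n"
}

# Feed the commands to HBase only when run from the main HBase folder.
if [ -x ./bin/hbase ]; then
    query_commands | ./bin/hbase shell -n
fi
```

Limiting a scan to the columns you need matters more as tables grow, since a full scan reads every row.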