HBase Query Examples Using HappyBase in Python

In this article, we will walk through HBase query examples using the HBase shell and the HappyBase Python library. Hadoop and HBase run inside an HBase Docker image, which is assumed to be already up and running. Let’s see how it all works together.

Prerequisite

Table of Contents

  1. What is HBase Database? 
  2. HBase Queries Examples Through Shell 
  3. HBase Queries by Python HappyBase Example
  4. FAQ 

What is HBase Database?

HBase is a non-relational (NoSQL), column-oriented database management system built on top of Apache Hadoop. It is fault tolerant and provides real-time read and write access to data stored in HDFS. HBase also supports data compression, can hold billions of records, and fetches individual records from such large datasets with low latency.

1. HBase Queries Examples Through Shell

For this step, the 3rd point in the prerequisite should be completed. First open a bash session inside the HBase Docker container, then start the HBase shell with this command:

				
hbase shell
				
			

Create Table: 

This creates a table named ‘transaction’; the remaining arguments are its column families.

create 'transaction', 'amount', 'card_type', 'websitename', 'countryname', 'datetime', 'transactionID', 'cityname', 'productname'

Show tables list: 

list 

Insert into Table: 

put 'transaction', '2', 'card_type', 'MasterCard'
put 'transaction', '3', 'card_type', 'Visa'
put 'transaction', '4', 'card_type', 'MasterCard'
put 'transaction', '5', 'card_type', 'Maestro'
put 'transaction', '1', 'amount', '50.87'
put 'transaction', '2', 'amount', '1023.2'
put 'transaction', '3', 'amount', '3321.1'
put 'transaction', '4', 'amount', '234.11'
put 'transaction', '5', 'amount', '321.11'

Count Total Rows in Table:

count 'transaction'

Describe the Table and its Fields:

describe 'transaction'

See if Table Exists:

exists 'transaction'

Fetch Item from Table:

get 'transaction', '2'
get 'transaction', '5'

Fetch All from Table:

scan 'transaction'

Delete a Cell Value:

delete 'transaction', '4', 'card_type'

Delete a Row in Table:

deleteall 'transaction', '4'

Drop Entire Table:

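A table must be disabled before it can be dropped. The examples that follow assume the table still exists, so recreate it and insert the data again if you run these two commands.

disable 'transaction'
drop 'transaction'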

Alter Table:

This keeps up to 5 versions of each cell in the ‘card_type’ column family.

alter 'transaction', NAME => 'card_type', VERSIONS => 5
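Once a column family keeps several versions, the older values of a cell can be read back. As a rough sketch on the Python side, HappyBase's `Table.cells()` accepts a `versions` argument. The helper name `cell_history` is made up for illustration, and `table` is assumed to be an object obtained from a HappyBase connection with `connection.table(...)`.

```python
def cell_history(table, row_key, column, versions=5):
    """Return up to `versions` stored values of one cell as (text, timestamp) pairs.

    `table` is assumed to be a happybase.Table (or any object with the same
    `cells` method); `column` is a b"family:qualifier" name.
    """
    cells = table.cells(row_key, column, versions=versions, include_timestamp=True)
    return [(value.decode("utf-8"), timestamp) for value, timestamp in cells]


# Example (needs a live connection, so it is left commented out):
# table = connection.table("transaction")
# print(cell_history(table, b"2", b"card_type:"))
```

The column name in the commented example is an assumption about how a cell written without a qualifier is addressed; adjust it to match your own columns.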

Fetch First 2 Rows:

scan 'transaction', {FILTER => "PageFilter(2)"}

Fetch Data Where Values Contain the String ‘Mas’:

scan 'transaction', {COLUMNS => 'card_type', FILTER => "ValueFilter(=, 'substring:Mas')"}

Fetch Only Keys (Without Values):

scan 'transaction', {FILTER => "KeyOnlyFilter()"}

Fetch Rows Where the RowKey Starts with ‘1’:

scan 'transaction', {FILTER => "PrefixFilter('1')"}

Fetch Only Keys for Rows Whose RowKey Starts with ‘2’:

scan 'transaction', {FILTER => "PrefixFilter('2') AND KeyOnlyFilter()"}

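Find Columns Whose Qualifier Starts with ‘c’ and/or ‘p’:

ColumnPrefixFilter and MultipleColumnPrefixFilter match on the column qualifier (the part after the colon in ‘family:qualifier’), not on the column family name, so cells written without a qualifier will not match.

scan 'transaction', {FILTER => "ColumnPrefixFilter('c')"}
scan 'transaction', {FILTER => "MultipleColumnPrefixFilter('c','p')"}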

Fetch Rows Starting from RowKey 3 and Onwards:

scan 'transaction', {STARTROW => '3'}
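HBase keeps rows sorted by row key in byte order, so STARTROW and PrefixFilter compare keys as byte strings rather than as numbers. The small sketch below imitates that behaviour on a plain Python list to show which keys each query would cover. The function names are illustrative, and the keys ‘10’ and ‘25’ are hypothetical additions that do not appear in the data inserted earlier.

```python
def rows_from(row_keys, start):
    """Mimic STARTROW: keep keys that sort at or after `start`, in byte order."""
    return [key for key in sorted(row_keys) if key >= start]


def rows_with_prefix(row_keys, prefix):
    """Mimic PrefixFilter: keep keys that begin with `prefix`, in byte order."""
    return [key for key in sorted(row_keys) if key.startswith(prefix)]


# Hypothetical row keys: "10" and "25" are not part of the article's data
keys = [b"1", b"2", b"3", b"4", b"5", b"10", b"25"]

print(sorted(keys))                  # byte order, not numeric order
print(rows_from(keys, b"3"))         # "10" and "25" sort before "3", so they are left out
print(rows_with_prefix(keys, b"2"))  # matches both "2" and "25"
```

Zero-padding numeric keys (for example ‘0003’) is a common way to make byte order agree with numeric order.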

Fetch 3 Rows, Keeping Only Columns Whose Qualifier Starts with the Letter ‘c’:

scan 'transaction', {FILTER => "PageFilter(3) AND ColumnPrefixFilter('c')"}
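The same filter strings can also be sent from Python, because HappyBase's `Table.scan()` takes a `filter` argument holding a filter expression like the ones used in the shell. The following is a minimal sketch under that assumption. `and_filters` and `scan_with_filter` are helper names invented here, and `table` stands for an object returned by `connection.table(...)`.

```python
def and_filters(*filters):
    """Join filter expressions with AND, as in the shell examples."""
    return " AND ".join(filters)


def scan_with_filter(table, filter_string, limit=None):
    """Run a scan with an HBase filter string and decode the results to text."""
    results = []
    for row_key, columns in table.scan(filter=filter_string, limit=limit):
        decoded = {
            name.decode("utf-8"): value.decode("utf-8")
            for name, value in columns.items()
        }
        results.append((row_key.decode("utf-8"), decoded))
    return results


# Example (needs a live connection, so it is left commented out):
# table = connection.table("transaction")
# rows = scan_with_filter(table, and_filters("PageFilter(3)", "ColumnPrefixFilter('c')"))
```

Keeping the filter as a plain string means an expression tested in the shell can be pasted into Python unchanged.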

More HBase Examples

For more examples and detailed explanations, I’ve found 2 online resources that may help you. Check them out:

  1. HBase query examples 
  2. HBase query examples 

2. HBase Queries by Python HappyBase Example 

Earlier we saw many HBase query examples using shell commands. In this step, we will connect to HBase from Python and execute queries there. For this, we will use the HappyBase Python library, whose configuration you can find in this article. Before we start, you can look at the official documentation for the HappyBase library. Let’s get started.

Create Table if not exists: 

				
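import happybase

# Assumes the HBase Thrift server is reachable on localhost:9090; adjust to your setup
connection = happybase.Connection("localhost", port=9090)

# tables() returns names as bytes under Python 3, so compare against a bytes literal
if b"my_hbase_table" not in connection.tables():
    connection.create_table(
        "my_hbase_table",  # Table name
        {
            "col_family_1": dict(),  # Column family with default options
            "col_family_2": dict(),
        },
    )
elif not connection.is_table_enabled("my_hbase_table"):  # Table exists but is disabled
    connection.enable_table("my_hbase_table")  # Enable the table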
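The empty `dict()` values passed to `create_table` can also carry column family options. As a hedged sketch, the helper below builds such a dictionary with HappyBase's `max_versions` option, which mirrors the VERSIONS setting from the shell section. The name `build_families` is an assumption made for this example.

```python
def build_families(names, max_versions=None):
    """Build the families dict expected by Connection.create_table()."""
    options = {} if max_versions is None else {"max_versions": max_versions}
    # Give each family its own copy of the options
    return {name: dict(options) for name in names}


families = build_families(["col_family_1", "col_family_2"], max_versions=5)
print(families)

# Example (needs a live connection, so it is left commented out):
# connection.create_table("my_hbase_table", families)
```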
				
			

Add Data into Table: 

				
table = connection.table("my_hbase_table")  # Get a handle to the table

# Add new data
# Format: put(row_key, {b"column_family:qualifier": value})
# Row keys, column names, and values are passed as bytes
table.put(b"row1", {
    b"col_family_1:col_1": b"a", b"col_family_2:col_1": b"b"})
table.put(b"row2", {
    b"col_family_1:col_1": b"1", b"col_family_1:col_2": b"2", b"col_family_2:col_1": b"c"})
				
			

Check out Table.put() official docs. 
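When many rows have to be written, sending them through a HappyBase batch avoids one round trip per `put`. The sketch below assumes the data starts out as ordinary Python strings and encodes it to bytes before writing. `encode_cells` and `put_rows` are illustrative names, and `table` is expected to come from `connection.table(...)`.

```python
def encode_cells(cells):
    """Convert {"family:qualifier": value} pairs to the bytes HBase stores."""
    return {
        name.encode("utf-8"): str(value).encode("utf-8")
        for name, value in cells.items()
    }


def put_rows(table, rows, batch_size=1000):
    """Write many rows through one batch; `rows` maps row key -> cells (text)."""
    # The batch sends its pending writes every `batch_size` mutations
    # and once more when the `with` block exits.
    with table.batch(batch_size=batch_size) as batch:
        for row_key, cells in rows.items():
            batch.put(row_key.encode("utf-8"), encode_cells(cells))
    return len(rows)


# Example (needs a live connection, so it is left commented out):
# put_rows(table, {"row3": {"col_family_1:col_1": "x"}})
```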

Fetch Data from Table: 

				
table = connection.table("my_hbase_table")
data = []

# Method 1: fetch a single row by its row key
one_row = table.row(b"row1")  # dict mapping b"family:qualifier" to a bytes value
for column, value in one_row.items():  # Traverse each column of the row
    # Keys and values are bytes; decoding as UTF-8 also handles non-ASCII text
    print(column.decode("utf-8"), value.decode("utf-8"))

# Method 2: scan the whole table
for row_key, columns in table.scan():  # row_key is the row key; columns maps each column to its value
    temp = []  # Values of the current row
    for col_key, col_value in columns.items():
        col_key_str = col_key.decode("utf-8")
        col_value_str = col_value.decode("utf-8")
        temp.append(col_value_str)
        print(f"Row key: {row_key.decode('utf-8')}\nColumn: {col_key_str}\nValue: {col_value_str}")
        print("==============================================")
    data.append(temp)
print(f"Your list of fetched data: {data}")
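Each item produced by a scan is a row key plus a flat dictionary of `family:qualifier` names, all as bytes. If a nested, text-based structure is easier to work with, a small conversion step can group the columns by family. The function name `decode_row` and the sample data are assumptions made for this sketch.

```python
def decode_row(row_key, columns):
    """Turn one scan result into (row key, {family: {qualifier: value}}) as text."""
    families = {}
    for name, value in columns.items():
        # Split b"family:qualifier" at the first colon
        family, _, qualifier = name.decode("utf-8").partition(":")
        families.setdefault(family, {})[qualifier] = value.decode("utf-8")
    return row_key.decode("utf-8"), families


# Sample data shaped like one item yielded by table.scan()
sample = (
    b"row2",
    {b"col_family_1:col_1": b"1", b"col_family_1:col_2": b"2", b"col_family_2:col_1": b"c"},
)
print(decode_row(*sample))
```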
				
			

Delete Data from Table: 

				
table = connection.table("my_hbase_table")

# Delete data
table.delete(b"row1", columns=[b"col_family_1:col_1"])  # Delete a single cell
table.delete(b"row2", columns=[b"col_family_1"])  # Delete every column in a column family

# Delete by matching data
for row_key, columns in table.scan():  # row_key is the row key; columns maps each column to its value
    for col_key, col_value in columns.items():
        col_value_str = col_value.decode("utf-8")
        if col_value_str == "matching_string":
            table.delete(row_key, columns=[col_key])  # Delete only the cell that matched
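A variation on deleting by matching data is to remove whole rows rather than single cells: collect the matching row keys first and then delete them through one batch. This is only a sketch of that idea, not the approach used in the snippet before it. `delete_rows_with_value` is a hypothetical helper, and `table` is assumed to be a HappyBase table object.

```python
def delete_rows_with_value(table, wanted):
    """Delete every row that has at least one cell equal to `wanted` (bytes)."""
    # First pass: collect the keys so the scan finishes before any delete is sent
    matching = [
        row_key
        for row_key, columns in table.scan()
        if wanted in columns.values()
    ]
    # Second pass: queue the deletes and send them together
    with table.batch() as batch:
        for row_key in matching:
            batch.delete(row_key)  # No columns given, so the whole row is removed
    return matching


# Example (needs a live connection, so it is left commented out):
# removed = delete_rows_with_value(table, b"matching_string")
```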
				
			

FAQ

As mentioned here 

‘HBase provides a dual approach to data access. While it’s row key based table scans provide consistent and real-time reads/writes, it also leverages Hadoop MapReduce for batch jobs. This makes it great for both real-time querying and batch analytics. Hbase also automatically manages sharding and failover support.’ 

HBase is an open-source NoSQL database that stores large datasets in a column-oriented format.

HappyBase is a third-party Python library for communicating with HBase from Python scripts. A HappyBase connection can be used to run queries against large datasets stored in an HBase database.

And that’s a wrap! 

Thank you for going through my article, regarding HBase Query Examples Using HappyBase python and Shell Commands. I hope that you found it helpful and were able to comprehend it. Feel free to share your thoughts and opinions, in the comments section down below.  

Have a great one!