In this article, we will walk through HBase query examples using HBase shell commands and the HappyBase Python library. Hadoop and HBase run inside an HBase Docker image, which is assumed to be up and running already. Let’s see how it all works together.
Prerequisites
- Docker should be installed. (How to install Docker)
- Docker HBase Container should be up and running (Check how)
- HappyBase connection through Python should be done (Check how)
Table of Contents
- What is HBase Database?
- HBase Queries Examples Through Shell
- HBase Queries by Python HappyBase Example
- FAQ
What is HBase Database?
HBase is a non-relational (NoSQL) database management system built on top of Apache Hadoop. It is fault tolerant and provides real-time read and write access to data stored in HDFS. HBase also supports data compression. It can hold billions of records and still fetch individual records from that data with low latency.
1. HBase Queries Examples Through Shell
For this step, the 2nd point in the prerequisites should be completed, since the shell runs inside the HBase container. First open a bash session in the Docker HBase container, then start the HBase shell with this command:
hbase shell
Create Table:
This creates a table named ‘transaction’; the remaining arguments are its column families.
create 'transaction', 'amount', 'card_type', 'websitename', 'countryname', 'datetime', 'transactionID', 'cityname', 'productname'
Show tables list:
list
Insert into Table:
put 'transaction', '2', 'card_type', 'MasterCard'
put 'transaction', '3', 'card_type', 'Visa'
put 'transaction', '4', 'card_type', 'MasterCard'
put 'transaction', '5', 'card_type', 'Maestro'
put 'transaction', '1', 'amount', '50.87'
put 'transaction', '2', 'amount', '1023.2'
put 'transaction', '3', 'amount', '3321.1'
put 'transaction', '4', 'amount', '234.11'
put 'transaction', '5', 'amount', '321.11'
Count Total Rows in Table:
count 'transaction'
Describe the Table and its Fields:
describe 'transaction'
See if Table Exists:
exists 'transaction'
Fetch Item from Table:
get 'transaction', '2'
get 'transaction', '5'
Fetch All from Table:
scan 'transaction'
Delete a Cell Value:
delete 'transaction', '4', 'card_type'
Delete a Row in Table:
deleteall 'transaction', '4'
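Drop Entire Table (a table must be disabled before it can be dropped; recreate it with the create command above before running the remaining examples):
disable 'transaction'
drop 'transaction'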
Alter Table:
alter 'transaction', NAME => 'card_type', VERSIONS => 5
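Once a column family keeps several versions, older values of a cell remain readable. The following is a rough sketch of how they could be read back from Python, assuming a HappyBase table object like the one used in the Python section of this article. The helper name cell_history and its arguments are illustrative and not part of HappyBase; only Table.cells() is the library's own method.

```python
from datetime import datetime, timezone


def cell_history(table, row_key, column, max_versions=5):
    """Return up to `max_versions` stored values of one cell.

    `table` is assumed to be a happybase.Table (or any object exposing the
    same `cells()` method). Values come back in the order HBase returns
    them, which is normally newest first.
    """
    history = []
    # Table.cells() returns (value, timestamp) tuples when include_timestamp=True.
    for value, timestamp_ms in table.cells(
        row_key, column, versions=max_versions, include_timestamp=True
    ):
        # HBase timestamps are milliseconds since the Unix epoch by default.
        written_at = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
        history.append((value.decode("utf-8"), written_at))
    return history
```

Each entry pairs a decoded value with the time it was written, which makes it easy to see how a cell changed over time.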
Fetch First 2 Rows:
scan 'transaction', {FILTER => "PageFilter(2)"}
Fetch Data Where Values Contain the String ‘Mas’:
scan 'transaction', {COLUMNS => 'card_type', FILTER => "ValueFilter(=, 'substring:Mas')"}
Fetch Only Keys (Row Keys and Column Names, Without Values):
scan 'transaction', {FILTER => "KeyOnlyFilter()"}
Fetch Rows Where the RowKey Starts with a Prefix:
scan 'transaction', {FILTER => "PrefixFilter('1')"}
Fetch Keys Only for RowKeys Starting with ‘2’:
scan 'transaction', {FILTER => "PrefixFilter('2') AND KeyOnlyFilter()"}
Find Columns Whose Qualifier Starts with ‘c’ and/or ‘p’:
scan 'transaction', {FILTER => "ColumnPrefixFilter('c')"}
scan 'transaction', {FILTER => "MultipleColumnPrefixFilter('c','p')"}
Fetch Rows Starting from RowKey 3 and Onwards:
scan 'transaction', {STARTROW => '3'}
Fetch 3 Rows with a Column Qualifier Starting with the Letter ‘c’:
scan 'transaction', {FILTER => "PageFilter(3) AND ColumnPrefixFilter('c')"}
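Several of the shell scans in this section have a close counterpart in HappyBase's Table.scan(). The sketch below shows one possible mapping, assuming table is a HappyBase table for ‘transaction’ (connecting from Python is covered in the next section). The function names are made up for illustration; limit, row_prefix, row_start, columns and filter are keyword arguments of Table.scan().

```python
def first_rows(table, n=2):
    """Rough equivalent of PageFilter(n): stop after n rows."""
    return list(table.scan(limit=n))


def rows_with_prefix(table, prefix):
    """Rough equivalent of PrefixFilter: row keys starting with `prefix`."""
    return list(table.scan(row_prefix=prefix))


def rows_from(table, start_key):
    """Equivalent of STARTROW: scan from `start_key` (inclusive) onwards."""
    return list(table.scan(row_start=start_key))


def rows_with_value_substring(table, text, columns=None):
    """Send a filter string to the server, as in the shell examples.

    `text` is inserted into the filter string as-is, so this sketch
    assumes it contains no single quotes.
    """
    filter_string = "ValueFilter(=, 'substring:{}')".format(text)
    return list(table.scan(columns=columns, filter=filter_string))
```

Each function returns a list of (row_key, columns) pairs, where the row key and the column names and values are bytes.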
More HBase Examples
For more examples and detailed explanations, I’ve found 2 online resources that may help you. Check them out:
2. HBase Queries by Python HappyBase Example
So far we have run HBase queries through shell commands. In this step, we will connect to HBase from Python and execute queries there. For this, we will use the HappyBase Python library, whose configuration you can find in this article. Before we start, you can look at the Official Documentation for HappyBase Library. Let’s get started.
Create Table if not exists:
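# `connection` is the happybase.Connection created in the prerequisite step.
# HappyBase returns table names as bytes, so compare against a bytes literal.
if b"my_hbase_table" not in connection.tables():
    connection.create_table(
        "my_hbase_table",  # Table name
        {
            "col_family_1": dict(),  # Column family with default options
            "col_family_2": dict(),  # Column family with default options
        },
    )
elif not connection.is_table_enabled("my_hbase_table"):  # Table exists but is disabled
    connection.enable_table("my_hbase_table")  # Enable the table again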
Add Data into Table:
table = connection.table("my_hbase_table")  # Get a handle to the table
# Add new data
# Format: put(row_key, {"column_family:qualifier": "value"})
table.put("row1", {
    "col_family_1:col_1": "a", "col_family_2:col_1": "b"})
table.put("row2", {
    "col_family_1:col_1": "1", "col_family_1:col_2": "2", "col_family_2:col_1": "c"})
Check out Table.put() official docs.
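When many rows need to be written, one put() per row means one round trip per row. HappyBase offers Table.batch() for this case. The helper below is a minimal sketch of that idea; put_many is a hypothetical name, and the batch size of 1000 is an arbitrary assumption rather than a recommended value.

```python
def put_many(table, rows, batch_size=1000):
    """Write many rows with fewer round trips than one put() per row.

    `rows` is an iterable of (row_key, {column: value}) pairs.
    Returns the number of rows queued for writing.
    """
    count = 0
    # The batch sends buffered mutations once `batch_size` of them have
    # accumulated and flushes whatever is left when the `with` block exits.
    with table.batch(batch_size=batch_size) as batch:
        for row_key, data in rows:
            batch.put(row_key, data)
            count += 1
    return count
```

It could be called with the same kind of data as above, for example a list of pairs such as ("row3", {"col_family_1:col_1": "x"}).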
Fetch Data from Table:
table = connection.table("my_hbase_table")
data = []
# Method 1: fetch a single row by its row key
one_row = table.row('row1')  # Returns a dict of {column: value} for row1
for column in one_row.keys():  # Traverse each column of the current row
    print(column.decode('utf-8'), one_row[column].decode('utf-8'))  # Keys and values are bytes, so decode them to str
# Method 2: scan all rows
for row_index, col_families in table.scan():  # row_index is the row key, col_families maps column names to values
    temp = []  # Values of the current row
    for col_key, col_value in col_families.items():
        col_key_str = col_key.decode('utf-8')
        col_value_str = col_value.decode('utf-8')
        temp.append(col_value_str)
        print(f"Row_Index: {row_index.decode('utf-8')} \nColumn: {col_key_str} \nValue: {col_value_str}")
    data.append(temp)
    print("==============================================")
print(f"Your List of Fetched Data: {data}")
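Because HappyBase hands back row keys, column names and values as bytes, it can be convenient to decode everything in one place. The helper below is a small sketch of that; decode_rows and the "row_key" dictionary key are names chosen here for illustration, and UTF-8 is assumed to be the encoding used when the data was written.

```python
def decode_rows(scan_result, encoding="utf-8"):
    """Turn (row_key, {column: value}) pairs of bytes into dicts of str."""
    decoded = []
    for row_key, columns in scan_result:
        # Keep the row key next to its columns under a fixed name.
        row = {"row_key": row_key.decode(encoding)}
        for column, value in columns.items():
            row[column.decode(encoding)] = value.decode(encoding)
        decoded.append(row)
    return decoded
```

With this in place, decode_rows(table.scan()) gives a list of plain dictionaries that can be printed or passed on without further decoding.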
Delete Data from Table:
table = connection.table("my_hbase_table")
# Delete data
table.delete("row1", ["col_family_1:col_1"])  # Delete a single cell
table.delete("row2", ["col_family_1"])  # Delete an entire column family in that row
# Delete by matching data
for row_index, col_families in table.scan():  # row_index is the row key, col_families maps column names to values
    for col_key, col_value in col_families.items():
        col_key_str = col_key.decode('utf-8')
        col_value_str = col_value.decode('utf-8')
        if col_value_str == 'matching_string':
            table.delete(row_index, [col_key])  # Delete only the matching cell
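A variation on deleting by match is to collect the matching cells first and delete them afterwards in one batch, so the table is not modified while it is being scanned. The following is only a sketch under that assumption; delete_cells_matching is a hypothetical helper, while Table.scan(), Table.batch() and Batch.delete() are HappyBase methods.

```python
def delete_cells_matching(table, wanted_value):
    """Delete every cell whose value equals `wanted_value` (a str).

    Returns the list of (row_key, column) pairs that were deleted.
    """
    target = wanted_value.encode("utf-8")
    # First pass: remember where the matching cells are.
    matches = [
        (row_key, column)
        for row_key, columns in table.scan()
        for column, value in columns.items()
        if value == target
    ]
    # Second pass: delete them together when the batch is flushed.
    with table.batch() as batch:
        for row_key, column in matches:
            batch.delete(row_key, columns=[column])
    return matches
```

Returning the deleted coordinates makes it possible to log or review what was removed.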
FAQ
As mentioned here:
‘HBase provides a dual approach to data access. While it’s row key based table scans provide consistent and real-time reads/writes, it also leverages Hadoop MapReduce for batch jobs. This makes it great for both real-time querying and batch analytics. Hbase also automatically manages sharding and failover support.’
Yes. HBase is an open-source NoSQL database that stores large datasets in a column-oriented format.
HappyBase is a third-party Python library used to communicate with HBase from Python scripts. With a HappyBase connection, we can run queries against large datasets in an HBase database.
And that’s a wrap!
Thank you for going through my article on HBase query examples using HappyBase Python and shell commands. I hope you found it helpful and easy to follow. Feel free to share your thoughts and opinions in the comments section below.
Have a great one!