1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
|
Commands can be run from any cluster machine with hadoop environment config
set up. Most of these commands are run from the shell (start with `hbase
shell`). There is only one AIT/Webgroup HBase instance/namespace; there may be
QA/prod tables, but there are not QA/prod clusters.
## Create Table
Create column families (note: not all individual columns) with something like:
create 'wbgrp-journal-extract-0-qa', 'f', 'file', {NAME => 'grobid0', COMPRESSION => 'snappy'}
## Run Thrift Server Informally
The Thrift server can technically be run from any old cluster machine that has
Hadoop client stuff set up, using:
hbase thrift start -nonblocking -c
Note that this will run version 0.96, while the actual HBase service seems to
be running 0.98.
To interact with this config, use happybase (python) config:
conn = happybase.Connection("bnewbold-dev.us.archive.org", transport="framed", protocol="compact")
# Test connection
conn.tables()
## Queries From Shell
Fetch all columns for a single row:
hbase> get 'wbgrp-journal-extract-0-qa', 'sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ'
Fetch multiple columns for a single row, using column families:
hbase> get 'wbgrp-journal-extract-0-qa', 'sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ', 'f', 'file'
Scan a fixed number of rows (here 5) starting at a specific key prefix, all
columns:
hbase> scan 'wbgrp-journal-extract-0-qa',{LIMIT=>5,STARTROW=>'sha1:A'}
|