author: Bryan Newbold <bnewbold@archive.org> 2018-08-24 13:39:21 -0700
committer: Bryan Newbold <bnewbold@archive.org> 2018-08-24 13:39:21 -0700
commit: 67755e366bcc1df455a9d75710a11030c3e2cc52
tree: 9e46155ba3290634e9a328fc10fc3362789d448d /hbase/howto.md
parent: 1ae7fd2f0c5661560b15be86614c2c4d41b21205
move HBase schema and notes from journal-infra repo
Diffstat (limited to 'hbase/howto.md')
-rw-r--r-- hbase/howto.md | 28
1 files changed, 28 insertions, 0 deletions
diff --git a/hbase/howto.md b/hbase/howto.md
new file mode 100644
index 0000000..fcf561f
--- /dev/null
+++ b/hbase/howto.md
@@ -0,0 +1,28 @@
+
+Commands can be run from any cluster machine that has the Hadoop environment
+config set up. Most of these commands are run from the HBase shell (start it with
+`hbase shell`). There is only one AIT/Webgroup HBase instance/namespace: QA and
+prod may have separate tables, but there are no separate QA/prod clusters.
+
+## Create Table
+
+Create the table and its column families (individual columns are not declared up front) with something like:
+
+    create 'wbgrp-journal-extract-0-qa', 'f', 'file', {NAME => 'grobid0', COMPRESSION => 'snappy'}
+
+## Run Thrift Server Informally
+
+The Thrift server can technically be run from any cluster machine that has the
+Hadoop client configuration set up, using:
+
+    hbase thrift start -nonblocking -c
+
+Note that this will run version 0.96, while the actual HBase service seems to
+be running 0.98.
+
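+To connect to a server started this way, use happybase (Python) with matching settings (`-nonblocking` requires the framed transport; `-c` selects the compact protocol):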
+
+    import happybase
+    conn = happybase.Connection("bnewbold-dev.us.archive.org", transport="framed", protocol="compact")
+    # Test the connection by listing the table names
+    conn.tables()
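
Note on the happybase snippet in this patch: the client settings have to mirror the flags passed to `hbase thrift start`, so it may help to keep them in one place. The following is a minimal sketch, not part of the commit. The helper names `thrift_connection_kwargs` and `list_tables` are hypothetical, port 9090 is assumed to be the default Thrift port, and `compat="0.96"` is an assumption based on the version note in the how-to.

```python
def thrift_connection_kwargs(host, port=9090):
    """Build happybase.Connection keyword arguments that match a server
    started with `hbase thrift start -nonblocking -c`.

    Hypothetical helper; the port and compat values are assumptions.
    """
    return {
        "host": host,
        "port": port,             # assumption: default Thrift port
        "transport": "framed",    # required by the -nonblocking server
        "protocol": "compact",    # matches the -c flag
        "compat": "0.96",         # assumption: informal server runs 0.96
    }


def list_tables(host):
    """Open a connection, list the table names, and always close it."""
    import happybase  # third-party; imported lazily so the helper above has no dependencies

    conn = happybase.Connection(**thrift_connection_kwargs(host))
    try:
        return conn.tables()
    finally:
        conn.close()
```

Calling `list_tables("bnewbold-dev.us.archive.org")` would then stand in for the two-line connection test in the how-to, provided a Thrift server is actually running on that host.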