Commit message | Author | Date | Files | Lines
---|---|---|---|---
pig: first rev of join-cdx-sha1 script | Bryan Newbold | 2019-12-22 | 3 | -0/+91
pig: move count_lines helper to pighelper.py | Bryan Newbold | 2019-12-22 | 3 | -7/+6
new/additional GWB CDX filter scripts | Bryan Newbold | 2019-10-17 | 7 | -0/+142
add ojs and dspace as in-domain patterns to look for in heuristic CDX PDF filter | Bryan Newbold | 2019-04-12 | 1 | -1/+1
rework fetch_hadoop script. Should work on macOS now, and fetches hadoop in addition to pig. Still requires wget (not installed by default on macOS). | Bryan Newbold | 2018-08-24 | 2 | -24/+5
commit old tweak to pig script (from cluster) | Bryan Newbold | 2018-07-06 | 1 | -2/+4
possibly-broken version of hbase-count-rows.pig. This just worked a minute ago, but now throws: `org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/util/ByteStringer` | Bryan Newbold | 2018-07-06 | 1 | -0/+13
fix tests post-DISTINCT. Confirms it's working! | Bryan Newbold | 2018-05-08 | 4 | -25/+25
distinct on SHA1 in cdx scripts | Bryan Newbold | 2018-05-08 | 2 | -6/+18
pig cdx join improvements | Bryan Newbold | 2018-05-08 | 1 | -1/+1
how to run pig in production | Bryan Newbold | 2018-05-08 | 1 | -0/+5
WIP on filter-cdx-join-urls.pig | Bryan Newbold | 2018-05-07 | 1 | -0/+37
pig script to filter GWB CDX by SURT regexes | Bryan Newbold | 2018-05-07 | 6 | -0/+127
improve pig helper | Bryan Newbold | 2018-05-07 | 1 | -3/+11
try pig env again | Bryan Newbold | 2018-04-06 | 1 | -0/+2
use IA mirror for pig download | Bryan Newbold | 2018-04-06 | 1 | -1/+2
shift docs around a bit | Bryan Newbold | 2018-04-03 | 1 | -5/+0
clean up pig test stuff | Bryan Newbold | 2018-03-30 | 6 | -62/+71
basically working pig test | Bryan Newbold | 2018-03-29 | 5 | -23/+32
progress on pig tests | Bryan Newbold | 2018-03-29 | 8 | -10/+127
import WIP on pig test setup | Bryan Newbold | 2018-03-29 | 6 | -0/+156