path: root/please
Commit message    Author    Age    Files    Lines
* Merge branch 'bnewbold-backfill' into 'master'    bnewbold    2021-10-04    1    -0/+22
|\
| |     CDX Backfill (scalding version)
| |
| |     See merge request webgroup/sandcrawler!12
| |
| * temporary please option for scala backfill    Bryan Newbold    2018-07-24    1    -0/+22
| |
* | please: better usage output    Bryan Newbold    2020-11-02    1    -1/+1
| |
* | point 'please' to python_hadoop    Bryan Newbold    2019-09-25    1    -4/+4
| |
* | GroupFatcatWorksSubsetJob    Bryan Newbold    2019-08-26    1    -0/+44
| |
| |     This is a hack-y variant of GroupFatcatWorksSubsetJob which allows
| |     setting different left and right sides of the join.
| |
| |     The initial application is to re-run work merging with only
| |     longtail-oa works on the "left", with the goal of hard-merging these
| |     releases into existing releases with actual identifiers (instead of
| |     just grouping into works).
| |
| |     As a refactor, the normal GroupFatcatWorksJob could just be this with
| |     the same file passed as both left and right, though that requires
| |     twice as much JSON parsing/filtering.
| |
* | please command for groupworksfatcat    Bryan Newbold    2019-08-10    1    -0/+63
| |
* | please: add staging config (commented out)    Bryan Newbold    2019-07-07    1    -0/+4
| |
* | scalding dump-grobid-status-code job    Bryan Newbold    2019-04-12    1    -0/+24
| |
* | set long timeout on HBaseStatusCountJob    Bryan Newbold    2019-02-26    1    -1/+3
| |
* | longer match-crossref timeout    Bryan Newbold    2018-12-18    1    -2/+3
| |
* | please support DumpGrobidXmlJob    Bryan Newbold    2018-10-30    1    -0/+24
| |
* | please support for DumpGrobidMetaInsertableJob    Bryan Newbold    2018-09-22    1    -0/+24
| |
* | dumpfilemeta support in please    Bryan Newbold    2018-09-13    1    -0/+24
| |
* | insertable flag for match-crossref    Bryan Newbold    2018-09-12    1    -1/+9
| |
* | match crossref reducers=200    Bryan Newbold    2018-08-31    1    -1/+1
| |
* | please: save extraction output    Bryan Newbold    2018-08-26    1    -0/+6
| |
* | add extraction_ungrobided support to please    Bryan Newbold    2018-08-25    1    -0/+30
| |
* | please support for DumpUnGrobidedJob    Bryan Newbold    2018-08-24    1    -0/+24
| |
* | Merge branch 'bnewbold-missing-column'    Bryan Newbold    2018-08-24    1    -0/+29
|\ \
| | |     Manually Resolved Conflicts:
| | |       please
| | |
| * | fixes to please keys-missing-col    Bryan Newbold    2018-08-21    1    -2/+2
| | |
| * | add please for keysmissingcolumn    Bryan Newbold    2018-08-21    1    -0/+29
| | |
* | | clarify please docs    Bryan Newbold    2018-08-24    1    -2/+2
| | |
* | | rename ./mapreduce to ./python    Bryan Newbold    2018-08-24    1    -3/+3
| | |
* | | fix merge typos in please    Bryan Newbold    2018-08-24    1    -2/+2
| | |
* | | Merge branch 'bnewbold-match-quality'    Bryan Newbold    2018-08-24    1    -0/+28
|\ \ \
| |/ /
|/| |
| | |     Manually resolved merge conflict in:
| | |       please
| | |
| * | please support for match-benchmark    Bryan Newbold    2018-08-21    1    -0/+26
| | |
| * | fix bug with qa/prod detection    Bryan Newbold    2018-08-21    1    -0/+1
| | |
* | | Merge branch 'bnewbold-match-scale'    Bryan Newbold    2018-08-21    1    -0/+5
|\ \ \
| * | | explicit spill and compression settings for ScoreJob    Bryan Newbold    2018-08-20    1    -0/+5
| |/ /
* | | HDFS doesn't like colons    Bryan Newbold    2018-08-21    1    -1/+1
| | |
* | | please support for status-code-count    Bryan Newbold    2018-08-21    1    -0/+24
| | |
* | | make col counter generic    Bryan Newbold    2018-08-21    1    -0/+28
| | |
* | | please support for grobid-scorable-dump    Bryan Newbold    2018-08-21    1    -0/+24
|/ /
* | update 'please' command for scoring refactor    Bryan Newbold    2018-08-15    1    -1/+10
| |
* | add 'please' command for crossref matching    Bryan Newbold    2018-07-27    1    -0/+28
|/
* update please helpers to provide hbase+zk config    Bryan Newbold    2018-07-15    1    -2/+13
|
* please: status-count    Bryan Newbold    2018-06-15    1    -0/+21
|
* please: extract    Bryan Newbold    2018-06-15    1    -0/+31
|
|     This script needs refactoring!
|
* please: split out rebuild steps    Bryan Newbold    2018-06-15    1    -3/+18
|
* doc improvements and fixes to 'please' helper    Bryan Newbold    2018-06-15    1    -24/+23
|
* helper script for running jobs    Bryan Newbold    2018-06-14    1    -0/+86