path: root/python/sandcrawler/grobid.py
Commit message | Author | Age | Files | Lines
* grobid worker fixes for newer ia lib refactors | Bryan Newbold | 2020-01-14 | 1 | -3/+9
* fix grobid tests for new wayback refactors | Bryan Newbold | 2020-01-09 | 1 | -3/+3
* be more parsimonious with GROBID metadata | Bryan Newbold | 2020-01-02 | 1 | -2/+4
* fixes for large GROBID result skip | Bryan Newbold | 2019-12-02 | 1 | -2/+2
* count empty blobs as 'failed' instead of crashing | Bryan Newbold | 2019-12-01 | 1 | -1/+2
* cleanup unused import | Bryan Newbold | 2019-12-01 | 1 | -1/+0
* filter out very large GROBID XML bodies | Bryan Newbold | 2019-12-01 | 1 | -0/+6
* much progress on file ingest path | Bryan Newbold | 2019-10-22 | 1 | -0/+14
* we do actually want consolidateHeader=2, not 1 | Bryan Newbold | 2019-10-04 | 1 | -3/+3
* grobid: consolidateHeaders typo | Bryan Newbold | 2019-10-04 | 1 | -1/+1
* disable citation consolidation by default | Bryan Newbold | 2019-10-04 | 1 | -1/+1
* fix GROBID POST flags | Bryan Newbold | 2019-10-04 | 1 | -1/+3
* handle GROBID fetch empty blob condition | Bryan Newbold | 2019-10-03 | 1 | -1/+2
* have grobidworker error status indicate issues instead of bailing | Bryan Newbold | 2019-10-02 | 1 | -4/+13
* more counts and bugfixes in grobid_tool | Bryan Newbold | 2019-09-26 | 1 | -4/+0
* small improvements to GROBID tool | Bryan Newbold | 2019-09-26 | 1 | -0/+4
* lots of grobid tool implementation (still WIP) | Bryan Newbold | 2019-09-26 | 1 | -3/+63
* start refactoring sandcrawler python common code | Bryan Newbold | 2019-09-23 | 1 | -0/+44