| author | Bryan Newbold <bnewbold@archive.org> | 2020-06-16 17:28:33 -0700 | 
|---|---|---|
| committer | Bryan Newbold <bnewbold@archive.org> | 2020-06-16 17:28:36 -0700 | 
| commit | 5c32007e23a4f3b6902b760b5e06e4dd341918b3 (patch) | |
| tree | 86fe446ef6f980d09fa95867ddb0bae847cc2765 /sql/stats | |
| parent | d49ea4fb3f567351c63816e703348d8a9fd49ff0 (diff) | |

initial work on PDF extraction worker

This worker fetches full PDFs, then extracts thumbnails, raw text, and
PDF metadata. It is similar to the GROBID worker.
Diffstat (limited to 'sql/stats')
0 files changed, 0 insertions, 0 deletions
