path: root/fatcat_scholar/work_pipeline.py
Commit message | Author | Date | Files | Lines
* fetch pdftotext and pdf_meta from blobs, postgrest | Bryan Newbold | 2020-06-29 | 1 | -18/+45
    This replaces the temporary COVID-19 content hack with production content (text, thumbnail URLs) stored in postgrest and seaweedfs.
* flake8 fixes (partial) | Bryan Newbold | 2020-06-03 | 1 | -5/+2
* reformat python code with black | Bryan Newbold | 2020-06-03 | 1 | -68/+120
* more petabox timeout handling | Bryan Newbold | 2020-05-21 | 1 | -0/+3
* handle petabox read timeouts a bit | Bryan Newbold | 2020-05-21 | 1 | -1/+6
* fix typo with UnicodeDecodeError catch | Bryan Newbold | 2020-05-21 | 1 | -1/+1
* skip pdftotext loading on unicode error | Bryan Newbold | 2020-05-20 | 1 | -0/+2
* skip SIM items w/o page_numbers (instead of asserting) | Bryan Newbold | 2020-05-20 | 1 | -1/+3
* fixes from manual testing | Bryan Newbold | 2020-05-20 | 1 | -8/+13
* local pdftotext cache dir hack | Bryan Newbold | 2020-05-20 | 1 | -1/+18
* fixes to release+sim pipeline | Bryan Newbold | 2020-05-20 | 1 | -10/+16
* first pass transform from pipelines to ES schema | Bryan Newbold | 2020-05-20 | 1 | -16/+1
* WIP on SIM pipeline | Bryan Newbold | 2020-05-19 | 1 | -2/+2
* WIP on release-to-sim fetching | Bryan Newbold | 2020-05-19 | 1 | -12/+49
* initial progress on work pipeline | Bryan Newbold | 2020-05-16 | 1 | -0/+305