path: root/fatcat_scholar/sim_pipeline.py
Commit message | Author | Date | Files | Lines
* fmt and comments | Bryan Newbold | 2021-12-06 | 1 | -1/+3
* SIM pipeline: improve exception handling | Bryan Newbold | 2021-12-06 | 1 | -4/+7
* SIM pipeline: fix bug w/r/t issues with no fatcat coverage at all | Bryan Newbold | 2021-12-06 | 1 | -2/+2
* SIM pipeline: improve issue skipping (based on suffix) | Bryan Newbold | 2021-12-06 | 1 | -11/+21
* SIM pipeline: retain only one ulrichs record | Bryan Newbold | 2021-12-06 | 1 | -0/+1
* lint: small cleanups, mostly E711 and E713 | Bryan Newbold | 2021-10-27 | 1 | -1/+1
* re-style imports (isort) on all core python files | Bryan Newbold | 2021-10-27 | 1 | -10/+7
* catch/ignore ChunkedEncoding errors in fetches | Bryan Newbold | 2021-06-11 | 1 | -0/+3
* schema: add 'crossref' to bundle schema, and add from_json() helper | Bryan Newbold | 2021-06-02 | 1 | -0/+1
* sim: catch MaxRetryError | Bryan Newbold | 2021-01-31 | 1 | -0/+2
* enable sentry exceptions for workers and pipelines | Bryan Newbold | 2021-01-30 | 1 | -0/+10
* sim pipeline: improve exception catching | Bryan Newbold | 2021-01-27 | 1 | -4/+5
* sim indexing: new parallel fetch structure | Bryan Newbold | 2021-01-26 | 1 | -0/+65
* commands: show usage on empty command | Bryan Newbold | 2020-11-02 | 1 | -1/+1
* SIM pipeline: refactor issue item fetching and bundle conversion | Bryan Newbold | 2020-10-16 | 1 | -23/+32
* json: exclude None in output, and sort keys | Bryan Newbold | 2020-07-27 | 1 | -1/+1
* fix lint errors (and some small bugs) | Bryan Newbold | 2020-06-29 | 1 | -2/+2
* more flake8 | Bryan Newbold | 2020-06-03 | 1 | -1/+1
* flake8 fixes (partial) | Bryan Newbold | 2020-06-03 | 1 | -13/+4
* reformat python code with black | Bryan Newbold | 2020-06-03 | 1 | -45/+65
* more petabox timeout handling | Bryan Newbold | 2020-05-21 | 1 | -0/+3
* handle petabox read timeouts a bit | Bryan Newbold | 2020-05-21 | 1 | -1/+6
* skip SIM items w/o page_numbers (instead of asserting) | Bryan Newbold | 2020-05-20 | 1 | -1/+3
* first pass transform from pipelines to ES schema | Bryan Newbold | 2020-05-20 | 1 | -4/+8
* WIP on SIM pipeline | Bryan Newbold | 2020-05-19 | 1 | -0/+173