path: root/python/fatcat_tools/importers/ingest.py
Commit message | Author | Date | Files | Lines
* lint: simple, safe inline lint fixes | Bryan Newbold | 2021-11-02 | 1 | -6/+6
* fix missing variable in fileset ingest | Bryan Newbold | 2021-11-02 | 1 | -2/+1
* WIP: more fileset ingest | Bryan Newbold | 2021-10-18 | 1 | -13/+21
* WIP: rel fixes | Bryan Newbold | 2021-10-14 | 1 | -6/+6
* fileset ingest small tweaks | Bryan Newbold | 2021-10-14 | 1 | -21/+36
* initial implementation of fileset ingest importers | Bryan Newbold | 2021-10-14 | 1 | -2/+223
* new SPN web (html) importer | Bryan Newbold | 2021-10-01 | 1 | -26/+80
* ingest importer behavior tweaks | Bryan Newbold | 2021-10-01 | 1 | -8/+8
* more consistent and defensive lower-casing of DOIs | Bryan Newbold | 2021-06-23 | 1 | -0/+4
* ingest: swap ingest and file checks, to result in clearer stats/counts of ski... | Bryan Newbold | 2021-06-03 | 1 | -2/+2
* ingest: don't accept mag and s2 URLs | Bryan Newbold | 2021-06-03 | 1 | -4/+4
* web ingest: terminal URL mismatch as skip, not assert | Bryan Newbold | 2020-12-30 | 1 | -1/+3
* ingest: allow dblp imports | Bryan Newbold | 2020-12-23 | 1 | -1/+1
* add dblp as an ingest source and identifier | Bryan Newbold | 2020-12-17 | 1 | -1/+2
* ingest: allow doaj ingest responses | Bryan Newbold | 2020-12-17 | 1 | -1/+2
* html ingest: small fixes to try_update() code path | Bryan Newbold | 2020-12-15 | 1 | -5/+5
* html ingest: actual xhtml mimetype | Bryan Newbold | 2020-11-16 | 1 | -2/+2
* html ingest: remaining implementation | Bryan Newbold | 2020-11-06 | 1 | -22/+19
* ingest: progress on HTML ingest | Bryan Newbold | 2020-11-05 | 1 | -14/+30
* ingest: initial 'web' worker implementation | Bryan Newbold | 2020-11-05 | 1 | -66/+258
* ingest: whitelist -> allowlist | Bryan Newbold | 2020-11-05 | 1 | -3/+3
* ingest: basic checks for ingest_type | Bryan Newbold | 2020-11-05 | 1 | -3/+29
* lint (flake8) tool python files | Bryan Newbold | 2020-07-01 | 1 | -6/+1
* ingest importer: check that stage is consistent with release | Bryan Newbold | 2020-05-26 | 1 | -0/+5
* importers: clarify handling of ApiException | Bryan Newbold | 2020-05-22 | 1 | -0/+1
* ingest importer: don't use glutton matches | Bryan Newbold | 2020-05-22 | 1 | -3/+3
* ingest import: fix edit_extra path | Bryan Newbold | 2020-02-18 | 1 | -1/+1
* ingest importer: edit_extra is a top-level key | Bryan Newbold | 2020-02-18 | 1 | -1/+1
* ingest import: allow short version of corpus names | Bryan Newbold | 2020-02-18 | 1 | -0/+3
* ingest importer: pass through link rel | Bryan Newbold | 2020-02-18 | 1 | -1/+6
* check ingest_request_source existance for SPN as well as ingest | Bryan Newbold | 2020-02-06 | 1 | -0/+3
* additional trusted link sources | Bryan Newbold | 2020-02-06 | 1 | -0/+3
* add mag and s2 as trusted link sources | Bryan Newbold | 2020-02-06 | 1 | -1/+1
* ingest worker: handle missing ingest_request_source | Bryan Newbold | 2020-02-06 | 1 | -0/+3
* fix trivial typo in file importer | Bryan Newbold | 2020-01-20 | 1 | -1/+1
* ingest: improve tests, support old ingest results | Bryan Newbold | 2020-01-15 | 1 | -3/+12
* update ingest worker for schema tweaks | Bryan Newbold | 2020-01-15 | 1 | -8/+15
* ingest: allow more sources to auto-import | Bryan Newbold | 2020-01-15 | 1 | -1/+2
* importers: control update behavior with more-standard flag | Bryan Newbold | 2020-01-06 | 1 | -1/+1
* allow arabesque backfill ingests for some source types | Bryan Newbold | 2019-12-24 | 1 | -0/+5
* fix spn/ingest importer duplication check | Bryan Newbold | 2019-12-22 | 1 | -6/+8
* add ingest import file collision protection | Bryan Newbold | 2019-12-13 | 1 | -0/+6
* update ingest request schema | Bryan Newbold | 2019-12-13 | 1 | -2/+7
* remove default mimetype from ingest-file importer | Bryan Newbold | 2019-12-13 | 1 | -2/+1
* savepapernow result importer | Bryan Newbold | 2019-12-12 | 1 | -3/+64
* add another ingest request source to whitelist | Bryan Newbold | 2019-12-10 | 1 | -2/+5
* tweaks to file ingest importer | Bryan Newbold | 2019-12-03 | 1 | -3/+4
* re-order ingest want() for better stats | Bryan Newbold | 2019-11-15 | 1 | -7/+10
* project -> ingest_request_source | Bryan Newbold | 2019-11-15 | 1 | -6/+6
* ingest importer fixes | Bryan Newbold | 2019-11-15 | 1 | -3/+4