path: root/python/fatcat_tools/importers/ingest.py
Commit message | Author | Age | Files | Lines
* fileset ingest: handle missing/partial file-level metadata | Bryan Newbold | 2022-04-05 | 1 | -3/+3
* ingest importer: improved extra/edit_extra code flow | Bryan Newbold | 2022-04-05 | 1 | -20/+13
* fileset ingest: remove a TODO | Bryan Newbold | 2022-04-04 | 1 | -1/+0
* filesets: typo bugfix, and test 'mimetype' on entity, not extra | Bryan Newbold | 2022-04-04 | 1 | -1/+1
* fileset ingest: fix mimetype handling | Bryan Newbold | 2022-03-31 | 1 | -4/+5
* bugfix: logic flow in fileset release checking | Bryan Newbold | 2022-03-23 | 1 | -3/+6
* single-file variant of fileset importer for dataset attempts | Bryan Newbold | 2022-03-23 | 1 | -0/+201
* ingest fileset fixes, and some test coverage | Bryan Newbold | 2022-03-23 | 1 | -13/+19
* dataset ingest: JSON object fixes | Bryan Newbold | 2022-03-22 | 1 | -5/+5
* typing: relatively simple type check fixes | Bryan Newbold | 2021-11-03 | 1 | -3/+4
* typing: initial annotations on importers | Bryan Newbold | 2021-11-03 | 1 | -35/+46
* fmt (black): fatcat_tools/ | Bryan Newbold | 2021-11-02 | 1 | -319/+374
* python: isort everything | Bryan Newbold | 2021-11-02 | 1 | -0/+1
* lint: simple, safe inline lint fixes | Bryan Newbold | 2021-11-02 | 1 | -6/+6
* fix missing variable in fileset ingest | Bryan Newbold | 2021-11-02 | 1 | -2/+1
* WIP: more fileset ingest | Bryan Newbold | 2021-10-18 | 1 | -13/+21
* WIP: rel fixes | Bryan Newbold | 2021-10-14 | 1 | -6/+6
* fileset ingest small tweaks | Bryan Newbold | 2021-10-14 | 1 | -21/+36
* initial implementation of fileset ingest importers | Bryan Newbold | 2021-10-14 | 1 | -2/+223
* new SPN web (html) importer | Bryan Newbold | 2021-10-01 | 1 | -26/+80
* ingest importer behavior tweaks | Bryan Newbold | 2021-10-01 | 1 | -8/+8
* more consistent and defensive lower-casing of DOIs | Bryan Newbold | 2021-06-23 | 1 | -0/+4
* ingest: swap ingest and file checks, to result in clearer stats/counts of ski... | Bryan Newbold | 2021-06-03 | 1 | -2/+2
* ingest: don't accept mag and s2 URLs | Bryan Newbold | 2021-06-03 | 1 | -4/+4
* web ingest: terminal URL mismatch as skip, not assert | Bryan Newbold | 2020-12-30 | 1 | -1/+3
* ingest: allow dblp imports | Bryan Newbold | 2020-12-23 | 1 | -1/+1
* add dblp as an ingest source and identifier | Bryan Newbold | 2020-12-17 | 1 | -1/+2
* ingest: allow doaj ingest responses | Bryan Newbold | 2020-12-17 | 1 | -1/+2
* html ingest: small fixes to try_update() code path | Bryan Newbold | 2020-12-15 | 1 | -5/+5
* html ingest: actual xhtml mimetype | Bryan Newbold | 2020-11-16 | 1 | -2/+2
* html ingest: remaining implementation | Bryan Newbold | 2020-11-06 | 1 | -22/+19
* ingest: progress on HTML ingest | Bryan Newbold | 2020-11-05 | 1 | -14/+30
* ingest: initial 'web' worker implementation | Bryan Newbold | 2020-11-05 | 1 | -66/+258
* ingest: whitelist -> allowlist | Bryan Newbold | 2020-11-05 | 1 | -3/+3
* ingest: basic checks for ingest_type | Bryan Newbold | 2020-11-05 | 1 | -3/+29
* lint (flake8) tool python files | Bryan Newbold | 2020-07-01 | 1 | -6/+1
* ingest importer: check that stage is consistent with release | Bryan Newbold | 2020-05-26 | 1 | -0/+5
* importers: clarify handling of ApiException | Bryan Newbold | 2020-05-22 | 1 | -0/+1
* ingest importer: don't use glutton matches | Bryan Newbold | 2020-05-22 | 1 | -3/+3
* ingest import: fix edit_extra path | Bryan Newbold | 2020-02-18 | 1 | -1/+1
* ingest importer: edit_extra is a top-level key | Bryan Newbold | 2020-02-18 | 1 | -1/+1
* ingest import: allow short version of corpus names | Bryan Newbold | 2020-02-18 | 1 | -0/+3
* ingest importer: pass through link rel | Bryan Newbold | 2020-02-18 | 1 | -1/+6
* check ingest_request_source existance for SPN as well as ingest | Bryan Newbold | 2020-02-06 | 1 | -0/+3
* additional trusted link sources | Bryan Newbold | 2020-02-06 | 1 | -0/+3
* add mag and s2 as trusted link sources | Bryan Newbold | 2020-02-06 | 1 | -1/+1
* ingest worker: handle missing ingest_request_source | Bryan Newbold | 2020-02-06 | 1 | -0/+3
* fix trivial typo in file importer | Bryan Newbold | 2020-01-20 | 1 | -1/+1
* ingest: improve tests, support old ingest results | Bryan Newbold | 2020-01-15 | 1 | -3/+12
* update ingest worker for schema tweaks | Bryan Newbold | 2020-01-15 | 1 | -8/+15