path: root/python/fatcat_tools/transforms/ingest.py
Commit message | Author | Age | Files | Lines
* more consistent and defensive lower-casing of DOIs | Bryan Newbold | 2021-06-23 | 1 | -2/+2
    After noticing more upper/lower ambiguity in production. In particular, we have some old ingest requests in sandcrawler DB, which get re-submitted/re-tried, which have capitalized DOIs in the link source id field.
* ingest: add per-container ingest type overrides | Bryan Newbold | 2021-05-21 | 1 | -1/+17
* ingest tool: support for setting ingest type | Bryan Newbold | 2020-11-06 | 1 | -6/+6
* lint (flake8) tool python files | Bryan Newbold | 2020-07-01 | 1 | -1/+0
* default to PMC ingest URLs over DOI | Bryan Newbold | 2020-02-04 | 1 | -4/+4
    For cases where there might be both PMC and DOI urls, do the europmc.org PMC ones over DOI option. May want to turn this into a config or command-line option in the future.
* remove 'oa_only' feature from ingest transform | Bryan Newbold | 2020-01-28 | 1 | -14/+1
    Refactoring to move this filter elsewhere
* transform ingests via pmc/pmcid, not pubmed/pmid | Bryan Newbold | 2019-12-24 | 1 | -4/+4
* update ingest request schema | Bryan Newbold | 2019-12-13 | 1 | -5/+22
    This is mostly changing ingest_type from 'file' to 'pdf', and adding 'link_source'/'link_source_id', plus some small cleanups.
* tweaks to ingest-file transform | Bryan Newbold | 2019-12-12 | 1 | -13/+7
* project -> ingest_request_source | Bryan Newbold | 2019-11-15 | 1 | -2/+2
* fix release.pmcid typo | Bryan Newbold | 2019-11-15 | 1 | -2/+2
* more ingest importer comments and counts | Bryan Newbold | 2019-11-15 | 1 | -1/+1
* add ingest request transform (and test) | Bryan Newbold | 2019-11-15 | 1 | -0/+66
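
For readers without the file open, the behaviour these commit messages describe can be roughly sketched as below. This is a minimal illustration and not the actual contents of ingest.py: the function name `build_ingest_request`, its parameters, the `base_url` key, and both URL templates are assumptions made for this example. Only the `ingest_type`, `ingest_request_source`, `link_source` and `link_source_id` names, the 'pdf' default, the PMC-over-DOI preference, and the DOI lower-casing come from the log itself.

```python
from typing import Optional


def build_ingest_request(
    doi: Optional[str] = None,
    pmcid: Optional[str] = None,
    ingest_type: str = "pdf",
    ingest_request_source: str = "example",
) -> Optional[dict]:
    """Hypothetical sketch of an ingest request builder (not the real transform)."""
    # Lower-case the DOI defensively, since older requests may carry
    # capitalized DOIs in the link source id field.
    if doi:
        doi = doi.strip().lower()

    # Prefer the PMC URL when both identifiers are present.
    if pmcid:
        # Assumed URL template, for illustration only.
        base_url = f"https://europepmc.org/articles/{pmcid}"
        link_source, link_source_id = "pmc", pmcid
    elif doi:
        base_url = f"https://doi.org/{doi}"
        link_source, link_source_id = "doi", doi
    else:
        # No usable identifier: nothing to request.
        return None

    return {
        "ingest_type": ingest_type,
        "ingest_request_source": ingest_request_source,
        "base_url": base_url,  # assumed key name
        "link_source": link_source,
        "link_source_id": link_source_id,
    }
```

Calling `build_ingest_request(doi="10.1234/ABC")` under these assumptions yields a request whose `link_source_id` is the lower-cased DOI, while passing both a DOI and a PMCID selects the PMC branch.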