path: root/python/fatcat_tools/harvest
Commit message [Author, Age, Files, Lines]
* crossref is_update isn't what I thought [Bryan Newbold, 2019-12-03, 1 file, -6/+2]
|     I thought this would filter for metadata updates to an existing DOI, but actually "updates" are a type of DOI (eg, a retraction).
|     TODO: handle 'updates' field. Should both do a lookup and set work_ident appropriately, and store in crossref-specific metadata.
* review/fix all confluent-kafka produce code [Bryan Newbold, 2019-09-20, 3 files, -14/+49]
|
* small fixes to confluent-kafka importers/workers [Bryan Newbold, 2019-09-20, 2 files, -2/+2]
|     - decrease default changelog pipeline to 5.0sec
|     - fix missing KafkaException harvester imports
|     - more confluent-kafka tweaks
|     - updates to kafka consumer configs
|     - bump elastic updates consumergroup (again)
* small kafka tweaks for robustness [Bryan Newbold, 2019-09-20, 1 file, -0/+2]
|
* bump max message size to ~20 MBytes [Bryan Newbold, 2019-09-20, 2 files, -0/+2]
|
* fixes to confluent-kafka harvesters [Bryan Newbold, 2019-09-20, 3 files, -20/+21]
|
* first draft harvesters using confluent-kafka [Bryan Newbold, 2019-09-20, 3 files, -48/+104]
|
* increase default harvest window to 14 days [Bryan Newbold, 2019-04-01, 1 file, -2/+2]
|
* HACK: force pylint to ignore urllib3 Retry import [Bryan Newbold, 2019-03-15, 1 file, -1/+3]
|     As the code comment mentions, not sure why pylint throws this error. requests and urllib3 are recent, and this code runs fine in tests and QA, and pylint is running (in CI) within pipenv.
* MEDLINE/Pubmed note [Bryan Newbold, 2019-03-15, 1 file, -2/+6]
|     Also, arXivRaw, not arXiv (though see WIP on more-importers branch)
* fix harvester session.get() params [Bryan Newbold, 2019-03-06, 1 file, -5/+8]
|
* retry/backoff for Crossref harvester [Bryan Newbold, 2019-03-06, 2 files, -2/+24]
|
* bunch of lint/whitespace cleanups [Bryan Newbold, 2019-02-22, 3 files, -9/+6]
|
* check request status codes idiomatically [Bryan Newbold, 2018-12-29, 1 file, -2/+2]
|
* clean up harvester comments/docs [Bryan Newbold, 2018-11-21, 3 files, -50/+31]
|
* use isoformat() to format dates [Bryan Newbold, 2018-11-21, 2 files, -4/+4]
|     This shouldn't change behavior; it's just more consistent.
* fix loop_sleep typo [Bryan Newbold, 2018-11-21, 2 files, -2/+2]
|
* fix datacite DOI extraction [Bryan Newbold, 2018-11-21, 1 file, -1/+1]
|
* fix OAI-PMH name/finished message [Bryan Newbold, 2018-11-21, 1 file, -1/+6]
|
* fix oai-pmh issue again [Bryan Newbold, 2018-11-21, 1 file, -13/+14]
|
* oaipmh: handle NoRecordsMatch [Bryan Newbold, 2018-11-21, 1 file, -5/+8]
|
* initial OAI-PMH harvesters [Bryan Newbold, 2018-11-19, 3 files, -5/+167]
|
* better DOI registrar harvesters [Bryan Newbold, 2018-11-19, 3 files, -48/+145]
|
* bunch of pylint cleanup [Bryan Newbold, 2018-11-15, 1 file, -7/+12]
|
* refactoring harvesters [Bryan Newbold, 2018-11-15, 5 files, -196/+210]
|
* initial work on metadata harvest bots [Bryan Newbold, 2018-11-14, 4 files, -0/+197]