path: root/python/fatcat_tools/harvest
Commit message  Author  Age  Files  Lines
* re-fmt all the fatcat_tools __init__ files for readability  Bryan Newbold  2021-11-02  1  -4/+2
* pubmed: switch default http site to retrieve update files  Martin Czygan  2021-10-15  1  -2/+4
* pubmed: workaround a networking issue  Martin Czygan  2021-09-09  1  -24/+21
* pubmed: add option to ftp download with lftp  Martin Czygan  2021-09-08  1  -2/+31
* pubmed harvester: add basic retry logic  Martin Czygan  2021-08-20  1  -8/+21
* pubmed: update docs  Martin Czygan  2021-07-17  1  -2/+3
* pubmed: do not fail when accessing missing file  Martin Czygan  2021-07-17  1  -2/+8
* pubmed: reconnect on error  Martin Czygan  2021-07-16  1  -4/+30
* small python lint fixes (no behavior change)  Bryan Newbold  2021-05-25  1  -1/+1
* harvest: datacite API yields HTTP 200 with broken JSON  Martin Czygan  2020-08-10  1  -1/+8
* arxiv: do retry five times of HTTP 503  Martin Czygan  2020-07-10  1  -1/+1
* lint (flake8) tool python files  Bryan Newbold  2020-07-01  4  -19/+6
* harvest: fail on HTTP 400  Martin Czygan  2020-05-29  1  -4/+0
* rename HarvestState.next() to HarvestState.next_span()  Bryan Newbold  2020-05-26  4  -5/+5
* HACK: skip pylint errors on lines that seem to be fine  Bryan Newbold  2020-05-22  3  -3/+3
* crossref: switch from index-date to update-date  Bryan Newbold  2020-03-30  1  -1/+1
* crossref: longer comment about crossref API date fields  Bryan Newbold  2020-03-30  1  -2/+22
* Merge pull request #53 from EdwardBetts/spelling  bnewbold  2020-03-27  1  -2/+2
|\
| * Correct spelling mistakes  Edward Betts  2020-03-27  1  -2/+2
* | pubmed: log to stderr  Martin Czygan  2020-03-10  1  -1/+1
* | pubmed: move mapping generation out of fetch_date  Martin Czygan  2020-03-10  1  -7/+8
* | harvest: fix imports from HarvestPubmedWorker cleanup  Martin Czygan  2020-03-10  1  -2/+2
* | pubmed: citations is a bit more precise  Martin Czygan  2020-03-09  1  -1/+1
* | pubmed: we sync from FTP  Martin Czygan  2020-03-09  1  -1/+1
* | oaipmh: HarvestPubmedWorker obsoleted by PubmedFTPWorker  Martin Czygan  2020-03-09  1  -34/+0
* | more pubmed adjustments  Martin Czygan  2020-02-22  2  -70/+118
* | pubmed ftp: fix url  Martin Czygan  2020-02-19  1  -4/+6
* | pubmed ftp harvest and KafkaBs4XmlPusher  Martin Czygan  2020-02-19  2  -0/+214
|/
* harvest: log state on startup and use stderr for diagnostics  Martin Czygan  2020-02-14  3  -17/+22
* datacite: extend range search query  Martin Czygan  2019-12-27  1  -1/+1
* avoid usage of short links  Martin Czygan  2019-12-27  1  -2/+2
* Datacite API v2 throws 400, we cannot recover from, currently.  Martin Czygan  2019-12-27  1  -0/+4
* datacite: update documentation, add links to issues  Martin Czygan  2019-12-27  1  -10/+5
* datacite: use v2 of the API (flaky)  Martin Czygan  2019-12-27  1  -5/+28
* refactor kafka producer in crossref harvester  Bryan Newbold  2019-12-06  1  -21/+26
* crossref is_update isn't what I thought  Bryan Newbold  2019-12-03  1  -6/+2
* review/fix all confluent-kafka produce code  Bryan Newbold  2019-09-20  3  -14/+49
* small fixes to confluent-kafka importers/workers  Bryan Newbold  2019-09-20  2  -2/+2
* small kafka tweaks for robustness  Bryan Newbold  2019-09-20  1  -0/+2
* bump max message size to ~20 MBytes  Bryan Newbold  2019-09-20  2  -0/+2
* fixes to confluent-kafka harvesters  Bryan Newbold  2019-09-20  3  -20/+21
* first draft harvesters using confluent-kafka  Bryan Newbold  2019-09-20  3  -48/+104
* increase default harvest window to 14 days  Bryan Newbold  2019-04-01  1  -2/+2
* HACK: force pylint to ignore urllib3 Retry import  Bryan Newbold  2019-03-15  1  -1/+3
* MEDLINE/Pubmed note  Bryan Newbold  2019-03-15  1  -2/+6
* fix harvester session.get() params  Bryan Newbold  2019-03-06  1  -5/+8
* retry/backoff for Crossref harvester  Bryan Newbold  2019-03-06  2  -2/+24
* bunch of lint/whitespace cleanups  Bryan Newbold  2019-02-22  3  -9/+6
* check request status codes idiomatically  Bryan Newbold  2018-12-29  1  -2/+2
* clean up harvester comments/docs  Bryan Newbold  2018-11-21  3  -50/+31