fatcat
Commit log for path: python/fatcat_tools/harvest
Commit message (author, date; files changed, lines removed/added)

* pubmed: switch default http site to retrieve update files (Martin Czygan, 2021-10-15; 1 file, -2/+4)
* pubmed: workaround a networking issue (Martin Czygan, 2021-09-09; 1 file, -24/+21)
* pubmed: add option to ftp download with lftp (Martin Czygan, 2021-09-08; 1 file, -2/+31)
* pubmed harvester: add basic retry logic (Martin Czygan, 2021-08-20; 1 file, -8/+21)
* pubmed: update docs (Martin Czygan, 2021-07-17; 1 file, -2/+3)
* pubmed: do not fail when accessing missing file (Martin Czygan, 2021-07-17; 1 file, -2/+8)
* pubmed: reconnect on error (Martin Czygan, 2021-07-16; 1 file, -4/+30)
* small python lint fixes (no behavior change) (Bryan Newbold, 2021-05-25; 1 file, -1/+1)
* harvest: datacite API yields HTTP 200 with broken JSON (Martin Czygan, 2020-08-10; 1 file, -1/+8)
* arxiv: do retry five times of HTTP 503 (Martin Czygan, 2020-07-10; 1 file, -1/+1)
* lint (flake8) tool python files (Bryan Newbold, 2020-07-01; 4 files, -19/+6)
* harvest: fail on HTTP 400 (Martin Czygan, 2020-05-29; 1 file, -4/+0)
* rename HarvestState.next() to HarvestState.next_span() (Bryan Newbold, 2020-05-26; 4 files, -5/+5)
* HACK: skip pylint errors on lines that seem to be fine (Bryan Newbold, 2020-05-22; 3 files, -3/+3)
* crossref: switch from index-date to update-date (Bryan Newbold, 2020-03-30; 1 file, -1/+1)
* crossref: longer comment about crossref API date fields (Bryan Newbold, 2020-03-30; 1 file, -2/+22)
*   Merge pull request #53 from EdwardBetts/spelling (bnewbold, 2020-03-27; 1 file, -2/+2)
|\
| * Correct spelling mistakes (Edward Betts, 2020-03-27; 1 file, -2/+2)
* | pubmed: log to stderr (Martin Czygan, 2020-03-10; 1 file, -1/+1)
* | pubmed: move mapping generation out of fetch_date (Martin Czygan, 2020-03-10; 1 file, -7/+8)
* | harvest: fix imports from HarvestPubmedWorker cleanup (Martin Czygan, 2020-03-10; 1 file, -2/+2)
* | pubmed: citations is a bit more precise (Martin Czygan, 2020-03-09; 1 file, -1/+1)
* | pubmed: we sync from FTP (Martin Czygan, 2020-03-09; 1 file, -1/+1)
* | oaipmh: HarvestPubmedWorker obsoleted by PubmedFTPWorker (Martin Czygan, 2020-03-09; 1 file, -34/+0)
* | more pubmed adjustments (Martin Czygan, 2020-02-22; 2 files, -70/+118)
* | pubmed ftp: fix url (Martin Czygan, 2020-02-19; 1 file, -4/+6)
* | pubmed ftp harvest and KafkaBs4XmlPusher (Martin Czygan, 2020-02-19; 2 files, -0/+214)
|/
* harvest: log state on startup and use stderr for diagnostics (Martin Czygan, 2020-02-14; 3 files, -17/+22)
* datacite: extend range search query (Martin Czygan, 2019-12-27; 1 file, -1/+1)
* avoid usage of short links (Martin Czygan, 2019-12-27; 1 file, -2/+2)
* Datacite API v2 throws 400, we cannot recover from, currently. (Martin Czygan, 2019-12-27; 1 file, -0/+4)
* datacite: update documentation, add links to issues (Martin Czygan, 2019-12-27; 1 file, -10/+5)
* datacite: use v2 of the API (flaky) (Martin Czygan, 2019-12-27; 1 file, -5/+28)
* refactor kafka producer in crossref harvester (Bryan Newbold, 2019-12-06; 1 file, -21/+26)
* crossref is_update isn't what I thought (Bryan Newbold, 2019-12-03; 1 file, -6/+2)
* review/fix all confluent-kafka produce code (Bryan Newbold, 2019-09-20; 3 files, -14/+49)
* small fixes to confluent-kafka importers/workers (Bryan Newbold, 2019-09-20; 2 files, -2/+2)
* small kafka tweaks for robustness (Bryan Newbold, 2019-09-20; 1 file, -0/+2)
* bump max message size to ~20 MBytes (Bryan Newbold, 2019-09-20; 2 files, -0/+2)
* fixes to confluent-kafka harvesters (Bryan Newbold, 2019-09-20; 3 files, -20/+21)
* first draft harvesters using confluent-kafka (Bryan Newbold, 2019-09-20; 3 files, -48/+104)
* increase default harvest window to 14 days (Bryan Newbold, 2019-04-01; 1 file, -2/+2)
* HACK: force pylint to ignore urllib3 Retry import (Bryan Newbold, 2019-03-15; 1 file, -1/+3)
* MEDLINE/Pubmed note (Bryan Newbold, 2019-03-15; 1 file, -2/+6)
* fix harvester session.get() params (Bryan Newbold, 2019-03-06; 1 file, -5/+8)
* retry/backoff for Crossref harvester (Bryan Newbold, 2019-03-06; 2 files, -2/+24)
* bunch of lint/whitespace cleanups (Bryan Newbold, 2019-02-22; 3 files, -9/+6)
* check request status codes idiomatically (Bryan Newbold, 2018-12-29; 1 file, -2/+2)
* clean up harvester comments/docs (Bryan Newbold, 2018-11-21; 3 files, -50/+31)
* use isoformat() to format dates (Bryan Newbold, 2018-11-21; 2 files, -4/+4)