path: root/python/fatcat_tools
Commit message | Author | Age | Files | Lines
* ES schema: add best_url to file schema | Bryan Newbold | 2020-06-04 | 1 | -0/+12
* harvest: fail on HTTP 400 | Martin Czygan | 2020-05-29 | 1 | -4/+0
* Merge branch 'bnewbold-ingest-stage' into 'master' | Martin Czygan | 2020-05-28 | 1 | -0/+5
|\
| * ingest importer: check that stage is consistent with release | Bryan Newbold | 2020-05-26 | 1 | -0/+5
* | rename HarvestState.next() to HarvestState.next_span() | Bryan Newbold | 2020-05-26 | 4 | -5/+5
|/
* HACK: skip pylint errors on lines that seem to be fine | Bryan Newbold | 2020-05-22 | 3 | -3/+3
* Merge remote-tracking branch 'github/master' | Bryan Newbold | 2020-05-22 | 1 | -2/+2
|\
| * Indentity is not the same this as equality in Python | Christian Clauss | 2020-05-14 | 1 | -2/+2
* | importers: clarify handling of ApiException | Bryan Newbold | 2020-05-22 | 3 | -4/+10
* | ingest importer: don't use glutton matches | Bryan Newbold | 2020-05-22 | 1 | -3/+3
* | datacite: fix type error | Martin Czygan | 2020-04-22 | 1 | -1/+3
* | Merge branch 'martin-datacite-fix-release-contrib-raw-name-check-violation' i... | bnewbold | 2020-04-20 | 1 | -0/+8
|\ \
| * | datacite: fix a raw name constraint violation | Martin Czygan | 2020-04-20 | 1 | -0/+8
| |/
* | more changelog ES fixes | Bryan Newbold | 2020-04-17 | 1 | -4/+6
* | ES changelog worker: fixes for ident; fetch update from API if needed | Bryan Newbold | 2020-04-17 | 1 | -2/+9
|/
* Merge branch 'bnewbold-py37-cleanups' into 'master' | bnewbold | 2020-04-17 | 2 | -6/+6
|\
| * consistently use raw string prefix for regex | Bryan Newbold | 2020-04-17 | 2 | -6/+6
* | Merge branch 'martin-changelog-to-es' into 'master' | bnewbold | 2020-04-17 | 2 | -2/+23
|\ \
| * | derive changelog worker from release worker | Martin Czygan | 2020-04-17 | 2 | -2/+23
| |/
* | changelog: limit types | Martin Czygan | 2020-04-16 | 1 | -5/+1
* | changelog: extend release_types considered documents | Martin Czygan | 2020-04-16 | 1 | -10/+19
|/
* Merge branch 'bnewbold-pubmed-get_text' into 'master' | bnewbold | 2020-04-01 | 4 | -39/+47
|\
| * pubmed: use untranslated title if translated not available | Bryan Newbold | 2020-04-01 | 1 | -0/+6
| * importers: replace newlines in get_text() strings | Bryan Newbold | 2020-04-01 | 4 | -23/+25
| * importers: more string/get_text swaps | Bryan Newbold | 2020-03-28 | 3 | -27/+27
| * pubmed: bunch of .get_text() instead of .string | Bryan Newbold | 2020-03-28 | 1 | -12/+12
* | crossref: switch from index-date to update-date | Bryan Newbold | 2020-03-30 | 1 | -1/+1
* | crossref: longer comment about crossref API date fields | Bryan Newbold | 2020-03-30 | 1 | -2/+22
|/
* ingest: more DOI patterns to treat as OA | Bryan Newbold | 2020-03-28 | 1 | -0/+26
* Merge pull request #53 from EdwardBetts/spelling | bnewbold | 2020-03-27 | 4 | -9/+9
|\
| * Correct spelling mistakes | Edward Betts | 2020-03-27 | 4 | -9/+9
* | Merge branch 'bnewbold-citeproc-fixes' into 'master' | bnewbold | 2020-03-26 | 1 | -6/+12
|\ \
| * | improve citeproc/CSL web interface | Bryan Newbold | 2020-03-25 | 1 | -6/+12
* | | datacite: nameIdentifier corner case | Bryan Newbold | 2020-03-26 | 1 | -1/+2
|/ /
* | jalc: avoid meaningless pages values | Bryan Newbold | 2020-03-23 | 1 | -4/+8
* | datacite: add year sanity restrictions | bnewbold | 2020-03-23 | 1 | -0/+7
* | pubmed: handle multiple ReferenceList | Bryan Newbold | 2020-03-20 | 1 | -1/+4
* | pubmed: update many more metadata fields | Bryan Newbold | 2020-03-19 | 1 | -0/+22
* | crossref: skip stub OUP title | Bryan Newbold | 2020-03-19 | 1 | -0/+8
* | ingest: always try some lancet journals | Bryan Newbold | 2020-03-19 | 1 | -0/+3
* | Merge branch 'martin-kafka-bs4-import' into 'master' | Martin Czygan | 2020-03-10 | 5 | -22/+318
|\ \
| |/
|/|
| * common: use smaller batch size since XML parsing may be slow | Martin Czygan | 2020-03-10 | 1 | -1/+1
| * pubmed: log to stderr | Martin Czygan | 2020-03-10 | 1 | -1/+1
| * pubmed: move mapping generation out of fetch_date | Martin Czygan | 2020-03-10 | 1 | -7/+8
| * harvest: fix imports from HarvestPubmedWorker cleanup | Martin Czygan | 2020-03-10 | 1 | -2/+2
| * pubmed: citations is a bit more precise | Martin Czygan | 2020-03-09 | 1 | -1/+1
| * pubmed: we sync from FTP | Martin Czygan | 2020-03-09 | 1 | -1/+1
| * oaipmh: HarvestPubmedWorker obsoleted by PubmedFTPWorker | Martin Czygan | 2020-03-09 | 1 | -34/+0
| * more pubmed adjustments | Martin Czygan | 2020-02-22 | 2 | -70/+118
| * pubmed ftp: fix url | Martin Czygan | 2020-02-19 | 1 | -4/+6