Commit message [Author, Age, Files, Lines]
* importers: replace newlines in get_text() strings [Bryan Newbold, 2020-04-01, 4, -23/+25]
* importers: more string/get_text swaps [Bryan Newbold, 2020-03-28, 3, -27/+27]
* pubmed: bunch of .get_text() instead of .string [Bryan Newbold, 2020-03-28, 1, -12/+12]
* ingest: more DOI patterns to treat as OA [Bryan Newbold, 2020-03-28, 1, -0/+26]
* update CONTRIBUTORS [Bryan Newbold, 2020-03-27, 1, -2/+15]
* Merge pull request #53 from EdwardBetts/spelling [bnewbold, 2020-03-27, 13, -20/+20]
|\
| * Correct spelling mistakes [Edward Betts, 2020-03-27, 13, -20/+20]
* | Merge branch 'bnewbold-400-bad-revisions' into 'master' [Martin Czygan, 2020-03-26, 2, -2/+14]
|\ \
| * | catch ApiValueError in some generic API calls [Bryan Newbold, 2020-03-25, 2, -2/+14]
* | | Merge branch 'bnewbold-citeproc-fixes' into 'master' [bnewbold, 2020-03-26, 5, -22/+72]
|\ \ \
| * | | improve citeproc/CSL web interface [Bryan Newbold, 2020-03-25, 5, -22/+72]
* | | | datacite: nameIdentifier corner case [Bryan Newbold, 2020-03-26, 1, -1/+2]
* | | | api spec: fix a typo [Martin Czygan, 2020-03-26, 1, -1/+1]
| |/ /
|/| |
* | | Merge branch 'martin-pubmed-bulk-edit-notes' into 'master' [Martin Czygan, 2020-03-24, 1, -2/+22]
|\ \ \
| |/ /
|/| |
| * | notes: pubmed backfill (03/2020) [Martin Czygan, 2020-03-24, 1, -2/+22]
|/ /
* | cleanup unused code in fatcat_harvest.py [Bryan Newbold, 2020-03-23, 1, -7/+0]
* | jalc: avoid meaningless pages values [Bryan Newbold, 2020-03-23, 1, -4/+8]
* | Merge branch 'bnewbold-datacite-year-limits' into 'master' [Martin Czygan, 2020-03-23, 1, -0/+7]
|\ \
| * | datacite: add year sanity restrictions [bnewbold, 2020-03-23, 1, -0/+7]
|/ /
* | notes on arxiv+pubmed backfill [Bryan Newbold, 2020-03-20, 1, -0/+37]
* | pubmed: handle multiple ReferenceList [Bryan Newbold, 2020-03-20, 3, -1/+222]
* | pubmed: update many more metadata fields [Bryan Newbold, 2020-03-19, 1, -0/+22]
* | crossref: skip stub OUP title [Bryan Newbold, 2020-03-19, 1, -0/+8]
* | ingest: always try some lancet journals [Bryan Newbold, 2020-03-19, 1, -0/+3]
* | Merge branch 'martin-lookup-by-identifier-issn-link' into 'master' [bnewbold, 2020-03-18, 1, -4/+3]
|\ \
| * | container lookup: link to issn portal search [Martin Czygan, 2020-03-18, 1, -4/+3]
|/ /
* | Merge branch 'bnewbold-update-stats' into 'master' [Martin Czygan, 2020-03-18, 1, -3/+3]
|\ \
| * | update front-page stats [Bryan Newbold, 2020-03-17, 1, -3/+3]
|/ /
* | bulk exports README different from SQL README [Bryan Newbold, 2020-03-17, 1, -1/+1]
* | Merge branch 'martin-kafka-bs4-import' into 'master' [Martin Czygan, 2020-03-10, 10, -43/+428]
|\ \
| * | common: use smaller batch size since XML parsing may be slow [Martin Czygan, 2020-03-10, 1, -1/+1]
| * | pubmed: log to stderr [Martin Czygan, 2020-03-10, 1, -1/+1]
| * | pubmed: move mapping generation out of fetch_date [Martin Czygan, 2020-03-10, 2, -7/+10]
| * | harvest: fix imports from HarvestPubmedWorker cleanup [Martin Czygan, 2020-03-10, 2, -4/+4]
| * | pubmed: citations is a bit more precise [Martin Czygan, 2020-03-09, 1, -1/+1]
| * | pubmed: we sync from FTP [Martin Czygan, 2020-03-09, 1, -1/+1]
| * | oaipmh: HarvestPubmedWorker obsoleted by PubmedFTPWorker [Martin Czygan, 2020-03-09, 1, -34/+0]
| * | fatcat_import: address potential hanging, if stdin is empty [Martin Czygan, 2020-03-09, 1, -0/+2]
| * | more pubmed adjustments [Martin Czygan, 2020-02-22, 6, -71/+197]
| * | pubmed ftp: fix url [Martin Czygan, 2020-02-19, 1, -4/+6]
| * | pubmed ftp harvest and KafkaBs4XmlPusher [Martin Czygan, 2020-02-19, 6, -21/+307]
* | | add --force-crawl flag to ingest tool [Bryan Newbold, 2020-03-02, 1, -0/+5]
| |/
|/|
* | pipenv: lock authlib to less than v0.13; rebuild lock file [Bryan Newbold, 2020-02-28, 2, -112/+109]
* | ES README: really need to limit to 1k esbulk batches [Bryan Newbold, 2020-02-26, 1, -3/+3]
* | Merge branch 'bnewbold-elastic-v03b' [Bryan Newbold, 2020-02-26, 16, -257/+674]
|\ \
| * | improve is_oa flag accuracy [Bryan Newbold, 2020-02-26, 2, -10/+6]
| * | update ES transform README [Bryan Newbold, 2020-02-26, 1, -2/+3]
| * | fix fatcat_transform state filters [Bryan Newbold, 2020-02-26, 1, -4/+4]
| * | bulk ES transform: skip non-active entities [Bryan Newbold, 2020-02-26, 1, -0/+8]
| * | ES container last tweaks [Bryan Newbold, 2020-02-26, 2, -3/+7]