path: root/python/fatcat_tools
Commit message | Author | Date | Files | Lines
* elastic release schema update | Bryan Newbold | 2019-05-20 | 1 | -2/+5
* improved CSL transform (structured author names) | Bryan Newbold | 2019-05-20 | 1 | -12/+11
* make some XXX into TODO | Bryan Newbold | 2019-05-20 | 1 | -2/+2
* fix elastic file pdf check | Bryan Newbold | 2019-05-16 | 1 | -1/+3
* elastic transforms: work around missing pdf mimetypes | Bryan Newbold | 2019-05-15 | 1 | -1/+1
* fix default mimetype (impacted pre-1923 files) | Bryan Newbold | 2019-05-15 | 2 | -4/+9
* python impl | Bryan Newbold | 2019-05-14 | 9 | -32/+38
* python impl | Bryan Newbold | 2019-05-14 | 6 | -16/+16
* python: impl size_bytes -> size | Bryan Newbold | 2019-05-13 | 1 | -1/+1
* importer code updates | Bryan Newbold | 2019-05-13 | 4 | -3/+18
* partial python impl of ext_id and release_stage refactors | Bryan Newbold | 2019-05-13 | 5 | -29/+35
* handle null abstracts for release | Bryan Newbold | 2019-05-07 | 1 | -1/+1
* add limits to match importers | Bryan Newbold | 2019-04-23 | 3 | -2/+27
* archive.org isn't really a repository | Bryan Newbold | 2019-04-22 | 1 | -1/+3
* editgroup description override | Bryan Newbold | 2019-04-22 | 1 | -2/+2
* arabesque importer does require timestamp/wayback | Bryan Newbold | 2019-04-22 | 1 | -0/+3
* matched importer shouldn't require wayback | Bryan Newbold | 2019-04-22 | 1 | -5/+7
* handle API 400 in arabesque import (invalid extid) | Bryan Newbold | 2019-04-19 | 1 | -7/+14
* fix arabesque importer crawl_id None bug | Bryan Newbold | 2019-04-18 | 1 | -1/+1
* mechanism to not double-update entities | Bryan Newbold | 2019-04-18 | 2 | -1/+9
* minor arabesque tweaks | Bryan Newbold | 2019-04-18 | 1 | -0/+2
* update URL rel list | Bryan Newbold | 2019-04-18 | 1 | -1/+10
* arabesque importer does fewer updates | Bryan Newbold | 2019-04-18 | 1 | -1/+8
* arabesque importer | Bryan Newbold | 2019-04-18 | 1 | -0/+165
* early version of arabesque importer | Bryan Newbold | 2019-04-12 | 1 | -0/+1
* add SqlitePusher importer option | Bryan Newbold | 2019-04-12 | 2 | -1/+21
* fix reviewer bugs (thanks pylint) | Bryan Newbold | 2019-04-06 | 1 | -3/+3
* basic dummy review bot | Bryan Newbold | 2019-04-06 | 2 | -0/+239
* improve test coverage | Bryan Newbold | 2019-04-04 | 1 | -0/+1
* increase default harvest window to 14 days | Bryan Newbold | 2019-04-01 | 1 | -2/+2
* fix cdl_dash_dat license_slug | Bryan Newbold | 2019-03-19 | 1 | -7/+3
* importer for CDL/DASH dat pilot dweb datasets | Bryan Newbold | 2019-03-19 | 2 | -0/+200
* new importer: wayback_static | Bryan Newbold | 2019-03-19 | 2 | -0/+237
* expose bibtex and citeproc; revert /unstable/ prefixes | Bryan Newbold | 2019-03-18 | 1 | -1/+1
* refactor and test citeproc code | Bryan Newbold | 2019-03-18 | 2 | -3/+55
* HACK: force pylint to ignore urllib3 Retry import | Bryan Newbold | 2019-03-15 | 1 | -1/+3
* MEDLINE/Pubmed note | Bryan Newbold | 2019-03-15 | 1 | -2/+6
* more integration of transform refactor | Bryan Newbold | 2019-03-11 | 1 | -2/+2
* refactor transforms into sub-dir | Bryan Newbold | 2019-03-11 | 5 | -193/+206
* basic demo CSL/citeproc transform code | Bryan Newbold | 2019-03-11 | 2 | -1/+166
* fix harvester session.get() params | Bryan Newbold | 2019-03-06 | 1 | -5/+8
* retry/backoff for Crossref harvester | Bryan Newbold | 2019-03-06 | 2 | -2/+24
* 10 MByte default Kafka produce (workers) | Bryan Newbold | 2019-03-06 | 2 | -2/+9
* elastic-release worker w/o API | Bryan Newbold | 2019-03-04 | 1 | -4/+4
* fix elastic research worker api arg | Bryan Newbold | 2019-03-04 | 1 | -4/+3
* include container_id in release ES schema | Bryan Newbold | 2019-02-22 | 1 | -0/+1
* bunch of lint/whitespace cleanups | Bryan Newbold | 2019-02-22 | 9 | -19/+12
* better/additional crossref license lookups | Bryan Newbold | 2019-02-14 | 1 | -20/+58
* crossref: import subtitle as str, not list[str] | Bryan Newbold | 2019-02-14 | 1 | -0/+2
* don't print missing DOIs, just count | Bryan Newbold | 2019-02-05 | 1 | -1/+3