path: root/python/fatcat_tools
Commit message | Author | Age | Files | Lines
* initial arxivraw importer (from parser) | Bryan Newbold | 2019-05-21 | 2 | -0/+299
* clean up JALC importer a tiny bit | Bryan Newbold | 2019-05-21 | 1 | -8/+3
* initial JSTOR importer | Bryan Newbold | 2019-05-21 | 2 | -0/+271
* initial flesh out of JALC parser | Bryan Newbold | 2019-05-21 | 3 | -1/+348
* include creator_ids in release elastic schema | Bryan Newbold | 2019-05-20 | 1 | -0/+6
    Intent is to allow fast creator search/lookup
* include structured contrib names in CDL/dash importer | Bryan Newbold | 2019-05-20 | 1 | -2/+2
* elastic release schema update | Bryan Newbold | 2019-05-20 | 1 | -2/+5
* improved CSL transform (structured author names) | Bryan Newbold | 2019-05-20 | 1 | -12/+11
* make some XXX into TODO | Bryan Newbold | 2019-05-20 | 1 | -2/+2
* fix elastic file pdf check | Bryan Newbold | 2019-05-16 | 1 | -1/+3
* elastic transforms: work around missing pdf mimetypes | Bryan Newbold | 2019-05-15 | 1 | -1/+1
* fix default mimetype (impacted pre-1923 files) | Bryan Newbold | 2019-05-15 | 2 | -4/+9
* python impl | Bryan Newbold | 2019-05-14 | 9 | -32/+38
* python impl | Bryan Newbold | 2019-05-14 | 6 | -16/+16
* python: impl size_bytes -> size | Bryan Newbold | 2019-05-13 | 1 | -1/+1
* importer code updates | Bryan Newbold | 2019-05-13 | 4 | -3/+18
* partial python impl of ext_id and release_stage refactors | Bryan Newbold | 2019-05-13 | 5 | -29/+35
* handle null abstracts for release | Bryan Newbold | 2019-05-07 | 1 | -1/+1
* add limits to match importers | Bryan Newbold | 2019-04-23 | 3 | -2/+27
* archive.org isn't really a repository | Bryan Newbold | 2019-04-22 | 1 | -1/+3
* editgroup description override | Bryan Newbold | 2019-04-22 | 1 | -2/+2
* arabesque importer does require timestamp/wayback | Bryan Newbold | 2019-04-22 | 1 | -0/+3
* matched importer shouldn't require wayback | Bryan Newbold | 2019-04-22 | 1 | -5/+7
* handle API 400 in arabesque import (invalid extid) | Bryan Newbold | 2019-04-19 | 1 | -7/+14
* fix arabesque importer crawl_id None bug | Bryan Newbold | 2019-04-18 | 1 | -1/+1
* mechanism to not double-update entities | Bryan Newbold | 2019-04-18 | 2 | -1/+9
* minor arabesque tweaks | Bryan Newbold | 2019-04-18 | 1 | -0/+2
* update URL rel list | Bryan Newbold | 2019-04-18 | 1 | -1/+10
* arabesque importer does fewer updates | Bryan Newbold | 2019-04-18 | 1 | -1/+8
* arabesque importer | Bryan Newbold | 2019-04-18 | 1 | -0/+165
* early version of arabesque importer | Bryan Newbold | 2019-04-12 | 1 | -0/+1
* add SqlitePusher importer option | Bryan Newbold | 2019-04-12 | 2 | -1/+21
* fix reviewer bugs (thanks pylint) | Bryan Newbold | 2019-04-06 | 1 | -3/+3
* basic dummy review bot | Bryan Newbold | 2019-04-06 | 2 | -0/+239
* improve test coverage | Bryan Newbold | 2019-04-04 | 1 | -0/+1
* increase default harvest window to 14 days | Bryan Newbold | 2019-04-01 | 1 | -2/+2
* fix cdl_dash_dat license_slug | Bryan Newbold | 2019-03-19 | 1 | -7/+3
* importer for CDL/DASH dat pilot dweb datasets | Bryan Newbold | 2019-03-19 | 2 | -0/+200
* new importer: wayback_static | Bryan Newbold | 2019-03-19 | 2 | -0/+237
* expose bibtex and citeproc; revert /unstable/ prefixes | Bryan Newbold | 2019-03-18 | 1 | -1/+1
* refactor and test citeproc code | Bryan Newbold | 2019-03-18 | 2 | -3/+55
* HACK: force pylint to ignore urllib3 Retry import | Bryan Newbold | 2019-03-15 | 1 | -1/+3
    As the code comment mentions, not sure why pylint throws this error. requests and urllib3 are recent, and this code runs fine in tests and QA, and pylint is running (in CI) within pipenv.
* MEDLINE/Pubmed note | Bryan Newbold | 2019-03-15 | 1 | -2/+6
    Also, arXivRaw, not arXiv (though see WIP on more-importers branch)
* more integration of transform refactor | Bryan Newbold | 2019-03-11 | 1 | -2/+2
* refactor transforms into sub-dir | Bryan Newbold | 2019-03-11 | 5 | -193/+206
* basic demo CSL/citeproc transform code | Bryan Newbold | 2019-03-11 | 2 | -1/+166
    Needs tests
* fix harvester session.get() params | Bryan Newbold | 2019-03-06 | 1 | -5/+8
* retry/backoff for Crossref harvester | Bryan Newbold | 2019-03-06 | 2 | -2/+24
* 10 MByte default Kafka produce (workers) | Bryan Newbold | 2019-03-06 | 2 | -2/+9
* elastic-release worker w/o API | Bryan Newbold | 2019-03-04 | 1 | -4/+4
    Forgot that this worker really doesn't want/need any API connection at all; just an ApiClient to deserialize objects from Kafka.