path: root/python/fatcat_tools
Commit message | Author | Date | Files | Lines
* fix missing KafkaException harvester imports [confluent-kafka] | Bryan Newbold | 2019-04-08 | 2 | -2/+2
* convert pipeline workers from pykafka to confluent-kafka | Bryan Newbold | 2019-04-08 | 3 | -109/+227
* small kafka tweaks for robustness | Bryan Newbold | 2019-04-08 | 2 | -0/+5
* convert importers to confluent-kafka library | Bryan Newbold | 2019-04-08 | 1 | -19/+72
* bump max message size to ~20 MBytes | Bryan Newbold | 2019-04-08 | 2 | -0/+2
* fixes to confluent-kafka harvesters | Bryan Newbold | 2019-04-08 | 3 | -20/+21
* first draft harvesters using confluent-kafka | Bryan Newbold | 2019-04-06 | 3 | -48/+104
* improve test coverage | Bryan Newbold | 2019-04-04 | 1 | -0/+1
* increase default harvest window to 14 days | Bryan Newbold | 2019-04-01 | 1 | -2/+2
* fix cdl_dash_dat license_slug | Bryan Newbold | 2019-03-19 | 1 | -7/+3
* importer for CDL/DASH dat pilot dweb datasets | Bryan Newbold | 2019-03-19 | 2 | -0/+200
* new importer: wayback_static | Bryan Newbold | 2019-03-19 | 2 | -0/+237
* expose bibtex and citeproc; revert /unstable/ prefixes | Bryan Newbold | 2019-03-18 | 1 | -1/+1
* refactor and test citeproc code | Bryan Newbold | 2019-03-18 | 2 | -3/+55
* HACK: force pylint to ignore urllib3 Retry import | Bryan Newbold | 2019-03-15 | 1 | -1/+3
    As the code comment mentions, not sure why pylint throws this error. requests and urllib3 are recent, and this code runs fine in tests and QA, and pylint is running (in CI) within pipenv.
* MEDLINE/Pubmed note | Bryan Newbold | 2019-03-15 | 1 | -2/+6
    Also, arXivRaw, not arXiv (though see WIP on more-importers branch)
* more integration of transform refactor | Bryan Newbold | 2019-03-11 | 1 | -2/+2
* refactor transforms into sub-dir | Bryan Newbold | 2019-03-11 | 5 | -193/+206
* basic demo CSL/citeproc transform code | Bryan Newbold | 2019-03-11 | 2 | -1/+166
    Needs tests
* fix harvester session.get() params | Bryan Newbold | 2019-03-06 | 1 | -5/+8
* retry/backoff for Crossref harvester | Bryan Newbold | 2019-03-06 | 2 | -2/+24
* 10 MByte default Kafka produce (workers) | Bryan Newbold | 2019-03-06 | 2 | -2/+9
* elastic-release worker w/o API | Bryan Newbold | 2019-03-04 | 1 | -4/+4
    Forgot that this worker really doesn't want/need any API connection at all; just an ApiClient to deserialize objects from Kafka.
* fix elastic research worker api arg | Bryan Newbold | 2019-03-04 | 1 | -4/+3
* include container_id in release ES schema | Bryan Newbold | 2019-02-22 | 1 | -0/+1
* bunch of lint/whitespace cleanups | Bryan Newbold | 2019-02-22 | 9 | -19/+12
* better/additional crossref license lookups | Bryan Newbold | 2019-02-14 | 1 | -20/+58
* crossref: import subtitle as str, not list[str] | Bryan Newbold | 2019-02-14 | 1 | -0/+2
* don't print missing DOIs, just count | Bryan Newbold | 2019-02-05 | 1 | -1/+3
* add some missing LICENSE_SLUG_MAP | Bryan Newbold | 2019-02-05 | 1 | -1/+4
* fix missing in_ia_sim flag in release-to-es | Bryan Newbold | 2019-02-04 | 1 | -0/+2
* flag to control boolean cast in elastic transforms | Bryan Newbold | 2019-02-01 | 1 | -13/+29
    So these functions can be re-used in simplified webface rendering.
* yet another required field bug | Bryan Newbold | 2019-01-29 | 1 | -4/+5
* fix null name for container (required) | Bryan Newbold | 2019-01-29 | 1 | -1/+5
* tweaks to GROBID metadata import | Bryan Newbold | 2019-01-29 | 1 | -3/+2
* crossref import tweaks/fixes | Bryan Newbold | 2019-01-29 | 1 | -7/+9
    - refs: article-title not title; save unstructured; authors not author
    - save 'language' field (already an ISO code)
* fix bug in clean() resulting in many consistency check fails | Bryan Newbold | 2019-01-29 | 2 | -12/+12
* fix refs extra ordering bug | Bryan Newbold | 2019-01-29 | 1 | -6/+6
* pass through kwargs (fixes bezerk imports) | Bryan Newbold | 2019-01-29 | 5 | -5/+10
* ensure raw_name is not stub | Bryan Newbold | 2019-01-29 | 1 | -1/+4
* ensure abstracts aren't stubs | Bryan Newbold | 2019-01-29 | 1 | -2/+3
* add stub parse_record() to make pylint happy | Bryan Newbold | 2019-01-28 | 1 | -0/+4
* elastic doesn't do well with nullables | Bryan Newbold | 2019-01-28 | 1 | -14/+14
* fix title length checks in crossref | Bryan Newbold | 2019-01-28 | 1 | -2/+2
* fix rel/url order swap | Bryan Newbold | 2019-01-28 | 1 | -1/+1
* remove accidental print in release transform | Bryan Newbold | 2019-01-28 | 1 | -1/+0
* don't allow empty or single-character clean strings | Bryan Newbold | 2019-01-28 | 1 | -1/+1
* filter short/stub original_title | Bryan Newbold | 2019-01-28 | 1 | -3/+7
* fix typo in container transform | Bryan Newbold | 2019-01-28 | 1 | -1/+1
* fixes to transform code | Bryan Newbold | 2019-01-28 | 1 | -9/+11