| | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | fix missing KafkaException harvester imports [confluent-kafka] | Bryan Newbold | 2019-04-08 | 2 | -2/+2 |
* | convert pipeline workers from pykafka to confluent-kafka | Bryan Newbold | 2019-04-08 | 3 | -109/+227 |
* | small kafka tweaks for robustness | Bryan Newbold | 2019-04-08 | 2 | -0/+5 |
* | convert importers to confluent-kafka library | Bryan Newbold | 2019-04-08 | 1 | -19/+72 |
* | bump max message size to ~20 MBytes | Bryan Newbold | 2019-04-08 | 2 | -0/+2 |
* | fixes to confluent-kafka harvesters | Bryan Newbold | 2019-04-08 | 3 | -20/+21 |
* | first draft harvesters using confluent-kafka | Bryan Newbold | 2019-04-06 | 3 | -48/+104 |
* | improve test coverage | Bryan Newbold | 2019-04-04 | 1 | -0/+1 |
* | increase default harvest window to 14 days | Bryan Newbold | 2019-04-01 | 1 | -2/+2 |
* | fix cdl_dash_dat license_slug | Bryan Newbold | 2019-03-19 | 1 | -7/+3 |
* | importer for CDL/DASH dat pilot dweb datasets | Bryan Newbold | 2019-03-19 | 2 | -0/+200 |
* | new importer: wayback_static | Bryan Newbold | 2019-03-19 | 2 | -0/+237 |
* | expose bibtex and citeproc; revert /unstable/ prefixes | Bryan Newbold | 2019-03-18 | 1 | -1/+1 |
* | refactor and test citeproc code | Bryan Newbold | 2019-03-18 | 2 | -3/+55 |
* | HACK: force pylint to ignore urllib3 Retry import | Bryan Newbold | 2019-03-15 | 1 | -1/+3 |
| | As the code comment mentions, not sure why pylint throws this error. requests and urllib3 are recent, and this code runs fine in tests and QA, and pylint is running (in CI) within pipenv. | | | | |
* | MEDLINE/Pubmed note | Bryan Newbold | 2019-03-15 | 1 | -2/+6 |
| | Also, arXivRaw, not arXiv (though see WIP on more-importers branch) | | | | |
* | more integration of transform refactor | Bryan Newbold | 2019-03-11 | 1 | -2/+2 |
* | refactor transforms into sub-dir | Bryan Newbold | 2019-03-11 | 5 | -193/+206 |
* | basic demo CSL/citeproc transform code | Bryan Newbold | 2019-03-11 | 2 | -1/+166 |
| | Needs tests | | | | |
* | fix harvester session.get() params | Bryan Newbold | 2019-03-06 | 1 | -5/+8 |
* | retry/backoff for Crossref harvester | Bryan Newbold | 2019-03-06 | 2 | -2/+24 |
* | 10 MByte default Kafka produce (workers) | Bryan Newbold | 2019-03-06 | 2 | -2/+9 |
* | elastic-release worker w/o API | Bryan Newbold | 2019-03-04 | 1 | -4/+4 |
| | Forgot that this worker really doesn't want/need any API connection at all; just an ApiClient to deserialize objects from Kafka. | | | | |
* | fix elastic research worker api arg | Bryan Newbold | 2019-03-04 | 1 | -4/+3 |
* | include container_id in release ES schema | Bryan Newbold | 2019-02-22 | 1 | -0/+1 |
* | bunch of lint/whitespace cleanups | Bryan Newbold | 2019-02-22 | 9 | -19/+12 |
* | better/additional crossref license lookups | Bryan Newbold | 2019-02-14 | 1 | -20/+58 |
* | crossref: import subtitle as str, not list[str] | Bryan Newbold | 2019-02-14 | 1 | -0/+2 |
* | don't print missing DOIs, just count | Bryan Newbold | 2019-02-05 | 1 | -1/+3 |
* | add some missing LICENSE_SLUG_MAP | Bryan Newbold | 2019-02-05 | 1 | -1/+4 |
* | fix missing in_ia_sim flag in release-to-es | Bryan Newbold | 2019-02-04 | 1 | -0/+2 |
* | flag to control boolean cast in elastic transforms | Bryan Newbold | 2019-02-01 | 1 | -13/+29 |
| | So these functions can be re-used in simplified webface rendering. | | | | |
* | yet another required field bug | Bryan Newbold | 2019-01-29 | 1 | -4/+5 |
* | fix null name for container (required) | Bryan Newbold | 2019-01-29 | 1 | -1/+5 |
* | tweaks to GROBID metadata import | Bryan Newbold | 2019-01-29 | 1 | -3/+2 |
* | crossref import tweaks/fixes | Bryan Newbold | 2019-01-29 | 1 | -7/+9 |
| | - refs: article-title not title; save unstructured; authors not author - save 'language' field (already an ISO code) | | | | |
* | fix bug in clean() resulting in many consistency check fails | Bryan Newbold | 2019-01-29 | 2 | -12/+12 |
* | fix refs extra ordering bug | Bryan Newbold | 2019-01-29 | 1 | -6/+6 |
* | pass through kwargs (fixes bezerk imports) | Bryan Newbold | 2019-01-29 | 5 | -5/+10 |
* | ensure raw_name is not stub | Bryan Newbold | 2019-01-29 | 1 | -1/+4 |
* | ensure abstracts aren't stubs | Bryan Newbold | 2019-01-29 | 1 | -2/+3 |
* | add stub parse_record() to make pylint happy | Bryan Newbold | 2019-01-28 | 1 | -0/+4 |
* | elastic doesn't do well with nullables | Bryan Newbold | 2019-01-28 | 1 | -14/+14 |
* | fix title length checks in crossref | Bryan Newbold | 2019-01-28 | 1 | -2/+2 |
* | fix rel/url order swap | Bryan Newbold | 2019-01-28 | 1 | -1/+1 |
* | remove accidental print in release transform | Bryan Newbold | 2019-01-28 | 1 | -1/+0 |
* | don't allow empty or single-character clean strings | Bryan Newbold | 2019-01-28 | 1 | -1/+1 |
* | filter short/stub original_title | Bryan Newbold | 2019-01-28 | 1 | -3/+7 |
* | fix typo in container transform | Bryan Newbold | 2019-01-28 | 1 | -1/+1 |
* | fixes to transform code | Bryan Newbold | 2019-01-28 | 1 | -9/+11 |