| | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | importer code updates | Bryan Newbold | 2019-05-13 | 4 | -3/+18 |
* | partial python impl of ext_id and release_stage refactors | Bryan Newbold | 2019-05-13 | 5 | -29/+35 |
* | handle null abstracts for release | Bryan Newbold | 2019-05-07 | 1 | -1/+1 |
* | add limits to match importers | Bryan Newbold | 2019-04-23 | 3 | -2/+27 |
* | archive.org isn't really a repository | Bryan Newbold | 2019-04-22 | 1 | -1/+3 |
* | editgroup description override | Bryan Newbold | 2019-04-22 | 1 | -2/+2 |
* | arabesque importer does require timestamp/wayback | Bryan Newbold | 2019-04-22 | 1 | -0/+3 |
* | matched importer shouldn't require wayback | Bryan Newbold | 2019-04-22 | 1 | -5/+7 |
* | handle API 400 in arabesque import (invalid extid) | Bryan Newbold | 2019-04-19 | 1 | -7/+14 |
* | fix arabesque importer crawl_id None bug | Bryan Newbold | 2019-04-18 | 1 | -1/+1 |
* | mechanism to not double-update entities | Bryan Newbold | 2019-04-18 | 2 | -1/+9 |
* | minor arabesque tweaks | Bryan Newbold | 2019-04-18 | 1 | -0/+2 |
* | update URL rel list | Bryan Newbold | 2019-04-18 | 1 | -1/+10 |
* | arabesque importer does fewer updates | Bryan Newbold | 2019-04-18 | 1 | -1/+8 |
* | arabesque importer | Bryan Newbold | 2019-04-18 | 1 | -0/+165 |
* | early version of arabesque importer | Bryan Newbold | 2019-04-12 | 1 | -0/+1 |
* | add SqlitePusher importer option | Bryan Newbold | 2019-04-12 | 2 | -1/+21 |
* | fix reviewer bugs (thanks pylint) | Bryan Newbold | 2019-04-06 | 1 | -3/+3 |
* | basic dummy review bot | Bryan Newbold | 2019-04-06 | 2 | -0/+239 |
* | improve test coverage | Bryan Newbold | 2019-04-04 | 1 | -0/+1 |
* | increase default harvest window to 14 days | Bryan Newbold | 2019-04-01 | 1 | -2/+2 |
* | fix cdl_dash_dat license_slug | Bryan Newbold | 2019-03-19 | 1 | -7/+3 |
* | importer for CDL/DASH dat pilot dweb datasets | Bryan Newbold | 2019-03-19 | 2 | -0/+200 |
* | new importer: wayback_static | Bryan Newbold | 2019-03-19 | 2 | -0/+237 |
* | expose bibtex and citeproc; revert /unstable/ prefixes | Bryan Newbold | 2019-03-18 | 1 | -1/+1 |
* | refactor and test citeproc code | Bryan Newbold | 2019-03-18 | 2 | -3/+55 |
* | HACK: force pylint to ignore urllib3 Retry import | Bryan Newbold | 2019-03-15 | 1 | -1/+3 |
| | As the code comment mentions, not sure why pylint throws this error. requests and urllib3 are recent, and this code runs fine in tests and QA, and pylint is running (in CI) within pipenv. | | | | |
* | MEDLINE/Pubmed note | Bryan Newbold | 2019-03-15 | 1 | -2/+6 |
| | Also, arXivRaw, not arXiv (though see WIP on more-importers branch) | | | | |
* | more integration of transform refactor | Bryan Newbold | 2019-03-11 | 1 | -2/+2 |
* | refactor transforms into sub-dir | Bryan Newbold | 2019-03-11 | 5 | -193/+206 |
* | basic demo CSL/citeproc transform code | Bryan Newbold | 2019-03-11 | 2 | -1/+166 |
| | Needs tests | | | | |
* | fix harvester session.get() params | Bryan Newbold | 2019-03-06 | 1 | -5/+8 |
* | retry/backoff for Crossref harvester | Bryan Newbold | 2019-03-06 | 2 | -2/+24 |
* | 10 MByte default Kafka produce (workers) | Bryan Newbold | 2019-03-06 | 2 | -2/+9 |
* | elastic-release worker w/o API | Bryan Newbold | 2019-03-04 | 1 | -4/+4 |
| | Forgot that this worker really doesn't want/need any API connection at all; just an ApiClient to deserialize objects from Kafka. | | | | |
* | fix elastic research worker api arg | Bryan Newbold | 2019-03-04 | 1 | -4/+3 |
* | include container_id in release ES schema | Bryan Newbold | 2019-02-22 | 1 | -0/+1 |
* | bunch of lint/whitespace cleanups | Bryan Newbold | 2019-02-22 | 9 | -19/+12 |
* | better/additional crossref license lookups | Bryan Newbold | 2019-02-14 | 1 | -20/+58 |
* | crossref: import subtitle as str, not list[str] | Bryan Newbold | 2019-02-14 | 1 | -0/+2 |
* | don't print missing DOIs, just count | Bryan Newbold | 2019-02-05 | 1 | -1/+3 |
* | add some missing LICENSE_SLUG_MAP | Bryan Newbold | 2019-02-05 | 1 | -1/+4 |
* | fix missing in_ia_sim flag in release-to-es | Bryan Newbold | 2019-02-04 | 1 | -0/+2 |
* | flag to control boolean cast in elastic transforms | Bryan Newbold | 2019-02-01 | 1 | -13/+29 |
| | So these functions can be re-used in simplified webface rendering. | | | | |
* | yet another required field bug | Bryan Newbold | 2019-01-29 | 1 | -4/+5 |
* | fix null name for container (required) | Bryan Newbold | 2019-01-29 | 1 | -1/+5 |
* | tweaks to GROBID metadata import | Bryan Newbold | 2019-01-29 | 1 | -3/+2 |
* | crossref import tweaks/fixes | Bryan Newbold | 2019-01-29 | 1 | -7/+9 |
| | - refs: article-title not title; save unstructured; authors not author - save 'language' field (already an ISO code) | | | | |
* | fix bug in clean() resulting in many consistency check fails | Bryan Newbold | 2019-01-29 | 2 | -12/+12 |
* | fix refs extra ordering bug | Bryan Newbold | 2019-01-29 | 1 | -6/+6 |