| | Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|---|
* | initial flesh out of JALC parser | Bryan Newbold | 2019-05-21 | 3 | -1/+348 | |
| | ||||||
* | include creator_ids in release elastic schema | Bryan Newbold | 2019-05-20 | 1 | -0/+6 | |
| | Intent is to allow fast creator search/lookup | | | | | |
* | include structured contrib names in CDL/dash importer | Bryan Newbold | 2019-05-20 | 1 | -2/+2 | |
| | ||||||
* | elastic release schema update | Bryan Newbold | 2019-05-20 | 1 | -2/+5 | |
| | ||||||
* | improved CSL transform (structured author names) | Bryan Newbold | 2019-05-20 | 1 | -12/+11 | |
| | ||||||
* | make some XXX into TODO | Bryan Newbold | 2019-05-20 | 1 | -2/+2 | |
| | ||||||
* | fix elastic file pdf check | Bryan Newbold | 2019-05-16 | 1 | -1/+3 | |
| | ||||||
* | elastic transforms: work around missing pdf mimetypes | Bryan Newbold | 2019-05-15 | 1 | -1/+1 | |
| | ||||||
* | fix default mimetype (impacted pre-1923 files) | Bryan Newbold | 2019-05-15 | 2 | -4/+9 | |
| | ||||||
* | python impl | Bryan Newbold | 2019-05-14 | 9 | -32/+38 | |
| | ||||||
* | python impl | Bryan Newbold | 2019-05-14 | 6 | -16/+16 | |
| | ||||||
* | python: impl size_bytes -> size | Bryan Newbold | 2019-05-13 | 1 | -1/+1 | |
| | ||||||
* | importer code updates | Bryan Newbold | 2019-05-13 | 4 | -3/+18 | |
| | ||||||
* | partial python impl of ext_id and release_stage refactors | Bryan Newbold | 2019-05-13 | 5 | -29/+35 | |
| | ||||||
* | handle null abstracts for release | Bryan Newbold | 2019-05-07 | 1 | -1/+1 | |
| | ||||||
* | add limits to match importers | Bryan Newbold | 2019-04-23 | 3 | -2/+27 | |
| | ||||||
* | archive.org isn't really a repository | Bryan Newbold | 2019-04-22 | 1 | -1/+3 | |
| | ||||||
* | editgroup description override | Bryan Newbold | 2019-04-22 | 1 | -2/+2 | |
| | ||||||
* | arabesque importer does require timestamp/wayback | Bryan Newbold | 2019-04-22 | 1 | -0/+3 | |
| | ||||||
* | matched importer shouldn't require wayback | Bryan Newbold | 2019-04-22 | 1 | -5/+7 | |
| | ||||||
* | handle API 400 in arabesque import (invalid extid) | Bryan Newbold | 2019-04-19 | 1 | -7/+14 | |
| | ||||||
* | fix arabesque importer crawl_id None bug | Bryan Newbold | 2019-04-18 | 1 | -1/+1 | |
| | ||||||
* | mechanism to not double-update entities | Bryan Newbold | 2019-04-18 | 2 | -1/+9 | |
| | ||||||
* | minor arabesque tweaks | Bryan Newbold | 2019-04-18 | 1 | -0/+2 | |
| | ||||||
* | update URL rel list | Bryan Newbold | 2019-04-18 | 1 | -1/+10 | |
| | ||||||
* | arabesque importer does fewer updates | Bryan Newbold | 2019-04-18 | 1 | -1/+8 | |
| | ||||||
* | arabesque importer | Bryan Newbold | 2019-04-18 | 1 | -0/+165 | |
| | ||||||
* | early version of arabesque importer | Bryan Newbold | 2019-04-12 | 1 | -0/+1 | |
| | ||||||
* | add SqlitePusher importer option | Bryan Newbold | 2019-04-12 | 2 | -1/+21 | |
| | ||||||
* | fix reviewer bugs (thanks pylint) | Bryan Newbold | 2019-04-06 | 1 | -3/+3 | |
| | ||||||
* | basic dummy review bot | Bryan Newbold | 2019-04-06 | 2 | -0/+239 | |
| | ||||||
* | improve test coverage | Bryan Newbold | 2019-04-04 | 1 | -0/+1 | |
| | ||||||
* | increase default harvest window to 14 days | Bryan Newbold | 2019-04-01 | 1 | -2/+2 | |
| | ||||||
* | fix cdl_dash_dat license_slug | Bryan Newbold | 2019-03-19 | 1 | -7/+3 | |
| | ||||||
* | importer for CDL/DASH dat pilot dweb datasets | Bryan Newbold | 2019-03-19 | 2 | -0/+200 | |
| | ||||||
* | new importer: wayback_static | Bryan Newbold | 2019-03-19 | 2 | -0/+237 | |
| | ||||||
* | expose bibtex and citeproc; revert /unstable/ prefixes | Bryan Newbold | 2019-03-18 | 1 | -1/+1 | |
| | ||||||
* | refactor and test citeproc code | Bryan Newbold | 2019-03-18 | 2 | -3/+55 | |
| | ||||||
* | HACK: force pylint to ignore urllib3 Retry import | Bryan Newbold | 2019-03-15 | 1 | -1/+3 | |
| | As the code comment mentions, not sure why pylint throws this error. requests and urllib3 are recent, and this code runs fine in tests and QA, and pylint is running (in CI) within pipenv. | | | | | |
* | MEDLINE/Pubmed note | Bryan Newbold | 2019-03-15 | 1 | -2/+6 | |
| | Also, arXivRaw, not arXiv (though see WIP on more-importers branch) | | | | | |
* | more integration of transform refactor | Bryan Newbold | 2019-03-11 | 1 | -2/+2 | |
| | ||||||
* | refactor transforms into sub-dir | Bryan Newbold | 2019-03-11 | 5 | -193/+206 | |
| | ||||||
* | basic demo CSL/citeproc transform code | Bryan Newbold | 2019-03-11 | 2 | -1/+166 | |
| | Needs tests | | | | | |
* | fix harvester session.get() params | Bryan Newbold | 2019-03-06 | 1 | -5/+8 | |
| | ||||||
* | retry/backoff for Crossref harvester | Bryan Newbold | 2019-03-06 | 2 | -2/+24 | |
| | ||||||
* | 10 MByte default Kafka produce (workers) | Bryan Newbold | 2019-03-06 | 2 | -2/+9 | |
| | ||||||
* | elastic-release worker w/o API | Bryan Newbold | 2019-03-04 | 1 | -4/+4 | |
| | Forgot that this worker really doesn't want/need any API connection at all; just an ApiClient to deserialize objects from Kafka. | | | | | |
* | fix elastic research worker api arg | Bryan Newbold | 2019-03-04 | 1 | -4/+3 | |
| | ||||||
* | include container_id in release ES schema | Bryan Newbold | 2019-02-22 | 1 | -0/+1 | |
| | ||||||
* | bunch of lint/whitespace cleanups | Bryan Newbold | 2019-02-22 | 9 | -19/+12 | |
| |