Commit message | Author | Date | Files | Lines |
---|---|---|---|---|
fmt and comments | Bryan Newbold | 2021-12-06 | 1 | -1/+3 |
SIM pipeline: improve exception handling | Bryan Newbold | 2021-12-06 | 1 | -4/+7 |
SIM pipeline: fix bug w/r/t issues with no fatcat coverage at all | Bryan Newbold | 2021-12-06 | 1 | -2/+2 |
SIM pipeline: improve issue skipping (based on suffix) | Bryan Newbold | 2021-12-06 | 1 | -11/+21 |
SIM pipeline: retain only one ulrichs record | Bryan Newbold | 2021-12-06 | 1 | -0/+1 |
lint: small cleanups, mostly E711 and E713 | Bryan Newbold | 2021-10-27 | 1 | -1/+1 |
re-style imports (isort) on all core python files | Bryan Newbold | 2021-10-27 | 1 | -10/+7 |
catch/ignore ChunkedEncoding errors in fetches | Bryan Newbold | 2021-06-11 | 1 | -0/+3 |
schema: add 'crossref' to bundle schema, and add from_json() helper | Bryan Newbold | 2021-06-02 | 1 | -0/+1 |
*from_json() refactor was an earlier TODO, to reduce duplication when updating fields on this class* | | | | |
sim: catch MaxRetryError | Bryan Newbold | 2021-01-31 | 1 | -0/+2 |
enable sentry exceptions for workers and pipelines | Bryan Newbold | 2021-01-30 | 1 | -0/+10 |
*It is otherwise difficult to debug multi-million record pipelines.* | | | | |
sim pipeline: improve exception catching | Bryan Newbold | 2021-01-27 | 1 | -4/+5 |
sim indexing: new parallel fetch structure | Bryan Newbold | 2021-01-26 | 1 | -0/+65 |
commands: show usage on empty command | Bryan Newbold | 2020-11-02 | 1 | -1/+1 |
SIM pipeline: refactor issue item fetching and bundle conversion | Bryan Newbold | 2020-10-16 | 1 | -23/+32 |
json: exclude None in output, and sort keys | Bryan Newbold | 2020-07-27 | 1 | -1/+1 |
*These are both size/performance enhancements. Not including 'None' values will reduce document sizes on-disk and over network, particularly for intermediate objects. Sorting by key should improve compression ratios across multiple documents, both on-disk (gzip) and in elasticsearch itself: <https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-disk-usage.html#_put_fields_in_the_same_order_in_documents>* | | | | |
fix lint errors (and some small bugs) | Bryan Newbold | 2020-06-29 | 1 | -2/+2 |
more flake8 | Bryan Newbold | 2020-06-03 | 1 | -1/+1 |
flake8 fixes (partial) | Bryan Newbold | 2020-06-03 | 1 | -13/+4 |
reformat python code with black | Bryan Newbold | 2020-06-03 | 1 | -45/+65 |
more petabox timeout handling | Bryan Newbold | 2020-05-21 | 1 | -0/+3 |
handle petabox read timeouts a bit | Bryan Newbold | 2020-05-21 | 1 | -1/+6 |
skip SIM items w/o page_numbers (instead of asserting) | Bryan Newbold | 2020-05-20 | 1 | -1/+3 |
first pass transform from pipelines to ES schema | Bryan Newbold | 2020-05-20 | 1 | -4/+8 |
WIP on SIM pipeline | Bryan Newbold | 2020-05-19 | 1 | -0/+173 |
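The 2020-07-27 commit "json: exclude None in output, and sort keys" describes a serialization rule: drop None-valued fields and emit keys in sorted order. Below is a minimal standard-library sketch of that rule, assuming documents are plain nested dicts. The helper names `drop_none` and `to_compact_json` are hypothetical and are not the pipeline's actual code, which may apply the rule through its schema classes instead.

```python
import json
from typing import Any


def drop_none(value: Any) -> Any:
    """Recursively remove dict keys whose value is None (hypothetical helper).

    None items inside lists are kept; only None-valued fields are dropped.
    """
    if isinstance(value, dict):
        return {k: drop_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_none(v) for v in value]
    return value


def to_compact_json(doc: dict) -> str:
    """Serialize without None-valued fields, with keys in sorted order."""
    return json.dumps(drop_none(doc), sort_keys=True)


if __name__ == "__main__":
    a = {"title": "Example", "doi": None, "year": 1999,
         "issue": {"volume": "3", "pages": None}}
    # Same content, different insertion order
    b = {"issue": {"pages": None, "volume": "3"}, "year": 1999,
         "doi": None, "title": "Example"}

    print(to_compact_json(a))
    # {"issue": {"volume": "3"}, "title": "Example", "year": 1999}

    # Sorted keys give identical output regardless of insertion order
    assert to_compact_json(a) == to_compact_json(b)
    # Dropping None fields shrinks the serialized document
    assert len(to_compact_json(a)) < len(json.dumps(a))
```

Sorting keys gives every document the same field order. That consistent order is what the linked Elasticsearch disk-usage guide recommends. How much it improves gzip ratios in practice depends on the data.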