Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | pipenv: update ftfy to 6.x | Bryan Newbold | 2021-11-02 | 1 | -1/+1 |
| | |||||
* | pipenv: add sentry-sdk (to make future raven removal easier) | Bryan Newbold | 2021-11-02 | 1 | -0/+1 |
| | |||||
* | pipenv: pin elasticsearch client to prevent UnsupportedProductError | Bryan Newbold | 2021-11-02 | 1 | -1/+3 |
| | |||||
* | pipenv: additional dev tools (black, types, isort, mypy) | Bryan Newbold | 2021-11-02 | 1 | -0/+11 |
| | |||||
* | small python tweaks for annotations, imports | Bryan Newbold | 2021-11-02 | 3 | -3/+7 |
| | |||||
* | try some type annotations | Bryan Newbold | 2021-11-02 | 4 | -70/+79 |
| | |||||
* | reviewer: add annotations required by mypy | Bryan Newbold | 2021-11-02 | 1 | -2/+3 |
| | |||||
* | temporary hack around filesets.manifest order instability | Bryan Newbold | 2021-11-02 | 1 | -3/+4 |
| | | | | | | May need some change in fatcatd or schema? This isn't a new issue: that part of the schema has been around for a long time and is only being detected now by these tests. | ||||
* | fix missing variable in fileset ingest | Bryan Newbold | 2021-11-02 | 1 | -2/+1 |
| | |||||
* | web: improve git version generation | Bryan Newbold | 2021-11-02 | 1 | -1/+1 |
| | | | | | | This shouldn't change behavior on the `master` branch, but in some cases (unsigned / no-message tags) it should display better short version names in the footer. | ||||
* | Merge branch 'bnewbold-import-fileset' | Bryan Newbold | 2021-11-02 | 9 | -4/+507 |
|\ | |||||
| * | WIP: more fileset ingest | Bryan Newbold | 2021-10-18 | 1 | -13/+21 |
| | | |||||
| * | python: gitignore more | Bryan Newbold | 2021-10-15 | 1 | -0/+2 |
| | | |||||
| * | WIP: rel fixes | Bryan Newbold | 2021-10-14 | 1 | -6/+6 |
| | | |||||
| * | fileset ingest small tweaks | Bryan Newbold | 2021-10-14 | 1 | -21/+36 |
| | | |||||
| * | initial implementation of fileset ingest importers | Bryan Newbold | 2021-10-14 | 3 | -3/+298 |
| | | |||||
| * | ingest: handle datasets, components, other ingest types | Bryan Newbold | 2021-10-14 | 1 | -1/+15 |
| | | |||||
| * | generic fileset importer class, with test coverage | Bryan Newbold | 2021-10-14 | 6 | -0/+169 |
| | | |||||
* | | Merge branch 'bnewbold-match-get' | Bryan Newbold | 2021-11-02 | 4 | -9/+60 |
|\ \ | |||||
| * | | match: fix access_options in return | Bryan Newbold | 2021-10-18 | 1 | -3/+7 |
| | | | |||||
| * | | access: populate thumbnail_url for PDFs | Bryan Newbold | 2021-10-18 | 1 | -3/+9 |
| | | | |||||
| * | | add GET w/ query params to reference match endpoint (and JSON version) | Bryan Newbold | 2021-10-18 | 3 | -6/+47 |
| |/ | |||||
* / | pubmed: switch default http site to retrieve update files | Martin Czygan | 2021-10-15 | 1 | -2/+4 |
|/ | | | | | | | Proxy started to throw: "dial tcp: lookup ftp.ncbi.nlm.nih.gov on [::1]:53: read udp [::1]:45178->[::1]:53: read: connection refused". NIH has an HTTP version of its own; try to use that. | ||||
* | web: minor typo correction | Bryan Newbold | 2021-10-13 | 1 | -1/+1 |
| | |||||
* | web: editor username /u/<username> helper | Bryan Newbold | 2021-10-13 | 2 | -0/+16 |
| | |||||
* | web: container lookup and display features | Bryan Newbold | 2021-10-13 | 3 | -7/+13 |
| | |||||
* | python: additional test coverage for v0.4 changes | Bryan Newbold | 2021-10-13 | 2 | -2/+19 |
| | |||||
* | dblp import: basic support for handles as identifiers | Bryan Newbold | 2021-10-13 | 1 | -1/+5 |
| | |||||
* | python: normalization/validation support for handle identifiers (hdl) | Bryan Newbold | 2021-10-13 | 1 | -0/+33 |
| | |||||
* | dblp import: fix typos in identifier parsing | Bryan Newbold | 2021-10-13 | 1 | -2/+1 |
| | |||||
* | python: partial importer utilization of new schema changes | Bryan Newbold | 2021-10-13 | 3 | -6/+18 |
| | |||||
* | python: test coverage of rust schema changes | Bryan Newbold | 2021-10-13 | 4 | -2/+59 |
| | |||||
* | python: implement ES schema changes | Bryan Newbold | 2021-10-13 | 1 | -4/+17 |
| | |||||
* | web: implement new schema changes | Bryan Newbold | 2021-10-13 | 6 | -11/+45 |
| | |||||
* | Merge branch 'bnewbold-ingest-tweaks' into 'master' | bnewbold | 2021-10-02 | 4 | -39/+139 |
|\ | | | | | | | | | ingest importer behavior tweaks. See merge request webgroup/fatcat!120 | ||||
| * | kafka import: optional 'force-flush' mode for some importers | Bryan Newbold | 2021-10-01 | 2 | -0/+16 |
| | | | | | | | | Behavior and motivation described in the kafka json import comment. | ||||
| * | new SPN web (html) importer | Bryan Newbold | 2021-10-01 | 3 | -27/+111 |
| | | |||||
| * | ingest importer behavior tweaks | Bryan Newbold | 2021-10-01 | 1 | -8/+8 |
| | | | | | | | | | | - change order of 'want()' checks, so that result counts are clearer - don't require GROBID success for file imports with SPN | ||||
| * | importer common: more verbose logging (with counts) | Bryan Newbold | 2021-10-01 | 1 | -4/+4 |
| | | |||||
* | | datacite: skip empty abstracts | Martin Czygan | 2021-10-01 | 4 | -2/+95 |
|/ | | | | | Do not add abstracts where `clean` results in the empty string - this violates a constraint: `either abstract_sha1 or content is required` | ||||
* | default ingest request topic now '-daily'; configurable for ingest_tool.py | Bryan Newbold | 2021-09-30 | 4 | -4/+9 |
| | |||||
* | Merge branch 'martin-pubmed-ftp-extramuros' into 'master' | Martin Czygan | 2021-09-09 | 1 | -24/+21 |
|\ | | | | | | | | | pubmed: workaround a networking issue. See merge request webgroup/fatcat!118 | ||||
| * | pubmed: workaround a networking issue | Martin Czygan | 2021-09-09 | 1 | -24/+21 |
| | | | | | | | | | | | | use an http proxy (https://github.com/miku/ftpup) to fetch files from FTP, keep some retry logic; also, hardcoding the proxy path as this should be a temporary workaround | ||||
* | | trivial blank line lint | Bryan Newbold | 2021-09-08 | 1 | -1/+0 |
|/ | |||||
* | pubmed: add option to ftp download with lftp | Martin Czygan | 2021-09-08 | 1 | -2/+31 |
| | | | | | lftp is a classic command line ftp client, and we hope that its retry capabilities are enough of a workaround for the current networking issue | ||||
* | pubmed harvester: add basic retry logic | Martin Czygan | 2021-08-20 | 1 | -8/+21 |
| | | | | | | | | Related to a previous issue with seemingly random EOFError from FTP connections, this patch wraps the "ftpretr" helper function with a basic retry. Refs: fatcat-workers/issues/92151, fatcat-workers/issues/91102 | ||||
* | web: fix stats rowspan (oops) | Bryan Newbold | 2021-08-12 | 1 | -1/+1 |
| | |||||
* | web: remove confusing 'references' row from stats table | Bryan Newbold | 2021-08-12 | 1 | -3/+0 |
| | | | | Now that we have refcat, which is a different number | ||||
* | refs: default to *not* consolidating works | Bryan Newbold | 2021-08-06 | 1 | -1/+1 |
| | | | | | | | We don't handle counts for consolidated refs yet, so just don't consolidate. This should fix, e.g., "Showing 1-18 of 19" type UX confusion, with the trade-off that some works will be duplicated in inbound ref tables. | ||||
* | web: update front-page static stats | Bryan Newbold | 2021-08-06 | 1 | -3/+3 |
| |