path: root/python
Commit message · Author · Age · Files · Lines
* update flake8 configuration · Bryan Newbold · 2021-11-02 · 1 · -9/+15
|
* pipenv: update lockfile · Bryan Newbold · 2021-11-02 · 1 · -404/+728
|
* pipenv: update ftfy to 6.x · Bryan Newbold · 2021-11-02 · 1 · -1/+1
|
* pipenv: add sentry-sdk (to make future raven removal easier) · Bryan Newbold · 2021-11-02 · 1 · -0/+1
|
* pipenv: pin elasticsearch client to prevent UnsupportedProductError · Bryan Newbold · 2021-11-02 · 1 · -1/+3
|
* pipenv: additional dev tools (black, types, isort, mypy) · Bryan Newbold · 2021-11-02 · 1 · -0/+11
|
* small python tweaks for annotations, imports · Bryan Newbold · 2021-11-02 · 3 · -3/+7
|
* try some type annotations · Bryan Newbold · 2021-11-02 · 4 · -70/+79
|
* reviewer: add annotations required by mypy · Bryan Newbold · 2021-11-02 · 1 · -2/+3
|
* temporary hack around filesets.manifest order instability · Bryan Newbold · 2021-11-02 · 1 · -3/+4
|     May need some change in fatcatd or schema? This isn't a new issue; that part of the schema has been around for a long time and is only now being detected by these tests.
* fix missing variable in fileset ingest · Bryan Newbold · 2021-11-02 · 1 · -2/+1
|
* web: improve git version generation · Bryan Newbold · 2021-11-02 · 1 · -1/+1
|     This shouldn't change behavior on the `master` branch, but in some cases (unsigned / no-message tags) it should display better short version names in the footer.
* Merge branch 'bnewbold-import-fileset' · Bryan Newbold · 2021-11-02 · 9 · -4/+507
|\
| * WIP: more fileset ingest · Bryan Newbold · 2021-10-18 · 1 · -13/+21
| |
| * python: gitignore more · Bryan Newbold · 2021-10-15 · 1 · -0/+2
| |
| * WIP: rel fixes · Bryan Newbold · 2021-10-14 · 1 · -6/+6
| |
| * fileset ingest small tweaks · Bryan Newbold · 2021-10-14 · 1 · -21/+36
| |
| * initial implementation of fileset ingest importers · Bryan Newbold · 2021-10-14 · 3 · -3/+298
| |
| * ingest: handle datasets, components, other ingest types · Bryan Newbold · 2021-10-14 · 1 · -1/+15
| |
| * generic fileset importer class, with test coverage · Bryan Newbold · 2021-10-14 · 6 · -0/+169
| |
* | Merge branch 'bnewbold-match-get' · Bryan Newbold · 2021-11-02 · 4 · -9/+60
|\ \
| * | match: fix access_options in return · Bryan Newbold · 2021-10-18 · 1 · -3/+7
| | |
| * | access: populate thumbnail_url for PDFs · Bryan Newbold · 2021-10-18 · 1 · -3/+9
| | |
| * | add GET w/ query params to reference match endpoint (and JSON version) · Bryan Newbold · 2021-10-18 · 3 · -6/+47
| |/
* / pubmed: switch default http site to retrieve update files · Martin Czygan · 2021-10-15 · 1 · -2/+4
|/
|     Proxy started to throw: "dial tcp: lookup ftp.ncbi.nlm.nih.gov on [::1]:53: read udp [::1]:45178->[::1]:53: read: connection refused". NIH has an HTTP version of its own, so try to use that.
* web: minor typo correction · Bryan Newbold · 2021-10-13 · 1 · -1/+1
|
* web: editor username /u/<username> helper · Bryan Newbold · 2021-10-13 · 2 · -0/+16
|
* web: container lookup and display features · Bryan Newbold · 2021-10-13 · 3 · -7/+13
|
* python: additional test coverage for v0.4 changes · Bryan Newbold · 2021-10-13 · 2 · -2/+19
|
* dblp import: basic support for handles as identifiers · Bryan Newbold · 2021-10-13 · 1 · -1/+5
|
* python: normalization/validation support for handle identifiers (hdl) · Bryan Newbold · 2021-10-13 · 1 · -0/+33
|
* dblp import: fix typos in identifier parsing · Bryan Newbold · 2021-10-13 · 1 · -2/+1
|
* python: partial importer utilization of new schema changes · Bryan Newbold · 2021-10-13 · 3 · -6/+18
|
* python: test coverage of rust schema changes · Bryan Newbold · 2021-10-13 · 4 · -2/+59
|
* python: implement ES schema changes · Bryan Newbold · 2021-10-13 · 1 · -4/+17
|
* web: implement new schema changes · Bryan Newbold · 2021-10-13 · 6 · -11/+45
|
* Merge branch 'bnewbold-ingest-tweaks' into 'master' · bnewbold · 2021-10-02 · 4 · -39/+139
|\
| |     ingest importer behavior tweaks
| |     See merge request webgroup/fatcat!120
| * kafka import: optional 'force-flush' mode for some importers · Bryan Newbold · 2021-10-01 · 2 · -0/+16
| |     Behavior and motivation described in the kafka json import comment.
| * new SPN web (html) importer · Bryan Newbold · 2021-10-01 · 3 · -27/+111
| |
| * ingest importer behavior tweaks · Bryan Newbold · 2021-10-01 · 1 · -8/+8
| |     - change order of 'want()' checks, so that result counts are clearer
| |     - don't require GROBID success for file imports with SPN
| * importer common: more verbose logging (with counts) · Bryan Newbold · 2021-10-01 · 1 · -4/+4
| |
* | datacite: skip empty abstracts · Martin Czygan · 2021-10-01 · 4 · -2/+95
|/
|     Do not add abstracts where `clean` results in the empty string, since this violates a constraint: `either abstract_sha1 or content is required`
* default ingest request topic now '-daily'; configurable for ingest_tool.py · Bryan Newbold · 2021-09-30 · 4 · -4/+9
|
* Merge branch 'martin-pubmed-ftp-extramuros' into 'master' · Martin Czygan · 2021-09-09 · 1 · -24/+21
|\
| |     pubmed: workaround a networking issue
| |     See merge request webgroup/fatcat!118
| * pubmed: workaround a networking issue · Martin Czygan · 2021-09-09 · 1 · -24/+21
| |     Use an HTTP proxy (https://github.com/miku/ftpup) to fetch files from FTP, and keep some retry logic. The proxy path is hardcoded, as this should be a temporary workaround.
* | trivial blank line lint · Bryan Newbold · 2021-09-08 · 1 · -1/+0
|/
* pubmed: add option to ftp download with lftp · Martin Czygan · 2021-09-08 · 1 · -2/+31
|     lftp is a classic command-line FTP client, and we hope that its retry capabilities are enough of a workaround for the current networking issue.
* pubmed harvester: add basic retry logic · Martin Czygan · 2021-08-20 · 1 · -8/+21
|     Related to a previous issue with seemingly random EOFError from FTP connections, this patch wraps the "ftpretr" helper function with a basic retry.
|     Refs: fatcat-workers/issues/92151, fatcat-workers/issues/91102
* web: fix stats rowspan (oops) · Bryan Newbold · 2021-08-12 · 1 · -1/+1
|
* web: remove confusing 'references' row from stats table · Bryan Newbold · 2021-08-12 · 1 · -3/+0
|     Now that we have refcat, which is a different number.