Commit message [Author, Age, Files, Lines]
* pubmed: do not fail when accessing missing file [Martin Czygan, 2021-07-17, 1, -2/+8]
|     After a sync gap (e.g. 06/07 2021), the harvester wanted to fetch a file that was no longer on the server. Do not fail in this case; we'll need to backfill missing records via a full data dump.
* Merge branch 'martin-pubmed-eof-sentry-91102' into 'master' [Martin Czygan, 2021-07-16, 1, -4/+30]
|\    pubmed: reconnect on error. See merge request webgroup/fatcat!110
| * pubmed: reconnect on error [Martin Czygan, 2021-07-16, 1, -4/+30]
|/    FTP retrieval would run but fail with EOFError on /pubmed/updatefiles/pubmed21n1328_stats.html. We were not able to find the root cause; using a fresh client, the exact same file would work just fine. So when we retry, we reconnect on failure. Refs: sentry #91102.
* CHANGELOG updates (unreleased) [Bryan Newbold, 2021-07-13, 1, -0/+7]
|
* web: fix flask/werkzeug encoding for mediawiki oauth [Bryan Newbold, 2021-07-13, 1, -1/+4]
|
* web: fix missing ext_ids default for deleted entity view [Bryan Newbold, 2021-07-13, 1, -1/+1]
|
* web: fix 'file' entity edit form links [Bryan Newbold, 2021-07-02, 1, -1/+1]
|
* web: missing trailing parens [Bryan Newbold, 2021-07-02, 1, -1/+1]
|
* web: PMCID external link improvement [Bryan Newbold, 2021-07-02, 2, -2/+2]
|
* Merge branch 'bnewbold-more-doi-lower' into 'master' [Martin Czygan, 2021-07-02, 3, -3/+8]
|\    more consistent and defensive lower-casing of DOIs. See merge request webgroup/fatcat!109
| * more consistent and defensive lower-casing of DOIs [Bryan Newbold, 2021-06-23, 3, -3/+8]
| |   After noticing more upper/lower ambiguity in production. In particular, we have some old ingest requests in sandcrawler DB, which get re-submitted/re-tried, which have capitalized DOIs in the link source id field.
* | tests: small citeproc style changes (to match Pipfile.lock update) [Bryan Newbold, 2021-06-23, 2, -3/+4]
| |
* | pipenv: regenerate lock file [Bryan Newbold, 2021-06-23, 1, -26/+68]
| |
* | pipenv: add pydantic; add surt; narrow dynaconf [Bryan Newbold, 2021-06-23, 1, -1/+3]
| |
* | old dblp hacking notes [Bryan Newbold, 2021-06-23, 1, -0/+72]
|/
* stats snapshot (2021-06-23) [Bryan Newbold, 2021-06-23, 2, -0/+47]
|
* SQL dumps: more pigz (vs. gzip) for speed [Bryan Newbold, 2021-06-17, 1, -2/+2]
|
* fatcat_ref ES schema: more doc_values; source_year not source_release_year [Bryan Newbold, 2021-06-17, 1, -5/+2]
|
* Merge branch 'martin-datacite-none-title-sentry-88350' into 'master' [Martin Czygan, 2021-06-11, 4, -2/+97]
|\    datacite: more careful title string access; fixes sentry #88350. See merge request webgroup/fatcat!108
| * datacite: more careful title string access; fixes sentry #88350 [Martin Czygan, 2021-06-11, 4, -2/+97]
|/    Caused by a partial "title entry without title" coming *first* (e.g. one holding just a language, like {'lang': 'da'}).
* Merge branch 'bnewbold-clean-doi-lower' into 'master' [Martin Czygan, 2021-06-10, 1, -1/+4]
|\    clean_doi() should lower-case returned DOI. See merge request webgroup/fatcat!107
| * clean_doi() should lower-case returned DOI [Bryan Newbold, 2021-06-07, 1, -1/+4]
|/    Code in a number of places (including Pubmed importer) assumed that this was already lower-casing DOIs, resulting in some broken metadata getting created.
|     See also: https://github.com/internetarchive/fatcat/issues/83
|     This is just the first step of mitigation.
* web: fix DOAJ article links (remove trailing slash) [Bryan Newbold, 2021-06-04, 1, -1/+1]
|
* dblp tests: skip redundant seek(0) [Bryan Newbold, 2021-06-03, 1, -6/+1]
|
* ingest: swap ingest and file checks, to result in clearer stats/counts of skipping [Bryan Newbold, 2021-06-03, 1, -2/+2]
|
* ingest: don't accept mag and s2 URLs [Bryan Newbold, 2021-06-03, 1, -4/+4]
|
* update dblp pre-import notes and pipenv python version (3.8) [Bryan Newbold, 2021-06-03, 2, -6/+11]
|
* dblp import notes and bulk edit CHANGELOG update [Bryan Newbold, 2021-06-03, 2, -1/+47]
|
* DOAJ bulk import notes, and update bulk edit changelog [Bryan Newbold, 2021-06-02, 2, -0/+89]
|
* bump fuzzycat dependency to 0.1.21 [Bryan Newbold, 2021-06-02, 2, -20/+18]
|
* web: fix spacing for doaj/dblp identifiers in SERP [Bryan Newbold, 2021-05-31, 1, -1/+1]
|
* ingest: don't 'track_total_hits' for ES 7.x count() [Bryan Newbold, 2021-05-31, 1, -1/+1]
|
* web: bugfix dblp vs. doaj display logic [Bryan Newbold, 2021-05-31, 1, -1/+1]
|
* update fuzzycat to 0.1.20 [Bryan Newbold, 2021-05-31, 2, -31/+94]
|
* Merge branch 'bnewbold-lint-fixes' into 'master' [Martin Czygan, 2021-05-28, 7, -9/+8]
|\    various lint fixes; should un-break CI. See merge request webgroup/fatcat!106
| * makefile: add pylint -E invocation to 'make lint', to match CI [Bryan Newbold, 2021-05-25, 1, -0/+1]
| |
| * skip pylint on 'assigning-non-slot' warnings in Flask 2.0 [Bryan Newbold, 2021-05-25, 1, -2/+2]
| |   The 'permanent' field is still valid to set to a boolean in Flask 2.0; not sure why pylint is unhappy in CI (causing test failures). Don't see any problem running test suite locally.
| |   Flask API docs: https://flask.palletsprojects.com/en/2.0.x/api/?highlight=permanent#flask.session.permanent
| |   And code (recent master branch): https://github.com/pallets/flask/blob/4240ace59710d86c478111affd4ad6fb4c8cad9e/src/flask/sessions.py#L20
| * changelog worker: fix file/fileset typo, caught by lint [Bryan Newbold, 2021-05-25, 1, -1/+1]
| |   This would have been resulting in some releases not getting re-indexed into search.
| * small python lint fixes (no behavior change) [Bryan Newbold, 2021-05-25, 5, -6/+4]
|/
* update CHANGELOG regarding python dependencies; call it v0.3.4 (tag: v0.3.4) [Bryan Newbold, 2021-05-25, 1, -1/+4]
|
* Merge branch 'bnewbold-pallets-updates' into 'master' [Martin Czygan, 2021-05-25, 2, -152/+167]
|\    bump Flask to 2.x; other deps upgraded automatically. See merge request webgroup/fatcat!105
| * bump Flask to 2.x; other deps upgraded automatically [Bryan Newbold, 2021-05-21, 2, -152/+167]
|/
* ingest: add per-container ingest type overrides [Bryan Newbold, 2021-05-21, 2, -1/+23]
|
* fix arabesque sqlite3 examples to have 14-digit timestamps [Bryan Newbold, 2021-05-21, 1, -0/+0]
|
* arabesque importer: ensure full 14-digit timestamps [Bryan Newbold, 2021-05-21, 1, -1/+3]
|
* Andrew W. Mellon Foundation [Bryan Newbold, 2021-05-18, 2, -3/+3]
|
* more interesting example entities (eg, to crawl) [Bryan Newbold, 2021-05-18, 1, -0/+19]
|
* elasticsearch ref schema: 6 shards, not 12 [Bryan Newbold, 2021-05-18, 1, -1/+1]
|
* Merge branch 'bnewbold-pipenv-cleanup' into 'master' [bnewbold, 2021-04-23, 2, -327/+277]
|\    pipenv cleanup. See merge request webgroup/fatcat!104
| * pipenv: re-lock project [Bryan Newbold, 2021-04-19, 1, -301/+253]
| |