path: root/python
Commit message | Author | Age | Files | Lines
* kafka import: optional 'force-flush' mode for some importers | Bryan Newbold | 2021-10-01 | 2 | -0/+16
|     Behavior and motivation described in the kafka json import comment.
* new SPN web (html) importer | Bryan Newbold | 2021-10-01 | 3 | -27/+111
|
* ingest importer behavior tweaks | Bryan Newbold | 2021-10-01 | 1 | -8/+8
|     - change order of 'want()' checks, so that result counts are clearer
|     - don't require GROBID success for file imports with SPN
* importer common: more verbose logging (with counts) | Bryan Newbold | 2021-10-01 | 1 | -4/+4
|
* default ingest request topic now '-daily'; configurable for ingest_tool.py | Bryan Newbold | 2021-09-30 | 4 | -4/+9
|
* Merge branch 'martin-pubmed-ftp-extramuros' into 'master' | Martin Czygan | 2021-09-09 | 1 | -24/+21
|\
| |     pubmed: workaround a networking issue
| |     See merge request webgroup/fatcat!118
| * pubmed: workaround a networking issue | Martin Czygan | 2021-09-09 | 1 | -24/+21
| |     use an http proxy (https://github.com/miku/ftpup) to fetch files from FTP, keep some retry logic; also, hardcoding the proxy path as this should be a temporary workaround
* | trivial blank line lint | Bryan Newbold | 2021-09-08 | 1 | -1/+0
|/
* pubmed: add option to ftp download with lftp | Martin Czygan | 2021-09-08 | 1 | -2/+31
|     lftp is a classic command line ftp client, and we hope that its retry capabilities are enough of a workaround for the current networking issue
* pubmed harvester: add basic retry logic | Martin Czygan | 2021-08-20 | 1 | -8/+21
|     Related to a previous issue with seemingly random EOFError from FTP connections, this patch wraps the "ftpretr" helper function with a basic retry.
|     Refs: fatcat-workers/issues/92151, fatcat-workers/issues/91102
* web: fix stats rowspan (oops) | Bryan Newbold | 2021-08-12 | 1 | -1/+1
|
* web: remove confusing 'references' row from stats table | Bryan Newbold | 2021-08-12 | 1 | -3/+0
|     Now that we have refcat, which is a different number
* refs: default to *not* consolidating works | Bryan Newbold | 2021-08-06 | 1 | -1/+1
|     We don't handle counts for consolidated refs yet, so just don't consolidate. This should fix, eg, "Showing 1-18 of 19" type UX confusion, with the trade-off that some works will be duplicated in inbound ref tables.
* web: update front-page static stats | Bryan Newbold | 2021-08-06 | 1 | -3/+3
|
* refs: format (commas) large refs hit counts | Bryan Newbold | 2021-08-06 | 1 | -1/+1
|
* refs web: correct URL to refs section of guide | Bryan Newbold | 2021-08-04 | 1 | -1/+1
|
* refs: web UI tweaks for iterated CSL schema | Bryan Newbold | 2021-08-03 | 2 | -6/+26
|
* refs: fix typo preventing CSL from rendering in refs output | Bryan Newbold | 2021-07-27 | 1 | -1/+1
|
* refs: start the most basic/minimal web refs test coverage ('integration' level) | Bryan Newbold | 2021-07-27 | 4 | -0/+1094
|
* refs: revert fatcat-pubmed -> pubmed truncation | Bryan Newbold | 2021-07-27 | 1 | -4/+1
|     This was just going to be confusing
* refs: lint fixes | Bryan Newbold | 2021-07-27 | 2 | -2/+3
|
* refs: several small improvements to web UI | Bryan Newbold | 2021-07-27 | 5 | -35/+71
|
* refs: slightly better match form (will change) | Bryan Newbold | 2021-07-27 | 1 | -42/+46
|
* refs: show up to 8 authors in summary tables | Bryan Newbold | 2021-07-27 | 1 | -4/+4
|
* refs: support for wikipedia outbound refs, and display in tables | Bryan Newbold | 2021-07-27 | 4 | -8/+69
|
* refs: fix offset/limit bug | Bryan Newbold | 2021-07-27 | 1 | -1/+1
|
* refs: generalize web endpoints; JSON content negotiation; openlibrary inbound view; etc | Bryan Newbold | 2021-07-23 | 4 | -41/+166
|
* refs: change mind about URL structure again | Bryan Newbold | 2021-07-23 | 1 | -2/+2
|
* web: refactor refs table into separate refs_macros file | Bryan Newbold | 2021-07-23 | 3 | -74/+127
|
* refs: small refactors/tweaks | Bryan Newbold | 2021-07-23 | 1 | -11/+17
|
* remove unused imports (lint) | Bryan Newbold | 2021-07-23 | 3 | -8/+4
|
* web: always log upstream errors (may be redundant) | Bryan Newbold | 2021-07-23 | 1 | -0/+2
|
* pylint: skip pydantic import check (dynamic/extensions) | Bryan Newbold | 2021-07-23 | 2 | -8/+4
|
* refs: refactor web paths; enrich refs as generic; remove old refs link | Bryan Newbold | 2021-07-23 | 4 | -66/+52
|
* refs fetch: add some hacks; sort hits | Bryan Newbold | 2021-07-23 | 1 | -6/+16
|
* release view: improve biblio metadata display in central column | Bryan Newbold | 2021-07-23 | 1 | -13/+14
|
* match UI: improve form layout | Bryan Newbold | 2021-07-23 | 1 | -13/+16
|
* improvements to fuzzy refs view | Bryan Newbold | 2021-07-23 | 3 | -47/+75
|     - fixes to release summary macro
|     - show tab counts correctly by re-using generic entity get helper
|     - table styling; 'prev' link
|     - openlibrary access links
|     - parse-and-match button for unmatched+unstructured refs
* fixes for newer ref index | Bryan Newbold | 2021-07-23 | 2 | -50/+11
|
* web: inbound/outbound refs as links (temporarily); change URL names | Bryan Newbold | 2021-07-23 | 3 | -3/+7
|
* web: initial implementation of fuzzy citation parsing and matching tool | Bryan Newbold | 2021-07-23 | 3 | -0/+173
|
* references: refactor to point to access_options transform; comment out CSL fields | Bryan Newbold | 2021-07-23 | 1 | -57/+8
|
* partial access options transform for releases | Bryan Newbold | 2021-07-23 | 1 | -0/+58
|
* web: template macro to display release entry summary | Bryan Newbold | 2021-07-23 | 1 | -0/+52
|
* first iteration of basic citation inbound/outbound views | Bryan Newbold | 2021-07-23 | 3 | -1/+146
|
* initial inbound/outbound reference query helpers | Bryan Newbold | 2021-07-23 | 1 | -0/+450
|
* pubmed: update docs | Martin Czygan | 2021-07-17 | 1 | -2/+3
|
* pubmed: do not fail when accessing missing file | Martin Czygan | 2021-07-17 | 1 | -2/+8
|     After a sync gap (e.g. 06/07 2021) the harvester wanted to fetch a file that was no longer on the server; do not fail in this case. We'll need to backfill missing records via full data dump.
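The "do not fail on a missing file" behavior could look roughly like the sketch below. This is not the harvester's actual code: `fetch_if_present` is a hypothetical helper name, and it assumes the server answers a request for a missing file with the usual FTP 550 reply, which Python's `ftplib` raises as `error_perm`.

```python
import ftplib


def fetch_if_present(client, path):
    """Return the bytes of `path`, or None if the server says it is missing.

    `client` is assumed to be a connected ftplib.FTP (or any object with a
    compatible `retrbinary(cmd, callback)` method).
    """
    chunks = []
    try:
        client.retrbinary("RETR " + path, chunks.append)
    except ftplib.error_perm as err:
        # Assumption: 550 means "file unavailable"; skip it instead of failing.
        if str(err).startswith("550"):
            return None
        # Any other permanent error is unexpected, so let it propagate.
        raise
    return b"".join(chunks)
```

A caller would treat `None` as "skip this file and record it for a later backfill" rather than aborting the whole harvest run.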
* pubmed: reconnect on error | Martin Czygan | 2021-07-16 | 1 | -4/+30
|     ftp retrieval would run but fail with EOFError on /pubmed/updatefiles/pubmed21n1328_stats.html - not able to find the root cause; using a fresh client, the exact same file would work just fine. So when we retry, we reconnect on failure.
|     Refs: sentry #91102.
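The reconnect-on-failure idea described above can be sketched as follows. This is an illustration under assumptions, not the patch itself: `retrieve_with_reconnect`, the `connect` factory, and the retry parameters are hypothetical names, and the set of caught exceptions is a guess at what a dropped FTP connection may raise.

```python
import ftplib
import time


def retrieve_with_reconnect(connect, path, max_retries=3, delay=0.0):
    """Fetch `path`, opening a fresh connection for every attempt.

    `connect` is a hypothetical zero-argument factory that returns a
    connected client, e.g. a logged-in ftplib.FTP instance.
    """
    last_error = None
    for attempt in range(1, max_retries + 1):
        client = connect()  # never reuse a client that has already failed
        chunks = []
        try:
            client.retrbinary("RETR " + path, chunks.append)
            return b"".join(chunks)
        except (EOFError, ftplib.error_temp, OSError) as err:
            last_error = err
            time.sleep(delay * attempt)  # simple linear backoff
        finally:
            try:
                client.close()
            except Exception:
                pass  # the connection may already be dead
    raise RuntimeError(
        f"giving up on {path} after {max_retries} attempts"
    ) from last_error
```

Passing a factory instead of a client keeps the retry loop free of connection details such as host and login, and makes it straightforward to exercise with a fake client.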
* web: fix flask/werkzeug encoding for mediawiki oauth | Bryan Newbold | 2021-07-13 | 1 | -1/+4
|