path: root/python
Commit message (Author, Age, Files, Lines)
* web: fix editgroup action/help overlap (Bryan Newbold, 2021-02-26, 1 file, -2/+5)
|
* web: release search (SERP) changes (Bryan Newbold, 2021-02-26, 2 files, -7/+24)
|
|     - show preservation status instead of fulltext tag
|     - more external identifiers
|
* web: release view improvements (Bryan Newbold, 2021-02-26, 1 file, -111/+79)
|
* web: container bar improvements (eg, kbart holdings) (Bryan Newbold, 2021-02-26, 1 file, -8/+27)
|
* web: generic view improvements (entities, lists) (Bryan Newbold, 2021-02-26, 9 files, -69/+113)
|
* web: tweak display of files, webcapture, fileset (Bryan Newbold, 2021-02-26, 3 files, -34/+34)
|
* update SPN text (Bryan Newbold, 2021-02-26, 1 file, -11/+20)
|
* web: updates to homepage (Bryan Newbold, 2021-02-26, 3 files, -11/+54)
|
|     Not sure all of these will stick
|
* web: format search result counts (Bryan Newbold, 2021-02-25, 1 file, -2/+2)
|
* entity metadata template: show 'extra' at the bottom (Bryan Newbold, 2021-02-24, 1 file, -3/+3)
|
* handle no-volumes coverage (Bryan Newbold, 2021-02-24, 1 file, -5/+6)
|
|     Instead of an error (iframe-like), shows a blank "no data" chart.
|
* update homepage stats (Bryan Newbold, 2021-02-24, 1 file, -3/+3)
|
* Merge branch 'master' of github.com:internetarchive/fatcat (Bryan Newbold, 2021-02-24, 1 file, -154/+136)
|\
| * Bump cryptography from 3.3.1 to 3.3.2 in /python (dependabot[bot], 2021-02-10, 1 file, -154/+136)
| |
| |     Bumps [cryptography](https://github.com/pyca/cryptography) from 3.3.1 to 3.3.2.
| |     - [Release notes](https://github.com/pyca/cryptography/releases)
| |     - [Changelog](https://github.com/pyca/cryptography/blob/master/CHANGELOG.rst)
| |     - [Commits](https://github.com/pyca/cryptography/compare/3.3.1...3.3.2)
| |
| |     Signed-off-by: dependabot[bot] <support@github.com>
| |
* | elasticsearch: simple new dblp and doaj fields (Bryan Newbold, 2021-01-20, 1 file, -0/+4)
| |
* | about: small copy edits (Bryan Newbold, 2021-01-15, 1 file, -10/+9)
| |
| |     Thanks Cari S!
| |
* | web: integrity is sha256-HASH, not sha256=HASH (Bryan Newbold, 2021-01-08, 1 file, -2/+2)
|/
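The integrity fix above refers to the Subresource Integrity value format, where the hash algorithm name and the base64-encoded digest are joined by a hyphen rather than an equals sign. A minimal sketch of computing such a value follows; the function name `sri_sha256` is hypothetical and is not taken from the fatcat code.

```python
import base64
import hashlib


def sri_sha256(content: bytes) -> str:
    """Return an integrity value of the form 'sha256-<base64 digest>'.

    Hypothetical helper for illustration only.
    """
    digest = hashlib.sha256(content).digest()
    # The algorithm prefix and the base64 digest are joined with a hyphen.
    return "sha256-" + base64.b64encode(digest).decode("ascii")
```

The returned string is what would go in the `integrity` attribute of a `<script>` or `<link>` tag for the given file contents.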
* Makefile: rename 'dev' to 'serve'; don't run 'lint' for 'test' (Bryan Newbold, 2021-01-05, 1 file, -6/+6)
|
* pipenv: switch to python3.8 (and re-build lock) (Bryan Newbold, 2021-01-05, 2 files, -109/+64)
|
|     This commit has *only* the pipenv change from python3.7 -> python3.8 and
|     lockfile update.
|
* small python 3.7 -> 3.8 tweaks (Bryan Newbold, 2021-01-05, 2 files, -3/+3)
|
* web ingest: terminal URL mismatch as skip, not assert (Bryan Newbold, 2020-12-30, 1 file, -1/+3)
|
* dblp release import: skip arxiv_id releases (Bryan Newbold, 2020-12-24, 1 file, -0/+9)
|
* normalizer: test for un-versioned arxiv_id (Bryan Newbold, 2020-12-24, 1 file, -0/+4)
|
* dblp import: fix arxiv_id typo (Bryan Newbold, 2020-12-23, 1 file, -1/+1)
|
|     Would have been caught by mypy!
|
* ingest: allow dblp imports (Bryan Newbold, 2020-12-23, 1 file, -1/+1)
|
* fuzzy: set 120 second timeout on ES lookups (Bryan Newbold, 2020-12-23, 1 file, -1/+1)
|
* dblp: polish HTML scrape/extract pipeline (Bryan Newbold, 2020-12-17, 1 file, -0/+14)
|
* dblp: flesh out update code path (especially to add container_id linkage) (Bryan Newbold, 2020-12-17, 1 file, -2/+6)
|
* dblp: run fuzzy matching at try_update time (same as DOAJ) (Bryan Newbold, 2020-12-17, 1 file, -1/+8)
|
* improve dblp release import (Bryan Newbold, 2020-12-17, 3 files, -4/+17)
|
* very simple dblp container importer (Bryan Newbold, 2020-12-17, 7 files, -7/+256)
|
* dblp release importer: container_id lookup TSV, and dump JSON mode (Bryan Newbold, 2020-12-17, 2 files, -13/+73)
|
* basic test coverage of dblp release importer (Bryan Newbold, 2020-12-17, 4 files, -0/+503)
|
* wikidata QID normalize helper (Bryan Newbold, 2020-12-17, 1 file, -2/+24)
|
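The log does not show what the QID helper does, so the following is only a sketch of what normalizing a Wikidata QID could look like. The function name `normalize_qid`, the acceptance of full entity URLs, and the rejection of `Q0` or zero-padded IDs are all assumptions made for illustration.

```python
import re
from typing import Optional

# Assumption: a valid QID is "Q" followed by a positive integer, no padding.
_QID_RE = re.compile(r"Q[1-9][0-9]*")


def normalize_qid(raw: Optional[str]) -> Optional[str]:
    """Return a cleaned QID such as 'Q42', or None if the input is not one.

    Hypothetical helper; not the fatcat implementation.
    """
    if not raw:
        return None
    # Tolerate surrounding whitespace and full entity URLs by keeping
    # only the last path segment.
    candidate = raw.strip().rsplit("/", 1)[-1].upper()
    if _QID_RE.fullmatch(candidate):
        return candidate
    return None
```

Returning `None` instead of raising keeps the helper easy to use in import pipelines, where a bad identifier usually means "drop the field" rather than "abort the record".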
* initial implementation of dblp release importer (in progress) (Bryan Newbold, 2020-12-17, 3 files, -0/+474)
|
* add 'lxml' mode for large XML file import, and multi-tags (Bryan Newbold, 2020-12-17, 3 files, -19/+31)
|
* fix sloppy is_preserved ES transform test failure (Bryan Newbold, 2020-12-17, 1 file, -1/+1)
|
* add dblp as an ingest source and identifier (Bryan Newbold, 2020-12-17, 1 file, -1/+2)
|
* ingest: allow doaj ingest responses (Bryan Newbold, 2020-12-17, 1 file, -1/+2)
|
* bug fix: is_preserved should always be bool (Bryan Newbold, 2020-12-17, 1 file, -2/+2)
|
* Merge branch 'bnewbold-doaj-fuzzy' into 'master' (bnewbold, 2020-12-18, 7 files, -267/+544)
|\
| |     DOAJ import fuzzy match filter
| |
| |     See merge request webgroup/fatcat!92
| |
| * update fuzzy helper to pass 'reason' through to import code (Bryan Newbold, 2020-12-17, 2 files, -5/+5)
| |
| |     The motivation for this change is to enable passing the 'reason' through
| |     to edit extra metadata, in cases where we merge or cluster releases.
| |
| * pipenv: bump fuzzycat to 0.1.9 (Bryan Newbold, 2020-12-17, 2 files, -5/+5)
| |
| * add fuzzy match filtering to DOAJ importer (Bryan Newbold, 2020-12-16, 2 files, -4/+23)
| |
| |     In this default configuration, any entities with a fuzzy match (even
| |     "ambiguous") will be skipped at import time, to prevent creating
| |     duplicates. This is conservative towards not creating new/duplicate
| |     entities.
| |
| |     In the future, as we get more confidence in fuzzy match/verification, we
| |     can start to ignore AMBIGUOUS, handle EXACT as same release, and merge
| |     STRONG (and WEAK?) matches under the same work entity.
| |
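The skip policy described in the commit message above can be sketched roughly as follows. This is not the fatcat or fuzzycat code: the `MatchStatus` enum, `DEFAULT_BLOCKING`, and `should_skip` are hypothetical names, and only the four status labels mentioned in the message are modeled.

```python
from enum import Enum
from typing import FrozenSet, Iterable


class MatchStatus(Enum):
    """Status labels named in the commit message (hypothetical enum)."""
    EXACT = "exact"
    STRONG = "strong"
    WEAK = "weak"
    AMBIGUOUS = "ambiguous"


# Conservative default: every kind of fuzzy match blocks the import.
DEFAULT_BLOCKING: FrozenSet[MatchStatus] = frozenset(MatchStatus)


def should_skip(
    statuses: Iterable[MatchStatus],
    blocking: FrozenSet[MatchStatus] = DEFAULT_BLOCKING,
) -> bool:
    """Return True if any candidate match has a status in the blocking set."""
    return any(status in blocking for status in statuses)
```

Making the blocking set a parameter is one way to express the future relaxation the message mentions: passing `DEFAULT_BLOCKING - {MatchStatus.AMBIGUOUS}` would let ambiguous matches through while still skipping on the other statuses.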
| * add fuzzy matching helper to importer base class (Bryan Newbold, 2020-12-16, 3 files, -2/+147)
| |
| |     Using fuzzycat. Add basic test coverage.
| |
| * pipenv: add fuzzycat dependency (Bryan Newbold, 2020-12-16, 2 files, -261/+374)
| |
* | entity update worker: treat fileset and webcapture updates like file updates (Bryan Newbold, 2020-12-16, 1 file, -3/+25)
| |
| |     When webcapture or fileset entities are updated, then the release
| |     entities associated with them also need to be updated (and work
| |     entities, recursively).
| |
| |     A TODO is to handle the case where a release_id is *removed* as well as
| |     *added*, and reprocess the releases in that case as well.
| |
* | fix indentation (Bryan Newbold, 2020-12-16, 1 file, -2/+2)
| |
* | have release elasticsearch transform count webcaptures and filesets towards preservation (Bryan Newbold, 2020-12-16, 1 file, -26/+57)
| |
| |     These are simple/partial changes to have webcaptures and filesets show
| |     up in 'preservation', 'in_ia', and 'in_web' ES schema flags.
| |
| |     A longer-term TODO is to update the ES schema to have more granular
| |     analytic flags.
| |
| |     Also includes a small generalization refactor for URL object parsing
| |     into preservation status, shared across file+fileset+webcapture entity
| |     types (all have similar URL objects with url+rel fields).
| |
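The shared URL parsing mentioned in the commit message above might look something like the sketch below. The commit only states that file, fileset, and webcapture entities carry URL objects with `url` and `rel` fields and that the flags are named `in_ia` and `in_web`; the rules used here (an archive.org hostname check and the `WEB_RELS` set) and the function name `url_flags` are assumptions for illustration.

```python
from typing import Dict, Iterable
from urllib.parse import urlparse

# Assumption: these 'rel' values count as "on the live web".
WEB_RELS = {"web", "publisher", "repository"}


def url_flags(urls: Iterable[Dict[str, str]]) -> Dict[str, bool]:
    """Derive 'in_ia' and 'in_web' flags from url+rel objects.

    Hypothetical helper; the real transform may use different rules.
    """
    flags = {"in_ia": False, "in_web": False}
    for entry in urls:
        host = urlparse(entry.get("url", "")).hostname or ""
        # Assumption: any archive.org host counts as held by the Internet Archive.
        if host == "archive.org" or host.endswith(".archive.org"):
            flags["in_ia"] = True
        elif entry.get("rel") in WEB_RELS:
            flags["in_web"] = True
    return flags
```

Because the function only looks at `url` and `rel`, the same call would work on the URL list of any of the three entity types, which is the point of the refactor the message describes.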
* | improve release elasticsearch transform test coverage (Bryan Newbold, 2020-12-16, 3 files, -11/+86)
| |