Commit message | Author | Age | Files | Lines | ||
---|---|---|---|---|---|---|
... | ||||||
* | web: release search (SERP) changes | Bryan Newbold | 2021-02-26 | 2 | -7/+24 | |
| | | | | | show preservation status instead of fulltext tag; more external identifiers | |||||
* | web: release view improvements | Bryan Newbold | 2021-02-26 | 1 | -111/+79 | |
| | ||||||
* | web: container bar improvements (eg, kbart holdings) | Bryan Newbold | 2021-02-26 | 1 | -8/+27 | |
| | ||||||
* | web: generic view improvements (entities, lists) | Bryan Newbold | 2021-02-26 | 9 | -69/+113 | |
| | ||||||
* | web: tweak display of files, webcapture, fileset | Bryan Newbold | 2021-02-26 | 3 | -34/+34 | |
| | ||||||
* | update SPN text | Bryan Newbold | 2021-02-26 | 1 | -11/+20 | |
| | ||||||
* | web: updates to homepage | Bryan Newbold | 2021-02-26 | 3 | -11/+54 | |
| | | | | Not sure all of these will stick | |||||
* | web: format search result counts | Bryan Newbold | 2021-02-25 | 1 | -2/+2 | |
| | ||||||
* | entity metadata template: show 'extra' at the bottom | Bryan Newbold | 2021-02-24 | 1 | -3/+3 | |
| | ||||||
* | handle no-volumes coverage | Bryan Newbold | 2021-02-24 | 1 | -5/+6 | |
| | | | | Instead of an error (iframe-like), shows a blank "no data" chart. | |||||
* | update homepage stats | Bryan Newbold | 2021-02-24 | 1 | -3/+3 | |
| | ||||||
* | Merge branch 'master' of github.com:internetarchive/fatcat | Bryan Newbold | 2021-02-24 | 1 | -154/+136 | |
|\ | ||||||
| * | Bump cryptography from 3.3.1 to 3.3.2 in /python | dependabot[bot] | 2021-02-10 | 1 | -154/+136 | |
| | | | | | | | | | | | | | | | | Bumps [cryptography](https://github.com/pyca/cryptography) from 3.3.1 to 3.3.2. - [Release notes](https://github.com/pyca/cryptography/releases) - [Changelog](https://github.com/pyca/cryptography/blob/master/CHANGELOG.rst) - [Commits](https://github.com/pyca/cryptography/compare/3.3.1...3.3.2) Signed-off-by: dependabot[bot] <support@github.com> | |||||
* | | elasticsearch: simple new dblp and doaj fields | Bryan Newbold | 2021-01-20 | 1 | -0/+4 | |
| | | ||||||
* | | about: small copy edits | Bryan Newbold | 2021-01-15 | 1 | -10/+9 | |
| | | | | | | | | Thanks Cari S! | |||||
* | | web: integrity is sha256-HASH, not sha256=HASH | Bryan Newbold | 2021-01-08 | 1 | -2/+2 | |
|/ | ||||||
* | Makefile: rename 'dev' to 'serve'; don't run 'lint' for 'test' | Bryan Newbold | 2021-01-05 | 1 | -6/+6 | |
| | ||||||
* | pipenv: switch to python3.8 (and re-build lock) | Bryan Newbold | 2021-01-05 | 2 | -109/+64 | |
| | | | | | This commit has *only* the pipenv change from python3.7 -> python3.8 and lockfile update. | |||||
* | small python 3.7 -> 3.8 tweaks | Bryan Newbold | 2021-01-05 | 2 | -3/+3 | |
| | ||||||
* | web ingest: terminal URL mismatch as skip, not assert | Bryan Newbold | 2020-12-30 | 1 | -1/+3 | |
| | ||||||
* | dblp release import: skip arxiv_id releases | Bryan Newbold | 2020-12-24 | 1 | -0/+9 | |
| | ||||||
* | normalizer: test for un-versioned arxiv_id | Bryan Newbold | 2020-12-24 | 1 | -0/+4 | |
| | ||||||
* | dblp import: fix arxiv_id typo | Bryan Newbold | 2020-12-23 | 1 | -1/+1 | |
| | | | | Would have been caught by mypy! | |||||
* | ingest: allow dblp imports | Bryan Newbold | 2020-12-23 | 1 | -1/+1 | |
| | ||||||
* | fuzzy: set 120 second timeout on ES lookups | Bryan Newbold | 2020-12-23 | 1 | -1/+1 | |
| | ||||||
* | dblp: polish HTML scrape/extract pipeline | Bryan Newbold | 2020-12-17 | 1 | -0/+14 | |
| | ||||||
* | dblp: flesh out update code path (especially to add container_id linkage) | Bryan Newbold | 2020-12-17 | 1 | -2/+6 | |
| | ||||||
* | dblp: run fuzzy matching at try_update time (same as DOAJ) | Bryan Newbold | 2020-12-17 | 1 | -1/+8 | |
| | ||||||
* | improve dblp release import | Bryan Newbold | 2020-12-17 | 3 | -4/+17 | |
| | ||||||
* | very simple dblp container importer | Bryan Newbold | 2020-12-17 | 7 | -7/+256 | |
| | ||||||
* | dblp release importer: container_id lookup TSV, and dump JSON mode | Bryan Newbold | 2020-12-17 | 2 | -13/+73 | |
| | ||||||
* | basic test coverage of dblp release importer | Bryan Newbold | 2020-12-17 | 4 | -0/+503 | |
| | ||||||
* | wikidata QID normalize helper | Bryan Newbold | 2020-12-17 | 1 | -2/+24 | |
| | ||||||
* | initial implementation of dblp release importer (in progress) | Bryan Newbold | 2020-12-17 | 3 | -0/+474 | |
| | ||||||
* | add 'lxml' mode for large XML file import, and multi-tags | Bryan Newbold | 2020-12-17 | 3 | -19/+31 | |
| | ||||||
* | fix sloppy is_preserved ES transform test failure | Bryan Newbold | 2020-12-17 | 1 | -1/+1 | |
| | ||||||
* | add dblp as an ingest source and identifier | Bryan Newbold | 2020-12-17 | 1 | -1/+2 | |
| | ||||||
* | ingest: allow doaj ingest responses | Bryan Newbold | 2020-12-17 | 1 | -1/+2 | |
| | ||||||
* | bug fix: is_preserved should always be bool | Bryan Newbold | 2020-12-17 | 1 | -2/+2 | |
| | ||||||
* | Merge branch 'bnewbold-doaj-fuzzy' into 'master' | bnewbold | 2020-12-18 | 7 | -267/+544 | |
|\ | | | | | | | | | DOAJ import fuzzy match filter. See merge request webgroup/fatcat!92 | |||||
| * | update fuzzy helper to pass 'reason' through to import code | Bryan Newbold | 2020-12-17 | 2 | -5/+5 | |
| | | | | | | | | | | The motivation for this change is to enable passing the 'reason' through to edit extra metadata, in cases where we merge or cluster releases. | |||||
| * | pipenv: bump fuzzycat to 0.1.9 | Bryan Newbold | 2020-12-17 | 2 | -5/+5 | |
| | | ||||||
| * | add fuzzy match filtering to DOAJ importer | Bryan Newbold | 2020-12-16 | 2 | -4/+23 | |
| | | | | | | | | | | | | | | | | | | | | | | In this default configuration, any entities with a fuzzy match (even "ambiguous") will be skipped at import time, to prevent creating duplicates. This is conservative towards not creating new/duplicate entities. In the future, as we get more confidence in fuzzy match/verification, we can start to ignore AMBIGUOUS, handle EXACT as same release, and merge STRONG (and WEAK?) matches under the same work entity. | |||||
| * | add fuzzy matching helper to importer base class | Bryan Newbold | 2020-12-16 | 3 | -2/+147 | |
| | | | | | | | | Using fuzzycat. Add basic test coverage. | |||||
| * | pipenv: add fuzzycat dependency | Bryan Newbold | 2020-12-16 | 2 | -261/+374 | |
| | | ||||||
* | | entity update worker: treat fileset and webcapture updates like file updates | Bryan Newbold | 2020-12-16 | 1 | -3/+25 | |
| | | | | | | | | | | | | | | | | | | When webcapture or fileset entities are updated, then the release entities associated with them also need to be updated (and work entities, recursively). A TODO is to handle the case where a release_id is *removed* as well as *added*, and reprocess the releases in that case as well. | |||||
* | | fix indentation | Bryan Newbold | 2020-12-16 | 1 | -2/+2 | |
| | | ||||||
* | | have release elasticsearch transform count webcaptures and filesets towards preservation | Bryan Newbold | 2020-12-16 | 1 | -26/+57 | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | These are simple/partial changes to have webcaptures and filesets show up in 'preservation', 'in_ia', and 'in_web' ES schema flags. A longer-term TODO is to update the ES schema to have more granular analytic flags. Also includes a small generalization refactor for URL object parsing into preservation status, shared across file+fileset+webcapture entity types (all have similar URL objects with url+rel fields). | |||||
* | | improve release elasticsearch transform test coverage | Bryan Newbold | 2020-12-16 | 3 | -11/+86 | |
| | | ||||||
* | | small release_to_elasticsearch refactors | Bryan Newbold | 2020-12-16 | 1 | -7/+12 | |
| | | | | | | | | | | | | | | These should have almost no change in behavior, but improve code quality. The one behavior change is counting ftp URLs as "in_web" |