| Commit message | Author | Date | Files | Lines |
|---|---|---|---|---|
| matching: include contribs,files in release entity | Bryan Newbold | 2021-10-27 | 1 | -1/+1 |
| This makes several downstream applications simpler, like showing PDF links without an additional fatcat API fetch. The 'contrib' entities may be required as part of bibliographic matching (checking the creator names as well as the release-local versions of the name). In theory we could add webcaptures and filesets as well, but those are still rare and occasionally result in very large sub-documents. | | | | |
| start larger refactoring: remove cluster | Martin Czygan | 2021-09-24 | 1 | -2/+0 |
| Background: verifying hundreds of millions of documents turned out to be slow. Anecdotally, running clustering and verification over 1.8B inputs took over 50h, whereas the Go port (skate) needed about 2-4h for the same operations. With Go we also do not need the extra GNU parallel wrapping. The fuzzycat refactoring aims to provide: (1) better, more configurable verification and small-scale matching; (2) removal of the batch clustering code (and improved refcat docs); (3) a place for somewhat more generic, similarity-based utilities. The most important piece of fuzzycat is a CSV file of hand-picked verification test examples, together with the code that passes that test suite; we want to make this part more robust. | | | | |
| matching: run an additional es query for fuzzy matching | Martin Czygan | 2021-09-21 | 1 | -1/+73 |
| style: apply formatting | Martin Czygan | 2021-09-21 | 1 | -1/+2 |
| matching: actually return the specified number of results | Martin Czygan | 2021-09-15 | 1 | -2/+2 |
| lint: remove unused imports | Bryan Newbold | 2021-05-31 | 1 | -1/+0 |
| matching: handle extid not found case (fatcat API HTTP 400 or 404) | Bryan Newbold | 2021-05-31 | 1 | -1/+7 |
| dynaconf: switch to fuzzycat.config import across project | Bryan Newbold | 2021-04-13 | 1 | -1/+1 |
| This is the recommended way to use dynaconf. | | | | |
| address es hits.total change in ES7 | Martin Czygan | 2021-04-12 | 1 | -4/+5 |
| See https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking-changes-7.0.html | | | | |
| matching: a list is required | Martin Czygan | 2021-03-16 | 1 | -1/+1 |
| [testing] use api for id lookups | Martin Czygan | 2020-12-23 | 1 | -20/+21 |
| inject configuration | Martin Czygan | 2020-12-23 | 1 | -1/+5 |
| matching: fix import | Martin Czygan | 2020-12-21 | 1 | -0/+1 |
| update docs | Martin Czygan | 2020-12-19 | 1 | -0/+2 |
| update notes | Martin Czygan | 2020-12-17 | 1 | -0/+1 |
| apply style fixes | Martin Czygan | 2020-12-17 | 1 | -8/+4 |
| update docs | Martin Czygan | 2020-12-17 | 1 | -3/+4 |
| pass through api | Martin Czygan | 2020-12-17 | 1 | -9/+13 |
| add missing function | Martin Czygan | 2020-12-16 | 1 | -1/+59 |
| docs and release match command | Martin Czygan | 2020-12-16 | 1 | -1/+1 |
| matching stub | Martin Czygan | 2020-12-15 | 1 | -6/+71 |
| include matching (stub) | Martin Czygan | 2020-12-15 | 1 | -0/+91 |
| large overhaul | Martin Czygan | 2020-08-17 | 1 | -147/+0 |
| Separate all fatcat-related code into a fatcat submodule; add more type annotations; add verify_serial_name for journal names. | | | | |
| adjust formatting | Martin Czygan | 2020-08-12 | 1 | -1/+2 |
| fix imports | Martin Czygan | 2020-08-12 | 1 | -1/+1 |
| improve docs and imports | Martin Czygan | 2020-08-12 | 1 | -9/+8 |
| try: all matching methods should start with match | Martin Czygan | 2020-08-12 | 1 | -1/+1 |
| add matching submodule | Martin Czygan | 2020-08-12 | 1 | -0/+147 |
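
The 2021-04-12 commit "address es hits.total change in ES7" refers to an Elasticsearch 7.0 breaking change: `hits.total` in a search response is no longer a plain integer but an object of the form `{"value": N, "relation": "eq" | "gte"}`. The following is a minimal sketch of reading the count so that it works with both response shapes, assuming the raw response body is available as a Python dict. The helper name `total_hits` is hypothetical and is not taken from the fuzzycat code.

```python
from typing import Any, Dict, Union


def total_hits(response: Dict[str, Any]) -> int:
    """Return the hit count from an Elasticsearch search response body.

    Hypothetical helper: handles both the pre-7.0 integer form of
    ``hits.total`` and the 7.0+ object form.
    """
    total: Union[int, Dict[str, Any]] = response["hits"]["total"]
    if isinstance(total, dict):
        # ES7+: the count is under "value"; "relation" is "eq" for an exact
        # count or "gte" when the count is only a lower bound.
        return int(total["value"])
    # ES6 and earlier: a bare integer.
    return int(total)


# Usage with both response shapes:
print(total_hits({"hits": {"total": 42}}))
print(total_hits({"hits": {"total": {"value": 3, "relation": "eq"}}}))
```

In ES7, the count is exact only up to the `track_total_hits` limit (10,000 by default). Beyond that, `relation` is `"gte"` and `value` is a lower bound. Callers that need an exact figure should check `relation` rather than relying on `value` alone.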