| | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | turn "match_release_fuzzy" into a class | Martin Czygan | 2021-11-16 | 1 | -4/+3 |
| | The goal of this refactoring was to make the matching process a bit more configurable by using a class and a cascade of queries. For a limited test set, `FuzzyReleaseMatcher.match` works the same as `match_release_fuzzy`. | | | | |
* | start larger refactoring: remove cluster | Martin Czygan | 2021-09-24 | 1 | -8/+8 |
| | Background: verifying hundreds of millions of documents turned out to be a bit slow. Anecdotally, running clustering and verification over 1.8B inputs took over 50h, while the Go port (skate) required about 2-4h for the same operations. Also, with Go we do not need the extra GNU parallel wrapping. In any case, we aim for the fuzzycat refactoring to provide (1) better, more configurable verification and small-scale matching, (2) removal of the batch clustering code (and improvements to the refcat docs), and (3) a place for somewhat more generic, similarity-based utils. The most important piece in fuzzycat is a CSV file containing hand-picked test examples for verification, together with the code that is able to pass that test suite. We want to make this part more robust. | | | | |
* | style: apply formatting | Martin Czygan | 2021-09-21 | 1 | -5/+4 |
* | DOI clean/normalize helper; and use in verification etc | Bryan Newbold | 2021-07-01 | 1 | -3/+3 |
* | verify: page count parsing and comparison improvements | Bryan Newbold | 2021-07-01 | 1 | -1/+3 |
* | workaround for a case found in refs: | Martin Czygan | 2021-02-15 | 1 | -0/+6 |
| | https://fatcat.wiki/release/2n7pyugxenb73gope52bn6m2ru and https://fatcat.wiki/release/p4bettvcszgn5d3zls5ogdjk4u; Refs: Niaudet P. Steroid-sensitive idiopathic nephrotic syndrome in children. Pediatric Nephrology. 5th ed. Philadelphia: Lippincott Williams & Wilkins, 2004; pp 543–556; Doc: https://fatcat.wiki/release/lc3d5q62zfa2rjyk2m7nr346nm, "T-lymphocyte activation in steroid-sensitive nephrotic syndrome in childhood", by T J Neuhaus, V Shah, R E Callard, T M Barratt | | | | |
* | format docs | Martin Czygan | 2021-01-09 | 1 | -1/+2 |
* | case: translation in title | Martin Czygan | 2021-01-08 | 1 | -0/+10 |
* | add cases | Martin Czygan | 2021-01-04 | 1 | -3/+5 |
* | tweak year a bit more, add case | Martin Czygan | 2020-12-24 | 1 | -1/+1 |
* | tweak comparison, add test | Martin Czygan | 2020-12-24 | 1 | -1/+1 |
* | matching: fix import | Martin Czygan | 2020-12-21 | 1 | -2/+3 |
* | add verify_release_entities wrapper | Martin Czygan | 2020-12-19 | 1 | -1/+6 |
* | rename reason: dummy to unknown | Martin Czygan | 2020-12-18 | 1 | -2/+2 |
* | update notes | Martin Czygan | 2020-12-17 | 1 | -0/+2 |
* | remove obsolete assert | Martin Czygan | 2020-12-17 | 1 | -3/+0 |
* | check doi existence | Martin Czygan | 2020-12-17 | 1 | -1/+3 |
* | update stats | Martin Czygan | 2020-12-17 | 1 | -41/+40 |
* | update docs | Martin Czygan | 2020-12-17 | 1 | -8/+3 |
* | add flags | Martin Czygan | 2020-12-17 | 1 | -1/+1 |
* | single item verification | Martin Czygan | 2020-12-15 | 1 | -48/+50 |
* | update docs | Martin Czygan | 2020-12-14 | 1 | -0/+7 |
* | verify: move out some code to utils | Martin Czygan | 2020-12-14 | 1 | -16/+6 |
* | update docs | Martin Czygan | 2020-12-12 | 1 | -0/+1 |
* | update docs | Martin Czygan | 2020-12-12 | 1 | -15/+28 |
* | get rid of magic strings | Martin Czygan | 2020-12-12 | 1 | -9/+12 |
* | update readme | Martin Czygan | 2020-12-12 | 1 | -45/+56 |
* | fix imports | Martin Czygan | 2020-12-12 | 1 | -1/+1 |
* | move helper function into method | Martin Czygan | 2020-12-12 | 1 | -4/+3 |
* | add type hint | Martin Czygan | 2020-12-12 | 1 | -1/+2 |
* | get rid of 'ok' and 'miss' | Martin Czygan | 2020-12-12 | 1 | -52/+52 |
* | add generic doi version case | Martin Czygan | 2020-12-11 | 1 | -17/+16 |
* | 158 cases | Martin Czygan | 2020-12-10 | 1 | -0/+2 |
* | add cases | Martin Czygan | 2020-12-10 | 1 | -0/+22 |
* | add versioned doi pattern | Martin Czygan | 2020-12-10 | 1 | -1/+12 |
* | pmid doi pair case | Martin Czygan | 2020-12-10 | 1 | -1/+1 |
* | separate static data | Martin Czygan | 2020-12-10 | 1 | -3225/+8 |
* | add a few more dummy cases | Martin Czygan | 2020-12-09 | 1 | -0/+9 |
* | add subdoc case | Martin Czygan | 2020-12-09 | 1 | -1/+9 |
* | update verify.csv | Martin Czygan | 2020-12-09 | 1 | -1/+2 |
* | add two more cases | Martin Czygan | 2020-12-09 | 1 | -1/+24 |
* | add another case | Martin Czygan | 2020-12-09 | 1 | -0/+14 |
* | another case | Martin Czygan | 2020-12-09 | 1 | -0/+18 |
* | publications over N year apart are most likely different | Martin Czygan | 2020-12-04 | 1 | -2/+11 |
| | N=40 is hardcoded for now, but it should probably be a parameter. | | | | |
* | update cases; ok.work_id | Martin Czygan | 2020-12-04 | 1 | -0/+3 |
* | case: ignore choice review | Martin Czygan | 2020-12-04 | 1 | -0/+10 |
* | add case | Martin Czygan | 2020-12-03 | 1 | -0/+12 |
* | add iop case | Martin Czygan | 2020-12-02 | 1 | -2/+12 |
* | add cases | Martin Czygan | 2020-12-02 | 1 | -1/+11 |
* | add case | Martin Czygan | 2020-12-02 | 1 | -0/+7 |
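
The 2020-12-04 commit introduces a rule that publications more than N years apart are most likely different, with N=40 hardcoded and a note that it should probably become a parameter. A minimal sketch of such a check follows, assuming releases are plain dicts with an optional integer `release_year` field; the function name `years_too_far_apart` and the `max_gap` parameter are hypothetical illustrations, not fuzzycat's actual API.

```python
from typing import Any, Mapping, Optional

# Hypothetical default mirroring the hardcoded N=40 from the commit message.
DEFAULT_MAX_YEAR_GAP = 40


def years_too_far_apart(
    a: Mapping[str, Any],
    b: Mapping[str, Any],
    max_gap: int = DEFAULT_MAX_YEAR_GAP,
) -> bool:
    """Return True if both releases carry a year and the years differ by more than max_gap.

    Assumption: releases are dicts with an optional integer "release_year".
    A missing year counts as "no evidence", so the check returns False and
    leaves the decision to other comparisons.
    """
    year_a: Optional[int] = a.get("release_year")
    year_b: Optional[int] = b.get("release_year")
    if year_a is None or year_b is None:
        return False
    # "Over N years apart" is read as a strictly greater gap.
    return abs(year_a - year_b) > max_gap


if __name__ == "__main__":
    print(years_too_far_apart({"release_year": 1950}, {"release_year": 2001}))  # True (51 years)
    print(years_too_far_apart({"release_year": 2000}, {"release_year": 2020}))  # False (20 years)
    print(years_too_far_apart({"release_year": 1950}, {}))  # False (year missing)
```

Exposing the threshold as a keyword argument with a default keeps the current behavior while allowing callers to tune it, which is what the commit note suggests.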