We keep the name, since the API (`matcher.match(release)`) is the same. Changes:
* simplified queries
* at most one query is performed against Elasticsearch
* parallel release retrieval from the API
* optional support for release year windows
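The optional release year window can be pictured as an Elasticsearch `range` filter around the release year. This is a minimal sketch under stated assumptions: the helper name `year_window_filter`, the index field name `year` and the symmetric, inclusive window are hypothetical, loosely modeled on the `release_year_padding` key used in the test cases.

```python
from typing import Optional


def year_window_filter(release_year: Optional[int], padding: int = 0) -> Optional[dict]:
    """Build a hypothetical Elasticsearch range filter around a release year.

    Returns None when the year is unknown, so callers can skip the filter.
    The field name "year" is an assumption, not taken from fuzzycat.
    """
    if release_year is None:
        return None
    if padding < 0:
        raise ValueError("padding must be non-negative")
    # Inclusive window: [year - padding, year + padding]
    return {
        "range": {
            "year": {
                "gte": release_year - padding,
                "lte": release_year + padding,
            }
        }
    }


print(year_window_filter(2004, padding=1))
# {'range': {'year': {'gte': 2003, 'lte': 2005}}}
```

Returning `None` for a missing year keeps the window strictly optional: the caller only adds the filter to the query when there is a year to anchor it.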
Test cases are expressed in YAML and are auto-loaded from the specified
directory. The tests run against the current search endpoint, which means
the actual output may change on index updates; for the moment, we think
this setup is relatively simple and not too unstable.
    about: title contrib, partial name
    input: >
      {
        "contribs": [
          {
            "raw_name": "Adams"
          }
        ],
        "title": "digital libraries",
        "ext_ids": {}
      }
    release_year_padding: 1
    expected:
      - 7rmvqtrb2jdyhcxxodihzzcugy
      - a2u6ougtsjcbvczou6sazsulcm
      - dy45vilej5diros6zmax46nm4e
      - exuwhhayird4fdjmmsiqpponlq
      - gqrj7jikezgcfpjfazhpf4e7c4
      - mkmqt3453relbpuyktnmsg6hjq
      - t2g5sl3dgzchtnq7dynxyzje44
      - t4tvenhrvzamraxrvvxivxmvga
      - wd3oeoi3bffknfbg2ymleqc4ja
      - y63a6dhrfnb7bltlxfynydbojy
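Evaluating one such test case could look roughly like the sketch below. It assumes the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`), that the folded `input` value is a JSON string, and that results are compared as an order-insensitive set. The names `check_case` and `fake_match` are hypothetical, `release_year_padding` is ignored here, and the real test runner may compare results differently.

```python
import json
from typing import Callable, Iterable


def check_case(case: dict, match_fn: Callable[[dict], Iterable[str]]) -> bool:
    """Run one parsed YAML test case against a matcher function.

    `match_fn` stands in for something like matcher.match(release) and is
    assumed to return release idents.
    """
    release = json.loads(case["input"])  # folded block scalar -> JSON string
    found = set(match_fn(release))
    expected = set(case.get("expected", []))
    # Order-insensitive: the found idents must equal the expected idents.
    return found == expected


case = {
    "about": "title contrib, partial name",
    "input": '{"contribs": [{"raw_name": "Adams"}], "title": "digital libraries", "ext_ids": {}}',
    "release_year_padding": 1,
    # Shortened expected list, for illustration only.
    "expected": ["7rmvqtrb2jdyhcxxodihzzcugy", "a2u6ougtsjcbvczou6sazsulcm"],
}


def fake_match(release: dict) -> list:
    # Toy matcher: ignores the index and returns fixed idents in another order.
    return ["a2u6ougtsjcbvczou6sazsulcm", "7rmvqtrb2jdyhcxxodihzzcugy"]


print(check_case(case, fake_match))  # True
```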
Instead, use `FuzzyReleaseMatcher.match`, which has approximately the
same behavior.
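A deprecation like this is commonly implemented as a thin wrapper that emits a `DeprecationWarning` and delegates to the replacement. The sketch assumes that approach; the `FuzzyReleaseMatcher` below is a hypothetical stand-in that only mimics the name and `match` method mentioned above, not fuzzycat's real constructor or results.

```python
import warnings


class FuzzyReleaseMatcher:
    """Hypothetical stand-in; the real class lives in fuzzycat."""

    def match(self, release: dict) -> list:
        # Placeholder: a real matcher would query the search index.
        return []


def match_release_fuzzy(release: dict) -> list:
    """Deprecated wrapper kept for backwards compatibility (sketch)."""
    warnings.warn(
        "match_release_fuzzy is deprecated, use FuzzyReleaseMatcher.match",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not this wrapper
    )
    return FuzzyReleaseMatcher().match(release)
```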
The goal of this refactoring was to make the matching process a bit more
configurable by using a class and a cascade of queries.
For a limited test set, `FuzzyReleaseMatcher.match` works the same as
`match_release_fuzzy`.
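A cascade of queries can be sketched as an ordered list of query builders, tried one after another until a query returns hits. All names here (`CascadingMatcher`, `by_doi`, `by_title`, the `search` callable and the query shapes) are hypothetical; the real `FuzzyReleaseMatcher` may order, combine or score its queries differently.

```python
from typing import Callable, List, Optional

Query = dict
SearchFn = Callable[[Query], List[str]]
QueryBuilder = Callable[[dict], Optional[Query]]


class CascadingMatcher:
    """Try increasingly loose queries until one returns results."""

    def __init__(self, search: SearchFn, builders: List[QueryBuilder]):
        self.search = search
        self.builders = builders

    def match(self, release: dict) -> List[str]:
        for build in self.builders:
            query = build(release)
            if query is None:
                # This stage does not apply to the release (e.g. no DOI).
                continue
            hits = self.search(query)
            if hits:
                return hits
        return []


def by_doi(release: dict) -> Optional[Query]:
    doi = release.get("ext_ids", {}).get("doi")
    return {"term": {"doi": doi}} if doi else None


def by_title(release: dict) -> Optional[Query]:
    title = release.get("title")
    return {"match": {"title": title}} if title else None


def fake_search(query: Query) -> List[str]:
    # Toy index: only title queries find anything.
    return ["release-1"] if "match" in query else []


matcher = CascadingMatcher(fake_search, [by_doi, by_title])
print(matcher.match({"title": "digital libraries", "ext_ids": {}}))  # ['release-1']
```

Keeping each stage as a separate builder is what makes such a cascade configurable: callers can reorder, drop or add stages without touching the loop.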
This makes several downstream applications simpler, like showing PDF
links without an additional fatcat API fetch. The 'contrib' entities may
be required as part of bibliographic matching (checking the creator
names as well as the release-local versions of the name).
In theory we could add webcaptures and filesets as well, but those are
still rare and occasionally result in very large sub-documents.
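With file entities embedded in the release document, PDF links can be read directly from it. A minimal sketch, assuming the release is a dict with a `files` list whose entries carry a `mimetype` and a `urls` list of `{"url": ..., "rel": ...}` objects, following fatcat's file entity schema; the helper name `pdf_links` and the example URLs are hypothetical.

```python
from typing import List


def pdf_links(release: dict) -> List[str]:
    """Collect PDF URLs from a release document with embedded file entities."""
    links: List[str] = []
    for f in release.get("files") or []:
        if f.get("mimetype") != "application/pdf":
            continue
        for u in f.get("urls") or []:
            url = u.get("url")
            if url and url not in links:  # keep order, drop duplicates
                links.append(url)
    return links


release = {
    "files": [
        {
            "mimetype": "application/pdf",
            "urls": [
                {"url": "https://example.org/a.pdf", "rel": "web"},
                {"url": "https://example.org/mirror/a.pdf", "rel": "webarchive"},
            ],
        },
        {"mimetype": "text/html", "urls": [{"url": "https://example.org/a.html", "rel": "web"}]},
    ]
}
print(pdf_links(release))
# ['https://example.org/a.pdf', 'https://example.org/mirror/a.pdf']
```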
Background: verifying hundreds of millions of documents turned out to be
a bit slow. Anecdotally, running clustering and verification over 1.8B
inputs took over 50h; for comparison, the Go port (skate) needed about
2-4h for the same operations. Also, with Go we do not need the extra GNU
parallel wrapping.
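As a rough throughput comparison, derived only from the numbers quoted above (treating "over 50h" as exactly 50h, so the fuzzycat figure is an upper bound and the speedup a lower bound):

```latex
\begin{aligned}
\text{fuzzycat: } & \frac{1.8 \times 10^{9}}{50 \times 3600\,\mathrm{s}} = 10^{4}\ \text{inputs/s}\\
\text{skate: } & \frac{1.8 \times 10^{9}}{4 \times 3600\,\mathrm{s}} = 1.25 \times 10^{5}
  \ \text{to}\ \frac{1.8 \times 10^{9}}{2 \times 3600\,\mathrm{s}} = 2.5 \times 10^{5}\ \text{inputs/s}\\
\text{speedup: } & \frac{50\,\mathrm{h}}{4\,\mathrm{h}} = 12.5 \ \text{to}\ \frac{50\,\mathrm{h}}{2\,\mathrm{h}} = 25
\end{aligned}
```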
In any case, we aim for the fuzzycat refactoring to provide:
* better, more configurable verification and small-scale matching
* removal of batch clustering code (and improved refcat docs)
* a place for somewhat more generic, similarity-based utils
The most important piece in fuzzycat is a CSV file containing hand-picked
test examples for verification, together with the code that passes that
test suite. We want to make this part more robust.
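A hand-picked suite like this lends itself to a small table-driven harness. The sketch assumes a CSV with three columns (two release idents and an expected status) and a `verify(a, b)` callable returning a status string; the column layout, the status labels, `run_suite` and `toy_verify` are all hypothetical and not the actual format of fuzzycat's file.

```python
import csv
import io
from typing import Callable, List, Tuple

VerifyFn = Callable[[str, str], str]


def run_suite(csv_text: str, verify: VerifyFn) -> List[Tuple[str, str, str, str]]:
    """Return rows whose verification status differs from the expected one.

    Each failure is (a, b, expected, got); an empty list means the suite passes.
    """
    failures = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment rows
        a, b, expected = (cell.strip() for cell in row[:3])
        got = verify(a, b)
        if got != expected:
            failures.append((a, b, expected, got))
    return failures


def toy_verify(a: str, b: str) -> str:
    # Toy verifier: identical idents are EXACT, everything else DIFFERENT.
    return "EXACT" if a == b else "DIFFERENT"


suite = "# a,b,status\na1,a1,EXACT\na1,b2,DIFFERENT\n"
print(run_suite(suite, toy_verify))  # []
```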
Some of these are a little redundant, in that calling code could
trivially re-implement them. However, I think they are good starting
points for stable external interfaces, leaving us room to iterate on and
refactor the lower-level implementations behind the scenes.
This file has been passed around a couple of times and should probably be
published as a pypi.org project at some point.
This is the recommended way to use dynaconf.
* https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking-changes-7.0.html
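One change listed on that page is that `hits.total` in search responses became an object (with `value` and `relation` keys) instead of a plain integer. The commit does not say which changes it addressed, so this is only an illustration: if code has to read responses from both major versions, a tolerant accessor might look like this (the helper name `total_hits` is hypothetical).

```python
def total_hits(response: dict) -> int:
    """Read the hit count from an Elasticsearch search response (6.x or 7.x)."""
    total = response["hits"]["total"]
    if isinstance(total, dict):
        # 7.x: {"value": N, "relation": "eq" | "gte"}; "gte" means N is a lower bound.
        return int(total["value"])
    # 6.x and earlier: a plain integer.
    return int(total)
```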
* https://fatcat.wiki/release/2n7pyugxenb73gope52bn6m2ru
* https://fatcat.wiki/release/p4bettvcszgn5d3zls5ogdjk4u
Refs:
Niaudet P. Steroid-sensitive idiopathic nephrotic
syndrome in children. Pediatric Nephrology. 5th ed.
Philadelphia: Lippincott Williams & Wilkins, 2004; pp
543–556.
Doc:
* https://fatcat.wiki/release/lc3d5q62zfa2rjyk2m7nr346nm, T-lymphocyte
activation in steroid-sensitive nephrotic syndrome in childhood, by T J
Neuhaus, V Shah, R E Callard, T M Barratt