author: Martin Czygan <martin.czygan@gmail.com> 2020-09-04 18:30:55 +0200
committer: Martin Czygan <martin.czygan@gmail.com> 2020-09-04 18:30:55 +0200
commit: 87d0fd90205bc9e5ed7d849e801f6ef2ca5c077e (patch)
tree: 0751c03d8b7d31deb7e9e2f392316df7641425b3 /README.md
parent: e8fdd47282c987637ecb4a6f7fd7518cca12b8d9 (diff)
note on approach
Diffstat (limited to 'README.md')
-rw-r--r-- README.md 10
1 files changed, 10 insertions, 0 deletions
diff --git a/README.md b/README.md
index 9e413af..d23d00f 100644
--- a/README.md
+++ b/README.md
@@ -17,6 +17,16 @@ The goal is to group releases under works and to implement a versions feature.
This repository contains both generic code for matching as well as fatcat
specific code using the fatcat openapi client.
+## Approach
+
+There are probably a few assumptions we can make:
+
+* If two strings are given, an exact string match does not mean they refer to
+  the same thing (at all), e.g. "Acta geographica" currently has eight
+  associated ISSNs, and a title like "Buchbesprechungen" appears many hundreds of times.
+* ...
+* ...
+
## Datasets
* release and container metadata from: [https://archive.org/details/fatcat_bulk_exports_2020-08-05](https://archive.org/details/fatcat_bulk_exports_2020-08-05).
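
The first assumption in the Approach note (an exact title match does not identify a single container) can be illustrated with a small sketch. This is not fuzzycat code: the record layout, the `normalize` and `ambiguous_titles` helpers, and the ISSN values are all hypothetical placeholders chosen for the example.

```python
from collections import defaultdict

# Hypothetical sample records; the ISSN values are placeholders, not real data.
RECORDS = [
    {"title": "Acta geographica", "issn": "0000-0001"},
    {"title": "Acta Geographica", "issn": "0000-0002"},
    {"title": "Buchbesprechungen", "issn": "0000-0003"},
    {"title": "Some Unique Journal", "issn": "0000-0004"},
]


def normalize(title):
    """Lowercase the title and collapse runs of whitespace."""
    return " ".join(title.lower().split())


def issns_by_title(records):
    """Group the distinct ISSNs seen for each normalized title."""
    groups = defaultdict(set)
    for rec in records:
        groups[normalize(rec["title"])].add(rec["issn"])
    return dict(groups)


def ambiguous_titles(records):
    """Return titles that map to more than one ISSN, with sorted ISSN lists."""
    return {
        title: sorted(issns)
        for title, issns in issns_by_title(records).items()
        if len(issns) > 1
    }


if __name__ == "__main__":
    for title, issns in ambiguous_titles(RECORDS).items():
        print(title, issns)
```

Any title reported by `ambiguous_titles` is one where a string match alone cannot decide which container is meant, so additional signals would be needed before treating two records as the same.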