update notes

author: Martin Czygan <martin.czygan@gmail.com> 2021-08-21 13:19:20 +0200
committer: Martin Czygan <martin.czygan@gmail.com> 2021-08-21 13:19:20 +0200
commit: d6b744dbbcd8e8e13ecb03fe267f59bceeda933a (patch)
tree: f40fcdc6299d5eaeeff19ebf4ec35d38d9662d72 /notes/maintenance.md
parent: f9d1857a6940e5beca5f08dc41193ad70672827b (diff)
download: refcat-d6b744dbbcd8e8e13ecb03fe267f59bceeda933a.tar.gz refcat-d6b744dbbcd8e8e13ecb03fe267f59bceeda933a.zip
1 files changed, 4 insertions, 2 deletions
diff --git a/notes/maintenance.md b/notes/maintenance.md
index 448e73e..16fab4d 100644
--- a/notes/maintenance.md
+++ b/notes/maintenance.md
@@ -21,8 +21,10 @@ For example:
 2. it references articles and web pages, books, etc; we can get this information from the data or grobid
 3. we lookup the title on P in some existing data store; we lookup normalized
    title in some normalized data store; we could just exact or fuzzy match
-   against elasticsearch; we generate match candidates, e.g. where all references live
-4. we verify matches
+   against elasticsearch; we generate match candidates, e.g. where all references
+   live (here: batch requires high performance, whereas continuous would be about
+   order of 100K per day).
+4. we verify matches (here: batch needs to be fast again; 1M/min or the like)
 5. we update the index and add new edges between documents
 6. we add all references found into the "reference store"
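
Steps 3 and 4 of the list above (normalized title lookup, candidate generation, verification) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the refcat implementation: the in-memory dict stands in for the "normalized data store" or elasticsearch, and `normalize_title`, `build_index`, `match_candidates`, `verify`, `per_second` and the 0.9 threshold are all hypothetical names and values. The last helper only converts the rough throughput figures from the notes into per-second rates.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^a-z0-9]+", " ", no_accents.lower())
    return cleaned.strip()


def build_index(titles: dict) -> dict:
    """Map normalized title -> list of document ids (stand-in for a data store)."""
    index = {}
    for doc_id, title in titles.items():
        index.setdefault(normalize_title(title), []).append(doc_id)
    return index


def match_candidates(ref_title: str, index: dict) -> list:
    """Step 3: exact lookup on the normalized title yields match candidates."""
    return index.get(normalize_title(ref_title), [])


def verify(ref_title: str, doc_title: str, threshold: float = 0.9) -> bool:
    """Step 4: accept a candidate if the normalized titles are similar enough.

    The threshold is an assumed value, chosen only for illustration.
    """
    a, b = normalize_title(ref_title), normalize_title(doc_title)
    return SequenceMatcher(None, a, b).ratio() >= threshold


def per_second(count: int, seconds: int) -> float:
    """Convert a count per time window into a per-second rate."""
    return count / seconds


if __name__ == "__main__":
    docs = {"doc-1": "Deep Learning.", "doc-2": "Shallow Parsing"}
    index = build_index(docs)
    for doc_id in match_candidates("deep learning", index):
        print(doc_id, verify("deep learning", docs[doc_id]))
    # Throughput figures mentioned in the notes, as per-second rates.
    print(per_second(1_000_000, 60))      # batch verification, 1M/min
    print(per_second(100_000, 24 * 3600)) # continuous, 100K per day
```

The two rates differ by roughly four orders of magnitude, which is why the notes treat batch and continuous operation as separate performance regimes.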