From d6b744dbbcd8e8e13ecb03fe267f59bceeda933a Mon Sep 17 00:00:00 2001
From: Martin Czygan
Date: Sat, 21 Aug 2021 13:19:20 +0200
Subject: update notes

---
 notes/maintenance.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/notes/maintenance.md b/notes/maintenance.md
index 448e73e..16fab4d 100644
--- a/notes/maintenance.md
+++ b/notes/maintenance.md
@@ -21,8 +21,10 @@ For example:
 2. it refereces articles and web pages, books, etc; we can get this information from the data or grobid
 3. we lookup the title on P in some existing data store; we lookup normalized
    title in some normalized data store; we could just exact of fuzzy match
-   against elasticsearch; we generate match candidates, e.g. where all references live
-4. we verify matches
+   against elasticsearch; we generate match candidates, e.g. where all references
+   live (here: batch requires high performance, whereas continuous would be about
+   order of 100K per day ).
+4. we verify matches (here: batch needs to be fast again; 1M/min or the like)
 5. we update the index and add new edges between document
 6. we add all references found into the "reference store"
 
-- 
cgit v1.2.3
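
The lookup in step 3 and the throughput figures from steps 3 and 4 can be made concrete with a small sketch. This is only an illustration under assumptions: `normalize_title`, `lookup`, and the two dictionary-backed stores are hypothetical names that do not come from the notes, and the normalization rules (accent stripping, lowercasing, punctuation removal) are one plausible choice, not the project's actual scheme.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Hypothetical normalization: strip accents, lowercase, drop punctuation."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Replace every run of non-alphanumeric characters with a single space.
    cleaned = re.sub(r"[^a-z0-9]+", " ", no_marks.lower())
    return cleaned.strip()


def lookup(title: str, exact_store: dict, normalized_store: dict) -> list:
    """Return match candidates: exact title first, then normalized title.

    Both stores are plain dicts here (an assumption); the notes mention
    elasticsearch as one possible backend for exact or fuzzy matching.
    """
    if title in exact_store:
        return exact_store[title]
    return normalized_store.get(normalize_title(title), [])


def per_second(count: int, seconds: int) -> float:
    """Convert a count over a time window into a per-second rate."""
    return count / seconds


# Throughput figures quoted in the notes, expressed per second.
continuous_rate = per_second(100_000, 24 * 60 * 60)  # roughly 1.16 lookups/s
batch_rate = per_second(1_000_000, 60)               # roughly 16,667 verifications/s
```

Read this way, the batch verification target is about four orders of magnitude above the continuous lookup rate, which suggests why the notes single out batch mode as the performance-critical path.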