# Maintenance Notes

## Continuous Update Ideas

Currently, we derive the graph from raw data blobs, e.g. references, the fatcat
database, the Open Library database dump, and the Wikipedia dump.

The goal would be to run a service that updates the graph index (or whatever
data store we use) as new data arrives.

For example:

1. A new publication (P) arrives.
2. P references articles, web pages, books, etc.; we can get this information
   from the data or from GROBID.
3. We look up the title of P in some existing data store, and its normalized
   title in a normalized data store; we could simply do an exact or fuzzy
   match against Elasticsearch. This generates match candidates, e.g. from
   the store where all references live.
4. We verify the matches.
5. We update the index and add new edges between documents.
6. We add all references found in P to the "reference store".

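The steps above could be wired together roughly as follows. This is a minimal sketch under stated assumptions, not an existing implementation: the graph index and the "reference store" are in-memory Python structures, `difflib.SequenceMatcher` stands in for an exact or fuzzy Elasticsearch query, references arrive as plain title strings instead of GROBID output, and "verification" is a placeholder stricter similarity check. The names `Publication`, `normalize_title`, `GraphUpdater` and both thresholds are hypothetical. Matching the references of P against already indexed documents (outgoing edges) is also an assumption added alongside the lookup of P's own title.

```python
import re
import unicodedata
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Publication:
    """Hypothetical minimal record for a newly arriving publication (P)."""
    ident: str
    title: str
    # Titles of cited works; in practice these would come from the data or GROBID.
    references: list[str] = field(default_factory=list)


def normalize_title(title: str) -> str:
    """Lowercase, drop accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


class GraphUpdater:
    """Hypothetical in-memory stand-ins for the graph index and reference store."""

    def __init__(self, candidate_threshold: float = 0.8, verify_threshold: float = 0.9):
        self.candidate_threshold = candidate_threshold
        self.verify_threshold = verify_threshold
        self.titles: dict[str, str] = {}  # document ident -> normalized title
        self.reference_store: list[tuple[str, str]] = []  # (citing ident, normalized ref title)
        self.edges: set[tuple[str, str]] = set()  # (citing ident, cited ident)

    def candidates(self, key: str, pool: list[tuple[str, str]]) -> list[tuple[str, str]]:
        """Step 3: exact match on the normalized title, else fuzzy candidates.

        difflib stands in here for a query against Elasticsearch.
        """
        exact = [item for item in pool if item[1] == key]
        if exact:
            return exact
        return [item for item in pool if similarity(key, item[1]) >= self.candidate_threshold]

    def verify(self, key: str, candidate_key: str) -> bool:
        """Step 4: placeholder verification, a stricter similarity check."""
        return similarity(key, candidate_key) >= self.verify_threshold

    def process(self, pub: Publication) -> None:
        """Step 1: a new publication arrives and is matched into the graph."""
        key = normalize_title(pub.title)
        refs = [normalize_title(r) for r in pub.references]  # step 2
        # Existing references in the reference store that point at P.
        for citing, ref_key in self.candidates(key, self.reference_store):
            if self.verify(key, ref_key):
                self.edges.add((citing, pub.ident))  # step 5
        # References of P that point at documents already in the index.
        for ref_key in refs:
            for ident, known in self.candidates(ref_key, list(self.titles.items())):
                if self.verify(ref_key, known):
                    self.edges.add((pub.ident, ident))  # step 5
        self.titles[pub.ident] = key
        self.reference_store.extend((pub.ident, r) for r in refs)  # step 6
```

For example, after processing a publication "A" titled "Graph Theory: Basics", processing a publication "B" whose reference list contains "Graph theory basics" adds the edge `("B", "A")`, since both titles normalize to "graph theory basics". Keeping candidate generation separate from verification mirrors steps 3 and 4, so the cheap lookup could later be swapped for a real Elasticsearch query without touching the verification logic.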