author Martin Czygan <martin.czygan@gmail.com> 2021-08-21 01:05:48 +0200
committer Martin Czygan <martin.czygan@gmail.com> 2021-08-21 01:05:48 +0200
commit 0a4703adc9ba9e8797296b7cedb0b38ef426beb7
tree 8eca4aeec9a3736492421923f91ba91aa6930182
parent 1ef3787da274ea6cf0d0c7e132e75082b15d675f
add notes
-rw-r--r-- notes/gitlab_runner_outage.md 5
-rw-r--r-- notes/maintenance.md 21
-rw-r--r-- python/refcat/tasks.py 4
3 files changed, 28 insertions, 2 deletions
diff --git a/notes/gitlab_runner_outage.md b/notes/gitlab_runner_outage.md
new file mode 100644
index 0000000..525d48a
--- /dev/null
+++ b/notes/gitlab_runner_outage.md
@@ -0,0 +1,5 @@
+# GitLab Runner Docker
+
+* https://stackoverflow.com/questions/50325932/gitlab-runner-docker-could-not-resolve-host
+
+Trying `clone_url` with an IP address instead of the hostname, see https://docs.gitlab.com/runner/configuration/advanced-configuration.html.
diff --git a/notes/maintenance.md b/notes/maintenance.md
new file mode 100644
index 0000000..70a77a4
--- /dev/null
+++ b/notes/maintenance.md
@@ -0,0 +1,21 @@
+# Maintenance Notes
+
+## Continuous Update Ideas
+
+Currently, we derive the graph from raw data blobs, e.g. references, the fatcat
+database, the Open Library database dump and a Wikipedia dump.
+
+The goal would be to run a service and let the graph index (or whatever data
+store we use) be updated incrementally as new data arrives.
+
+For example:
+
+1. new publication (P) arrives
+2. it references articles, web pages, books, etc.; we can get this information from the data or from GROBID
+3. we look up the reference titles of P in some existing data store; we look up the normalized
+ titles in some normalized data store; we could just do an exact or fuzzy match
+ against Elasticsearch; we generate match candidates, e.g. where all references live
+4. we verify matches
+5. we update the index and add new edges between documents
+6. we add all references found into the "reference store"
+
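The numbered update steps in these notes can be sketched as one function that processes a newly arrived publication. This is a minimal in-memory sketch under stated assumptions, not the refcat implementation: `Publication`, `Store`, `normalize_title`, `verify` and `process` are hypothetical names; plain dicts and sets stand in for the graph index and the "reference store"; `difflib.SequenceMatcher` is swapped in for an Elasticsearch fuzzy query; and reference titles are assumed to be already extracted (e.g. from metadata or GROBID).

```python
import re
import unicodedata
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional


def normalize_title(title: str) -> str:
    """Hypothetical normalization: strip accents, lowercase, keep letters and digits."""
    text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


@dataclass
class Publication:
    ident: str
    title: str
    reference_titles: list[str]  # step 2: assumed already extracted


@dataclass
class Store:
    """In-memory stand-in for the title index, graph index and reference store."""
    titles: dict[str, str] = field(default_factory=dict)  # normalized title -> ident
    edges: set[tuple[str, str]] = field(default_factory=set)  # (citing, cited)
    references: list[tuple[str, str]] = field(default_factory=list)  # (citing, raw title)

    def add_document(self, pub: Publication) -> None:
        self.titles[normalize_title(pub.title)] = pub.ident

    def candidates(self, title: str, threshold: float = 0.9) -> list[str]:
        # Step 3: exact lookup on the normalized title, else a fuzzy linear scan.
        key = normalize_title(title)
        if key in self.titles:
            return [self.titles[key]]
        return [ident for norm, ident in self.titles.items()
                if SequenceMatcher(None, key, norm).ratio() >= threshold]


def verify(candidates: list[str]) -> Optional[str]:
    """Step 4 placeholder: accept only an unambiguous single candidate."""
    return candidates[0] if len(candidates) == 1 else None


def process(store: Store, pub: Publication) -> None:
    """Run steps 2-6 for one newly arrived publication P (step 1)."""
    for ref_title in pub.reference_titles:
        match = verify(store.candidates(ref_title))
        if match is not None:
            store.edges.add((pub.ident, match))  # step 5: new citation edge
        store.references.append((pub.ident, ref_title))  # step 6: keep every reference
    store.add_document(pub)  # step 5: P itself becomes matchable for later arrivals


if __name__ == "__main__":
    store = Store()
    process(store, Publication("a", "Attention Is All You Need", []))
    process(store, Publication("b", "A Follow-up Study", ["Attention is all you need."]))
    print(sorted(store.edges))  # [('b', 'a')]
```

Keeping unmatched references in the reference store (step 6) means a later arrival can still be linked to them by re-running the match, which is the point of a continuously updated index.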
diff --git a/python/refcat/tasks.py b/python/refcat/tasks.py
index b92bd56..591acbd 100644
--- a/python/refcat/tasks.py
+++ b/python/refcat/tasks.py
@@ -141,8 +141,8 @@ from graph, and if not available use raw input.
> QA things
-* [ ] find duplicates and clean them up
-* [ ] generate stats on match types
+* [x] find duplicates and clean them up
+* [x] generate stats on match types
TODO: Unmatched
---------------