author Martin Czygan <martin.czygan@gmail.com> 2021-04-27 21:38:07 +0200
committer Martin Czygan <martin.czygan@gmail.com> 2021-04-27 21:38:07 +0200
commit 0cf00f57575fb71e79d9a4b1bd7b3d59a682c63a
tree 4242172362932557f624297645d690eb3ed075db
parent 9728cd3d48a4490b67cd7c03aa7f41de6a069771
download refcat-0cf00f57575fb71e79d9a4b1bd7b3d59a682c63a.tar.gz refcat-0cf00f57575fb71e79d9a4b1bd7b3d59a682c63a.zip
update notes
-rw-r--r-- python/notes/version_3.md 26
1 files changed, 26 insertions, 0 deletions
diff --git a/python/notes/version_3.md b/python/notes/version_3.md
index 7fce20f..66840bf 100644
--- a/python/notes/version_3.md
+++ b/python/notes/version_3.md
@@ -276,3 +276,29 @@ Sidenote, also in refs:
```
How many titles have "s p a c e s" in title?
+
+----
+
+ISBN normalization.
+
+In refs, we mostly find ISBNs in unstructured:
+
+```
+ISBN 3-906166-35-X.
+ISBN 978-0- 470-25003-7.
+Austria. ISBN 3-900051-07-0, URL 962 http://www.R-project.org. (2007).
+ISBN 88-13-19785-3
+ISBN GB3N-CL4-5HL4.
+```
+
+Roughly 600-700 per 1M refs mention "isbn" (case-insensitive) in unstructured; 675 in the first 1M:
+
+```
+$ zstdcat -T0 fatcat_scholar_work_fulltext.refs.json.zst | head -1000000 | jq .biblio.unstructured | grep -c -i isbn
+675
+```
+
+So maybe 500k ISBNs in total, if this rate holds across all refs?
+
+* need to find them, then validate them
+
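The "find them, then validate them" step could look roughly like the following Python sketch. It is a minimal illustration under stated assumptions, not refcat's implementation: the candidate regex, the helper names (`extract_isbns`, `is_valid_isbn10`, `is_valid_isbn13`, `isbn10_to_isbn13`) and the choice to normalize everything to ISBN-13 are all hypothetical. It strips hyphens and whitespace (so `978-0- 470-25003-7` survives), checks the ISBN-10 check digit (weights 10 to 1, mod 11, `X` = 10) and the ISBN-13 check digit (978/979 prefix, weights 1 and 3, mod 10), and discards anything else.

```python
import re
from typing import List

# Hypothetical candidate pattern (an assumption, not refcat's code): the literal
# "ISBN" (optionally "ISBN-10"/"ISBN-13" and a colon), then 10 to 13 digits or X,
# where each character may be preceded by up to two hyphens or whitespace chars,
# to tolerate inputs like "978-0- 470-25003-7".
ISBN_CANDIDATE = re.compile(
    r"ISBN(?:-1[03])?:?\s*([0-9X](?:[\s\-]{0,2}[0-9X]){9,12})",
    re.IGNORECASE,
)


def is_valid_isbn10(s: str) -> bool:
    """ISBN-10: weights 10..1, X (= 10) only in the last position, sum % 11 == 0."""
    if not re.fullmatch(r"\d{9}[\dX]", s):
        return False
    total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(s))
    return total % 11 == 0


def is_valid_isbn13(s: str) -> bool:
    """ISBN-13: 978/979 prefix, alternating weights 1 and 3, sum % 10 == 0."""
    if not re.fullmatch(r"97[89]\d{10}", s):
        return False
    total = sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(s))
    return total % 10 == 0


def isbn10_to_isbn13(s: str) -> str:
    """Prefix 978 to the first nine digits and recompute the ISBN-13 check digit."""
    core = "978" + s[:9]
    total = sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(core))
    return core + str((10 - total % 10) % 10)


def extract_isbns(unstructured: str) -> List[str]:
    """Find ISBN candidates in a raw reference string; return the valid ones as ISBN-13."""
    found = []
    for m in ISBN_CANDIDATE.finditer(unstructured):
        # Normalize: drop separators, uppercase a trailing "x".
        s = re.sub(r"[\s\-]", "", m.group(1)).upper()
        if len(s) == 10 and is_valid_isbn10(s):
            found.append(isbn10_to_isbn13(s))
        elif len(s) == 13 and is_valid_isbn13(s):
            found.append(s)
    return found


if __name__ == "__main__":
    # The sample lines from the notes.
    samples = [
        "ISBN 3-906166-35-X.",
        "ISBN 978-0- 470-25003-7.",
        "Austria. ISBN 3-900051-07-0, URL 962 http://www.R-project.org. (2007).",
        "ISBN 88-13-19785-3",
        "ISBN GB3N-CL4-5HL4.",
    ]
    for line in samples:
        print(extract_isbns(line), line)
```

On the five sample lines, this keeps the four numeric ISBNs (all pass their check digits) and drops `GB3N-CL4-5HL4`. Only candidates that normalize to exactly 10 or 13 characters are kept, so an ISBN directly followed by other digits (a year, a page number) would be rejected rather than truncated. Whether that trade-off is acceptable would have to be checked against the refs data.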