Diffstat (limited to 'python/notes')
-rw-r--r-- python/notes/version_3.md 26
1 files changed, 26 insertions, 0 deletions
diff --git a/python/notes/version_3.md b/python/notes/version_3.md
index 7fce20f..66840bf 100644
--- a/python/notes/version_3.md
+++ b/python/notes/version_3.md
@@ -276,3 +276,29 @@ Sidenote, also in refs:
```
How many titles have "s p a c e s" in title?
+
+----
+
+ISBN normalization.
+
+In refs, ISBNs mostly appear in the unstructured field:
+
+```
+ISBN 3-906166-35-X.
+ISBN 978-0- 470-25003-7.
+Austria. ISBN 3-900051-07-0, URL 962 http://www.R-project.org. (2007).
+ISBN 88-13-19785-3
+ISBN GB3N-CL4-5HL4.
+```
+
+About 600-700 of the first 1M refs mention "isbn" in unstructured.
+
+```
+$ zstdcat -T0 fatcat_scholar_work_fulltext.refs.json.zst | head -1000000 | jq .biblio.unstructured | grep -c -i isbn
+675
+```
+
+So maybe 500k ISBNs in total?
+
+* need to find them, then validate them
+
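The "find them, then validate them" step could look roughly like the sketch below. It is not part of the commit; the names `ISBN_CANDIDATE`, `find_isbns`, `is_valid_isbn10`, `is_valid_isbn13` and `to_isbn13` are hypothetical. It assumes an ISBN is always introduced by the literal string "isbn", that only spaces and hyphens appear as separators, and that normalizing everything to ISBN-13 is the desired output. Validation uses the standard ISBN-10 (mod 11) and ISBN-13 (mod 10) check digits.

```python
import re

# Assumption: candidates follow the token "isbn" and start with a digit.
# Spaces and hyphens are allowed inside, e.g. "978-0- 470-25003-7".
ISBN_CANDIDATE = re.compile(r"isbn[\s:-]*([0-9][0-9Xx\s-]{8,20})", re.IGNORECASE)


def is_valid_isbn10(s):
    """Check an ISBN-10 (no separators) with the mod-11 check digit."""
    if not re.fullmatch(r"[0-9]{9}[0-9X]", s):
        return False
    total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(s))
    return total % 11 == 0


def is_valid_isbn13(s):
    """Check an ISBN-13 (no separators) with the mod-10 check digit."""
    if not re.fullmatch(r"[0-9]{13}", s):
        return False
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s))
    return total % 10 == 0


def to_isbn13(isbn10):
    """Convert a valid ISBN-10 to ISBN-13 (978 prefix, new check digit)."""
    core = "978" + isbn10[:9]
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(core))
    return core + str((10 - total % 10) % 10)


def find_isbns(text):
    """Return all valid ISBNs found in text, normalized to ISBN-13."""
    found = []
    for match in ISBN_CANDIDATE.finditer(text):
        # Drop separators; the greedy match may include trailing digits,
        # so only the leading 13 or 10 characters are tested.
        compact = re.sub(r"[\s-]", "", match.group(1)).upper()
        if is_valid_isbn13(compact[:13]):
            found.append(compact[:13])
        elif is_valid_isbn10(compact[:10]):
            found.append(to_isbn13(compact[:10]))
    return found


if __name__ == "__main__":
    samples = [
        "ISBN 3-906166-35-X.",
        "ISBN 978-0- 470-25003-7.",
        "Austria. ISBN 3-900051-07-0, URL 962 http://www.R-project.org. (2007).",
        "ISBN 88-13-19785-3",
        "ISBN GB3N-CL4-5HL4.",
    ]
    for sample in samples:
        print(sample, "->", find_isbns(sample))
```

With the five examples above, the first four validate and the `GB3N-CL4-5HL4` string is rejected, since it does not start with a digit. One known weakness: an ISBN-10 immediately followed by more digits can occasionally pass the ISBN-13 test by chance, so a stricter candidate pattern may be needed on the full dataset.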