author Martin Czygan <martin.czygan@gmail.com> 2022-01-11 10:46:20 +0100
committer Martin Czygan <martin.czygan@gmail.com> 2022-01-11 10:46:20 +0100
commit 97b293dbed0b699602d88889224677b6b4e8d7e5
tree 6adcc13cc55a1449eaaa5f14434355291bb844ad
parent a07ad381d1bc98d803c951a5088de70f6039393d
notes: refcat update
-rw-r--r-- notes/2022_01_10_refcat_update.md 15
1 files changed, 15 insertions, 0 deletions
diff --git a/notes/2022_01_10_refcat_update.md b/notes/2022_01_10_refcat_update.md
new file mode 100644
index 0000000..795a9d4
--- /dev/null
+++ b/notes/2022_01_10_refcat_update.md
@@ -0,0 +1,15 @@
+# Refcat update
+
+* new refs export, about 10% more (2.7B)
+* new fatcat export
+
+New wikipedia extraction:
+
+```
+martin@ia601101:/magna/data/wikipedia_citations_2020-07-14 $ LC_ALL=C grep ID_list minimal_dataset.json | grep -c DOI
+1442189
+
+$ jq -rc '.refs[] | select(.ID_list != null) | {"URL": .URL, "Title": .title, "ID_list": .ID_list}' enwiki-20211201-pages-articles.citations.json | pv -l > minimal.json
+$ grep -c DOI minimal.json
+1932578
+```