From 3bf4710e2e63eb6706b444fc244a8cdfe59fac0c Mon Sep 17 00:00:00 2001
From: Martin Czygan <martin.czygan@gmail.com>
Date: Wed, 21 Apr 2021 20:15:53 +0200
Subject: note on dups

---
 python/notes/version_3.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)
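
Note: the jq filter added under "Duplicates" prints the matching records but does not produce the share quoted below it. A minimal sketch of how that share could be counted, assuming one JSON document per line with the two field names used in the jq filter; the function name `count_self_refs` is hypothetical and not part of the repository:

```python
import json


def count_self_refs(lines):
    """Return (total, self_refs) for an iterable of JSON lines.

    A record counts as a self reference when source_release_ident
    equals target_release_ident, as in the jq filter in this patch.
    """
    total = 0
    self_refs = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        total += 1
        # Mirror jq's `==`: a missing key is null on both sides, so two
        # missing idents also compare equal here (None == None).
        if doc.get("source_release_ident") == doc.get("target_release_ident"):
            self_refs += 1
    return total, self_refs
```

To use it on the real data, pipe the `zstdcat` output into a script that calls `count_self_refs(sys.stdin)` and divide the two numbers to get the percentage.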

diff --git a/python/notes/version_3.md b/python/notes/version_3.md
index 0656d39..4ed4df4 100644
--- a/python/notes/version_3.md
+++ b/python/notes/version_3.md
@@ -2,12 +2,21 @@
 
 V2 plus:
 
+* [ ] no dups
 * [ ] unmatched
 * [ ] wikipedia
 * [ ] some unstrucutured refs
 * [ ] OL
 * [ ] weblinks
 
+## Duplicates
+
+```
+$ zstdcat -T0 /magna/refcat/BiblioRefV2/date-2021-02-20.json.zst | jq -rc 'select(.source_release_ident == .target_release_ident)'
+```
+
+Only 0.001% though.
+
 ## Unstructured
 
 * about 300M w/o title, etc.
@@ -250,3 +259,6 @@ Options:
 * can sort refs by source ident
 
 That's almost the same, as the matching process, just another function working on the match group.
+
+----
+
-- 
cgit v1.2.3