From 0b4db31a797a582c25942e693d531ee37b618674 Mon Sep 17 00:00:00 2001
From: Martin Czygan <martin.czygan@gmail.com>
Date: Wed, 12 Aug 2020 15:05:51 +0200
Subject: note on optimization: marisa-trie

Currently, the JSON mapping is 172MB; loading it into a dict takes a
while and consumes several GB of memory. For exact lookups, we might
want to use marisa-trie:

> String data in a MARISA-trie may take up to 50x-100x less memory than
in a standard Python dict; the raw lookup speed is comparable; trie also
provides fast advanced methods like prefix search.
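
A minimal sketch of the exact-lookup side. It assumes the JSON mapping is
one flat object of string keys to string values. `StaticMapping` and
`from_json` are hypothetical helpers; only `marisa_trie.BytesTrie` comes
from the library. The sketch falls back to a plain dict when marisa-trie
is not installed.

```python
import json
from typing import Dict, Iterable, Optional, Tuple

try:
    import marisa_trie  # third-party: pip install marisa-trie
except ImportError:  # optional dependency; fall back to a plain dict below
    marisa_trie = None


class StaticMapping:
    """Read-only str -> str mapping for exact lookups (hypothetical helper).

    Uses marisa_trie.BytesTrie when the package is installed, otherwise a dict.
    """

    def __init__(self, items: Iterable[Tuple[str, str]]) -> None:
        self._trie = None
        self._dict: Optional[Dict[str, str]] = None
        if marisa_trie is not None:
            # BytesTrie stores (unicode key, bytes value) pairs, so encode values.
            self._trie = marisa_trie.BytesTrie(
                (key, value.encode("utf-8")) for key, value in items
            )
        else:
            self._dict = dict(items)

    @classmethod
    def from_json(cls, path: str) -> "StaticMapping":
        # Assumes the file holds one flat JSON object of string -> string.
        # Note: json.load still builds a full dict once, at build time.
        with open(path, encoding="utf-8") as f:
            return cls(json.load(f).items())

    def get(self, key: str) -> Optional[str]:
        """Return the value for an exact key match, or None if absent."""
        if self._trie is not None:
            try:
                # BytesTrie returns a list of all values stored under the key.
                return self._trie[key][0].decode("utf-8")
            except KeyError:
                return None
        return self._dict.get(key)
```

Building from JSON still parses the whole file once. The memory win comes
from keeping only the trie afterwards. marisa-trie tries can also be
written with `save()` and reopened with `load()` or `mmap()`, so the JSON
would only need to be parsed again when the mapping changes.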
---
 notes/plan.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/notes/plan.md b/notes/plan.md
index 1660f25..0e319ae 100644
--- a/notes/plan.md
+++ b/notes/plan.md
@@ -7,6 +7,7 @@
 ## Containers
 
 * [ ] create notebook on duplicates
+* [ ] static mapping, that is efficient to store, maybe via: https://github.com/pytries/marisa-trie
 
 ## Bulk
 