note on optimization: marisa-trie

Currently, the JSON mapping is 172M; turning this into a dict takes a bit and consumes GBs of memory.

For exact lookups, we might want to use marisa-trie:

> String data in a MARISA-trie may take up to 50x-100x less memory than in a standard Python dict; the raw lookup speed is comparable; trie also provides fast advanced methods like prefix search.
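A minimal sketch of the exact-lookup idea, assuming the mapping is a flat string-to-string JSON object. The helper names `build_static_mapping` and `lookup` and the sample pairs are hypothetical and not part of fuzzycat. So that the sketch also runs where marisa-trie is not installed, it falls back to a plain dict of the same shape.

```python
try:
    import marisa_trie  # third-party: pip install marisa-trie
except ImportError:
    marisa_trie = None


def build_static_mapping(pairs):
    """Build a read-only str -> str mapping from (key, value) pairs.

    Hypothetical helper: uses marisa_trie.BytesTrie when available,
    otherwise a plain dict with the same key -> [bytes] shape.
    """
    items = [(key, value.encode("utf-8")) for key, value in pairs]
    if marisa_trie is not None:
        return marisa_trie.BytesTrie(items)
    return {key: [value] for key, value in items}


def lookup(mapping, key):
    """Return the first value stored under key, or None if absent."""
    values = mapping.get(key)  # BytesTrie.get returns a list of bytes payloads
    return values[0].decode("utf-8") if values else None


if __name__ == "__main__":
    # Made-up sample data; the real mapping would come from json.load().
    mapping = build_static_mapping([
        ("journal a", "1234-5678"),
        ("journal b", "8765-4321"),
    ])
    print(lookup(mapping, "journal a"))
    print(lookup(mapping, "missing"))
```

The trie is static, so it would be built once from the JSON file and reused afterwards. The library also provides `save` and `load` on trie objects, which might avoid rebuilding it on every start.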
author: Martin Czygan <martin.czygan@gmail.com> 2020-08-12 15:05:51 +0200
committer: Martin Czygan <martin.czygan@gmail.com> 2020-08-12 15:05:51 +0200
commit: 0b4db31a797a582c25942e693d531ee37b618674 (patch)
tree: 3363c9ba42e711234a911931e65ac184520892a3 /notes
parent: 703fdbebc53352036bfa9e9a13599421e38d949e (diff)
download: fuzzycat-0b4db31a797a582c25942e693d531ee37b618674.tar.gz
fuzzycat-0b4db31a797a582c25942e693d531ee37b618674.zip
1 files changed, 1 insertions, 0 deletions
diff --git a/notes/plan.md b/notes/plan.md
index 1660f25..0e319ae 100644
--- a/notes/plan.md
+++ b/notes/plan.md
@@ -7,6 +7,7 @@
 ## Containers
 
 * [ ] create notebook on duplicates
+* [ ] static mapping, that is efficient to store, maybe via: https://github.com/pytries/marisa-trie
 
 ## Bulk