author Martin Czygan <martin.czygan@gmail.com> 2020-08-12 15:05:51 +0200
committer Martin Czygan <martin.czygan@gmail.com> 2020-08-12 15:05:51 +0200
commit 0b4db31a797a582c25942e693d531ee37b618674
tree 3363c9ba42e711234a911931e65ac184520892a3 /notes
parent 703fdbebc53352036bfa9e9a13599421e38d949e
note on optimization: marisa-trie
Currently, the JSON mapping is 172 MB; loading it into a dict takes a while and consumes several GB of memory. For exact lookups, we might want to use marisa-trie:

> String data in a MARISA-trie may take up to 50x-100x less memory than in a standard Python dict; the raw lookup speed is comparable; trie also provides fast advanced methods like prefix search.
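A minimal sketch of the exact-lookup idea, assuming the mapping is a flat JSON object of string keys to string values (its actual shape is not shown in this commit). It uses marisa-trie's `BytesTrie`, which stores bytes payloads under unicode keys and returns a list of payloads per key. `build_trie`, `lookup` and the sample keys are hypothetical. Parsing the JSON still materializes the full dict once at build time; the saving applies to the structure kept in memory afterwards.

```python
import json

import marisa_trie  # third-party: pip install marisa-trie


def build_trie(mapping: dict) -> marisa_trie.BytesTrie:
    """Turn a str -> str mapping into a compact, read-only BytesTrie.

    Hypothetical helper: BytesTrie payloads must be bytes, so values
    are stored UTF-8 encoded.
    """
    return marisa_trie.BytesTrie(
        (key, value.encode("utf-8")) for key, value in mapping.items()
    )


def lookup(trie: marisa_trie.BytesTrie, key: str):
    """Exact lookup: return the first value for key, or None if absent."""
    values = trie.get(key)  # list of bytes payloads, or None if missing
    if not values:
        return None
    return values[0].decode("utf-8")


# Hypothetical stand-in for the real 172 MB JSON file.
raw = '{"abc/1": "v1", "abc/2": "v2", "xyz/1": "v3"}'
trie = build_trie(json.loads(raw))

print(lookup(trie, "abc/1"))      # v1
print(lookup(trie, "missing"))    # None
print(sorted(trie.keys("abc/")))  # ['abc/1', 'abc/2'] (prefix search)
```

The trie is immutable once built, which fits a static lookup table; any change to the mapping means rebuilding it.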
Diffstat (limited to 'notes')
-rw-r--r-- notes/plan.md 1
1 files changed, 1 insertions, 0 deletions
diff --git a/notes/plan.md b/notes/plan.md
index 1660f25..0e319ae 100644
--- a/notes/plan.md
+++ b/notes/plan.md
@@ -7,6 +7,7 @@
## Containers
* [ ] create notebook on duplicates
+* [ ] static mapping, that is efficient to store, maybe via: https://github.com/pytries/marisa-trie
## Bulk
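The plan item asks for a static mapping that is efficient to store. A sketch of one way to get there with marisa-trie, assuming the same str -> str mapping shape: build the trie once, write it to a file with `save`, and in later runs open it with `mmap`, which lets the trie be queried without loading it fully into memory. The helper names `save_mapping` and `open_mapping` and the file name `mapping.marisa` are hypothetical.

```python
import os
import tempfile

import marisa_trie  # third-party: pip install marisa-trie


def save_mapping(mapping: dict, path: str) -> None:
    """Build a read-only BytesTrie from a str -> str mapping and write it to path.

    Hypothetical helper; values are stored as UTF-8 encoded bytes.
    """
    trie = marisa_trie.BytesTrie(
        (key, value.encode("utf-8")) for key, value in mapping.items()
    )
    trie.save(path)


def open_mapping(path: str) -> marisa_trie.BytesTrie:
    """Memory-map a previously saved trie instead of reading it all into RAM."""
    return marisa_trie.BytesTrie().mmap(path)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mapping.marisa")  # hypothetical file name
    save_mapping({"abc/1": "v1", "xyz/1": "v3"}, path)
    trie = open_mapping(path)
    result = trie.get("abc/1")  # list of payloads: [b'v1']
    missing = trie.get("nope")  # None when the key is absent
    del trie  # release the memory map before the directory is removed
    print(result, missing)
```

Compared to re-parsing a large JSON file on every start, opening a memory-mapped trie should make startup much cheaper; the trade-off is a separate build step whenever the mapping changes.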