| | Commit message | Author | Age | Files | Lines |
|---|---|---|---|---|---|
* | update notes on cluster, nb | Martin Czygan | 2020-10-22 | 1 | -1/+47 |
* | update notes on clustering | Martin Czygan | 2020-10-22 | 1 | -0/+18 |
* | update cluster notes | Martin Czygan | 2020-10-22 | 1 | -0/+27 |
* | notes: clustering | Martin Czygan | 2020-10-22 | 1 | -0/+11 |
* | cluster variants | Martin Czygan | 2020-10-21 | 1 | -0/+54 |
* | update various docs; start data issue log | Martin Czygan | 2020-09-03 | 2 | -1/+1 |
* | add notes on abbrevs | Martin Czygan | 2020-08-15 | 2 | -0/+2260 |
* | update plan | Martin Czygan | 2020-08-14 | 1 | -0/+5 |
* | note on optimization: marisa-trie | Martin Czygan | 2020-08-12 | 1 | -0/+1 |
| | Currently, the JSON mapping is 172M; turning it into a dict takes a while and consumes gigabytes of memory. For exact lookups, we might want to use marisa-trie: "String data in a MARISA-trie may take up to 50x-100x less memory than in a standard Python dict; the raw lookup speed is comparable; trie also provides fast advanced methods like prefix search." | | | | |
* | add notes/todo | Martin Czygan | 2020-08-12 | 1 | -0/+17 |
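The marisa-trie note boils down to two operations on a read-only string map: exact lookup and prefix search. The sketch below is neither marisa-trie nor the project's code. It is a dependency-free stand-in, assuming the JSON mapping is a flat object of string keys to string values. Keys are sorted once, and all keys and values are packed into two bytes buffers with offset arrays, so no Python object per entry stays resident. Lookups are then binary searches. `PackedMap`, `get` and `keys_with_prefix` are hypothetical names.

```python
from array import array


class PackedMap:
    """Hypothetical read-only str -> str map packed into bytes buffers.

    Keys are stored sorted by their UTF-8 bytes, so exact lookups and
    prefix searches are binary searches over the packed key buffer.
    """

    def __init__(self, mapping):
        # Building still needs the decoded dict; only the packed form is kept.
        items = sorted((k.encode("utf-8"), v.encode("utf-8")) for k, v in mapping.items())
        self._keys, self._koff = self._pack(k for k, _ in items)
        self._vals, self._voff = self._pack(v for _, v in items)
        self._n = len(items)

    @staticmethod
    def _pack(chunks):
        # Concatenate chunks; offsets[i]..offsets[i + 1] delimits chunk i.
        buf = bytearray()
        offsets = array("Q", [0])
        for chunk in chunks:
            buf += chunk
            offsets.append(len(buf))
        return bytes(buf), offsets

    def _key(self, i):
        return self._keys[self._koff[i]:self._koff[i + 1]]

    def _lower_bound(self, target):
        # Index of the first key >= target (byte order).
        lo, hi = 0, self._n
        while lo < hi:
            mid = (lo + hi) // 2
            if self._key(mid) < target:
                lo = mid + 1
            else:
                hi = mid
        return lo

    def get(self, key, default=None):
        target = key.encode("utf-8")
        i = self._lower_bound(target)
        if i < self._n and self._key(i) == target:
            return self._vals[self._voff[i]:self._voff[i + 1]].decode("utf-8")
        return default

    def keys_with_prefix(self, prefix):
        # Keys sharing a prefix are contiguous in sorted order.
        target = prefix.encode("utf-8")
        i = self._lower_bound(target)
        out = []
        while i < self._n and self._key(i).startswith(target):
            out.append(self._key(i).decode("utf-8"))
            i += 1
        return out


if __name__ == "__main__":
    m = PackedMap({"apple": "1", "apricot": "2", "banana": "3"})  # toy data
    print(m.get("apricot"))
    print(m.keys_with_prefix("ap"))
```

With toy data like the above, `get("apricot")` returns `"2"` and `keys_with_prefix("ap")` returns both `ap…` keys in sorted order. The sketch does not measure memory. marisa-trie (for example `marisa_trie.BytesTrie`, which stores key/bytes-value pairs) typically compresses further by sharing common prefixes, so it remains the option named in the note.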