author | Bryan Newbold <bnewbold@archive.org> | 2022-12-23 15:24:33 -0800
---|---|---
committer | Bryan Newbold <bnewbold@archive.org> | 2022-12-23 15:24:33 -0800
commit | c42a7f46d29ca9d6e7c84b1b0617f272f71f25a5 |
tree | efe52857722e33473626b74e41a5fcdbcdd30a6c /notes/examples/random_datasets.md |
parent | 3a7a946a1f271cd0f334129fa3bb51c451b82966 |
notes: old examples
Diffstat (limited to 'notes/examples/random_datasets.md')
-rw-r--r-- | notes/examples/random_datasets.md | 19 |
1 files changed, 19 insertions, 0 deletions
```diff
diff --git a/notes/examples/random_datasets.md b/notes/examples/random_datasets.md
new file mode 100644
index 0000000..b69132c
--- /dev/null
+++ b/notes/examples/random_datasets.md
@@ -0,0 +1,19 @@
+
+Possible external datasets to ingest (which are not entire platforms):
+
+- https://research.google/tools/datasets/
+- https://openslr.org/index.html
+- https://www.kaggle.com/datasets?sort=votes&tasks=true
+- https://archive.ics.uci.edu/ml/datasets.php
+
+Existing archive.org datasets to ingest:
+
+- https://archive.org/details/allthemusicllc-datasets
+
+Papers on archive.org to ingest:
+
+- <https://archive.org/details/journals?and%5B%5D=%21collection%3Aarxiv+%21collection%3Ajstor_ejc+%21collection%3Apubmed&sin=>
+- <https://archive.org/details/biorxiv>
+- <https://archive.org/details/philosophicaltransactions?tab=collection>
+- <https://archive.org/search.php?query=doi%3A%2A>
+- <https://archive.org/details/folkscanomy_academic>
```
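Two of the archive.org links in the "Papers on archive.org" list carry their search filters as percent-encoded query strings, which are hard to read at a glance. A minimal sketch using only Python's standard library (`urllib.parse`) to decode them; the names `JOURNALS_URL`, `DOI_SEARCH_URL` and `decode_query` are hypothetical, introduced here for illustration only:

```python
from urllib.parse import urlsplit, parse_qs

# Two archive.org links from the notes file, copied verbatim.
JOURNALS_URL = (
    "https://archive.org/details/journals"
    "?and%5B%5D=%21collection%3Aarxiv+%21collection%3Ajstor_ejc"
    "+%21collection%3Apubmed&sin="
)
DOI_SEARCH_URL = "https://archive.org/search.php?query=doi%3A%2A"


def decode_query(url: str) -> dict[str, list[str]]:
    """Return the URL's query parameters with percent-escapes and '+' decoded."""
    # keep_blank_values=True preserves empty parameters such as "sin=".
    return parse_qs(urlsplit(url).query, keep_blank_values=True)


if __name__ == "__main__":
    print(decode_query(JOURNALS_URL))
    print(decode_query(DOI_SEARCH_URL))
```

Running it prints `{'and[]': ['!collection:arxiv !collection:jstor_ejc !collection:pubmed'], 'sin': ['']}` and `{'query': ['doi:*']}`. Read that way, the journals link appears to filter out the `arxiv`, `jstor_ejc` and `pubmed` collections, and the search link presumably matches any item with a `doi` metadata value; the exact filter semantics are archive.org's and are not confirmed by the commit itself.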