author | Bryan Newbold <bnewbold@archive.org> | 2021-10-04 16:13:30 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2021-10-15 18:15:25 -0700 |
commit | 147319ae00a6b788104209083f65cbaa4329c862 | |
tree | b3a880781acccbff1f298b74f28b067b670df605 /proposals | |
parent | 271f110e5ad4091e8d683b4365bc565ae0466916 | |
dataset ingest: start enumerating examples
Diffstat (limited to 'proposals')
-rw-r--r-- | proposals/2021-09-09_dataset_ingest.md | 34 |
1 files changed, 34 insertions, 0 deletions
diff --git a/proposals/2021-09-09_dataset_ingest.md b/proposals/2021-09-09_dataset_ingest.md
index 801a8e5..d4d2be4 100644
--- a/proposals/2021-09-09_dataset_ingest.md
+++ b/proposals/2021-09-09_dataset_ingest.md
@@ -183,3 +183,37 @@ Second implement fatcat importer and test locally and/or in QA.
 
 Lastly implement infrastructure, automation, and other "glue".
 
+
+## Example Entities
+
+### ArchiveOrg: CAT dataset
+
+<https://archive.org/details/CAT_DATASET>
+
+`release_36vy7s5gtba67fmyxlmijpsaui`
+
+###
+
+<https://archive.org/details/academictorrents_70e0794e2292fc051a13f05ea6f5b6c16f3d3635>
+
+doi:10.1371/journal.pone.0120448
+
+Single .rar file
+
+### Dataverse
+
+<https://dataverse.rsu.lv/dataset.xhtml?persistentId=doi:10.48510/FK2/IJO02B>
+
+Single excel file
+
+### Dataverse
+
+<https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/CLSFKX&version=1.1>
+
+doi:10.7910/DVN/CLSFKX
+
+Mulitple files; multiple versions?
+
+Single file inside:
+
+<https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/CLSFKX/XWEHBB>