author | Bryan Newbold <bnewbold@robocracy.org> | 2019-05-07 17:30:10 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@robocracy.org> | 2019-05-07 17:30:10 -0700 |
commit | ac4f52745fd1bde08a4655a1737e0d3e085abee7 (patch) | |
tree | 7e2006ca9e7f618029ffef67326814c478e9136b | |
parent | e812de925ab60b3fb20a58a65dc897336487e34a (diff) | |
WIP: sitemap.xml notes/template (branch: bnewbold-sitemap)
-rw-r--r-- | extra/sitemap/README.md | 23 |
-rw-r--r-- | extra/sitemap/sitemap.xml | 6 |
2 files changed, 29 insertions, 0 deletions
diff --git a/extra/sitemap/README.md b/extra/sitemap/README.md
new file mode 100644
index 00000000..6963bb1f
--- /dev/null
+++ b/extra/sitemap/README.md
@@ -0,0 +1,23 @@
+
+Google has a limit of 50k lines / 10 MByte for text sitemap files, and 50K
+lines / 50 MByte for XML site map files.
+
+With a baseline of 100 million entities, that requires an index file pointing
+to at least 2000x individual sitemaps. 3 hex characters is 12 bits, or 4096
+options; seems like an ok granularity to start with.
+
+Should look in to what archive.org does to generate their sitemap.xml, seems
+simple, and comes in batches of exactly 50k.
+
+## Text Sitemaps
+
+Should be possible to create simple text-style sitemaps, one URL per line, and
+link to these from a sitemap index. This is appealing because the sitemaps can
+be generated very quickly from identifier SQL dump files, run through UNIX
+commands (eg, to split and turn into URLs). Some script to create an XML
+sitemap index to point at all the sitemaps would still be needed though.
+
+
+## Resources
+
+Google sitemap verifier: https://support.google.com/webmasters/answer/7451001
diff --git a/extra/sitemap/sitemap.xml b/extra/sitemap/sitemap.xml
new file mode 100644
index 00000000..4404bdc2
--- /dev/null
+++ b/extra/sitemap/sitemap.xml
@@ -0,0 +1,6 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
+  <url>
+    <loc>{{page[0]|safe}}</loc>
+  </url>
+</urlset>
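
Note on the README: the shard arithmetic and the text-sitemap idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from this commit: the function names, the `https://example.org/release/` URL prefix, and the premise that identifiers are hex strings sharded on their first three characters are all hypothetical.

```python
import math
from collections import defaultdict
from xml.sax.saxutils import escape

# Per-file URL limit quoted in the README.
MAX_URLS_PER_SITEMAP = 50_000


def min_sitemaps(entity_count, per_file=MAX_URLS_PER_SITEMAP):
    """Smallest number of sitemap files needed to hold entity_count URLs."""
    return math.ceil(entity_count / per_file)


def shard_by_prefix(idents, prefix_len=3):
    """Group identifiers by their first prefix_len characters.

    Assumes hex identifiers, so prefix_len=3 gives 16**3 = 4096 shards.
    """
    shards = defaultdict(list)
    for ident in idents:
        shards[ident[:prefix_len]].append(ident)
    return dict(shards)


def text_sitemap(idents, url_prefix):
    """Render a text-style sitemap: one URL per line."""
    return "".join(f"{url_prefix}{ident}\n" for ident in idents)


def sitemap_index(sitemap_urls):
    """Render an XML sitemap index pointing at each individual sitemap."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in sitemap_urls:
        lines.append(f"  <sitemap><loc>{escape(url)}</loc></sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"
```

If identifiers are spread evenly, 100 million entities over 4096 prefixes averages roughly 24,400 URLs per file, which stays under the 50k limit. A skewed distribution would need oversized shards split further.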