author: Bryan Newbold <bnewbold@archive.org> 2021-04-29 10:03:47 -0700
committer: Bryan Newbold <bnewbold@archive.org> 2021-04-29 10:03:47 -0700
commit: 6b1f87c12f7d40a3016910b214579a368c747df4
tree: d52e555663a3fe395fe0024098735adaf8e10494
parent: 4b152e02d1a0d0d7a9a391ed211ecd6f304d6962
sitemap generation
Diffstat (limited to 'extra/sitemap/README.md')
-rw-r--r-- extra/sitemap/README.md | 21
1 files changed, 21 insertions, 0 deletions
diff --git a/extra/sitemap/README.md b/extra/sitemap/README.md
new file mode 100644
index 0000000..242378a
--- /dev/null
+++ b/extra/sitemap/README.md
@@ -0,0 +1,21 @@
+
+## HOWTO: Update
+
+Requires [fatcat-cli](https://gitlab.com/bnewbold/fatcat-cli) and `jq`
+installed. Run these commands on a production machine.
+
+    cd /srv/fatcat_scholar/sitemap
+    export DATE="$(date --iso-8601)"
+    /srv/fatcat_scholar/src/extra/sitemap/work_urls_query.sh "$DATE"
+    rm *.txt.gz
+    /srv/fatcat/src/extra/sitemap/generate_sitemap_indices.py
+
+## Background
+
+Google has a limit of 50k lines / 10 MByte for text sitemap files, and 50k
+lines / 50 MByte for XML sitemap files. Google Scholar has indicated a smaller
+limit of 20k URLs / 5 MByte.
+
+## Resources
+
+Google sitemap verifier: https://support.google.com/webmasters/answer/7451001
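
Note on the Background section: the Google Scholar limits quoted there (20k URLs / 5 MB per file) imply that a long URL list has to be split into several text sitemap files. The following is a minimal Python sketch of one way to do that split. It is not the logic of `work_urls_query.sh` or `generate_sitemap_indices.py`, which this diff does not show. The function name `chunk_urls`, the reading of "5 MB" as decimal megabytes, and the `example.org` URLs are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, List

# Limits taken from the Background section (Google Scholar's stricter figures).
MAX_URLS = 20_000
MAX_BYTES = 5 * 1000 * 1000  # assumption: "5 MB" read as decimal megabytes


def chunk_urls(
    urls: Iterable[str],
    max_urls: int = MAX_URLS,
    max_bytes: int = MAX_BYTES,
) -> Iterator[List[str]]:
    """Yield lists of URLs that each fit in one text sitemap file.

    A text sitemap holds one URL per line, so each URL costs its UTF-8
    length plus one byte for the newline. A single URL longer than
    max_bytes is still emitted, alone in its own chunk.
    """
    chunk: List[str] = []
    size = 0
    for url in urls:
        line_bytes = len(url.encode("utf-8")) + 1  # +1 for the newline
        # Start a new chunk when adding this URL would break either limit.
        if chunk and (len(chunk) >= max_urls or size + line_bytes > max_bytes):
            yield chunk
            chunk, size = [], 0
        chunk.append(url)
        size += line_bytes
    if chunk:
        yield chunk


# Hypothetical input: 45,000 short URLs, far below the byte limit per chunk.
example_urls = [f"https://example.org/work/{i}" for i in range(45_000)]
print([len(c) for c in chunk_urls(example_urls)])  # → [20000, 20000, 5000]
```

Each yielded list could then be written to its own file, one URL per line. Tracking bytes as well as the URL count matters only when URLs are long; with short URLs the 20k count is the limit that triggers first.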