field | value | date |
---|---|---|
author | Bryan Newbold <bnewbold@archive.org> | 2020-04-03 15:16:17 -0700 |
committer | Bryan Newbold <bnewbold@archive.org> | 2020-04-03 15:16:17 -0700 |
commit | fb767adb9472ff85b46b5a383f3986950b12dd27 | |
tree | 724af4412353c627b0eae26fd4d7fd1164bf2b55 | |
parent | 4cbbdf33ee2a9651f79f96e4bf290d8bc721f69d | |
move more directories around
mode | file | lines changed |
---|---|---|
-rw-r--r-- | extra/scrape/.gitignore (renamed from scrape/.gitignore) | 0 |
-rw-r--r-- | extra/scrape/README.md (renamed from scrape/README.md) | 0 |
-rwxr-xr-x | extra/scrape/parse_cnki_tables.py (renamed from scrape/parse_cnki_tables.py) | 0 |
-rwxr-xr-x | extra/scrape/parse_wanfang_html.py (renamed from scrape/parse_wanfang_html.py) | 0 |
-rw-r--r-- | notes/fulltext_search.md | 3 |
-rw-r--r-- | notes/machine_translation.md | 22 |
6 files changed, 25 insertions, 0 deletions
diff --git a/scrape/.gitignore b/extra/scrape/.gitignore
index b2bc71b..b2bc71b 100644
--- a/scrape/.gitignore
+++ b/extra/scrape/.gitignore
diff --git a/scrape/README.md b/extra/scrape/README.md
index 97bb6fe..97bb6fe 100644
--- a/scrape/README.md
+++ b/extra/scrape/README.md
diff --git a/scrape/parse_cnki_tables.py b/extra/scrape/parse_cnki_tables.py
index 3763550..3763550 100755
--- a/scrape/parse_cnki_tables.py
+++ b/extra/scrape/parse_cnki_tables.py
diff --git a/scrape/parse_wanfang_html.py b/extra/scrape/parse_wanfang_html.py
index 85187f5..85187f5 100755
--- a/scrape/parse_wanfang_html.py
+++ b/extra/scrape/parse_wanfang_html.py
diff --git a/notes/fulltext_search.md b/notes/fulltext_search.md
new file mode 100644
index 0000000..938abff
--- /dev/null
+++ b/notes/fulltext_search.md
@@ -0,0 +1,3 @@
+
+https://ambar.cloud/blog/2016/11/24/es-hl-large-docs/
+https://ambar.cloud/blog/2017/01/02/es-large-text/
diff --git a/notes/machine_translation.md b/notes/machine_translation.md
new file mode 100644
index 0000000..519f0c1
--- /dev/null
+++ b/notes/machine_translation.md
@@ -0,0 +1,22 @@
+
+Overall concept is to use machine translation to fill gaps or provide a
+starting point for human translation. Quality would obviously be best from
+experience human translators with domain knowledge.
+
+Ideas:
+- translate GROBID XML or JSON output offline
+- call a platform/API to translate individual papers
+- can directly link to google translate (or other platform) of the paper
+
+## Free Software
+
+NiuTrans: Chinese-to-english focus, C++
+
+Apertium: popular, but no Chinese?
+
+Moses: no Chinese examples?
+
+
+## Platforms
+
+Google Translate
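
The "Ideas" list in notes/machine_translation.md could be sketched roughly as below. This is not code from the commit; it is a minimal illustration under stated assumptions. It assumes the GROBID output is TEI XML (namespace `http://www.tei-c.org/ns/1.0`) with body text in `<p>` elements. The function names `extract_paragraphs`, `translate_paragraphs`, and `translate_link` are hypothetical, the `translate` callable stands in for whichever engine or platform is eventually chosen, and the Google Translate URL pattern is an assumption that may not match the current service.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List
from urllib.parse import urlencode

# TEI namespace used by GROBID fulltext output.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def extract_paragraphs(tei_xml: str) -> List[str]:
    """Return the whitespace-normalized text of each non-empty <p> in the TEI body."""
    root = ET.fromstring(tei_xml)
    paragraphs = []
    for p in root.findall(".//tei:body//tei:p", TEI_NS):
        # itertext() also picks up text inside inline children such as <ref>.
        text = " ".join("".join(p.itertext()).split())
        if text:
            paragraphs.append(text)
    return paragraphs


def translate_paragraphs(
    paragraphs: List[str], translate: Callable[[str], str]
) -> List[str]:
    """Apply a caller-supplied translation function to each paragraph.

    `translate` is a placeholder: it could wrap an offline engine or a
    platform API call. No particular service is assumed here.
    """
    return [translate(p) for p in paragraphs]


def translate_link(paper_url: str, target_lang: str = "en") -> str:
    """Build a link that asks Google Translate to render a paper's web page.

    ASSUMPTION: the `/translate?sl=...&tl=...&u=...` pattern is illustrative
    and may have changed; verify it before relying on it.
    """
    query = urlencode({"sl": "auto", "tl": target_lang, "u": paper_url})
    return "https://translate.google.com/translate?" + query
```

Keeping the translation step as a plain callable means the same extraction code can be reused while comparing the engines listed under "Free Software" and "Platforms".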