-rw-r--r-- README.md 19
-rw-r--r-- notes/Abbreviations.md (renamed from notes/abbrev.md) 0
-rw-r--r-- notes/Todo.md (renamed from notes/plan.md) 2
-rw-r--r-- projects/grobid_refs/.gitignore 2
-rw-r--r-- projects/grobid_refs/README.md 5
5 files changed, 26 insertions, 2 deletions
diff --git a/README.md b/README.md
index 7c6468d..2fe2e5e 100644
--- a/README.md
+++ b/README.md
@@ -42,3 +42,22 @@ user 29177m5.516s
sys 4927m3.277s
```
+## Data issues
+
+### A republished article
+
+There is "student BMJ" and "BMJ" - this (html) article (interview) has been
+first published on "sbmj" (Published 07 July 2011), then "bmj" (Published 10
+August 2011).
+
+> Notes; Originally published as: Student BMJ 2011;19:d3983
+
+* https://www.bmj.com/content/343/sbmj.d3983
+* https://www.bmj.com/content/343/bmj.d4964
+
+It is essentially the same text, same title, author, just different DOI and
+probably a different recorded date.
+
+Generic pattern "republication" duplicate:
+
+* metadata mostly same, except date and doi
diff --git a/notes/abbrev.md b/notes/Abbreviations.md
index 5106d5b..5106d5b 100644
--- a/notes/abbrev.md
+++ b/notes/Abbreviations.md
diff --git a/notes/plan.md b/notes/Todo.md
index 94c1297..2c548b0 100644
--- a/notes/plan.md
+++ b/notes/Todo.md
@@ -1,4 +1,4 @@
-# Plan
+# Todo
## Releases
diff --git a/projects/grobid_refs/.gitignore b/projects/grobid_refs/.gitignore
new file mode 100644
index 0000000..bd98a73
--- /dev/null
+++ b/projects/grobid_refs/.gitignore
@@ -0,0 +1,2 @@
+*.pdf
+
diff --git a/projects/grobid_refs/README.md b/projects/grobid_refs/README.md
index 13ca3fc..15eaae0 100644
--- a/projects/grobid_refs/README.md
+++ b/projects/grobid_refs/README.md
@@ -2,5 +2,8 @@
References extracted from [grobid](https://grobid.readthedocs.io).
-Example grobid output: [grobid.tei.xml](grobid.tei.xml).
+Example grobid outputs:
+
+* [grobid.tei.xml](grobid.tei.xml), [pdf](http://dss.in.tum.de/files/brandt-research/me.pdf) -- here grobid does not extract many refs; GS looks ok
+* [](), [pdf](https://ia803202.us.archive.org/21/items/jstor-1064270/1064270.pdf)