From 250181aead188499ce8a567183d1287289a127b5 Mon Sep 17 00:00:00 2001 From: Martin Czygan Date: Sat, 31 Oct 2020 00:42:08 +0100 Subject: cleanup dirs --- projects/.gitkeep | 0 projects/README.md | 17 ---- projects/fuzzycat.png | Bin 28757 -> 0 bytes projects/grobid_refs/.gitignore | 2 - projects/grobid_refs/README.md | 22 ---- projects/grobid_refs/grobid.tei.xml | 195 ------------------------------------ projects/oai_harvest_md/.gitignore | 1 - projects/oai_harvest_md/Makefile | 5 - projects/oai_harvest_md/README.md | 20 ---- projects/titlelist/README.md | 6 -- 10 files changed, 268 deletions(-) delete mode 100644 projects/.gitkeep delete mode 100644 projects/README.md delete mode 100644 projects/fuzzycat.png delete mode 100644 projects/grobid_refs/.gitignore delete mode 100644 projects/grobid_refs/README.md delete mode 100644 projects/grobid_refs/grobid.tei.xml delete mode 100644 projects/oai_harvest_md/.gitignore delete mode 100644 projects/oai_harvest_md/Makefile delete mode 100644 projects/oai_harvest_md/README.md delete mode 100644 projects/titlelist/README.md (limited to 'projects') diff --git a/projects/.gitkeep b/projects/.gitkeep deleted file mode 100644 index e69de29..0000000 diff --git a/projects/README.md b/projects/README.md deleted file mode 100644 index bfbbaef..0000000 --- a/projects/README.md +++ /dev/null @@ -1,17 +0,0 @@ -# Datasets - -Example datasets for fuzzycat, fatcat fuzzy matching utilities. - -* repo: [fuzycat](https://github.com/miku/fuzzycat) -* data: [fuzzycat_samples](https://archive.org/details/fuzzycat_samples) - -## Grobid References (grobid_refs) - -## Title list (titlelist) - -## Name only containers (name_only_containers) - -## OAI harvest metadata - -* [https://archive.org/details/oai_harvest_20200215](https://archive.org/details/oai_harvest_20200215) -* [oai.ndjson.zst](https://archive.org/download/oai_harvest_20200215/oai.ndjson.zst) diff --git a/projects/fuzzycat.png b/projects/fuzzycat.png deleted file mode 100644 index 27f6ed4..0000000 Binary files a/projects/fuzzycat.png and /dev/null differ diff --git a/projects/grobid_refs/.gitignore b/projects/grobid_refs/.gitignore deleted file mode 100644 index bd98a73..0000000 --- a/projects/grobid_refs/.gitignore +++ /dev/null @@ -1,2 +0,0 @@ -*.pdf - diff --git a/projects/grobid_refs/README.md b/projects/grobid_refs/README.md deleted file mode 100644 index 498e68b..0000000 --- a/projects/grobid_refs/README.md +++ /dev/null @@ -1,22 +0,0 @@ -# Grobid refs - -References extracted from [grobid](https://grobid.readthedocs.io). - -## TODO - -* For a given reference string in grobid, find a matching release in fatcat. - -## Approach - -Two general ways: - -* do queries against elasticsearch, which would max out at a few hundred queries/s -* offline compute a key (e.g. title, title ngram plus authors, etc.); then do comparisons - -## Misc - -Example grobid outputs: - -* [grobid.tei.xml](grobid.tei.xml), [pdf](http://dss.in.tum.de/files/brandt-research/me.pdf) -- here grobid does not extract many refs; GS looks ok -* [](), [pdf](https://ia803202.us.archive.org/21/items/jstor-1064270/1064270.pdf) - diff --git a/projects/grobid_refs/grobid.tei.xml b/projects/grobid_refs/grobid.tei.xml deleted file mode 100644 index d484048..0000000 --- a/projects/grobid_refs/grobid.tei.xml +++ /dev/null @@ -1,195 +0,0 @@ - - - - - - Minimal Extending Sets in Tournaments - - - - - - - - - - FelixBrandt - brandtf@in.tum.de - - Institut für Informatik TU -
- München -
-
-
- - PaulHarrenstein - - Dept. of Computer Science - University of Oxford - - - - HansGeorgSeedig - seedigh@in.tum.de - - Institut für Informatik TU München - - - Minimal Extending Sets in Tournaments -
- - - - - - (Extended Abstract) -
-
-
- - - - - GROBID - A machine learning software for extracting information from scholarly documents - - - - - - - - [Applied computing]: Economics; [Computing methodologies]: Distributed Artificial Intelligence- Multiagent systems General Terms Theory - Economics - Algorithms Keywords Tournament solutions - Banks set - Minimal extending set - - - -

In 2011, Brandt proposed a new tournament solution called the minimal extending set (ME ). It was conjectured that ME satisfies a large number of desirable properties. In this paper, we non-constructively show that ME fails to satisfy most of these properties. However, no concrete examples of these violations are known and it appears that ME satisfies these properties for all practical purposes. This casts doubt on the axiomatic method.

-
-
-
- - -
INTRODUCTION

Many problems in multiagent decision making can be addressed using tournament solutions. Examples of wellstudied tournament solutions are the Copeland set, the Banks set, or the minimal covering set [4]. A common benchmark for tournament solutions is which desirable properties they satisfy (see e.g., [4], for an overview of such properties).

In 2011, Brandt [1] proposed a new tournament solution called ME and an associated graph-theoretic conjecture. If the conjecture had held, ME would have satisfied virtually all desirable properties that are usually considered in the literature on tournament solutions. In 2013, however, the existence of a counter-example with about 10 136 alternatives was shown. The proof is non-constructive and uses the probabilistic method [3].

This left open which of the properties are actually satisfied by ME . In this paper, we resolve these open questions. Using the counter-example by Brandt et al. [3] we show that ME fails to satisfy most properties (such as monotonicity and stability) while it does satisfy a stronger version of idempotency, irregularity, and membership in the Banks set. We also prove that computing ME is NP-hard.

-
PRELIMINARIES

A tournament T is a pair (A, ), where A is a set of alternatives and is an asymmetric and complete (and thus irreflexive) binary relation on A, usually referred to as the dominance relation. The dominance relation can be extended to sets of alternatives by writing a B when a b for all b ∈ B. Let BT (a) denote the set of all subsets B ⊆ A such that |B is transitive and a B \ {a}.

A tournament solution is a function that maps a tournament to a nonempty subset of its alternatives. The Banks set BA chooses maximal elements of inclusion-maximal transitive subtournaments, i.e.,

BA(T ) = {a ∈ A : ∃B ∈ BT (a) such that b : b B}. A subset of alternatives B ⊆ A is called S-stable for tour- nament solution S if a / ∈ S(B ∪ {a}) for all a ∈ A \ B.

In particular, we refer to BA-stable sets as extending sets. The union of all inclusion minimal extending sets defines the tournament solution ME [1], i.e.,

ME (T ) = {B is BA-stable : ∀C B : C is not BA-stable}.

An example of a tournament where BA and ME differ is given in Figure 1

-
PROPERTIES OF ME

Dominance-based properties.

First, we consider two properties that are based on the dominance relation. Monotonicity prescribes that a chosen alternative should still be chosen if it is reinforced. The second property, independence of unchosen alternatives, states that the choice set should be unaffected by changes in the dominance relation between unchosen alternatives. Theorem 1. ME satisfies neither monotonicity nor independence of unchosen alternatives.

-
Choice-theoretic properties.

An important class of properties concern the consistency of choice and relate choices from different subtournaments of the same tournament to each other. A relatively strong property of this type is stability, which requires that a set is chosen from two different sets of alternatives if and only if it is chosen from the union of these sets [2]. S(A ∪ B)).

In Figure 2, the logical relations between the different properties are depicted. Theorem 2. ME satisfies α ⊇ but neither α ⊆ nor γ ⊇ .

As a consequence, ME is not stable but satisfies idempotency. It is still open whether ME satisfies γ ⊆ .

-
Relationships to other tournament solutions.

Besides the axiomatic properties of ME , we are also interested in its set-theoretic relationships to other tournament solutions.

Theorem 3. For all tournaments T , ME (T ) ⊆ BA(T ).

This also implies that the irregularity of BA [4, Theorem 7.1.3] extends to ME .

It is unknown whether the tournament equilibrium set is always contained in ME and whether ME is always contained in the minimal covering set.

-
Computational complexity.

An important property of every tournament solution is whether it can be computed efficiently. By a reduction from 3SAT, we can show that this is not the case for ME . Theorem 4. Deciding whether an alternative in a tournament is contained in ME is NP-hard.

Membership of the problem in NP seems rather unlikely. The best upper bound we know of is Σ p 3 .

-
CONCLUSION AND DISCUSSION

We have analyzed the axiomatic as well as computational properties of the tournament solution ME . Results were mixed. In conclusion, ME It is worth pointing out that ME 's violation of monotonicity, stability, and independence of unchosen alternatives crucially depends on the existence of tournaments with more than one minimal extending set. Not only is the size of known tournaments of this type enormous (about 10 136 alternatives) but, furthermore, these tournaments are very likely to be extremely rare. In effect, ME does satisfy these properties in all scenarios in which tournaments only admit a unique minimal extending set. Hence, it is fair to say that ME satisfies the considered properties for all practical purposes. This, in turn, may be interpreted as a criticism of the axiomatic method in general: For what does it mean if a tournament solution (or any other mathematical object) in principle violates some desirable properties, but no concrete example of a violation is known and will perhaps ever be known?

Figure 1 :In this tournament, ME (T ) = {a, b, d} whereas BA(T ) = {a, b, c, d}. An edge from a to b corresponds to a b. Omitted edges point rightwards.
-
.
-
A tournament solution S is stable if for all (A, ), B, C ⊆ A, and X ⊆ B ∩ C, X = S(B) = S(C) if and only if X = S(B ∪ C). Stability can be factorized into conditions α and γ by considering each implication in the above equivalence separately. A tournament solution S satisfies α, if for all sets of alternatives A, B, and X with X ⊆ A∩B, X = S(A∪B) implies X = S(A) = S(B). A tournament solution S satisfies γ, if for all sets of alternatives A, B, S(A) = S(B) implies S(A ∪ B) = S(A) = S(B). For a finer analysis, we split α and γ into two conditions [2, Remark 1]. A tournament solution S satisfies α ⊆ (or α ⊇ ) if S(A) ⊆ B ⊆ A implies S(B) ⊆ S(A) (or S(B) ⊇ S(A)). Similarly, a tournament solution S satisfies γ ⊆ (or γ ⊇ ) if for all A, B, and X, it holds that X = S(A) = S(B) implies X ⊆ S(A ∪ B) (or X ⊇
-
Figure 2 :Implications of stability properties.
-
(i) is not monotonic, (ii) is not independent of unchosen alternatives,(iii) satisfies α ⊇ and idempotency, (iv) does not satisfy α ⊆ and γ ⊇ and is not stable, (v) satisfies irregularity, (vi) is contained in the Banks set, (vii) is NP-hard to compute, and (viii) satisfies composition-consistency [1].
- - - -
-
ACKNOWLEDGEMENTS

This work was supported by the Deutsche Forschungsgemeinschaft under grant BR 2312/7-2. Paul Harrenstein was supported by the ERC under Advanced Grant 291528 ("RACE").

-
- -
- - - - - - Minimal stable sets in tournaments - - FBrandt - - - - Journal of Economic Theory - - 146 - 4 - - - - - - - - - Set-rationalizable choice and self-stability - - FBrandt - - - PHarrenstein - - - - Journal of Economic Theory - - 146 - 4 - - - - - - - - - A counterexample to a conjecture of Schwartz - - FBrandt - - - MChudnovsky - - - IKim - - - GLiu - - - SNorin - - - AScott - - - PSeymour - - - SThomassé - - - - Social Choice and Welfare - - 40 - - - - - - - - - Tournament Solutions and Majority Voting - - J.-FLaslier - - - - Springer - - - - - -
-
-
-
diff --git a/projects/oai_harvest_md/.gitignore b/projects/oai_harvest_md/.gitignore deleted file mode 100644 index 96077ed..0000000 --- a/projects/oai_harvest_md/.gitignore +++ /dev/null @@ -1 +0,0 @@ -oai.ndjson.zst diff --git a/projects/oai_harvest_md/Makefile b/projects/oai_harvest_md/Makefile deleted file mode 100644 index f9deb7f..0000000 --- a/projects/oai_harvest_md/Makefile +++ /dev/null @@ -1,5 +0,0 @@ -SHELL := /bin/bash - -oai_harvest_20200215.ndjson.zst: - wget -c https://archive.org/download/oai_harvest_20200215/oai.ndjson.zst - diff --git a/projects/oai_harvest_md/README.md b/projects/oai_harvest_md/README.md deleted file mode 100644 index 5f2b655..0000000 --- a/projects/oai_harvest_md/README.md +++ /dev/null @@ -1,20 +0,0 @@ -# OAI metadata matching - -Goal: end-to-end data workflow (acquisition, harvest, matching, new release entities). - -## Plan - -* [ ] get JSON version, via [oai_harvest_20200215](https://archive.org/details/oai_harvest_20200215) -* [ ] filter out out of scope data -* [ ] (a) for items that have a doi, figure out, whether we already have md for this doi via API -* [ ] (b) for items w/o doi, get a list of (id, title) -* [ ] run fuzzy matching over title list to find out which one we have - -## Get data - -``` -$ make -``` - -* compressed 12G, around 100G uncompressed - diff --git a/projects/titlelist/README.md b/projects/titlelist/README.md deleted file mode 100644 index a58f50e..0000000 --- a/projects/titlelist/README.md +++ /dev/null @@ -1,6 +0,0 @@ -# Title list - -Given a list of over 20M publication titles, determine matches, to generate a -seed list. - - -- cgit v1.2.3