Historians face a specific challenge: they need to derive conclusions from evidence which is always incomplete, contradictory and anything but precise. Historical information systems and software systems used for their implementation must therefore be based upon a model of information which reflects these properties of the available sources.

To clarify: the kind of historical research I have in mind does not focus on creating literary art forms reflecting opinions about the past, but on strictly source-based studies. Such studies need to draw conclusions from dates mentioned in inherently dull administrative records, from prices and measurements, from shadowy networks between common people who were rarely involved in acclaimed political actions, and from an analysis of the language used in mass documents rather than the literary highlights of the epoch. The understanding of political actions and literary highlights may, of course, profit from such a mundane background and become an object of the tools developed for its analysis.

To make this slightly more concrete: in the so-called Digital Humanities the argument has recently turned up that, surprisingly, historians scarcely use data bases, so a model for such, based on a new concept, “factoids”, should be presented to them from the outside [Bradley 2014]. It is confusing that the application of information technology within history has moved so far away from data bases – which undoubtedly were the index fossils of computer applications within history in the eighties and nineties – that a distinguished specialist like John Bradley can consider data base applications as unknown within historical research [Pasin 2015]. We will not discuss the reasons for that, which might be sought in the disciplinary focus of the so-called Digital Humanities, but also in the way in which historical research has presented its face to the outside world in recent years. The following text is very much based on the assumption that data base technologies *are* among the most central information technologies for historians.

The relationship between this post and the previous one – *On Information in Historical Sources* – is explained in Appendix III below.

**1. Examples ^{1} for Concrete Problems.**

Vagueness, contradictions and lack of information together are responsible for the difficulties in deriving coherent knowledge about the past from surviving sources. To solve the problems created by them, it is helpful to differentiate cleanly between the individual types of problems.

**1.1 Vagueness I**: A data point is beyond doubt; but it is an interval or a set, rather than a single value.

Examples:

(Ex 1.1) *Price of wheat in March: 23d – 26d.*

(Ex 1.2) *Early 18^{th} century.*

(Ex 1.3) *He was born in Prussia.*

(Ex 1.4) *By occupation he was an innkeeper and a farmer.*

**1.2 Vagueness II:** A data point is expressed on a scale, on which a desired operation is undefined for the intuitively underlying data type.

Examples:

(Ex 2.1) *Bread prices were very high during the summer.*

(Ex 2.2) *She was young.*

(Ex 2.3) *She came from a nearby village.*

(Ex 2.4) *Their son was an artisan.*

**1.3 Uncertainty I:** Only one of a set of data points can apply.

Examples:

(Ex 3.1) *She was 25 or 35 years old.*

(Ex 3.2) *An event took place before Easter 1435 or Easter 1453.*

(Ex 3.3) *They bought iron tools in “the market”. (Assuming three towns with names containing “market” qualifying.)*

(Ex 3.4) *The source allows reading the occupation as “tailor” or “sailor”.*

**1.4 Uncertainty II:** A precise data point exists, but it is doubtful whether it is correct.

Examples:

(Ex 4.1) *The amount “25” is probably given in florins, unless it is in guilders.*

(Ex 4.2) *This cannot have happened much before 1618, though the text mentions “1590”.*

(Ex 4.3) *A location “x-town” is mentioned. As that was founded much later, another, earlier settlement with the same name is probably referred to.*

(Ex 4.4) *The charter claims to assign a rent of type “x”, which appears in all other documents only 100 years later.*

(Ex 4.5) *Data points derived from different sources are of different validity, as the sources from which they are derived are of different quality.*

**1.5 Inconsistency:** Contradictory data points are supported.

Comment: The same examples as for Uncertainty I can be envisaged here. (In the following text they are referred to as examples 5.1 – 5.4 when used for inconsistency.) The difference between the two cases is as follows: Uncertainty I refers to cases where contradictions arise out of the difficulties created by the interpretation of a single source, within which inherent consistency can be expected. “Inconsistency” in the sense defined here arises out of conflicting information about one entity (person, location, transaction …) which is referred to by different sources. Beyond the examples mentioned in 1.3 these are e.g.:

(Ex 5.5) *The name of one individual spelled differently in different sources.*

(Ex 5.6) *Variance between witnesses of a text.*

**1.6 Incompleteness:** No concrete data point is known, though it must exist.

Comment: In any historical information system we must carefully differentiate between three incarnations of the “missing data” problem.

Examples:

(Ex 6.1) *The case where an attribute of an entity must have existed, but is unknown – a person for whom no information about the age can be found in the sources.*

(Ex 6.2) *The case where an attribute can be partially derived from another attribute – a person who died in 1754 must have been born before that.*

(Ex 6.3) *The case where a potential attribute is not applicable – every person can have a marriage date, but does not have to.*

**1.7 Polyvalence I:** A data point is valid within more than one datatype.

Examples:

(Ex 7.1) *A topographical name is given as “Ansbach-Bayreuth”. Within any procedure interpreting character strings, this is a name. It indirectly indicates a temporal frame between 1792 and 1806, as the political entity with that name existed only then. Within procedures interpreting spatial information, it is an area that can be processed for spatial queries.*

(Ex 7.2) *A calendar date is given as “Tuesday before St. X.”. Within textual processing, the character string can indicate the occurrence of a patron saint considered to be indicative of a given ruling family. Within temporal processing it is a temporal data point.*

**1.8 Polyvalence II:** A data point can be interpreted as a value of more than one conceptual variable.

Comment: It is very difficult to contemplate a system processing information in which the separation between abstract concepts and concrete realizations of these concepts does not exist. Usually this separation is expressed by the existence of a variable which, in different stages of processing, gets assigned concrete values or data points. There exist cases, however, where it is not clear *which* variable is assigned a specific data point.

Examples:

(Ex 8.1) *It is unclear, whether the character string “Adenau” refers to a surname or a place of origin.*

(Ex 8.2) *It is unclear, whether the date “March 23*^{rd}*, 1764” refers to the date a will was drawn up or to the date of death of the deceased.*

(Ex 8.3) *It is unclear, whether the character string “Kolding” refers to a place of birth or to the previous place of residence.*

(Ex 8.4) *It is unclear, whether an attribute – the date of birth, e.g. – refers to one person or to another.*

(Ex 8.5) *It is unclear, whether a group of data points refers to an object of one type or of another. (Is a person described in a list the son of the preceding one or a hired helper?)*

**1.9 Negations:** A data point defines a value which is *not* applicable but offers no clue on which value should be used instead.

(Ex 9.1) *A source – or the comment of a researcher – raises a negative claim: “not present in March 1820”.*

**2. An Attempt to Identify Underlying Abstract Problems.**

The examples given above have been roughly ordered by decreasing obviousness. (Ex 1.2) – “Early 18^{th} century” – is a problem that appears in virtually all data base oriented systems ever designed for art history. (Ex 8.3) – an unclear relationship between a spatial instance and the concept for which it is an instance – is almost never handled by existing software and occurs only when researchers provide highly complex models for the factoids extracted from a historical source. We assume that providing concrete solutions to these problems requires first abstract solutions for four classes of problems, based on which a software system can then provide concrete instances.

**2.1 Structural Requirements and Implications**

When we try to provide a framework for the solution of the listed problems, the order in which we have introduced them is not reproduced: the concrete solution for each of the examples quite frequently requires abstract properties of the software which are derived from more than one of the abstract solutions. We have introduced the “separation between abstract concepts and concrete realizations of these concepts” at a late stage, but it goes back to a very fundamental issue in information technology: the conceptual definition of a “variable”. A variable in almost all software systems is originally a scalar, representing exactly one abstract concept with a well-defined data type. A scalar can be generalized to a vector, and untyped programming languages are anything but exotic. If we look closer, however, these generalizations are not quite as well supported as we might believe.

Let’s assume we have two scalar variables A and B, with values “3” and “5” respectively. Then the two following expressions are obvious and well understood:

if (A < B) …

C = A - B

Now let’s assume that the same variables A and B are vectors with values {“3”, “4”, “5”} and {“5”, “4”, “3”} respectively. The interpretation of

if (A[i] < B[i]) …

C = A[i] - B[i]

remains clear, of course.

if (A < B) …

is almost certainly false; but what is the result of

if (A == B) … ?

If we assume that the vectors represent ordered sets it is false. If we assume that the sets are unordered it is true. Obviously the one equality operator has to be replaced by at least two operators specifying different types of equality.
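The distinction between the two equality operators can be sketched in a few lines (the function names are mine, purely illustrative):

```python
# A sketch of the two equality operators argued for above: one treats
# the vectors as ordered sequences, the other as unordered sets.

def eq_ordered(a, b):
    # equality of ordered sequences: element-wise, order matters
    return list(a) == list(b)

def eq_unordered(a, b):
    # equality of the underlying unordered sets
    return set(a) == set(b)

A = ["3", "4", "5"]
B = ["5", "4", "3"]

print(eq_ordered(A, B))    # False: as ordered sets they differ
print(eq_unordered(A, B))  # True: as unordered sets they coincide
```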

We will deal with proposals for solutions – or at least with a map which way would lead to solutions – within an integrated data model later. For the time being let us just summarize, that some of the phenomena described above result in requirements for the basic building blocks used to bind data points to concepts – usually called “variables” (attributes, properties) – which have implications for the operations defined for them.

As especially (Ex 8.4) above (*It is unclear, whether an attribute – the date of birth, e.g. – refers to one person or to another*) indicates, similar considerations hold true for the larger constructs holding such basic building blocks – records, data structures, objects, nodes. This they do recursively.

**2.2 Imprecise Data Types**

Rereading the initial list of examples after the description of the structural requirements, we will find the need for imprecise data types underlying a very large part of the examples. The imprecision inherent in many data points, as illustrated by (Ex 1.1) – (Ex 1.3), may seem quite specialized at first look. This specialization, however, relates only to the kind of data encountered; handling it requires operations which are generic for the individual data types.

Let us assume two variables A and B with values “3-10” and “5-7”, where in the following discussion we will use min(x) for the smallest possible value (i.e., min(A) == “3”) and max(x) for the largest possible value (max(B) == “7”).

For

if (A < B) …

different results can be argued for. If we assume that we want to consider a large part of all existing data points, we could opt for

(A < B) ::= min(A) < min(B) ==> resulting in “true” for the example.

If we want to restrict ourselves to highly plausible cases, we could define

(A < B) ::= min(A) < min(B) && max(A) < max(B) ==> resulting in “false” for the example.

Looking for the largest possible result set, including counter-intuitive cases, we could look for

(A < B) ::= min(A) < max(B) ==> resulting in “true” for the example.
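A minimal sketch of these three operationalizations, assuming intervals represented as (min, max) pairs; the function names are my own invention:

```python
# Three semantics for "A < B" on interval-valued data points.
# Intervals are given as (min, max) pairs.

def lt_liberal(a, b):
    # (A < B) ::= min(A) < min(B)
    return a[0] < b[0]

def lt_strict(a, b):
    # (A < B) ::= min(A) < min(B) && max(A) < max(B)
    return a[0] < b[0] and a[1] < b[1]

def lt_possible(a, b):
    # (A < B) ::= min(A) < max(B) -- includes counter-intuitive cases
    return a[0] < b[1]

A, B = (3, 10), (5, 7)
print(lt_liberal(A, B))   # True
print(lt_strict(A, B))    # False
print(lt_possible(A, B))  # True
```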

For temporal information the same logic applies when we translate temporal conditions into conditions between the limiting dates of a calendrical date range. This is solved by a numerical comparison of the offsets that represent the distance of these dates from a given date 0 on the time line.

For spatial information^{2}, the same logic applies to the areas representing two spatial data points. The semantics of the spatial case is somewhat confusing. The distance between two areas will be inherently imprecise: an operationalization as the interval between the two closest points and the two most distant ones would be one extreme, a distance between the two centers of gravity the other. While the latter might be interpreted as a crisp value, a distance measure between areas which does not at least somewhat consider the shapes of the areas seems not very reasonable.

We started with the statement that (Ex 1.1) – (Ex 1.3) may seem more specialized. Noticing that the expression of the conditions applicable to spatial and temporal reasoning [Kohlas 1995 305-340] is done in terms of constructs derived from numerical comparisons leads to the assumption that we will find many more cases where such derived constructs are also useful in the underlying solutions for other problems.

Again we will discuss concrete proposals for solutions later. Let us just point out that even textual comparisons can be mapped onto such numerical comparisons – e.g. by defining string equality as a range for a Levenshtein distance.
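As an illustration – the function names are mine, and the tolerance of 1 is an arbitrary choice – string equality defined as a range for a Levenshtein distance could look like this:

```python
# String "equality" as a range on the edit distance, as suggested above.

def levenshtein(s, t):
    # classic dynamic-programming edit distance
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (cs != ct)))    # substitution
        prev = cur
    return prev[-1]

def strings_equal(s, t, tolerance=1):
    # two strings count as "equal" if their distance stays within the range
    return levenshtein(s, t) <= tolerance

print(strings_equal("tailor", "sailor"))  # True: distance 1, cf. (Ex 3.4)
print(strings_equal("tailor", "farmer"))  # False
```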

Final remark, the importance of which will become clear only at the end of the next section: our focus here is on data points where there *is* no scalar value. The price of bread within a week *is* a range.

**2.3** **Vague Evaluations**

Initial clarification. When in the remainder of this paper we use the uncapitalized attribute “fuzzy” we refer to the general meaning of that term in the Oxford English Dictionary: *blurred; indistinct; imprecisely defined; confused; vague*. When we use the capitalized form “Fuzzy” we refer mainly to the Fuzzy Set Theory introduced by Lotfi Zadeh, occasionally tacitly including references to similar formal theories for the handling of non-crisp values^{3} – Rough Sets^{4}, (Dempster–Shafer [Kohlas 1995]) belief values ^{5} etc.^{6}

In the preceding section we have discussed imprecise data, but we have done so with the tacit assumption that a comparison operation must be either true or false. If we look more closely at the example of two variables A and B with values “3-10” and “5-7”, we can easily see that another interpretation is equally plausible. If we assume integer values to make things more intuitive, the following is obvious.

Of the eight data points covered by A {3, 4, 5, 6, 7, 8, 9, 10}, two – or 25 % – are smaller than all data points in B {5, 6, 7}. This can be interpreted as “25 % of A is smaller than B”, or: the statement

(A < B) resulting in 25 % true.

Similar reasoning can be applied to (Ex 1.2) *Early 18^{th} century* (as typically used in art historical data bases for the ascribed creation of objects). Let us assume A to be defined as “1695 – 1699” and B as “Early 18^{th} century”, operationalized as “1700 – 1739”.

(A < B) obviously results in true.

Now let A be “Early 18^{th} century”, operationalized as “1700 – 1739”, and B “1720 – 1729”.

Following our argument from the numerical case we might say (A < B) results in 25 % true, as the number of years represented by B represents 25 % of those represented by A.

Intuitively this seems to be much less convincing. An interpretation of “Early 18^{th} century” operationalized as “1700 – 1739” will for most people include the tacit assumption, that the true date is more probably within the interval “1700 – 1719” than within “1720 – 1739”. To arrive at a result, which is equivalent to our intuitive understanding, we would have to find a more dynamic solution. We could, e.g., say, that we assign to each of the years in the interval 1700 – 1739 a probability for this year to be the year of creation. To map the focus on the earlier years we could assign:

♦ a probability of 3.5 % to each of the years 1700 – 1719,

♦ a probability of 2 % to each of the years 1720 – 1729 and

♦ a probability of 1 % to each of the years 1730 – 1739.

That these probabilities – 70 plus 20 plus 10 % – sum to 100 indicates our conviction that the date of origin must have been in the overall time bracket.

Let’s again use for A “Early 18^{th} century”, operationalized as “1700 – 1739”, and for B “1720 – 1729”.

(A < B) results in the sum of the probabilities for the years in the interval “1700 – 1739” which are smaller than the lower limit of 1720 – 1729 – that is, 70 % – augmented by an estimate of the probability that a true value for “1700 – 1739” falling into the 1720 – 1729 bracket is still lower than the true value within the 1720 – 1729 interval. Assuming that the true date within 1720 – 1729 is equi-probable over that interval, we can plausibly estimate this as 50 % of the probability that the true value for “1700 – 1739” falls into the 1720 – 1729 bracket, that is 50 % of 20 %. So we arrive at a final estimate of 70 % plus 10 % == 80 %.
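The estimate just derived can be sketched as follows (the representation of A as a mapping from years to probabilities, and the function name, are my assumptions):

```python
# Estimate P(A < B) following the argument above: all of A's probability
# mass below B's lower limit counts fully; mass inside B's bracket counts
# half, as the true value of B is assumed equi-probable over the bracket.

def prob_less(pa, b_lo, b_hi):
    below = sum(p for year, p in pa.items() if year < b_lo)
    inside = sum(p for year, p in pa.items() if b_lo <= year <= b_hi)
    return below + 0.5 * inside

# "Early 18th century", weighted towards the earlier years as above:
early_18th = {}
early_18th.update({y: 0.035 for y in range(1700, 1720)})  # 70 % in total
early_18th.update({y: 0.02 for y in range(1720, 1730)})   # 20 %
early_18th.update({y: 0.01 for y in range(1730, 1740)})   # 10 %

print(round(prob_less(early_18th, 1720, 1729), 2))  # 0.8
```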

That representation of a linguistic expression by a numeric expression on an underlying scale also takes care of (Ex 2.1) – (Ex 2.4).

Solutions like this – again, in more detail below – are particularly useful if we are using complex logical expressions where many of the components are imprecise or fuzzy, as in the example just given. As a summary we can simply state that tools for the evaluation of intervals connected to probabilities for the different points of the interval are needed, as well as tools to express vague or fuzzy linguistic terms numerically.

Final remark, comparing the class of problems dealt with in this section with the ones from the previous section: our focus here is on data points where there *is* a scalar value, which we just do not know. A person *has* a specific age at a given point in time. We may know only within which range it lies, not what it is precisely.

Similarly: computing the distance between the place of residence and the place of birth of a person, where for the latter only an area instead of a locality is known, asks for a different approach to comparison than the distance between two areas discussed in the preceding section.

The computational approaches to solve these two classes of problems are different; they have to be usable in combination, however. An object of art *has* often been created within a range of dates, not on one specific day – think of the Parthenon^{7} – though the true range may be fuzzily hidden within an estimated one.

**2.4 Vague Flow Control**

The concepts discussed in the previous three sections are useful if we want to evaluate queries which direct transactions within an information system that terminate immediately. If we ask a data base for “every tax payer who paid less than x in taxes” or “all objects created between x and y”, we can assume that once the query has been answered, the system can safely forget about it. Therefore all systems known to me which implement the kind of reasoning loosely described above employ this non-binary logic to evaluate complex expressions, but at the end of the evaluation there is a crisp decision about what shall be done. At the end of the day, administering a flood control system: shall we open the floodgate, yes or no? (Flood control is a frequently quoted application case for fuzzy reasoning: all intermediate steps are fuzzy; the final result is crisp.)

This can take various forms:

If (truth_value(A < B) > 0.60) do_something();

else if (truth_value(A < B) > 0.30) do_something_else();

else do_something_totally_different();

or

validity = truth_value(A < B);

processing(x,y,z, …, validity);

For many standard applications this will suffice. It becomes problematic when an information system is used not for atomic operations or transactions, where the consequences of successive queries are conceptually independent of each other, but for the support of research processes, where individual transactions depend on the results of earlier ones and a later decision within the process may change the assumptions underlying the results of an earlier query.

Let’s consider (Ex 8.4): *It is unclear, whether an attribute – the date of birth, e.g. – refers to one person or to another.*

The most intuitive operationalization of such a situation would be to connect the data point with the “date of birth” attribute of both objects representing the two persons, indicating in both cases that calculations based upon this attribute have to be flagged as somewhat doubtful, e.g. as fuzzy information with a perceived validity of 50 %.

Embedding this in more precise language:

Assume a census list from 1840 where a person’s entry is defined as

Person ::= { surname, first name, date of birth, location }

Assume “location” to be a very weak criterion for identification: as it indicates a previous residence, it is not clear whether this was a previous household where a maid served, her place of birth, the village responsible for her under the poor law etc. etc.

Assume furthermore a register of baptisms defined as

Child ::= { date of baptism, date of birth mother, first name mother, location }

Assume the surname of the mother to be unknown as only the husband’s name is listed.

(1) Let there be two successive census entries, where it is not clear to which of two women a specific date of birth (dob) belongs, while for the additional attributes it is clear where they belong:

Person A: { Miller, Mary, March 15^{th }1820, A-Village }

Person B: { Smith, Mary, March 15^{th }1820, B-Village }

Adding a superscript to indicate the percentage with which a doubtful entry can be assigned to one of the two persons we get:

Person A: { Miller^{100}, Mary^{100}, March 15^{th }1820^{50}, A-Village^{100}}

Person B: { Smith^{100}, Mary^{100}, March 15^{th }1820^{50}, B-Village^{100}}
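A minimal sketch of such a representation – all names are hypothetical, none is taken from an existing system – would attach the percentage to each attribute value:

```python
# Sketch of the superscript notation above: every attribute value
# carries the percentage with which it can be assigned to the entity.

from dataclasses import dataclass

@dataclass
class Value:
    datum: str
    confidence: int = 100  # percent; 100 == undisputed

person_a = {
    "surname": Value("Miller"),
    "first name": Value("Mary"),
    "date of birth": Value("1820-03-15", confidence=50),  # disputed between A and B
    "location": Value("A-Village"),
}

print(person_a["date of birth"])  # Value(datum='1820-03-15', confidence=50)
```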

(2) Let there also be two entries in the register of baptisms:

Child X: { January 15^{th} 1845, March 15^{th }1820, Mary, A-Village}

Child Y: { March 16^{th} 1845, March 15^{th }1820, Mary, B-Village }

It is clear that in such a situation A would be chosen as mother of X: date of birth, first name and location agree. However: the date of birth discriminates very well between cases, the first name “Mary” not at all – in some historic communities 40 % of the female population was baptized Mary. And whether an unclear criterion such as “location”, as defined above, means anything after five years is doubtful.

Side effects: (a) As two births by the same woman cannot occur within 3 months, A is no longer a candidate for mother of Y. (b) When the date of birth has been used as a criterion for linking A, the date of birth for B becomes unknown.

Now let two years pass, during which additional sources are connected to each other. During that process we discover that B married somebody who can be a plausible father for X. This means: (a) B should henceforth be considered as mother of X instead of A; her date of birth becomes known. (b) A loses her date of birth, but can again be considered as mother of Y.

A comfortable possibility to revert to such previously discarded links has, to the best of my knowledge, never been implemented. We have to “remember” the reasons for the decisions made “forever”, or at least as long as all further operations to create the higher order information object “kinship network” out of the lower order information objects “census list” and “register of baptisms” have not been completed. Or, more realistically: as long as we are not absolutely sure that no other sources will turn up which contain data points that make it worthwhile to re-examine the structure of the object “kinship network”.
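One conceivable sketch of such “remembering” – this is my illustration, not a description of any implemented system – would store every link together with its justification, so that rescinding it later preserves the full decision history:

```python
# Sketch: record-linkage decisions stored with their justifications,
# so a later contradicting source can rescind a link while the
# discarded alternatives become candidates again.

class LinkStore:
    def __init__(self):
        self.links = {}      # link_id -> (record_a, record_b, reason)
        self.rescinded = []  # audit trail of undone decisions

    def link(self, link_id, a, b, reason):
        self.links[link_id] = (a, b, reason)

    def rescind(self, link_id, new_reason):
        # undo an earlier decision, but keep it on record
        a, b, reason = self.links.pop(link_id)
        self.rescinded.append((link_id, a, b, reason, new_reason))

store = LinkStore()
store.link("mother-of-X", "Person A", "Child X",
           "dob, first name and location agree")
# two years later: B married a plausible father for X
store.rescind("mother-of-X", "B's husband is a plausible father for X")
print(len(store.links), len(store.rescinded))  # 0 1
```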

I’d like to emphasize that this problem is not as esoteric as it may appear to be. “Remembering the decision process by which the connections have been created” is indeed a problem that has been plaguing record linkage processes for a long time: how to undo a previous decision and its effects, when at a later stage additional data lead to the decision to rescind the previous one.

If we accept the necessity that a non-binary decision needs to keep all possible answers “up in the air”, a few truly tricky problems arise. Consider keeping both outcomes of the following decision available:

If (truth_value(A < B) > 0.60) delete_object(X);

else if (truth_value(A < B) > 0.30) raise_priority_object(X);

**2.5 Negations**

Negations – negated chunks of information, as in (Ex 9.1) – are frequently considered a particularly difficult problem. We list them here, therefore, under a separate header. We assume that they represent a problem which has to be solved either within the domain of structural requirements (2.1) or that of imprecise data types (2.2). In our opinion, the seemingly extraordinary difficulty of the problem arises from the unspoken assumption that a data processing system only contains data points which assert a fact, not ones which deny one. As soon as we allow, either on the structural level or on the level of data typed values, uncertainty and vagueness to enter, it is relatively straightforward to require that conceptually each data point has some quality – which for simplicity’s sake we will call truth for the moment – that lies somewhere between 0.0 and 1.0 [Devlin 1991]. As soon as we allow that quality to deviate from 1.0, assigning 0.0 becomes a rather obvious way to describe negated information. But, as said at the beginning of this paragraph, that solution has to be contained in the general properties of the way in which we represent information.

**3. A Mostly Sobering Experience.**

I feel quite justified to list the problems in section 1 with some authority, as for a considerable part of them I have implemented solutions in a software system, κλειω [Thaller 1993], developed between 1978 and sometime after 2010, which was intended to allow the handling of all historical sources as close to the original as possible. The end of the system’s active life is a bit unclear, as it is even now used tacitly under the hood of a few web projects; but the experience relevant here – specifically the computer-supported work on integrating information drawn from various sources into more complex knowledge bases (2020s terminology) – is connected mainly to approximately 1978 – 1997.

Indeed, even a solution for (Ex 8.5) – *Is a person described in a list the son of the preceding one or a hired helper?* – has been part of the very first draft of October 1978, as it was identified in the introductory rounds of collecting necessary features from the group of historians to be supported by the project. And it was dutifully tested the last time the system benchmark was run to check for side effects of the latest changes supporting a library-oriented project, ca. 2010.

But: to the best of my knowledge these features, developed for one project and applied in precisely one project, were used less than ten times even in that one. The reason is simple. To have an actual effect on an analysis, the problems discussed here have to be:

(a) Sufficiently common in the data that their handling during the preparation of the source for analysis is needed frequently enough to make learning how to employ the software tools provided easier than finding some quick and dirty ad hoc way (“I’ll have to remember to look again at the end at the cases x, y and z”) for a problem which, with just a few cases, will not change the results much anyway.

(b) Sufficiently significant that they can actually be used in analysis. Assigning all sorts of weighting factors, when their only effect is that a statistical result changes one order of magnitude below the smallest recognizable significance level, is unrewarding. That is just as true for areas where “significance” is less easily defined than in statistical analysis.

(c) Sufficiently well analyzed that a solution implemented becomes fully transparent or, if not transparent, has implications which are easy to understand. That family reconstitutions and similar approaches were popular in the eighties and nineties and have been undertaken much more rarely recently has many reasons in the fads, fashions and hypes of historical research. But the fact that – as quoted in section 2.4 – the problem of “how to undo a previous decision and its effects, when at a later stage additional data lead to the decision to rescind the previous one” has never really been smoothly solved, so all results left lingering doubts … that fact has at the very least not *encouraged* people to engage in that kind of analysis. Which is a bit unfortunate, as one would expect that in the age of “linked data” the appeal of combining small chunks of information from different sources should be higher than ever before.

I would like to emphasize that, on the other hand, solutions for that part of the problems presented at the beginning of this paper which fulfilled the criteria just specified have been implemented successfully and have been used quite heavily.

The technical solution for (Ex 1.4) *“By occupation he was an innkeeper and a farmer.”* (technically: all data points can be vectors) was used in virtually all of the projects ever supported by the software. And the bundle of solutions provided for data representing numeric intervals has been used frequently, even if the more esoteric fringes, like the slightly unusual comparison operator “A equal circa B”, may not all have been used in production runs. (Assuming that A and B are intervals, each with a minimal (Min(x)) and a maximal (Max(x)) value, “A equal circa B” is operationalized as “[Min(A)] <= [Min(B)] && [Max(A)] >= [Max(B)] || [Min(B)] <= [Min(A)] && [Max(B)] >= [Max(A)]”.)^{8}
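For illustration (the Python rendering is mine, not κλειω’s syntax), the operator amounts to a test whether one interval contains the other:

```python
# "A equal circa B": one interval contains the other,
# intervals given as (min, max) pairs.

def equal_circa(a, b):
    return (a[0] <= b[0] and a[1] >= b[1]) or (b[0] <= a[0] and b[1] >= a[1])

print(equal_circa((3, 10), (5, 7)))  # True:  A contains B
print(equal_circa((5, 7), (3, 10)))  # True:  B contains A
print(equal_circa((3, 6), (5, 7)))   # False: mere overlap
```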

Therefore I would say that the effort needed for any attempt to solve the problems described initially in a complete and encompassing manner, rather than by providing ad hoc solutions for isolated problems, makes sense only if these solutions are embedded into a working environment where two conditions are fulfilled:

(a) Truly massive historical sources are available for processing. And by truly massive I mean hundreds of millions, or better billions, of data points, extracted by appropriate feature extraction tools out of systematically digitized material. (Few, if any, of the early record linkage projects went much beyond one million, and most stayed far below this level.)

(b) There currently exists the tacit assumption that a historical information system should allow the formulation of individual and unrelated queries or requests for analytic steps, where the results of each query or step do not influence the next ones. That needs to be re-conceptualized in favor of technical support for a long-term research process, where the result of each of these small building blocks is smoothly integrated back into the data base or, rather, information base.

In my opinion condition (a) becomes more viable day by day, almost without conscious effort of the historical community. Condition (b) takes more effort; but that it would be worthwhile is an explicit assumption behind this paper.

**4. Some Preliminary Decisions**

Many of the problems mentioned here are recurrent, being encountered by almost all projects in the Humanities which employ anything that even remotely resembles a managed data base. There exist attempts at solutions within quite traditional technologies. Take (Ex 1.4) *“By occupation he was an innkeeper and a farmer.”*: the example is taken from historical micro studies, but the problem is truly ubiquitous – art historical data bases assign more than one style to an object of art (*“late Republican / Early Imperial”*); literary styles create the same problems; people can have more than one place of residence. At first look, the normalization process required during the design of a relational data scheme seems to prevent a solution: all values are assumed to be strictly scalar. As so frequently, what is claimed to be a problem with an abstract model in the application of digital methods within the Humanities is actually only a restriction of widely popular software, in this case the most widely used relational data bases. Largely ignored by digital databases in the Humanities, there exists a quite rich literature on so-called NF^{2} databases: *Non-First Normal Form* databases which allow vectors as values for selected fields and which are in that way still supported by relational database software, though they violate in some respects the prescribed normalization [Kitagawa 1989]. ^{9} And they have done so long before the NoSQL movement took off.

Even when we look at the requirement for Vague Flow Control, there are some points where one can start to look for a solution. In the strict sense, to the best of my knowledge, no solution exists yet. There seems to have been only one programming language [Adamo 1980] – never supported by a compiler – which attempted constructs providing for the parallel execution of both branches of a fuzzy if / else control statement; and that was restricted to cases where both branches of the control statement executed assignments to one variable. From our description of the problems we derived the requirement to replace the implicit understanding that separate activations of a program are completely independent from one another by the proposal to see the engagement with a body of data as an ongoing long-term enterprise, just one convoluted query running for years. That means that earlier decisions are remembered and reconsidered automatically when additional information becomes available. And there *are* conceptual studies worthwhile to look at. One might e.g. look at PROLOG, where a rule and fact base are kept conceptually active even when no query is actively executed. This model might be a starting point for a system that automatically reconsiders earlier decisions when the fact or rule base changes, be it because additional data are integrated or earlier decisions are unmade. And while PROLOG may have been sidelined in recent years, one should point out that extensions to it have been considered and implemented which might resolve the most obvious problems: an attempt to integrate object orientation into PROLOG [Moss 1994] could make it much easier to handle the richer semantics of facts as derivable from historical sources; vague evaluations could be supported by one of a number of attempts to generalize PROLOG from binary logic to one of the fuzzy logics [Ding 1996].
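
The idea of a fact base whose conclusions are automatically reconsidered can be sketched very compactly. The following toy (all names hypothetical, forward chaining standing in for PROLOG's resolution) re-derives every conclusion whenever a fact is asserted or retracted:

```python
class LiveFactBase:
    """Toy sketch: a fact base that re-derives its conclusions whenever a
    fact is asserted or retracted -- in the spirit of a PROLOG rule base
    kept conceptually 'live' between queries."""

    def __init__(self, rules):
        self.facts = set()       # ground facts as tuples
        self.rules = rules       # functions: set of known facts -> new facts
        self.derived = set()

    def _reconsider(self):
        # Forward-chain from scratch; earlier decisions are automatically
        # revised when the fact base changes.
        self.derived = set()
        changed = True
        while changed:
            changed = False
            known = self.facts | self.derived
            for rule in self.rules:
                new = rule(known) - known
                if new:
                    self.derived |= new
                    changed = True

    def assert_fact(self, fact):
        self.facts.add(fact)
        self._reconsider()

    def retract_fact(self, fact):
        self.facts.discard(fact)
        self._reconsider()

# Hypothetical rule: anyone born in Potsdam was born in Prussia.
def prussia_rule(known):
    return {("born_in", person, "Prussia")
            for (rel, person, place) in known
            if rel == "born_in" and place == "Potsdam"}

kb = LiveFactBase([prussia_rule])
kb.assert_fact(("born_in", "Anna", "Potsdam"))
# kb.derived now contains ("born_in", "Anna", "Prussia"); retracting the
# Potsdam fact would remove the derived conclusion again.
```

A production system would of course recompute incrementally rather than from scratch; the sketch only shows the conceptual behavior.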

But for a model which aims at integrating solutions to all of the problems identified at the beginning, we would like to start with a more general and abstract decision: we integrate all partial solutions into a model based on networks or graphs. We emphasize *network* here, before *graph*, to indicate that the recent interest in graph data bases as representatives of the NoSQL movement may have overly emphasized the mathematical properties of graphs. These certainly allow various useful indicators to be derived automatically, such as the centrality of a node, but they may obscure the fact that the possibility to create rich conceptual relationships between simpler building blocks with the help of networks of nodes and edges does not necessarily mean that what is relevant for somebody who wants to build and analyze such a network is supported by or central to graph theoretical theorems. A good example is the concept of nested or hierarchical graphs, which is sufficiently prominent in the discussions about knowledge graphs that provisions for it exist in a number of graph markup languages. That concept – a subsidiary knowledge graph contained completely within a node of a superordinate graph – is very hard to map onto any feature of a graph discussed within graph theory from a mathematical point of view.^{10}

In the current discussion within the Humanities the term “network” is usually biased towards the type of content to be analyzed – networks of correspondence, social networks, … – while “graph” is used as a shorthand for the software structures used to administer, process and analyze such networks^{11}. We’ll continue using the term “graph”, making explicitly clear that this does *not* imply that any property of such a “graph” described below is necessarily mathematically grounded. We would also like to avoid any associations with either semantic or neural networks. For both approaches data structures based on graphs in the sense above are attractive, or rather: necessary, but our decision is not supposed to be any kind of conceptual endorsement of either. Though you may come across some references to them.

Handling graphs of any sort with software will obviously be easier, if that software uses a model based on graphs. So much of what we propose below may be easier to implement in a graph data base than with other tools. The proposals are independent of any currently existing software. They are a sketch for an integrated software environment which has yet to be realized.

**5. Towards a Solution: A Tentative Model**

The following is an attempt to define a representation of data as they are derived from a source, which is embedded into a system of processing capabilities. Both together are designed to take care of all examples given in section 1 of this paper. At this stage all definitions are loose, but care has been taken that they do not stray too far from a more formally definable semantic and syntax.

__Definition 0.1__

When in the following definitions elements of a data structure are listed or defined, all such elements may implicitly be missing or default to a value appropriate for the underlying data structure.

**5.1 Low Level Data**

__Definition 1.1__

The basic building block is an amount of data in a representation within the expressiveness for data structures of a modern higher programming language, so far envisioned as C / C++. Examples are the built-in data types or such objects as are provided by the core of the standard libraries as representations of strings, images or sounds.

A __token__ is a set consisting either of the reserved symbol <Unknown>, the reserved symbol <Not_Applicable> or one or more low level data structures of exactly one of the primitive data types described in the preceding paragraph.

As a token is a set it can e.g. hold all parts of a discontinuous phrase or all those segments of an image which represent one object which is partially obscured by other parts of the image.
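
Definition 1.1 can be sketched as follows. The class and its validation rule are one hypothetical reading of the definition, not a prescribed implementation, and Python stands in for the envisioned C / C++:

```python
# Sketch of Definition 1.1: a token is a set holding either one of the two
# reserved symbols, or one or more low-level data items of exactly one
# primitive type. Names are invented for illustration.
UNKNOWN = "<Unknown>"
NOT_APPLICABLE = "<Not_Applicable>"

class Token:
    def __init__(self, members):
        members = set(members)
        if members not in ({UNKNOWN}, {NOT_APPLICABLE}):
            # all members must share exactly one primitive type
            if len({type(m) for m in members}) != 1:
                raise ValueError("members must share one primitive type")
        self.members = members

# A discontinuous phrase stored as a single token:
phrase = Token({"innkeeper", "and farmer"})
# A token standing for a value absent from the source:
unknown_birthplace = Token({UNKNOWN})
```

The set semantics is exactly what lets one token cover the scattered segments of an obscured object in an image, or the parts of a discontinuous phrase.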

__Definition 1.2__

A __token__ is __terminal__ if all members of the set are stored directly on the underlying hardware.

__Definition 1.3__

A __token__ is __mapped__ if at least one member of the set consists of a segment of data stored on the underlying hardware, e.g. a substring of a string, part of a data object indicated by standoff markup or a segment of an image.^{12}

__Definition 1.4__

A token has a __token context.__ That consists of a link to a node as in definition 4.1 or an edge as in definition 4.2 below.

This definition is strictly speaking redundant, as it is implied by definition 3.4 below. As the concept of a token context is highly important it is introduced explicitly here.

__Definition 1.5__

A token can be __embedded.__

This is a property of the overall model discussed here, but does not, or not primarily, relate to the handling of vagueness and uncertainty. To keep the model complete a short discussion of this concept is provided in appendix I.

**5.2 Data Types**

__Definition 2.1__

Every token has one or more __interpretations__ which define the operations which can be applied to it.

__Definition 2.2__

The __native__ __interpretation__ of a token consists of all operations defined for the data structures of the underlying programming language as described in definition 1.1.

__Definition 2.3__

Additional interpretations consist of triples { __parsing algorithm__, __parsing semantics__, __derived__ __operations__ }.

__Definition 2.4__

A __parsing algorithm__ defines how a token can be converted to the internal representation of another data type. Examples are: The conversion of a string to an integer, the conversion of a string to a calendar date, the conversion of an image into a textual string by an OCR algorithm.

__Definition 2.5__

__Parsing semantics__ provide all auxiliary data necessary to apply a parsing algorithm to a token. They can be implied, as in the case of an interpretation from one data type of the underlying programming language to another, or in the case of an integer representation of a string. They can also provide explicit historical knowledge, as in the calendar of saints within a specific bishopric, or define content-agnostic technical knowledge, as in the case of a specific OCR algorithm.
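
An interpretation triple in the sense of definitions 2.3 – 2.5 might be sketched like this. The tiny saints' calendar and all names are hypothetical toy data; a real calendar would be bishopric-specific, as noted above:

```python
# Sketch of an interpretation as the triple { parsing algorithm, parsing
# semantics, derived operations }. Toy data, hypothetical names throughout.
saints_calendar = {"St. Michael": (9, 29), "St. Martin": (11, 11)}

def parse_feast_day(token_string, semantics):
    """Parsing algorithm: convert a date phrase into a (month, day) pair,
    using the calendar supplied as parsing semantics."""
    for saint, date in semantics.items():
        if saint in token_string:
            return date
    raise ValueError("no feast day recognized")

interpretation = {
    "parsing_algorithm": parse_feast_day,
    "parsing_semantics": saints_calendar,
    # derived operations become applicable only after parsing (Def. 2.6):
    "derived_operations": {"month": lambda d: d[0], "day": lambda d: d[1]},
}

date = interpretation["parsing_algorithm"](
    "paid on St. Martin", interpretation["parsing_semantics"])
month = interpretation["derived_operations"]["month"](date)
```

The same triple shape would cover the string-to-integer case (implied semantics) and the OCR case (technical semantics) by swapping algorithm and semantics.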

__Definition 2.6__

The __derived operations__ of an interpretation are all operations that become applicable to a token after it has been interpreted by a parsing algorithm, reflecting the assumptions of the target data type.

__Definition 2.7__

The context connected before any interpretation takes place is the __native__ __context__ of that token.

__Definition 2.8__

Before any interpretation takes place, the native context is also the __default__ __context__ of a token.

__Definition 2.9__

An interpretation of the token inherits the default context. Any interpretation has also the possibility:

♦ to consider the default as well as the native context of a token during the application of the parsing algorithm.

♦ to consider a context submitted as part of the parsing semantics during the application of the parsing algorithm.

♦ to replace the default context by the context submitted as part of the parsing semantics during the application of the parsing algorithm.

**5.3 Links between data items.**

In the overall model links between items occur on all levels of abstraction. Two types of links are described by the following definitions:

(a) links between a node or an edge and the tokens connected to it.

(b) links between nodes, i.e. edges.

Nodes are defined in definition 4.1.

Edges are defined in definition 4.2.

Both types of links introduced above are realized as co-references.

__Definition 3.1__

A co-reference connects its target to a set of sources by links which carry weights expressed as a real number between 0.0 and 1.0. Each of them describes the strength with which the respective source is connected to the target.

__Definition 3.2__

A co-reference can be __probabilistic.__ The sum of all weights of a probabilistic co-reference is 1.0.

__Definition 3.3__

A co-reference can be __possibilistic.__ The sum of all weights of a possibilistic co-reference is undefined.^{13}
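
The distinction between definitions 3.1 – 3.3 can be made concrete in a few lines; the class below is a hypothetical sketch, and the example weights are invented:

```python
# Sketch of Definitions 3.1 - 3.3: a co-reference links a target to weighted
# sources; probabilistic weights must sum to 1.0, possibilistic weights are
# unconstrained. Names and weights are invented for illustration.
import math

class CoReference:
    def __init__(self, weighted_sources, probabilistic=False):
        self.links = dict(weighted_sources)   # source -> weight in [0.0, 1.0]
        if any(not 0.0 <= w <= 1.0 for w in self.links.values()):
            raise ValueError("weights must lie in [0.0, 1.0]")
        if probabilistic and not math.isclose(sum(self.links.values()), 1.0):
            raise ValueError("probabilistic weights must sum to 1.0")
        self.probabilistic = probabilistic

# Exactly one of three readings applies, with differing certainty:
occupation = CoReference({"innkeeper": 0.6, "farmer": 0.3, "brewer": 0.1},
                         probabilistic=True)
# Several attributions may apply simultaneously, each with its plausibility:
styles = CoReference({"late Republican": 0.8, "early Imperial": 0.7})
```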

__Definition 3.4__

All co-references are bi-directional.

__Definition 3.5__

Each weight within a co-reference can have a __reference__ __explanation__. A __reference__ __explanation__ is a string in the definition of the underlying programming language describing the motivation of the weight.

**5.4 Relationship between Data and Meaning.**

The basic building block for the representation of meaning is a __node__.

__Definition 4.1__

A __node__ consists of:

♦ an __instance__ __identifier__, a string guaranteed to be unique within the data set.

♦ a __conceptual__ __identifier__ designating a concept represented by the tokens connected to all nodes sharing this conceptual identifier.

♦ a set of __co-references__ to __tokens__.

♦ a set of __co-references__ to __edges__.

__Definition 4.2__

An __edge__ in the sense of this document is always a hyper-edge^{14} in the abstract sense: it can connect any number of nodes, not only a pair.

An __edge__ consists of:

♦ an __instance__ __identifier__, a string guaranteed to be unique within the data set.

♦ a __conceptual__ __identifier__ designating a concept.

♦ a set of __co-references__ to __tokens__.

♦ a set of __co-references__ to __nodes__.

♦ an __edge history__.
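
Definitions 4.1 and 4.2 can be rendered as plain data structures. The dataclasses below (Python standing in for the envisioned C / C++; all field names hypothetical) show the structural symmetry between nodes and edges, and that an edge, as a hyper-edge, may co-reference any number of nodes:

```python
# Structural sketch of Definitions 4.1 and 4.2; names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Node:
    instance_id: str                 # unique within the data set
    conceptual_id: str               # shared concept, e.g. "person"
    token_refs: list = field(default_factory=list)  # co-references to tokens
    edge_refs: list = field(default_factory=list)   # co-references to edges

@dataclass
class Edge:
    instance_id: str
    conceptual_id: str               # e.g. "marriage"
    token_refs: list = field(default_factory=list)
    node_refs: list = field(default_factory=list)   # any number: hyper-edge
    edge_history: list = field(default_factory=list)  # decision snapshots

# A hyper-edge connecting three nodes (bride, groom, officiant, say):
wedding = Edge("e17", "marriage", node_refs=["n3", "n9", "n12"])
```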

__Definition 4.3__

Links between tokens and nodes or edges are non-exclusive.

If more than one edge or node contains a link to a token, all such nodes and edges have to link to the complete co-reference in which this token appears. Tokens cannot be part of more than one co-reference.

The links between a co-reference to a set of tokens and a set of nodes or edges form a __second order__ __co-reference__.

__Definition 4.4__

A __conceptual__ __identifier__ can be:

♦ a __name__ expressed by a string.

♦ a set of __co-references__ to __nodes__.

__Definition 4.5__

An __edge history__ is an ordered list of __decision snapshots__.

__Definition 4.6__

A __decision snapshot__ describes which operations within the life of a data base have been responsible for the creation of an edge.

**5.5 Implied Structures**

__Definition 5.1__

A __conceptual__ __identifier__ can have __implied properties__ which describe assumptions about the content of all nodes or edges which share this conceptual identifier.

__Definition 5.2__

An __explicit__ __default__ __token__ __set__ is a co-reference to a set of tokens to be assigned to the node or edge in question at the time of their creation.

__Definition 5.3__

An __explicit__ __default__ __edge__, if defined, will be created at the time of the creation of a node.

__Definition 5.4__

An __explicit__ __default__ __node__, if defined, will be created at the time of the creation of an edge.

__Definition 5.5__

The __default__ __depth__ of an explicit default edge restricts the generation of recursive creations of explicit default edges caused by the creation of explicit default nodes.

__Definition 5.6__

Explicit default tokens, edges and nodes can be __triggered__. That is, their creation can be delayed until the creation of another specified token, edge or node.

__Definition 5.7__

A triggered default token, edge or node may overwrite a non triggered one.

__Definition 5.8__

A triggered default token, edge or node must not overwrite a token, edge or node which holds non-default data.

__Definition 5.9__

A conceptual identifier can be connected to a __membership__ __guard__, a function which computes a set of parameters of a function describing the values of all instances of this identifier, usually a statistical distribution of these values.

__Definition 5.10__

A membership guard can be __constrained__ by a condition restricting its application to a subset of the instances of the conceptual identifier in question.

**6. Elementary Applications of the Model**

The initial problem set has been used to define central features of the model. That is, the problems described are considered to be solvable by the properties of the proposed data structure which are identified in the following reflections.

This discussion, focused on vagueness and uncertainty, is embedded into a larger context defined by my long-term research interests. In the first post to this blog – “On information in historical sources” – I described a set of nine proposals for research into unconventional computational approaches, as an attempt towards processing information in historical sources in a way which primarily reflects the informational properties of historical sources, rather than current mainstream directions of software derived from other knowledge domains. The current paper tries to describe a data structure which allows all features of that set of proposals to interact in the handling of the problems of vagueness and uncertainty identified. The following arguments therefore usually have two parts: they identify where these problems have resulted in specific features of the data structure described above, and they point towards the – very short – research proposals just mentioned, to identify the technologies necessary to build that structure. The latter references take the form {Ivory n}, where “n” stands for the number of the research proposal referred to. The proposals are briefly summarized and their relationship to this paper is explained in appendix III. I recommend ignoring these references during a first reading.

**6.1 Handling of Vagueness I**

(Ex 1.1) *Price of wheat in March: 23d – 26d.*

To handle this problem, two interrelated difficulties have to be taken care of. Computations with numeric intervals require support for a concept of numbers which allows them. For this situation the development of support for “Grey Numbers” has been proposed as {Ivory 3}. For the application towards historical sources with their specific semantics, e.g. the processing of non-decimal currencies, the data type concept described in section 5.2 definition 2.3 above has been introduced.
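
A grey number in its most elementary reading is an interval with endpoint-wise arithmetic. The following minimal sketch (hypothetical names; {Ivory 3} envisions a far more general mechanism, including the non-decimal currency semantics) shows the basic idea for (Ex 1.1):

```python
# Elementary "grey number" sketch: an interval with endpoint-wise arithmetic.
# All names are invented; a real implementation per {Ivory 3} would be richer.
from dataclasses import dataclass

@dataclass
class Grey:
    lo: float
    hi: float
    def __add__(self, other):
        return Grey(self.lo + other.lo, self.hi + other.hi)
    def __mul__(self, k):
        return Grey(self.lo * k, self.hi * k)

march = Grey(23, 26)        # "Price of wheat in March: 23d - 26d"
april = Grey(25, 30)        # invented follow-up value
two_months = march + april  # total spent over both months, as an interval
```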

(Ex 1.2) *Early 18*^{th}* century.*

The solution employs the same constituent parts as the solution for (Ex 1.1) just described. Besides the obvious differences in the parser semantics for the character string ==> integer conversion, different restrictions for the underlying numeric range are required. These are assumed to be part of the parser semantics and accessible to the functions later provided by the {Ivory 3} library supporting their application.

(Ex 1.3) *He was born in Prussia.*

The solution employs the same constituent parts as the solutions for the two previous examples. It adds a requirement for the implementation of {Ivory 3}, though. The central problem here is created by computations relating spatial expressions of different precision to each other. For any kind of computation involving a spatial dimension, a spatial surface term like “Prussia” obviously has to be converted into a set of co-ordinates within an underlying space where numeric computations are possible. It is easy to see that the acceptance of a term like “Prussia” into a variable where usually you would expect much more precise terms, say “Potsdam”, requires an extension of the Grey Number model to handle computations where more than one dimension is imprecise.

One should also point to the fact that the spatial coordinates of “Prussia” vary widely over time. Any evaluation has therefore to take place with the context sensitivity allowed by definition 4.2 and conceptualized by {Ivory 1}; the latter in turn depending very much on the possibility to handle “grey” spatial computations.

(Ex 1.4) *By occupation he was an innkeeper and a farmer.*

The problems described by this example are directly addressed within the current model by providing the properties of nodes and edges as sets of co-references (definition 3.1). As this is a central feature of the overall model, the example is discussed in some detail.

Assume some strings “A”, “B”, “C”.

Assigning them to three co-references, so that each co-reference refers to exactly one of these strings, we express a situation where all three concepts represented by the strings are co-existing properties of the node or edge in question. Each of these three co-references may have a probabilistic or possibilistic weight. The weights may, e.g., express a difference in the degree of certainty about the readability of the terms. The node which contains three co-references for {“A”}, {“B”} and {“C”} has all three properties in parallel.

Assigning the three strings to a single probabilistic co-reference expresses a situation where only one of the concepts expressed by the strings applies. The decision for one of them leaves an uncertainty expressed by the sum of the weights of the ones not selected. The node which contains one such probabilistic co-reference for {“A”, “B”, “C”} has only one of the three properties, but it is not clear which one.

Assigning the three strings to a single possibilistic co-reference expresses a situation where a subset of the concepts expressed by the strings applies. The weight with which conclusions based upon one of the concepts should be qualified is expressed by the possibilistic weight of the string selected. The node which contains one such possibilistic co-reference for {“A”, “B”, “C”} has a subset of the three properties, but it is not clear which ones.
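
The three situations can be written down concretely. The weights below are invented for illustration, and co-references are abbreviated to plain source → weight mappings:

```python
# The three readings of {"A", "B", "C"}; weights are invented toy values.

# Three separate co-references: all three properties hold in parallel.
parallel = [{"A": 1.0}, {"B": 1.0}, {"C": 1.0}]

# One probabilistic co-reference: exactly one property holds.
probabilistic = {"A": 0.5, "B": 0.3, "C": 0.2}   # weights sum to 1.0

# One possibilistic co-reference: some subset holds; no constraint on the sum.
possibilistic = {"A": 0.9, "B": 0.8, "C": 0.2}

# Deciding for "A" in the probabilistic case leaves a residual uncertainty
# equal to the summed weights of the options not selected:
residual_uncertainty = sum(w for s, w in probabilistic.items() if s != "A")
```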

**6.2 Handling of Vagueness II**

All of the examples in this section depend on the possibility of handling expressions which have Fuzzy values, as defined in {Ivory 4}. As that proposal is described there only rather sketchily, it shall be introduced in greater detail here.

We mentioned already that all approaches to the handling of fuzzy information – based on Fuzzy Sets or Fuzzy Logic, and alternative models or refinements using these as a starting point: Rough Sets, (Dempster–Shafer) belief functions and a few less prominent models – have in common that within a working software system they are used only for the solution of isolated problems. Within an inherently crisp system there are a few pockets of vagueness or uncertainty handled by functionality which is somewhat secondarily plugged in. That may be unavoidable if all tools are realized based upon the built-in data types of the underlying programming languages. It is unsatisfactory for historians, however, as not only their individual decisions, but their very reasoning processes as a whole are fuzzy.

Please notice two aspects of the approach we describe here which may not be self-evident:

(a) For all examples of the preceding section 6.1 we proposed solutions combining aspects of the notion of co-references, which assume that *every *token (definitions 1.1 – 1.4) *is* weighted by the co-reference mechanism (definitions 3.1 – 3.3). *Is *weighted, not: *can be* weighted. And the examples (Ex 1.1) – (Ex 1.3) rely upon the availability of the Generalized Grey Number mechanism of {Ivory 3}. While this does not make the resulting system Fuzzy, it certainly makes it generically fuzzy.

(b) Discussing (Ex 1.2) – *“Early 18*^{th}* century”* – in the context of section 2.3 we developed the notion of a vague evaluation: out of the conversion of “Early 18^{th} century” into a grey number “1700 – 1739” we derived the notion of a pair of numbers which do not constitute the value to be used computationally, but delimit the numeric space within which the true value is expected. And already there we noticed that the true value might itself be a genuine interval – a grey number, that is.

Keeping these two observations in mind, we should address another aspect of the data model described here. So far we have mainly discussed it as a model to represent the data derived from historical sources, basically as the data structures for a data base. If we emphasize that a notion like the properties of a node being represented by a set of co-references is not only a way to represent data derived from a source for processing, but also the logically appropriate structure to hold the result of processing such structures, our options for handling vagueness open up considerably.

If *all* data can harness the power of co-references, they are *all* inherently fuzzy, and a fuzzy result of their processing can be stored within the co-reference model again; be it temporarily during a computation or persistently by using it to update or augment the underlying data base.

That is: all data within the structure described are already prepared for fuzzy processing in the sense of {Ivory 4}.

(Ex 2.1) *Bread prices were very high during the summer.*

The biggest problem of all attempts at fuzzy reasoning is not the application of the specific calculus, but the derivation and operationalization of the underlying distributions.

For the convenience of readers who are less familiar with the various approaches to fuzzy computing, notably the “Computing With Words” approach ([Zadeh 1999], [Seising 2012]), we will describe in a bit of detail how the predicate “very high” would be handled within Zadeh’s fuzzy reasoning [Zadeh 1975]. A *bit* of detail. Apologizing to all specialists for the simplifications we use. Apologizing also for the slightly more loquacious style of this section.

Bread prices would usually be expressed in a source as some amount given in a historical currency system. Assume that in one historical community the price of a bushel of wheat varied between 24 and 83 pennies. Obviously the extreme prices have been rarer than the intermediate values, so we have a frequency distribution, probably a Gaussian one.

Talking about prices in that community, we presumably all agree that getting a bushel for 24 pennies would be extraordinarily cheap, while one for 83 would be extremely expensive. 25 is still a very cheap, 82 still a very high price, but both are less close to the extremes than the first two numbers.

Zadeh’s seminal idea was to derive the following from this: the phenomenon “price of bread” can be discussed by, say, three linguistic terms: “low”, “medium”, “high”. For each member of the set of individual prices, the set of numbers { 24, 25, …, 82, 83 }, our “universe of discourse”, we can derive an estimate of how likely it is that someone would apply the term “low” to it. That a price of 24 or 25 would most certainly be considered low we already mentioned; that a price of 83 or 82 would most certainly not needs no explanation.

We can express this by saying that the set of numbers representing a “low price” would consist of the numbers {24, 25, …, 42, 43 }, the set of numbers representing a “medium price” { 44, 45, …, 62, 63 } and { 64, 65, …, 82, 83 } representing a “high price”. Not completely implausible; but why would somebody make such a radical change between 43 and 44 on the one hand, between 63 and 64 on the other?

If we rephrase this as something like “ ’24’ would certainly be assumed to be a low price; ’43’ might be, but might just as well be considered medium”, the statement reads much more reasonably. Formalizing this, we can say that for every number in our universe of discourse – { 24, 25, …, 82, 83 } – there is a weight, standardized between 0.0 and 1.0, indicating with what probability, plausibility, certainty or some such concept this number would be considered “low”, “medium” or “high”. More formally: for all words used to reason about the height of the price of bread, there is a membership function which estimates for each number how strongly this number belongs to the concept expressed by each word. In our example, we might arrive at:

Universe of discourse ::= { 24, …, 39, …, 53, …, 68, …, 83 }

membership(low) ::= { 1.0, …, 0.5, …, 0.0, …, 0.0, …, 0.0 }

membership(medium) ::= { 0.0, …, 0.5, …, 1.0, …, 0.5, …, 0.0 }

membership(high) ::= { 0.0, …, 0.0, …, 0.0, …, 0.5, …, 1.0 }
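
These membership functions can be sketched as piecewise-linear ("triangular") functions, a common textbook shape which is assumed here; the anchor points 24, 53 and 83 are read off the universe of discourse, and the table's rounded midpoints are reproduced only approximately:

```python
# Piecewise-linear membership functions approximating the table above.
# The triangular shape is an assumption for illustration, not mandated
# by the text; the universe of discourse runs from 24 to 83.
def triangular(x, left, peak, right):
    """Membership rising linearly from `left` to `peak`, falling to `right`."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def low(price):    return max(0.0, (53 - price) / (53 - 24))
def medium(price): return triangular(price, 24, 53, 83)
def high(price):   return max(0.0, (price - 53) / (83 - 53))

# A price of 39 pennies is about half "low", half "medium", not "high":
memberships_39 = (low(39), medium(39), high(39))
```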

Or, much more intuitively, in graphical form:

Most people confronted with this reasoning find it immediately appealing; unfortunately it takes them only seconds to come up with the question “yes, but how do you find the numbers for the membership degrees?”.^{15}

There are two sides to that question. It is relatively plausible that there can be empirical reasons to apply specific weights. This is why “fuzzy computing” of one paradigm or the other has made major inroads into virtually all disciplines of engineering, the hard sciences and the economic disciplines, where we are on stable ground to make empirical assumptions about what numbers define our universe of discourse and how the numerical values are distributed within it. But these are assumptions which are deeply rooted within a discipline or a knowledge domain. The assumptions about what constitutes the membership function for the term “high level of water” in a flood control system can be brilliantly observed, consistent and reliable; nevertheless they do not contribute to the question what exactly constitutes a “high price” for early modern bread.

A “very” high price, incidentally, shows the use of a “linguistic hedge” within a fuzzy expression, which can be understood as a modification of the basic membership function. The most readable general introduction to their role within fuzzy reasoning is probably still [Yager 1982]. See [Ferson 2015] for a recent, very readable example.

Any individual price of bread has as its context^{16} a subset of all other tokens in the database which express a “price”. Access to these is provided by definitions 1.4 and 3.4; the subset can be restricted by rules allowed under definitions 5.9 and 5.10. Any algorithm processing a single price has therefore access to all of that context.

Based on this, we propose to compute, during the evaluation of a fuzzy predicate, the parameters of the distribution of the actually observed values for a property like “price” out of the context of the tokens involved.

As it might be slightly inefficient to repeat this analysis of the universe of discourse every time a token is encountered, this will ultimately depend on an implementation of the concept of “frozen algorithms” in {Ivory 2}.^{17} Any such more general solution notwithstanding, two preliminary solutions are obvious. (a) Explore the available quotes for prices the first time such a computation is encountered and derive the appropriate distributions. Store these and use them until updates to the tokens concerned have become so frequent that it becomes necessary to re-compute them. (b) Slightly more specialized and mainly possible for such probabilistic measures as can be derived from the moments of a distribution: keep the intermediate steps of the computation of the moments stored and recompute the moments every time the value of an underlying variable is changed, added or deleted. (Which restricts the full power of definition 5.10 slightly.)
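
Preliminary solution (b) can be sketched with Welford's classic online algorithm for the first two moments: each added price updates mean and variance in constant time, without rescanning the data base. All names are hypothetical; deletion can be handled by the analogous inverse update:

```python
# Online computation of mean and variance (Welford's algorithm): the
# intermediate quantities n, mean and m2 are kept stored, so each new
# price token updates the moments in O(1).
class RunningMoments:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0      # sum of squared deviations from the running mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / self.n if self.n else 0.0

prices = RunningMoments()
for p in (24, 30, 41, 55, 83):   # invented price quotes
    prices.add(p)
# prices.mean and prices.variance are now available without any rescan.
```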

There is also a much more refined possibility, though I do not yet feel able to understand all of its implications. At the end of the day, a membership function can be seen as the result of a classificatory process. It should be possible to construct small dedicated artificial neural networks and embed them into the model we discuss here in such a way that they produce the necessary membership functions and update their classification dynamically, much like the plain moments of the statistical distributions. To provide for this was the concrete reason to introduce definitions 5.9 and 5.10.

(Ex 2.2) *She was young.*

This example does not introduce anything completely new compared to example 2.1. It was included here to emphasize that the solution of the seemingly identical problem requires only the replacement of the appropriate universe of discourse as described at the end of the previous section. And it is useful to shed a bit more light on the expected properties of the implementation of the fuzzy reasoning support described in {Ivory 4} and in the preceding section.

To evaluate the meaning of a linguistic term – or “linguistic variable” [Zadeh 1975] – in a Universe of Discourse we need one distribution (or more, if we implement fuzziness not via Fuzzy Sets but e.g. via Rough Sets^{18}). This can be selected via “heuristics”, i.e., an understanding of the problem domain external to the database. Even if the Universe of Discourse is mapped onto the interval 0.0 – 1.0, “old” has a different membership function than “high price”, though both relate to the upper end of the numeric interval. Prices have a tendency towards a normal distribution; ages within a population follow a distribution skewed towards the right.

The decision whether a normal distribution or one skewed towards the right should be applied when evaluating a term has to be defined in the information stored about the concept which provides the conceptual identifier for the node connected with a specific datum. This *type* of distribution *might* in principle be derived from the data, as described above for the *concrete* distribution. In many cases, however – at least at the beginning of a dynamic analysis which starts before all expected data are available – it has to be provided independently of the data, as a small initial data set may otherwise create a completely erroneous distribution by chance.

Significant parameters, on the other hand, like the minimum and the maximum needed for normalization of the data, can *only* be derived from them. That in 19^{th} century literature we occasionally encounter “venerable old men” who are still in their fifties looks plausible when we look at the contemporary distribution of ages at death; but only then.

To repeat: full support for vagueness implies, in our opinion, predicates which are part of a conceptually permanently running reasoning process, which will from time to time derive the knowledge it needs from the data set as a whole, without always explicitly being told to do so.

(Ex 2.3) *She came from a nearby village.*

This basically generalizes the previous examples into a two-dimensional conceptual space. Two additional complexities arise. “Nearby” can be operationalized as a membership function for the membership of villages in the set of places near the place referred to in the data, derived from their distance from that place. This emphasizes the necessity to contextualize all nodes at least within space-time, as postulated more generally by {Ivory 1}.
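A hedged sketch of such an operationalization – all distances and cut-off values are invented for illustration:

```python
# "Nearby" as a membership function over distance: full membership up to
# `full` km, linear decay to 0 at `zero` km. Both thresholds are assumptions.

def mu_nearby(distance_km: float, full: float = 5.0, zero: float = 25.0) -> float:
    if distance_km <= full:
        return 1.0
    if distance_km >= zero:
        return 0.0
    return (zero - distance_km) / (zero - full)

# Hypothetical distances of three villages to the referring place:
villages = {"A": 3.0, "B": 15.0, "C": 40.0}
nearby = {name: mu_nearby(d) for name, d in villages.items()}
print(nearby)   # {'A': 1.0, 'B': 0.5, 'C': 0.0}
```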

The second problem is trickier: “nearby” in such a context is not really a function of spatial distance, but one of spatial distance mitigated by the available means of transport. At the moment I see no straightforward way to implement a solution in the current model. This obviously needs a much more elaborated concept of the context of a node.

(Ex 2.4) *Their son was an artisan.*

*Any* kind of query, which requires reasoning on the conceptual level (in the sense of: connect the appropriate tokens in the data to a more abstract concept) requires a representation of the conceptual relationships. To recognize that the token “carpenter” in the source represents an artisan, we have to apply some reasoning support. At the current stage of development a graph describing the relationship between the concepts to which the data points are connected looks reasonable.

Structurally the current model can provide for that: a conceptual identifier according to definition 4.4 can consist of a set of co-references to nodes. An instance of the overall data structure described here which employs only strings as conceptual identifiers is very much a graph-oriented data base. An instance where the conceptual identifiers are co-references to edges and all sets of co-references to tokens are empty sets is an abstract conceptual graph. All realized systems would presumably be a mixture of the two. I would like to point out that it is very easy to see parallels to the concept of a mixed rule- and data base, as handled by PROLOG.^{19} We intentionally avoid the obvious hint at RDF and triple stores, as we assume that ultimately, for many applications, higher-order tuples will become more prominent than RDF triples.
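A minimal sketch of the conceptual reasoning required for the “carpenter is an artisan” case; the is-a edges are illustrative assumptions, not a claim about any real ontology:

```python
# A tiny concept graph of is-a relationships (all edges are hypothetical),
# plus a transitive lookup connecting a source token to an abstract concept.

IS_A = {
    "carpenter": {"woodworker"},
    "woodworker": {"artisan"},
    "tailor": {"artisan"},
    "artisan": {"occupation"},
}

def is_a(concept: str, target: str) -> bool:
    """Transitive closure over the is-a graph via a simple depth-first search."""
    stack, seen = [concept], set()
    while stack:
        c = stack.pop()
        if c == target:
            return True
        if c in seen:
            continue
        seen.add(c)
        stack.extend(IS_A.get(c, ()))
    return False

print(is_a("carpenter", "artisan"))   # True
```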

**6.3 Handling of Uncertainty I**

(Ex 3.1) *She was 25 or 35 years old.*

This example can very easily be expressed in the overall model. A co-reference (definition 3.1) to the two possibilities will suffice, allowing both values to be treated as equiprobable or declared to have different probabilities / belief values / degrees of plausibility etc., depending on the conceptual model of vagueness employed.
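Such a co-reference could be sketched as follows; the field names and the helper `most_plausible` are my own illustration, not part of the model's definitions:

```python
# A co-reference holding two contradictory age values with weights, which can
# be read as probabilities, belief values, or degrees of plausibility
# depending on the chosen model of vagueness.

from dataclasses import dataclass, field

@dataclass
class CoReference:
    values: dict = field(default_factory=dict)   # candidate value -> weight

age = CoReference(values={25: 0.5, 35: 0.5})          # equiprobable readings
age_biased = CoReference(values={25: 0.8, 35: 0.2})   # one reading preferred

def most_plausible(ref: CoReference):
    """Return the candidate with the highest weight."""
    return max(ref.values, key=ref.values.get)

print(most_plausible(age_biased))   # 25
```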

(Ex 3.2) *An event took place before Easter 1435 or Easter 1453.*

Same solution as in the previous example; intended as a reminder that the mechanism of providing a data type specific semantics (definition 2.3) is independent of the handling of uncertainty.

(Ex 3.3) *They bought iron tools in “the market”. (Assuming three towns with names containing “market” qualifying.)*

This example has been introduced to show that the type of uncertainty handled here is also connected to the kind of implicit automatic reasoning which has been introduced to handle the more challenging problems of vagueness II above, notably (Ex 2.3) and (Ex 2.4). To construct a co-reference between an ambiguous conceptual identifier and the instance identifiers it may indicate is an obvious solution. It can be discussed whether this should be done primarily from the point of view of the representation of the source in the data base or better within a knowledge graph interconnected with that data base.

As the expected probability with which one of a set of candidate place names will be addressed by an ambiguous (abbreviated) form will presumably change, when the conceptual framework behind the sources becomes clearer during successive analyses, the dynamic character of the weights in the co-reference between the abbreviated form and the full ones should be noted. The relationship between this problem and the problem of the dynamic extraction of membership functions out of the data (section 6.2 above) needs to be clarified further.

(Ex 3.4) *The source allows reading the occupation as “tailor” or “sailor”.*

Obviously all said under (Ex 3.1) applies. The example has been introduced to point to a special problem. A case like this may point to a reading difficulty. On the other hand, the reading may be completely clear, but it may be highly implausible – like a record of a sailor working in a land-bound occupation in the middle of the Alps, which could indicate a scribal error. For this situation the mechanism of a reference explanation in definition 3.5 is provided.

**6.4 Handling of Uncertainty II**

(Ex 4.1) *The amount “25” is probably given in florin unless it is guilders.*

The specific difficulty of this example is that the uncertainty lies not so much on the representational level as within the interpretative process deriving representations in different data types. (The token “fl.” can indicate the original Florentine “florin” as well as a couple of other historic denominations.) To model such a situation we have to conceptualize the interaction between the parsing algorithm and the parsing semantics of definition 2.3 in such a way that the act of interpreting a single token in the native interpretation can result in a co-reference of more than one token in another interpretation.

(Ex 4.2) *This cannot have happened much before 1618, though the text mentions “1590”.*

A rather convincing solution would be to create a possibilistic co-reference where “before equal 1618” carries a weight of 1.0 and “1590” a weight of 0.0. (Cf. section 6.9 below.)

(Ex 4.3) *A location “x-town” is mentioned. As that was founded much later another, earlier, settlement with the same name is probably referred to.*

The solution is obviously another co-reference. We have included the sample problem here as, again, it is an uncertainty which appears only in a non-native interpretation (the spatial one or the reference to a point in a knowledge graph), the original string token being seemingly unambiguous.

(Ex 4.4) *The charter claims to assign a rent of type “x”, which occurs in all other documents only 100 years later.*

If this is obvious at the time the data are entered from the source, it is a simple case of defining a co-reference with a single token and weight. For considerations of {Ivory 2} (“frozen algorithm”) it would be useful to ponder whether such a weight can be linked to an interpretative process which watches for any further occurrence of that type of rent and adapts the weight automatically if further instances dated earlier turn up.

(Ex 4.5) *Data points derived from different sources are of different validity, as the sources from which they are derived are of different quality.*

Trivial solution: weight the co-references to the tokens differently. A broader perspective opens if we contemplate the possibility of using the bidirectionality (definition 3.4) of co-references to assign the weights automatically from the context in which the data has been created, if the “source” is modeled as such in the overall data. (I.e., if all tokens can be followed back to a description of the source they are based upon, from which they can inherit their weight or part of a weighting function.)
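A sketch of such weight inheritance – the source names, quality scores, and record layout are hypothetical:

```python
# Each token carries a back-reference to the source it was derived from;
# its weight is inherited from the (assessed) quality of that source.

sources = {"charter_A": 0.9, "late_copy_B": 0.4}   # assumed quality scores

tokens = [
    {"value": "25", "source": "charter_A"},
    {"value": "35", "source": "late_copy_B"},
]

# Follow each token back to its source description and inherit the weight:
for t in tokens:
    t["weight"] = sources[t["source"]]

print([(t["value"], t["weight"]) for t in tokens])   # [('25', 0.9), ('35', 0.4)]
```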

**6.5 Handling of Inconsistency**^{20}

(Ex 5.1) *She was 25 or 35 years old.*

We have discussed in section 6.3 how we can handle contradictory data when we focus on individual decisions based upon them, e.g.: “Is x greater than y?”. The related proposals for solutions we summarized under the label “uncertainty I”.

We speak of *inconsistency* when we contemplate the use of such data in the creation of complex chains of conclusions which are not restricted to individual elements of a logical expression, but extend to the hypotheses about the interconnections between a (very) large number of data points – essentially between all we have access to. The construction of such a set of conclusions is not an intermediate step within a research process, but the purpose of the research process as a whole.

This means: if today we decide, based upon one of the two contradictory tokens about somebody’s age, that the person this age is ascribed to in a source has to be identical with another person represented by another set of tokens somewhere in the data, we have to remember that this decision was based upon just *one* of a set of contradictory tokens. If two years hence we discover that the assumed identity between the two persons violates a condition posed by the existence of another data item, it is worthwhile to check whether an identification of the two persons erroneously considered identical so far becomes possible with other persons appearing in the sources.^{21}

This describes the core of {Ivory 5}: the possibility to proceed with a program in parallel with the results of arbitrarily many different possible outcomes of a non-binary decision. The current model is still tentative here. The mechanism of decision snapshots (definitions 4.5 and 4.6) allows re-connecting from an edge to be destroyed as a result of the discovery of an inconsistency as described in the preceding paragraph. (For an extensive example see section 2.4.) Ultimately this is the conventional solution, however. Full support for inconsistency would require that the chains of reasoning allowed by all contradictory arguments are developed up to some degree, kept pending, and taken up when additional decision criteria turn up. The definition of this mechanism needs considerably more work.
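One tentative way to sketch such a decision snapshot in code – all names (`identify`, `revisit`) and the record layout are my own assumptions, not the model's definitions 4.5/4.6:

```python
# Every identification decision records the full set of contradictory tokens
# it was based on, so the resulting edge can be revisited later when an
# inconsistency is discovered.

decisions = []

def identify(person_a, person_b, basis_tokens, chosen):
    decisions.append({
        "edge": (person_a, person_b),
        "basis": basis_tokens,   # the full contradictory set
        "chosen": chosen,        # the token the decision relied upon
        "active": True,
    })

identify("p1", "p2", basis_tokens=[25, 35], chosen=25)

def revisit(edge):
    """On discovering an inconsistency, deactivate the edge but keep the
    snapshot, returning the alternatives still open from the basis."""
    for d in decisions:
        if d["edge"] == edge and d["active"]:
            d["active"] = False
            return [t for t in d["basis"] if t != d["chosen"]]
    return []

print(revisit(("p1", "p2")))   # [35]
```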

(Ex 5.2) *An event took place before Easter 1435 or Easter 1453.*

When this is handled as an inconsistency, no additional considerations beyond those raised with the previous example apply.

(Ex 5.3) *They bought iron tools in “the market”. (Assuming three towns with names containing “market” qualifying.)*

In example 3.3, interpreting the same situation as an uncertainty, we pointed to the possibility that the weights representing the different probabilities / possibilities for selecting the three full place names in the knowledge base should be connected to a dynamic evaluation of the usage of the source terms. When this example is handled as contradictory information, this dynamic evaluation also has to be able to trigger the decision snapshot mechanism as described in (Ex 5.1) above. This means that the mechanism works two ways: it can be used to handle the effects of the destruction of an edge, but it can also be used to re-weight or destroy an edge created previously.

(Ex 5.4) *The source allows reading the occupation as “tailor” or “sailor”.*

When this is handled as an inconsistency, no additional considerations beyond those raised in (Ex 5.1) apply.

(Ex 5.5) *The name of one individual spelled differently in different sources.*

The difference between (Ex 5.4) and this one is that in (Ex 5.4) the contradiction is (almost certainly) created by a reading problem, that is, by the process of representing the data in an information system for interpretation. Differences in spelling can point to a systematic difference between two sources of data. Handling this requires the properties of a source, as discussed with (Ex 4.5) in section 6.4.

(Ex 5.6) *Variance between witnesses of a text.*

This example has been included to identify a missing element in the current model. For simplicity’s sake, we have assumed in definition 1.1 that all data are represented in the standard data types of currently available programming languages and their standard libraries. The problem created by differences between witnesses is that variation and contradiction arise on a level *below* the classical data type string. This should eventually be taken care of by the mechanism proposed in {Ivory 6 and 7}. Simplifying the matter very much: the replacement of conventional strings by strings interpreted within a system of standoff markup. But so far a number of issues prevent a consistent plan for such a substitution.

**6.6 Handling of Incompleteness**

(Ex 6.1) *The case where an attribute of an entity must have existed but is unknown – a person for whom no information about the age can be found in the sources.*

This is precisely the case for which the “explicit default” mechanism described by definitions 5.1 – 5.5 has been created.

(Ex 6.2) *The case where an attribute can partially be derived from another attribute – a person who died in 1754 must have been born before that.*

This is precisely the case for which the “triggered default” mechanism described by definitions 5.6 – 5.8 has been created.
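A minimal sketch of such a triggered default – the rule and record layout are illustrative assumptions, not the mechanism's formal definitions 5.6 – 5.8:

```python
# Entering a date of death triggers the derivation of a bound on the unknown
# date of birth, marked as derived rather than attested.

record = {"name": "N.N.", "died": 1754}

def trigger_defaults(rec: dict) -> dict:
    derived = dict(rec)
    if "died" in rec and "born" not in rec:
        # The birth year must lie before the death year.
        derived["born"] = {"before": rec["died"], "status": "triggered_default"}
    return derived

print(trigger_defaults(record)["born"])
# {'before': 1754, 'status': 'triggered_default'}
```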

(Ex 6.3) *The case where a potential attribute is not applicable – every person can have a marriage date, but does not have to.*

This case is handled by the <Not_Applicable> value provided by definition 1.1.

**6.7 Handling of Polyvalence I**

(Ex 7.1) *A topographical name is given as “Ansbach-Bayreuth”. Within any procedures interpreting character strings, this is a name. It indirectly indicates a temporal frame between 1792 and 1806, as the political entity with that name existed only then. Within procedures interpreting spatial information, it is an area that can be processed for spatial queries.*

This is the reason for, and taken care of by, definition 2.1 – the fundamental assumption that any token can have parallel interpretations in all data types that can be processed.

(Ex 7.2) *A calendar date is given as “Tuesday before St. X.”. Within textual processing, the character string can indicate the occurrence of a patron saint considered to be indicative of a given ruling family. Within temporal processing it is a temporal data point.*

The difference between this example and the previous one is that the previous one has no implications for the data structure. The textual interpretation of (Ex 7.2) implies the existence of an additional data point, “superordinated political unit”. This is a special case of the “triggered default” mechanism described by definitions 5.6 – 5.8.

**6.8 Handling of Polyvalence II:** A data point can be interpreted as a value of more than one conceptual variable.

(Ex 8.1) *It is unclear, whether the character string “Adenau” refers to a surname or a place of origin.*

Definition 4.3 provides rules to connect a token to more than one node or edge.

(Ex 8.2) *It is unclear, whether the date “March 23*^{rd}*, 1764” refers to the date a will was drawn up or to the date of death of the deceased.*

As triggered defaults (definitions 5.6 – 5.8) are triggered when a token is linked to a node or edge, the same token may trigger different actions when linked to different nodes or edges. Dates of death are quite likely to trigger plausibility constraints, which the date a will was drawn up would not.

(Ex 8.3) *It is unclear, whether the character string “Kolding” refers to a place of birth or to the previous place of residence.*

The example has been inserted as a reminder that this type of polyvalence may lead to an increase of vagueness in computed results. (E.g. leading to increased fuzziness of the distance between a place of residence and the “place of origin”.) This has to be taken care of by the derived operations introduced by definition 2.6.

(Ex 8.4) *It is unclear, whether an attribute – the date of birth, e.g. – refers to one person or to another.*

This case is different from (Ex 8.1) from the point of view of a database which strongly separates “records” from “fields”. As this difference is less pronounced in the graph-based model defined here, definition 4.3 applies here as well.

(Ex 8.5) *It is unclear, whether a group of data points refers to an object of one type or of another. (Is a person described in a list the son of the preceding one or a hired helper?)*

For the same reasons as in the previous example the mechanism enabled by definition 4.3 applies here as well.

**6.9 Negations:** A data point defines a value which is *not* applicable but offers no clue on which value should be used instead.

*(Ex 9.1) A source – or the comment of a researcher – raises a negative claim: “not present in March 1820”.*

Negative information can easily be *represented* by co-references with a weight of 0.0. To make it possible to *process* such data consistently, the concepts of {Ivory 1} have to be fully supported.

**Appendix I: Embedded Graphs**

In discussing (Ex 2.4) in section 6.2 we have already noticed that the data structure presented can be seen as a mixture of an abstract graph describing the relationships between concepts and a concrete one describing relationships between instances of tokens. Besides vagueness and uncertainty, historical sources – indeed historical layers of language – frequently contain expressions which use, implicitly or explicitly, analogical or metaphoric constructions. This is of quite fundamental importance, as there are good reasons to assume that analogies and metaphors are central to human thinking [Lakoff 1980], [Fauconnier 2002]. It will ultimately not be possible to process historical sources without the ability to represent and process these constructions.

To process them I proposed recently in [Thaller 2020] to envisage embedding a graph representing tokens of the kind we discuss here into an *n*-dimensional space. This is *mathematically* pointless; the very notion of a graph emphasizes that it is dimensionless. Between *concepts* represented in a graph there are distances, however, which cannot be described satisfactorily by the number of intervening nodes. We have argued already in section 4 that graphs as data structures are not bound by any graph-theoretical considerations – even if such data structures enable the implementation of graph-theoretical methods.

Definition 1.5 asks, therefore, that the basic building blocks out of which the graph structures we have described are built can have co-ordinates in an *n*-dimensional space within which distances can be computed.

The notion that the process of concept formation in the human mind is based upon geometric analogies has been argued in detail by [Gardenförs 2000] and [Gardenförs 2014]. [Holmqvist 1993] has submitted a thesis which explores possibilities to implement some of the concepts, albeit focusing more on the general aspects of the cognitive grammar of [Langacker 1987], with many connections to [Lakoff 1980] and [Fauconnier 2002].

Gardenförs emphasizes the importance of relationships within standard geometry for the understanding of human perception, and therefore insists time and again that conceptual spaces have to be convex; [Hernández-Conde 2017] has convincingly argued that this restricts the model unnecessarily. In the case of the model presented here I am arguing for an *n*-dimensional space which is further removed from classical geometry, as described in [Thaller 2009].

**Appendix II: Vague Numbers in κλειω**

Externally a numeric term can be a decimal number or one of a couple of notations as they appear in historical sources. Among others: “12.3.4” for three-level systems of measurements (e.g. pound, shillings, pennies); mixtures of decimal numbers and qualifiers, e.g. “3guilders”; expressions like “1200square_feet * 0.2.1”.

Independent of the notation of a single numeric term, any item of the data type “number” consists of two terms: a minimally possible numeric value and a maximally possible numeric value. (For the – many – cases, where the two terms coincide, that is, where a precise value is known, suitable optimizations exist.)

So the basic form of a number x is “Min(x) – Max(x)”.

Each such number can be qualified by a set of fuzzifiers – equal, circa, greater, less – which can be combined. This brings the definition of the value of the data type number to “fuzzification Min(x) – Max(x)”, as in the example “circa 5 – 10”. In the internal representation of such a number, the fuzzifiers are resolved as modifications of the numerical values used, according to the following table, where the underscore is used to indicate a numeric range, to distinguish it from the minus sign for subtraction:

“<no fuzzifier> [Min] _ [Max]” ==> “[Min] _ [Max]”

“equal [Min] _ [Max]” ==> “[Min] _ [Max]”

“circa [Min] _ [Max]” ==>

“( [Min] – ( [Min] * 0.10) ) _ ( [Max] + ( [Max] * 0.10) )”

“greater [Min] _ [Max]” ==> “[Min] _ ( [Max] + ( [Max] * 0.10) )”

“less [Min] _ [Max]” ==> “( [Min] – ( [Min] * 0.10) ) _ [Max]”

“equal circa [Min] _ [Max]” ==>

“( [Min] – ( [Min] * 0.05) ) _ ( [Max] + ( [Max] * 0.05) )”

“equal greater [Min] _ [Max]” ==> “[Min] _ ( [Max] + ( [Max] * 0.05) )”

“equal less [Min] _ [Max]” ==> “( [Min] – ( [Min] * 0.05) ) _ [Max]”

“circa greater [Min] _ [Max]” ==>

“( [Min] – ( [Min] * 0.05) ) _ ( [Max] + ( [Max] * 0.10) )”

“circa less [Min] _ [Max]” ==>

“( [Min] – ( [Min] * 0.10) ) _ ( [Max] + ( [Max] * 0.05) )”

“equal circa greater [Min] _ [Max]” ==>

“( [Min] – ( [Min] * 0.025) ) _ ( [Max] + ( [Max] * 0.10) )”

“equal circa less [Min] _ [Max]” ==>

“( [Min] – ( [Min] * 0.10) ) _ ( [Max] + ( [Max] * 0.025) )”

In any case, after the application of the fuzzification, if present, we have as the basic form of a number x “[Min(x)] _ [Max(x)]”.
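The table above can be transcribed directly into code; the following sketch is my own, but the percentages (10%, 5%, 2.5%) are those given in the table:

```python
# Fuzzifier combinations mapped to (reduction of Min, extension of Max),
# expressed as fractions of the respective value, as in the table above.

FUZZ = {
    ():                            (0.0,   0.0),
    ("equal",):                    (0.0,   0.0),
    ("circa",):                    (0.10,  0.10),
    ("greater",):                  (0.0,   0.10),
    ("less",):                     (0.10,  0.0),
    ("equal", "circa"):            (0.05,  0.05),
    ("equal", "greater"):          (0.0,   0.05),
    ("equal", "less"):             (0.05,  0.0),
    ("circa", "greater"):          (0.05,  0.10),
    ("circa", "less"):             (0.10,  0.05),
    ("equal", "circa", "greater"): (0.025, 0.10),
    ("equal", "circa", "less"):    (0.10,  0.025),
}

def resolve(fuzzifiers, lo, hi):
    """Resolve a fuzzified number into its internal [Min] _ [Max] form."""
    f_lo, f_hi = FUZZ[tuple(fuzzifiers)]
    return lo - lo * f_lo, hi + hi * f_hi

print(resolve(("circa",), 5, 10))   # (4.5, 11.0)
```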

If we compare two of them – A and B – we have a comparison of “[Min(A)] _ [Max(A)]” with “[Min(B)] _ [Max(B)]”.

The comparison operators are implemented as:

“A == B” ==> “[Min(A)] == [Min(B)] && [Max(A)] == [Max(B)]”

– only the two numeric values are compared

“A equal B” ==> “[Min(A)] == [Min(B)] && [Max(A)] == [Max(B)]”

– fuzzifiers have to be identical as well; general properties of the “fields” and “objects” in the definition of the data model containing the number also have to be identical.

For all other comparisons the additional properties mentioned in the “A equal B” operator are ignored. These are:

“A circa B” ==>

“[Min(A)] <= [Max(B)] && [Max(A)] >= [Min(B)] ||

[Min(B)] <= [Max(A)] && [Max(B)] >= [Min(A)]”

“A greater B” ==> “[Min(A)] > [Max(B)]”

“A less B” ==> “[Max(A)] < [Min(B)]”

“A equal circa B” ==>

“[Min(A)] <= [Min(B)] && [Max(A)] >= [Max(B)] ||

[Min(B)] <= [Min(A)] && [Max(B)] >= [Max(A)]”

“A equal greater B” ==> “[Min(A)] >= [Max(B)]”

“A equal less B” ==> “[Max(A)] <= [Min(B)]”

“A circa greater B” ==> “[Max(A)] >= [Min(B)]”

“A circa less B” ==> “[Min(A)] <= [Max(B)]”

“A equal circa greater B” ==> “[Max(A)] >= [Max(B)]”

“A equal circa less B” ==> “[Min(A)] <= [Min(B)]”

The arithmetic operators provided are defined as follows:

“R = A + B” ==>

“[Min(R)] = [Min(A)] + [Min(B)]” and “[Max(R)] = [Max(A)] + [Max(B)]”

“R = A – B” ==>

“[Min(R)] = [Min(A)] – [Min(B)]” and “[Max(R)] = [Max(A)] – [Max(B)]”

“R = A * B” ==>

“[Min(R)] = [Min(A)] * [Min(B)]” and “[Max(R)] = [Max(A)] * [Max(B)]”

“R = A / B” ==>

“[Min(R)] = [Min(A)] / [Min(B)]” and “[Max(R)] = [Max(A)] / [Max(B)]”
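A minimal transcription of these operator definitions into code – my own sketch, following the definitions exactly as stated above, with numbers represented as (Min, Max) pairs:

```python
# Arithmetic on "[Min] _ [Max]" pairs, term by term as defined above.
def add(a, b): return (a[0] + b[0], a[1] + b[1])
def sub(a, b): return (a[0] - b[0], a[1] - b[1])
def mul(a, b): return (a[0] * b[0], a[1] * b[1])
def div(a, b): return (a[0] / b[0], a[1] / b[1])

# A selection of the comparison operators defined above.
def eq(a, b):
    return a[0] == b[0] and a[1] == b[1]

def circa(a, b):
    """True if the two ranges overlap."""
    return (a[0] <= b[1] and a[1] >= b[0]) or (b[0] <= a[1] and b[1] >= a[0])

def greater(a, b):
    return a[0] > b[1]

def less(a, b):
    return a[1] < b[0]

A, B = (4.5, 11.0), (9.0, 12.0)   # e.g. "circa 5 _ 10" resolved, and "9 _ 12"
print(add(A, B))      # (13.5, 23.0)
print(circa(A, B))    # True: the ranges overlap
print(greater(A, B))  # False
```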

**Appendix III: The Ivory Stack – and Why this Paper was Written**

In the blog [Thaller 2018] I have proposed to understand “information” as it appears in historical sources as inherently different from information arising out of contemporary processes. (Notwithstanding that contemporary information from many knowledge domains may actually share more with historical information than with information arising from the hard sciences.)

I claim, therefore, that many of the concepts out of which the conceptual stack behind modern information technology has been constructed violate some of the primary properties of historical information. If this is so, the conceptual stack needs to be replaced. And as a conceptual stack which is not implemented is, in my opinion, pointless, I started by sketching extremely roughly which concrete tasks would have to be solved to support such a stack.

Trying to design a consistent understanding of the implications of “historical information”, the following building blocks have been identified:

__Research proposal in software technology 1:__

(Short: Ivory Infon – IvI)

Implement __infons__ for seamless usage in main stream programming languages. (Or possibly __situations__, the slightly more complex abstraction which actually is the subject of the bulk of Devlin’s theory, omitted here to simplify the argument.) [cf. Devlin 1991]

__Research proposal in software technology 2:__

(Short: Ivory Frozen – IvF)

Represent information as a set of conceptually permanently running algorithms, the state of which can be frozen and stored.

__Research proposal in software technology 3:__

(Short: Ivory Numbers – IvN)

Implement grey numbers, or a derivation from them, and integrate them seamlessly into main stream programming languages. [cf. Liu and Lin 2006; Liu and Lin 2011]

__Research proposal in software technology 4:__

(Short: Ivory Terms – IvT)

Implement linguistic variables and integrate them seamlessly into main stream programming languages, as permanently accessible data type in all parts of the flow of execution. Base the implementation on a generalized concept of uncertainty, which broadens the scope of Zadeh’s theory of that name. [cf. Zadeh 2005]

__Research proposal in software technology 5:__

(Short: Ivory Control – IvC)

Design genuinely fuzzy control structures and integrate them seamlessly into main stream programming languages.

__Research proposal in software technology 6:__

(Short: Ivory Graphs – IvG)

Provide tools for the easy handling of graphs supporting co-references in main stream programming languages. (Called co-edges in the original document.)

__Research proposal in software technology 7:__

(Short: Ivory Markup – IvM)

Develop a representation of “information objects”, where a data object of arbitrary dimensionality can be combined with interpretative layers in such a way, that the data object can be changed without damaging these layers.

__Research proposal in software technology 8:__

(Short: Ivory Links – IvL)

Generalize the solution of research proposal 7 to handle graphs of objects of inhomogeneous dimensionality.

__Research proposal in software technology 9:__

(Short: split into Ivory Objects – IvO and Ivory Virtual System Calls – IvV)

Look into possibilities to extend the object-oriented paradigm of programming into a context-oriented one. Intuitively speaking, by two approaches:

(1) Augmenting the “private” and “public” sections of classes by a “context” section, which provides an interface between classes outside of their lines of inheritance.

(2) Providing a possibility for “virtual system calls”, which provide interfaces into tools, which can be shared between programs in different programming languages.

Of these research proposals, numbers 1 – 7 have been considered in the current document. Proposals 8 and 9 have more the character of enabling technologies and are addressed only implicitly; proposal 9 was split into two for practical reasons. For the overall project described, this paper serves the following need.

Trying to arrive at a stage where one could start tackling actual programming tasks in the nine projects defined, it turned out that it was relatively easy to start exploring the necessary solutions individually. Time and again, however, work had to be postponed, as it became clear that the different projects would have so many interdependencies as to make it extremely difficult to start anywhere without a concept of how the different blocks would interact. The current document was written to provide exactly such a concept.

The term “Ivory Stack” resulted from an intermediate stage which tried to explore how the interdependencies of concrete attempts at implementation could be described, resulting in the following diagram:

**Footnotes**

1 The following examples are drawn from the domain of historical research. To compare this classification with those generally used in AI the introduction to [Krause 1993], notably p. 1-14, is very useful. Generally I received many valuable ideas from that volume. See also [Ma 2012] p. 7.

2 As an introduction on fuzzy spatial information see: [Guesgen 1996]. Besides the volume as a whole in which this has been published, I found particularly useful [Glagowski 1996], [Sun 2008], [Cobb 1998], [Di Martino 2014], [Wang 1999].

3 In the context of AI research the selection of the most appropriate methods and technologies for the handling of vague and uncertain information has for a long time been considered, up to a degree, a question of taste by some (my formulation). See: “Namely, that different formalisms should be viewed as alternatives, and that the choice of an adequate technique for a particular application is a meta-level decision.” p. 59 of the [Responses 1988], to be read in the context of [Saffiotti 1987] and [Saffiotti 1988].

4 Good intuitive introduction: [Doherty 2006]; fundamental: [Pawlak 1982], [Pawlak 1985].

5 An early but still very readable introduction to belief values: [Shafer 1990]. Notice the introductory example p. 328/329.

6 A good example for recent attempts to integrate the various approaches: [Shen 2011].

7 447 BC – 438 / 433 BC

8 See Appendix II for the full definition of the handling of numbers in κλειω.

9 We notice in [Rundensteiner 1989] that an early attempt to apply fuzzy measures to relational databases also engages closely with graph-theoretical concepts.

10 Note generally [Giunchiglia 2004] for the observation that graphs have proven useful also for the handling of non-primarily graph-oriented data structures.

11 Another example for such an ancillary use of graphs as databases is the relatively recent work on probabilistic graphical models / Chain Graph Models to handle complex probabilistic models: [Lappenschaar 2014], [Sonntag 2016].

12 The concept of a mapped token will not be discussed or used further in this document. It is introduced as a necessary condition for the link to the concepts in {Ivory 6 and 7} as briefly addressed in section 6.5.

13 As possibilistic measures are still relatively unknown, a short explanation, reflecting the example Zadeh used for their connection to Fuzzy theory: For a given person there are four probabilities that she will eat 0, 1, 2, or 3 eggs for breakfast in the morning, say {0.1, 0.3, 0.5, 0.1}. As usual these probabilities sum up to 1.0. Independent of that, due to the fact that she is a homo sapiens and all biological restrictions apply, there are possibilities for her to eat 0, 1, 2, 3, 4 … 120 eggs for breakfast, say {1.0, 1.0, 1.0, 0.9, 0.8, … 0.0}, before getting sick. These do *not* sum to 1.0 and are independent of each other. See: [Zadeh 1978], [Dubois 1986].

14 Notice [Ma 2016] p. 406 on problems of representing RDF data in standard (i.e. non-hyper) graphs.

15 For completeness’ sake I would like to acknowledge various attempts to modify Fuzzy Set theory in such a way as to increase the vagueness of the membership function terms, avoiding the need to express vagueness with a number with many decimals, which intuitively introduces the feeling that decisions have to be specified even more precisely than in a crisp world. Besides Zadeh’s own Type 2 Fuzzy Sets [Zadeh 1975], which make the membership functions themselves Fuzzy, particularly Hesitant Fuzzy Sets [Torra 2010], [Herrera 2014] introduce possibilities to express membership functions which can be ambiguous. One should emphasize that these approaches, which look even more esoteric than Fuzzy Sets to a reader with a Humanities background, link directly into practical applications in the semantic web: [Lee 2010].

16 On the relevance of the context of data items for handling vagueness and uncertainty see [Gebhardt 1993].

17 For a more formal treatment of a large system, where decision processes remain “hanging” it will probably be worthwhile to look into the direction of non-monotonic logics. Rather plausible for its close connection to a programming model [Mott 1988].

18 Rough Sets – [Doherty 2006], [Pawlak 1982], [Pawlak 1985] – define an upper and a lower limit for the applicable values. This policy of defining two limiting values occurs in other approaches to the handling of Fuzziness as well. Intuitionistic Fuzzy Sets [Atanassov 1986], [Atanassov 2012] replace the single membership function by a pair of a membership function and a non-membership function, where the two values of the functions for one term do *not* have to add up to 1.0. In possibility theory the straightforward concept of possibility can be replaced by a pair of possibility / necessity functions, which also represent upper and lower bounds [Dubois 1990], [Magrez 1989].
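How such a pair of bounding values behaves can be sketched in Python for the Intuitionistic case — a minimal illustration, with a class name and example of my own invention; only the constraint that membership plus non-membership must not exceed 1.0 is taken from [Atanassov 1986]:

```python
from dataclasses import dataclass

@dataclass
class IntuitionisticValue:
    """A pair of membership (mu) and non-membership (nu) degrees."""
    mu: float
    nu: float

    def __post_init__(self) -> None:
        # Unlike a classical Fuzzy membership paired with its complement,
        # mu + nu may stay below 1.0; it must merely not exceed it.
        if not (0.0 <= self.mu <= 1.0 and 0.0 <= self.nu <= 1.0):
            raise ValueError("degrees must lie in [0, 1]")
        if self.mu + self.nu > 1.0:
            raise ValueError("mu + nu must not exceed 1.0")

    @property
    def hesitation(self) -> float:
        # The margin left open between the two bounds: a degree of
        # indeterminacy which a single membership value cannot express.
        return 1.0 - self.mu - self.nu

# "The charter dates from the 1430s": somewhat supported by the sources,
# weakly contradicted, with considerable hesitation remaining.
dating = IntuitionisticValue(mu=0.6, nu=0.1)
```

The interesting quantity is the hesitation of roughly 0.3 left between the two bounds — exactly the room for an undecided judgement which the upper / lower pairs of the other approaches mentioned above also try to preserve.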

19 A further connection could be drawn to the notion of active documents, as discussed in [Ciancarini 2002]. Connecting the two concepts, we might discuss knowledge objects which consist of segments of mixed rule- / fact-bases prepared for merger.

20 Notice [Hunter 2000] p. 333 for the observation that advanced logic models for paraconsistency are useful only if embedded in a broader model for the representation of inconsistency.

21 On the formal level notice the concept of “Belief Revision” as in [Falappa 2011].

**Bibliography**

[Adamo 1980] Jean Marc Adamo: “L.P.L. A fuzzy Programming Language: 1 Syntactic Aspects,” and “L.P.L. A fuzzy Programming Language: 2 Semantic Aspects,” in: *Fuzzy Sets and Systems* 3 (1980) 151-179, 261-289.

[Atanassov 1986] Krassimir T. Atanassov: “Intuitionistic Fuzzy Sets”, in: *Fuzzy Sets and Systems* 20 (1986) 87-96.

[Atanassov 2012] Krassimir T. Atanassov: *On Intuitionistic Fuzzy Sets Theory*, Springer, 2012.

[Bradley 2014] John Bradley: “Silk Purses and Sow’s Ears: Can Structured Data Deal with Historical Sources?”, in: *International Journal of Humanities and Arts Computing* 8 (2014) 13–27.

[Bretto 2013] Alain Bretto: *Hypergraph Theory*, Springer, 2013.

[Ciancarini 2002] Paolo Ciancarini et al.: “A survey of coordination middleware for XML-centric applications”, in: *Knowledge Engineering Review* 17 (2002) 389-405.

[Cobb 1998] Maria A. Cobb and Frederick E. Petry: “Modeling Spatial Relationships within a Fuzzy Framework”, in: *Journal of the American Society for Information Science* 49 (1998) 253–266.

[Devlin 1991] Keith Devlin: *Logic and Information*, Cambridge, 1991.

[Di Martino 2014] Ferdinando Di Martino and Salvatore Sessa: “Type-2 interval fuzzy rule-based systems in spatial analysis”, in: *Information Sciences* 279 (2014) 199-212.

[Ding et al. 1996] Liya Ding et al.: “A Prolog-like inference system based on neural logic – An attempt towards fuzzy neural logic programming”, in: *Fuzzy Sets and Systems* 82 (1996) 235-251.

[Doherty 2006] Patrick Doherty et al.: *Knowledge Representation Techniques*, Springer, 2006 (= *Studies in Fuzziness and Soft Computing* 202).

[Dubois 1986] Didier Dubois and Henri Prade: *Possibility Theory*, Plenum, 1986.

[Dubois 1990] Didier Dubois and Henri Prade: “Resolution Principles in Possibilistic Logic”, in: *International Journal of Approximate Reasoning* 4 (1990) 1-21.

[Dupin de Saint-Cyr 2008] Florence Dupin de Saint-Cyr and Henri Prade: “Logical handling of uncertain, ontology-based, spatial information”, in: *Fuzzy Sets and Systems* 159 (2008) 1515-1534.

[Falappa 2011] Marcelo A. Falappa et al.: “On the evolving relation between Belief Revision and Argumentation”, in: *Knowledge Engineering Review* 26 (2011) 35-43.

[Fauconnier 2002] Gilles Fauconnier and Mark Turner: *The Way We Think. Conceptual Blending and the Mind’s Hidden Complexities*, New York, 2002.

[Ferson 2015] Scott Ferson et al.: “Natural language of uncertainty: numeric hedge words”, in: *International Journal of Approximate Reasoning* 57 (2015) 19–39.

[Gardenförs 2000] Peter Gärdenfors: *Conceptual Spaces*, MIT Press, 2000.

[Gardenförs 2014] Peter Gärdenfors: *The Geometry of Meaning*, MIT Press, 2014.

[Gebhardt 1993] Jörg Gebhardt and Rudolf Kruse: “The Context Model: An Integrating View of Vagueness and Uncertainty”, in: *International Journal of Approximate Reasoning* 9 (1993) 283-314.

[Giunchiglia 2004] Fausto Giunchiglia and Pavel Shvaiko: “Semantic Matching”, in: *Knowledge Engineering Review* 18 (2004) 265-280.

[Glagowski 1996] Terry G. Glagowski et al.: “A New Method for Implementing Fuzzy Retrieval from a Spatial Database”, in: *Information Sciences* 88 (1996) 209-225.

[Guesgen 2005] Hans Werner Guesgen: “Fuzzy Reasoning about Geographic Regions”, in: Frederick E. Petry et al. (eds.): *Fuzzy Modeling with Spatial Information for Geographic Problems*, Springer, 2005.

[Hernández-Conde 2017] José V. Hernández-Conde: “A case against convexity in conceptual spaces”, in: *Synthese* 194 (2017) 4011-4037.

[Herrera 2014] Francisco Herrera et al. (eds.): “Special Issue on Hesitant Fuzzy Sets”, *International Journal of Intelligent Systems* 29 (2014) 493-595.

[Holmqvist 1993] Kenneth B.I. Holmqvist: *Implementing Cognitive Semantics*, Lund University, 1993.

[Hunter 2000] Anthony Hunter: “Reasoning with inconsistency in structured text”, in: *Knowledge Engineering Review* 15 (2000) 317-337.

[Kitagawa 1989] Hiroyuki Kitagawa and Tosiyasu L. Kunii: *The Unnormalized Relational Data Model*, Springer, 1989.

[Kohlas 1995] Jürg Kohlas and Paul-André Monney: *A Mathematical Theory of Hints*, Springer, 1995.

[Krause 1993] Paul Krause and Dominic Clark: *Representing Uncertain Knowledge*, Springer, 1993.

[Kruse 1991] Rudolf Kruse et al.: *Uncertainty and Vagueness in Knowledge Based Systems*, Springer, 1991.

[Langacker 1987] Ronald W. Langacker: *Foundations of Cognitive Grammar*, Volume 1, *Theoretical Prerequisites,* Stanford University Press, 1987.

[Langacker 1991] Ronald W. Langacker: *Foundations of Cognitive Grammar*, Volume 2, *Descriptive Application, *Stanford University Press, 1991.

[Langacker 2008] Ronald W. Langacker: *Cognitive Grammar: A Basic Introduction*, Oxford University Press, 2008.

[Lappenschaar 2014] Martijn Lappenschaar et al.: “Qualitative chain graphs and their application”, in: *International Journal of Approximate Reasoning* 55 (2014) 957–976.

[Lakoff 1980] George Lakoff and Mark Johnson: *Metaphors we Live by*, Chicago, 1980.

[Lakoff 1987] George Lakoff: *Women, Fire and Dangerous Things. What Categories Reveal about the Mind*, Chicago, 1987.

[Lee 2010] Chang-Shing Lee et al.: “Diet Assessment Based on Type-2 Fuzzy Ontology and Fuzzy Markup Language”, in: *International Journal of Intelligent Systems* 25 (2010) 1187–1216.

[Lin 2006] Sifeng Liu and Yi Lin: *Grey Information. Theory and Practical Applications*, London, 2006.

[Lin 2011] Sifeng Liu and Yi Lin: *Grey Systems. Theory and Practical Applications*, London, 2011.

[Long 1989] Derek Long: “A review of Temporal logics”, in: *Knowledge Engineering Review* 4 (1989) 141-162.

[Ma 2012] Zong Min Ma et al.: “An overview of fuzzy Description Logics for the Semantic Web”, in: *Knowledge Engineering Review* 28 (2012) 1-34.

[Ma 2016] Zong Min Ma et al.: “Storing massive Resource Description Framework (RDF) data: a survey”, in: *Knowledge Engineering Review* 31 (2016) 391-413.

[Magrez 1989] Paul Magrez and Philippe Smets: “Epistemic Necessity, Possibility, and Truth. Tools for Dealing with Imprecision and Uncertainty in Fuzzy Knowledge-Based Systems”, in: *International Journal of Approximate Reasoning* 3 (1989) 35-57.

[Moss 1994] Chris Moss: *Prolog++*, Addison-Wesley, 1994.

[Mott 1988] Peter Mott: “Default non-monotonic logic”, in: *Knowledge Engineering Review* 3 (1988) 265-284.

[Nagypál 2003] Gábor Nagypál and Boris Motik: “A Fuzzy Model for Representing Uncertain, Subjective, and Vague Temporal Knowledge in Ontologies”, in: Meersman et al. (eds.): *On The Move to Meaningful Internet Systems*, Springer, 2003, 906-923.

[Pasin 2015] Michele Pasin and John Bradley: “Factoid-based prosopography and computer ontologies: towards an integrated approach”, in: *Digital Scholarship in the Humanities* 30 (2015) 86-97.

[Pawlak 1982] Zdzisław Pawlak: “Rough Sets”, in: *International Journal of Parallel Programming* 11 (1982), 341-356.

[Pawlak 1985] Zdzisław Pawlak: “Rough Sets and Fuzzy Sets”, in: *Fuzzy Sets and Systems* 17 (1985) 99-102.

[Responses 1988] “Responses to ‘An AI view of the treatment of uncertainty’ by Alessandro Saffiotti”, in: *Knowledge Engineering Review* 3 (1988) 59-86.

[Rundensteiner 1989] Elke A. Rundensteiner et al.: “On Nearness Measures in Fuzzy Relational Data Models”, in: *International Journal of Approximate Reasoning* 3 (1989) 267-298.

[Saffiotti 1987] Alessandro Saffiotti: “An AI view of the treatment of uncertainty”, in: *Knowledge Engineering Review* 2 (1987) 75-97.

[Saffiotti 1988] Alessandro Saffiotti: “The treatment of uncertainty in AI: Is there a better vantage point?”, in: *Knowledge Engineering Review* 3 (1988) 87-91.

[Seising 2012] Rudolf Seising and Veronica Sanz (eds.): *Soft Computing in Humanities and Social Sciences*, Springer, 2012 (= *Studies in Fuzziness and Soft Computing* 273).

[Shafer 1976] Glenn Shafer: *A Mathematical Theory of Evidence*, Princeton, 1976.

[Shafer 1990] Glenn Shafer: “Perspectives on the Theory and Practice of Belief Functions”, in: *International Journal of Approximate Reasoning* 4 (1990) 323-362.

[Shen 2011] Yonghong Shen and Faxing Wang: “Rough approximations of vague sets in fuzzy approximation space”, in: *International Journal of Approximate Reasoning* 52 (2011) 281-296.

[Sun 2008] Haibin Sun: “Computational models for computing fuzzy cardinal directional relations between regions”, in: *Knowledge-Based Systems* 21 (2008) 599-603.

[Skowron 2005] Andrzej Skowron: “Rough Sets and Vague Concepts”, in: *Fundamenta Informaticae* 64 (2005) 417–431.

[Sonntag 2016] Dag Sonntag and Jose M. Peña: “On expressiveness of the chain graph interpretations”, in: *International Journal of Approximate Reasoning* 68 (2016) 91–107.

[Thaller 1993] Manfred Thaller: *κλειω. A Database System*, St. Katharinen 1993 (= *Halbgraue Reihe zur Historischen Fachinformatik* B 11).

[Thaller 2009] Manfred Thaller: “The Cologne Information Model: Representing Information Persistently”, in: Manfred Thaller (ed.) *The eXtensible Characterisation Languages – XCL*, Hamburg, 2009, 223-39. Reprinted under the same title in: *Historical Social Research *Supplement 29 (2017) 344-356.

[Thaller 2018] Manfred Thaller: “On Information in Historical Sources”, Blog entry: https://ivorytower.hypotheses.org/56 (accessed June 24, 2020).

[Thaller 2020] Manfred Thaller: “Über Metaphern (und die Voraussetzungen für ihre Verwendung in der Informationstechnologie)”, in: *Festschrift für Jan Christoph Meister*, Hamburg, 2020, in press.

[Torra 2010] Vincenç Torra: “Hesitant Fuzzy Sets”, in: *International Journal of Intelligent Systems* 25 (2010) 529–539.

[Wang 1999] Xiaomei Wang and James M. Keller: “Human-based spatial relationship generalization through neural / fuzzy approaches”, in: *Fuzzy Sets and Systems* 101 (1999) 5-20.

[Yager 1982] Ronald R. Yager: “Linguistic Hedges: Their Relation to Context and Their Experimental Realization”, in: *Cybernetics and Systems* 13 (1982) 357-374.

[Zadeh 1965] Lotfi A. Zadeh: “Fuzzy Sets”, in: *Information and Control* 8 (1965) 338-353.

[Zadeh 1975] Lotfi A. Zadeh: “The Concept of a Linguistic Variable and its Application to Approximate Reasoning”, I – III, in: *Information Sciences* 8 (1975) 199-249, 301-357, 9 (1975) 43-80.

[Zadeh 1978] Lotfi A. Zadeh: “Fuzzy Sets as a Basis for a Theory of Possibility”, in: *Fuzzy Sets and Systems* 1 (1978) 3-28.

[Zadeh 1999] Lotfi A. Zadeh and Janusz Kacprzyk (eds.): *Computing with Words in Information / Intelligent Systems* I and II, Springer, 1999 (= *Studies in Fuzziness and Soft Computing* 33 and 34).

[Zadeh 2005] Lotfi A. Zadeh: “Toward a Generalized Theory of Uncertainty (GTU) – an outline”, in: *Information Sciences* 172 (2005) 1-40.

[Zambonelli 2004] Franco Zambonelli and H. Van Dyke Parunak: “Towards a paradigm change in computer science and software engineering: a synthesis”, in: *Knowledge Engineering Review* 18 (2004) 329-342.