Friday, September 22, 2023

Type Theory Basics

http://diaspar.42web.io/files/hott.html

On the most general concept of a formal logic

Let us look at 'logic' from the most abstract and general perspective possible. We fix a set of tokens $T$ and a well-defined (computable) subset $S \subset T^*$ of well-formed expressions. But now we must fix from the start a stratification of $S$ into disjoint syntactic categories $S_i$, $i=0,\ldots,n$, where $S_0$ is considered as a distinguished subset of $T^*$. In general a string in $S_i$ will have substrings in $S_j$ for $j > i$ but also possibly substrings in $S_m$ for $m \leq i$. We expect a general element of $S_0$ to be a concatenation of syntactic symbols $\square$ (tokens in the strings of $S_0$ which are not in the other $S_i$) and strings in the $S_i$: $s_1\square_1 s_2\square_2 \ldots \square_{k-1} s_k$, and likewise for the $s_i$, until we reach atomic or indecomposable syntactic categories. We may think of the above decomposition in $S_0$ as expressing a sentence as a predicate with arguments.

A type 1 logic $\mathbb{L}$ is a finite set of partial computable functions $\{L_m\}_{m \in M}$ \[ L_m: S_0^n \times P \rightarrow S_0\] where $n\geq 0$ is the arity of $L_m$ and $P$ is a subset of $S^p$ called the parameter space, of arity $p \geq 0$. We can also write $L(s,p) = \bullet$ when $L$ is undefined. We write $L_{m,n,p}$ to indicate the index and arities. It is straightforward to define the notion of proof (either in linear or tree form) and of proof from a given set of hypotheses. For $H \subset S_0$ let $D(H)$ denote the set of all elements of $S_0$ provable using $H$. In general $D(\emptyset)$ will generalize both axioms and definitions (but there are some subtleties here). Note that if $s \in D(A)$ and for all $a \in A$ we have $a \in D(B)$, then $s \in D(B)$ (cut rule).
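As a minimal sketch of how such a logic might be coded (the rules, surface syntax and names here are our own illustration, not a definitive implementation): sentences are strings in $S_0$, each rule is a partial function returning either a sentence or None (the $\bullet$ above), and $D(H)$ is approximated by bounded forward chaining.

from itertools import product

# A sentence is a string in S_0; a rule is a partial function that either
# returns a new sentence or None (the '•' of the text).
def modus_ponens(args, param=None):
    s, t = args
    # hypothetical surface syntax: from 'A' and 'A -> B' derive 'B'
    if t.startswith(s + " -> "):
        return t[len(s + " -> "):]
    return None

def conjunction_intro(args, param=None):
    s, t = args
    return "(" + s + " & " + t + ")"   # result contains both arguments

RULES = [(modus_ponens, 2), (conjunction_intro, 2)]

def derive(hypotheses, depth=2):
    """Bounded approximation of D(H): close H under the rules `depth` times."""
    derived = set(hypotheses)
    for _ in range(depth):
        new = set()
        for rule, arity in RULES:
            for args in product(derived, repeat=arity):
                result = rule(args)
                if result is not None:
                    new.add(result)
        derived |= new
    return derived

print("q" in derive({"p", "p -> q"}))   # True: modus ponens fires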
If we interpret $S_0$ as 'sentences' then type 1 logics have the problem of not being able to deal directly with discharging hypotheses in natural deduction style. So instead we can generalize sequent calculus style logic by considering a further syntactic category $S_{-1}$ of sequents, that is, elements of $S_0^\star \times S_0^\star$. Then a type 2 logic is defined in terms of partial computable functions $\{L_m\}_{m \in M}$ \[ L_m: S_{-1}^n \times P \rightarrow S_{-1}\] In what follows we work with $S_0$, but there should, we hope, be no difficulty in adapting the definitions to type 2 logics. It is clear that these two types of logic are directly based on Hilbert style and sequent calculi. We could of course also consider a type 3 logic based on natural deduction, that is, incorporating dependencies and discharging.

We can look at logic from an even more general point of view. Take $S_T$ to be the set of concatenations of elements of $S_0$ (sentences). Then a logic is a computable relation \[ S_T \vdash_L S_T \] We can think of this relation as a 'transformer' in the sense of computational linguistics. A logic gives us, for each text, its legitimate completions in the form of further text: a map $L : S_T \rightarrow \mathbf{P} S_T$.

It is now our task to give the most general and abstract versions of the usual logical notions. The easiest is conjunction. A logic is called conjunctive if there is a total computable function $\& : S_0 \times S_0 \rightarrow S_0$ such that for all $s,s' \in S_0$ we have $D(\{s,s'\}) = D(\&(s,s'))$. A logic has absurdity if there is an $\mathit{abs} \in S_0$ such that $D(\mathit{abs}) = S_0$.

An important property of $S$ involves abstraction. Given $t \in S_i$ occurring in $s \in S_0$, consider the set $\lambda t.s$ of all strings in $T^*$ obtained from $s$ by replacing $t$ with elements of $S_i$. Then $S$ has abstraction if $\lambda t.s \subset S_0$ for all such $t$ and $s$. A logic is called universalizing if for any $s \in S_0$ and $t \in S_i$ occurring in $s$ there is a $u(s,t) \in S_0$ such that $\lambda t.s \subset D(u(s,t))$. This definition implies that the language has abstraction. I think being conjunctive and universalizing are the most basic conditions for a 'logic' to be a logic. Note that variables are optional: variable-free style logics can also deal with generality. Indeed variables do not seem directly relevant in our very abstract approach. Implication might be characterized as follows: given $s,t \in S_0$ there is a computably determined expression $Imp(s,t) \in S_0$ such that $D(t) \subset D(\{s, Imp(s,t)\})$, and perhaps also: if $t \in D(s)$ then $Imp(s,t) \in D(\emptyset)$.

What is 'negation'? We view negation as attaching to each sentence $s$ a subset $\dagger s \subset S_0$ of sentences which are interpreted as sentences immediately contradictory to $s$. All the elements of $\dagger s$ must be mutually derivable, and if we consider $\dagger (\dagger s) = \{\dagger t : t \in \dagger s\}$ then $s$ itself must be in $\dagger (\dagger s)$. We can now define general versions of consistency, paraconsistency and compactness. We also obtain the famous tetralemma: all four possibilities can occur for a general logic.

We can classify rules as anabolic, katabolic or mixed. Anabolic rules are those whose result contains the arguments as substrings (this needs to be adapted slightly for sequents). Katabolic rules are those whose result is a substring of an argument. Mixed rules are those which are neither. A rule is uniformly determined when, given the result of its application to concrete values, its value on any argument can be obtained via a process of substitution on this concrete instance. Uniform rules are logical rules that can be given in standard table form via syntactic metavariables (and provisos). We can go more and more in the concrete direction until arriving at the concepts usually studied in philosophical logic.

What about definitions of terms, that is, of elements $t \in S_i$ for $i > 0$? If we have some generalized concept of equality between elements of $S_i$ then things will be easier, for then we can convert this into a problem for $S_0$, into 'what it means to be $t$': finding a sentence which is 'satisfied' only by $t$ up to equality. But otherwise we can consider the set $C(t)$ of elements of $D(\emptyset)$ in which $t$ occurs. Then a type 1 definition is a small finite subset $\Delta(t)$ of $C(t)$ such that $C(t) \subset D(\Delta(t))$. A type 2 definition is a weakening of the previous type: it is a 'brief description' of $t$ giving some salient features of $t$, so that 'most' of $C(t)$ can be deduced from these. It is interesting to compare this to reconstructions of the Stoic theories of definition.
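To make the format concrete (this instance is our own illustration, not taken from anything above): in a Hilbert-style presentation of classical propositional logic, $S_0$ is the set of propositional formulas, an axiom scheme such as $s \rightarrow (t \rightarrow s)$ becomes a rule of arity $n = 0$ with parameter space $P = S_0^2$, and modus ponens becomes a rule of arity $2$ with no parameters:
\[ L_K\big((s,t)\big) = Imp(s, Imp(t,s)), \qquad n = 0,\ p = 2, \]
\[ L_{MP}\big(s,\, Imp(s,t)\big) = t \quad (\text{and } \bullet \text{ otherwise}), \qquad n = 2,\ p = 0. \]
Assuming the expression $Imp(s,t)$ contains $t$ as a substring, $L_{MP}$ is katabolic in the above classification, while conjunction introduction $(s,t) \mapsto \&(s,t)$ is anabolic; both are examples of uniform rules that can be given in the usual table form with syntactic metavariables.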

Text and meaning

Texts are strings $s \in T^*$ over a set of tokens $T$. But there are different possible choices of tokens, corresponding to different choices of 'scale'. Thus a set of tokens $T_2$ refines a set $T_1$ if we have a decomposition map \[ t: T_1 \rightarrow T_2^+\] which induces a map \[ t^* : T_1^* \rightarrow T_2^*\] For each set of tokens $T$ we denote by $S_T \subset T^*$ the corresponding set of texts. Now $S_T$ comes with natural partially defined operations: \[ \otimes_i : S_T \times S_T \rightarrow S_T \]
\[ \oplus_i : S_T \rightarrow S_T \times S_T\] for $i \in O$, which we call the anabolic and katabolic operations respectively. These include, for instance, the basic operation of combining sentences, or of breaking a sequence of two sentences into the corresponding sentences. If we consider a decomposition of the tokens then more phonetic or morphological operations may become available.
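A small sketch of these definitions (all names are hypothetical): word-level tokens refined into character-level tokens, the induced map $t^*$ on texts, and one anabolic and one katabolic operation on texts represented as lists of tokens.

# The decomposition map t : T_1 -> T_2^+ sends each word to its characters.
def decompose(word):
    return list(word)

def decompose_text(text):
    """The induced map t^* : T_1^* -> T_2^*, applied tokenwise."""
    return [c for word in text for c in decompose(word)]

# An anabolic operation: combine two texts into one (plain concatenation).
def combine(s1, s2):
    return s1 + s2

# A katabolic, partial operation: split a text at the first '.' token.
def split_at_period(text):
    if "." not in text:
        return None                    # the operation is undefined here
    i = text.index(".")
    return text[: i + 1], text[i + 1 :]

print(decompose_text(["cat", "sat"]))
# ['c', 'a', 't', 's', 'a', 't']
print(split_at_period(["the", "cat", "sat", ".", "it", "slept", "."]))
# (['the', 'cat', 'sat', '.'], ['it', 'slept', '.'])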
Now consider a text $s \in S_T$, and consider a multi-sorted first-order language with equality, including Peano Arithmetic, with a further predicate describing the enumeration of elements of $T^*$; we use the standard notation $s(n) = t$ for a text $s$ and token $t$. Then for a given text $s$ we can consider the set $\mathbb{T}(s)$ of all sentences that hold in $s$ considered as a first-order model. This is done by taking as the underlying set $|s|$ of the model the tokens occurring in $s$, together with the standard model of the natural numbers. Likewise we can consider the set $\mathbb{T}(S_T)$ of all sentences holding in all the texts $s \in S_T$. We can also consider formulas $\phi \in \mathbb{L}^n$ with a fixed number $n$ of free variables (let us say all of sort $T$) and the subset $\mathbb{V}^n(s)$ of elements of $|s|^n$ that satisfy such a formula. Since the part of the model corresponding to this sort is finite, we can also assign a probability distribution on $|s|^n$ to each $\phi(t_1,\ldots,t_n)$ in $\mathbb{L}^n$. Bigram models are a special case of this. Likewise we can consider formulas and their satisfaction in a whole set of texts. Thus the 'information' contained in a set of texts $S_T$ can be described as an assignment of probability distributions on $|s|^n$ to formulas of $\mathbb{L}^n$. We propose that these ideas be used for a new version of transformers.
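For instance (a sketch, with relative frequencies standing in for the probability distribution), the bigram model corresponds to the adjacency formula $\phi(x,y)$ asserting that $x$ and $y$ occur at consecutive positions of $s$: the distribution on $|s|^2$ assigned to it records how often each pair of adjacent tokens realizes the formula.

from collections import Counter

def bigram_distribution(text):
    """Probability distribution on |s|^2 induced by the adjacency formula
    phi(x, y): 'x and y occur at consecutive positions of the text'."""
    pairs = Counter(zip(text, text[1:]))
    total = sum(pairs.values())
    return {pair: count / total for pair, count in pairs.items()}

print(bigram_distribution(["the", "cat", "sat", "on", "the", "mat"]))
# {('the','cat'): 0.2, ('cat','sat'): 0.2, ('sat','on'): 0.2,
#  ('on','the'): 0.2, ('the','mat'): 0.2}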
But what about meaning? Meaning depends on the sum total of the interlocutors' past conscious experience (since birth). The temporal unit of the reception of meaning is conditioned by an atomic interval of conscious experience or attention, which in general is capable of encompassing numerous tokens. There is also a theory that there is a space of meanings $M$ whose elements can be constructed via several operations from a set of primitive meanings, and that any meaning can be decomposed in a canonical way into such operations and primitives. The problem is describing the correspondence between texts and meanings, and in particular the semantic counterpart of the operations $\otimes_i$ and $\oplus_i$. This correspondence is in general given by a pair $(\mathcal{M}, \mathcal{T})$ \[ \mathcal{M} : S_T \longrightarrow \mathbf{P} M\] \[ \mathcal{T} : M \longrightarrow \mathbf{P} S_T\] A given text can have more than one interpretation/meaning, and a given meaning can be expressed in a text in more than one way. But the above remark suggests that the meaning(s) of a text must depend also on an additional parameter space $\Psi$ of states of consciousness, for instance \[ \mathcal{M}' : S_T \times \Psi \longrightarrow \mathbf{P} M\] The dogma of compositionality states that the meaning of $s \otimes_i s'$ can somehow be canonically constructed from the meanings of $s$ and $s'$.
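A toy rendering of the dogma, purely as an illustration (the names and the pairing operation are ours, not anything proposed above): if $\mathcal{M}$ assigns to each text a set of candidate meanings, compositionality asserts that the meaning-set of $s \otimes_i s'$ is determined by the two meaning-sets alone, via some fixed combination operation.

def compose_meanings(G, M1, M2):
    # Compositionality: the meaning-set of s ⊗ s' is a function
    # of the meaning-sets of s and s' only.
    return {G(a, b) for a in M1 for b in M2}

# A polysemous word keeps both readings until the combination filters them.
M_bank = {"financial-institution", "riverside"}
M_steep = {"steep"}
print(compose_meanings(lambda a, b: a + " " + b, M_steep, M_bank))
# {'steep financial-institution', 'steep riverside'}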
The dogma of sequentialism accepts compositionality only at the level of elementary tokens considered as meaning-bearers. The meaning of each token depends both on the token itself and on its entire previous context in the text (and, alternatively, on similar situations in all texts in $S_T$): \[ \mathcal{M} (s' \star s) = F (s', s)\] Self-attention can be adapted to characterize how $F$ operates on the context. In a previous note we proposed thinking of meaning-units as provided with transmitters and receptors (think of biochemistry or the $\pi$-calculus), so that each token in the context can send out signals to subsequent tokens, like 'I am a verb looking for an object' or 'I am a direct object, where is my verb?'. But a better way to think of this is that the 'meaning' of each token will in general depend on a whole set of tokens, both previous and subsequent. We call this set the relevance context. This gives rise to great computational complexity, as a text now gives rise to a graph. If we consider the problem of polysemy and idiomatic expressions, then it is clear that the enveloping context (both backward and forward) is crucial for determining meaning. Perhaps the probability distributions associated to formulas mentioned above could be a candidate for a rough formal treatment of meanings?
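Here is a very rough sketch of how $F$ might weigh a relevance context, in the style of self-attention (the two-dimensional vectors standing in for meanings are invented for the example): each token's contextual 'meaning' is a similarity-weighted average over all tokens of the text, previous and subsequent.

import math

def attention_meanings(text, embed):
    """For each token, a contextual 'meaning': a weighted average of the
    vectors of all tokens in the text (its relevance context, both previous
    and subsequent), weighted by a softmax of dot-product similarity."""
    vecs = [embed[t] for t in text]
    meanings = []
    for v in vecs:
        scores = [sum(a * b for a, b in zip(v, w)) for w in vecs]
        exps = [math.exp(s) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        meanings.append([sum(w * u[k] for w, u in zip(weights, vecs))
                         for k in range(len(v))])
    return meanings

# Hypothetical 2-dimensional 'meanings' for a tiny vocabulary.
embed = {"bank": [1.0, 1.0], "river": [0.0, 2.0], "money": [2.0, 0.0]}
print(attention_meanings(["river", "bank"], embed))
# 'bank' is pulled toward 'river': its contextual vector leans to the 2nd axis.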

Quodlibet

 1. René Thom called quantum mechanics 'the greatest intellectual scandal of the 20th century'. Maybe this was too harsh, but quantu...