<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Phys.</journal-id>
<journal-title>Frontiers in Physics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Phys.</abbrev-journal-title>
<issn pub-type="epub">2296-424X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">671882</article-id>
<article-id pub-id-type="doi">10.3389/fphy.2021.671882</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Physics</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Complexity and Entropy in Legal Language</article-title>
<alt-title alt-title-type="left-running-head">Friedrich</alt-title>
<alt-title alt-title-type="right-running-head">Complexity-Entropy in Legal Language</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Friedrich</surname>
<given-names>Roland</given-names>
</name>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1086529/overview"/>
</contrib>
</contrib-group>
<aff>ETH Zurich, D-GESS, <addr-line>Zurich</addr-line>, <country>Switzerland</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1002704/overview">Pierpaolo Vivo</ext-link>, King&#x2019;s College London, United&#x20;Kingdom</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/877345/overview">Eric DeGiuli</ext-link>, Ryerson University, Canada</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1250824/overview">Alessandro Vezzani</ext-link>, National Research Council (CNR), Italy</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Roland Friedrich, <email>roland.friedrich@gess.ethz.ch</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Social Physics, a section of the journal Frontiers in Physics</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>04</day>
<month>06</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>9</volume>
<elocation-id>671882</elocation-id>
<history>
<date date-type="received">
<day>24</day>
<month>02</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>03</day>
<month>05</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Friedrich.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Friedrich</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>We study the language of legal codes from different countries and legal traditions, using concepts from physics, algorithmic complexity theory and information theory. We show that vocabulary entropy, which measures the diversity of the author&#x2019;s choice of words, in combination with the compression factor, which is derived from a lossless compression algorithm and measures the redundancy present in a text, is well suited for separating different writing styles in different languages, including legal language. We show that different types of (legal) text, e.g. acts, regulations or literature, are located in distinct regions of the complexity-entropy plane, spanned by the information measure and the complexity measure. This two-dimensional approach already gives new insights into the drafting style and structure of statutory texts and complements other methods.</p>
</abstract>
<kwd-group>
<kwd>information theory</kwd>
<kwd>complex systems</kwd>
<kwd>linguistics</kwd>
<kwd>legal theory</kwd>
<kwd>algorithmic complexity theory</kwd>
<kwd>lossless compression algorithms</kwd>
<kwd>Shannon entropy</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>The complexity of the law has been the topic of both scholarly writing and scientific investigation, with the main challenge being the proper definition of &#x201c;complexity&#x201d;. Historically, articles in law journals took a conceptual and non-technical approach toward the &#x201c;complexity of the law&#x201d;, motivated by practical reasons, such as the ever-increasing amount of legislation produced every year and the resulting cost of knowledge acquisition, e.g. [<xref ref-type="bibr" rid="B1">1</xref>, <xref ref-type="bibr" rid="B2">2</xref>]. Although this approach is important, it remains technically vague and is not amenable to quantitative analysis and measurement. Over the past decade, with the increasing availability of digitized (legal) data and the steady growth of computational power, a new type of literature has emerged within legal theory, the authors of which use various mathematical notions that come from areas as diverse as physics, information theory and graph theory to analyze the complexity of the law, cf. e.g. [<xref ref-type="bibr" rid="B3">3</xref>&#x2013;<xref ref-type="bibr" rid="B5">5</xref>]. The complexity considered results mainly from the exogenous structure of the positive law, i.e. the tree-like hierarchical organization of the legal texts in a forest consisting of codes (root nodes), chapters, sections, etc., but also from the associated reference network.</p>
<p>According to the dichotomy introduced by [<xref ref-type="bibr" rid="B6">6</xref>], one can distinguish between structure-based measures and content-based measures of complexity, with the former pertaining to the field of knowledge representation (knowledge engineering) and the latter relating to the complexity of the norms, which includes, e.g. the (certainty of) legal commands, their efficiency and socio-economic impact.</p>
<p>In this article, we advance the measurement of legal complexity by focusing on the language using a method originating in the physics literature, cf. [<xref ref-type="bibr" rid="B7">7</xref>]. Specifically, we map legal documents from several major legal systems into a two-dimensional complexity-entropy plane, spanned by the (normalized) vocabulary entropy and the compression factor, cf. <xref ref-type="sec" rid="s2-1">Section 2.1</xref>. An abstract and rigorous measurement of the complexity of the law should have significant practical benefits for policy, as discussed previously by, e.g. [<xref ref-type="bibr" rid="B1">1</xref>, <xref ref-type="bibr" rid="B2">2</xref>]. For example, it could potentially identify parts of the law that need to be rewritten in order to remain manageable, thereby reducing the costs for citizens and firms who are supposed to comply. Most notably, the French Constitutional Court has ruled that articles of unjustified &#x201c;excessive complexity&#x201d; are unconstitutional<xref ref-type="fn" rid="fn1">
<sup>1</sup>
</xref>. However, in&#x20;order to render the notion of &#x201c;excessive complexity&#x201d; functional, quantitative methods are needed, such as those used by [<xref ref-type="bibr" rid="B5">5</xref>, <xref ref-type="bibr" rid="B8">8</xref>], which our version of the complexity-entropy plane complements.</p>
</sec>
<sec id="s2">
<title>2 Complexity and Entropy</title>
<p>A non-trivial question that arises in several disciplines is how the complexity of a hierarchical structure, i.e. of a multi-scale object, can be measured. Different areas of human knowledge are coded as written texts that are organized hierarchically, e.g. each book&#x2019;s table of contents reflects its inherent hierarchical organization as a tree, and all books together form a forest. Furthermore, a tree-like structure appears again at the sentence level in the form of the syntax tree, with its semantics as an additional degree of freedom. Although various measures of complexity have been introduced that are specially adapted to a particular class of problems, there is still no unified theory. The first concept we consider is Shannon entropy [<xref ref-type="bibr" rid="B9">9</xref>], which is a measure of information. It is an observable on the space of probability distributions with values in the non-negative real numbers. For a discrete probability distribution <inline-formula id="inf1">
<mml:math id="m1">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mo>:</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>N</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, with <inline-formula id="inf2">
<mml:math id="m2">
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x3e;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, for all <italic>i</italic>, and <inline-formula id="inf3">
<mml:math id="m3">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:msup>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mtext>&#x200b;</mml:mtext>
</mml:msup>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, the Shannon entropy <inline-formula id="inf4">
<mml:math id="m4">
<mml:mrow>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>P</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, is defined as:<disp-formula id="e1">
<mml:math id="m5">
<mml:mrow>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>P</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>:</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2061;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>log</mml:mi>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(1)</label>
</disp-formula>with <inline-formula id="inf5">
<mml:math id="m6">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>log</mml:mi>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, the logarithm with base 2. The normalized Shannon entropy <inline-formula id="inf6">
<mml:math id="m7">
<mml:mrow>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>P</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, is given by<disp-formula id="e2">
<mml:math id="m8">
<mml:mrow>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>P</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>:</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>P</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>log</mml:mi>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>N</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(2)</label>
</disp-formula>i.e. by dividing <inline-formula id="inf7">
<mml:math id="m9">
<mml:mrow>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>P</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> by the entropy <inline-formula id="inf8">
<mml:math id="m10">
<mml:mrow>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>N</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> of the discrete uniform distribution <inline-formula id="inf9">
<mml:math id="m11">
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>N</mml:mi>
</mml:msub>
<mml:mo>:</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, for <italic>N</italic> different outcomes. We shall use the normalized entropy to measure the information content of the vocabulary of individual legal texts; for details cf. <xref ref-type="sec" rid="s6-3">Section 6.3</xref>. Word entropies have previously been used by various authors. In the legal domain, [<xref ref-type="bibr" rid="B5">5</xref>] calculated the word entropy, after removing stop words, for the individual Titles of the U.S. Code. [<xref ref-type="bibr" rid="B10">10</xref>] used word entropies to gauge Shakespeare&#x2019;s and Jin&#x20;Yong&#x2019;s writing capacity, based on the 100 most frequent words in each&#x20;text.</p>
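<p>As an illustration of <xref ref-type="disp-formula" rid="e1">Eqs 1</xref>, <xref ref-type="disp-formula" rid="e2">2</xref>, the following minimal Python sketch computes the normalized entropy of the relative word frequencies of a token list. It is not the pipeline used in this study: the function names, the whitespace tokenization, the toy sentence and the convention of returning zero for fewer than two distinct words are assumptions made for the example only.</p>
<preformat preformat-type="code">
```python
import math
from collections import Counter


def shannon_entropy(probs):
    """Shannon entropy in bits (Eq. 1) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def normalized_entropy(tokens):
    """Normalized entropy (Eq. 2) of the relative word frequencies."""
    counts = Counter(tokens)
    n = len(counts)  # N: number of distinct words
    if n in (0, 1):
        # Assumption: log2(N) is zero or undefined here, so return 0.
        return 0.0
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    return shannon_entropy(probs) / math.log2(n)


# Hypothetical toy input; splitting on whitespace is an assumption.
sample = "the court shall hear the case and the court shall decide".split()
print(round(normalized_entropy(sample), 3))
```
</preformat>
<p>By construction the value lies between 0 and 1 and equals 1 exactly when all distinct words occur equally often.</p>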
<p>The second concept we consider is related to Kolmogorov complexity (cf. [<xref ref-type="bibr" rid="B11">11</xref>, <xref ref-type="bibr" rid="B12">12</xref>] and references therein), which is the prime example of algorithmic (computational) complexity. Heuristically, the complexity of an object is defined as the length of the shortest of all possible descriptions. Further fundamental examples of algorithmic complexity include Lempel-Ziv complexity <inline-formula id="inf10">
<mml:math id="m12">
<mml:mrow>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mn>76</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> [<xref ref-type="bibr" rid="B13">13</xref>], or Wolfram&#x2019;s complexity measure of a regular language [<xref ref-type="bibr" rid="B14">14</xref>]. The latter is defined as the (logarithm of the) minimal number of nodes of a deterministic finite automaton (DFA) that recognizes the language (Myhill-Nerode theorem). In order to facilitate the discussion, let us propose a set of axioms for a complexity measure. Such a measure is essentially a generalized form of an outer measure.</p>
<p>Let <italic>X</italic> be (at least) a monoid <inline-formula id="inf11">
<mml:math id="m13">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mo>&#x2218;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>&#x3b5;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, with binary composition <inline-formula id="inf12">
<mml:math id="m14">
<mml:mrow>
<mml:mo>&#x2218;</mml:mo>
<mml:mo>:</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, and identity element <inline-formula id="inf13">
<mml:math id="m15">
<mml:mi>&#x3b5;</mml:mi>
</mml:math>
</inline-formula>, and additionally, let <inline-formula id="inf14">
<mml:math id="m16">
<mml:mo>&#x2264;</mml:mo>
</mml:math>
</inline-formula> be a partial order on&#x20;<italic>X</italic>.</p>
<p>A complexity measure <italic>C</italic> on <italic>X</italic> is a functional <inline-formula id="inf15">
<mml:math id="m17">
<mml:mrow>
<mml:mi>C</mml:mi>
<mml:mo>:</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:msub>
<mml:mi>&#x211d;</mml:mi>
<mml:mo>&#x2b;</mml:mo>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, such that for all <inline-formula id="inf16">
<mml:math id="m18">
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, we&#x20;have</p>
<p>pointed: <disp-formula id="e3">
<mml:math id="m19">
<mml:mrow>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3b5;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(3)</label>
</disp-formula>
</p>
<p>monotone: <disp-formula id="e4">
<mml:math id="m20">
<mml:mrow>
<mml:mtext>if</mml:mtext>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mi>b</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mtext>then</mml:mtext>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(4)</label>
</disp-formula>
</p>
<p>sub-additive: <disp-formula id="e5">
<mml:math id="m21">
<mml:mrow>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>&#x2218;</mml:mo>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(5)</label>
</disp-formula>
</p>
<p>Examples satisfying the above axioms include tree structures, with the (simple) complexity measure given by the number of levels, i.e. the depth from the baseline. Here the empty tree has zero complexity, the partial order is the sub-tree relation and composition is given by grafting trees. Further, the Lempel-Ziv complexity <inline-formula id="inf17">
<mml:math id="m22">
<mml:mrow>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mn>76</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, and Wolfram&#x2019;s complexity measure for regular languages, if defined slightly differently via recognizable series, satisfy the axioms. However, plain Kolmogorov complexity does not satisfy all of them, e.g. sub-additivity fails, cf. the discussion by&#x20;[<xref ref-type="bibr" rid="B12">12</xref>].</p>
<sec id="s2-1">
<title>2.1 Compression Factor</title>
<p>A derived complexity measure is the compression factor, which we consider next and which is obtained from a lossless compression algorithm, such as [<xref ref-type="bibr" rid="B15">15</xref>,&#x20;<xref ref-type="bibr" rid="B16">16</xref>].</p>
<p>A lossless compression algorithm, i.e. a compressor &#x3b3;, reversibly transforms an input string <italic>s</italic> into a sequence <inline-formula id="inf18">
<mml:math id="m23">
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> which is at most as long as the original one, i.e. <inline-formula id="inf19">
<mml:math id="m24">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, but contains exactly the same information as <italic>s</italic>, cf. e.g. [<xref ref-type="bibr" rid="B17">17</xref>,&#x20;<xref ref-type="bibr" rid="B18">18</xref>].</p>
<p>For a string <italic>s</italic>, the compression factor <inline-formula id="inf20">
<mml:math id="m25">
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, is defined as<disp-formula id="e6">
<mml:math id="m26">
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>:</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(6)</label>
</disp-formula>
</p>
<p>The inverse <inline-formula id="inf21">
<mml:math id="m27">
<mml:mrow>
<mml:msup>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>, is called the compression ratio. These derived complexity measures quantify the relative amount of redundancy or structure present in a string, or more generally&#x20;data.</p>
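<p>The definition in <xref ref-type="disp-formula" rid="e6">Eq. 6</xref> can be sketched in a few lines of Python. The example below uses the DEFLATE implementation in Python&#x2019;s standard <monospace>zlib</monospace> module as the compressor; this choice, the function names and the two test strings are assumptions made for illustration and need not coincide with the compressor used in this study.</p>
<preformat preformat-type="code">
```python
import random
import zlib


def compression_factor(text, level=9):
    """Compression factor r(s) = |s| / |gamma(s)| (Eq. 6), with zlib as gamma."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("the compression factor is undefined for an empty string")
    return len(raw) / len(zlib.compress(raw, level))


def compression_ratio(text, level=9):
    """Inverse of the compression factor."""
    return 1.0 / compression_factor(text, level)


# Hypothetical inputs: a highly repetitive string and a pseudo-random one
# of the same length over a 27-symbol alphabet.
rng = random.Random(0)
repetitive = "whereas the party of the first part " * 200
noisy = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz ") for _ in range(len(repetitive)))

print(compression_factor(repetitive), compression_factor(noisy))
```
</preformat>
<p>For very short inputs the fixed header written by the compressor can push the factor below one, so the measure is only informative for texts of sufficient length.</p>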
<p>The compression factor, like the entropy rate, is a relative quantity that permits a direct comparison of individual data items, independently of their&#x20;size.</p>
<p>Let us illustrate this for the Lempel-Ziv complexity measure <inline-formula id="inf22">
<mml:math id="m28">
<mml:mrow>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mn>76</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, cf. [<xref ref-type="bibr" rid="B13">13</xref>], and the following strings of length 20:<disp-formula id="equ1">
<mml:math id="m29">
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>:</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>00000000000000000000</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula id="equ2">
<mml:math id="m30">
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>:</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>01010101010101010101</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula id="equ3">
<mml:math id="m31">
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mn>3</mml:mn>
</mml:msub>
<mml:mo>:</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>01001010100110101101</mml:mn>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>Then we have <inline-formula id="inf23">
<mml:math id="m32">
<mml:mrow>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mn>76</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf24">
<mml:math id="m33">
<mml:mrow>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mn>76</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf25">
<mml:math id="m34">
<mml:mrow>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mn>76</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mn>3</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>7</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, from which one immediately obtains the respective compression factors. [<xref ref-type="bibr" rid="B19">19</xref>] showed that a generic string of length <italic>n</italic> has complexity close to <italic>n</italic>, i.e. it is &#x201c;random&#x201d;. However, strings that are meaningful to humans, i.e. those representing text, images etc., are not random and have a structure between the completely uniform and the random string, cf. [<xref ref-type="bibr" rid="B18">18</xref>, <xref ref-type="bibr" rid="B19">19</xref>]. [<xref ref-type="bibr" rid="B20">20</xref>] introduced a quantity related to the compression factor, called the &#x201c;computable information density&#x201d;, which is a measure of order and correlation in (physical) systems in and out of equilibrium. Compression factors (ratios) were previously used by [<xref ref-type="bibr" rid="B21">21</xref>], who measured the complexity of multiple languages by compressing texts and their shuffled versions to quantify the inherent linguistic order. [<xref ref-type="bibr" rid="B22">22</xref>], in addition to a neural language model, used compression ratios to measure the complexity of the language used by the Supreme Courts of the U.S. (USSC) and Germany (BGH). [<xref ref-type="bibr" rid="B23">23</xref>], using the Lempel-Ziv complexity measure <inline-formula id="inf26">
<mml:math id="m35">
<mml:mrow>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mn>76</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, took into account not only the order inherent in a grammatically correct sentence, but also the larger organization of a text document, e.g. sections, by selectively shuffling the data belonging to each level of the hierarchy.</p>
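<p>The Lempel-Ziv complexities quoted above for the three strings of length 20 can be checked with a short Python sketch. It counts the components of a left-to-right parsing in which each component is the shortest substring that cannot be copied from the text preceding its last symbol. This direct, unoptimized implementation and the function name are assumptions made for illustration; faster algorithms exist.</p>
<preformat preformat-type="code">
```python
def lz76_complexity(s):
    """Number of components in the Lempel-Ziv (1976) parsing of s."""
    n = len(s)
    i = 0  # start index of the current component
    count = 0
    while n > i:
        length = 1
        # Grow the component while it can still be copied from the text
        # preceding its last symbol (overlapping copies are allowed).
        while n >= i + length and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


s1 = "0" * 20
s2 = "01" * 10
s3 = "01001010100110101101"
for s in (s1, s2, s3):
    print(s, lz76_complexity(s))
```
</preformat>
<p>Under this parsing convention the sketch returns 2, 3 and 7 for the three strings, in agreement with the values stated in the text, and 0 for the empty string.</p>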
</sec>
</sec>
<sec id="s3">
<title>3 Some Remarks on Complexity, Entropy and Language</title>
<p>[<xref ref-type="bibr" rid="B24">24</xref>] (pp. 10&#x2013;11) intuitively describe the broad difference between classical information theory and algorithmic complexity, which we summarize next. Whereas information theory (entropy), as conceived by Shannon, determines the minimal number of bits needed to transmit a set of messages, it does not provide the number of bits necessary to transmit a particular message from the set. Kolmogorov complexity, on the other hand, focuses on the information content of an individual finite object, e.g. a play by Shakespeare, accounting for the (empirical) fact that strings which are meaningful to humans are compressible, cf. [<xref ref-type="bibr" rid="B19">19</xref>]. In order to relate entropy, Kolmogorov complexity or Lempel-Ziv compression to one another, various mathematical assumptions such as stationarity, ergodicity or infinity are required, cf. [<xref ref-type="bibr" rid="B11">11</xref>, <xref ref-type="bibr" rid="B17">17</xref>, <xref ref-type="bibr" rid="B25">25</xref>]. Also, the convergence of various quantities found in natural languages, e.g. entropy estimates [<xref ref-type="bibr" rid="B26">26</xref>], is based on some of these assumptions. Although the different approximations and assumptions have proved valuable for language models, natural language is not necessarily generated by a stationary ergodic process, cf. [<xref ref-type="bibr" rid="B11">11</xref>], since, e.g., the probability of upcoming words can depend on words which are far away, cf. [<xref ref-type="bibr" rid="B25">25</xref>]. But, as argued by [<xref ref-type="bibr" rid="B27">27</xref>], it is precisely due to the non-ergodic nature of natural language that one can empirically distinguish different topics, e.g. by determining the uneven distribution of keywords in texts, cf. also [<xref ref-type="bibr" rid="B28">28</xref>]. [<xref ref-type="bibr" rid="B29">29</xref>] considered a model of random languages and showed how structure emerges as a result of the competition between energy and entropy.</p>
<p>Finally, let us comment on the relation between relative frequencies and probabilities in the context of entropy. Given a standard <italic>n</italic>-simplex, <inline-formula id="inf27">
<mml:math id="m36">
<mml:mrow>
<mml:msub>
<mml:mi>&#x394;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, i.e. <inline-formula id="inf28">
<mml:math id="m37">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>0</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi>&#x211d;</mml:mi>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf29">
<mml:math id="m38">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:msup>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mtext>&#x200b;</mml:mtext>
</mml:msup>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:msubsup>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula id="inf30">
<mml:math id="m39">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2265;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, for <inline-formula id="inf31">
<mml:math id="m40">
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, its points can either be interpreted as discrete probability distributions on <inline-formula id="inf32">
<mml:math id="m41">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> elements or as the set of relative frequencies of <inline-formula id="inf33">
<mml:math id="m42">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> elements. The distinction between the two concepts is relevant as the Shannon entropy <italic>H</italic> provides in both cases a functional (observable) <inline-formula id="inf34">
<mml:math id="m43">
<mml:mrow>
<mml:mi>H</mml:mi>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mi>&#x394;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
<mml:mo>&#x2192;</mml:mo>
<mml:msub>
<mml:mi>&#x211d;</mml:mi>
<mml:mo>&#x2b;</mml:mo>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> which, in our context, has two possible interpretations. It can be read as a component of a coordinate system on (law) texts, which is the interpretation in the present study, or as an estimate of the Shannon entropy of the language used, if the text is considered as a sample from the space of all (law) texts of a certain type. In the latter case, it is known that the &#x201c;naive&#x201d; estimation of the Shannon entropy <xref ref-type="disp-formula" rid="e1">Eq. 1</xref> from finite samples is biased, and several estimators have been developed to address this problem. We utilize the entropy estimator introduced by [<xref ref-type="bibr" rid="B30">30</xref>] in order to reexamine some of our results in the light of a probabilistic interpretation, and find that it has no qualitative effect on the outcome, cf. <xref ref-type="sec" rid="s13">Supplementary Material</xref>.</p>
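<p>To make the finite-sample bias concrete, the following Python sketch contrasts the &#x201c;naive&#x201d; plug-in estimate with a simple bias-corrected one. The estimator of [<xref ref-type="bibr" rid="B30">30</xref>] is not reproduced here; as a stand-in, the sketch uses the classical Miller-Madow correction, which adds the first-order term (<italic>K</italic> &#x2212; 1)/(2<italic>M</italic> ln 2) bits for <italic>K</italic> observed distinct words in a sample of <italic>M</italic> tokens. The function names and the toy sample are assumptions made for illustration.</p>
<preformat preformat-type="code">
```python
import math
from collections import Counter


def naive_entropy(tokens):
    """Plug-in estimate: Eq. 1 applied to relative frequencies (bits)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def miller_madow_entropy(tokens):
    """Plug-in estimate plus the Miller-Madow term (K - 1) / (2 M ln 2)."""
    if not tokens:
        raise ValueError("at least one token is required")
    k = len(set(tokens))  # K: number of distinct words observed
    m = len(tokens)       # M: sample size
    return naive_entropy(tokens) + (k - 1) / (2 * m * math.log(2))


# Hypothetical toy sample; real texts are far longer.
sample = "the court shall hear the case and the court shall decide".split()
print(naive_entropy(sample), miller_madow_entropy(sample))
```
</preformat>
<p>The correction is always non-negative, reflecting that the plug-in estimate tends to underestimate the entropy, and it vanishes as the sample size grows for a fixed vocabulary.</p>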
</sec>
<sec id="s4">
<title>4 The Complexity-Entropy Plane</title>
<p>Complex systems, e.g. biological, physical or social ones, are high-dimensional multi-scale objects. [<xref ref-type="bibr" rid="B31">31</xref>] and [<xref ref-type="bibr" rid="B32">32</xref>] realized that entropy alone is not enough to describe them, and that an independent complexity measure is needed. Guided by the insight that the intuitive notion of complexity for patterns, when ordered by the degree of disorder, is at odds with its algorithmic description, the notion of the physical complexity of a system emerged, cf. [<xref ref-type="bibr" rid="B7">7</xref>, <xref ref-type="bibr" rid="B31">31</xref>, <xref ref-type="bibr" rid="B33">33</xref>]. The corresponding physical complexity measure, pioneered by [<xref ref-type="bibr" rid="B33">33</xref>], should not be a monotone function of the disorder or the entropy, but should attain its maximum between complete order (perfect crystal) and total disorder (isolated ideal gas). [<xref ref-type="bibr" rid="B7">7</xref>] introduced the excess Shannon entropy as a statistical complexity measure for physical systems, and later [<xref ref-type="bibr" rid="B34">34</xref>] introduced another physical complexity measure, the product of a system&#x2019;s entropy with its&#x20;disequilibrium measure. [<xref ref-type="bibr" rid="B35">35</xref>] introduced a novel approach to handle the complexity of patterns on multiple scales, using a multi-level renormalization technique to quantify the complexity of a (two- or three-dimensional) pattern by a scalar quantity that should ultimately better fit the intuitive notion of complexity.</p>
<p>[<xref ref-type="bibr" rid="B7">7</xref>] paired the entropy and the physical complexity measure into what has become known as a complexity-entropy diagram, in order to describe non-linear dynamical systems; for a review cf. [<xref ref-type="bibr" rid="B36">36</xref>]. Remarkably, these low-dimensional coordinates are often sufficient to characterize such systems (in analogy to principal component analysis), since they capture both the inherent randomness and the degree of organization. Several variants of entropy-complexity diagrams are now widely used, even outside the original context. [<xref ref-type="bibr" rid="B37">37</xref>] combined the normalized word entropy, cf. <xref ref-type="disp-formula" rid="e7">Eq. 7</xref>, with a version of a statistical complexity measure to study Shakespeare and other English Renaissance authors quantitatively. [<xref ref-type="bibr" rid="B23">23</xref>] used the entropy rate and the entropy density as coordinates of the complexity-entropy plane and studied the organization of literary texts (Shakespeare, Abbott and Doyle) at different levels of the hierarchy. In order to calculate the entropy rate and density, which are asymptotic quantities, they used the Lempel-Ziv complexity <inline-formula id="inf35">
<mml:math id="m44">
<mml:mrow>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mn>76</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. Strictly speaking, this approach requires the source to be stationary and ergodic, cf.&#x20;[<xref ref-type="bibr" rid="B11">11</xref>].</p>
<p>We introduce a new variant <inline-formula id="inf36">
<mml:math id="m45">
<mml:mi>&#x393;</mml:mi>
</mml:math>
</inline-formula> of the complexity-entropy plane, spanned by the normalized word entropy and the compression factor, in order to study text data. Every text <italic>t</italic> can thus be represented by a point in <inline-formula id="inf37">
<mml:math id="m46">
<mml:mi>&#x393;</mml:mi>
</mml:math>
</inline-formula>, via the map <inline-formula id="inf38">
<mml:math id="m47">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x21a6;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, with coordinates <inline-formula id="inf39">
<mml:math id="m48">
<mml:mrow>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, the normalized Shannon entropy of the underlying vocabulary, and <italic>r</italic>, the compression factor. Note that <inline-formula id="inf40">
<mml:math id="m49">
<mml:mi>&#x393;</mml:mi>
</mml:math>
</inline-formula> is naturally a metric space, e.g. with the Euclidean metric, but other metrics may be more appropriate, depending on the particular question at&#x20;hand.</p>
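<p>As an illustration of the map <italic>t</italic> &#x21a6; (<italic>H<sub>n</sub></italic>(<italic>t</italic>), <italic>r</italic>(<italic>t</italic>)) and of the Euclidean metric on &#x393;, the following minimal Python sketch places three corpora at their mean values from <xref ref-type="table" rid="T3">Table&#x20;3</xref> and computes two distances. The names <monospace>PlanePoint</monospace> and <monospace>euclidean_distance</monospace> are hypothetical and introduced only for illustration; they are not taken from the analysis code of this study.</p>
<preformat><![CDATA[
```python
import math
from typing import NamedTuple


class PlanePoint(NamedTuple):
    """Hypothetical container for a point in the complexity-entropy plane."""
    h_n: float  # normalized word entropy
    r: float    # compression factor


def euclidean_distance(a: PlanePoint, b: PlanePoint) -> float:
    """Euclidean metric on the plane; other metrics may suit other questions."""
    return math.hypot(a.h_n - b.h_n, a.r - b.r)


# Corpus means (nve., cfc.) copied from Table 3.
shakespeare = PlanePoint(h_n=0.80, r=2.52)
europarl_en = PlanePoint(h_n=0.77, r=3.02)
ca_acts_en = PlanePoint(h_n=0.73, r=5.00)

print(euclidean_distance(shakespeare, europarl_en))
print(euclidean_distance(shakespeare, ca_acts_en))
```
]]></preformat>
<p>Since the normalized entropy lies in [0, 1] whereas the mean compression factors observed here range from about 2.5 to 5.2, the unweighted Euclidean distance is dominated by the compression factor; rescaling the axes would be one example of an alternative metric.</p>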
</sec>
<sec id="s5">
<title>5 The Norm Hierarchy and Boundaries of Natural Language</title>
<p>Let us now motivate some of our research questions from the perspective of Legal Theory.</p>
<p>[<xref ref-type="bibr" rid="B38">38</xref>] and his school introduced and formalized the notion of the &#x201c;Stufenbau der Rechtsordnung&#x201d;,<xref ref-type="fn" rid="fn2">
<sup>2</sup>
</xref> which led to the concept of the hierarchy of norms. The hierarchy starts with the Constitution (often originating in a revolutionary charter written by the &#x201c;pouvoir constituant&#x201d;), which governs the creation of statutes or acts, which themselves govern the creation (by delegation) of regulations, administrative actions, and also the judiciary. At the national level these (abstract) concepts are taken into account when drafting positive law, cf. e.g. the Guide de l&#xe9;gistique [<xref ref-type="bibr" rid="B39">39</xref>]. This holds for, e.g., Austria, France, Germany, Italy, Switzerland and the European Union, although, strictly speaking, the latter does not have a formal Constitution. Every new piece of legislation has to fit the preexisting order, so at each level the content outlined at the level above has to be made more precise, which leads to the supposed linguistic gradient of abstraction. A new phenomenon can be observed for regulations, namely that the legislature, or more precisely its drafting agencies, is being forced to abandon the realm of natural language and to take an approach that is common to all scientific writing, namely the inclusion of images, figures and formulae. Figures, tables and formulae not only succinctly visualize or summarize large amounts of abstract information; most often they are the only means to convey complex scientific information at all. As regulations increasingly leave the domain of jurisprudence, novel methods should be adopted. For example, [<xref ref-type="bibr" rid="B2">2</xref>] advocated the inclusion of mathematical formulae in a statute if this statute contains a computation that is based on such a formula. Ultimately, a natural scientific approach (including the writing style) to law would be beneficial; however, this might be at odds with the idea of law being intelligible to a wide audience.</p>
<p>Our hypothesis is that these functional differences between the levels of the hierarchy of legal norms should manifest themselves as differences in vocabulary entropy or in the compression factor.</p>
</sec>
<sec sec-type="materials|methods" id="s6">
<title>6 Materials and Methods</title>
<sec id="s6-1">
<title>6.1 Data</title>
<p>Our analysis is based on the national codes of Canada, Germany, France, Switzerland, the United&#x20;States and Great Britain that are in effect and available online, and on Shakespeare&#x2019;s collected works; for summary statistics, cf. <xref ref-type="table" rid="T1">Table&#x20;1</xref>. We also included the constitutions of Canada, Germany, and Switzerland, which are available online, in the analysis, cf. <xref ref-type="table" rid="T2">Table&#x20;2</xref>. In addition, we use the German EuroParl corpus from [<xref ref-type="bibr" rid="B40">40</xref>], which is available online, and its aligned English and French translations (proceedings of the European Parliament from 1996 to 2006) to measure language-specific effects for German, English and French.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Summary statistics on acts, regulations and English literature showing the language used and size (in MB) of the respective corpora, the number of items, the mean size (in KB) and the standard deviation.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Corpus (language)</th>
<th align="center">Size [MB]</th>
<th align="center">
<inline-formula id="inf41">
<mml:math id="m50">
<mml:mo>&#x23;</mml:mo>
</mml:math>
</inline-formula> Texts</th>
<th align="center">Mean (size) [KB]</th>
<th align="center">Std (size)</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">CA acts (EN)</td>
<td align="char" char=".">52.4</td>
<td align="char" char=".">823</td>
<td align="char" char=".">63.6</td>
<td align="char" char=".">254.7</td>
</tr>
<tr>
<td align="left">CA reg. (EN)</td>
<td align="char" char=".">55.6</td>
<td align="char" char=".">3,725</td>
<td align="char" char=".">14.9</td>
<td align="char" char=".">59.7</td>
</tr>
<tr>
<td align="left">CA acts (FR)</td>
<td align="char" char=".">56.9</td>
<td align="char" char=".">833</td>
<td align="char" char=".">64.6</td>
<td align="char" char=".">264.5</td>
</tr>
<tr>
<td align="left">CA reg. (FR)</td>
<td align="char" char=".">62.4</td>
<td align="char" char=".">3,718</td>
<td align="char" char=".">15.9</td>
<td align="char" char=".">64.5</td>
</tr>
<tr>
<td align="left">F codes (FR)</td>
<td align="char" char=".">127.6</td>
<td align="char" char=".">74</td>
<td align="char" char=".">1664.0</td>
<td align="char" char=".">2275.8</td>
</tr>
<tr>
<td align="left">D acts (DE)</td>
<td align="char" char=".">53.6</td>
<td align="char" char=".">1,306</td>
<td align="char" char=".">40.3</td>
<td align="char" char=".">108.3</td>
</tr>
<tr>
<td align="left">D reg. (DE)</td>
<td align="char" char=".">69.6</td>
<td align="char" char=".">3,316</td>
<td align="char" char=".">20.6</td>
<td align="char" char=".">61.5</td>
</tr>
<tr>
<td align="left">United&#x20;Kingdom PGA (EN)</td>
<td align="char" char=".">269.5</td>
<td align="char" char=".">3,512</td>
<td align="char" char=".">76.3</td>
<td align="char" char=".">192.7</td>
</tr>
<tr>
<td align="left">USC 1&#x2013;54 (2020) (EN)</td>
<td align="char" char=".">139.6</td>
<td align="char" char=".">57</td>
<td align="char" char=".">2442.6</td>
<td align="char" char=".">3835.6</td>
</tr>
<tr>
<td align="left">U.S. CFR (2000) (EN)</td>
<td align="char" char=".">940.2</td>
<td align="char" char=".">200</td>
<td align="char" char=".">4701.9</td>
<td align="char" char=".">8156.2</td>
</tr>
<tr>
<td align="left">U.S. CFR (2019) (EN)</td>
<td align="char" char=".">572.9</td>
<td align="char" char=".">242</td>
<td align="char" char=".">2360.9</td>
<td align="char" char=".">1079.7</td>
</tr>
<tr>
<td align="left">CH acts (EN)</td>
<td align="char" char=".">7.0</td>
<td align="char" char=".">103</td>
<td align="char" char=".">343.2</td>
<td align="char" char=".">286.6</td>
</tr>
<tr>
<td align="left">CH reg. (EN)</td>
<td align="char" char=".">6.3</td>
<td align="char" char=".">118</td>
<td align="char" char=".">53.4</td>
<td align="char" char=".">58.3</td>
</tr>
<tr>
<td align="left">Shakespeare (EN)</td>
<td align="char" char=".">5.2</td>
<td align="char" char=".">42</td>
<td align="char" char=".">124.9</td>
<td align="char" char=".">32.0</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Summary statistics for the Constitutions of Canada (EN), Germany (DE) and Switzerland (DE, EN, FR), showing the language used, the original size (in KB), the compression factor and the normalized vocabulary entropy (after cutoff at 150&#xa0;K).</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Corpus (language)</th>
<th align="center">Size [KB]</th>
<th align="center">Comp. Factor</th>
<th align="center">n-voc. Entropy</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">CH constitution (DE)</td>
<td align="char" char=".">156</td>
<td align="char" char=".">3.74</td>
<td align="char" char=".">0.79</td>
</tr>
<tr>
<td align="left">CH constitution (EN)</td>
<td align="char" char=".">157</td>
<td align="char" char=".">3.88</td>
<td align="char" char=".">0.77</td>
</tr>
<tr>
<td align="left">CH constitution (FR)</td>
<td align="char" char=".">172</td>
<td align="char" char=".">3.80</td>
<td align="char" char=".">0.77</td>
</tr>
<tr>
<td align="left">CA constitution (EN)</td>
<td align="char" char=".">215</td>
<td align="char" char=".">3.67</td>
<td align="char" char=".">0.75</td>
</tr>
<tr>
<td align="left">D Grundgesetz (DE)</td>
<td align="char" char=".">180</td>
<td align="char" char=".">3.57</td>
<td align="char" char=".">0.79</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In detail, we use all Consolidated Canadian Acts and Regulations in English and French (2020); all Federal German acts (Gesetze) and Federal regulations (Verordnungen) in German (2020); all French Codes (en vigueur) (2020); all Swiss Internal Laws (Acts and Ordinances) which have been translated into English (2020), covering the following areas: 1 State - People - Authorities; 2 Private law - Administration of civil justice - Enforcement; 3 Criminal law - Administration of criminal justice - Execution of sentences; 4 Education - Science - Culture; 5 National defense; 6 Finance; 7 Public works - Energy - Transport; 8 Health - Employment - Social security; 9 Economy - Technical cooperation; the United&#x20;Kingdom Public General Acts (partial dataset 1801&#x2013;1987 and complete dataset 1988&#x2013;2020); U.S. Code Titles 1&#x2013;54 (2020), including the appendices (Title 53 is reserved); and the U.S. Code of Federal Regulations for 2000 and 2019.</p>
<p>The collected works of Shakespeare are obtained from &#x201c;The Folger Shakespeare - Complete Set, June 2, 2020&#x201d;, <ext-link ext-link-type="uri" xlink:href="https://shakespeare.folger.edu/download/">https://shakespeare.folger.edu/download/</ext-link>
</p>
</sec>
<sec id="s6-2">
<title>6.2&#x20;Pre-Processing</title>
<p>For our analysis we use Python 3.7. Where available, we downloaded the bulk data as XML-files, extracted the legal content (without any metadata), removed multiple white spaces and line breaks, and saved the result as a TXT-file. Where no XML-files were available, we extracted the texts from the PDF versions, removed multiple white spaces and line breaks, and saved them as TXT-files.</p>
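<p>The pre-processing described above can be sketched as follows. This is a minimal illustration rather than the code used in this study: it assumes that the legal content is simply the concatenation of all text nodes, the helper names and the sample element names (<monospace>act</monospace>, <monospace>title</monospace>, <monospace>section</monospace>) are hypothetical, and the removal of metadata is omitted because it depends on the schema of each source.</p>
<preformat><![CDATA[
```python
import re
import xml.etree.ElementTree as ET


def collapse_whitespace(text: str) -> str:
    """Replace every run of spaces, tabs or line breaks by a single space."""
    return re.sub(r"\s+", " ", text).strip()


def xml_to_plain_text(xml_string: str) -> str:
    """Concatenate all text nodes of an XML document, dropping the markup.

    Assumption: no metadata elements need to be skipped; a real source
    would require schema-specific filtering before this step.
    """
    root = ET.fromstring(xml_string)
    return collapse_whitespace(" ".join(root.itertext()))


# Hypothetical sample document, for illustration only.
sample = "<act><title>Sample  Act</title>\n<section>1  This Act\n\napplies.</section></act>"
print(xml_to_plain_text(sample))
```
]]></preformat>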
</sec>
<sec id="s6-3">
<title>6.3 Measuring Vocabulary Entropy</title>
<p>For an individual text <italic>t</italic>, let <inline-formula id="inf42">
<mml:math id="m51">
<mml:mrow>
<mml:mi>V</mml:mi>
<mml:mo>:</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>:</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> be the underlying vocabulary, and <inline-formula id="inf43">
<mml:math id="m52">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> the size of <italic>V</italic>. Let <inline-formula id="inf44">
<mml:math id="m53">
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> be the frequency (total number of occurrences) of a unique word <inline-formula id="inf45">
<mml:math id="m54">
<mml:mrow>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, and let <inline-formula id="inf46">
<mml:math id="m55">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> be the total number of words in <italic>t</italic> (with repetitions), i.e. <inline-formula id="inf47">
<mml:math id="m56">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>. The relative frequency is given by <inline-formula id="inf48">
<mml:math id="m57">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>p</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>:</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, which can also be interpreted as the empirical probability distribution <inline-formula id="inf49">
<mml:math id="m58">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>p</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. The word entropy <inline-formula id="inf50">
<mml:math id="m59">
<mml:mrow>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> of a text <italic>t</italic> (but cf. <xref ref-type="sec" rid="s3">Section 3</xref>) is then given by<disp-formula id="e7">
<mml:math id="m60">
<mml:mrow>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>:</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:munderover>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>p</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2061;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>log</mml:mi>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>p</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(7)</label>
</disp-formula>
</p>
<p>and correspondingly, the normalized word entropy <inline-formula id="inf51">
<mml:math id="m61">
<mml:mrow>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, cf. <xref ref-type="disp-formula" rid="e2">Eq. 2</xref>. Note that the word entropy depends only on the word frequencies and is therefore invariant under permutation of the words in a sentence.</p>
<p>First, we read the individual TXT-files, filter out punctuation and special characters, and split the remaining text into a list of items. In order to account for prefixes in French, the splitting separates expressions which are written with an apostrophe into separate entities. However, we do not lowercase letters, lemmatize or stem the remaining text, nor do we consider any bi- or trigrams. Keeping the original case allows us to capture some syntactic or semantic information. We then determine the relative frequencies (empirical probability values) of all unique items, from which we calculate the normalized entropy values according to <xref ref-type="disp-formula" rid="e2">Eq. 2</xref>. We truncate each text file at 150,000 characters, and discard files which are smaller than the cutoff value. For the EuroParl corpus we sampled 400 strings of 150&#xa0;K characters each (with a gap of 300&#xa0;K characters between consecutive strings) from each of the English, German and French texts, in order to calculate the corresponding normalized vocabulary entropy.</p>
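<p>The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the original implementation: the regular expression <monospace>\w+</monospace> is one possible way to drop punctuation and to split at apostrophes (it also keeps digits and underscores), and the normalization assumes that <xref ref-type="disp-formula" rid="e2">Eq. 2</xref> divides the entropy by its maximum value log<sub>2</sub>|<italic>V</italic>|. All function names are hypothetical.</p>
<preformat><![CDATA[
```python
import math
import re
from collections import Counter
from typing import List, Optional

CUTOFF = 150_000  # characters, as stated in the text


def truncate(text: str, cutoff: int = CUTOFF) -> Optional[str]:
    """Return the first `cutoff` characters, or None if the text is shorter."""
    if len(text) < cutoff:
        return None
    return text[:cutoff]


def tokenize(text: str) -> List[str]:
    """Split on every non-word character; the original case is preserved.

    Assumption: apostrophes act as separators, so the French "l'article"
    yields the two items "l" and "article".
    """
    return re.findall(r"\w+", text)


def word_entropy(tokens: List[str]) -> float:
    """Shannon entropy (in bits) of the relative word frequencies, cf. Eq. 7."""
    total = len(tokens)
    counts = Counter(tokens)
    return -sum((f / total) * math.log2(f / total) for f in counts.values())


def normalized_word_entropy(tokens: List[str]) -> float:
    """Assumed form of Eq. 2: entropy divided by its maximum log2(|V|)."""
    vocabulary_size = len(set(tokens))
    if vocabulary_size < 2:
        return 0.0  # log2(1) = 0, so the ratio is undefined; use 0 by convention
    return word_entropy(tokens) / math.log2(vocabulary_size)


tokens = tokenize("The court may order. L'article 5 applies; the Court decides.")
print(tokens)
print(normalized_word_entropy(tokenize("the act applies to the act")))
```
]]></preformat>
<p>Because no lowercasing is applied, &#x201c;The&#x201d; and &#x201c;the&#x201d; are counted as different items in this sketch, in line with the case-sensitive treatment described above.</p>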
</sec>
<sec id="s6-4">
<title>6.4 Measuring Compression Factors Using Gzip</title>
<p>In order to compute the compression factor as our derived complexity measure, we use gzip as the lossless compressor.<xref ref-type="fn" rid="fn3">
<sup>3</sup>
</xref> After reading the individual TXT-files as strings, we compress them using Python&#x2019;s gzip compression module, with the compression level set to its maximum value (&#x3d; 9). The individual compression factors are calculated according to <xref ref-type="disp-formula" rid="e6">Eq. 6</xref>. After analyzing all of our data, we chose 150,000 characters as the cutoff in order to minimize the effects of the overhead generated by the compression algorithm for very small text sizes. For the EuroParl corpora (English, French, German), we calculated the compression factors based on 400 samples each, as described above. Note that in the future it might make sense to also consider other (e.g. language-specific) lossless compression algorithms in order to deal with short strings.</p>
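<p>A minimal sketch of this measurement is given below. It assumes that <xref ref-type="disp-formula" rid="e6">Eq. 6</xref> defines the compression factor as the uncompressed size divided by the compressed size, that the strings are encoded as UTF-8 before compression, and that the first EuroParl sample starts at the beginning of the corpus; none of these details is fixed by the text above, and the function names are hypothetical. The call <monospace>gzip.compress(data, compresslevel=9)</monospace> is part of the Python standard library.</p>
<preformat><![CDATA[
```python
import gzip
from typing import List

CUTOFF = 150_000  # characters per text or sample, as stated in the text
GAP = 300_000     # characters skipped between consecutive EuroParl samples


def compression_factor(text: str, encoding: str = "utf-8") -> float:
    """Uncompressed size over gzip-compressed size (assumed form of Eq. 6)."""
    raw = text.encode(encoding)  # assumption: UTF-8 byte representation
    compressed = gzip.compress(raw, compresslevel=9)  # maximum compression level
    return len(raw) / len(compressed)


def sample_windows(text: str, n_samples: int = 400,
                   width: int = CUTOFF, gap: int = GAP) -> List[str]:
    """Cut up to `n_samples` strings of `width` characters, `gap` characters apart."""
    windows = []
    start = 0  # assumption: the first sample starts at the first character
    while len(windows) < n_samples and start + width <= len(text):
        windows.append(text[start:start + width])
        start += width + gap
    return windows


# Artificial, highly repetitive text: its compression factor is far larger
# than the values reported for real corpora.
redundant = "The Minister may make regulations. " * 5000
print(compression_factor(redundant[:CUTOFF]))
```
]]></preformat>
<p>The fixed header and trailer that gzip adds to every output are one source of the overhead mentioned above; for very short strings they make up a noticeable share of the compressed size and thereby lower the compression factor.</p>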
</sec>
</sec>
<sec sec-type="results" id="s7">
<title>7 Results</title>
<p>Our first analysis, cf. <xref ref-type="table" rid="T1">Table&#x20;1</xref>, summarizes the sizes of the different corpora, the languages used, the number of individual items, the mean text sizes and the standard deviations. It shows different approaches to the organization of national law, namely either thousands of small texts of around 50&#xa0;KB (Canada, Germany, United&#x20;Kingdom) or fewer than a hundred large codes, several MB in size (France, United&#x20;States), with the number of regulations significantly exceeding the number of acts. Note that the French codes contain both the law and the corresponding regulation in the same text. The size of a corpus within the same category, i.e. act or regulation, differs from country to country by an order of magnitude or even two, which is noteworthy as broadly similar or even identical areas are regulated within the law, e.g. banking, criminal, finance or tax law. This raises the question of what an efficient codification should ideally look like. The Swiss Federal codification is remarkably compact; the English version does not contain all acts or regulations available in German, French or Italian (which are the official languages), but all important and recent ones are included, cf. <xref ref-type="sec" rid="s6-1">Section&#x20;6.1</xref>.</p>
<sec id="s7-1">
<title>7.1 Normalized Entropy and Compression Factor</title>
<p>The normalized vocabulary entropies per corpus, cf. <xref ref-type="table" rid="T3">Table&#x20;3</xref>, have a standard deviation of approximately 0.01, and average entropy values that are distributed as follows: English in <inline-formula id="inf52">
<mml:math id="m62">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>0.73</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>0.80</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, German in <inline-formula id="inf53">
<mml:math id="m63">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>0.78</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>0.81</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and French in <inline-formula id="inf54">
<mml:math id="m64">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>0.74</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>0.77</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. The analysis of the mean compression factors, based on the individual texts truncated at 150&#xa0;K, reveals three regions where the values accumulate, cf. <xref ref-type="table" rid="T3">Table&#x20;3</xref>. Shakespeare&#x2019;s works have a mean compression factor of 2.52 (std &#x3d; 0.03), the EuroParl corpora in English, French and German of around 3.01 (std of approximately 0.06), whereas the national codifications are located in the interval <inline-formula id="inf55">
<mml:math id="m65">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>3.75</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>5.23</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, with the standard deviations being in the interval <inline-formula id="inf56">
<mml:math id="m66">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>0.14</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>1.24</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. On average, all national acts have a lower compression factor and a lower standard deviation than the corresponding national regulations. The (French), German, Swiss and United&#x20;States acts are in the sub-interval <inline-formula id="inf57">
<mml:math id="m67">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>3.75</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>4.12</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, and the respective regulations in <inline-formula id="inf58">
<mml:math id="m68">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>4.00</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>4.28</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, but with a large standard deviation (1.06) for Germany and the United&#x20;States. Based on the mean compression factor, the variance, the number of acts and the total size of the corpus, the French and the US codes are most similar. The acts of Canada (English and French) and of the United&#x20;Kingdom are located at the upper end of the interval, namely in <inline-formula id="inf59">
<mml:math id="m69">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>4.68</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>5.00</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, as are the Canadian regulations with 4.98 and 5.23, for French and English, respectively. The values for the constitutions can be found in the interval <inline-formula id="inf60">
<mml:math id="m70">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>3.57</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>3.88</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> (compression factors), and <inline-formula id="inf61">
<mml:math id="m71">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>0.75</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>0.79</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> (normalized vocabulary entropy). The compression factors of the Canadian and German Constitutions are smaller than the corresponding mean values of the acts or regulations, but larger than those of EuroParl (DE, EN, FR) or Shakespeare. In the case of the Swiss Federal Constitution and its aligned translations into English, French and German, the compression factor is significantly higher than the corresponding EuroParl average values, and lies between the mean of the acts (EN) and the mean of the regulations (EN), cf. <xref ref-type="table" rid="T2">Tables 2,</xref>&#x20;<xref ref-type="table" rid="T3">3</xref>.</p>
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>Summary statistics on acts, regulations (reg.) and English literature.<inline-formula id="inf62">
<mml:math id="m72">
<mml:mo>&#x23;</mml:mo>
</mml:math>
</inline-formula>
</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Corpus</th>
<th align="center">
<inline-formula id="inf63">
<mml:math id="m73">
<mml:mo>&#x23;</mml:mo>
</mml:math>
</inline-formula> Texts</th>
<th align="center">Mean (cfc.)</th>
<th align="center">Std. (cfc.)</th>
<th align="center">Mean (nve.)</th>
<th align="center">Std. (nve.)</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">CA acts (EN)</td>
<td align="char" char=".">75</td>
<td align="char" char=".">5.00</td>
<td align="char" char=".">0.94</td>
<td align="char" char=".">0.73</td>
<td align="char" char=".">0.01</td>
</tr>
<tr>
<td align="left">CA reg. (EN)</td>
<td align="char" char=".">54</td>
<td align="char" char=".">5.23</td>
<td align="char" char=".">1.18</td>
<td align="char" char=".">0.73</td>
<td align="char" char=".">0.02</td>
</tr>
<tr>
<td align="left">CA acts (FR)</td>
<td align="char" char=".">74</td>
<td align="char" char=".">4.64</td>
<td align="char" char=".">0.93</td>
<td align="char" char=".">0.75</td>
<td align="char" char=".">0.01</td>
</tr>
<tr>
<td align="left">CA reg. (FR)</td>
<td align="char" char=".">60</td>
<td align="char" char=".">4.98</td>
<td align="char" char=".">1.24</td>
<td align="char" char=".">0.74</td>
<td align="char" char=".">0.02</td>
</tr>
<tr>
<td align="left">F codes</td>
<td align="char" char=".">58</td>
<td align="char" char=".">4.10</td>
<td align="char" char=".">0.28</td>
<td align="char" char=".">0.76</td>
<td align="char" char=".">0.01</td>
</tr>
<tr>
<td align="left">D acts</td>
<td align="char" char=".">78</td>
<td align="char" char=".">4.12</td>
<td align="char" char=".">0.42</td>
<td align="char" char=".">0.78</td>
<td align="char" char=".">0.01</td>
</tr>
<tr>
<td align="left">D reg.</td>
<td align="char" char=".">69</td>
<td align="char" char=".">4.28</td>
<td align="char" char=".">1.06</td>
<td align="char" char=".">0.79</td>
<td align="char" char=".">0.01</td>
</tr>
<tr>
<td align="left">United&#x20;Kingdom PGA</td>
<td align="char" char=".">431</td>
<td align="char" char=".">4.68</td>
<td align="char" char=".">0.44</td>
<td align="char" char=".">0.74</td>
<td align="char" char=".">0.01</td>
</tr>
<tr>
<td align="left">U.S. Codes (2020)</td>
<td align="char" char=".">49</td>
<td align="char" char=".">4.11</td>
<td align="char" char=".">0.29</td>
<td align="char" char=".">0.74</td>
<td align="char" char=".">0.01</td>
</tr>
<tr>
<td align="left">U.S. CFR (2000)</td>
<td align="char" char=".">200</td>
<td align="char" char=".">4.04</td>
<td align="char" char=".">0.72</td>
<td align="char" char=".">0.77</td>
<td align="char" char=".">0.02</td>
</tr>
<tr>
<td align="left">U.S. CFR (2019)</td>
<td align="char" char=".">241</td>
<td align="char" char=".">4.16</td>
<td align="char" char=".">1.06</td>
<td align="char" char=".">0.78</td>
<td align="char" char=".">0.02</td>
</tr>
<tr>
<td align="left">CH fed. acts (EN)</td>
<td align="char" char=".">4</td>
<td align="char" char=".">3.75</td>
<td align="char" char=".">0.14</td>
<td align="char" char=".">0.76</td>
<td align="char" char=".">0.00</td>
</tr>
<tr>
<td align="left">CH fed. reg. (EN)</td>
<td align="char" char=".">5</td>
<td align="char" char=".">4.00</td>
<td align="char" char=".">0.23</td>
<td align="char" char=".">0.77</td>
<td align="char" char=".">0.01</td>
</tr>
<tr>
<td align="left">EuroParl (DE)</td>
<td align="char" char=".">&#x2014;</td>
<td align="char" char=".">2.95</td>
<td align="char" char=".">0.05</td>
<td align="char" char=".">0.81</td>
<td align="char" char=".">0.00</td>
</tr>
<tr>
<td align="left">EuroParl (EN)</td>
<td align="char" char=".">&#x2014;</td>
<td align="char" char=".">3.02</td>
<td align="char" char=".">0.05</td>
<td align="char" char=".">0.77</td>
<td align="char" char=".">0.00</td>
</tr>
<tr>
<td align="left">EuroParl (FR)</td>
<td align="char" char=".">&#x2014;</td>
<td align="char" char=".">3.06</td>
<td align="char" char=".">0.06</td>
<td align="char" char=".">0.77</td>
<td align="char" char=".">0.00</td>
</tr>
<tr>
<td align="left">Shakespeare</td>
<td align="char" char=".">10</td>
<td align="char" char=".">2.52</td>
<td align="char" char=".">0.03</td>
<td align="char" char=".">0.80</td>
<td align="char" char=".">0.00</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>Notes: cfac &#x3d; compression factor; nve. &#x3d; normalized vocabulary entropy; <inline-formula id="inf64">
<mml:math id="m74">
<mml:mo>&#x23;</mml:mo>
</mml:math>
</inline-formula> texts &#x3d; number of texts considered at 150&#xa0;K.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="s7-2">
<title>7.2&#x20;Complexity-Entropy Plane</title>
<p>The general picture of all texts analyzed in this study, cf. <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>, reveals that the literary works of Shakespeare occupy a region to the left and are well separated from all other data points. The three points corresponding to the English, French and German EuroParl samples are also well separated from the vast majority of legal texts and from Shakespeare&#x2019;s collected works. This indicates that legal texts are much more redundant than classic literary texts or parliamentary speeches. The picture for the constitutions is heterogeneous for the data considered.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Figure showing the mean compression factor and mean normalized vocabulary entropy for: 1 &#x3d; Canadian acts (EN), 2 &#x3d; Canadian regulations (EN), 3 &#x3d; Canadian regulations (FR), 4 &#x3d; Canadian acts (FR), 5 &#x3d; U.S. Code Titles 1&#x2013;54, 6 &#x3d; U.S. CFR 2019, 7 &#x3d; United&#x20;Kingdom acts, 8 &#x3d; French acts (FR), 9 &#x3d; German Federal acts (DE), 10 &#x3d; German Federal regulations (DE), 11 &#x3d; Shakespeare&#x2019;s collected works, 12 &#x3d; Swiss Federal acts (EN), 13 &#x3d; Swiss Federal regulations (EN), 14 &#x3d; EuroParl speeches (EN), 15 &#x3d; EuroParl speeches (FR), 16 &#x3d; EuroParl speeches (DE); and the compression factor and normalized vocabulary entropy (green markers) for: a &#x3d; Swiss Federal Constitution (DE), b &#x3d; Swiss Federal Constitution (EN), c &#x3d; Swiss Federal Constitution (FR), d &#x3d; Canadian Constitution (EN), e &#x3d; German Constitution (Grundgesetz) (DE). The ellipses are centered around the mean values and have half-axes corresponding to <inline-formula id="inf65">
<mml:math id="m75">
<mml:mrow>
<mml:mrow>
<mml:mi>&#x3c3;</mml:mi>
<mml:mo>/</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> of the standard deviation of the compression factor and the normalized vocabulary entropy, respectively. Colors of ellipses correspond to: red &#x3d; speeches (EuroParl), green &#x3d; literature (Shakespeare), light blue &#x3d; acts, orange &#x3d; regulations; all texts truncated at 150&#xa0;K.</p>
</caption>
<graphic xlink:href="fphy-09-671882-g001.tif"/>
</fig>
<p>The German (DE) and Canadian (EN) Constitutions are located on the left border of the region that contains the respective national acts and ordinances, while the Swiss Federal Constitution lies between the averages of the acts and ordinances but is much closer to the mean of the&#x20;acts.</p>
<p>The plot for the U.S. Code (USC), Titles 1&#x2013;54 for the year 2020, and the U.S. Code of Federal Regulations (CFR) for the years 2000 and 2019, cf. <xref ref-type="fig" rid="F2">Figure&#x20;2</xref>, shows that the Federal acts occupy a distinguishable region located below the domain populated by the Federal regulations. This is in line with the values from <xref ref-type="table" rid="T3">Table&#x20;3</xref>: the mean normalized vocabulary entropy is 0.74 for USC, compared to 0.77 for CFR 2000 and 0.78 for CFR 2019. On the other hand, the distribution patterns of the regulations in 2000 and 2019 are similar (small changes in the region around the means), but several points are more spread out in the 2019 data, which is in line with the larger standard deviation of the compression factor, 1.06 in 2019 vs. 0.72 in 2000. However, the overall size of CFR 2000 is 940&#xa0;MB vs. 572.9&#xa0;MB for CFR 2019, which is quite a substantial difference.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Figure showing the mean compression factor and mean normalized vocabulary entropy for: 1 &#x3d; U.S. Code Titles 1&#x2013;54, 2 &#x3d; U.S. CFR 2019, 3 &#x3d; U.S. CFR 2000, 4 &#x3d; Shakespeare&#x2019;s collected works, 5 &#x3d; EuroParl speeches (EN), 6 &#x3d; EuroParl speeches (FR), 7 &#x3d; EuroParl speeches (DE). The ellipses are centered around the mean values and have axes corresponding to <inline-formula id="inf66">
<mml:math id="m76">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mi>&#x3c3;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> of the standard deviation of the compression factor and the normalized vocabulary entropy, respectively. Colors of ellipses correspond to: green &#x3d; U.S. Federal acts (2020), orange &#x3d; U.S. Federal regulations (2019), light blue &#x3d; U.S. Federal regulations (2000); all texts truncated at 150&#xa0;K.</p>
</caption>
<graphic xlink:href="fphy-09-671882-g002.tif"/>
</fig>
<p>We have already noted the similarity of the U.S. Titles and the French Codes. As <xref ref-type="fig" rid="F3">Figure&#x20;3</xref> shows, the French Codes (in French), German Federal acts (in German) and the U.S. Titles (in English) are situated in the complexity-entropy plane almost as vertical, non-overlapping translations of each other, with the German acts highest up. The order of the average normalized vocabulary entropies appears to be language-specific, although in this case we are not considering (aligned) translations, cf. <xref ref-type="sec" rid="s7-3">Section&#x20;7.3</xref>.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Figure showing the mean compression factor and mean normalized vocabulary entropy for: 1 &#x3d; U.S. Code Titles 1&#x2013;54, 2 &#x3d; French Codes (FR), 3 &#x3d; German Federal acts (DE), 4 &#x3d; Shakespeare&#x2019;s collected works, 5 &#x3d; EuroParl speeches (EN), 6 &#x3d; EuroParl speeches (FR), 7 &#x3d; EuroParl speeches (DE). The ellipses are centered around the mean values and have axes corresponding to <inline-formula id="inf67">
<mml:math id="m77">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mi>&#x3c3;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> of the standard deviation of the compression factor and the normalized vocabulary entropy, respectively. Colors of ellipses correspond to: light blue &#x3d; U.S. Code (2020), orange &#x3d; French Codes, green &#x3d; German Federal acts; all texts truncated at 150&#xa0;K.</p>
</caption>
<graphic xlink:href="fphy-09-671882-g003.tif"/>
</fig>
<p>The picture for the aligned translations of the Canadian acts and regulations into English and French, cf. <xref ref-type="fig" rid="F4">Figure&#x20;4</xref>, reveals that the acts are located, depending on the language, in separate regions bounded by ellipses of the same size around the respective means. For both English and French, the regulations are more dispersed than the acts (in particular in French), and the regulations in French are more widely spread than those in English. The mean normalized entropy of the regulations in French is below the mean of the acts in French, but above the means of the acts and regulations in English. The slightly odd position of the regulations in French could be due to the fact that, after truncation at 150&#xa0;K, 60 (FR) vs. 54 (EN) regulations remain, while for the acts the number of remaining texts is the same. As we are dealing with aligned translations, the observed language-specific pattern is quite meaningful, cf. <xref ref-type="sec" rid="s7-3">Section 7.3</xref>. On the other hand, Canadian acts and regulations in the same language are not easily separable, i.e., they show a distribution pattern that differs from that of the U.S. Titles and U.S. Federal regulations, cf. <xref ref-type="fig" rid="F2">Figure&#x20;2</xref>.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Figure showing the mean compression factor and mean normalized vocabulary entropy for: 1 &#x3d; Canadian acts (EN), 2 &#x3d; Canadian regulations (EN), 3 &#x3d; Canadian regulations (FR), 4 &#x3d; Canadian acts (FR), 5 &#x3d; Shakespeare&#x2019;s collected works, 6 &#x3d; EuroParl speeches (EN), 7 &#x3d; EuroParl speeches (FR), 8 &#x3d; EuroParl speeches (DE). The ellipses are centered around the mean values and have axes corresponding to <inline-formula id="inf68">
<mml:math id="m78">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mi>&#x3c3;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> of the standard deviation of the compression factor and the normalized vocabulary entropy, respectively. Colors of ellipses correspond to: light blue &#x3d; Canadian acts (EN), orange &#x3d; Canadian regulations (EN), green &#x3d; Canadian acts (FR), red &#x3d; Canadian regulations (FR); all texts truncated at 150&#xa0;K.</p>
</caption>
<graphic xlink:href="fphy-09-671882-g004.tif"/>
</fig>
<p>The German Federal acts and regulations accumulate in nearby, overlapping areas of the plane and cannot be clearly separated from each other, with the acts being more compactly grouped around the mean. The acts of Canada (EN), the United&#x20;States and the United&#x20;Kingdom are close to each other but far below the German acts and regulations, cf. <xref ref-type="fig" rid="F5">Figure&#x20;5</xref>. This seems to reflect language-specific characteristics common to all genres.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Figure showing the mean compression factor and mean normalized vocabulary entropy for: 1 &#x3d; Canadian acts (EN), 2 &#x3d; U.S. Code, Titles 1&#x2013;54 (USC), 3 &#x3d; United Kingdom General Public Acts (PGA), 4 &#x3d; German Federal acts (DE), 5 &#x3d; German Federal regulations (DE), 6 &#x3d; Shakespeare&#x2019;s collected works, 7 &#x3d; EuroParl speeches (EN), 8 &#x3d; EuroParl speeches (FR), 9 &#x3d; EuroParl speeches (DE). The ellipses are centered around the mean values and have axes corresponding to <inline-formula id="inf69">
<mml:math id="m79">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mi>&#x3c3;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> of the standard deviation of the compression factor and the normalized vocabulary entropy, respectively. Colors of ellipses correspond to: light blue &#x3d; Canadian acts (EN), orange &#x3d; German Federal acts (DE), green &#x3d; German Federal regulations (DE), red &#x3d; United&#x20;Kingdom PGA, purple &#x3d; USC; all texts truncated at 150&#xa0;K.</p>
</caption>
<graphic xlink:href="fphy-09-671882-g005.tif"/>
</fig>
<p>The fact that the United&#x20;States Code, unlike the acts of Canada, Germany and Switzerland, is fairly well separated in the plane from its associated regulations could reflect differences in the way laws and regulations are drafted in the United&#x20;States as compared to the countries mentioned&#x20;above.</p>
</sec>
<sec id="s7-3">
<title>7.3 Distinguishing Different Languages</title>
<p>From the above discussion it can be seen that different languages can be distinguished by the normalized vocabulary entropy if the genre is kept constant. To further investigate the effect of language on the position of the corpora in the complexity-entropy plane, we specifically considered aligned translations. In addition to the Swiss Federal Constitution (English, French and German) and the German EuroParl corpus with its translations into English and French, we processed the nine largest Swiss Federal acts in English, French and German. However, in order to have enough Swiss Federal acts, we had to lower the cutoff to 100&#xa0;K and, correspondingly, to recalculate the EuroParl values. We also added the collected works of Shakespeare (in English), with a cutoff of 100&#xa0;K. Further, we have the Canadian acts and regulations, and their aligned translations into English and French. The results indicate that (aligned) translations of the same collection of texts into different languages are distinguished primarily by the (normalized) vocabulary entropy rather than by the compression factor, cf. <xref ref-type="fig" rid="F6">Figure&#x20;6</xref> and <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>.</p>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>Figure showing the mean compression factor and mean normalized vocabulary entropy for: 1 &#x3d; Swiss Federal acts (EN), 2 &#x3d; Swiss Federal acts (DE), 3 &#x3d; Swiss Federal acts (FR), 4 &#x3d; Shakespeare&#x2019;s collected works (EN), 5 &#x3d; EuroParl speeches (EN), 6 &#x3d; EuroParl speeches (DE), 7 &#x3d; EuroParl speeches (FR), and the compression factor and normalized vocabulary entropy for: a &#x3d; Swiss Federal Constitution (EN), b &#x3d; Swiss Federal Constitution (DE), c &#x3d; Swiss Federal Constitution (FR). The ellipses are centered around the mean values, and have axes corresponding to <inline-formula id="inf70">
<mml:math id="m80">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mi>&#x3c3;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> of the standard deviation of the compression factor and the normalized vocabulary entropy, respectively. Color code: red &#x3d; EuroParl speeches, green &#x3d; literature, light blue &#x3d; acts; all texts truncated at 100&#xa0;K.</p>
</caption>
<graphic xlink:href="fphy-09-671882-g006.tif"/>
</fig>
</sec>
</sec>
<sec sec-type="conclusion" id="s8">
<title>8 Conclusion</title>
<p>We introduced a tool that is new to the legal field but has already served other areas of scientific research well. Its main strength is the ability to simultaneously capture and visualize independent and fundamental information, namely entropy and complexity, of large collections of data, and to track changes over time. By devising a novel variant of the complexity-entropy plane, we were able not only to show that legal texts of different types and languages are located in distinguishable regions, but also to identify different drafting approaches with regard to laws and regulations. In addition, we have taken the first steps to follow the spatial evolution of legislation over time. Although we observe that constitutions tend to have lower compression factors than acts and regulations, and that regulations on average have higher compression factors than acts, which corresponds to the hierarchy of norms, we could not fully capture the assumed abstraction gradient. This suggests that other language-specific methods should also be used to investigate (possible) differences. On the other hand, the higher redundancy of the regulations reflects the increasing need to leave the realm of natural language and to borrow tools from the natural sciences. The analysis we perform can be modified in a number of ways to provide even more specific information. For example, one might include <italic>n</italic>-grams, perform additional pre-processing steps, or choose different compression algorithms. One might also add a third coordinate for even more visual information. In combination with other quantitative methods such as citation networks, or with the consideration of additional (internal) degrees of freedom such as local entropy, new types of quantitative research questions could be formulated, which may lead to more efficient and manageable legislation. In summary, we expect a broad range of further applications of complexity-entropy diagrams within the legal domain.</p>
</sec>
</body>
<back>
<sec id="s9">
<title>Data Availability Statement</title>
<p>Publicly available datasets were analyzed in this study. The data can be found at: <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://uscode.house.gov">https://uscode.house.gov</ext-link>, <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.govinfo.gov/bulkdata/CFR">https://www.govinfo.gov/bulkdata/CFR</ext-link>, <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.legislation.gov.uk/ukpga">https://www.legislation.gov.uk/ukpga</ext-link>, <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://open.canada.ca/data/en/dataset/eb0dee21-9123-4d0d-b11d-0763fa1fb403">https://open.canada.ca/data/en/dataset/eb0dee21-9123-4d0d-b11d-0763fa1fb403</ext-link>, <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.fedlex.admin.ch/en/cc/internal-law/">https://www.fedlex.admin.ch/en/cc/internal-law/</ext-link>, <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.gesetze-im-internet.de/aktuell.html">https://www.gesetze-im-internet.de/aktuell.html</ext-link>, <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.legifrance.gouv.fr/liste/code?etatTexte=VIGUEUR&#x0026;page=1#code">https://www.legifrance.gouv.fr/liste/code?etatTexte&#x003D;VIGUEUR&#x0026;page&#x003D;1#code</ext-link>, <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.statmt.org/europarl/">https://www.statmt.org/europarl/</ext-link>.</p>
</sec>
<sec id="s10">
<title>Author Contributions</title>
<p>RF contributed to the methods, analyzed the data and wrote the article.</p>
</sec>
<sec id="s11">
<title>Funding</title>
<p>The author received funding from the Max Planck Institute for the Physics of Complex Systems (MPIPKS).</p>
</sec>
<sec sec-type="COI-statement" id="s12">
<title>Conflict of Interest</title>
<p>The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<ack>
<p>RF thanks Elliott Ash (ETH Zurich) and Holger Spamann (Harvard University) for numerous stimulating discussions. He thanks the condensed matter physics group at the Max Planck Institute for the Physics of Complex Systems in Dresden for its hospitality during his stay in 2020, and its support. Finally, he thanks the anonymous referees for their constructive comments and suggestions which helped to improve this article.</p>
</ack>
<sec id="s13">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fphy.2021.671882/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fphy.2021.671882/full&#x23;supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="DataSheet1.PDF" id="SM1" mimetype="application/PDF" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<fn-group>
<fn id="fn1">
<label>1</label>
<p>Conseil Constitutionnel, D&#xe9;cision n&#xb0; 2005&#x2013;530 DC du 29 d&#xe9;cembre 2005 (Loi de Finances pour 2006) 77&#x2013;89, available at <ext-link ext-link-type="uri" xlink:href="https://www.conseil-constitutionnel.fr/decision/2005/2005530DC.htm">https://www.conseil-constitutionnel.fr/decision/2005/2005530DC.htm</ext-link>.</p>
</fn>
<fn id="fn2">
<label>2</label>
<p>This could be translated as &#x201c;hierarchy of the legal order&#x201d; or &#x201c;hierarchy of norms&#x201d;.</p>
</fn>
<fn id="fn3">
<label>3</label>
<p>Note that we do not consider quantities in the limit or issues like the convergence of entropy estimates.</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<label>1.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schuck</surname>
<given-names>PH</given-names>
</name>
</person-group>. <article-title>Legal Complexity: Some Causes, Consequences, and Cures</article-title>. <source>Duke L J</source> (<year>1992</year>) <volume>42</volume>:<fpage>1</fpage>&#x2013;<lpage>52</lpage>. <pub-id pub-id-type="doi">10.2307/1372753</pub-id> </citation>
</ref>
<ref id="B2">
<label>2.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rook</surname>
<given-names>LW</given-names>
</name>
</person-group>. <article-title>Laying Down the Law: Canons for Drafting Complex Legislation</article-title>. <source>Or L Rev</source> (<year>1993</year>) <volume>72</volume>:<fpage>663</fpage>. </citation>
</ref>
<ref id="B3">
<label>3.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Mazzega</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Bourcier</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Boulet</surname>
<given-names>R</given-names>
</name>
</person-group>. <source>Proceedings of the 12th International Conference on Artificial Intelligence and Law</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name> (<year>2009</year>). p. <fpage>236</fpage>&#x2013;<lpage>7</lpage>. <article-title>The Network of French Legal Codes.</article-title> </citation>
</ref>
<ref id="B4">
<label>4.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bommarito</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Katz</surname>
<given-names>DM</given-names>
</name>
</person-group>. <article-title>A Mathematical Approach to the Study of the United States Code</article-title>. <source>Physica A: Stat Mech its Appl</source> (<year>2010</year>) <volume>389</volume>:<fpage>4195</fpage>&#x2013;<lpage>200</lpage>. <pub-id pub-id-type="doi">10.1016/j.physa.2010.05.057</pub-id> </citation>
</ref>
<ref id="B5">
<label>5.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Katz</surname>
<given-names>DM</given-names>
</name>
<name>
<surname>Bommarito</surname>
<given-names>MJ</given-names>
</name>
</person-group>. <article-title>Measuring the Complexity of the Law: the United States Code</article-title>. <source>Artif Intell L</source> (<year>2014</year>) <volume>22</volume>:<fpage>337</fpage>&#x2013;<lpage>74</lpage>. <pub-id pub-id-type="doi">10.1007/s10506-014-9160-8</pub-id> </citation>
</ref>
<ref id="B6">
<label>6.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bourcier</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Mazzega</surname>
<given-names>P</given-names>
</name>
</person-group>. <article-title>Toward Measures of Complexity in Legal Systems</article-title>. <source>Proceedings of the 11th International Conference on Artificial Intelligence and Law</source> (<year>2007</year>). p. <fpage>211</fpage>&#x2013;<lpage>5</lpage>. </citation>
</ref>
<ref id="B7">
<label>7.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Crutchfield</surname>
<given-names>JP</given-names>
</name>
<name>
<surname>Young</surname>
<given-names>K</given-names>
</name>
</person-group>. <article-title>Inferring Statistical Complexity</article-title>. <source>Phys Rev Lett</source> (<year>1989</year>) <volume>63</volume>:<fpage>105</fpage>&#x2013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1103/physrevlett.63.105</pub-id> </citation>
</ref>
<ref id="B8">
<label>8.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ruhl</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Katz</surname>
<given-names>DM</given-names>
</name>
</person-group>. <article-title>Measuring, Monitoring, and Managing Legal Complexity</article-title>. <source>Iowa L Rev</source> (<year>2015</year>) <volume>101</volume>:<fpage>191</fpage>&#x2013;<lpage>244</lpage>. </citation>
</ref>
<ref id="B9">
<label>9.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shannon</surname>
<given-names>CE</given-names>
</name>
</person-group>. <article-title>A Mathematical Theory of Communication</article-title>. <source>Bell Syst Tech J</source> (<year>1948</year>) <volume>27</volume>:<fpage>379</fpage>&#x2013;<lpage>423</lpage>. <pub-id pub-id-type="doi">10.1002/j.1538-7305.1948.tb01338.x</pub-id> </citation>
</ref>
<ref id="B10">
<label>10.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chang</surname>
<given-names>M-C</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>AC-C</given-names>
</name>
<name>
<surname>Eugene Stanley</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Peng</surname>
<given-names>C-K</given-names>
</name>
</person-group>. <article-title>Measuring Information-Based Energy and Temperature of Literary Texts</article-title>. <source>Physica A: Stat Mech its Appl</source> (<year>2017</year>) <volume>468</volume>:<fpage>783</fpage>&#x2013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1016/j.physa.2016.11.106</pub-id> </citation>
</ref>
<ref id="B11">
<label>11.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Cover</surname>
<given-names>TM</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>JA</given-names>
</name>
</person-group>. <source>Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing)</source>. <publisher-loc>USA</publisher-loc>: <publisher-name>Wiley-Interscience</publisher-name> (<year>2006</year>).</citation>
</ref>
<ref id="B12">
<label>12.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Vit&#xe1;nyi</surname>
<given-names>PM</given-names>
</name>
</person-group>. <source>An Introduction to Kolmogorov Complexity and its Applications</source>. <edition>4 edn.</edition> <publisher-name>Springer Publishing Company, Incorporated</publisher-name> (<year>2019</year>).</citation>
</ref>
<ref id="B13">
<label>13.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lempel</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ziv</surname>
<given-names>J</given-names>
</name>
</person-group>. <article-title>On the Complexity of Finite Sequences</article-title>. <source>IEEE Trans Inform Theor</source> (<year>1976</year>) <volume>22</volume>:<fpage>75</fpage>&#x2013;<lpage>81</lpage>. <pub-id pub-id-type="doi">10.1109/TIT.1976.1055501</pub-id> </citation>
</ref>
<ref id="B14">
<label>14.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wolfram</surname>
<given-names>S</given-names>
</name>
</person-group>. <article-title>Computation Theory of Cellular Automata</article-title>. <source>Commun Math Phys</source> (<year>1984</year>) <volume>96</volume>:<fpage>15</fpage>&#x2013;<lpage>57</lpage>. <pub-id pub-id-type="doi">10.1007/bf01217347</pub-id> </citation>
</ref>
<ref id="B15">
<label>15.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ziv</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Lempel</surname>
<given-names>A</given-names>
</name>
</person-group>. <article-title>A Universal Algorithm for Sequential Data Compression</article-title>. <source>IEEE Trans Inform Theor</source> (<year>1977</year>) <volume>23</volume>:<fpage>337</fpage>&#x2013;<lpage>43</lpage>. <pub-id pub-id-type="doi">10.1109/tit.1977.1055714</pub-id> </citation>
</ref>
<ref id="B16">
<label>16.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ziv</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Lempel</surname>
<given-names>A</given-names>
</name>
</person-group>. <article-title>Compression of Individual Sequences via Variable-Rate Coding</article-title>. <source>IEEE Trans Inform Theor</source> (<year>1978</year>) <volume>24</volume>:<fpage>530</fpage>&#x2013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1109/tit.1978.1055934</pub-id> </citation>
</ref>
<ref id="B17">
<label>17.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Hansel</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Perrin</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Simon</surname>
<given-names>I</given-names>
</name>
</person-group>. <article-title>Compression and Entropy</article-title>. <source>Annual Symposium on Theoretical Aspects of Computer Science</source>. <publisher-loc>Berlin, Heidelberg</publisher-loc>: <publisher-name>Springer</publisher-name> (<year>1992</year>).</citation>
</ref>
<ref id="B18">
<label>18.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Salomon</surname>
<given-names>D</given-names>
</name>
</person-group>. <source>Data Compression: The Complete Reference</source>. <edition>4 edn.</edition> <publisher-loc>London</publisher-loc>: <publisher-name>Springer-Verlag</publisher-name> (<year>2007</year>). <pub-id pub-id-type="doi">10.1007/978-1-84628-959-0</pub-id> </citation>
</ref>
<ref id="B19">
<label>19.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chaitin</surname>
<given-names>GJ</given-names>
</name>
</person-group>. <article-title>Algorithmic Information Theory</article-title>. <source>IBM J&#x20;Res Dev</source> (<year>1977</year>) <volume>21</volume>:<fpage>350</fpage>&#x2013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1147/rd.214.0350</pub-id> </citation>
</ref>
<ref id="B20">
<label>20.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Martiniani</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Lemberg</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Chaikin</surname>
<given-names>PM</given-names>
</name>
<name>
<surname>Levine</surname>
<given-names>D</given-names>
</name>
</person-group>. <article-title>Correlation Lengths in the Language of Computable Information</article-title>. <source>Phys Rev Lett</source> (<year>2020</year>) <volume>125</volume>:<fpage>170601</fpage>. <pub-id pub-id-type="doi">10.1103/physrevlett.125.170601</pub-id> </citation>
</ref>
<ref id="B21">
<label>21.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Montemurro</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Zanette</surname>
<given-names>DH</given-names>
</name>
</person-group>. <article-title>Universal Entropy of Word Ordering across Linguistic Families</article-title>. <source>PLoS ONE</source> (<year>2011</year>) <volume>6</volume>:<fpage>e19875</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0019875</pub-id> </citation>
</ref>
<ref id="B22">
<label>22.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Friedrich</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Luzzatto</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Ash</surname>
<given-names>E</given-names>
</name>
</person-group>. <source>Entropy in Legal Language</source>. <conf-name>2nd Workshop on Natural Legal Language Processing (NLLP, Collocated with KDD 2020) CEUR Workshop Proceedings</conf-name>. <volume>Vol. 2645</volume>, (<publisher-name>virtual</publisher-name>) (<year>2020</year>). <comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://ceur-ws.org/Vol-2645/">http://ceur-ws.org/Vol-2645/</ext-link>
</comment>.</citation>
</ref>
<ref id="B23">
<label>23.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Estevez-Rams</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Mesa-Rodriguez</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Estevez-Moya</surname>
<given-names>D</given-names>
</name>
</person-group>. <article-title>Complexity-entropy Analysis at Different Levels of Organisation in Written Language</article-title>. <source>PLoS ONE</source> (<year>2019</year>) <volume>14</volume>:<fpage>e0214863</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0214863</pub-id> </citation>
</ref>
<ref id="B24">
<label>24.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gr&#xfc;nwald</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Vit&#xe1;nyi</surname>
<given-names>P</given-names>
</name>
</person-group>. <article-title>Shannon Information and Kolmogorov Complexity</article-title>. <comment>arXiv preprint cs/0410002</comment>. (<year>2004</year>). </citation>
</ref>
<ref id="B25">
<label>25.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Jurafsky</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Martin</surname>
<given-names>JH</given-names>
</name>
</person-group>. <source>Speech and Language Processing</source>. <edition>2nd ed.</edition> <publisher-loc>USA</publisher-loc>: <publisher-name>Prentice-Hall</publisher-name> (<year>2009</year>). <pub-id pub-id-type="doi">10.1109/asru.2009.5373494</pub-id> </citation>
</ref>
<ref id="B26">
<label>26.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shannon</surname>
<given-names>CE</given-names>
</name>
</person-group>. <article-title>Prediction and Entropy of Printed English</article-title>. <source>Bell Syst Tech J</source> (<year>1951</year>) <volume>30</volume>:<fpage>50</fpage>&#x2013;<lpage>64</lpage>. <pub-id pub-id-type="doi">10.1002/j.1538-7305.1951.tb01366.x</pub-id> </citation>
</ref>
<ref id="B27">
<label>27.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>D&#x119;bowski</surname>
<given-names>&#x141;</given-names>
</name>
</person-group>. <article-title>Is Natural Language a Perigraphic Process? The Theorem about Facts and Words Revisited</article-title>. <source>Entropy</source> (<year>2018</year>) <volume>20</volume>:<fpage>85</fpage>. </citation>
</ref>
<ref id="B28">
<label>28.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sch&#xfc;rmann</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Grassberger</surname>
<given-names>P</given-names>
</name>
</person-group>. <article-title>Entropy Estimation of Symbol Sequences</article-title>. <source>Chaos</source> (<year>1996</year>) <volume>6</volume>:<fpage>414</fpage>&#x2013;<lpage>27</lpage>. <pub-id pub-id-type="doi">10.1063/1.166191</pub-id> </citation>
</ref>
<ref id="B29">
<label>29.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>DeGiuli</surname>
<given-names>E</given-names>
</name>
</person-group>. <article-title>Random Language Model</article-title>. <source>Phys Rev Lett</source> (<year>2019</year>) <volume>122</volume>:<fpage>128301</fpage>. <pub-id pub-id-type="doi">10.1103/physrevlett.122.128301</pub-id> </citation>
</ref>
<ref id="B30">
<label>30.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Grassberger</surname>
<given-names>P</given-names>
</name>
</person-group>. <source>Entropy Estimates from Insufficient Samplings</source>. <comment>arXiv preprint physics/0307138</comment>. (<year>2003</year>). </citation>
</ref>
<ref id="B31">
<label>31.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Grassberger</surname>
<given-names>P</given-names>
</name>
</person-group>. <article-title>Toward a Quantitative Theory of Self-Generated Complexity</article-title>. <source>Int J&#x20;Theor Phys</source> (<year>1986</year>) <volume>25</volume>:<fpage>907</fpage>&#x2013;<lpage>38</lpage>. <pub-id pub-id-type="doi">10.1007/bf00668821</pub-id> </citation>
</ref>
<ref id="B32">
<label>32.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Crutchfield</surname>
<given-names>JP</given-names>
</name>
</person-group>. <article-title>Between Order and Chaos</article-title>. <source>Nat Phys</source> (<year>2012</year>) <volume>8</volume>:<fpage>17</fpage>&#x2013;<lpage>24</lpage>. <pub-id pub-id-type="doi">10.1038/nphys2190</pub-id> </citation>
</ref>
<ref id="B33">
<label>33.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huberman</surname>
<given-names>BA</given-names>
</name>
<name>
<surname>Hogg</surname>
<given-names>T</given-names>
</name>
</person-group>. <article-title>Complexity and Adaptation</article-title>. <source>Physica D: Nonlinear Phenomena</source> (<year>1986</year>) <volume>22</volume>:<fpage>376</fpage>&#x2013;<lpage>84</lpage>. <pub-id pub-id-type="doi">10.1016/0167-2789(86)90308-1</pub-id> </citation>
</ref>
<ref id="B34">
<label>34.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>L&#xf3;pez-Ruiz</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Mancini</surname>
<given-names>HL</given-names>
</name>
<name>
<surname>Calbet</surname>
<given-names>X</given-names>
</name>
</person-group>. <article-title>A Statistical Measure of Complexity</article-title>. <source>Phys Lett A</source> (<year>1995</year>) <volume>209</volume>:<fpage>321</fpage>&#x2013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1016/0375-9601(95)00867-5</pub-id> </citation>
</ref>
<ref id="B35">
<label>35.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bagrov</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Iakovlev</surname>
<given-names>IA</given-names>
</name>
<name>
<surname>Iliasov</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Katsnelson</surname>
<given-names>MI</given-names>
</name>
<name>
<surname>Mazurenko</surname>
<given-names>VV</given-names>
</name>
</person-group>. <article-title>Multiscale Structural Complexity of Natural Patterns</article-title>. <source>Proc Natl Acad Sci USA</source> (<year>2020</year>) <volume>117</volume>:<fpage>30241</fpage>&#x2013;<lpage>51</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.2004976117</pub-id> </citation>
</ref>
<ref id="B36">
<label>36.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Feldman</surname>
<given-names>DP</given-names>
</name>
<name>
<surname>McTague</surname>
<given-names>CS</given-names>
</name>
<name>
<surname>Crutchfield</surname>
<given-names>JP</given-names>
</name>
</person-group>. <article-title>The Organization of Intrinsic Computation: Complexity-Entropy Diagrams and the Diversity of Natural Information Processing</article-title>. <source>Chaos: Interdiscip J&#x20;Nonlinear Sci</source> (<year>2008</year>) <volume>18</volume>:<fpage>043106</fpage>. <pub-id pub-id-type="doi">10.1063/1.2991106</pub-id> </citation>
</ref>
<ref id="B37">
<label>37.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rosso</surname>
<given-names>OA</given-names>
</name>
<name>
<surname>Craig</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Moscato</surname>
<given-names>P</given-names>
</name>
</person-group>. <article-title>Shakespeare and Other English Renaissance Authors as Characterized by Information Theory Complexity Quantifiers</article-title>. <source>Physica A: Stat Mech its Appl</source> (<year>2009</year>) <volume>388</volume>:<fpage>916</fpage>&#x2013;<lpage>26</lpage>. <pub-id pub-id-type="doi">10.1016/j.physa.2008.11.018</pub-id> </citation>
</ref>
<ref id="B38">
<label>38.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kelsen</surname>
<given-names>H</given-names>
</name>
</person-group>. <source>Reine Rechtslehre: Mit einem Anhang: Das Problem der Gerechtigkeit</source>. <publisher-loc>T&#xfc;bingen</publisher-loc>: <publisher-name>Mohr Siebeck</publisher-name> (<year>2017</year>). <pub-id pub-id-type="doi">10.33196/9783704683991</pub-id> </citation>
</ref>
<ref id="B39">
<label>39.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<collab>Secr&#xe9;tariat g&#xe9;n&#xe9;ral du gouvernement</collab>
</person-group>. <source>Guide de l&#xe9;gistique (3e &#xe9;dition mise &#xe0; jour 2017)</source>. <edition>3rd ed.</edition> <publisher-loc>France</publisher-loc>: <publisher-name>La Documentation fran&#xe7;aise</publisher-name> (<year>2017</year>).</citation>
</ref>
<ref id="B40">
<label>40.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Koehn</surname>
<given-names>P</given-names>
</name>
</person-group>. <article-title>Europarl: A Parallel Corpus for Statistical Machine Translation</article-title>. In: <source>Conference Proceedings: The Tenth Machine Translation Summit</source>. <publisher-loc>Phuket, Thailand</publisher-loc>: <publisher-name>AAMT</publisher-name> (<year>2005</year>). p. <fpage>79</fpage>&#x2013;<lpage>86</lpage>. </citation>
</ref>
</ref-list>
</back>
</article>