<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Archiving and Interchange DTD v2.3 20070202//EN" "archivearticle.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="methods-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Polit. Sci.</journal-id>
<journal-title>Frontiers in Political Science</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Polit. Sci.</abbrev-journal-title>
<issn pub-type="epub">2673-3145</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpos.2023.1268320</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Political Science</subject>
<subj-group>
<subject>Methods</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Advantages and pitfalls of machine translation for party research: the translation of party manifestos of European parties using <italic>DeepL</italic></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Plenter</surname> <given-names>Johanna Ida</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/2384659/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/conceptualization/"/>
<role content-type="https://credit.niso.org/contributor-roles/formal-analysis/"/>
<role content-type="https://credit.niso.org/contributor-roles/methodology/"/>
<role content-type="https://credit.niso.org/contributor-roles/visualization/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-original-draft/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
</contrib>
</contrib-group>
<aff><institution>Department of Social Sciences, Heinrich Heine University D&#x000FC;sseldorf</institution>, <addr-line>D&#x000FC;sseldorf</addr-line>, <country>Germany</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Felix Ettensperger, University of Freiburg, Germany</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Jeremy Dodeigne, University of Namur, Belgium; Michele Scotto Di Vettimo, University of Exeter, United Kingdom</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Johanna Ida Plenter <email>johanna.plenter&#x00040;hhu.de</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>22</day>
<month>11</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>5</volume>
<elocation-id>1268320</elocation-id>
<history>
<date date-type="received">
<day>27</day>
<month>07</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>31</day>
<month>10</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2023 Plenter.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Plenter</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Parties are the central actors in representative democracies as they perform important democratic functions. Thus, the identification of party positions is a crucial concern. Party researchers mainly rely on party manifestos to estimate policy positions. However, the analysis of manifestos is accompanied by challenges&#x02014;one of the biggest being cross-national comparisons because of different institutional settings and languages. This article discusses machine translation (MT) as a new option for party research, and reports on the author&#x00027;s experiences with the translation of more than 200 party manifestos using the commercial artificial intelligence (AI) translation tool <italic>DeepL</italic>. To make this approach widely applicable, the (technical) procedure, including its problems and workarounds for large-scale projects, is presented as a step-by-step guide using R. Additionally, drawing on the most recent German, Estonian, Italian and Polish parliamentary election manifestos this article evaluates the quality of the <italic>DeepL</italic> translations by applying both back translation and Wordfish analyses. The main findings indicate that <italic>DeepL</italic> offers high-quality translations as more than 90% of the checked sentences are reproduced word-for-word or at least synonymously and with stable positioning on the left-right scale of both original and English translation. The results have greater implications for political science research as they speak to the reliability of machine translation for political texts.</p></abstract>
<kwd-group>
<kwd>party manifestos</kwd>
<kwd>machine translation</kwd>
<kwd>text as data</kwd>
<kwd><italic>DeepL</italic></kwd>
<kwd>translation quality</kwd>
</kwd-group>
<counts>
<fig-count count="2"/>
<table-count count="2"/>
<equation-count count="0"/>
<ref-count count="40"/>
<page-count count="9"/>
<word-count count="7356"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Political Science Methodologies</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1 Introduction</title>
<p>Parties are the central actors in representative democracies and electoral competition as they perform important democratic functions such as aggregating and articulating citizens&#x00027; interests. To understand electoral processes or estimate party responsiveness, the identification of party positions is a crucial concern in electoral and party research&#x02014;in fact, it has become a subdiscipline in its own right (Laver, <xref ref-type="bibr" rid="B21">2001</xref>). There is a wide range of methodological approaches and types of data to determine party positions: they can be estimated with the help of expert or mass surveys, legislative voting behavior, media analyses, or based on texts (for a detailed discussion, see Mair, <xref ref-type="bibr" rid="B29">2001</xref>). Within the text-based methods, a further distinction can be made between quantitative and qualitative analysis methods. In addition, text analyses can draw on various types of text data, e.g., parliamentary or candidate speeches (Lauderdale and Herzog, <xref ref-type="bibr" rid="B20">2016</xref>; Atzpodien, <xref ref-type="bibr" rid="B1">2020</xref>) as well as policy papers or coalition agreements (Benoit et al., <xref ref-type="bibr" rid="B3">2005</xref>; Gross and Krauss, <xref ref-type="bibr" rid="B13">2021</xref>). More frequently, however, election or party manifestos<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref> are relied upon to estimate parties&#x00027; policy positions (Slapin and Proksch, <xref ref-type="bibr" rid="B33">2008</xref>; Volkens et al., <xref ref-type="bibr" rid="B37">2013</xref>; Br&#x000E4;uninger et al., <xref ref-type="bibr" rid="B5">2020</xref>).</p>
<p>Election manifestos are well suited to the analysis of party positions because they are a fairly reliable and readily available data source. However, their analysis is also accompanied by challenges&#x02014;one of the biggest being cross-national comparison because of the multitude of languages (Lucas et al., <xref ref-type="bibr" rid="B27">2015</xref>, p. 255). Especially if the manifestos are to be coded manually, either native-language coders/analysts have to be hired or the texts have to be translated.<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref> When it comes to translation, there are two options: professional translators or machine translation (MT). This article examines the latter method and its pros and cons to identify opportunities and obstacles for the estimation of party and policy positions based on automatically translated manifestos. It focuses on the MT tool <italic>DeepL</italic> and, based on manifestos in four European languages, evaluates its translation quality using back translation and Wordfish analyses. The main results of this evaluation seem to confirm the company&#x00027;s promise of excellent quality: first, the back translation shows that more than 90% of the checked sentences are reproduced word-for-word or at least synonymously. Second, the Wordfish analysis shows that the positioning of the automatic translation on the left-right scale is very close to that of the original text, indicating that the translation did not change the tone, content, or political implications.</p>
<p>The remainder of the article is structured as follows: first, I discuss current developments and state-of-the-art MT methods to outline their benefits and shortcomings compared to human translation. Afterwards, I illustrate this with a step-by-step guide to implementing MT with the commercial artificial intelligence (AI) tool <italic>DeepL</italic><xref ref-type="fn" rid="fn0003"><sup>3</sup></xref> for large-scale projects. The added value of this section is an evaluation of the translations that assesses their quality in an attempt to open the black box of the AI machine translation algorithm. The subsequent conclusion summarizes the results and evaluates them against the backdrop of existing limitations.</p></sec>
<sec id="s2">
<title>2 Machine translation: state-of-the-art methods in comparison to human translation</title>
<p>In translation studies, machine translation is defined as &#x0201C;the automatic conversion of text from one natural language to another&#x0201D; and it is the &#x0201C;process in which the interlingual conversion of text is carried out by a machine, even if the proper functioning of that machine relies on human labor before, during or after run time&#x0201D; (Kenny, <xref ref-type="bibr" rid="B17">2019b</xref>, p. 305&#x02013;306). Even though machine translation has been used for several decades, the technical developments of the past years have led to a great surge of innovation, for example, the development of end-to-end neural machine translation (NMT), which has become the state-of-the-art method (Tan et al., <xref ref-type="bibr" rid="B35">2020</xref>, p. 5). Neural machine translation is what is meant when we generally talk about translation with the help of AI or deep learning. It is neither the aim nor within the scope of this article to trace the development stages and technical aspects of MT,<xref ref-type="fn" rid="fn0004"><sup>4</sup></xref> but rather to discuss its advantages for political science and especially party research. Nevertheless, it is worth briefly outlining its simplified functionality: essentially, NMT works in the interaction of source to target language, in that the neural network predicts the translation word by word using word embeddings, i.e., the spatial representation of words that exhibit the semantic relationships among them. With increasing amounts of training data, the network &#x0201C;learns&#x0201D; and, thus, improves the quality of the translations. However, for commercial MT algorithms it is difficult to assess robustness and interpretability because these models are essentially black boxes as we know neither their training data nor coding. From Tan et al. (<xref ref-type="bibr" rid="B35">2020</xref>, p. 
16), it can be concluded that &#x0201C;noisy inputs&#x0201D;, i.e., erroneous spelling or incorrect usage of words, are a particular problem for the translation&#x00027;s robustness. Yet, it can be assumed that manifestos and other official documents published by political parties are likely to be almost free of errors, as they are usually well-curated and edited. This is one reason why, for translated text data, manifestos are superior to, e.g., parliamentary speeches.</p>
<p>Machine translation has some decisive advantages and disadvantages compared to &#x0201C;traditional&#x0201D; translation by humans. First, one of the most severe disadvantages is undoubtedly the fact that AI has no feeling for language and does not register sensitive language usage&#x02014;especially if context is lacking. Examples of this are irony and sarcasm as well as idioms or (political) expressions established at a certain point in time, such as NATO&#x00027;s 2% defense investment guideline, which in German is commonly referred to as the &#x0201C;Zwei-Prozent-Ziel&#x0201D; (literal translation: &#x0201C;two-percent-target&#x0201D;). Because it is currently being discussed, the term is well-known among policy-makers, the media, and the informed public, but 20 years from now that might not be the case&#x02014;making the term meaningless. Related to this point is a second drawback of MT, namely its handling of language change or evolution. That languages change over time is a normal and inherently unproblematic process. For MT, however, semantic change in particular can be a difficulty because the usage and connotation of certain terms change. In English, for example, the use of the word &#x0201C;gay&#x0201D; has changed at least three times (Lalor and Rendle-Short, <xref ref-type="bibr" rid="B19">2007</xref>, p. 148; Shi and Lei, <xref ref-type="bibr" rid="B32">2020</xref>, p. 35). Accordingly, the expression was first used with a positive connotation as a synonym for &#x0201C;jolly&#x0201D; or &#x0201C;happy&#x0201D;; this changed to the neutral meaning &#x0201C;homosexual&#x0201D;, and more recently to a negatively connoted expression for &#x0201C;boring&#x0201D; or &#x0201C;lame&#x0201D;. A similar development can be observed with regard to the &#x0201C;N-word&#x0201D; and the linguistic representation of Black people in general (Washington, <xref ref-type="bibr" rid="B38">2023</xref>).
Even though MT&#x00027;s handling of such cases has not yet been systematically evaluated in the literature, it can be assumed that incorrect or offensive translations may occur&#x02014;especially if the corpus spans several decades, i.e., potentially across semantic changes. As human translation still marks the gold standard, it can be expected that professionally trained translators better capture such language use because of their context knowledge. Bizzoni and Lapshinova-Koltunski (<xref ref-type="bibr" rid="B4">2021</xref>), however, find that translations from different translators are stylistically quite heterogeneous. This implies that the translation quality is highly dependent on the individual and their language and content knowledge. However, further research that investigates the comparability of different human translators is needed. Alternatively, computational linguistics research discusses the possibility of combining MT with human translation in order to use &#x0201C;the best of both worlds&#x0201D; (Li et al., <xref ref-type="bibr" rid="B25">2023</xref>, p. 9511; Pe&#x000F1;a Aguilar, <xref ref-type="bibr" rid="B30">2023</xref>). In this approach, human translations are fed into the training dataset of the MT algorithm to improve its quality. Accordingly, this approach requires access to the MT algorithm, which&#x02014;at least for commercial software&#x02014;is usually not available. Lastly, there may also be quite mundane obstacles when using machine translation, e.g., the preferred MT tool does not offer all languages or is not available in a country. The opacity of AI tools described above also prevents outsiders from assessing how well which tool has been trained with which inputs and, above all, languages and language combinations. Since, however, these obstacles cannot be eliminated by the researcher(s), they must at least be acknowledged as a limitation in a critical reflection.</p>
<p>On the other hand, one extremely significant advantage of MT compared to human translation is its resource efficiency. This is to be understood in two respects: for one, machine translation requires significantly less time, and for another, financial resources. This fact makes MT particularly interesting for research projects that have to get by without generous funding and/or a large team and must be completed within a foreseeable schedule. Another advantage of machine translation lies in the deep-learning approach of neural translation tools as with every translation, i.e., data input, they are continuously learning and improving. Consequently, MT algorithms draw on a constantly growing body of knowledge, whereas typically only one translator is hired per document, meaning that translation quality depends on that person.</p>
<p>In summary, when deciding between machine and human translation, the potentially poorer or more error-prone translation quality must be weighed against the significant time and money savings of real-time translation. For large corpora, human translation is simply not feasible because of the sheer amount of text. In addition, the type of text to be translated and the time frame in which it was created are also decisive. For example, irony and sarcasm, which can be hard to grasp for an MT algorithm, play a much greater role in political speeches than in press releases and party manifestos. If the time period in which the texts were created spans several decades, researchers should also reflect on the extent to which language change could influence the results. Moreover, Reber (<xref ref-type="bibr" rid="B31">2019</xref>, p. 118) points out that the &#x0201C;choice of method&#x0201D;, i.e., full-text translation vs. translation of individual words/expressions, must be considered because MT algorithms rely on the context in which a word is used. He concludes that the translation of entire documents is more accurate and should, thus, be the first choice. Accordingly, a blanket recommendation for one or the other does not make sense; however, especially given further development, training, and evaluation of MT, it is expected that this method will become more and more established in (political science) research.</p></sec>
<sec id="s3">
<title>3 Machine translation with <italic>DeepL</italic></title>
<p>Unsurprisingly, machine translation has already been applied in political and communication science research and has also been evaluated (e.g., Lucas et al., <xref ref-type="bibr" rid="B27">2015</xref>; de Vries et al., <xref ref-type="bibr" rid="B8">2018</xref>; D&#x000FC;pont and Rachuj, <xref ref-type="bibr" rid="B11">2022</xref>). Furthermore, Lucas et al. (<xref ref-type="bibr" rid="B28">2018</xref>) provided the R package <italic>translateR</italic>, which enables translation with API integration. All of these applications have used Google Translate for the translation.<xref ref-type="fn" rid="fn0005"><sup>5</sup></xref> This article, in contrast, presents the application of the commercial AI <italic>DeepL</italic> and discusses the advantages of this MT tool. For this purpose, I will report on my experiences with translating current European manifestos. In addition to manifestos, there are many other possible applications of MT in party research, such as party statutes, social media posts, or speeches. However, the above-mentioned considerations and potential shortcomings must be assessed for different types of textual data. Before presenting the procedure, I briefly introduce <italic>DeepL</italic>.</p>
<p><italic>DeepL</italic> is a translation AI by the same-named German company, which has been available since 2017 for an initial seven (exclusively European) languages. At the time of writing (July 2023), <italic>DeepL</italic> offers translations for 31 languages, which, according to the company&#x00027;s statements, significantly exceed the quality of other machine translations. This quality is said to have been evaluated both by &#x0201C;scientific tests&#x0201D; and &#x0201C;external professional translators&#x0201D; (DeepL, <xref ref-type="bibr" rid="B9">2022</xref>). However, these claims are difficult to verify because the sources of the tests and evaluations are neither published, provided, nor cited on the company&#x00027;s homepage. Yet, several technology magazines and media outlets generally confirm these statements (e.g., Coldewey and Lardinois, <xref ref-type="bibr" rid="B6">2017</xref>; Wyndham, <xref ref-type="bibr" rid="B39">2021</xref>). Additionally, first academic assessments seem to confirm the good translation quality. Hidalgo-Ternero (<xref ref-type="bibr" rid="B15">2020</xref>, p. 170) compares Google Translate and <italic>DeepL</italic> translations of Spanish idioms and concludes: &#x0201C;[O]verall, DeepL slightly outperforms Google Translate [&#x02026;][as] the global results exhibit an accuracy rate of 70% [&#x02026;] for Google Translate and 78% [&#x02026;] for DeepL.&#x0201D; Also focusing on the Spanish-English language combination, Pe&#x000F1;a Aguilar (<xref ref-type="bibr" rid="B30">2023</xref>) finds that <italic>DeepL</italic> outperforms both Bing and Google Translate. Lastly, in his comparison of Google Translate and <italic>DeepL</italic>, Reber (<xref ref-type="bibr" rid="B31">2019</xref>, p. 117) concludes that both tools perform equally well for full-text translation. For my research, I chose <italic>DeepL</italic> partly because of this reported translation quality.
More importantly, however, it was decisive that <italic>DeepL</italic>&#x02014;in contrast to, e.g., Google Translate or Amazon Translate&#x02014;can process plain text files (txt format), which are the standard file format, especially in quantitative text analysis.</p>
<p><italic>DeepL</italic> offers multiple usage options tailored to different needs. For one, there are both free and paid subscriptions; for another, <italic>DeepL</italic> features simple text translation in the web or app translator, translation of entire files, and API integration. The <italic>DeepL</italic> API can also be integrated into R using the <italic>deeplr</italic>-package (Zumbach and Bauer, <xref ref-type="bibr" rid="B40">2021</xref>), which is a good alternative to text or file translation, in particular when the text to be translated is not exceedingly long. Up to 500,000 characters per month (&#x0007E;280 standard pages) can be translated free of charge; in addition to a monthly fee, API Pro is billed according to actual consumption. Before deciding for or against API translation, researchers should therefore calculate how many characters the text to be translated comprises. In the following, I will concentrate exclusively on file translation, which is offered for six file formats (docx, htm, html, pdf, pptx, txt). As this guideline aims to make the translated manifestos accessible for (automated) text analysis, which requires machine-readable data, only text files (txt) are considered here. The code provided in the <xref ref-type="supplementary-material" rid="SM1">Supplementary material</xref>, however, also includes API translation (see Reber, <xref ref-type="bibr" rid="B31">2019</xref> for an exemplary application of the <italic>DeepL</italic> API).</p>
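<p>The character count on which this decision rests can be estimated before any text is sent to the API. The following is a minimal sketch in Python rather than the R used in the article; the function names and the toy corpus are hypothetical. It assumes that every character, including whitespace, counts toward the 500,000-character monthly allowance quoted above, which should be checked against the current <italic>DeepL</italic> terms.</p>
<preformat preformat-type="code"><![CDATA[
```python
# Free monthly allowance as quoted in the text (assumption: still current).
FREE_CHARS_PER_MONTH = 500_000


def count_characters(texts):
    """Return the total number of characters across all texts."""
    return sum(len(text) for text in texts)


def months_on_free_tier(total_chars, monthly_quota=FREE_CHARS_PER_MONTH):
    """Number of monthly quotas needed for total_chars (ceiling division)."""
    return -(-total_chars // monthly_quota)


# Toy corpus standing in for manifesto texts read from txt files.
corpus = {
    "party_a.txt": "Wir wollen mehr Investitionen in Bildung. " * 3000,
    "party_b.txt": "Klimaschutz ist eine zentrale Aufgabe. " * 12000,
}

total = count_characters(corpus.values())
print(f"{total:,} characters in total")
print("fits into one free monthly quota:", total <= FREE_CHARS_PER_MONTH)
print("monthly quotas needed:", months_on_free_tier(total))
```
]]></preformat>
<p>If the total clearly exceeds the free allowance, file translation or a paid plan is likely to be the more practical route.</p>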
<p>In summary, no general and universally valid recommendation can be made with regard to the selection of the MT tool either. Rather, the decisive factors are the volume of the text data to be translated, the file format in which it is available, and the analyses that are to be performed following translation. For the aforementioned reasons, however, this article focuses only on <italic>DeepL</italic>.</p>
<sec>
<title>3.1 Step-by-step implementation with R</title>
<p>Turning to the data preparation for translation: after collecting all manifestos (or other documents) to be translated, in a second step, the files have to be converted to txt. One of the biggest challenges of file conversion, however, is that manifestos&#x02014;more than party statutes or speech manuscripts&#x02014;often exhibit an elaborate layout or include figures, pictures, quotes, and tables. Text recognition and extraction are particularly challenging when the text is set in columns. When converting such a file from pdf to txt, a lot of text can get lost or mixed up because it is read line by line rather than column by column. It would therefore be highly desirable if parties made all of their communication available as plain text documents and if database projects such as the Manifesto Project or OPTED<xref ref-type="fn" rid="fn0006"><sup>6</sup></xref> were further strengthened and funded. The current version of the Manifesto Corpus (Lehmann et al., <xref ref-type="bibr" rid="B23">2023</xref>), available through the R package <italic>manifestoR</italic> (Lewandowski et al., <xref ref-type="bibr" rid="B24">2020</xref>), already contains more than 3,000 machine-readable manifestos, so no separate file conversion is necessary for these. For file conversion of the additional election programs, I used the <italic>tabulizer</italic>-package (Leeper, <xref ref-type="bibr" rid="B22">2018</xref>) because it is capable of recognizing text set in columns (see RScript in the <xref ref-type="supplementary-material" rid="SM1">Supplementary material</xref> for instructions and code).</p>
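<p>Why column layouts are a problem can be illustrated with a toy example. The sketch below is written in Python and does not use or reproduce <italic>tabulizer</italic>; the page content and function names are invented for illustration. It contrasts reading a two-column page line by line with reading it column by column.</p>
<preformat preformat-type="code"><![CDATA[
```python
# Toy page: each tuple holds the left- and right-column text of one printed line.
page = [
    ("Wir wollen gute", "Die Rente muss"),
    ("Schulen fuer alle", "sicher bleiben."),
    ("Kinder.", ""),
]


def read_by_lines(rows):
    """Naive extraction: read each printed line across both columns."""
    return " ".join(cell for row in rows for cell in row if cell)


def read_by_columns(rows):
    """Column-aware extraction: read the left column fully, then the right."""
    n_cols = len(rows[0])
    return " ".join(row[c] for c in range(n_cols) for row in rows if row[c])


print(read_by_lines(page))    # sentences from both columns are interleaved
print(read_by_columns(page))  # each sentence stays intact
```
]]></preformat>
<p>The interleaved output of the line-wise reading is exactly the kind of input that would then be passed on to translation, which is why column-aware extraction matters at this step.</p>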
<p>As a third step, all converted files should be checked to see whether the text recognition has worked without errors or whether corrections need to be made. It appears that non-embedded fonts and special characters (i.e., not UTF-8 encoded) are problematic because the AI may not identify them as text (see Lucas et al., <xref ref-type="bibr" rid="B27">2015</xref>, p. 256&#x02013;257 for a detailed discussion). Depending on the amount of unrecognized text, manual post-processing of the files is possible but time-consuming. For the translation, the converted files must now be exported from R as txt files. The file translation function of <italic>DeepL</italic> can process several files simultaneously in both the browser and the app so the actual translation happens in real time. After translation, it is advisable to again check whether the entire file/text has been correctly recognized and translated. <xref ref-type="fig" rid="F1">Figure 1</xref> below briefly illustrates the workflow step by step.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Schematic step-by-step workflow of the translation of text documents with <italic>DeepL</italic>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpos-05-1268320-g0001.tif"/>
</fig>
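<p>Part of the check on converted files can be automated. The following minimal sketch in Python (not the R used in the article; the function name and sample strings are hypothetical) flags byte sequences that are not valid UTF-8 or that contain the Unicode replacement character, which often indicates text that was not recognized during conversion. It does not replace a manual read-through.</p>
<preformat preformat-type="code"><![CDATA[
```python
def check_utf8(raw: bytes):
    """Return (ok, problem); ok is False for invalid UTF-8 or U+FFFD in the text."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return False, f"invalid UTF-8 at byte {err.start}"
    if "\ufffd" in text:
        return False, "contains replacement character U+FFFD"
    return True, ""


# Toy inputs standing in for the bytes of converted txt files.
samples = {
    "ok.txt": "F\u00fcr ein gerechtes Europa".encode("utf-8"),
    "latin1.txt": "F\u00fcr ein gerechtes Europa".encode("latin-1"),
}

for name, raw in samples.items():
    print(name, check_utf8(raw))
```
]]></preformat>
<p>Files that fail the check can then be re-exported with the correct encoding or corrected manually before translation.</p>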
<p>Summarizing, it can be said that MT in general and <italic>DeepL</italic> in particular offer great benefits for party research, since the fast and cheap machine translation of e.g., manifestos makes it easy to analyze the entire text and its message. This enables statements about policy positions and issues that are not included in the Manifesto Project codebook or about the sentiments, i.e., framing, parties use. The biggest technical challenges in dealing with AI are certainly file format as well as the preparation and post-processing of the documents. Overall, the advantages outweigh the shortcomings and considering the self-learning infrastructure, it can be assumed that machine translation algorithms will further improve in the future.</p></sec>
<sec>
<title>3.2 Evaluation of <italic>DeepL</italic> translation quality</title>
<p>One of the biggest issues in translation studies is the assessment and evaluation of translation quality. Since the entire value of a translation hinges upon its quality, it is important to develop certain measurements and discuss their respective strengths and weaknesses. It should however be noted that the &#x0201C;concept of quality varies greatly, within and outside translation studies&#x0201D; (Colina, <xref ref-type="bibr" rid="B7">2008</xref>, p. 98), which is why it is essential to define the aim of the evaluation beforehand. In the context of political science research, it is crucial to determine to what extent the original and the translation correspond in terms of content. Word usage and tonality are central, especially when texts are used to determine party positions or framing. It can be argued that most political communication is drafted and tailored to the author&#x00027;s agenda irrespective of whether a party or a single candidate communicates. Thus, it is essential that the translation quality, i.e., the degree to which original and translation correspond, is extremely high. For this reason, in the following, I evaluate the quality of <italic>DeepL</italic> translations with two purely descriptive assessment approaches, namely back translation and Wordfish analyses. To this end, I draw on a subset of the Manifesto Corpus that comprises the most recent manifestos of parties from Germany, Estonia, Italy, and Poland. These four cases were selected for two reasons: first, the languages mainly spoken in these countries belong to different branches of the Indo-European or Uralic language family. While German as a Germanic language, Italian as a Romance language and Polish as a Slavic language are branches of the Indo-European language family, Estonian as a Finnic language belongs to the Uralic language family. Second, the four languages differ extremely in terms of their prevalence as measured by the number of native speakers. 
According to the Ethnologue, German is spoken as a native language by about 75 million people, Italian by 65 million, Polish by 40 million and Estonian by 1.2 million (Eberhard et al., <xref ref-type="bibr" rid="B12">2023</xref>). Thus, the evaluation of the translation quality does not only look at one language family or branch, but takes several combinations into account. Additionally, it concerns both very common (i.e., German&#x02014;English, Italian&#x02014;English) and rarer (i.e., Polish&#x02014;English, Estonian&#x02014;English) language combinations. It seems plausible to assume that language prevalence is correlated with the number of data inputs, i.e., the training, of these language combinations (de Vries et al., <xref ref-type="bibr" rid="B8">2018</xref>, p. 419). However, since the algorithm is unknown to outsiders, this is purely speculative for <italic>DeepL</italic>.</p>
<p>The first evaluation uses back translation, also called re-translation, and only draws on the seven German manifestos. As the name indicates, back translation takes translated texts and translates them back to their source language to compare them with the original. Behr (<xref ref-type="bibr" rid="B2">2017</xref>) discusses the pros and cons of this method and concludes that back translation is very straightforward and, especially with machine translation, fast. However, the method lacks a clear conception of what is considered a mistake or &#x0201C;poor quality&#x0201D;. In her empirical analysis, Behr (<xref ref-type="bibr" rid="B2">2017</xref>, pp. 581&#x02013;582) finds that back translation &#x0201C;can successfully identify errors [&#x02026;]; however, most of these issues were identified by actual translation assessment as well&#x0201D;. She, therefore, concludes that this method should always be accompanied by other assessment approaches. Despite its shortcomings, back translation is still one of the most widely used methods for evaluating translation quality. For this reason, it is applied in this article as a first step.</p>
<p>To estimate the quality of the <italic>DeepL</italic> translations, I draw on the seven manifestos by German parties and re-translate them from English to German&#x02014;again using <italic>DeepL</italic>. For each of these manifestos, 80 sentences of the back translation are manually coded and compared to the original. The manual coding classifies each sentence as either verbatim back translation or synonymous back translation; in addition, obvious errors are also marked.</p>
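<p>The coding described here was done manually. As an optional aid, verbatim matches can be pre-screened automatically so that only the remaining sentences need to be judged by hand as synonymous or faulty. The sketch below is in Python, is not part of the article&#x00027;s procedure, and uses hypothetical function names and invented sentence pairs. It assumes that differences in case, punctuation, and whitespace do not matter for a verbatim match.</p>
<preformat preformat-type="code"><![CDATA[
```python
import re
import unicodedata


def normalize(sentence: str) -> str:
    """Apply Unicode NFC, lowercase, drop punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFC", sentence).lower()
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def is_verbatim(original: str, back_translation: str) -> bool:
    """True if both sentences are identical after normalization."""
    return normalize(original) == normalize(back_translation)


def verbatim_share(pairs) -> float:
    """Percentage of (original, back translation) pairs that match verbatim."""
    pairs = list(pairs)
    hits = sum(is_verbatim(orig, back) for orig, back in pairs)
    return 100 * hits / len(pairs)


# Invented example pairs: (original sentence, back translation).
pairs = [
    ("Wir wollen die Schulen modernisieren.", "Wir wollen die Schulen modernisieren."),
    ("Bildung ist der Schluessel.", "Bildung ist der Schluessel!"),
    ("Wir senken die Steuern.", "Wir reduzieren die Steuern."),
    ("Europa braucht Zusammenhalt.", "Europa benoetigt Zusammenhalt."),
]

print(verbatim_share(pairs))
```
]]></preformat>
<p>Assuming the percentages in Table 1 refer to the 80 coded sentences per manifesto, a verbatim share of 17.50% would correspond to 14 sentences (14/80) and a share of 25.00% to 20 sentences (20/80).</p>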
<p>Looking at <xref ref-type="table" rid="T1">Table 1</xref>, the most central result shows that on average roughly 21% of the sentences were re-translated in verbatim form. Additionally, it seems as if the back translation produces reliable results with an average of only 3.4 mistakes per document. The translations were coded as faulty when there was an evident change in meaning or when a technical error, e.g., double translation of a word/phrase, occurred. Most of the sentences, however, have been re-translated as a paraphrase of the original which reflects the content synonymously. Remarkably, both the number of errors and the share of verbatim sentences vary quite significantly between the seven manifestos. One reason for this might be that the parties use sentences of different lengths and, thus, complexity. The probability of re-translating short sentences verbatim is higher than for long sentences. Nevertheless, it can be stated that based on this exemplary check of the manifestos of German parties, the <italic>DeepL</italic> translations can be assessed as being of high quality. On average, more than 90% of the sentences were re-translated either word-for-word or synonymously, indicating a good quality of the initial German-to-English-translation.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Results of the back translation of German manifestos.</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<th valign="top" align="left"><bold>Party</bold></th>
<th valign="top" align="left"><bold>No. of errors</bold></th>
<th valign="top" align="center"><bold>Verbatim sentences (in %)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">AfD</td>
<td valign="top" align="left">4</td>
<td valign="top" align="center">17.50</td>
</tr> <tr>
<td valign="top" align="left">CDU/CSU</td>
<td valign="top" align="left">3</td>
<td valign="top" align="center">25.00</td>
</tr> <tr>
<td valign="top" align="left">FDP</td>
<td valign="top" align="left">2</td>
<td valign="top" align="center">26.25</td>
</tr> <tr>
<td valign="top" align="left">SPD</td>
<td valign="top" align="left">4</td>
<td valign="top" align="center">20.00</td>
</tr> <tr>
<td valign="top" align="left">Greens</td>
<td valign="top" align="left">3</td>
<td valign="top" align="center">20.00</td>
</tr> <tr>
<td valign="top" align="left">SSW</td>
<td valign="top" align="left">3</td>
<td valign="top" align="center">18.75</td>
</tr> <tr>
<td valign="top" align="left">The Left</td>
<td valign="top" align="left">5</td>
<td valign="top" align="center">22.50</td>
</tr> <tr>
<td valign="top" align="left"><bold>Average</bold></td>
<td valign="top" align="left"><bold>3.4</bold></td>
<td valign="top" align="center"><bold>21.43</bold></td>
</tr></tbody>
</table>
<table-wrap-foot>
<p>The bold values are column averages.</p>
</table-wrap-foot>
</table-wrap>
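<p>The averages in <xref ref-type="table" rid="T1">Table 1</xref> can be recomputed from the reported per-party values. The following Python sketch is an illustration only and is not part of the R workflow used in this article; the names <monospace>TABLE_1</monospace> and <monospace>summarize</monospace> are hypothetical, and the share of faulty sentences rests on the assumption that each marked error affects exactly one of the 80 coded sentences.</p>
<preformat preformat-type="code"><![CDATA[
```python
# Values transcribed from Table 1:
# party -> (errors among the 80 coded sentences, verbatim share in %).
TABLE_1 = {
    "AfD": (4, 17.50),
    "CDU/CSU": (3, 25.00),
    "FDP": (2, 26.25),
    "SPD": (4, 20.00),
    "Greens": (3, 20.00),
    "SSW": (3, 18.75),
    "The Left": (5, 22.50),
}
SENTENCES_PER_MANIFESTO = 80  # coded sentences per manifesto, as stated in the text


def summarize(table, n_sentences):
    """Return mean errors, mean verbatim share (%), and mean faulty share (%)."""
    n = len(table)
    mean_errors = sum(errors for errors, _ in table.values()) / n
    mean_verbatim = sum(verbatim for _, verbatim in table.values()) / n
    # Assumption: each marked error affects exactly one coded sentence.
    mean_faulty_share = 100 * mean_errors / n_sentences
    return mean_errors, mean_verbatim, mean_faulty_share


mean_errors, mean_verbatim, mean_faulty_share = summarize(
    TABLE_1, SENTENCES_PER_MANIFESTO
)
print(f"Mean errors per manifesto: {mean_errors:.2f}")
print(f"Mean verbatim share: {mean_verbatim:.2f}%")
print(f"Mean share of faulty sentences: {mean_faulty_share:.2f}%")
```
]]></preformat>
<p>Under the stated assumption, roughly 4% of the coded sentences are faulty on average, which is consistent with the statement that more than 90% of the sentences were re-translated either verbatim or synonymously.</p>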
<p>In order to check whether the back translation changes the text content, a dictionary analysis was also performed. For this purpose, the original and the back-translated versions of the seven German manifestos were searched for four concepts: economy, state, environment, and social affairs. The results are presented in <xref ref-type="table" rid="T2">Table 2</xref>. Overall, the results confirm the good translation quality, as the numbers of hits in the original and the back translation are very similar. The largest deviations are found within the concept of social affairs in the manifestos of the Greens (&#x0002B;9 hits) and The Left (&#x0002B;19 hits). In six cases, however, exactly the same number of hits was found in both documents. Therefore, the results of this small dictionary analysis can be interpreted as indicating that manifestos translated with the help of <italic>DeepL</italic> are suitable for empirical text analyses because the translation does not substantially change the text content.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Comparison of dictionary analysis of original and re-translated German manifestos.</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<th valign="top" align="left"><bold>Party</bold></th>
<th valign="top" align="left" colspan="2"><bold>Economy</bold></th>
<th valign="top" align="left" colspan="2"><bold>State</bold></th>
<th valign="top" align="center" colspan="2"><bold>Environment</bold></th>
<th valign="top" align="left" colspan="2"><bold>Social affairs</bold></th>
</tr>
</thead>
<tbody>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<td/>
<td valign="top" align="left"><bold>Original</bold></td>
<td valign="top" align="left"><bold>Back translation</bold></td>
<td valign="top" align="center"><bold>Original</bold></td>
<td valign="top" align="left"><bold>Back translation</bold></td>
<td valign="top" align="center"><bold>Original</bold></td>
<td valign="top" align="left"><bold>Back translation</bold></td>
<td valign="top" align="center"><bold>Original</bold></td>
<td valign="top" align="left"><bold>Back translation</bold></td>
</tr> <tr>
<td valign="top" align="left">AfD</td>
<td valign="top" align="left">17</td>
<td valign="top" align="left">20</td>
<td valign="top" align="center">24</td>
<td valign="top" align="left">27</td>
<td valign="top" align="center">8</td>
<td valign="top" align="left">6</td>
<td valign="top" align="center">33</td>
<td valign="top" align="left">30</td>
</tr> <tr>
<td valign="top" align="left">CDU/CSU</td>
<td valign="top" align="left">46</td>
<td valign="top" align="left">43</td>
<td valign="top" align="center">54</td>
<td valign="top" align="left">54</td>
<td valign="top" align="center">8</td>
<td valign="top" align="left">9</td>
<td valign="top" align="center">42</td>
<td valign="top" align="left">47</td>
</tr> <tr>
<td valign="top" align="left">FDP</td>
<td valign="top" align="left">25</td>
<td valign="top" align="left">27</td>
<td valign="top" align="center">37</td>
<td valign="top" align="left">37</td>
<td valign="top" align="center">6</td>
<td valign="top" align="left">6</td>
<td valign="top" align="center">21</td>
<td valign="top" align="left">20</td>
</tr> <tr>
<td valign="top" align="left">SPD</td>
<td valign="top" align="left">16</td>
<td valign="top" align="left">18</td>
<td valign="top" align="center">10</td>
<td valign="top" align="left">10</td>
<td valign="top" align="center">10</td>
<td valign="top" align="left">9</td>
<td valign="top" align="center">47</td>
<td valign="top" align="left">48</td>
</tr> <tr>
<td valign="top" align="left">Greens</td>
<td valign="top" align="left">42</td>
<td valign="top" align="left">42</td>
<td valign="top" align="center">45</td>
<td valign="top" align="left">49</td>
<td valign="top" align="center">28</td>
<td valign="top" align="left">33</td>
<td valign="top" align="center">78</td>
<td valign="top" align="left">87</td>
</tr> <tr>
<td valign="top" align="left">SSW</td>
<td valign="top" align="left">12</td>
<td valign="top" align="left">10</td>
<td valign="top" align="center">9</td>
<td valign="top" align="left">9</td>
<td valign="top" align="center">9</td>
<td valign="top" align="left">8</td>
<td valign="top" align="center">26</td>
<td valign="top" align="left">25</td>
</tr> <tr>
<td valign="top" align="left">The Left</td>
<td valign="top" align="left">51</td>
<td valign="top" align="left">57</td>
<td valign="top" align="center">21</td>
<td valign="top" align="left">20</td>
<td valign="top" align="center">25</td>
<td valign="top" align="left">28</td>
<td valign="top" align="center">206</td>
<td valign="top" align="left">225</td>
</tr></tbody>
</table>
</table-wrap>
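<p>The comparison in <xref ref-type="table" rid="T2">Table 2</xref> can be checked cell by cell. The Python sketch below is an illustration under stated assumptions rather than the code used for the analysis: it only re-uses the hit counts reported in the table, and the names <monospace>TABLE_2</monospace> and <monospace>deviations</monospace> are hypothetical. It computes, for every party and concept, the difference between the hits in the back translation and in the original.</p>
<preformat preformat-type="code"><![CDATA[
```python
# Dictionary hits transcribed from Table 2:
# party -> concept -> (hits in original, hits in back translation).
TABLE_2 = {
    "AfD": {"economy": (17, 20), "state": (24, 27),
            "environment": (8, 6), "social affairs": (33, 30)},
    "CDU/CSU": {"economy": (46, 43), "state": (54, 54),
                "environment": (8, 9), "social affairs": (42, 47)},
    "FDP": {"economy": (25, 27), "state": (37, 37),
            "environment": (6, 6), "social affairs": (21, 20)},
    "SPD": {"economy": (16, 18), "state": (10, 10),
            "environment": (10, 9), "social affairs": (47, 48)},
    "Greens": {"economy": (42, 42), "state": (45, 49),
               "environment": (28, 33), "social affairs": (78, 87)},
    "SSW": {"economy": (12, 10), "state": (9, 9),
            "environment": (9, 8), "social affairs": (26, 25)},
    "The Left": {"economy": (51, 57), "state": (21, 20),
                 "environment": (25, 28), "social affairs": (206, 225)},
}


def deviations(table):
    """Return {(party, concept): back-translation hits minus original hits}."""
    return {
        (party, concept): back - original
        for party, concepts in table.items()
        for concept, (original, back) in concepts.items()
    }


diffs = deviations(TABLE_2)
# Cells in which original and back translation yield the same number of hits.
identical = [cell for cell, diff in diffs.items() if diff == 0]
# The two cells with the largest absolute deviation.
largest = sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True)[:2]

print(f"Identical cells: {len(identical)} of {len(diffs)}")
for (party, concept), diff in largest:
    print(f"{party}, {concept}: {diff:+d} hits")
```
]]></preformat>
<p>Absolute differences favor long manifestos with many hits; dividing each difference by the number of hits in the original would give a relative deviation that is easier to compare across parties.</p>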
<p>In contrast to the back-translation approach, the second assessment method looks at the manifestos as a whole. To accompany the back translations, a Wordfish estimation was conducted. Wordfish is an unsupervised scaling technique developed by Slapin and Proksch (<xref ref-type="bibr" rid="B33">2008</xref>) that positions texts on a one-dimensional scale, e.g., the left-right scale. The model does not require reference texts or other prior information. Instead, it uses word frequencies, i.e., the parties&#x00027; relative word usage, to place texts along this dimension, assuming that word counts follow a Poisson distribution. To my knowledge, Wordfish has not yet been applied as a translation assessment method, probably because it was not designed for this purpose. However, I argue that it can be useful for political texts in particular because all (party-) political communication is carefully drafted to convey only the intended message. Translation quality can therefore be determined based on the correspondence or distance between the positions of the original and the translation. To assess the quality of the translation, I estimate and compare the relative positions of the original (German/Estonian/Italian/Polish) and the translated (English) manifestos. <xref ref-type="fig" rid="F2">Figure 2</xref> shows the results of the Wordfish placement of all manifestos.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Wordfish positioning of original and translated manifestos.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpos-05-1268320-g0002.tif"/>
</fig>
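<p>For readers unfamiliar with the model, Wordfish (Slapin and Proksch, <xref ref-type="bibr" rid="B33">2008</xref>) assumes that the count of word <italic>j</italic> in document <italic>i</italic> follows a Poisson distribution whose log rate is the sum of a document fixed effect (alpha), a word fixed effect (psi), and the product of a word weight (beta) and the document position (omega). The Python sketch below only evaluates the log-likelihood of this model for given parameter values; it does not estimate the parameters and is not the code used in this article. All function names and the toy numbers are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def wordfish_log_rate(alpha_i, psi_j, beta_j, omega_i):
    """Log of the expected count of word j in document i under Wordfish."""
    return alpha_i + psi_j + beta_j * omega_i


def poisson_log_pmf(count, log_rate):
    """Log probability of `count` under a Poisson with rate exp(log_rate)."""
    return count * log_rate - math.exp(log_rate) - math.lgamma(count + 1)


def log_likelihood(counts, alpha, psi, beta, omega):
    """Sum the Poisson log-likelihood over a documents-by-words count matrix."""
    total = 0.0
    for i, row in enumerate(counts):
        for j, count in enumerate(row):
            log_rate = wordfish_log_rate(alpha[i], psi[j], beta[j], omega[i])
            total += poisson_log_pmf(count, log_rate)
    return total


# Hypothetical toy data: two documents, two words. Document 0 only uses
# word 0 and document 1 only uses word 1.
counts = [[3, 0], [0, 3]]
alpha = [0.0, 0.0]   # document fixed effects
psi = [0.0, 0.0]     # word fixed effects
beta = [1.0, -1.0]   # word 0 signals one end of the scale, word 1 the other

# Positions that match the word usage fit better than swapped positions.
fit = log_likelihood(counts, alpha, psi, beta, omega=[1.0, -1.0])
misfit = log_likelihood(counts, alpha, psi, beta, omega=[-1.0, 1.0])
print(fit > misfit)
```
]]></preformat>
<p>An estimation routine searches for the parameter values that maximize this log-likelihood. Because the scale is only identified up to direction and spread, positions from separately estimated models should be put on a common, standardized scale before they are compared.</p>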
<p>At first glance, it is already apparent that the proximity between the positions of the originals and their translations varies depending on the language combination. While the combinations with Estonian, German and Italian largely produce highly congruent positionings, the Polish&#x02013;English combination in particular seems to perform more poorly. Thus, the proximity of positions does not seem to depend on language family or language prevalence, i.e., number of native speakers. Overall, the result for all four combinations can be considered (very) satisfactory. The second important result of the review is that positional deviations can be observed for all languages and party families, both to the left and to the right. In each tested combination, about 50% of the translations deviate slightly to the left and 50% deviate slightly to the right, regardless of whether the party itself is left or right. This indicates that there is no systematic left or right bias in the translations and/or language combinations.</p>
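<p>The check for a systematic bias described above amounts to tallying the direction of each deviation. The following Python sketch shows one way to do this. The party labels and positions are hypothetical placeholders and not results from this article, the function name <monospace>deviation_summary</monospace> is likewise hypothetical, and the sketch assumes that original and translation are placed on the same standardized scale with lower values meaning further left.</p>
<preformat preformat-type="code"><![CDATA[
```python
def deviation_summary(positions):
    """Summarize shifts between original and translated positions.

    positions: {manifesto: (original position, translation position)},
    both on the same standardized scale (lower values = further left).
    """
    shifts = {
        name: translated - original
        for name, (original, translated) in positions.items()
    }
    n = len(shifts)
    return {
        "share_left": sum(1 for s in shifts.values() if s < 0) / n,
        "share_right": sum(1 for s in shifts.values() if s > 0) / n,
        "mean_shift": sum(shifts.values()) / n,
        "mean_abs_shift": sum(abs(s) for s in shifts.values()) / n,
    }


# HYPOTHETICAL positions for illustration only (not taken from Figure 2).
example_positions = {
    "Party A": (-1.20, -1.10),
    "Party B": (-0.40, -0.55),
    "Party C": (0.30, 0.42),
    "Party D": (1.30, 1.23),
}

summary = deviation_summary(example_positions)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```
]]></preformat>
<p>Equal shares of leftward and rightward shifts together with a mean shift close to zero would point to unsystematic deviations, whereas a mean shift clearly different from zero would indicate a directional bias.</p>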
<p>One explanation for the deviations could be the markedly varying number of so-called document features in the respective languages. Due to linguistic differences between German, Polish, Estonian and Italian on the one hand and English on the other, this number drops sharply from original to translation. Consequently, the Wordfish algorithm has fewer word frequencies to rely on, implying that every feature becomes more influential. Overall, the Wordfish analysis confirms the results of the back-translation approach. As the placements of the original and the translated manifestos are very similar, it can be concluded that the <italic>DeepL</italic> translation did not change the content of the manifestos and their political implications. Nevertheless, it must be mentioned at this point that neither back translation nor Wordfish allows a statement about the correspondence of the framing or the mood in the original and the translation.</p>
<p>The extent to which the results discussed here are transferable to other types of text depends on several factors and cannot be answered universally. On the one hand, the characteristics of the (political) text type play a role. In contrast to speeches, manifestos are considered &#x0201C;sober&#x0201D; (Hawkins and Castanho Silva, <xref ref-type="bibr" rid="B14">2018</xref>, p. 31), since they address a different audience and must represent the interests of the entire party. It is therefore plausible to assume that this characteristic, reflected, e.g., in the use of irony in the text, influences translation quality. On the other hand, the length of the texts and, within them, the length and linguistic complexity of the sentences and words are also decisive. These factors typically vary between types of political texts (Tolochko and Boomgaarden, <xref ref-type="bibr" rid="B36">2019</xref>). For example, due to the character limitations alone, tweets are significantly less complex and exhibit different language usage than formal party communications, a fact that can also affect translation quality. Thus, all of these features and distinctions must be considered when transferring machine translation to other textual data.</p></sec></sec>
<sec sec-type="conclusions" id="s4">
<title>4 Conclusion</title>
<p>The overall aim of this article was to discuss the benefits and pitfalls of machine translation, particularly using <italic>DeepL</italic>, for political science and party research. I present the process of automatically translating a large set of party manifestos using R and <italic>DeepL</italic>, including its difficulties and workarounds. For large-scale cross-national studies, multilingual text data are a challenge that can be met using MT. The step-by-step guide presented here will assist other scholars with such projects, and the commercial translation AI <italic>DeepL</italic> offers a solution by providing high-quality translations for 31 languages. The most tedious challenge within the workflow is the pre-processing of files, i.e., the text extraction, as manifestos exhibit an increasingly sophisticated layout and have grown in length. In general, the more complex the layout, i.e., the typesetting, of the source file, the more difficult the text extraction. To provide added value beyond the technical implementation, in a second step I tried to open the black box of the MT algorithm slightly by evaluating the quality of the machine-translated texts. After all, translation quality, and thus the tool&#x00027;s scientific trustworthiness, is of central importance, especially for the analysis of party positions and rhetoric. Therefore, I performed an exemplary evaluation of the <italic>DeepL</italic> translations both by re-translating a set of manifestos of German parties and by determining the relative position on the left-right scale of German, Estonian, Polish and Italian manifestos using Wordfish analyses. The results of this evaluation indicate that <italic>DeepL</italic> offers high-quality translations, which do not significantly change the content and the positioning.
The back translation revealed that <italic>DeepL</italic> re-translates more than 90% of the sentences either word-for-word or at least synonymously, with an average of 3.4 errors per 80 sentences. The subsequent Wordfish analysis showed for all language combinations that the relative positions of original and translation are very close and that there is no systematic left-right bias. Nevertheless, it should not be forgotten that most MT algorithms are black boxes that cannot be fully opened by the evaluation carried out here. Both future research and further development of the algorithms would be necessary to make MT an integral part of social science research.</p>
<p>In contrast to the approach presented here, there are recent developments to process and analyze multilingual corpora in their original language. New transformer models such as BERT (Devlin et al., <xref ref-type="bibr" rid="B10">2019</xref>) or multilingual sentence embeddings (Licht, <xref ref-type="bibr" rid="B26">2023</xref>) were enabled by advances in computational linguistics in the field of large language models (LLMs) and provide valid and reliable results. Which of the two approaches should be chosen in an individual case depends centrally on how the text data are to be processed later in the research project. The central advantage of translated text data is that, on the one hand, the entire corpus can be examined in one analysis, e.g., with a machine learning algorithm, and, on the other hand, the results can be inspected by one researcher and further examined, e.g., with manual coding. This paper contributes to the validation of analyses using translated text data, as the main findings show that MT, and particularly <italic>DeepL</italic>, produces reliable and trustworthy results. The importance of this contribution is also underlined by the fact that, according to de Vries et al. (<xref ref-type="bibr" rid="B8">2018</xref>, p. 418), many authors either simply assume that MT provides reliable results or do not consider this issue at all.</p>
<p>While this article concentrates on party manifestos and policy positions, MT can be valuable for other types of textual data and analyses as well. In fact, the amount and availability of (political) textual data such as press releases, speeches, newspaper articles, and social media posts is growing every day and MT is one way to make this data accessible for automated comparative research. The added value of machine translation in general and <italic>DeepL</italic> in particular is that the entire text, including its framing and rhetoric, is made accessible for analysis. This allows for broader research perspectives that cannot be covered, for example, by existing coding schemes such as the Manifesto Project.</p>
<p>In conclusion, all of this leaves us with the scientific-ethical question of whether such a further &#x0201C;algorithmization&#x0201D; of (social science) research is desirable and a trend worth supporting. Particularly in light of open science initiatives, this development must be critically questioned. This applies all the more to commercial software and algorithms like <italic>DeepL</italic>. In his commentary, Arthur Spirling (<xref ref-type="bibr" rid="B34">2023</xref>, p. 413) therefore argues: &#x0201C;The rush to involve such artificial-intelligence (AI) models in research is a problem. Their use threatens hard-won progress on research ethics and the reproducibility of results. Instead, researchers need to collaborate to develop open-source LLMs that are transparent and not dependent on a corporation&#x00027;s favors.&#x0201D; Consequently, it would be extremely desirable for companies to disclose their algorithms or for open-source MT tools to be developed that provide high-quality translations for a variety of language combinations. For such tools, it would then also be possible to combine human and machine translation, an approach that promises significant quality improvements of up to 28% (Li et al., <xref ref-type="bibr" rid="B25">2023</xref>, p. 9512). Until such tools are available, however, I argue in this article that commercial AI tools such as <italic>DeepL</italic> are a good alternative. After all, MT tools can also contribute to making science more accessible and inclusive by enabling the analysis of countries that are not typically the focus of interest. Furthermore, machine translation enables scientists without large financial resources to participate in and contribute to the research on political parties.</p></sec>
<sec sec-type="data-availability" id="s5">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/<xref ref-type="sec" rid="s9">Supplementary material</xref>; further inquiries can be directed to the corresponding author.</p></sec>
<sec sec-type="author-contributions" id="s6">
<title>Author contributions</title>
<p>JP: Conceptualization, Formal analysis, Methodology, Visualization, Writing &#x02013; original draft, Writing &#x02013; review &#x00026; editing.</p></sec>
</body>
<back>
<sec sec-type="funding-information" id="s7">
<title>Funding</title>
<p>The author(s) declare financial support was received for the research, authorship, and/or publication of this article. The publication of this research was funded by the HHU Open Access Fund of the University and State Library D&#x000FC;sseldorf.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s8">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec sec-type="supplementary-material" id="s9">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fpos.2023.1268320/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fpos.2023.1268320/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.ZIP" id="SM1" mimetype="application/zip" xmlns:xlink="http://www.w3.org/1999/xlink"/></sec>
<fn-group>
<fn id="fn0001"><p><sup>1</sup>In this article, the expressions &#x0201C;election manifesto&#x0201D;, &#x0201C;election program&#x0201D;, and &#x0201C;party manifesto&#x0201D; are used interchangeably and synonymously. Existing differences between these terms (see Klingemann et al., <xref ref-type="bibr" rid="B18">1994</xref>, Chapter 2 for a discussion) are not relevant for this paper.</p></fn>
<fn id="fn0002"><p><sup>2</sup>If the data are to be processed quantitatively using machine learning and the results do not need to be validated or coded manually, recent developments in computational linguistics show that multilingual classifiers, e.g., those using multilingual sentence embeddings, provide valid results (see e.g., Licht, <xref ref-type="bibr" rid="B26">2023</xref>). Accordingly, no translation is necessary for these types of applications.</p></fn>
<fn id="fn0003"><p><sup>3</sup><italic>DeepL</italic> is available at <ext-link ext-link-type="uri" xlink:href="http://www.deepl.com">www.deepl.com</ext-link>; corporate headquarters of DeepL SE are in Cologne, Germany.</p></fn>
<fn id="fn0004"><p><sup>4</sup>See Kenny (<xref ref-type="bibr" rid="B16">2019a</xref>) for a detailed discussion of the historical developments of machine translation and/or Tan et al. (<xref ref-type="bibr" rid="B35">2020</xref>) for a review of NMT developments, methods, and tools.</p></fn>
<fn id="fn0005"><p><sup>5</sup>With <italic>translateR</italic> you can choose between the Google Translate API and the Microsoft Translator API.</p></fn>
<fn id="fn0006"><p><sup>6</sup>For more information on the Manifesto Project see: <ext-link ext-link-type="uri" xlink:href="https://manifesto-project.wzb.eu/">https://manifesto-project.wzb.eu/</ext-link> and for OPTED see: <ext-link ext-link-type="uri" xlink:href="https://opted.eu">https://opted.eu</ext-link>.</p></fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Atzpodien</surname> <given-names>D. S.</given-names></name></person-group> (<year>2020</year>). <article-title>Party competition in migration debates: the influence of the AfD on party positions in German state parliaments</article-title>. <source>Ger. Polit.</source> <volume>31</volume>, <fpage>381</fpage>&#x02013;<lpage>398</lpage>. <pub-id pub-id-type="doi">10.1080/09644008.2020.1860211</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Behr</surname> <given-names>D.</given-names></name></person-group> (<year>2017</year>). <article-title>Assessing the use of back translation: the shortcomings of back translation as a quality testing method</article-title>. <source>Int. J. Soc. Res. Methodol.</source> <volume>20</volume>, <fpage>573</fpage>&#x02013;<lpage>584</lpage>. <pub-id pub-id-type="doi">10.1080/13645579.2016.1252188</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Benoit</surname> <given-names>K.</given-names></name> <name><surname>Laver</surname> <given-names>M.</given-names></name> <name><surname>Arnold</surname> <given-names>C.</given-names></name> <name><surname>Pennings</surname> <given-names>P.</given-names></name> <name><surname>Hosli</surname> <given-names>M. O.</given-names></name></person-group> (<year>2005</year>). <article-title>Measuring national delegate positions at the Convention on the Future of Europe using computerized word scoring</article-title>. <source>Eur. Union Polit.</source> <volume>6</volume>, <fpage>291</fpage>&#x02013;<lpage>313</lpage>. <pub-id pub-id-type="doi">10.1177/1465116505054834</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Bizzoni</surname> <given-names>Y.</given-names></name> <name><surname>Lapshinova-Koltunski</surname> <given-names>E.</given-names></name></person-group> (<year>2021</year>). <article-title>&#x0201C;Measuring translationese across levels of expertise: are professionals more surprising than students?&#x0201D;</article-title> in <source>Proceedings of the 23rd Nordic Conference on Computational Linguistics</source>, eds S. Dobnik, and L. &#x000D8;vrelid (<publisher-loc>Link&#x000F6;ping</publisher-loc>: <publisher-name>Link&#x000F6;ping University Electronic Press</publisher-name>), <fpage>53</fpage>&#x02013;<lpage>63</lpage>.</citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Br&#x000E4;uninger</surname> <given-names>T.</given-names></name> <name><surname>Debus</surname> <given-names>M.</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>J.</given-names></name> <name><surname>Stecker</surname> <given-names>C.</given-names></name></person-group> (<year>2020</year>). <source>Parteienwettbewerb in den deutschen Bundesl&#x000E4;ndern</source>. Wiesbaden: Springer VS. <pub-id pub-id-type="doi">10.1007/978-3-658-29222-5</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Coldewey</surname> <given-names>D.</given-names></name> <name><surname>Lardinois</surname> <given-names>F.</given-names></name></person-group> (<year>2017</year>). <source>DeepL Schools Other Online Translators with Clever Machine Learning</source>. <publisher-loc>San Francisco, CA</publisher-loc>: <publisher-name>TechCrunch</publisher-name>.</citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Colina</surname> <given-names>S.</given-names></name></person-group> (<year>2008</year>). <article-title>Translation quality evaluation: empirical evidence for a functionalist approach</article-title>. <source>Translator</source> <volume>14</volume>, <fpage>97</fpage>&#x02013;<lpage>134</lpage>. <pub-id pub-id-type="doi">10.1080/13556509.2008.10799251</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>de Vries</surname> <given-names>E.</given-names></name> <name><surname>Schoonvelde</surname> <given-names>M.</given-names></name> <name><surname>Schumacher</surname> <given-names>G.</given-names></name></person-group> (<year>2018</year>). <article-title>No longer lost in translation: evidence that Google Translate works for comparative bag-of-words text applications</article-title>. <source>Polit. Anal.</source> <volume>26</volume>, <fpage>417</fpage>&#x02013;<lpage>430</lpage>. <pub-id pub-id-type="doi">10.1017/pan.2018.26</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="web"><person-group person-group-type="author"><collab>DeepL</collab></person-group> (<year>2022</year>). <source>Presseinformationen</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.deepl.com/press.html">https://www.deepl.com/press.html</ext-link> (accessed March 11, 2022).</citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Devlin</surname> <given-names>J.</given-names></name> <name><surname>Chang</surname> <given-names>M.-W.</given-names></name> <name><surname>Lee</surname> <given-names>K.</given-names></name> <name><surname>Toutanova</surname> <given-names>K.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;BERT: pre-training of deep bidirectional transformers for language understanding,&#x0201D;</article-title> in <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)</source>, eds J. Burstein, C. Doran, and T. Solorio (Stroudsburg, PA: Association for Computational Linguistics), <fpage>4171</fpage>&#x02013;<lpage>4186</lpage>.</citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>D&#x000FC;pont</surname> <given-names>N.</given-names></name> <name><surname>Rachuj</surname> <given-names>M.</given-names></name></person-group> (<year>2022</year>). <article-title>The ties that bind: text similarities and conditional diffusion among parties</article-title>. <source>Br. J. Polit. Sci.</source> <volume>52</volume>, <fpage>613</fpage>&#x02013;<lpage>630</lpage>. <pub-id pub-id-type="doi">10.1017/S0007123420000617</pub-id></citation>
</ref>
<ref id="B12">
<citation citation-type="book"><person-group person-group-type="editor"><name><surname>Eberhard</surname> <given-names>D. M.</given-names></name> <name><surname>Simons</surname> <given-names>G. F.</given-names></name> <name><surname>Fennig</surname> <given-names>C. D. (eds)</given-names></name></person-group> (<year>2023</year>). <source>Ethnologue: Languages of the World: Twenty-sixth edition</source>. <publisher-loc>Dallas, TX</publisher-loc>: <publisher-name>SIL International</publisher-name>.</citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gross</surname> <given-names>M.</given-names></name> <name><surname>Krauss</surname> <given-names>S.</given-names></name></person-group> (<year>2021</year>). <article-title>Topic coverage of coalition agreements in multi-level settings: the case of Germany</article-title>. <source>Ger. Polit.</source> <volume>30</volume>, <fpage>227</fpage>&#x02013;<lpage>248</lpage>. <pub-id pub-id-type="doi">10.1080/09644008.2019.1658077</pub-id></citation>
</ref>
<ref id="B14">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hawkins</surname> <given-names>K. A.</given-names></name> <name><surname>Castanho Silva</surname> <given-names>B.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Textual analysis: big data approaches,&#x0201D;</article-title> in <source>The Ideational Approach to Populism: Concept, Theory, and Analysis</source>, eds K. A. Hawkins, R. E. Carlin, L. Littvay, and C. Rovira Kaltwasser (<publisher-loc>London</publisher-loc>: <publisher-name>Routledge</publisher-name>), <fpage>27</fpage>&#x02013;<lpage>48</lpage>. <pub-id pub-id-type="doi">10.4324/9781315196923-2</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hidalgo-Ternero</surname> <given-names>C. M.</given-names></name></person-group> (<year>2020</year>). <article-title>Google Translate vs. DeepL: analysing neural machine translation performance under the challenge of phraseological variation</article-title>. <source>MonTI</source> 154&#x02013;177. <pub-id pub-id-type="doi">10.6035/MonTI.2020.ne6.5</pub-id></citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kenny</surname> <given-names>D.</given-names></name></person-group> (<year>2019a</year>). <article-title>&#x0201C;Machine translation,&#x0201D;</article-title> in <source>The Routledge Handbook of Translation and Philosophy</source>, eds P. Rawling and P. Wilson (<publisher-loc>London, NY</publisher-loc>: <publisher-name>Routledge</publisher-name>), <fpage>428</fpage>&#x02013;<lpage>445</lpage>. <pub-id pub-id-type="doi">10.4324/9781315678481-27</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kenny</surname> <given-names>D.</given-names></name></person-group> (<year>2019b</year>). <article-title>&#x0201C;Machine translation,&#x0201D;</article-title> in <source>Routledge Encyclopedia of Translation Studies</source>, eds M. Baker and G. Saldanha (<publisher-loc>London</publisher-loc>: <publisher-name>Routledge</publisher-name>), <fpage>305</fpage>&#x02013;<lpage>310</lpage>. <pub-id pub-id-type="doi">10.4324/9781315678627-65</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Klingemann</surname> <given-names>H.-D.</given-names></name> <name><surname>Hofferbert</surname> <given-names>R. I.</given-names></name> <name><surname>Budge</surname> <given-names>I.</given-names></name></person-group> (<year>1994</year>). <source>Parties, Policies, and Democracy</source>. <publisher-loc>Boulder</publisher-loc>: <publisher-name>Westview Press</publisher-name>.</citation>
</ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lalor</surname> <given-names>T.</given-names></name> <name><surname>Rendle-Short</surname> <given-names>J.</given-names></name></person-group> (<year>2007</year>). <article-title>&#x02018;That&#x00027;s so gay&#x02019;: a contemporary use of gay in Australian English</article-title>. <source>Aust. J. Linguist.</source> <volume>27</volume>, <fpage>147</fpage>&#x02013;<lpage>173</lpage>. <pub-id pub-id-type="doi">10.1080/07268600701522764</pub-id></citation>
</ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lauderdale</surname> <given-names>B. E.</given-names></name> <name><surname>Herzog</surname> <given-names>A.</given-names></name></person-group> (<year>2016</year>). <article-title>Measuring political positions from legislative speech</article-title>. <source>Polit. Anal.</source> <volume>24</volume>, <fpage>374</fpage>&#x02013;<lpage>394</lpage>. <pub-id pub-id-type="doi">10.1093/pan/mpw017</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Laver</surname> <given-names>M.</given-names></name></person-group> (ed.) (<year>2001</year>). <source>Estimating the Policy Position of Political Actors</source>. <publisher-loc>London</publisher-loc>: <publisher-name>Routledge</publisher-name>.</citation>
</ref>
<ref id="B22">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Leeper</surname> <given-names>T. J.</given-names></name></person-group> (<year>2018</year>). <source>tabulizer: Bindings for Tabula PDF Table Extractor Library: R package version 0.2.2</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://github.com/ropensci/tabulizer">https://github.com/ropensci/tabulizer</ext-link></citation>
</ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lehmann</surname> <given-names>P.</given-names></name> <name><surname>Franzmann</surname> <given-names>S.</given-names></name> <name><surname>Burst</surname> <given-names>T.</given-names></name> <name><surname>Lewandowski</surname> <given-names>J.</given-names></name> <name><surname>Matthie&#x000DF;</surname> <given-names>T.</given-names></name> <name><surname>Regel</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2023</year>). <source>Manifesto Corpus. Version: 2023-1</source>. Berlin.</citation>
</ref>
<ref id="B24">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Lewandowski</surname> <given-names>J.</given-names></name> <name><surname>Merz</surname> <given-names>N.</given-names></name> <name><surname>Regel</surname> <given-names>S.</given-names></name></person-group> (<year>2020</year>). <source>manifestoR: Access and Process Data and Documents of the Manifesto Project: R package version 1.5.0</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://github.com/ManifestoProject/manifestoR">https://github.com/ManifestoProject/manifestoR</ext-link></citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Z.</given-names></name> <name><surname>Qu</surname> <given-names>L.</given-names></name> <name><surname>Cohen</surname> <given-names>P.</given-names></name> <name><surname>Tumuluri</surname> <given-names>R.</given-names></name> <name><surname>Haffari</surname> <given-names>G.</given-names></name></person-group> (<year>2023</year>). <article-title>&#x0201C;The best of both worlds: combining human and machine translations for multilingual semantic parsing with active learning,&#x0201D;</article-title> in <source>Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics</source> (Volume 1: Long Papers), eds A. Rogers, J. Boyd-Graber, and N. Okazaki (Stroudsburg, PA: Association for Computational Linguistics), <fpage>9511</fpage>&#x02013;<lpage>9528</lpage>. <pub-id pub-id-type="doi">10.18653/v1/2023.acl-long.529</pub-id></citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Licht</surname> <given-names>H.</given-names></name></person-group> (<year>2023</year>). <article-title>Cross-lingual classification of political texts using multilingual sentence embeddings</article-title>. <source>Polit. Anal.</source> <volume>31</volume>, <fpage>366</fpage>&#x02013;<lpage>379</lpage>. <pub-id pub-id-type="doi">10.1017/pan.2022.29</pub-id></citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lucas</surname> <given-names>C.</given-names></name> <name><surname>Nielsen</surname> <given-names>R. A.</given-names></name> <name><surname>Roberts</surname> <given-names>M. E.</given-names></name> <name><surname>Stewart</surname> <given-names>B. M.</given-names></name> <name><surname>Storer</surname> <given-names>A.</given-names></name> <name><surname>Tingley</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>Computer-assisted text analysis for comparative politics</article-title>. <source>Polit. Anal.</source> <volume>23</volume>, <fpage>254</fpage>&#x02013;<lpage>277</lpage>. <pub-id pub-id-type="doi">10.1093/pan/mpu019</pub-id></citation>
</ref>
<ref id="B28">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Lucas</surname> <given-names>C.</given-names></name> <name><surname>Tingley</surname> <given-names>D.</given-names></name> <name><surname>Dehiya</surname> <given-names>V.</given-names></name></person-group> (<year>2018</year>). <source>translateR: R package version 2.0</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://github.com/ChristopherLucas/translateR">https://github.com/ChristopherLucas/translateR</ext-link></citation>
</ref>
<ref id="B29">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Mair</surname> <given-names>P.</given-names></name></person-group> (<year>2001</year>). <article-title>&#x0201C;Searching for the positions of political actors: a review of approaches and a critical evaluation of expert surveys,&#x0201D;</article-title> in <source>Estimating the Policy Position of Political Actors</source>, ed. M. Laver (<publisher-loc>London</publisher-loc>: <publisher-name>Routledge</publisher-name>), <fpage>10</fpage>&#x02013;<lpage>30</lpage>. <pub-id pub-id-type="doi">10.4324/9780203451656_chapter_2</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pe&#x000F1;a Aguilar</surname> <given-names>A.</given-names></name></person-group> (<year>2023</year>). <article-title>Challenging machine translation engines: some Spanish-English linguistic problems put to the test</article-title>. <source>Cad. Trad.</source> <volume>43</volume>, <fpage>1</fpage>&#x02013;<lpage>26</lpage>. <pub-id pub-id-type="doi">10.5007/2175-7968.2023.e85397</pub-id></citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Reber</surname> <given-names>U.</given-names></name></person-group> (<year>2019</year>). <article-title>Overcoming language barriers: assessing the potential of machine translation and topic modeling for the comparative analysis of multilingual text corpora</article-title>. <source>Commun. Methods Meas.</source> <volume>13</volume>, <fpage>102</fpage>&#x02013;<lpage>125</lpage>. <pub-id pub-id-type="doi">10.1080/19312458.2018.1555798</pub-id></citation>
</ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shi</surname> <given-names>Y.</given-names></name> <name><surname>Lei</surname> <given-names>L.</given-names></name></person-group> (<year>2020</year>). <article-title>The evolution of LGBT labelling words: tracking 150 years of the interaction of semantics with social and cultural changes</article-title>. <source>Engl. Today</source> <volume>36</volume>, <fpage>33</fpage>&#x02013;<lpage>39</lpage>. <pub-id pub-id-type="doi">10.1017/S0266078419000270</pub-id></citation>
</ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Slapin</surname> <given-names>J. B.</given-names></name> <name><surname>Proksch</surname> <given-names>S.-O.</given-names></name></person-group> (<year>2008</year>). <article-title>A scaling model for estimating time-series party positions from texts</article-title>. <source>Am. J. Pol. Sci.</source> <volume>52</volume>, <fpage>705</fpage>&#x02013;<lpage>722</lpage>. <pub-id pub-id-type="doi">10.1111/j.1540-5907.2008.00338.x</pub-id></citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Spirling</surname> <given-names>A.</given-names></name></person-group> (<year>2023</year>). <article-title>Why open-source generative AI models are an ethical way forward for science</article-title>. <source>Nature</source> <volume>616</volume>, <fpage>413</fpage>. <pub-id pub-id-type="doi">10.1038/d41586-023-01295-4</pub-id><pub-id pub-id-type="pmid">37072520</pub-id></citation></ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tan</surname> <given-names>Z.</given-names></name> <name><surname>Wang</surname> <given-names>S.</given-names></name> <name><surname>Yang</surname> <given-names>Z.</given-names></name> <name><surname>Chen</surname> <given-names>G.</given-names></name> <name><surname>Huang</surname> <given-names>X.</given-names></name> <name><surname>Sun</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Neural machine translation: a review of methods, resources, and tools</article-title>. <source>AI Open</source> <volume>1</volume>, <fpage>5</fpage>&#x02013;<lpage>21</lpage>. <pub-id pub-id-type="doi">10.1016/j.aiopen.2020.11.001</pub-id></citation>
</ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tolochko</surname> <given-names>P.</given-names></name> <name><surname>Boomgaarden</surname> <given-names>H. G.</given-names></name></person-group> (<year>2019</year>). <article-title>Determining political text complexity: conceptualizations, measurements, and application</article-title>. <source>Int. J. Commun.</source> <volume>13</volume>, <fpage>1784</fpage>&#x02013;<lpage>1804</lpage>.</citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="editor"><name><surname>Volkens</surname> <given-names>A.</given-names></name> <name><surname>Bara</surname> <given-names>J.</given-names></name> <name><surname>Budge</surname> <given-names>I.</given-names></name> <name><surname>McDonald</surname> <given-names>M. D.</given-names></name> <name><surname>Klingemann</surname> <given-names>H.-D.</given-names></name></person-group> (eds) (<year>2013</year>). <source>Mapping Policy Preferences from Texts: Statistical Solutions for Manifesto Analysts</source>. <publisher-loc>Oxford</publisher-loc>: <publisher-name>Oxford University Press</publisher-name>. <pub-id pub-id-type="doi">10.1093/acprof:oso/9780199640041.001.0001</pub-id></citation>
</ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Washington</surname> <given-names>A. R.</given-names></name></person-group> (<year>2023</year>). <article-title>Semantic and semiotic flows: examining variations and changes of &#x0201C;the N-Words&#x0201D; within an indexical field of dynamic meanings</article-title>. <source>Atl. Stud.</source> <fpage>1</fpage>&#x02013;<lpage>28</lpage>. <pub-id pub-id-type="doi">10.1080/14788810.2023.2235204</pub-id> [Epub ahead of print].</citation>
</ref>
<ref id="B39">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wyndham</surname> <given-names>A.</given-names></name></person-group> (<year>2021</year>). <source>Inside DeepL: The World&#x00027;s Fastest-Growing, Most Secretive Machine Translation Company</source>. <publisher-loc>Zurich</publisher-loc>: <publisher-name>Slator AG</publisher-name>.</citation>
</ref>
<ref id="B40">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Zumbach</surname> <given-names>D.</given-names></name> <name><surname>Bauer</surname> <given-names>P. C.</given-names></name></person-group> (<year>2021</year>). <source>deeplr: Interface to the &#x00027;DeepL&#x00027; Translation API: R package version 2.0.0</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://github.com/zumbov2/deeplr">https://github.com/zumbov2/deeplr</ext-link></citation>
</ref>
</ref-list>
</back>
</article>