<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="2.3" xml:lang="EN">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Psychol.</journal-id>
<journal-title>Frontiers in Psychology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Psychol.</abbrev-journal-title>
<issn pub-type="epub">1664-1078</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpsyg.2023.1137038</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Psychology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Probing sociodemographic influence on code-switching and language choice in Quebec with geolocation of tweets</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Kellert</surname>
<given-names>Olga</given-names>
</name>
<xref rid="c001" ref-type="corresp"><sup>&#x002A;</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/1373105/overview"/>
</contrib>
</contrib-group>
<aff><institution>Department of Romance Linguistics, University of G&#x00F6;ttingen</institution>, <addr-line>G&#x00F6;ttingen</addr-line>, <country>Germany</country></aff>
<author-notes>
<fn id="fn0001" fn-type="edited-by"><p>Edited by: Eirini Mavritsaki, Birmingham City University, United Kingdom</p></fn>
<fn id="fn0002" fn-type="edited-by"><p>Reviewed by: Christophe Coupe, The University of Hong Kong, Hong Kong SAR, China; Haroon N. Alsager, Prince Sattam Bin Abdulaziz University, Saudi Arabia</p></fn>
<corresp id="c001">&#x002A;Correspondence: Olga Kellert, <email>olga.kellert@phil.uni-goettingen.de</email></corresp>
<fn id="fn0003" fn-type="other"><p>This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>02</day>
<month>05</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>14</volume>
<elocation-id>1137038</elocation-id>
<history>
<date date-type="received">
<day>03</day>
<month>01</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>24</day>
<month>03</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2023 Kellert.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Kellert</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>This paper investigates the influence of the relative size of speech communities on language use in multilingual regions and cities. Because of people&#x2019;s everyday mobility within a city, it is still unclear whether the size of a population matters for language use on a sub-city scale. By testing the correlation between the size of a population and language use on various spatial scales, this study contributes to a better understanding of the extent to which sociodemographic factors influence language use. The present study investigates two phenomena that are common among multilingual speakers, namely language mixing, or Code-Switching, and the use of multiple languages without mixing. Demographic information from the Canadian census is used to derive predictions about the intensity of Code-Switching and language use by multilinguals in cities of Quebec and neighborhoods of Montreal. Geolocated tweets are used to identify where these linguistic phenomena occur most and least. My results show that the intensity of Code-Switching and the use of English by bilinguals are influenced by the size of the anglophone and francophone populations on various spatial scales, such as the city level, the land use level (city center vs. periphery of Montreal), and large urban zones on the sub-city level, namely the western and eastern urban zones of Montreal. However, the correlation between population figures and language use is difficult to measure and evaluate on much smaller scales, such as the city block, due to factors such as population figures missing from the census and people&#x2019;s mobility. A qualitative evaluation of language use on a small spatial scale seems to suggest that other social influences, such as the location context or the topic of discussion, are much more important predictors of language use than population figures. Methods for testing this hypothesis in future research are suggested. 
I conclude that geographic space can provide us with information about the relation between language use in multilingual cities and sociodemographic factors such as a speech community&#x2019;s size, and that social media is a valuable alternative data source for sociolinguistic research that offers new insights into the mechanisms of language use such as Code-Switching.</p>
</abstract>
<kwd-group>
<kwd>language contact</kwd>
<kwd>code-switching</kwd>
<kwd>bilingualism</kwd>
<kwd>Quebec</kwd>
<kwd>geolocation</kwd>
<kwd>Twitter</kwd>
</kwd-group>
<counts>
<fig-count count="14"/>
<table-count count="0"/>
<equation-count count="0"/>
<ref-count count="38"/>
<page-count count="14"/>
<word-count count="9573"/>
</counts>
</article-meta>
</front>
<body>
<sec id="sec1" sec-type="intro">
<label>1.</label>
<title>Introduction</title>
<p>Code-Switching (CS), the mixing of multiple languages in a single conversation, is a well-known phenomenon of language contact. Examples (1) and (2) show CS within a sentence between French and English, taken from <xref ref-type="bibr" rid="ref7">Cook (1991)</xref> and from a tweet posted in Montreal, respectively. In both sentences, the switch occurs from French into English (henceforth CS-Engl.):</p>
<list list-type="order">
<list-item><p>J&#x2019;ai achet&#x00E9; <underline><italic>an american car</italic></underline>. &#x201C;I bought <underline>an American car</underline>&#x201D; (<xref ref-type="bibr" rid="ref7">Cook, 1991</xref>).</p></list-item>
<list-item><p>Nouveau caf&#x00E9; <underline><italic>in my hood</italic></underline>. &#x201C;New caf&#x00E9; <underline>in my hood</underline>&#x201D; (Twitter).</p></list-item>
</list>
<p>CS can also occur from English into French, as shown by the tweet in (3), posted by a bilingual photographer from Montreal (henceforth CS-French):</p>
<list list-type="order">
<list-item><p>So this guy is 1 TODAY!! Happy birthday C.! <underline><italic>Je ne savais pas quoi faire</italic></underline>. Soft and bright is what we needed. <underline><italic>Donc une vieille porte a fait l&#x2019;affaire</italic></underline>.</p>
<p>&#x201C;<underline>I did not know what to do</underline>. [&#x2026;]. <underline>An old door did the trick</underline>.&#x201D;</p></list-item>
</list>
<p>Bilingual or multilingual speakers can use two or more languages without necessarily code-switching, as shown by examples (4) and (5), which come from the same bilingual user. This user is from Montreal and has posted 23 tweets in French and 22 in English.</p>
<list list-type="order">
<list-item><p>Notre devoir de citoyen est. fait! Allez voter, oubliez pas. @ xxxxx<xref rid="fn0004" ref-type="fn"><sup>1</sup></xref> High School. &#x201C;Our civic duty is done! Go vote, don&#x2019;t forget.&#x201D;</p></list-item>
<list-item><p>Last year I got to see the mtlalouettes for the first time and promised myself to come back with my son (mtlalouettes is a football team from Montreal).</p></list-item>
</list>
<p>The factors influencing Code-Switching (CS), as in (1&#x2013;3), and the Language Choice of Bilinguals (LCB), as in (4) and (5), have been investigated from various perspectives: structural (<xref ref-type="bibr" rid="ref29">Poplack, 1980</xref>; <xref ref-type="bibr" rid="ref7">Cook, 1991</xref>; <xref ref-type="bibr" rid="ref28">Myers-Scotton, 2002</xref>), social (<xref ref-type="bibr" rid="ref32">Schweda, 1980</xref>; <xref ref-type="bibr" rid="ref30">Poplack, 1985</xref>; <xref ref-type="bibr" rid="ref01">Fishman, 2000</xref>; <xref ref-type="bibr" rid="ref5">Bullock and Toribio, 2009</xref>; <xref ref-type="bibr" rid="ref8">Gardner-Chloros, 2009</xref>; <xref ref-type="bibr" rid="ref13">Holmes and Wilson, 2013</xref>; <xref ref-type="bibr" rid="ref37">Valenti, 2014</xref>), cognitive (<xref ref-type="bibr" rid="ref27">M&#x00FC;ller, 2017</xref>; <xref ref-type="bibr" rid="ref17">Kremin et al., 2021</xref>, among others), and discourse-oriented (<xref ref-type="bibr" rid="ref12">Gumperz and Dell, 1972</xref>; <xref ref-type="bibr" rid="ref16">Konidaris, 2004</xref>; <xref ref-type="bibr" rid="ref2">Auer, 2007</xref>; <xref ref-type="bibr" rid="ref3">Auer and Eastman, 2010</xref>). One important factor influencing CS and LCB is the relative size of speech communities (<xref ref-type="bibr" rid="ref32">Schweda, 1980</xref>; <xref ref-type="bibr" rid="ref30">Poplack, 1985</xref>), alongside other factors such as age, addressee, topic of discussion, and language attitude (see <xref ref-type="bibr" rid="ref9">Berisso Genemo, 2022</xref> for an overview of such factors). CS from French into English, for instance, has been shown to be more frequent in the city of Ottawa than in the city of Hull, a difference that has been explained by the larger anglophone population of Ottawa (<xref ref-type="bibr" rid="ref30">Poplack, 1985</xref>). 
An intuitive explanation for this correlation is that a larger anglophone community, as in Ottawa, produces more language use in English overall, so that French speakers in the same city are exposed to a large amount of English, which can affect their own language use. Previous research has shown that frequency of exposure strongly affects language production (<xref ref-type="bibr" rid="ref36">Unsworth, 2016</xref>, among others).</p>
<p><xref ref-type="bibr" rid="ref32">Schweda (1980)</xref> made an observation similar to that of <xref ref-type="bibr" rid="ref30">Poplack (1985)</xref> with respect to LCB: bilingual speakers adapt to the local environment by choosing the language that is used more often in a given city or area. Bilinguals thus tend to use English more often than French in a city where more English than French is used (<xref ref-type="bibr" rid="ref32">Schweda, 1980</xref>). An intuitive explanation of this effect is that multilinguals adapt their language use to their local environment in much the same way as they adapt it to their addressees (<xref ref-type="bibr" rid="ref16">Konidaris, 2004</xref>).</p>
<p>However, some studies on LCB and CS seem to contradict this intuitive correlation between the size of a language community and language use. Census data show a relatively higher number of anglophone speakers in the west of the island of Montreal than in the east (<xref ref-type="bibr" rid="ref33">Statistics Canada, 2011</xref>; see <xref ref-type="bibr" rid="ref35">Timiou, 2014</xref> for a visualization) and therefore predict more LCB (English) in the west than in the east. Yet the study by <xref ref-type="bibr" rid="ref21">Lamarre et al. (2002)</xref> shows that language use does not depend entirely on the geographic distribution of the population: bilinguals from Montreal use both languages independently of their location, especially in informal contexts such as on the street and in coffee houses. This is unexpected given the results of previous studies based on methodologies ranging from tweet analysis to picture analysis of street signs, which show a clear geographic separation of languages in Montreal, with more English in the west and more French in the east (<xref ref-type="bibr" rid="ref4">Bouchard, 2000</xref>; <xref ref-type="bibr" rid="ref22">Laur, 2003</xref>; <xref ref-type="bibr" rid="ref34">Termote, 2003</xref>; <xref ref-type="bibr" rid="ref26">Mocanu et al., 2013</xref>; <xref ref-type="bibr" rid="ref23">Leimgruber and Fern&#x00E1;ndez-Mallat, 2021</xref>). Unlike <xref ref-type="bibr" rid="ref21">Lamarre et al. (2002)</xref>, however, these latter studies did not specifically investigate language use by <italic>bilinguals</italic>, which might explain the difference in the results. 
One possible explanation for the conflicting results of previous studies such as <xref ref-type="bibr" rid="ref30">Poplack (1985)</xref> and <xref ref-type="bibr" rid="ref21">Lamarre et al. (2002)</xref> is that they are based on few locations and/or few bilingual speakers, most likely because of the difficulty of collecting suitable data. CS is a spontaneous phenomenon that occurs mostly in informal contexts and almost never in legal documents or other highly formal contexts. Collecting natural language data with CS therefore requires data from authentic, informal communication contexts. This requirement rules out many linguistic corpora, such as collections of news articles, as well as questionnaire- and survey-based methods. The latter are based on asking a selected group of people about their language behavior in a specific context, mainly &#x201C;at home&#x201D; (<xref ref-type="bibr" rid="ref4">Bouchard, 2000</xref>; <xref ref-type="bibr" rid="ref22">Laur, 2003</xref>; <xref ref-type="bibr" rid="ref33">Statistics Canada, 2011</xref>). People cannot reliably be asked under what circumstances and how often they code-switch, because CS often occurs spontaneously and speakers are not always aware that they are code-switching. For this reason, a different methodology and a different data source are needed to test correlations between the size of the population of a particular city or neighborhood and CS or LCB in the same location.</p>
<p>The present study aims to clarify the influence of population size on CS and LCB by using tweets from Twitter together with their location information and user IDs, which are needed to identify bilinguals and to measure the intensity of CS and LCB per geographic area. These data make it possible to test whether this intensity correlates with the size of the speech communities in a given location. The size of a speech community is taken from population data published by <xref ref-type="bibr" rid="ref33">Statistics Canada (2011)</xref>.</p>
<disp-quote>
<p><italic>General Hypothesis</italic>: CS and LCB correlate with the size of speech communities (<xref ref-type="bibr" rid="ref30">Poplack, 1985</xref>).</p>
</disp-quote>
<p>The paper is structured as follows. Section 2 derives more specific, operational hypotheses from the General Hypothesis on the basis of predictions from <xref ref-type="bibr" rid="ref33">Statistics Canada (2011)</xref>. Section 3 presents the data source and the methodology used to test these hypotheses. Section 4 presents the results, and Section 5 discusses them and outlines plans for future research.</p>
</sec>
<sec id="sec2">
<label>2.</label>
<title>Hypotheses</title>
<p>According to <xref rid="fig1" ref-type="fig">Figure 1</xref>, based on <xref ref-type="bibr" rid="ref33">Statistics Canada (2011)</xref>, the percentage of anglophones is much higher on the island of Montreal and in the city of Gatineau (a city on the border with Ontario) than in the city of Quebec.</p>
<fig position="float" id="fig1">
<label>Figure 1</label>
<caption>
<p>Census Profile from <xref rid="ref33" ref-type="bibr">Statistics Canada (2011)</xref> showing the relative numbers of the French-speaking (column &#x201C;Only French&#x201D;) and English-speaking communities (column &#x201C;Only English&#x201D;). The emphasis on the differences between the sizes of the speech communities is mine.</p>
</caption>
<graphic xlink:href="fpsyg-14-1137038-g001.tif"/>
</fig>
<disp-quote>
<p><italic>Hypothesis 1:</italic> CS into English should be higher on the island of Montreal and in the city of Gatineau than in the city of Quebec.</p>
</disp-quote>
<p>According to <xref rid="fig1" ref-type="fig">Figure 1</xref>, the share of anglophones is higher on the island of Montreal than in Greater Montreal (GM) as a whole, which includes the island and the surrounding area. According to <xref ref-type="bibr" rid="ref33">Statistics Canada (2011)</xref>, as visualized on a geographic map by <xref ref-type="bibr" rid="ref35">Timiou (2014)</xref>, the anglophone population on the island of Montreal is larger in the western part of the island than in the eastern part. These population differences, both between the island and Greater Montreal as a whole (+/&#x2212;Greater Montreal) and between urban zones (+/&#x2212;western part of the island), should have an effect on language use.</p>
<disp-quote>
<p><italic>Hypothesis 2:</italic> Bilinguals use more English on the island of Montreal than in Greater Montreal.</p>
</disp-quote>
<disp-quote>
<p><italic>Hypothesis 3:</italic> Bilinguals use more English in the western part than in the eastern part of the island.</p>
</disp-quote>
<disp-quote>
<p><italic>Hypothesis 4:</italic> There is more CS into English on the island of Montreal than in Greater Montreal.</p>
</disp-quote>
<disp-quote>
<p><italic>Hypothesis 5:</italic> CS into English is higher in the western part than in the eastern part and CS into French is higher in the eastern part than in the western part.</p>
</disp-quote>
<p>These five hypotheses are tested in Section 4, after the data and methodology are presented in Section 3.</p>
</sec>
<sec id="sec3">
<label>3.</label>
<title>Data and methodology</title>
<sec id="sec4">
<label>3.1.</label>
<title>Data</title>
<p>This paper uses Twitter as a data source to test Hypotheses 1&#x2013;5 listed in the previous section. Twitter is characterized as a &#x201C;microblogging platform,&#x201D; with the prefix &#x201C;micro-&#x201D; referring to the brevity of the posts. The platform allows registered users to distribute short messages (tweets). Unlike WhatsApp messages, tweets are public by default, which means that anyone can read them.</p>
<p>One of the great advantages of using social media platforms like Twitter as a data source for linguistic analysis is the large amount of speech data and corresponding meta-data, such as geolocation data and user information (<xref ref-type="bibr" rid="ref26">Mocanu et al., 2013</xref>; <xref ref-type="bibr" rid="ref10">Gon&#x00E7;alves and S&#x00E1;nchez, 2014</xref>; <xref ref-type="bibr" rid="ref24">Levy et al., 2018</xref>, among others), which I present in more detail below. To give an idea of the size of the data set used in this study: I analyzed more than 100,000 geolocated tweets from bilinguals in the city of Montreal to study their language choices in space, as well as almost 9,000 bilingual and non-bilingual Twitter users from Montreal (for detailed numbers, see <xref rid="fig2" ref-type="fig">Figures 2</xref>, <xref rid="fig3" ref-type="fig">3</xref>, which are discussed in the corresponding sections). This scale differs considerably from that of most sociolinguistic studies conducted in Montreal, which typically analyze no more than a handful of speakers from the city (<xref ref-type="bibr" rid="ref21">Lamarre et al., 2002</xref>; <xref ref-type="bibr" rid="ref16">Konidaris, 2004</xref>, among others). Another advantage of using Twitter for the study of CS and LCB is that many tweets are written in various settings, such as coffee bars, restaurants, streets, workplaces, and homes (<xref ref-type="bibr" rid="ref18">Kruspe et al., 2021</xref>), and that many tweets represent informal speech (<xref ref-type="bibr" rid="ref31">Scheffler et al., 2022</xref>). Twitter has also been used to extract CS data, for example Spanish&#x2013;English Code-Switching in the United States (<xref ref-type="bibr" rid="ref25">Mendels et al., 2018</xref>).</p>
<fig position="float" id="fig2">
<label>Figure 2</label>
<caption>
<p>Number of tweets with and without CS per city (extracted via the &#x201C;city name&#x201D; field), for cities with more than 150 tweets with +CS. Rightmost column: relative frequency of &#x2013;CS, calculated as &#x2013;CS/Total. [1] The 2011 Canadian census does not treat Dorval as a city separate from the island of Montreal; I therefore use the population figures of the island of Montreal for Dorval. [2] <ext-link xlink:href="https://www.ontario.ca/document/2016-census-highlights/fact-sheet-6-mother-tongue-and-language" ext-link-type="uri">https://www.ontario.ca/document/2016-census-highlights/fact-sheet-6-mother-tongue-and-language</ext-link>. [3] <ext-link xlink:href="https://www12.statcan.gc.ca/census-recensement/2021/as-sa/fogs-spg/page.cfm?dguid=2021A00052462037%26lang=F%26topic=1" ext-link-type="uri">https://www12.statcan.gc.ca/census-recensement/2021/as-sa/fogs-spg/page.cfm?dguid=2021A00052462037&#x0026;lang=F&#x0026;topic=1</ext-link>.</p>
</caption>
<graphic xlink:href="fpsyg-14-1137038-g002.tif"/>
</fig>
<fig position="float" id="fig3">
<label>Figure 3</label>
<caption>
<p>Distribution of language use [+/&#x2212;CS-Engl., +/&#x2212;CS-French, LCB (English), and LCB (French)] in GM.</p>
</caption>
<graphic xlink:href="fpsyg-14-1137038-g003.tif"/>
</fig>
<p>The data used in this study come from a tweet corpus collected from November 2017 through March 2021 (<xref ref-type="bibr" rid="ref14">Kellert, 2022</xref>). To find all French and all English tweets from particular locations in my corpus (as defined in section 3.2), I used the language tag assigned by Twitter: &#x201C;lang&#x201D; == &#x201C;fr&#x201D; for French and &#x201C;lang&#x201D; == &#x201C;en&#x201D; for English.</p>
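<p>The language filter described above can be illustrated with a minimal Python sketch. It assumes, beyond what is stated here, that the corpus is stored as JSON Lines with one Twitter API v1.1 tweet object per line, each carrying Twitter&#x2019;s top-level &#x201C;lang&#x201D; field; the function name split_by_language and the sample tweets are hypothetical.</p>

```python
import json
from collections import defaultdict


def split_by_language(lines, languages=("fr", "en")):
    """Group tweets by Twitter's "lang" tag, keeping only the requested languages.

    Assumption: each line is one JSON-encoded tweet object with a top-level "lang" key.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        lang = tweet.get("lang")
        if lang in languages:
            groups[lang].append(tweet)
    return dict(groups)


sample = [
    '{"id": 1, "lang": "fr", "text": "Bye le frette et la neige"}',
    '{"id": 2, "lang": "en", "text": "Thanksgiving could not have been better"}',
    '{"id": 3, "lang": "und", "text": "#mtl"}',
]
by_lang = split_by_language(sample)
print({k: len(v) for k, v in by_lang.items()})  # {'fr': 1, 'en': 1}
```

<p>Tweets with other tags, such as &#x201C;und&#x201D; (undetermined), are simply dropped in this sketch.</p>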
</sec>
<sec id="sec5">
<label>3.2.</label>
<title>Methodology</title>
<sec id="sec6">
<label>3.2.1.</label>
<title>Defining code-switching</title>
<p>Code-Switching into English (CS-Engl.) is defined as the use of English words in French tweets, such as the tweet <italic><underline>Bye</underline> le frette et la neige</italic> &#x201C;<underline>Bye</underline>, ice cold and snow.&#x201D; Code-Switching into French (CS-French) is defined as the use of French words in English tweets, such as the tweet <italic>Thanksgiving could not have been better</italic>&#x2026; <underline><italic>Merci Ch&#x00E9;rie</italic></underline> &#x201C;&#x2026; <underline>Thank you, darling</underline>.&#x201D;</p>
<p>Note that the position of the English words used in French tweets and French words in English tweets was not considered in the present study (for structural aspects of CS, see <xref ref-type="bibr" rid="ref29">Poplack, 1980</xref>; <xref ref-type="bibr" rid="ref7">Cook, 1991</xref>; <xref ref-type="bibr" rid="ref28">Myers-Scotton, 2002</xref>).</p>
<p>To count the tweets with and without Code-Switching (+/&#x2013;CS), I created a list of English and French words that have the same meaning or grammatical function, such as French <italic>anniversaire</italic> vs. English <italic>birthday</italic> or English <italic>not</italic> vs. French <italic>pas</italic>.</p>
<p>The list contains around 150 French&#x2013;English word pairs from semantic domains that occur very frequently in tweets, including greetings, goodbyes, and wishes (see <xref ref-type="bibr" rid="ref14">Kellert, 2022</xref> for the identification of the most frequent semantic domains on Twitter). It is available as a supplementary file (see <xref rid="SM1" ref-type="supplementary-material">Supplementary File 1</xref>, henceforth &#x201C;my list&#x201D;).</p>
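<p>How such a list can sort tweets into +CS and &#x2013;CS is sketched below under explicit assumptions: the four word pairs are hypothetical stand-ins for the roughly 150 pairs in Supplementary File 1, matching is case-insensitive and on whole words, and a tweet containing a listed word from the other language counts as +CS even if it also contains a listed word from its own language. This tie-breaking rule is an assumption of the sketch, not a documented step of the method.</p>

```python
import re

# Hypothetical excerpt of a French/English word-pair list; the actual list
# (Supplementary File 1) contains around 150 pairs.
WORD_PAIRS = [
    ("au revoir", "bye"),
    ("anniversaire", "birthday"),
    ("pas", "not"),
    ("merci", "thank you"),
]


def contains_phrase(text, phrase):
    """True if `phrase` occurs in `text` as whole word(s), ignoring case."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def classify(text, matrix_lang):
    """Return "+CS", "-CS", or None for a tweet whose Twitter language tag is matrix_lang.

    For a French tweet ("fr"), a listed English word means +CS (CS-Engl.); otherwise a
    listed French equivalent means -CS. "en" works the other way round (CS-French).
    Tweets containing no listed word are not counted (None).
    """
    french = [fr for fr, _ in WORD_PAIRS]
    english = [en for _, en in WORD_PAIRS]
    foreign, native = (english, french) if matrix_lang == "fr" else (french, english)
    if any(contains_phrase(text, w) for w in foreign):
        return "+CS"
    if any(contains_phrase(text, w) for w in native):
        return "-CS"
    return None


print(classify("Bye le frette et la neige", "fr"))                      # +CS
print(classify("Au revoir le frette et la neige", "fr"))                # -CS
print(classify("Thanksgiving could not have been better... Merci!", "en"))  # +CS
print(classify("Le sandwich est bon", "fr"))                            # None
```

<p>The last call returns None because <italic>le sandwich</italic>, an assimilated borrowing, is not in the list, so the tweet is counted neither as +CS nor as &#x2013;CS.</p>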
<p>Creating lists of lexical word pairs in order to compare their frequencies across geographic space is a classic procedure in dialectology for investigating the spatial distribution of lexical items (<xref ref-type="bibr" rid="ref10">Gon&#x00E7;alves and S&#x00E1;nchez, 2014</xref>; <xref ref-type="bibr" rid="ref11">Grieve et al., 2019</xref>). Such a list makes it possible to compare the spatial distribution of lexical items that have the same meaning [e.g., <italic><underline>Bye</underline> le frette et la neige</italic> (+CS-Engl.) vs. <italic><underline>Au revoir</underline> le frette et la neige</italic> &#x201C;<underline>Bye</underline>, ice cold and snow!&#x201D; (&#x2013;CS-Engl.)]. In addition, a list of manually selected word pairs makes it possible to exclude lexical borrowings from English that have been assimilated into the Canadian French lexicon, such as <italic>le sandwich</italic> &#x201C;the sandwich&#x201D; or <italic>cool</italic> &#x201C;cool/great,&#x201D; and that therefore do not represent synchronic CS. Other (automatic) methods of CS identification, as well as the distinction between more conventionalized and more spontaneous types of CS, are left for future research (see section 5).</p>
</sec>
<sec id="sec7">
<label>3.2.2.</label>
<title>City information</title>
<p>I used the city information encoded by the tag &#x201C;city name&#x201D; in the tweets&#x2019; meta-data to calculate the relative proportions of +/&#x2013;CS-Engl. per city, to visualize them on maps, and to compare cities with respect to the intensity of CS-Engl. If Hypothesis 1 in Section 2 is correct, the ranking of cities by CS-Engl. intensity should follow the relative size of their anglophone and francophone speech communities.</p>
</sec>
<sec id="sec8">
<label>3.2.3.</label>
<title>Geolocation information</title>
<p>I used the <italic>precise</italic> geolocation of the tweets, that is, the location of the user when posting a tweet, to compare the amount of CS and LCB (French vs. English) per urban zone in Greater Montreal. Example (2), repeated here, is a text message with geolocation information:<list list-type="order">
<list-item><p>Nouveau caf&#x00E9; in my hood. &#x201C;New caf&#x00E9; in my hood.&#x201D;</p></list-item>
</list></p>
<p>The user&#x2019;s exact location when sending the message can be plotted on a geographic map such as Google Maps using the coordinates extracted from the tweet&#x2019;s &#x201C;geo coordinates,&#x201D; in this case 45.523081 (latitude) and &#x2212;73.588132 (longitude). These coordinates show that the message was sent from <italic>Le Saint Louis caf&#x00E9;</italic> in Montreal.</p>
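<p>A minimal sketch of the coordinate extraction follows. The text only refers to &#x201C;geo coordinates&#x201D;; the sketch assumes the Twitter API v1.1 &#x201C;coordinates&#x201D; field, a GeoJSON Point that stores longitude before latitude, which is why the pair is swapped before it is returned. The function name extract_point is hypothetical.</p>

```python
def extract_point(tweet):
    """Return (latitude, longitude) for a tweet with a precise point location, else None.

    Assumes the Twitter API v1.1 layout, where tweet["coordinates"] is a GeoJSON
    Point whose "coordinates" list is ordered [longitude, latitude].
    """
    point = tweet.get("coordinates")
    if not point or point.get("type") != "Point":
        return None  # no precise geotag, e.g. only a coarse "place" is attached
    lon, lat = point["coordinates"]
    return lat, lon


tweet = {
    "text": "Nouveau caf\u00e9 in my hood",
    "coordinates": {"type": "Point", "coordinates": [-73.588132, 45.523081]},
}
print(extract_point(tweet))  # (45.523081, -73.588132)
```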
<p>Using exact addresses or geolocation data, we can visualize the proportions of CS and LCB (French vs. English) on a map of Montreal and compare urban zones with respect to the intensity of CS-Engl., CS-French, and LCB (<xref ref-type="bibr" rid="ref26">Mocanu et al., 2013</xref>; <xref ref-type="bibr" rid="ref10">Gon&#x00E7;alves and S&#x00E1;nchez, 2014</xref>; <xref ref-type="bibr" rid="ref24">Levy et al., 2018</xref>; <xref ref-type="bibr" rid="ref11">Grieve et al., 2019</xref>, among others). This procedure makes it possible to test Hypotheses 2&#x2013;5. If these hypotheses are correct, CS and LCB will vary between the eastern and western parts of the island of Montreal and between the island and Greater Montreal (+/&#x2212;Greater Montreal), in line with differences in the size of the anglophone and francophone speech communities.</p>
</sec>
<sec id="sec9">
<label>3.2.4.</label>
<title>Defining language choice of &#x201C;bilingual&#x201D; users</title>
<p>I defined bilingual users as those who tweet in both English and French, that is, who posted at least one tweet in English and at least one tweet in French in the area of Greater Montreal. To identify these users, I used the integer user ID in the tweets&#x2019; meta-data (&#x201C;user id&#x201D;).</p>
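<p>The bilingual criterion (at least one English and at least one French tweet per user ID) can be sketched as follows, assuming the tweets have already been restricted to Greater Montreal and flattened into records with hypothetical &#x201C;user_id&#x201D; and &#x201C;lang&#x201D; keys.</p>

```python
from collections import defaultdict


def find_bilinguals(tweets):
    """Return the IDs of users with at least one English and at least one French tweet.

    `tweets` is an iterable of dicts with hypothetical flattened keys "user_id" and "lang"
    (in raw Twitter API v1.1 objects the ID sits under tweet["user"]["id"]).
    """
    languages_by_user = defaultdict(set)
    for tweet in tweets:
        languages_by_user[tweet["user_id"]].add(tweet["lang"])
    return {
        user for user, langs in languages_by_user.items()
        if {"en", "fr"}.issubset(langs)
    }


tweets = [
    {"user_id": 1, "lang": "fr"}, {"user_id": 1, "lang": "en"},
    {"user_id": 2, "lang": "fr"}, {"user_id": 2, "lang": "fr"},
    {"user_id": 3, "lang": "en"}, {"user_id": 3, "lang": "fr"},
]
print(sorted(find_bilinguals(tweets)))  # [1, 3]
```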
</sec>
<sec id="sec10">
<label>3.2.5.</label>
<title>Calculation of relative frequencies and visualization on maps</title>
<p>A substantial part of the methodology in this section, such as the calculation of relative frequencies and the visualization on maps, is based on <xref ref-type="bibr" rid="ref15">Kellert and Matlis (2022)</xref>. I calculated the relative frequency of French tweets with English words (+CS-Engl.) and of French tweets with the French equivalents (&#x2013;CS-Engl.) per city, using city corpora built from the &#x201C;city name&#x201D; field in the tweets&#x2019; meta-data.</p>
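<p>A sketch of the per-city counts, assuming each French tweet has already been labelled +CS or &#x2013;CS and paired with its &#x201C;city name.&#x201D; The relative frequency follows the Figure 2 caption (&#x2013;CS/Total), and the threshold of more than 150 +CS tweets comes from the same caption; the function name cs_by_city, the record layout, and the toy counts are assumptions.</p>

```python
from collections import Counter


def cs_by_city(records, min_plus_cs=150):
    """Count +CS and -CS tweets per city and the relative frequency of -CS (-CS / Total).

    `records` is an iterable of (city_name, label) pairs with label "+CS" or "-CS".
    Only cities with more than `min_plus_cs` +CS tweets are kept.
    """
    plus, minus = Counter(), Counter()
    for city, label in records:
        if label == "+CS":
            plus[city] += 1
        elif label == "-CS":
            minus[city] += 1
    table = {}
    for city in set(plus) | set(minus):
        if plus[city] > min_plus_cs:
            total = plus[city] + minus[city]
            table[city] = (plus[city], minus[city], minus[city] / total)
    return table


records = (
    [("Montreal", "+CS")] * 3 + [("Montreal", "-CS")]
    + [("Quebec", "+CS")] + [("Quebec", "-CS")] * 3
    + [("Gatineau", "+CS")] * 2 + [("Gatineau", "-CS")] * 2
)
print(sorted(cs_by_city(records, min_plus_cs=1).items()))
# [('Gatineau', (2, 2, 0.5)), ('Montreal', (3, 1, 0.25))]
```

<p>The lowered threshold in the usage line only keeps the toy example small; with the default of 150, none of these cities would be retained.</p>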
<p>In addition, I calculated +/&#x2212;CS-Engl. and +/&#x2212;CS-French as well as LCB (English and French) per urban zone or location in Greater Montreal. For this calculation, I used the geographic extent of Greater Montreal ([&#x2212;74.031218, &#x2212;73.284148, 45.833152, 45.323716], i.e., western and eastern longitude, northern and southern latitude) to select, by means of their geolocation, the tweets posted within this extent. I divided the extent into a grid of 50&#x2009;&#x00D7;&#x2009;50 equal bins, i.e., 2,500 bins for Greater Montreal. The size of a single bin corresponds more or less to the size of a city block (<xref ref-type="bibr" rid="ref15">Kellert and Matlis, 2022</xref>). The absolute frequency counts of +/&#x2212;CS-Engl. and +/&#x2212;CS-French as well as LCB (English and French) per urban zone or location in Greater Montreal are provided in <xref rid="SM1" ref-type="supplementary-material">Supplementary Files 2</xref>&#x2013;<xref rid="SM1" ref-type="supplementary-material">4</xref>.</p>
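<p>The binning step can be sketched as a plain grid lookup. The sketch reads the extent as western and eastern longitude followed by northern and southern latitude, as the signs and magnitudes of the four values suggest, and assigns each point to one of 50&#x2009;&#x00D7;&#x2009;50 equal cells. With this extent, a cell spans about 0.0149&#x00B0; of longitude and 0.0102&#x00B0; of latitude, roughly 1.2 km by 1.1 km at Montreal&#x2019;s latitude. The function names are hypothetical.</p>

```python
from collections import Counter

# Greater Montreal extent from the text, [-74.031218, -73.284148, 45.833152, 45.323716],
# read here as [west longitude, east longitude, north latitude, south latitude].
WEST, EAST, NORTH, SOUTH = -74.031218, -73.284148, 45.833152, 45.323716
N_BINS = 50  # 50 x 50 = 2,500 bins


def bin_index(lat, lon, n_bins=N_BINS):
    """Return the (i, j) bin of a point (i: west to east, j: south to north), or None if outside."""
    inside = lon >= WEST and EAST >= lon and lat >= SOUTH and NORTH >= lat
    if not inside:
        return None
    i = int((lon - WEST) / (EAST - WEST) * n_bins)
    j = int((lat - SOUTH) / (NORTH - SOUTH) * n_bins)
    # Points lying exactly on the eastern or northern edge go into the last bin.
    return min(i, n_bins - 1), min(j, n_bins - 1)


def bin_counts(points):
    """Count (lat, lon) points per bin, ignoring points outside Greater Montreal."""
    counts = Counter()
    for lat, lon in points:
        idx = bin_index(lat, lon)
        if idx is not None:
            counts[idx] += 1
    return counts


print(bin_index(45.523081, -73.588132))  # (29, 19)
```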
<p>Each such cluster of data (i.e., all data pertaining to one city or to one urban zone) can be referred to as a &#x201C;container&#x201D; or &#x201C;bin.&#x201D; To identify the bins with the largest differences between +CS and &#x2013;CS, I used the differential distribution method (<xref ref-type="bibr" rid="ref15">Kellert and Matlis, 2022</xref>).</p>
<p>The differential distribution compares the geographic distribution of one linguistic variant (e.g., tweets with English words, or&#x2009;+CS-Engl.) with the geographic distribution of another linguistic variant (e.g., tweets with French words with the same meaning, or &#x2013;CS-Engl.). If the two distributions (+CS-Engl. and &#x2013;CS-Engl.) have the same geographic shape, there is no difference between them. In short, the differential distribution measures the difference between the two distributions bin by bin. The following mathematical definition of the differential distribution is a modified version of that found in <xref ref-type="bibr" rid="ref15">Kellert and Matlis (2022)</xref>.</p>
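<p>Ahead of the formal definition in the next paragraph, the per-bin comparison can be sketched in a few lines: each count is divided by the total of its own distribution, and the reference fraction is subtracted from the target fraction, giving &#x0394;f<sub>i,j</sub> = c<sup>T</sup><sub>i,j</sub>/N<sup>T</sup> &#x2212; c<sup>R</sup><sub>i,j</sub>/N<sup>R</sup>. The bin counts are invented for illustration, the function name is hypothetical, and this is not the implementation used by Kellert and Matlis (2022).</p>

```python
from collections import Counter


def differential_distribution(target_counts, reference_counts):
    """Delta f_ij = c^T_ij / N^T - c^R_ij / N^R for every bin present in either distribution.

    Counts map a bin (i, j) to a tweet count; missing bins count as zero,
    so no special treatment of empty bins is needed.
    """
    n_target = sum(target_counts.values())
    n_reference = sum(reference_counts.values())
    if n_target == 0 or n_reference == 0:
        raise ValueError("both distributions need at least one tweet")
    bins = set(target_counts) | set(reference_counts)
    return {
        b: target_counts.get(b, 0) / n_target - reference_counts.get(b, 0) / n_reference
        for b in bins
    }


plus_cs = Counter({(0, 0): 3, (0, 1): 1})   # target: +CS-Engl. tweets per bin, N^T = 4
minus_cs = Counter({(0, 0): 2, (1, 1): 6})  # reference: -CS-Engl. tweets per bin, N^R = 8
delta = differential_distribution(plus_cs, minus_cs)
ranked = sorted(delta.items(), key=lambda item: item[1], reverse=True)
print(ranked)  # [((0, 0), 0.5), ((0, 1), 0.25), ((1, 1), -0.75)]
```

<p>Sorting the bins by &#x0394;f<sub>i,j</sub> puts first the bins where the target tweets are most over-represented. Because both sets of fractions sum to one over all bins, the &#x0394;f<sub>i,j</sub> values always sum to zero.</p>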
<p>We first define normalized tweet distributions by: <inline-formula><mml:math id="M1"><mml:mrow><mml:msubsup><mml:mi>f</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>T</mml:mi></mml:msubsup><mml:mo>&#x2261;</mml:mo><mml:msubsup><mml:mi>c</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>T</mml:mi></mml:msubsup><mml:mo>/</mml:mo><mml:msup><mml:mi>N</mml:mi><mml:mi>T</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M2"><mml:mrow><mml:msubsup><mml:mi>f</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>R</mml:mi></mml:msubsup><mml:mo>&#x2261;</mml:mo><mml:msubsup><mml:mi>c</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>R</mml:mi></mml:msubsup><mml:mo>/</mml:mo><mml:msup><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="M3"><mml:mrow><mml:msup><mml:mi>N</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mo>&#x2261;</mml:mo><mml:munder><mml:mstyle displaystyle="true"><mml:mo>&#x2211;</mml:mo></mml:mstyle><mml:mi>i</mml:mi></mml:munder><mml:munder><mml:mstyle displaystyle="true"><mml:mo>&#x2211;</mml:mo></mml:mstyle><mml:mi>j</mml:mi></mml:munder><mml:msubsup><mml:mi>c</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>T</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M4"><mml:mrow><mml:msup><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msup><mml:mo>&#x2261;</mml:mo><mml:munder><mml:mstyle displaystyle="true"><mml:mo>&#x2211;</mml:mo></mml:mstyle><mml:mi>i</mml:mi></mml:munder><mml:munder><mml:mstyle 
displaystyle="true"><mml:mo>&#x2211;</mml:mo></mml:mstyle><mml:mi>j</mml:mi></mml:munder><mml:msubsup><mml:mi>c</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>R</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> are the total number of tweets in the target (+CS) and reference (&#x2013;CS) distributions, respectively. The quantities <inline-formula><mml:math id="M5"><mml:mrow><mml:msubsup><mml:mi>f</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>T</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M6"><mml:mrow><mml:msubsup><mml:mi>f</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>R</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> represent the fraction of tweets in the (i,j)th bin for the target and reference cases, respectively. The comparison between the two distributions is then done by calculating the difference in the tweet fraction per bin: <inline-formula><mml:math id="M7"><mml:mrow><mml:mi>&#x0394;</mml:mi><mml:mi mathvariant="normal"></mml:mi><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2261;</mml:mo><mml:msubsup><mml:mi>f</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>T</mml:mi></mml:msubsup><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mi>f</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>R</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula>, which is referred to as &#x201C;differential distribution&#x201D; in <xref ref-type="bibr" rid="ref15">Kellert and Matlis (2022)</xref>. 
This quantity can be interpreted as follows: bins with positive values of <inline-formula><mml:math id="M8"><mml:mrow><mml:mi>&#x0394;</mml:mi><mml:mi mathvariant="normal"></mml:mi><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> over-represent the target tweets relative to the reference tweet distribution, while bins with negative values under-represent them. Since bins with equal representation of tweets have <inline-formula><mml:math id="M9"><mml:mrow><mml:mi>&#x0394;</mml:mi><mml:mi mathvariant="normal"></mml:mi><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula> regardless of the total number of tweets of each variant, small variations in the degree of representation can be resolved. The metric requires no special treatment for bins with zero counts: it yields larger values of <inline-formula><mml:math id="M10"><mml:mrow><mml:mi>&#x0394;</mml:mi><mml:mi mathvariant="normal"></mml:mi><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> for larger variations, even when one of the two tweet counts is zero. Moreover, because each count is divided by the total number of tweets of its variant, bins with few tweets can only produce small differences, so noise associated with low-count bins is suppressed. 
A consequence of the normalization is that the differences sum to exactly zero, <inline-formula><mml:math id="M11"><mml:mrow><mml:munder><mml:mstyle displaystyle="true"><mml:mo>&#x2211;</mml:mo></mml:mstyle><mml:mi>i</mml:mi></mml:munder><mml:munder><mml:mstyle displaystyle="true"><mml:mo>&#x2211;</mml:mo></mml:mstyle><mml:mi>j</mml:mi></mml:munder><mml:mi>&#x0394;</mml:mi><mml:mi mathvariant="normal"></mml:mi><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula>, so that for any two distributions the positive and negative contributions balance each other. Furthermore, for two distributions of exactly the same shape but different total numbers of tweets, every bin is identically zero (<inline-formula><mml:math id="M12"><mml:mrow><mml:mi>&#x0394;</mml:mi><mml:mi mathvariant="normal"></mml:mi><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula>).</p>
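<p>The definition above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the code used in this study: the function name differential_distribution and the 2 x 3 count grids are hypothetical. The sketch also checks the two properties stated above, namely that the differences sum to zero and that two distributions of the same shape but different totals give zero in every bin; note how the bin with zero target tweets needs no special treatment.</p>

```python
import numpy as np


def differential_distribution(target_counts, reference_counts):
    """Return Delta f_ij = c^T_ij / N^T - c^R_ij / N^R for two grids of tweet counts.

    Assumes both grids have the same shape and contain at least one tweet each.
    """
    c_t = np.asarray(target_counts, dtype=float)
    c_r = np.asarray(reference_counts, dtype=float)
    if c_t.shape != c_r.shape:
        raise ValueError("target and reference grids must have the same shape")
    f_t = c_t / c_t.sum()  # fraction of target (+CS) tweets per bin
    f_r = c_r / c_r.sum()  # fraction of reference (-CS) tweets per bin
    return f_t - f_r


# Hypothetical 2 x 3 grids of tweet counts (not data from the study).
target = np.array([[30, 0, 10], [5, 5, 50]])
reference = np.array([[10, 4, 10], [5, 1, 10]])

delta = differential_distribution(target, reference)
print(delta)
print("sums to zero:", np.isclose(delta.sum(), 0.0))

# Same shape, three times as many tweets: every bin is zero.
print("scale invariant:", np.allclose(differential_distribution(3 * reference, reference), 0.0))
```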
<p>Let us go through the calculation using a hypothetical example. Assume that we have five cities with a total of 80 tweets with French words (&#x2013;CS-Engl.) and a total of 240 tweets with English words (+CS-Engl.) from my list of word pairs. Let us further assume that a particular city x has 10 tweets with French words (&#x2013;CS-Engl.) and 30 with English words (+CS-Engl.). Applying the differential distribution defined above, we get &#x0394;&#x2009;=&#x2009;30/240&#x2009;&#x2212;&#x2009;10/80&#x2009;=&#x2009;0.125&#x2009;&#x2212;&#x2009;0.125&#x2009;=&#x2009;0. The result of zero means that city x is prominent for neither +CS-Engl. nor &#x2013;CS-Engl., as the difference between the two is zero. If a city has a positive value, it is more prominent for +CS-Engl., and if it has a negative value, it is more prominent for &#x2013;CS-Engl. As will be shown in section 4.1, this calculation can overemphasize differences between distributions in cases where simple relative frequencies (that is, <italic>x</italic> number of observations divided by the sum of all observations) show only a small variation.</p>
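<p>The worked example can be reproduced with exact fractions. In this sketch, only the counts for city x (30 and 10) and the totals (240 and 80) come from the text; the counts for the four other cities (labeled a to d) are hypothetical and were chosen only so that the totals match.</p>

```python
from fractions import Fraction

# Only city "x" and the totals (240, 80) come from the example in the text;
# cities "a" to "d" are hypothetical fillers that make the totals add up.
plus_cs = {"x": 30, "a": 100, "b": 50, "c": 40, "d": 20}   # +CS-Engl. (English words)
minus_cs = {"x": 10, "a": 20, "b": 25, "c": 15, "d": 10}   # -CS-Engl. (French words)

n_plus = sum(plus_cs.values())    # total +CS-Engl. tweets
n_minus = sum(minus_cs.values())  # total -CS-Engl. tweets

# Exact differential distribution per city: share of +CS minus share of -CS.
delta = {
    city: Fraction(plus_cs[city], n_plus) - Fraction(minus_cs[city], n_minus)
    for city in plus_cs
}

for city, d in delta.items():
    if d > 0:
        note = "more prominent for +CS-Engl."
    elif d == 0:
        note = "prominent for neither variant"
    else:
        note = "more prominent for -CS-Engl."
    print(f"{city}: delta = {float(d):+.4f} ({note})")
```

Using Fraction avoids rounding noise, so city x comes out as exactly zero and the five values sum to exactly zero.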
<p>I visualized the results of the differential distribution on geographic maps by marking a location in red if its differential distribution value for CS-Engl. was positive, that is, if the location accounted for a larger share of all tweets with English words (+CS-Engl.) than of all tweets with the corresponding French words (&#x2013;CS-Engl.). Otherwise, I marked the location in blue (see the visualization technique in <xref ref-type="bibr" rid="ref15">Kellert and Matlis, 2022</xref>). I marked a location in purple if CS-French was more prominent than the absence of CS-French, and in green otherwise (see <xref rid="fig13" ref-type="fig">Figure 13</xref>). The size of each circle corresponds to the magnitude of the delta: the larger the red circle, the more positive the delta of CS-Engl., and the larger the blue circle, the more negative the delta. The same applies to CS-French, that is, the larger the purple circle, the higher the intensity of CS-French. For the visualization, I used Cartopy (see <xref ref-type="bibr" rid="ref6">Cartopy v0.11.2. Met Office, 2014</xref>), an open-source (that is, freely available and modifiable) software library that maps coordinates onto OpenStreetMap, which is also freely available. All figures in this document were produced using the base map and data from OpenStreetMap and OpenStreetMap Foundation under the Open Database License.</p>
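<p>The color and size coding described above can be sketched as follows. The mapping from &#x0394;f to color follows the text, and green for &#x2013;CS-French follows the caption of Figure 13. However, the size factor scale=20000 is a hypothetical choice, and plot_on_osm is an illustrative helper, not the script used to produce the figures. The helper uses Cartopy's OpenStreetMap tile source (cartopy.io.img_tiles.OSM) and needs Cartopy, Matplotlib, and network access, whereas marker_style itself needs no third-party packages.</p>

```python
def marker_style(delta, variant="CS-Engl.", scale=20000.0):
    """Map a differential-distribution value to (color, marker area).

    Red/blue mark positive/non-positive values for CS-Engl.; purple/green do
    the same for CS-French. `scale` is a hypothetical size factor.
    """
    palette = {"CS-Engl.": ("red", "blue"), "CS-French": ("purple", "green")}
    positive_color, negative_color = palette[variant]
    color = positive_color if delta > 0 else negative_color
    return color, abs(delta) * scale


def plot_on_osm(lons, lats, deltas, extent, variant="CS-Engl.", zoom=10):
    """Scatter circles over OpenStreetMap tiles (illustrative sketch).

    `extent` is (lon_min, lon_max, lat_min, lat_max) in degrees.
    """
    import matplotlib.pyplot as plt
    import cartopy.crs as ccrs
    import cartopy.io.img_tiles as cimgt

    tiles = cimgt.OSM()                       # OpenStreetMap tile source
    ax = plt.axes(projection=tiles.crs)
    ax.set_extent(extent, crs=ccrs.PlateCarree())
    ax.add_image(tiles, zoom)                 # draw the base map
    styles = [marker_style(d, variant) for d in deltas]
    ax.scatter(lons, lats,
               c=[color for color, _ in styles],
               s=[size for _, size in styles],
               alpha=0.6, transform=ccrs.PlateCarree())
    return ax
```

The imports inside plot_on_osm are deliberately local, so the color and size mapping can be used and checked without a mapping stack installed.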
</sec>
<sec id="sec11">
<label>3.2.6.</label>
<title>Preprocessing/prefiltering</title>
<p>Montreal is a popular tourist destination: tourists from France might tweet in French, and tourists from various other countries might tweet in English. Tourists could thus distort the statistics.</p>
<p>In order to check how much tourists influence the differential distribution of LCB in Montreal, I performed an experiment. I defined local users as those who created their profile in Montreal, as encoded by the &#x201C;user location&#x201D; field on Twitter. I assumed that the user location very likely represents the &#x201C;place of residence&#x201D; of the Twitter user (see <xref ref-type="bibr" rid="ref15">Kellert and Matlis, 2022</xref>). I checked this assumption on a random set of Twitter profiles by looking at other cues that might provide evidence of the user&#x2019;s origin, such as the profile description and the content of the tweets. Consider, for example, a user with an account from Montreal. On her profile, she describes herself as a <italic>Passionate Montrealer</italic> and says that she <italic>tweets in English and French</italic>. Her bilingualism is also visible in the profile description itself, which contains Code-Switching: she describes herself as a <italic>Creative mind</italic> and then switches into French: <italic>joueuse de tennis</italic> &#x201C;tennis player.&#x201D; Using her user ID, I was able to identify her tweets and to check whether she tweeted mostly from Montreal. I filtered out users who tweeted from Montreal but did not create their profile in that city, assuming that these users were temporarily in Montreal as tourists.</p>
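<p>A minimal sketch of this filtering step, assuming tweets are available as Python dictionaries. The field names (user_location, tweet_city) and the rule that a profile counts as local if its location string mentions &#x201C;Montreal&#x201D; or &#x201C;Montr&#x00E9;al&#x201D; are assumptions for illustration only; the exact matching rule used in the study is not specified here, and real geotagged tweets carry coordinates rather than a city name.</p>

```python
# Hypothetical tweet records; the field names are illustrative, not Twitter's API schema.
tweets = [
    {"user_id": 1, "user_location": "Montréal, Québec", "tweet_city": "Montreal"},
    {"user_id": 1, "user_location": "Montréal, Québec", "tweet_city": "Montreal"},
    {"user_id": 2, "user_location": "Paris, France", "tweet_city": "Montreal"},
    {"user_id": 3, "user_location": None, "tweet_city": "Montreal"},
    {"user_id": 4, "user_location": "Montreal", "tweet_city": "Laval"},
]


def is_local_profile(user_location, city_names=("montreal", "montréal")):
    """Assumed rule: a profile is local if its location string mentions the city."""
    location = (user_location or "").casefold()
    return any(name in location for name in city_names)


def local_tweets(records, city="Montreal"):
    """Keep tweets posted from `city` by users whose profile location is that city.

    Users tweeting from the city with another (or no) profile location are
    treated as visitors and dropped.
    """
    return [t for t in records
            if t["tweet_city"] == city and is_local_profile(t["user_location"])]


local = local_tweets(tweets)
print([t["user_id"] for t in local])
```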
<p>I then tested whether the LCB or language choice of bilingual users differed between local users and all users (including tourists). There turned out to be no noticeable difference in the spatial distribution of tweets between local and all bilingual users from Montreal (see section 4.2). There is also almost no difference in the number of tweets: 117,514 tweets from local bilingual users vs. 121,109 tweets from all bilingual users. This result suggests that the majority of French-English bilingual users are local users, which makes intuitive sense, because someone who tweets in French and English in Montreal is very likely someone from a multilingual region or city like Montreal.</p>
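<p>The comparison between local and all bilingual users can be sketched as follows, assuming (as in section 4.2) that a bilingual user is one who has posted at least one English and one French tweet. The toy records are hypothetical; the last computation only checks the reported totals, of which local bilingual users account for about 97%.</p>

```python
from collections import defaultdict


def bilingual_users(records, langs=("en", "fr")):
    """Users with at least one tweet in every language listed in `langs`."""
    seen = defaultdict(set)
    for t in records:
        seen[t["user_id"]].add(t["lang"])
    return {user for user, used in seen.items() if set(langs).issubset(used)}


def bilingual_tweet_count(records, langs=("en", "fr")):
    """Number of English or French tweets posted by bilingual users."""
    users = bilingual_users(records, langs)
    return sum(1 for t in records if t["user_id"] in users and t["lang"] in langs)


# Hypothetical records; "local" would come from the profile-location filter.
all_tweets = [
    {"user_id": 1, "lang": "fr", "local": True},
    {"user_id": 1, "lang": "en", "local": True},
    {"user_id": 2, "lang": "en", "local": True},    # monolingual: not counted
    {"user_id": 3, "lang": "fr", "local": False},   # non-local bilingual user
    {"user_id": 3, "lang": "en", "local": False},
    {"user_id": 3, "lang": "en", "local": False},
]
n_local = bilingual_tweet_count([t for t in all_tweets if t["local"]])
n_all = bilingual_tweet_count(all_tweets)

# Reported totals: share of bilingual users' tweets that come from local users.
local_share = 117_514 / 121_109
print(f"toy data: {n_local} of {n_all} tweets; reported share: {local_share:.1%}")
```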
</sec>
</sec>
</sec>
<sec id="sec12" sec-type="results">
<label>4.</label>
<title>Results</title>
<sec id="sec13">
<label>4.1.</label>
<title>Cities and population differences: Testing hypothesis 1</title>
<p>The results confirm Hypothesis 1 for most of the cities that provide enough data for a statistical comparison.</p>
<p><xref rid="fig4" ref-type="fig">Figure 4</xref> shows the differential distribution of CS-Engl. in the province of Quebec and its surroundings, such as the province of Ontario. Only a few cities show clear differences, marked by easily identifiable big red and blue circles (nine cities in total). Many other cities, marked as small blue or red points, do not provide enough data points to be statistically relevant; that is, they have very few tweets with +/&#x2013;CS-Engl. The statistical numbers for these nine cities are provided in <xref rid="fig2" ref-type="fig">Figure 2</xref>. The biggest red circle in <xref rid="fig4" ref-type="fig">Figure 4</xref>, representing the city with the most prominent use of CS-Engl., is Dorval, a small city in the west of the island of Montreal. The second largest red circle is the city of Toronto, in the province of Ontario, and the third largest is the city of Gatineau, on the border between the provinces of Ontario and Quebec. The largest blue circle, marking the most prominent absence of English words in French tweets (&#x2013;CS-Engl.), corresponds to the city of Quebec, which is the city with the least CS compared to all other cities. <xref rid="fig2" ref-type="fig">Figure 2</xref> also provides the relative frequencies of &#x2013;CS-Engl. per city and the percentages of anglophone inhabitants published by Statistics Canada (see <xref ref-type="bibr" rid="ref33">Statistics Canada, 2011</xref>). <xref rid="fig2" ref-type="fig">Figure 2</xref> shows that cities with a relative frequency (RF) of &#x2013;CS-Engl. lower than 0.8 (&#x003C;0.8) have high percentages of anglophone inhabitants (over 10%), whereas cities with a higher RF (&#x003E;0.9) mostly have lower percentages of anglophone inhabitants (under 10%). However, the city of Montreal shows comparatively less CS-Engl. than the city of Gatineau, which is not expected under Hypothesis 1. In contrast, the city of Montreal shows relatively more CS-Engl. than the city of Quebec, as expected under Hypothesis 1, according to both calculations: the differential distribution (see <xref rid="fig4" ref-type="fig">Figure 4</xref>) and the relative frequency calculation (see <xref rid="fig2" ref-type="fig">Figure 2</xref>). However, the visualization in <xref rid="fig4" ref-type="fig">Figure 4</xref> overemphasizes this difference, as the difference in RF between Montreal and the city of Quebec is very small (0.05 according to <xref rid="fig2" ref-type="fig">Figure 2</xref>).</p>
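<p>The following sketch, with entirely hypothetical counts (not the values in Figure 2), illustrates why the differential distribution can overemphasize a small difference in relative frequency. Cities M and Q differ by only 0.05 in the RF of &#x2013;CS-Engl., yet because both contribute large shares of all tweets, their differential distribution values have opposite signs and differ by almost 0.3; the small city S, with a high share of +CS-Engl., is included only to make the totals less trivial.</p>

```python
# Hypothetical (+CS-Engl., -CS-Engl.) counts for three cities; not the Figure 2 data.
counts = {"City M": (100, 900), "City Q": (50, 950), "City S": (40, 10)}

n_plus = sum(plus for plus, _ in counts.values())     # all +CS-Engl. tweets
n_minus = sum(minus for _, minus in counts.values())  # all -CS-Engl. tweets

results = {}
for city, (plus, minus) in counts.items():
    rf_minus = minus / (plus + minus)          # relative frequency of -CS-Engl.
    delta = plus / n_plus - minus / n_minus    # differential distribution
    results[city] = (rf_minus, delta)
    print(f"{city}: RF(-CS-Engl.) = {rf_minus:.2f}, delta = {delta:+.3f}")
```

The relative frequency only looks inside each city, whereas the differential distribution weighs each city by its share of all tweets, so high-volume cities dominate the map.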
<fig position="float" id="fig4">
<label>Figure 4</label>
<caption>
<p>Differential distribution of +CS-Engl. (red) and &#x2013;CS-Engl. (blue) in French tweets in Quebec and Ontario. The city of Dorval is the most prominent city for +CS-Engl. in French tweets, followed by Toronto and the city of Gatineau. Base map and data from OpenStreetMap and OpenStreetMap Foundation under the Open Database License.</p>
</caption>
<graphic xlink:href="fpsyg-14-1137038-g004.tif"/>
</fig>
<p>To summarize, we observe a higher proportion of CS-Engl. in Dorval (a city on the island of Montreal), in cities on the border with Ontario such as Gatineau, and in the city of Toronto in Ontario than in the city of Quebec, which shows the least use of CS. The city of Montreal shows more use of CS-Engl. than the city of Quebec, which is expected, but less use of CS-Engl. than Gatineau, which is unexpected.</p>
</sec>
<sec id="sec14">
<label>4.2.</label>
<title>Influence of population differences in Montreal on LCB: Testing hypotheses 2 and 3</title>
<p>There are in total 8,974 local users in Montreal in my tweet corpus, 2,694 of whom (about 30%) tweet in both languages. That is, there are 2,694 bilingual users, who posted 34,711 tweets in French and 82,803 in English (<xref rid="fig3" ref-type="fig">Figure 3</xref>). This result is consistent with <xref ref-type="bibr" rid="ref26">Mocanu et al.&#x2019;s (2013)</xref> observation that there is a general trend to write more tweets in English than in French in Montreal.</p>
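<p>The proportions behind this paragraph can be recomputed from the reported counts: bilingual users make up about 30% of local users, English accounts for roughly 70% of their tweets (about 2.4 English tweets per French tweet), and the French and English counts add up to the 117,514 tweets from local bilingual users mentioned in section 3.2.6.</p>

```python
# Counts reported in the text for local users in Montreal.
local_users, bilingual = 8_974, 2_694
fr_tweets, en_tweets = 34_711, 82_803

bilingual_share = bilingual / local_users
english_share = en_tweets / (fr_tweets + en_tweets)
en_per_fr = en_tweets / fr_tweets

print(f"bilingual users: {bilingual_share:.1%} of local users")
print(f"English tweets: {english_share:.1%} of bilingual users' tweets")
print(f"English tweets per French tweet: {en_per_fr:.2f}")
```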
<p><xref rid="fig5" ref-type="fig">Figure 5</xref> shows the differential distribution of English (red) and French tweets (blue) in Greater Montreal among bilingual users. <xref rid="fig5" ref-type="fig">Figure 5A</xref> shows the distribution in a square format, which emphasizes the geographic pattern of the distribution. The pattern is very clear: English is distributed more in the western part of the city, and French in the eastern part. Moreover, we see more blue than red outside the island of Montreal, which indicates that the difference between the island and the periphery plays a role in the distribution of LCB. These results confirm Hypothesis 2.</p>
<fig position="float" id="fig5">
<label>Figure 5</label>
<caption>
<p>English tweets (red) and French tweets (blue) posted by &#x201C;local&#x201D; bilingual users in Greater Montreal. (<bold>A</bold>, left): without intensity. (<bold>B</bold>, right): with intensity. Base map and data from OpenStreetMap and OpenStreetMap Foundation under the Open Database License.</p>
</caption>
<graphic xlink:href="fpsyg-14-1137038-g005.tif"/>
</fig>
<p><xref rid="fig5" ref-type="fig">Figure 5B</xref> shows the same differential distribution as <xref rid="fig5" ref-type="fig">Figure 5A</xref>, but with an intensity marking, which shows where English and French tweets are most strongly concentrated. English tweets are heavily concentrated in the old town of the city, and a few big red circles are visible in the city of Laval. The size of the circles helps us to evaluate the strength of the difference (<xref ref-type="bibr" rid="ref15">Kellert and Matlis, 2022</xref>). In contrast to <xref rid="fig5" ref-type="fig">Figure 5A</xref>, <xref rid="fig5" ref-type="fig">Figure 5B</xref> shows that the difference in the distribution of English tweets between the island and the periphery is rather weak. This observation is consistent with the numbers from <xref ref-type="bibr" rid="ref33">Statistics Canada (2011)</xref> in <xref rid="fig1" ref-type="fig">Figure 1</xref>, which show a difference of approximately 5 percentage points in the anglophone population inside and outside the island of Montreal (16.64% vs. 11.62% in +/&#x2212; Greater Montreal).</p>
<p><xref rid="fig6" ref-type="fig">Figure 6</xref> shows the underlying data of <xref rid="fig5" ref-type="fig">Figure 5</xref>, representing the frequencies of English tweets (y-axis) and French tweets (x-axis) in Greater Montreal per bin (2,500 bins in total). The key observation in <xref rid="fig6" ref-type="fig">Figure 6</xref> is that not all points lie along the red correlation line; points on this line would indicate locations with the same relative numbers of English and French tweets. If all data points followed the correlation line, this would indicate that location does not matter for LCB, contrary to Hypotheses 2 and 3 in section 2. Instead, the plot in <xref rid="fig6" ref-type="fig">Figure 6</xref> shows that some data points follow the correlation line, but others do not. Some data points represent locations with strong preferences for English, which lie close to the y-axis, and others represent locations with strong preferences for French, which lie close to the x-axis. These locations correspond to the biggest red or blue circles in <xref rid="fig5" ref-type="fig">Figure 5B</xref>. However, some locations do not show any difference in the relative frequency of English and French tweets, which means that they do not contribute to the spatial pattern of LCB.</p>
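<p>A minimal sketch of how tweets can be assigned to 2,500 bins (a 50 x 50 grid) and compared with a line of proportional counts. Two assumptions apply: the bounding box for Greater Montreal is a rough, hypothetical one, and the red line in Figure 6 is taken to be the proportionality line y = (N_en/N_fr)x, along which a bin's shares of English and French tweets are equal (&#x0394;f = 0). If the figure instead shows a fitted regression line, the residuals below would differ. The coordinates are randomly generated, not study data.</p>

```python
import numpy as np

# Rough, hypothetical bounding box for Greater Montreal: lon_min, lon_max, lat_min, lat_max.
EXTENT = (-74.0, -73.4, 45.3, 45.8)


def bin_counts(lons, lats, extent=EXTENT, n=50):
    """Count tweets on an n x n grid (n=50 gives 2,500 bins)."""
    lon_min, lon_max, lat_min, lat_max = extent
    counts, _, _ = np.histogram2d(lons, lats, bins=n,
                                  range=[[lon_min, lon_max], [lat_min, lat_max]])
    return counts


def residuals_from_proportional_line(en_counts, fr_counts):
    """English count per bin minus the count expected if every bin had the
    same English/French ratio as the whole area (line y = slope * x)."""
    en = en_counts.ravel()
    fr = fr_counts.ravel()
    slope = en.sum() / fr.sum()
    return en - slope * fr


# Synthetic tweets: English points drawn further west than French ones.
rng = np.random.default_rng(0)
en_counts = bin_counts(rng.uniform(-74.0, -73.6, 700), rng.uniform(45.3, 45.8, 700))
fr_counts = bin_counts(rng.uniform(-73.8, -73.4, 300), rng.uniform(45.3, 45.8, 300))
res = residuals_from_proportional_line(en_counts, fr_counts)

# Bins far above the line lean English (near the y-axis); far below lean French.
print("most English-leaning bin:", int(res.argmax()), "most French-leaning bin:", int(res.argmin()))
```

Each residual equals N_en times the bin's differential distribution value, so this scatter-plot view and the circle maps encode the same comparison.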
<fig position="float" id="fig6">
<label>Figure 6</label>
<caption>
<p>Frequencies of English (y-axis) and French Tweets (x-axis) per urban zone in GM. Red dashed line corresponds to a correlation line. The points following the correlation line represent locations with no difference in language choice.</p>
</caption>
<graphic xlink:href="fpsyg-14-1137038-g006.tif"/>
</fig>
<p>Finally, <xref rid="fig7" ref-type="fig">Figure 7</xref> compares two distributions of English and French tweets. <xref rid="fig7" ref-type="fig">Figure 7A</xref> shows the distribution of tweets produced by local bilingual users (repeated from <xref rid="fig5" ref-type="fig">Figure 5B</xref>), whereas <xref rid="fig7" ref-type="fig">Figure 7B</xref> shows the distribution of tweets produced by all bilingual users, without filtering out non-local bilingual users (see section 3.2.6 on preprocessing and filtering). The pattern in <xref rid="fig7" ref-type="fig">Figure 7B</xref> is almost the same as that in <xref rid="fig7" ref-type="fig">Figure 7A</xref>, which indicates that non-local bilingual users do not noticeably change the spatial pattern of the tweet distribution.</p>
<fig position="float" id="fig7">
<label>Figure 7</label>
<caption>
<p>Comparison of English tweets (red) and French tweets (blue) posted by bilingual local and all users in Greater Montreal. (<bold>A</bold>, left): &#x201C;local&#x201D; bilingual users (see also <xref rid="fig5" ref-type="fig">Figure 5B</xref>). (<bold>B</bold>, right): &#x201C;all&#x201D; bilingual users. Base map and data from OpenStreetMap and OpenStreetMap Foundation under the Open Database License.</p>
</caption>
<graphic xlink:href="fpsyg-14-1137038-g007.tif"/>
</fig>
<p>To summarize the results, some locations do show a division of the city into two linguistic zones among bilinguals, with French in the eastern part of the island and English in the western part, as well as more English on the island than in the periphery. However, other locations do not show any difference in the distribution of French and English tweets. The latter observation probably reflects the effect that <xref ref-type="bibr" rid="ref21">Lamarre et al. (2002)</xref> observed among bilingual Montrealers using a different method.</p>
</sec>
<sec id="sec15">
<label>4.3.</label>
<title>Influence of population differences in Montreal on CS: Testing hypotheses 4 and 5</title>
<p>This section shows that CS depends on the location in Montreal, which confirms Hypotheses 4 and 5, but the strength of the pattern is rather weak.</p>
<p>There is a total of 57,415 georeferenced French tweets from local users in Greater Montreal that contain one of the words from my list (see <xref rid="fig3" ref-type="fig">Figure 3</xref>): 15,020 tweets contain English words from my list (+CS-Engl.), and 42,395 tweets contain French words from my list (&#x2013;CS-Engl.). Despite this large number of tweets with English words (+CS-Engl.), these tweets are not evenly distributed across the island of Montreal and its periphery, as <xref rid="fig8" ref-type="fig">Figure 8</xref> shows. The square format in <xref rid="fig8" ref-type="fig">Figure 8A</xref> clearly shows a spatial pattern: CS-Engl. is heavily concentrated in the western part of the city, as shown by the color red. Consequently, CS-Engl. is mostly used in the area with the larger anglophone population according to <xref ref-type="bibr" rid="ref33">Statistics Canada (2011)</xref>.</p>
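<p>The classification of French tweets into +CS-Engl. and &#x2013;CS-Engl. can be sketched as follows. The three word pairs and the example tweets (apart from the <italic>Bye le frette</italic> example quoted later in this section) are hypothetical stand-ins for the study's word-pair list. The rule that a tweet containing any listed English word counts as +CS-Engl., even if it also contains a listed French word, is an assumption made for this illustration.</p>

```python
import re
from collections import Counter

# Hypothetical word pairs (English word, French counterpart); not the study's list.
WORD_PAIRS = [("my", "mon"), ("bye", "salut"), ("happy", "heureux")]


def classify(tweet, pairs=WORD_PAIRS):
    """Return '+CS' if a listed English word occurs, '-CS' if only a listed
    French word occurs, and None if the tweet contains no listed word."""
    tokens = set(re.findall(r"\w+", tweet.casefold()))
    english = {en for en, _ in pairs}
    french = {fr for _, fr in pairs}
    if not tokens.isdisjoint(english):
        return "+CS"
    if not tokens.isdisjoint(french):
        return "-CS"
    return None


tweets = [
    "Bye le frette et la neige!",
    "Salut la gang, bonne fin de semaine!",
    "Je suis heureux aujourd'hui",
    "On part pour quelques jours à Québec!",
]
labels = [classify(t) for t in tweets]
counts = Counter(label for label in labels if label is not None)
print(labels, dict(counts))
```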
<fig position="float" id="fig8">
<label>Figure 8</label>
<caption>
<p>Differential distribution of +CS-Engl. (red) and &#x2013;CS-Engl. (blue) in Greater Montreal. (<bold>A</bold>, left): without intensity. (<bold>B</bold>, right): with intensity. Base map and data from OpenStreetMap and OpenStreetMap Foundation under the Open Database License.</p>
</caption>
<graphic xlink:href="fpsyg-14-1137038-g008.tif"/>
</fig>
<p><xref rid="fig8" ref-type="fig">Figure 8B</xref> shows the same distribution of CS as in <xref rid="fig8" ref-type="fig">Figure 8A</xref>, but with an intensity marker. The biggest red circles are located at the airport and in the old town. The question is why these contexts are especially prominent for English words in French tweets. One possible explanation is that at the airport, a particularly large number of goodbyes and greetings are expressed in English. Indeed, it is noticeable that many fixed or idiomatic expressions in English are used in goodbyes like <italic>Bye, Here we come! It&#x2019;s gonna be fun!</italic>, as shown in examples (6&#x2013;8).</p>
<list list-type="order">
<list-item><p>D&#x00E9;part pour des petites vacances avec ma France!!! La Floride <underline>here We come</underline>!!! Apr&#x00E8;s &#x00E7;a va aller vite&#x2026;<underline>busy busy</underline>&#x2026;</p>
<list list-type="simple">
<list-item><p>&#x201C;Off on a little vacation with my France!!! Florida, <underline>here we come</underline>!!! After that, things will move fast&#x2026; <underline>busy busy</underline>&#x2026;&#x201D;</p></list-item></list></list-item>
<list-item><p><underline>&#x1F6E3; on the road -. &#x1F699; | It&#x2019;s road trip time</underline>! On part pour quelques jours &#x00E0; Qu&#x00E9;bec! &#x1F499; <underline>It&#x2019;s gonna be fun</underline>! &#x1F917;. &#x1F4F8; | ou&#x2026;.</p>
<list list-type="simple">
<list-item><p>&#x201C;<underline>on the road -. &#x1F699; | It&#x2019;s road trip time</underline>! We&#x2019;re going to Quebec for a few days! &#x1F499; <underline>It&#x2019;s gonna be fun</underline>!. &#x1F4F8;| or&#x2026;&#x201D;</p></list-item>
</list></list-item>
<list-item><p>&#x201C;<underline>Bye</underline> le frette et la neige!,&#x201D; &#x201C;<underline>Bye</underline> ice cold and snow!&#x201D;</p></list-item>
</list>
<p>This type of CS-Engl. corresponds to <xref ref-type="bibr" rid="ref29">Poplack&#x2019;s (1980)</xref> notion of &#x201C;emblematic CS&#x201D;; that is, CS at the airport is mainly used in connection with discourse units, particles, and word fillers, such as <italic>bye, well, you know</italic>, whereas the main message is written in French.</p>
<p>In order to determine how stable the spatial pattern of CS-Engl. in Montreal shown in <xref rid="fig8" ref-type="fig">Figure 8B</xref> is, I used a subset of the word pairs from my list that only contains self-referring expressions, such as <italic>my</italic> or <italic>mine</italic> in English, and their French counterparts. <xref rid="fig9" ref-type="fig">Figure 9</xref> compares the tweet distribution of +/&#x2212;CS-Engl. based on the full list of word pairs (see <xref rid="fig9" ref-type="fig">Figure 9A</xref>, repeated from <xref rid="fig8" ref-type="fig">Figure 8B</xref>) with the tweet distribution based on this subset (see <xref rid="fig9" ref-type="fig">Figure 9B</xref>). The comparison shows that the spatial pattern of +CS-Engl. and &#x2013;CS-Engl. remains the same for the subset, that is, there is more CS in the west than in the east. In fact, the pattern is even stronger in <xref rid="fig9" ref-type="fig">Figure 9B</xref> than in <xref rid="fig9" ref-type="fig">Figure 9A</xref>, which suggests that when users from Montreal talk about themselves, the contrast between English words in the west and French words in the east is even more pronounced than for the full list of words.</p>
<fig position="float" id="fig9">
<label>Figure 9</label>
<caption>
<p>Comparison of +/&#x2212;CS-Engl. in Greater Montreal. (<bold>A</bold>, left): full list. (<bold>B</bold>, right): self-referring items. Base map and data from OpenStreetMap and OpenStreetMap Foundation under the Open Database License.</p>
</caption>
<graphic xlink:href="fpsyg-14-1137038-g009.tif"/>
</fig>
<p><xref rid="fig10" ref-type="fig">Figure 10</xref> shows the frequency distribution of tweet counts per location (+CS-Engl. and &#x2013;CS-Engl., based on the full list of word pairs). The results are similar to those for LCB in <xref rid="fig6" ref-type="fig">Figure 6</xref>: some locations do not show any difference for CS-Engl., whereas others do. As the numbers are much smaller for CS than for LCB, I applied a base-10 logarithm to the data (<xref rid="fig11" ref-type="fig">Figure 11</xref>), which makes the pattern in <xref rid="fig10" ref-type="fig">Figure 10</xref> more visible for small numbers. <xref rid="fig11" ref-type="fig">Figure 11</xref> illustrates that at higher tweet frequencies, the counts of tweets with and without CS-Engl. are more strongly linearly correlated. This means that locations from which a high number of tweets are posted do not show a large difference in +/&#x2212;CS-Engl. To test this observation statistically, I calculated the Pearson correlation coefficient (see <xref rid="fig12" ref-type="fig">Figure 12</xref>). The Pearson correlation coefficient is rather high (&#x003E;0.8). This suggests that, overall, location does not strongly influence the distribution of +/&#x2013;CS-Engl. in Greater Montreal, especially at higher frequencies.</p>
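<p>A minimal sketch of the correlation check, using hypothetical per-bin counts. Two assumptions apply: bins with no tweets of either kind are dropped, and a +1 offset is added before taking the base-10 logarithm so that zero counts are defined; the text does not state how zero counts were handled. NumPy's corrcoef returns the matrix of Pearson correlation coefficients, from which the off-diagonal entry is taken.</p>

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long count vectors."""
    return float(np.corrcoef(np.asarray(x, dtype=float), np.asarray(y, dtype=float))[0, 1])


# Hypothetical tweet counts per bin: +CS-Engl. and -CS-Engl.
plus = np.array([0, 1, 3, 10, 25, 60, 0])
minus = np.array([0, 4, 5, 35, 80, 170, 2])

occupied = (plus > 0) | (minus > 0)  # drop bins with no tweets at all
r_raw = pearson_r(plus[occupied], minus[occupied])
r_log = pearson_r(np.log10(1 + plus[occupied]), np.log10(1 + minus[occupied]))

print(f"Pearson r (raw counts): {r_raw:.3f}")
print(f"Pearson r (log10(1 + count)): {r_log:.3f}")
```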
<fig position="float" id="fig10">
<label>Figure 10</label>
<caption>
<p>+/&#x2212;CS-Engl. per urban location in GM.</p>
</caption>
<graphic xlink:href="fpsyg-14-1137038-g010.tif"/>
</fig>
<fig position="float" id="fig11">
<label>Figure 11</label>
<caption>
<p>Base-10 logarithm on +/&#x2212;CS per bin in GM.</p>
</caption>
<graphic xlink:href="fpsyg-14-1137038-g011.tif"/>
</fig>
<fig position="float" id="fig12">
<label>Figure 12</label>
<caption>
<p>Statistics of +/&#x2212;CS per bin in GM.</p>
</caption>
<graphic xlink:href="fpsyg-14-1137038-g012.tif"/>
</fig>
<p>How should we interpret this result, which seems to contradict the clear spatial pattern of +/&#x2212;CS-Engl. in <xref rid="fig8" ref-type="fig">Figure 8A</xref>? The spatial pattern is present, but the signal, that is, the strength of the effect, is weak (see <xref ref-type="bibr" rid="ref15">Kellert and Matlis, 2022</xref> for a detailed discussion of the difference between the presence of a spatial pattern and the strength of its signal).</p>
<p><xref rid="fig13" ref-type="fig">Figure 13</xref> compares the spatial distribution of CS-Engl. in French tweets (<xref rid="fig13" ref-type="fig">Figure 13A</xref>) with that of CS-French in English tweets in Greater Montreal (<xref rid="fig13" ref-type="fig">Figure 13B</xref>). As there are comparatively few English tweets with CS-French (see <xref rid="fig3" ref-type="fig">Figure 3</xref> for exact numbers), only a few circles show where CS-French is used comparatively more than the absence of CS-French in Greater Montreal. The few visible circles showing CS-French (in purple) are clearly concentrated in the east of the island of Montreal and outside the island. The spatial pattern of Code-Switching (CS-Engl. and CS-French) shows that CS depends on location, as predicted by the General Hypothesis. However, the strength of the pattern is rather weak.</p>
<fig position="float" id="fig13">
<label>Figure 13</label>
<caption>
<p>Comparison between +/&#x2212;CS-Engl. (left) and +/&#x2212;CS-French (right) in GM. (<bold>A</bold>, left): +CS-Engl. (red) and &#x2013;CS-Engl. (blue). (<bold>B</bold>, right): +CS-French (purple) and &#x2013;CS-French (green). Base map and data from OpenStreetMap and OpenStreetMap Foundation under the Open Database License.</p>
</caption>
<graphic xlink:href="fpsyg-14-1137038-g013.tif"/>
</fig>
</sec>
</sec>
<sec id="sec16">
<label>5.</label>
<title>Discussion and future research plans</title>
<p>One of the major contributions of this paper is the test of the General Hypothesis, repeated below, which has led to conflicting results in previous studies (<xref ref-type="bibr" rid="ref30">Poplack, 1985</xref> vs. <xref ref-type="bibr" rid="ref21">Lamarre et al., 2002</xref>):</p>
<disp-quote>
<p><italic>General Hypothesis</italic>: CS and LCB correlate with the size of speech communities (<xref ref-type="bibr" rid="ref30">Poplack, 1985</xref>).</p>
</disp-quote>
<p>The results for CS-Engl. in cities seem to show a trend confirming the General Hypothesis when comparing entire cities (see Hypothesis 1). However, not all cities yielded enough tweets to confirm the General Hypothesis. Future work could address this limitation. One possible improvement is to extend my word-pair list with additional word pairs or to use a completely different approach for measuring CS that does not rely on manually crafted word-pair lists (<xref ref-type="bibr" rid="ref25">Mendels et al., 2018</xref>, among others). In the future, it will also be necessary to evaluate the automatic language classification of tweets and/or words to improve the identification of CS and LCB. In this paper, I relied on the language classification performed by Twitter. Twitter&#x2019;s language classifier from 2015 labels English tweets with 99% precision (see Twitter website<xref rid="fn0005" ref-type="fn"><sup>2</sup></xref>). However, Twitter also mentions that minor languages and tweets mixing languages, such as English and French, are classified with lower precision (<xref ref-type="bibr" rid="ref25">Mendels et al., 2018</xref>).</p>
<p>One important issue related to the identification of CS concerns its definition. In this paper, as in many other computational linguistic approaches to CS (see <xref ref-type="bibr" rid="ref25">Mendels et al., 2018</xref> for an overview), CS is defined very broadly as any mixing of languages; it thus also includes formulaic expressions or &#x201C;conventionalized&#x201D; CS, as well as translations within the same tweet, such as <italic>We are open! Nous sommes ouverts!</italic> In linguistics, however, formulaic expressions are distinguished from &#x201C;proper,&#x201D; &#x201C;genuine,&#x201D; or &#x201C;spontaneous&#x201D; CS (<xref ref-type="bibr" rid="ref29">Poplack, 1980</xref>, among others). I will address this issue in detail in future research.</p>
<p>Another important contribution of this study is the test of the General Hypothesis at the level of Greater Montreal (island of Montreal and periphery) and at the sub-city level (east and west of the island) using precise geolocation information. At this level of spatial granularity, the results are mixed. The geographic patterns of CS and LCB in <xref rid="fig5" ref-type="fig">Figures 5A</xref>, <xref rid="fig8" ref-type="fig">8A</xref> show a trend of spatial division as predicted by the General Hypothesis, or more precisely by Hypotheses 2&#x2013;5. However, the statistical numbers and the frequency correlations per bin show that the spatial pattern visible in <xref rid="fig5" ref-type="fig">Figures 5A</xref>, <xref rid="fig8" ref-type="fig">8A</xref> is rather weak. Most locations do not show a big difference in the frequency distributions of LCB (English and French) and of CS (+/&#x2013;CS), especially locations with higher frequency numbers. In the future, I will use methods that test for spatial autocorrelation in the data to see whether locations with preferences for English are clustered together (see <xref ref-type="bibr" rid="ref1">Anselin, 1995</xref> for Local Moran&#x2019;s I).</p>
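<p>As a pointer for this future work, the following NumPy sketch computes Local Moran&#x2019;s I (<xref ref-type="bibr" rid="ref1">Anselin, 1995</xref>) for a grid of bin values: I_i = (z_i / m_2) &#x03A3;_j w_ij z_j, where z_i is the deviation of bin i from the mean and m_2 = &#x03A3;_i z_i^2 / n. The row-standardized rook-contiguity weights (a bin&#x2019;s neighbors are the bins sharing an edge with it) and the toy 4 x 4 grid are assumptions, and the sketch omits the permutation-based significance test usually reported with the statistic. Positive values indicate a bin surrounded by similar values (a cluster); negative values indicate a bin that differs from its neighbors.</p>

```python
import numpy as np


def rook_weights(n_rows, n_cols):
    """Row-standardized rook-contiguity weights for the cells of an n_rows x n_cols grid.

    Assumes the grid has at least two cells, so every cell has a neighbor.
    """
    n = n_rows * n_cols
    w = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if rr in range(n_rows) and cc in range(n_cols):
                    w[r * n_cols + c, rr * n_cols + cc] = 1.0
    return w / w.sum(axis=1, keepdims=True)


def local_morans_i(values, w):
    """Local Moran's I per cell: (z_i / m2) * sum_j w_ij * z_j."""
    x = np.asarray(values, dtype=float).ravel()
    z = x - x.mean()
    m2 = (z ** 2).sum() / z.size
    return (z / m2) * (w @ z)


# Toy grid: a 2 x 2 block of high values (e.g. English-leaning bins) in one corner.
grid = np.zeros((4, 4))
grid[:2, :2] = 1.0
local_i = local_morans_i(grid, rook_weights(4, 4)).reshape(4, 4)
print(np.round(local_i, 2))
```

In an application, the grid values could be, for example, the per-bin differential distribution values or the per-bin share of English tweets; which quantity to use is a design decision left open here.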
<p>This study has shown that in Greater Montreal there are considerably more English tweets than French tweets, and considerably more Code-Switching in French tweets than in English tweets (see <xref rid="fig3" ref-type="fig">Figure 3</xref>). This observation suggests that language use on Twitter is not entirely predicted by the population sizes of the speech communities, as shown in <xref rid="fig1" ref-type="fig">Figure 1</xref> on the basis of Canadian statistics. The higher tweet counts for English in <xref rid="fig3" ref-type="fig">Figure 3</xref> strongly indicate that English is the dominant language on Twitter, which may be related to various factors, such as the use of English as a <italic>lingua franca</italic> in social media (<xref ref-type="bibr" rid="ref20">Laitinen and Lundberg, 2020</xref>). The spatial patterns of CS-Engl. in cities of Quebec and the spatial patterns of LCB and CS in Greater Montreal indicate that, despite the dominance of English in tweet counts, geographic context also influences language use to some extent, more strongly at the city level than at the sub-city level. This suggests that the mechanisms of digital and non-digital language contact are not the same.</p>
<p>In the future, I will test the influence of social contexts, as defined by buildings or locations, on CS and LCB. For instance, the airport has been shown to play an important role for +CS-Engl. (<xref rid="fig8" ref-type="fig">Figure 8B</xref>), but it is not the most important context for English tweets posted by bilinguals (<xref rid="fig5" ref-type="fig">Figure 5B</xref>); instead, bilinguals tweet slightly more in French than in English at the airport. However, comparing similar contexts, such as all coffee bars or all airports, requires information about the social use of buildings and urban districts. <xref ref-type="bibr" rid="ref21">Lamarre et al. (2002)</xref> observe a difference in language use between formal and informal contexts by following bilinguals&#x2019; language use as they move through the city. In future work, the relation between formal versus informal contexts and CS or LCB could be tested on the basis of tweet content, since formal and informal contexts usually correlate with different topics of discussion. Formal contexts often contain information of interest to the general public, such as information about vaccination or elections. Informal contexts more often contain personal information of interest to particular user groups or individuals, such as information about personal events or things. This hypothesis will be tested using methods tailored to the topic analysis of tweets.</p>
<p>Another topic that needs to be explored in the future is user variation with respect to CS and LCB. User IDs can be used to find all tweets produced by the same user and then to sort users according to the intensity of their Code-Switching and/or their language preferences. This method makes it possible to find users who resist CS or who use it very frequently. The geolocation of tweets produced by a specific user group, such as users with a strong preference for English or French or users without a language preference, allows us to classify these groups with respect to the locations they visit. <xref rid="fig14" ref-type="fig">Figure 14</xref> shows an analysis that uses user IDs to classify users according to their language preferences and by the locations they visited. The results show that users with the strongest preference for English are dispersed throughout the city (red points), whereas users with no language preference are concentrated in the city center (green points). One tentative explanation for this concentration is that most users tweeting from the city center are entrepreneurs of some kind who tweet in both languages to reach as many clients as possible, French and English speakers alike. This hypothesis predicts that tweets from this user group should more often be translations or paraphrases of the same content in different languages. This prediction can be tested by examining the temporal features of the tweets, on the assumption that such versions are very likely to be posted at about the same time, and by analyzing their content. These methods will allow us to study ways of describing users and to contribute to sociolinguistic studies (<xref ref-type="bibr" rid="ref19">Labov, 2006</xref>).</p>
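<p>The user-level classification behind <xref rid="fig14" ref-type="fig">Figure 14</xref> can be sketched as follows. This is a minimal Python illustration, not the study&#x2019;s actual code: it assumes that the input has already been restricted to bilingual users and reduced to (user ID, language) pairs labeled <monospace>en</monospace> or <monospace>fr</monospace>. The 90% cut-off for English comes from the caption of <xref rid="fig14" ref-type="fig">Figure 14</xref>; applying the same cut-off to French, and the <monospace>min_tweets</monospace> parameter, are assumptions.</p>
<preformat preformat-type="code">
```python
from collections import Counter, defaultdict

ENGLISH_CUTOFF = 0.9  # ">90% tweets in English" (Figure 14 caption)
FRENCH_CUTOFF = 0.9   # assumption: the same cut-off mirrored for French


def classify_users(tweets, min_tweets=1):
    """Assign each user to a language-preference group.

    tweets: iterable of (user_id, lang) pairs; only "en" and "fr" are counted.
    min_tweets: hypothetical minimum number of en/fr tweets per user.
    Returns a dict mapping user_id to "english", "french" or "no_preference".
    """
    counts = defaultdict(Counter)
    for user_id, lang in tweets:
        if lang in ("en", "fr"):
            counts[user_id][lang] += 1

    groups = {}
    for user_id, c in counts.items():
        total = c["en"] + c["fr"]
        if min_tweets > total:
            continue  # too few tweets to estimate a preference
        if c["en"] / total > ENGLISH_CUTOFF:
            groups[user_id] = "english"
        elif c["fr"] / total > FRENCH_CUTOFF:
            groups[user_id] = "french"
        else:
            groups[user_id] = "no_preference"
    return groups


sample = [("u1", "en")] * 19 + [("u1", "fr")] + [("u2", "en"), ("u2", "fr")]
print(classify_users(sample))  # {'u1': 'english', 'u2': 'no_preference'}
```
</preformat>
<p>Because the cut-off is strict, a user with exactly 90% English tweets falls into the no-preference group; the resulting groups could then be mapped by the coordinates of their tweets.</p>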
<fig position="float" id="fig14">
<label>Figure 14</label>
<caption>
<p>Distribution of English and French tweets per bilingual user group in Greater Montreal (GM). Tweets of bilingual users with a preference for English (&#x003E;90% of tweets in English) are shown in red, those with a preference for French in blue, and those with no preference in green.</p>
</caption>
<graphic xlink:href="fpsyg-14-1137038-g014.tif"/>
</fig>
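<p>The prediction that users without a language preference post translations or paraphrases at about the same time could first be approached with a simple temporal filter. The minimal Python sketch below pairs tweets by the same user that are written in different languages and posted within a time window. The <monospace>Tweet</monospace> record, the 10-minute default window, and the toy tweets are hypothetical, and the resulting pairs are only candidates that would still require a comparison of their content.</p>
<preformat preformat-type="code">
```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Tweet(NamedTuple):
    user_id: str
    time: datetime
    lang: str  # "en" or "fr"
    text: str


def candidate_translation_pairs(tweets, window=timedelta(minutes=10)):
    """Return pairs of tweets by the same user, in different languages,
    posted no more than `window` apart (hypothetical default: 10 minutes)."""
    by_user = {}
    for tweet in tweets:
        by_user.setdefault(tweet.user_id, []).append(tweet)

    pairs = []
    for user_tweets in by_user.values():
        user_tweets.sort(key=lambda t: t.time)
        for i, first in enumerate(user_tweets):
            for second in user_tweets[i + 1:]:
                if second.time - first.time > window:
                    break  # sorted by time, so all later tweets are further away
                if first.lang != second.lang:
                    pairs.append((first, second))
    return pairs


tweets = [
    Tweet("u1", datetime(2019, 6, 1, 10, 0), "en", "We are open!"),
    Tweet("u1", datetime(2019, 6, 1, 10, 2), "fr", "Nous sommes ouverts!"),
    Tweet("u1", datetime(2019, 6, 1, 15, 0), "en", "See you tomorrow"),
    Tweet("u2", datetime(2019, 6, 1, 10, 1), "fr", "Bonne journée"),
]
for first, second in candidate_translation_pairs(tweets):
    print(first.text, "|", second.text)
```
</preformat>
<p>Only the first two tweets form a pair: the third tweet is too far away in time, and the fourth belongs to a different user. The choice of window size would have to be justified empirically.</p>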
<p>Another question that needs to be addressed in the future is how the factor &#x201C;time&#x201D; influences the distribution of language use. The present study measures the distribution of language use over a particular period, namely on the basis of tweets from 2017 to 2021. Users from Montreal may spend their time in different locations depending on the period (year or day). However, if we compare the language distribution of bilinguals in Greater Montreal computed on the dataset used in this study with the language distribution of all users in Montreal computed by <xref ref-type="bibr" rid="ref26">Mocanu et al. (2013)</xref> on an earlier dataset, we see a similar effect: English is used in the west and French in the east. This result suggests that the difference in years has little effect on language choice. To confirm this conclusion, however, other periods need to be considered.</p>
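<p>A comparison across periods, as suggested above, could start from the share of English among English and French tweets per year and per side of the island. In the minimal Python sketch below, the east/west split is passed in as a function because the exact boundary used in the study is not specified here; the record layout (year, longitude, latitude, language) and the function name are hypothetical.</p>
<preformat preformat-type="code">
```python
from collections import Counter


def english_share_by_period(tweets, is_west):
    """Share of English among en/fr tweets for each (year, side) cell.

    tweets: iterable of (year, lon, lat, lang) tuples (hypothetical layout).
    is_west: caller-supplied function (lon, lat) -> bool that places a
        coordinate west or east of a chosen boundary.
    """
    counts = Counter()
    for year, lon, lat, lang in tweets:
        side = "west" if is_west(lon, lat) else "east"
        counts[(year, side, lang)] += 1

    shares = {}
    for year, side in sorted({(y, s) for y, s, _ in counts}):
        en = counts[(year, side, "en")]  # Counter returns 0 for missing keys
        fr = counts[(year, side, "fr")]
        if en + fr:
            shares[(year, side)] = en / (en + fr)
    return shares
```
</preformat>
<p>If the English share stayed higher in the west than in the east for every year, that would be in line with the stability suggested above; diverging years would indicate that the period matters.</p>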
<p>Finally, more research is needed to investigate the extent to which bilingual users on Twitter represent bilingual speakers outside Twitter. It is also important to note that users who make their geolocation publicly visible represent only a small proportion of all Twitter users (<xref ref-type="bibr" rid="ref18">Kruspe et al., 2021</xref>). It is therefore quite possible that the location differences observed in this study are representative only of a particular user group on Twitter. Various methods have been proposed in Twitter research to address this issue (see <xref ref-type="bibr" rid="ref18">Kruspe et al., 2021</xref> for an overview). The representativeness of Twitter users who make their geolocation visible will be investigated in the future.</p>
<p>To sum up, this study has shown that sociodemographic factors, measured by the size of a speech community, tend to influence CS and LCB at various spatial scales (the city and sub-city scales), but to different extents. Geolocated tweets make it possible to study the influence of location on language behavior. To maximize the benefit of social media data for sociolinguistic research, further techniques are needed for analyzing large quantities of data. These include classifying locations by their social use and classifying tweets by user, topic, and language.</p>
</sec>
<sec id="sec30">
<title>Author&#x2019;s note</title>
<p>The methodology of this article is based on OK&#x2019;s previous work in <xref ref-type="bibr" rid="ref15">Kellert and Matlis (2022)</xref>.</p>
</sec>
<sec id="sec17" sec-type="data-availability">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/<xref rid="SM1" ref-type="supplementary-material">Supplementary material</xref>; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="sec18">
<title>Author contributions</title>
<p>OK: data collection, formulation of the hypotheses, design of testing the hypotheses, discussion of results, and article writing and editing.</p>
</sec>
<sec id="sec19" sec-type="funding-information">
<title>Funding</title>
<p>OK acknowledges the support given by the Open Access Publication Funds of the G&#x00F6;ttingen University and by the German Research Foundation (DFG) (Grant number: 468416293).</p>
</sec>
<sec id="conf1" sec-type="COI-statement">
<title>Conflict of interest</title>
<p>The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="sec100" sec-type="disclaimer">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<ack>
<p>OK acknowledges the discussions of the results by the reviewers of this article and cooperation partners.</p>
</ack>
<sec id="sec21" sec-type="supplementary-material">
<title>Supplementary material</title>
<p>The Supplementary material for this article can be found online at: <ext-link xlink:href="https://www.frontiersin.org/articles/10.3389/fpsyg.2023.1137038/full#supplementary-material" ext-link-type="uri">https://www.frontiersin.org/articles/10.3389/fpsyg.2023.1137038/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.PDF" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Data_Sheet_2.pdf" id="SM2" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Data_Sheet_3.PDF" id="SM3" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Data_Sheet_4.pdf" id="SM4" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="ref1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Anselin</surname> <given-names>L.</given-names></name></person-group> (<year>1995</year>). <article-title>Local indicators of spatial association &#x2014; LISA</article-title>. <source>Geogr. Anal.</source> <volume>27</volume>, <fpage>93</fpage>&#x2013;<lpage>115</lpage>.</citation></ref>
<ref id="ref2"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Auer</surname> <given-names>P.</given-names></name></person-group> (<year>2007</year>). &#x201C;<article-title>The monolingual bias in bilingualism research, or: why bilingual talk is (still) a challenge for linguistics</article-title>&#x201D; in <source>Bilingualism. A social approach</source>. ed. <person-group person-group-type="editor"><name><surname>Heller</surname> <given-names>M.</given-names></name></person-group> (<publisher-loc>London</publisher-loc>: <publisher-name>Palgrave Macmillan</publisher-name>), <fpage>319</fpage>&#x2013;<lpage>339</lpage>.</citation></ref>
<ref id="ref3"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Auer</surname> <given-names>P.</given-names></name> <name><surname>Eastman</surname> <given-names>C. M.</given-names></name></person-group> (<year>2010</year>). &#x201C;<article-title>Code-switching</article-title>&#x201D; in <source>Society and language use</source>. eds. <person-group person-group-type="editor"><name><surname>Jaspers</surname> <given-names>J.</given-names></name> <name><surname>&#x00D6;stman</surname> <given-names>J. O.</given-names></name> <name><surname>Verschueren</surname> <given-names>J.</given-names></name></person-group> (<publisher-loc>Amsterdam, Philadelphia</publisher-loc>: <publisher-name>Benjamins</publisher-name>), <fpage>84</fpage>&#x2013;<lpage>112</lpage>.</citation></ref>
<ref id="ref9"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Berisso Genemo</surname> <given-names>T.</given-names></name></person-group> (<year>2022</year>). <article-title>Multilingualism and Language Choice in Domains [Internet]</article-title>. <source>Multilingualism - Interdisciplinary Topics. IntechOpen</source>. doi: <pub-id pub-id-type="doi">10.5772/intechopen.101660</pub-id></citation></ref>
<ref id="ref4"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Bouchard</surname> <given-names>P.</given-names></name></person-group> (<year>2000</year>). &#x201C;<article-title>Montr&#x00E9;al</article-title>&#x201D; in <source>Espaces urbains et coexistence des langues (Terminogramme 93&#x2013;94)</source>. ed. <person-group person-group-type="editor"><name><surname>Mackey</surname> <given-names>W. F.</given-names></name></person-group> (<publisher-loc>Quebec</publisher-loc>: <publisher-name>Office de la langue fran&#x00E7;aise</publisher-name>), <fpage>31</fpage>&#x2013;<lpage>57</lpage>.</citation></ref>
<ref id="ref5"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Bullock</surname> <given-names>B. E.</given-names></name> <name><surname>Toribio</surname> <given-names>A. J.</given-names></name></person-group> (<year>2009</year>). &#x201C;<article-title>Themes in the study of code-switching</article-title>&#x201D; in <source>The Cambridge handbook of linguistic code-switching</source>. eds. <person-group person-group-type="editor"><name><surname>Bullock</surname> <given-names>B. E.</given-names></name> <name><surname>Toribio</surname> <given-names>A. J.</given-names></name></person-group> (<publisher-loc>Cambridge, UK</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>), <fpage>1</fpage>&#x2013;<lpage>18</lpage>.</citation></ref>
<ref id="ref6"><citation citation-type="other"><person-group person-group-type="author"><collab id="coll1">Cartopy v0.11.2. Met Office</collab></person-group>. (<year>2014</year>). UK. Available at: <ext-link xlink:href="https://github.com/SciTools/cartopy/archive/v0.11.2.tar.gz" ext-link-type="uri">https://github.com/SciTools/cartopy/archive/v0.11.2.tar.gz</ext-link> (Accessed 24 January 2022).</citation></ref>
<ref id="ref7"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Cook</surname> <given-names>V.</given-names></name></person-group> <source>Second language learning and language teaching</source>. <publisher-loc>London, UK</publisher-loc>: <publisher-name>Edward Arnold</publisher-name> (<year>1991</year>). <fpage>168</fpage>.</citation></ref>
<ref id="ref01"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Fishman</surname> <given-names>J. A.</given-names></name></person-group> (<year>2000</year>). &#x201C;<article-title>Who speaks what language to whom and when?</article-title>&#x201D; in <source>The bilingualism reader: second edition</source>. ed. <person-group person-group-type="editor"><name><surname>Wei</surname> <given-names>L.</given-names></name></person-group> (<publisher-loc>Oxon</publisher-loc>: <publisher-name>Routledge</publisher-name>), <fpage>55</fpage>&#x2013;<lpage>69</lpage>.</citation></ref>
<ref id="ref8"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Gardner-Chloros</surname> <given-names>P.</given-names></name></person-group> (<year>2009</year>). &#x201C;<article-title>Code-switching and language contact</article-title>&#x201D; in <source>Code-switching</source>. (<publisher-loc>Cambridge</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>), <fpage>20</fpage>&#x2013;<lpage>41</lpage>.</citation></ref>
<ref id="ref10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gon&#x00E7;alves</surname> <given-names>B.</given-names></name> <name><surname>S&#x00E1;nchez</surname> <given-names>D.</given-names></name></person-group> (<year>2014</year>). <article-title>Crowdsourcing dialect characterization through twitter</article-title>. <source>PLoS One</source> <volume>9</volume>:<fpage>e112074</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0112074</pub-id>, PMID: <pub-id pub-id-type="pmid">25409174</pub-id></citation></ref>
<ref id="ref11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grieve</surname> <given-names>J.</given-names></name> <name><surname>Montgomery</surname> <given-names>C.</given-names></name> <name><surname>Nini</surname> <given-names>A.</given-names></name> <name><surname>Murakami</surname> <given-names>A.</given-names></name> <name><surname>Guo</surname> <given-names>D.</given-names></name></person-group> (<year>2019</year>). <article-title>Mapping lexical dialect variation in British English using twitter</article-title>. <source>Front. Artif. Intell.</source> <volume>2</volume>, <fpage>1</fpage>&#x2013;<lpage>18</lpage>. doi: <pub-id pub-id-type="doi">10.3389/frai.2019.00011</pub-id>, PMID: <pub-id pub-id-type="pmid">33733100</pub-id></citation></ref>
<ref id="ref12"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Gumperz</surname> <given-names>J. J.</given-names></name> <name><surname>Hymes</surname> <given-names>D.</given-names></name></person-group> <source>Directions in sociolinguistics: The ethnography of communication</source>. <publisher-loc>New York</publisher-loc>: <publisher-name>Holt, Rinehart and Winston</publisher-name> (<year>1972</year>). <fpage>598</fpage> pp.</citation></ref>
<ref id="ref13"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Holmes</surname> <given-names>J.</given-names></name> <name><surname>Wilson</surname> <given-names>N.</given-names></name></person-group> <source>An introduction to sociolinguistics</source>. <publisher-loc>New York</publisher-loc>: <publisher-name>Longman</publisher-name> (<year>2013</year>). <fpage>512</fpage> p.</citation></ref>
<ref id="ref14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kellert</surname> <given-names>O.</given-names></name></person-group> (<year>2022</year>). <article-title>Gender neutral language in (greater) Buenos Aires, (greater) La Plata, and C&#x00F3;rdoba: an analysis of social context information using textual and temporal features</article-title>. <source>Front. Sociol.</source> <volume>7</volume>, <fpage>1</fpage>&#x2013;<lpage>10</lpage>. doi: <pub-id pub-id-type="doi">10.3389/fsoc.2022.805716</pub-id>, PMID: <pub-id pub-id-type="pmid">35372565</pub-id></citation></ref>
<ref id="ref15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kellert</surname> <given-names>O.</given-names></name> <name><surname>Matlis</surname> <given-names>N.</given-names></name></person-group> (<year>2022</year>). <article-title>Geolocation of multiple sociolinguistic markers in Buenos Aires</article-title>. <source>PLoS One</source> <volume>17</volume>:<fpage>e0274114</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0274114</pub-id>, PMID: <pub-id pub-id-type="pmid">36084118</pub-id></citation></ref>
<ref id="ref16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Konidaris</surname> <given-names>E.</given-names></name></person-group> (<year>2004</year>). <article-title>Code-switching among trilingual Montrealers: French, English, and a heritage language</article-title>. <source>J Natl Council Less Commonly Taught Lang</source> <volume>1</volume>, <fpage>19</fpage>&#x2013;<lpage>67</lpage>.</citation></ref>
<ref id="ref17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kremin</surname> <given-names>L. V.</given-names></name> <name><surname>Alves</surname> <given-names>J.</given-names></name> <name><surname>Orena</surname> <given-names>A. J.</given-names></name> <name><surname>Polka</surname> <given-names>L.</given-names></name> <name><surname>Byers-Heinlein</surname> <given-names>K.</given-names></name></person-group> (<year>2021</year>). <article-title>Code-switching in parents&#x2019; everyday speech to bilingual infants</article-title>. <source>J. Child Lang.</source> <volume>49</volume>, <fpage>714</fpage>&#x2013;<lpage>740</lpage>. doi: <pub-id pub-id-type="doi">10.1017/S0305000921000118</pub-id>, PMID: <pub-id pub-id-type="pmid">34006344</pub-id></citation></ref>
<ref id="ref18"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Kruspe</surname> <given-names>A.</given-names></name> <name><surname>H&#x00E4;berle</surname> <given-names>M.</given-names></name> <name><surname>Hoffmann</surname> <given-names>E. J.</given-names></name> <name><surname>Rode-Hasinger</surname> <given-names>S.</given-names></name> <name><surname>Abdulahhad</surname> <given-names>K.</given-names></name> <name><surname>Zhu</surname> <given-names>X. X.</given-names></name></person-group> Changes in twitter geolocations: insights and suggestions for future usage. ACL, Workshop W-NUT: The Seventh Workshop on Noisy User-generated Text (<year>2021</year>). p. 212&#x2013;221.</citation></ref>
<ref id="ref19"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Labov</surname> <given-names>W.</given-names></name></person-group> <source>The social stratification of English in New York City</source>. <publisher-loc>Cambridge, UK</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name> (<year>2006</year>).</citation></ref>
<ref id="ref20"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Laitinen</surname> <given-names>M.</given-names></name> <name><surname>Lundberg</surname> <given-names>J.</given-names></name></person-group> (<year>2020</year>). &#x201C;<article-title>ELF, language change and social networks: evidence from real-time social media data</article-title>&#x201D; in <source>Language change: The impact of English as a lingua Franca</source>. eds. <person-group person-group-type="editor"><name><surname>Mauranen</surname> <given-names>A.</given-names></name> <name><surname>Vetchinnikova</surname> <given-names>S.</given-names></name></person-group> (<publisher-loc>Cambridge</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>), <fpage>179</fpage>&#x2013;<lpage>204</lpage>.</citation></ref>
<ref id="ref21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lamarre</surname> <given-names>P.</given-names></name> <name><surname>Paquette</surname> <given-names>J.</given-names></name> <name><surname>Kahn</surname> <given-names>E.</given-names></name> <name><surname>Ambrosi</surname> <given-names>S.</given-names></name></person-group> (<year>2002</year>). <article-title>Multilingual Montreal: listening in on the language practices of young Montrealers</article-title>. <source>Can. Ethnic. Stud. J.</source> <volume>34</volume>, <fpage>47</fpage>&#x2013;<lpage>75</lpage>.</citation></ref>
<ref id="ref22"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Laur</surname> <given-names>E.</given-names></name></person-group> (<year>2003</year>). &#x201C;<article-title>Lecture sociale des s&#x00E9;gr&#x00E9;gations &#x00E0; Montr&#x00E9;al</article-title>&#x201D; in <source>Sociolinguistique urbaine, fronti&#x00E8;res et territoires</source>. eds. <person-group person-group-type="editor"><name><surname>Bulot</surname> <given-names>T.</given-names></name> <name><surname>Messaoudi</surname> <given-names>L.</given-names></name></person-group> (<publisher-loc>Namur</publisher-loc>: <publisher-name>Modulaires Europ&#x00E9;ennes</publisher-name>), <fpage>265</fpage>&#x2013;<lpage>302</lpage>.</citation></ref>
<ref id="ref23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Leimgruber</surname> <given-names>J. R. E.</given-names></name> <name><surname>Fern&#x00E1;ndez-Mallat</surname> <given-names>V.</given-names></name></person-group> (<year>2021</year>). <article-title>Language attitudes and identity building in the linguistic landscape of Montreal</article-title>. <source>Open Linguist</source> <volume>7</volume>, <fpage>406</fpage>&#x2013;<lpage>422</lpage>. doi: <pub-id pub-id-type="doi">10.1515/opli-2021-0021</pub-id></citation></ref>
<ref id="ref24"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Levy</surname> <given-names>A. J.</given-names></name> <name><surname>Karsai</surname> <given-names>M.</given-names></name> <name><surname>Magu&#x00E9;</surname> <given-names>J.-P.</given-names></name> <name><surname>Chevrot</surname> <given-names>J.-P.</given-names></name> <name><surname>Fleury</surname> <given-names>E.</given-names></name></person-group> Socioeconomic dependencies of linguistic patterns in twitter: a multivariate analysis. In: <italic>Proceedings of the 2018 World Wide Web Conference WWW&#x2019;18</italic>, (<year>2018</year>) 1125&#x2013;1134.</citation></ref>
<ref id="ref25"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Mendels</surname> <given-names>G.</given-names></name> <name><surname>Soto</surname> <given-names>V.</given-names></name> <name><surname>Jaech</surname> <given-names>A.</given-names></name> <name><surname>Hirschberg</surname> <given-names>J.</given-names></name></person-group> Collecting code-switched data from social media. In <italic>Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)</italic> (<year>2018</year>) Miyazaki, Japan. European Language Resources Association (ELRA).</citation></ref>
<ref id="ref26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mocanu</surname> <given-names>D.</given-names></name> <name><surname>Baronchelli</surname> <given-names>A.</given-names></name> <name><surname>Perra</surname> <given-names>N.</given-names></name> <name><surname>Gon&#x00E7;alves</surname> <given-names>B.</given-names></name> <name><surname>Zhang</surname> <given-names>Q.</given-names></name> <name><surname>Vespignani</surname> <given-names>A.</given-names></name></person-group> (<year>2013</year>). <article-title>The twitter of babel: mapping world languages through microblogging platforms</article-title>. <source>PLoS One</source> <volume>8</volume>:<fpage>e61981</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0061981</pub-id>, PMID: <pub-id pub-id-type="pmid">23637940</pub-id></citation></ref>
<ref id="ref27"><citation citation-type="book"><person-group person-group-type="author"><name><surname>M&#x00FC;ller</surname> <given-names>N.</given-names></name></person-group> <source>Code-switching</source>. <publisher-loc>T&#x00FC;bingen</publisher-loc>: <publisher-name>Narr Francke Attempto</publisher-name> (<year>2017</year>).</citation></ref>
<ref id="ref28"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Myers-Scotton</surname> <given-names>C.</given-names></name></person-group> (<year>2002</year>). <source>Contact linguistics: Bilingual encounters and grammatical outcomes</source>. (<publisher-loc>Oxford and New York</publisher-loc>: <publisher-name>Oxford University Press</publisher-name>).</citation></ref>
<ref id="ref29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Poplack</surname> <given-names>S.</given-names></name></person-group> (<year>1980</year>). <article-title>Sometimes I'll start a sentence in Spanish y termino en Espa&#x00F1;ol: toward a typology of code-switching</article-title>. <source>Linguistics</source> <volume>18</volume>, <fpage>581</fpage>&#x2013;<lpage>618</lpage>. doi: <pub-id pub-id-type="doi">10.1515/ling.1980.18.7-8.581</pub-id></citation></ref>
<ref id="ref30"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Poplack</surname> <given-names>S.</given-names></name></person-group> (<year>1985</year>). &#x201C;<article-title>Contrasting patterns of codeswitching in two communities</article-title>&#x201D; in <source>Codeswitching. Anthropological and sociolinguistic perspectives</source>. ed. <person-group person-group-type="editor"><name><surname>Heller</surname> <given-names>M.</given-names></name></person-group> (<publisher-loc>Berlin</publisher-loc>: <publisher-name>Mouton De Gruyter</publisher-name>), <fpage>215</fpage>&#x2013;<lpage>243</lpage>.</citation></ref>
<ref id="ref31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Scheffler</surname> <given-names>T.</given-names></name> <name><surname>Brandt</surname> <given-names>L.</given-names></name> <name><surname>de la Fuente</surname> <given-names>M.</given-names></name> <name><surname>Nenchev</surname> <given-names>I.</given-names></name></person-group> (<year>2022</year>). <article-title>The processing of emoji-word substitutions: a self-paced-reading study</article-title>. <source>Comput. Hum. Behav.</source> <volume>127</volume>:<fpage>107076</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.chb.2021.107076</pub-id></citation></ref>
<ref id="ref32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schweda</surname> <given-names>N.</given-names></name></person-group> (<year>1980</year>). <article-title>Bilingual education and code-switching in Maine</article-title>. <source>Linguist Rep</source> <volume>23</volume>, <fpage>12</fpage>&#x2013;<lpage>13</lpage>.</citation></ref>
<ref id="ref33"><citation citation-type="other"><person-group person-group-type="author"><collab id="coll2">Statistics Canada</collab></person-group> (<year>2011</year>) <comment>Available at: </comment><ext-link xlink:href="https://www12.statcan.gc.ca/census-recensement/2011/as-sa/98-314-x/98-314-x2011001-fra.pdf" ext-link-type="uri">https://www12.statcan.gc.ca/census-recensement/2011/as-sa/98-314-x/98-314-x2011001-fra.pdf</ext-link></citation></ref>
<ref id="ref34"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Termote</surname> <given-names>M.</given-names></name></person-group> (<year>2003</year>). &#x201C;<article-title>La dynamique d&#x00E9;molinguistique du Qu&#x00E9;bec et de ses r&#x00E9;gions</article-title>&#x201D; in <source>La d&#x00E9;mographie qu&#x00E9;b&#x00E9;coise: Enjeux du XXIe si&#x00E8;cle. Montr&#x00E9;al: Les Presses de l'Universit&#x00E9; de Montr&#x00E9;al, collection "Param&#x00E8;tres"</source>. eds. <person-group person-group-type="editor"><name><surname>Pich&#x00E9;</surname> <given-names>V.</given-names></name> <name><surname>Le Bourdais</surname> <given-names>C.</given-names></name></person-group>, <fpage>264</fpage>&#x2013;<lpage>299</lpage>.</citation></ref>
<ref id="ref35"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Timiou</surname> <given-names>E.</given-names></name></person-group> Montr&#x00E9;al&#x2014;Secteurs statistique 2011&#x2014;Langue Maison. Wikimedia Commons (<year>2014</year>). <comment>Available at: </comment><ext-link xlink:href="https://commons.wikimedia.org/wiki/File:Montr%C3%A9al_-_Secteurs_statistique_2011_-_Langue_Maison.svg" ext-link-type="uri">https://commons.wikimedia.org/wiki/File:Montr%C3%A9al_-_Secteurs_statistique_2011_-_Langue_Maison.svg</ext-link></citation></ref>
<ref id="ref36"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Unsworth</surname> <given-names>S.</given-names></name></person-group> (<year>2016</year>). &#x201C;<article-title>Quantity and quality of language input in bilingual language development</article-title>&#x201D; in <source>Bilingualism across the lifespan: Factors moderating language proficiency</source>. eds. <person-group person-group-type="editor"><name><surname>Nicoladis</surname> <given-names>E.</given-names></name> <name><surname>Montanari</surname> <given-names>S.</given-names></name></person-group> (<publisher-name>American Psychological Association</publisher-name>), <fpage>103</fpage>&#x2013;<lpage>121</lpage>.</citation></ref>
<ref id="ref37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Valenti</surname> <given-names>E.</given-names></name></person-group> (<year>2014</year>). <article-title>&#x201C;<italic>Nous autres c&#x2019;est toujours bilingue</italic> anyways&#x201D;: code-switching and linguistic displacement among bilingual Montr&#x00E9;al students</article-title>. <source>Am. Rev. Can. Stud.</source> <volume>44</volume>, <fpage>279</fpage>&#x2013;<lpage>292</lpage>. doi: <pub-id pub-id-type="doi">10.1080/02722011.2014.939423</pub-id></citation></ref></ref-list>
<fn-group>
<fn id="fn0004"><p><sup>1</sup>I have anonymized names and place names that could be linked to an individual person.</p></fn>
<fn id="fn0005"><p><sup>2</sup><ext-link xlink:href="https://blog.twitter.com/engineering/en_us/a/2015/evaluating-language-identification-performance" ext-link-type="uri">https://blog.twitter.com/engineering/en_us/a/2015/evaluating-language-identification-performance</ext-link></p></fn>
</fn-group>
</back>
</article>