The Swiss-AL Media Corpora contain only articles from journalistic media in German, French, Italian and Rhaeto-Romance. Since 2021, all articles in German and French have been obtained via Swissdox@LiRi and are provided by the Swiss Media Database. The corpora are published in December each year and cover the preceding five years. They serve as the data basis for selecting the Swiss Word of the Year (WDJ: Wort des Jahres).
Since 2021, the French and German corpora have had to be downsampled, because the full quantity of available articles would make the resulting corpora too large for the corpus analysis tools. A stratified sampling method was used: 25% of all texts per source per week were chosen at random.
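The stratified sampling described above can be illustrated with a minimal sketch. It is not the actual Swiss-AL pipeline: the function name `stratified_sample`, the record fields `source` and `date`, the use of ISO calendar weeks, the rounding rule and the fixed seed are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from datetime import date


def stratified_sample(texts, fraction=0.25, seed=0):
    """Randomly keep `fraction` of the texts in each (source, week) stratum.

    Each text is a dict with a "source" string and a "date" (datetime.date).
    Hypothetical record layout; the real corpus metadata may differ.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible

    # Group texts by source and ISO (year, week); assumption: ISO weeks.
    strata = defaultdict(list)
    for text in texts:
        iso = text["date"].isocalendar()
        strata[(text["source"], iso[0], iso[1])].append(text)

    sample = []
    for key in sorted(strata):
        group = strata[key]
        # Assumption: round to the nearest integer, so very small
        # strata may contribute no text at all.
        k = round(len(group) * fraction)
        sample.extend(rng.sample(group, k))
    return sample


# Toy data: two sources, two weeks.
texts = (
    [{"id": i, "source": "NZZ", "date": date(2021, 1, 4)} for i in range(8)]
    + [{"id": 100 + i, "source": "NZZ", "date": date(2021, 1, 11)} for i in range(4)]
    + [{"id": 200 + i, "source": "SRF", "date": date(2021, 1, 5)} for i in range(12)]
)
sample = stratified_sample(texts)
```

Sampling within each source and week keeps the relative weight of each source and the distribution over time roughly the same as in the full data, which a single random draw over all texts would not guarantee for small sources.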
The texts in the Italian corpus are either obtained via Swissdox@LiRi or collected via web crawling (no paywalled content).
S_AL_DE_WDJ_21_SAMP
- size: 597k texts
- stratified sampling: 25% of all texts per source per week
- time span: 2016-2021
Sources
acronym | texts | source |
APPZ | 5347 | Appenzeller Zeitung |
AZM | 21295 | Aargauer Zeitung |
AZO | 3117 | aargauerzeitung.ch |
BAZ | 18214 | Basler Zeitung |
BEO | 1239 | Beobachter |
BEOL | 5470 | Berner Oberländer |
BEOLO | 1555 | berneroberlaender.ch |
BEOO | 543 | beobachter.ch |
BIZ | 900 | Bilanz |
BIZO | 916 | bilanz.ch |
BLI | 15623 | Blick |
BLIA | 4464 | Blick am Abend |
BLIO | 47230 | blick.ch |
BT | 2202 | Badener Tagblatt |
BTO | 2125 | badenertagblatt.ch |
BU | 11186 | Der Bund |
BZ | 16811 | Berner Zeitung |
BZB | 3584 | bz - Zeitung für die Region Basel |
BZBO | 2518 | bzbasel.ch |
BZM | 4701 | Basellandschaftliche Zeitung |
CAMP | 9 | NZZ Campus |
ENC | 132 | encore! (dt) |
FRME | 100 | frame |
FUW | 4233 | Finanz und Wirtschaft |
FUWO | 6998 | fuw.ch |
GSCH | 158 | NZZ Geschichte |
GTB | 1530 | Grenchner Tagblatt |
GTBO | 1410 | grenchnertagblatt.ch |
HZI | 226 | HZ Insurance |
LAL | 481 | Schweizer LandLiebe |
LAT | 1677 | Langenthaler Tagblatt / MLZ |
LB | 10564 | Der Landbote |
LBO | 2413 | landbote.ch |
LTZ | 6157 | Limmattaler Zeitung / MLZ |
LTZO | 1267 | limmattalerzeitung.ch |
LUZ | 16847 | Luzerner Zeitung |
LUZO | 4505 | luzernerzeitung.ch |
NIW | 3687 | Nidwaldner Zeitung |
NNBE | 13873 | bernerzeitung.ch |
NNBS | 7737 | bazonline.ch |
NNBU | 8991 | derbund.ch |
NNTA | 9195 | Newsnet / Tages-Anzeiger |
NZZ | 21810 | Neue Zürcher Zeitung |
NZZB | 492 | bellevue.nzz.ch |
NZZF | 192 | NZZ Folio |
NZZG | 74 | NZZ PRO Global |
NZZM | 333 | NZZ am Sonntag Magazin |
NZZO | 31590 | nzz.ch |
NZZS | 6743 | NZZ am Sonntag |
OBW | 3801 | Obwaldner Zeitung |
OLT | 5531 | Oltner Tagblatt / MLZ |
OLTO | 1396 | oltnertagblatt.ch |
SBLI | 4934 | Blick.ch |
SF | 1864 | Schweizer Familie |
SGT | 19735 | St. Galler Tagblatt |
SGTO | 5443 | tagblatt.ch |
SHZ | 3296 | Handelszeitung |
SHZO | 6272 | handelszeitung.ch |
SI | 2739 | Schweizer Illustrierte |
SIG | 122 | SI Gruen |
SIO | 3132 | schweizer-illustrierte.ch |
SISP | 202 | SI Sport |
SOZM | 8057 | Solothurner Zeitung / MLZ |
SOZO | 1578 | solothurnerzeitung.ch |
SRF | 33810 | srf.ch |
SRFV | 2130 | srf Video |
SWII | 1250 | swissinfo.ch |
TA | 13632 | Tages-Anzeiger |
TAM | 734 | Das Magazin |
TAS | 4703 | SonntagsZeitung |
TASI | 1265 | Thalwiler Anzeiger/Sihltaler |
TAZT | 1053 | züritipp (Tages-Anzeiger) |
TBT | 5338 | Toggenburger Tagblatt |
TELE | 966 | Tele |
THT | 2023 | Thuner Tagblatt |
THTO | 1570 | thunertagblatt.ch |
TVLL | 134 | TV Land & Lüt |
TVS | 605 | TV Star |
TVST | 160 | Streaming |
TVZW | 234 | TV2 |
TZ | 19340 | Thurgauer Zeitung |
URZ | 5095 | Urner Zeitung |
WEOB | 8682 | Werdenberger & Obertoggenburger |
WOZ | 2226 | Die Wochenzeitung |
WZ | 5507 | Wiler Zeitung |
ZHUL | 8884 | Der Landbote |
ZHUO | 1540 | zuonline.ch |
ZOF | 9495 | Zofinger Tagblatt / MLZ |
ZSZ | 12368 | Der Landbote |
ZSZO | 1623 | zsz.ch |
ZUGB | 467 | Zugerbieter |
ZUGP | 595 | Zuger Presse |
ZUGZ | 6635 | Zuger Zeitung |
ZWA | 17846 | 20 minuten |
ZWAF | 245 | 20 minuten friday |
ZWAO | 27277 | 20 minuten online |
S_AL_FR_WDJ_21_SAMP
- size: 157k texts
- stratified sampling: 25% of all texts per source per week
- time span: 2016-2021
Sources
acronym | texts | source |
BILA | 1020 | Bilan |
BLIO | 1771 | blick.ch |
ENCF | 135 | encore! |
FEM | 1584 | Femina |
HEB | 130 | L'Hebdo |
HEU | 13350 | 24 heures |
ILLE | 2119 | L'Illustré |
ILLO | 198 | NA |
NNHEU | 15481 | 24heures.ch |
NNTDG | 14187 | Newsnet / Tribune de Genève |
NNTLM | 22023 | lematin.ch |
PME | 596 | PME Magazine |
PMEO | 100 | pme.ch |
RTS | 14805 | rts.ch |
SWII | 1232 | swissinfo.ch |
TDG | 11852 | La Tribune de Genève |
TLM | 4230 | Le Matin |
TLMD | 5008 | Le Matin Dimanche |
TPS | 14985 | Le Temps |
TPSO | 3460 | letemps.ch |
TVHU | 1318 | TV 8 |
ZWAS | 14550 | 20 minutes |
ZWSO | 31679 | 20 minutes online |
S_AL_IT_WDJ_2021
Sources
acronym | texts | source |
azione | 7743 | Azione |
cdt | 63628 | Corriere del Ticino |
coopzeitung | 566 | Coopzeitung |
laregione | 59777 | La Regione |
mattinonline | 12963 | MattinOnline |
RSI | 14956 | rsi.ch |
rsinews | 74677 | RSI News |
SWII | 4574 | swissinfo.ch |
ticinonews | 41082 | Ticino News |
tio | 177603 | tio.ch |
The texts in the corpus were published online on the media websites listed below. No paywalled data is included in the corpus. The release covers the time span from October 2015 to October 2020.
S_AL_WDJ20_DE
Sources
acronym | texts | class | subclass | source |
blick | 209099 | media | online | Blick |
grenchnertagblatt | 199140 | media | daily_newspaper | Grenchner Tagblatt |
watson | 100786 | undefined | undefined | Watson |
tagesanzeiger | 87582 | media | daily_newspaper | Tagesanzeiger |
basellandschaftlichezeitung | 82027 | media | daily_newspaper | Basellandschaftliche Zeitung |
srf | 79939 | media | online | Schweizer Radio und Fernsehen |
nzz | 58377 | media | online | Neue Zürcher Zeitung |
suedostschweiz | 55635 | media | daily_newspaper | Südostschweiz |
bazonline | 48634 | media | daily_newspaper | Basler Zeitung |
blickamabend | 18247 | media | online | Blick am Abend |
derbund | 15311 | media | daily_newspaper | Der Bund |
woz | 8159 | media | weekly_newspaper | Die Wochenzeitung |
coopzeitung | 5807 | media | weekly_newspaper | Coop Zeitung |
20min | 4479 | media | online | 20 Minuten |
migroszeitung | 379 | media | weekly_newspaper | Migros Magazin |
Note: because the corpus is quite large, results take some time to load.
For performance reasons, the LDA topic model was calculated on a sample of 400,000 texts.
S_AL_WDJ20_IT
Sources
acronym | texts | class | subclass | source |
rsinews | 79902 | media | online | Radiotelevisione Svizzera |
tio | 78834 | media | online | Ticinonline |
cdt | 46113 | media | weekly_newspaper | Corriere del Ticino |
ticinonews | 33858 | media | online | Ticino News |
gdp | 31647 | media | daily_newspaper | Giornale del popolo |
laregione | 11445 | media | daily_newspaper | La Regione |
azione | 6459 | media | weekly_newspaper | Azione |
mattinonline | 5273 | media | online | Il Mattino Online |
coopzeitung | 1859 | media | weekly_newspaper | Coop Zeitung |
S_AL_WDJ20_FR
Sources
acronym | texts | class | subclass | source |
lematin | 97619 | media | daily_newspaper | Le Matin |
24heures | 71526 | media | online | 24 Heures |
letemps | 66452 | media | daily_newspaper | Le Temps |
rts | 59864 | media | online | Radio Télévision Suisse |
tdg | 36043 | media | daily_newspaper | Tribune de Genève |
lagefi | 14247 | media | daily_newspaper | L'Agefi |
ghi | 13033 | media | weekly_newspaper | Genève home informations |
onefm | 7039 | media | online | One FM |
lecourrier | 3359 | media | daily_newspaper | Le Courrier |
coopzeitung | 2240 | media | weekly_newspaper | Coop Zeitung |
20min | 1211 | media | online | 20 Minuten |
leman | 536 | media | online | Leman Bleu |
migroszeitung | 168 | media | weekly_newspaper | Migros Magazin |
S_AL_WDJ20_RM
This corpus contains texts from RTR (Radiotelevisiun Svizra Rumantscha) and is a first attempt to build a media corpus in Rumantsch. It was used for the Swiss "Word of the Year" and contains data from 2011 to 2020, with very few tokens for the years 2011-2014.