Kabyle Text Statistics

File: kab.txt    Language: kabyle    Generated: 2025-11-05 18:29:56

1 Basic Statistics

Characters: 3,838,679
Tokens: 978,510
Unique Words: 66,851
Sentences: 132,372
Type-Token Ratio: 0.0683
Hapax Ratio: 0.5344
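The two ratios are consistent with the usual definitions: type-token ratio = unique words / tokens (66,851 / 978,510 ≈ 0.0683), and hapax ratio = words occurring exactly once / unique words. A minimal Python sketch of both follows. The whitespace tokenization and the helper name `lexical_ratios` are assumptions, since the report does not state how it tokenizes.

```python
from collections import Counter

def lexical_ratios(text):
    """Return (type_token_ratio, hapax_ratio) for whitespace-split tokens."""
    tokens = text.split()  # assumption: plain whitespace tokenization
    if not tokens:
        return 0.0, 0.0
    counts = Counter(tokens)
    # Hapax legomena are word types that occur exactly once.
    hapaxes = sum(1 for c in counts.values() if c == 1)
    ttr = len(counts) / len(tokens)
    hapax_ratio = hapaxes / len(counts)
    return ttr, hapax_ratio

ttr, hapax = lexical_ratios("ad yili d ayen ad yili")
```

In the sample string, 4 of the 6 tokens are distinct and 2 of those 4 types occur once.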

2 Characters & N-grams

Special Characters Analysis
total_special_chars: 260766
unique_special_chars: 33
special_char_frequency: {'-': 111013, '.': 100225, '?': 28065, ',': 14501, '!': 4747, '"': 1408, ':': 281, ';': 170, '“': 72, '”': 68, "'": 53, '(': 36, ')': 35, '\u200b': 16, '%': 11, '‑': 9, '—': 8, '«': 7, '»': 7, '°': 6, '+': 4, '♓': 4, '€': 2, '…': 2, '*': 2, '#': 2, '$': 2, '–': 2, '[': 2, ']': 2, '/': 2, '&': 1, '̣': 1}
special_char_ratio: 0.06793118153406419
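The ratio matches total special characters divided by total characters (260766 / 3838679 ≈ 0.0679). A minimal sketch of such a count is shown below. Treating every character that is neither alphanumeric nor whitespace as "special" is an assumption, and `special_char_stats` is a hypothetical name; the report's own rule is not stated.

```python
from collections import Counter

def special_char_stats(text):
    """Count characters that are neither alphanumeric nor whitespace."""
    # Assumption: this is the definition of a "special" character.
    specials = Counter(c for c in text if not c.isalnum() and not c.isspace())
    total = sum(specials.values())
    ratio = total / len(text) if text else 0.0
    return specials, total, ratio

specials, total, ratio = special_char_stats("Ih! Acu-t?")
```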
Kabyle Character Frequencies
[?]  4226
Ǧ    308
[?]  231
ɛ    18748
Ɛ    1098
ɣ    78599
ǧ    8995
[?]  14245
[?]  31560
[?]  1556
[?]  98
Ɣ    1555
Č    205
[?]  22863
[?]  408
[?]  776
č    10128
[?]  8029
Bigrams & Trigrams
Bi-grams: 3755 Tri-grams: 30630
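A short sketch of how distinct character n-grams can be counted. It assumes the figures above are numbers of distinct character bigrams and trigrams taken inside whitespace-separated words. The report does not say whether n-grams cross word boundaries or are case-folded, and `char_ngrams` is a hypothetical helper.

```python
from collections import Counter

def char_ngrams(text, n):
    """Count overlapping character n-grams inside each whitespace-separated word."""
    counts = Counter()
    for word in text.split():
        # Words shorter than n contribute nothing (empty range).
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts

bigrams = char_ngrams("akken akk", 2)
trigrams = char_ngrams("akken akk", 3)
```

`len(bigrams)` and `len(trigrams)` give the number of distinct n-grams, the quantity comparable to the two totals above under that assumption.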

3 Words

Top 50 Most Frequent Words (total occurrences: 445619)
Word      Count
.         94712
d         39825
i         28137
?         28065
ad        27461
ara       18434
n         15879
Tom       14588
,         14494
Ur        11391
deg       9490
s         8000
ur        7687
D         6809
is        5755
nni       5375
iyi       5335
t         5227
a         4972
ɣer       4837
Mary      4823
!         4747
Ad        4523
as        4510
ɣef       3731
yid       3625
tt        3607
acu       3572
k         3494
iw        3383
neɣ       3363
akk       3198
id        3074
kra       3044
akken     3040
fell      3032
aya       2771
ik        2718
Yella     2554
seg       2359
kan       2332
ma        2187
am        2109
ayen      1991
s.        1988
yiwen     1964
aṭas      1963
yella     1946
iman      1763
win       1735
Top Prefixes
a      148452
t      147957
i      96755
y      79030
d      78844
n      49415
ye     46873
te     46647
u      42806
ad     33321
s      30819
m      24880
yi     23955
w      23757
k      22415
ar     21445
ta     20645
ur     19963
ara    18508
l      18384
to     14903
tom    14868
ɣ      13761
tt     13601
ak     13411
ne     13393
as     13392
ma     13294
ti     12369
nn     11397
de     11369
deg    10622
wa     10543
im     9850
ɣe     9729
am     9554
ay     9187
is     9177
f      8970
d-     8851
akk    8835
tu     8717
yid    8070
ke     7988
b      7930
ac     7873
yel    7711
nni    7417
tes    7379
yes    7142
Top Suffixes
n      108936
d      107519
a      96187
i      88442
t      69174
ɣ      54854
en     52651
s      47302
m      44692
r      43598
ad     34076
[?]    32538
ra     27188
l      25786
k      24877
[?]    23541
an     23440
ur     21839
ara    19300
u      15141
g      15100
om     14877
tom    14864
la     14428
eg     14092
ni     13865
eḍ     13770
as     13349
is     13140
er     12993
em     12912
nt     12771
in     12121
[?]    12054
nni    11357
-d     10795
al     10620
deg    10328
id     10189
ent    10007
.      10003
yi     9812
y      9545
wen    9069
f      9058
w      8941
mt     8901
neɣ    8396
it     7814
iyi    7696
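The prefix and suffix tables list strings of one to three characters, and entries such as "tom" suggest the tokens were lowercased first. A minimal sketch under those two inferred assumptions follows. The name `affix_counts` and the `max_len=3` default are hypothetical, not taken from the report.

```python
from collections import Counter

def affix_counts(tokens, max_len=3):
    """Count leading and trailing substrings of length 1..max_len per token."""
    prefixes, suffixes = Counter(), Counter()
    for tok in tokens:
        word = tok.lower()  # assumption: case-folded before counting
        # A word shorter than max_len only yields affixes up to its own length.
        for k in range(1, min(max_len, len(word)) + 1):
            prefixes[word[:k]] += 1
            suffixes[word[-k:]] += 1
    return prefixes, suffixes

prefixes, suffixes = affix_counts(["Tom", "tella", "yella"])
```

Counting is per token occurrence, so a frequent word contributes to its affixes every time it appears.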
Longest Words in Top 1,000
yiseggasen, tafṛansist, tamezwarut, tefṛansist, yettmeslay, tafransist, tiɣawsiwin, taneggarut, yimeddukal, tefransist, ttxemmimeɣ, takeṛṛust, Tanemmirt, taqbaylit, iseggasen, mmeslayeɣ, tɣawsiwin, teqbaylit, yemmeslay, iceṭṭiḍen, umeddakel, tuccḍiwin, imeddukal, ameddakel, tamecṭuḥt, Tettbaneḍ, nemmeslay, Halloween, imeṭṭawen, Ustṛalya, walebɛaḍ, tkeṛṛust, tameddit, tameṭṭut, Tzemremt, Ssarameɣ, igerrzen, tḥemmleḍ, kennemti, yidrimen, yimawlan, tezmireḍ, teqqimeḍ, amezwaru, tamaynut, Tḥemmleḍ, txeddmeḍ, txedmemt, yedrimen, temsulta
Longest Words in Top 10,000
Tettmeslayemt, tettmeslayemt, ssiwel-iyi-d., tettqellibemt, tesmenyifiyeḍ, tettxemmimemt, timerkantiyin, yettmeslayen, tettmeslayeḍ, Tettmeslayeḍ, tettxemmimeḍ, tameẓrengayt, tamẓerbeṭṭut, imerkantiyen, temmeslayemt, yettmuqqulen, Tettmeslayem, yettfeṛṛiǧen, yettnezzihen, yettemsefham, tettqellibem, tettmeslayem, yettxemmimen, imarikaniyen, Imarikaniyen, tettxemmimed, yettbeddayen, tettqellibeḍ, teskiddibemt, teskerkisemt, yettfeǧǧiǧen, yettwaxeyyeb, temyussanemt, yettqenniɛen, yettḥezziben, tameddakkelt, tettɛeṭṭileḍ, yettseḥḥiren, tettqeṣṣireḍ, Tettxemmimeḍ, tettxemmimem, tettnuddumeḍ, imeṛkantiyen, tettmuqqulem, tettwaxeyyeb, tettkeyyifeḍ, temmeslayeḍ, tameddakelt, yettwassnen, yisteqsiyen
Number & Pattern Occurrences
Number Patterns Summary
Pure Numbers: 662
Numbers with Non-alphanumeric: 96
Year Patterns (1980-2029): 99
Date-like Patterns: 0
Decimal Numbers: 5
Numbers with Commas: 3
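The report does not give the patterns behind these counts. The sketch below shows one plausible way to detect three of the categories with regular expressions. The patterns, the whitespace tokenization, and the name `number_patterns` are all assumptions for illustration only.

```python
import re

# Assumed patterns; the report's actual expressions are not stated.
YEAR = re.compile(r"\b(?:19[89]\d|20[0-2]\d)\b")  # years 1980-2029
DECIMAL = re.compile(r"\b\d+\.\d+\b")              # e.g. 3.5

def number_patterns(text):
    """Count a few number-like patterns in a text."""
    tokens = text.split()
    return {
        # A pure number is a token made of digits only.
        "pure_numbers": sum(1 for t in tokens if t.isdigit()),
        "year_patterns": len(YEAR.findall(text)),
        "decimal_numbers": len(DECIMAL.findall(text)),
    }

counts = number_patterns("Deg 2001 llan 160 km d 3.5 kg")
```

Here "2001" counts both as a pure number and as a year, so the categories are not mutually exclusive in this sketch.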
Word Length Distribution (Without Multiplicity)
Length  Count
1       76
2       219
3       725
4       3015
5       5569
6       9460
7       9729
8       6116
9       5224
10      2490
11      1063
12      600
13      222
14      61
15      31
16      15
17      5
18      3
19      2
24      1
Word Length Distribution (With Multiplicity)
Length  Count
1       258587
2       107372
3       123001
4       95433
5       95820
6       90812
7       54992
8       26890
9       17792
10      8269
11      2272
12      1238
13      312
14      73
15      33
16      15
17      5
18      3
19      2
24      1
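"Without multiplicity" counts each distinct word once, while "with multiplicity" counts every occurrence. A minimal sketch of the two distributions, assuming word length is measured in Unicode code points; `length_distributions` is a hypothetical helper.

```python
from collections import Counter

def length_distributions(tokens):
    """Return (by_type, by_token) word-length histograms."""
    by_type = Counter(len(w) for w in set(tokens))  # each distinct word once
    by_token = Counter(len(w) for w in tokens)      # every occurrence
    return by_type, by_token

by_type, by_token = length_distributions(["ad", "ad", "d", "yella", "tella"])
```

In the sample, "ad" appears twice, so length 2 has one type but two tokens.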
Zipf Distribution
[Figure: Zipf plot]
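A Zipf plot shows word frequency against frequency rank on log-log axes, where an ideal Zipfian corpus gives a straight line with slope close to -1. The sketch below estimates that slope with an ordinary least-squares fit. It illustrates the idea only and is not the report's plotting code; `zipf_slope` is a hypothetical name.

```python
import math
from collections import Counter

def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank)."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Frequencies 6, 3, 2 follow f = 6 / rank exactly.
slope = zipf_slope(["a"] * 6 + ["b"] * 3 + ["c"] * 2)
```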
Levenshtein Similarity Examples
Word A    Freq A  Word B    Freq B  Dist  Sim
neɣ       3363    nneɣ      1196    1     0.75
akk       3198    akka      954     1     0.75
akken     3040    dakken    1255    1     0.833
akken     3040    akked     855     1     0.8
aya       2771    waya      846     1     0.75
Yella     2554    yella     1946    1     0.8
Yella     2554    Tella     842     1     0.8
Yella     2554    tella     738     1     0.8
ayen      1991    awen      873     1     0.75
ayen      1991    asen      751     1     0.75
ayen      1991    wayen     679     1     0.8
yiwen     1964    yiwet     935     1     0.8
yella     1946    yellan    914     1     0.833
yella     1946    Tella     842     1     0.8
yella     1946    tella     738     1     0.8
yella     1946    yelli     586     1     0.8
iman      1763    yiman     1473    1     0.8
kent      1432    ken       788     1     0.75
kent      1432    tent      754     1     0.75
kent      1432    nkent     753     1     0.8
kent      1432    akent     672     1     0.8
nnes      1399    nneɣ      1196    1     0.75
Bɣiɣ      1244    bɣiɣ      619     1     0.75
Ilaq      1164    ilaq      922     1     0.75
ten       1146    tent      754     1     0.75
Amek      1117    amek      739     1     0.75
nwen      1050    awen      873     1     0.75
nwen      1050    wen       846     1     0.75
nwen      1050    nsen      764     1     0.75
Anwa      932     Anda      587     1     0.75
yellan    914     yelhan    515     1     0.833
mačči     878     Mačči     747     1     0.8
awen      873     wen       846     1     0.75
awen      873     asen      751     1     0.75
nekk      861     Nekk      629     1     0.75
Ulac      853     ulac      671     1     0.75
uxxam     850     axxam     713     1     0.8
Tella     842     tella     738     1     0.8
fell-as   816     fell-i    552     2     0.714
fell-as   816     fell-ak   425     1     0.857
belli     815     yelli     586     1     0.8
nsen      764     asen      751     1     0.75
nkent     753     akent     672     1     0.8
tenna     745     Yenna     659     1     0.8
tenna     745     yenna     558     1     0.8
tenna     745     tenna-d   472     2     0.714
ugar      686     gar       424     1     0.75
yid-s.    681     yid-s     485     1     0.833
iman-is   674     yiman-is  594     1     0.875
ulac      671     ula       504     1     0.75
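The listed similarity values agree with sim = 1 - dist / max(len(A), len(B)), where dist is the case-sensitive Levenshtein edit distance. For example, fell-as and fell-i give 1 - 2/7 ≈ 0.714. A minimal sketch using the standard dynamic-programming recurrence follows. The function names are illustrative, and the formula is inferred from the numbers rather than stated in the report.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def similarity(a, b):
    """Normalize the distance by the longer word's length (inferred formula)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest

sim = similarity("fell-as", "fell-i")
```

Because the comparison is case-sensitive, pairs such as Yella and yella count as distance 1.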

3.12 Advanced Word Patterns

Hyphenated Words 100
Palindromes 50
Reverse Pairs 30
Abbreviations 36

3.10 Text Coverage

Top 10 words: 29.9%
Top 100 words: 51.6%
Top 1,000 words: 70.8%
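Coverage is the share of all tokens accounted for by the k most frequent words. As a check, the ten largest counts in the frequency table sum to 292,986, and 292,986 / 978,510 ≈ 29.9%. A minimal sketch, with `coverage` as a hypothetical helper name:

```python
from collections import Counter

def coverage(tokens, k):
    """Fraction of tokens covered by the k most frequent word types."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(k))
    return covered / len(tokens)

share = coverage(["a", "a", "b", "c"], 1)
```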

4 Sentence Analysis

Shortest Sentences
... M. Ih. Ih. Ih. Ih! Ah! Ax! Ih. Ih. Aɣ! Ṭa! Ih? Aw! Ha! Ih. ih. Uk! Uk! Uk! Ih. Ttu. Ldi. Lli. Aha! Suɣ. Acu? Waw! Uhu! Knu! Tru. Ɣeṛ! Tfu! Tfu! Bed! Uhu. Sya. Zzi. Ala. Uhu. Sel! Cfu! Yya! Ihi? Ali. Knu! Ṛuḥ! Ɣer!
Longest Sentences
Mi d-yeqreb lawan n rrwaḥ, truḥ temɣart n yemma-s ɣer taxxamt n nadam, teddem-d yiwen n lmus d abestuḥ, tegzem yes-s iḍudan-is armi d-zerqen idammen ; imir teṭṭef-d aceṭṭiḍ d amellal swadda-nsen, terja armi d-ɣlint fell-as tlata tiqqa n yidammen, tefka-t i yelli-s, tenna-yas ; "A yelli ɛzizen, awi aceṭṭiḍ-a yid-m tḥadreḍ-d mliḥ : yezmer ad kem-yenfeɛ deg usikel." "Ma teffudeḍ", i tenna tqeddact, qqel ɣef udem zdat waman tesweḍ ; nekk ur ttuɣaleɣ ara ad iliɣ d taqeddact-im." Lukan kan ad teɛlem yemma-m, ul-is ad yebḍu ɣef sin.
Tilmeẓyin-nneɣ d yilmeẓyen yettwarzen, tid akked wid yettnadin kan tayett ɣer win d tin ur-tt-nesɛi, gar iɣallen n umwanes d temwanest kan yellan deg ssɛaya n wid i umi tettunefk tegnitt i ifuṛsen wid ten-yeṛban deg miḥyaf n tumert, i sḍemɛen s yilellucen, s tirga, wigi ilaq-asen ad kerrcen aqendur akken ad nadin ɣef tmeddurt ara ten-yessufɣen seg tillas d timḍellas, s yiman-nsen, ama d tallest ama d alles.
D tidet belli win yerran i yal yiwen ayla-s imi yugad tacangalt, ixeddem akken s lameṛ n wiyaḍ yerna d cceṛ-nni yetthabi i t-iḥettmen ; ur nezmir ad nini d aḥeqqi i yella : maca win yerran i yal yiwen ayla-s acku yessen acu d isuḍaf d tḍullit-nsen, winna d lewfeq yesɛa d yiman-is i t-yeǧǧan iga akken yerna d netta i tt-igezmen deg ul-is, mačči s lameṛ n wiyaḍ ; yukal ihi ma nenna atan d aḥeqqi..
Yiwen weqcic yezzenz tafunast deg ssuq n Hereford, dɣa yezwar-as-d yiwen uqeṭṭaɛ, i iḥeṛṛmen deg-s deg umkan i yextar ad s-d-yefk idrimen-nni ; aqcic-nni yeṭṭef-itt ihi d tazzla akken ad yerwel, maca akken yettwaṭṭef, yers-d seg uɛidiw, yekkes-d idrimen seg lǧib, yezzuzer-iten akka d wakka, dɣa akken yella umakar ijemmeɛ-iten-id, netta yuli ɣef uɛidiw, yeṭṭef-itt s uqlaqal armi d axxam.
Izumal n tfekka d yiman ɣur yemdanen at wannuz, yessefk ad ilin d ttejra d waman akken mseḍfaren, acku ttejra tettǧhid ɣer berra trennu tgemmu ɣef teɣzi n wakud akken ad tesdari wid yeḥwaǧen u ad sen-d-terr tili ; si tama nniḍen, aman irekkden i talwit, kifkif i neffɛen i yal yiwen yerna sɛan kra n yiɣil i ufessed yezmer ad ihudd aɣlanen-nni yeqwan akk deg umaḍal.
Yessuqel-d Igider ɣer teqbaylit isefra yettwassnen am Yugurten akked Aɣeṛṛabu icaxen n Arthur Rimbaud, El Desdichado n Gérard de Nerval, Agelmim akked Aɛzal n Alphonse de Lamartine, Taẓuṛi n tmedyezt n Paul Verlaine, Abeḥri n lebḥeṛ n Stéphane Mallarmé, Amasrag n Charles Baudelaire, Azekka n tafrara n Victor Hugo d isefra n yimedyazen nniḍen.
"Tḥemmleḍ izerman?" "Iban akk ala." "Tḥemmlem izerman?" "Iban akk ala." "Tḥemmlemt izerman?" "Iban akk ala." "Amek i tettruḥuḍ s aɣerbaz?" "Deg usakal." "Amek i tettruḥum s aɣerbaz?" "Deg usakal." "D acu-tent tigi?" "Tigi d tiwlafin-ik." "D acu-tent tigi?" "D tiwlafin-im." Tɣemmez-iyi-d am akken tenna-d: "ḥemmleɣ-k".
Targit-nni n tegnatin ara d-yeɣlin i medden akk, ur ɛad teffiɣ i yal yiwen deg Marikan, maca lweɛd-is mazal-t yella i kra n win ara d-yawḍen sswaḥel-nneɣ - aya d ayen deg i d-ddan qrib sebɛa imelyan n Yimselmen imarikaniyen deg tmurt-nneɣ assa, seg wid ileḥḥqen ticehriyin d leqraya s wayen yugaren tanammast.
Akken yebda lexyal-nni n tebḥirt yettembiwil yettsuɣ armi yesserwel akk ifrax yellan din yerna yessexleɛ Martin armi qrib yemmut, yezzi-d, iɣemmez-as i ugrud-nni, yenna-yas : " Ɣas kkes anezgum, tzemreḍ dima ad tettekleḍ fell-i, ma d igḍaḍ-ihin imelɛan ur tetten ara imɣan-ik." Ad d-tṣubbeḍ neɣ ad n-aliɣ.
Ixxamen si yal tama llan meqqrit yerna ɛlayit, maca d iqdimen mliḥ, yerna i ten-izedɣen d tasmelt-nni tigellilt maḍi : d ayen i d-tesbeyyin ddeqs-is ṣṣifa-nni-nsen iwumi serrḥen, ɣas ma ur d-teddi tnagit n tmuɣli-nni tameɛfunt n kra-nni n yergazen d tlawin i yettɛeddin ssya ɣer s yiɣallen-nsen mxallfen.
Yeckem-d ɣer texxamt yiwen ujantlman d amɣar s nnwaḍer n wuṛeɣ d ucekkuḥ aciban, yenna s kra n tnaɣa akken n Yefransisen : "D mas Erskine i yi-d-iṣuḥ yiseɣ ad laɛiɣ akka?" Iḍ-nni i d imensawen n wass ideg ara s-rren tasmert, dɣa agellid-nni meẓẓiyen yella yeqqim weḥd-s deg texxamt-is icebḥen.
Akken i d-inna Thomas Jefferson: "Ma tessunefeḍ cwiṭ n talwit af umud n tlelli, yiwet deg-sent ur tt-tuklaleḍ." Akken i d-inna Thomas Jefferson: "Ma tessunefeḍ cwiṭ n talwit af umud n tlelli, yiwet deg-sent ur tt-tuklaleḍ." Yeggul deg-i ad eɣzeɣ tasraft i baba, u yerna yuker-iyi agelzim.
Uzzleɣ deg cceṭ a sseɛwaǧeɣ deg ifassen-iw, kkateɣ deg uqerru-w d wudem-iw, la ttwehhimeɣ deg lmeḥna ideg lliɣ, ttsuɣuɣ: "A nnger-iw, a nnger-iw!" armi dayen ɛyiɣ rniɣ duxeɣ, terra-yi tmara ad ẓẓleɣ ɣer lqaɛa akken ad sgunfuɣ, maca ur qdifeɣ ara ad ṭṭseɣ seg akken ugadeɣ ad ttwiččeɣ.
"Atan tura ad wen-nessels i sin yid-wen am teqcicin, i ikemmel yenna, wissen ahat ur d-qerrṣen ara fell-awen." Setḥerṣeɣ, nniɣ akken lliɣ, am urgaz yettlusen aserwal, maca ɣas ini ur ɛeṭṭleɣ ara sellmeɣ mi d-yessumer baba ad d-yaf kra n uqcic iḍen iwumi ara yessels u ad dduɣ d Jed.
I tezɣal izemren ad ṣubbent ddaw n -30 °C, ɣef teɣzi n yiwen umecwaṛ yettnegzamen ssya ɣer da s yigidan n wegris i d-yettḍummu waḍu, 160 km d tasuft yexlan, atta-ya ad taẓ yiwet n terbaɛt i selḥawen sin imassanen n NASA, deg tjumma n wegris anṭarktiki ur yeɛfis yiwen uqbel.
Mehenni yesbedd Amussu i Timanit n tmurt n Yiqbayliyen anida yuɣal d aselway n Unabaḍ Aqbayli Uɛḍil di lɣerba deg Fṛansa seg tallit n tefsut taberkant n tmurt n yiqbayliyen deg 2001, anida aɣref aqbayli yella yettqabal aseḥreṣ n Unabaḍ Azzayri mgal idles-is d tutlayt-is.
D luluf n teɣbula tumḍinin n uɣawas amezwaru i d-yeffɣen s talɣa n tɣuri n tmacinin akken ad mmeẓrent s wudem azayez, ilelli, deg usmel web data.gov, si tid i d-yeṭṭfen seg isefka n waddad n tegnawt alamma d isisemlen n tɣellist n tkeṛyas d ssuma n tkaliwin n tdawsa.
Iwennaten gelmen-d ṣṣut-nni ujewwaq n vuvuzella d akken "yesseɛyaw" yerna "d imciṭen", qurnen-t dɣa ɣer "tqeḍɛit n yilwen yeṭṭef yisiḍ", ɣer "terẓaẓayt n yibẓaẓ", "taɣaṭ iteddun ɣer tzelli", "taɣrast tageɛmirt yeččuren d tizizwa yessḍen" neɣ "abṛik yeččan lehyuf".
Ma d nekk ɛeddaɣ-d gar waɣlanen-nni yakk am win i d-iɛeddan gar lɣaci yeqwan deg kra n lferja akken ad d-yeg abrid i iman-is, s yiwen ufus ṭṭfeɣ anzaren-iw, s win nniḍen ɛusseɣ leǧyub-iw, ur luɛaɣ yiwen seg-sen, nekk i yellan ḥareɣ ad waliɣ ayen i bɣiɣ ad t-waliɣ.
Aql-i d nekk, Robinson Crusoe ameɣbun meskin, i wumi yezder lbabuṛ deg i d-iruḥ deg yiwet n tzawwa tarehbanit deg uzegza n yilel, kecmeɣ-d tigzirt-a tamekḥust, tuḥzint, i wumi semmaɣ "Tigzirt n Layas"; urbaɛ usilel akk ččan-ten waman yerna ula d nekk qrib mmuteɣ.
Sentence Length Statistics
Average Length (Characters): 28.0
Average Length (Words): 5.8
Total Sentences: 132372
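A minimal sketch of how such sentence-length averages can be computed. It assumes sentences end at ".", "!" or "?" followed by whitespace and that words are whitespace-separated. The report's actual sentence splitter is not described, and `sentence_stats` is a hypothetical name.

```python
import re

def sentence_stats(text):
    """Return (sentence_count, avg_chars, avg_words) for a text."""
    # Assumption: split after ., ! or ? when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text)
    sentences = [s.strip() for s in parts if s.strip()]
    if not sentences:
        return 0, 0.0, 0.0
    n = len(sentences)
    avg_chars = sum(len(s) for s in sentences) / n
    avg_words = sum(len(s.split()) for s in sentences) / n
    return n, avg_chars, avg_words

n, avg_chars, avg_words = sentence_stats("Ih. Acu? Yella deg uxxam.")
```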