I was updating my profile at the UK social networking site Friends Reunited, and decided to pay attention to a question that I’d hitherto ignored: “What are your 20 favourite words?” This is what I came up with: love, peace, new, unexpected, honest, whimsy, intense, tranquil, touch, trust, humanism, experiment, imagination, learn, reflection, evolution, revolution, reason, create, teach
…and, taking a far more practical approach to the matter, it struck me that a more accurate impression of myself might be had by counting the words in all my blog postings combined.
So I thought for a moment, engaged leet-drive and typed:
$ find articles/ archive/ -type f -print |
pargs cat |
perl -pe 's!<[^>]*?>!!go' |
tr -cd 'a-zA-Z \010\012' |
tr ' A-Z' '\012a-z' |
tally |
sort -rn |
head -50 |
cat -n
…and it did pretty much what I wanted, first time; pargs is a better xargs of my own devising, and tally is an awk script which counts instances of each record piped through it. The only goof was not expunging blank lines before running tally, but that’s a simple grep -v '^$'.
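For anyone without pargs and tally to hand, something similar can probably be had from stock tools. What follows is only a sketch under assumptions: `sort | uniq -c` stands in for tally (I am assuming tally prints a count followed by the record), sed replaces the perl tag-stripper, and the function name `word_tally` is made up for illustration. It also includes the blank-line fix mentioned above.

```shell
#!/bin/sh
# word_tally: hypothetical helper, not the original tally script.
# Reads text on stdin and prints "count word" lines, most frequent first.
word_tally() {
    sed 's/<[^>]*>//g' |     # strip HTML tags that open and close on one line
    tr -cd 'a-zA-Z \n' |     # keep only letters, spaces and newlines
    tr ' A-Z' '\na-z' |      # one word per line, folded to lower case
    grep -v '^$' |           # drop the blank lines left behind by the split
    sort | uniq -c |         # stand-in for tally: count identical lines
    sort -rn                 # highest counts first
}

# Possible usage over the same directories as the pipeline above:
# find articles/ archive/ -type f -exec cat {} + | word_tally | head -50 | cat -n
```

Using `find -exec cat {} +` in place of pargs is likewise a substitution, chosen because it copes with awkward filenames without extra tooling.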
The results are decidedly boring:
| index | count | word |
|---|---|---|
| 1 | 15982 | the |
| 2 | 9082 | to |
| 3 | 8502 | a |
| 4 | 8322 | and |
| 5 | 7685 | of |
| 6 | 5577 | i |
| 7 | 4813 | in |
| 8 | 3628 | that |
| 9 | 3288 | is |
| 10 | 3182 | for |
| 11 | 3105 | it |
| 12 | 2421 | on |
| 13 | 2169 | with |
| 14 | 1950 | this |
| 15 | 1940 | was |
| 16 | 1927 | my |
| 17 | 1917 | be |
| 18 | 1827 | at |
| 19 | 1670 | you |
| 20 | 1615 | have |
…and things get only marginally more interesting further down:
| index | count | word |
|---|---|---|
| 71 | 485 | security |
| 91 | 349 | work |
| 124 | 254 | bike |
| 167 | 182 | police |
| 202 | 155 | british |
“Sex” is not even in the top 500, alas. Nor is “love”. The only small crumb of comfort is that in 30 seconds I created an incantation that would stump most people in the real world.
Being an old-style Unix hacker is still useful.