I had a chance to look at the GOP primary debate this weekend, in between going to the amazing shows that come to town during the weeks surrounding Jazzfest.
I thought it would be fun to do a little analysis of word frequencies:
The number in parentheses is the word count.
>Huckabee(31)>Paul,Reagan(29)>Thompson(27)>Hunter(21)
Below I’ve generated a simplistic “most significant” measure by computing the ratio of each word’s frequency in the debate transcript to its frequency in a corpus of spoken English. The number in the first table is the ratio, whereas the number in the second table is the word count. I’m going to get this into Exhibit and have a play with some neato visualizations as soon as I get a chance.
Here are the caveats. The word frequencies I’m using in the first table are actually from the British National Corpus, so many of the words in the transcript are over-represented simply because the candidates are speaking American English. In the second table, which uses the ANC, many words are over-represented because they refer to current events that hadn’t occurred when the corpus was compiled, and because debates are partly speeches, which are more like written English. I’ll update this when I find a better reference.
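For anyone who wants to try the same thing, here is a rough Python sketch of the ratio described above. It is not the script behind these tables: the function names, the `min_count` cutoff, and the shape of `reference_freq` (a plain dict of word to relative frequency) are assumptions made for illustration, and the numbers in the usage example are made up.

```python
import re
from collections import Counter


def tokenize(text):
    """Uppercase the text and pull out alphabetic words (allowing one apostrophe)."""
    return re.findall(r"[A-Z]+(?:'[A-Z]+)?", text.upper())


def appearance_ratios(transcript, reference_freq, min_count=1):
    """Return (word, ratio, count) tuples sorted by ratio, highest first.

    reference_freq is assumed to map WORD -> relative frequency in the
    reference corpus (occurrences divided by corpus size). Words missing
    from the reference are skipped, since their ratio is undefined.
    """
    counts = Counter(tokenize(transcript))
    total = sum(counts.values())
    rows = []
    for word, count in counts.items():
        ref = reference_freq.get(word)
        if count < min_count or not ref:
            continue
        # Ratio of relative frequency in the transcript to the reference.
        rows.append((word, (count / total) / ref, count))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Toy usage with invented text and invented reference frequencies.
toy_transcript = "Iran Iran the the the the governor governor"
toy_reference = {"IRAN": 0.001, "THE": 0.05, "GOVERNOR": 0.0005}
for word, ratio, count in appearance_ratios(toy_transcript, toy_reference):
    print(f"{word} | {ratio:.4f} | {count}")
```

With the toy numbers, GOVERNOR ranks first, then IRAN, then THE: a common word like THE gets a low ratio even though it has the highest raw count. Skipping words that are absent from the reference is one choice among several; smoothing the reference frequencies would keep those words in the ranking instead.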
Top 100 Words in GOP Primary Debate 2007
sorted by appearance ratio
GOVERNOR | 589.4664 |
IRAN | 335.5764 |
IRAQ | 174.853 |
CLINTON | 137.7629 |
FEDERAL | 90.07577 |
TAXES | 84.77719 |
COALITION | 79.47862 |
STATES | 74.18005 |
READER | 74.18005 |
BUSH | 70.64766 |
CALIFORNIA | 61.8167 |
AMERICANS | 61.1374 |
BORDER | 57.06157 |
NATION | 56.51813 |
BELIEFS | 52.98575 |
DIPLOMATIC | 52.98575 |
EXPORTS | 52.98575 |
MILITARY | 49.86894 |
UNITED | 47.09844 |
SPENDING | 46.07456 |
WASHINGTON | 45.41635 |
DEMOCRATS | 44.15479 |
PROGRAM | 39.73931 |
SECURE | 39.73931 |
ACQUIRE | 39.73931 |
DEFEAT | 39.73931 |
KOREA | 39.73931 |
NUCLEAR | 39.04213 |
WEAPONS | 38.53509 |
PRESIDENT | 38.34495 |
MAYOR | 37.84696 |
ISRAEL | 37.09002 |
ACQUISITION | 35.32383 |
AUTHORS | 35.32383 |
JOURNAL | 35.32383 |
LIMITATIONS | 35.32383 |
PRESIDENTIA | 35.32383 |
TROOPS | 33.7182 |
FOREIGN | 33.3614 |
GLOBAL | 33.11609 |
AMERICA | 32.99113 |
AMERICAN | 32.75483 |
VALUES | 31.79145 |
GAINS | 31.79145 |
SUPREME | 31.79145 |
GREATEST | 29.80448 |
SERVING | 29.43653 |
FAITH | 26.49287 |
ENTIRE | 26.49287 |
CONSTITUTIO | 26.49287 |
ILLEGAL | 26.49287 |
CATHOLIC | 26.49287 |
COMMANDER | 26.49287 |
ACCOMPANIE | 26.49287 |
JUDICIAL | 26.49287 |
VIEWED | 26.49287 |
WALKER | 26.49287 |
PROTECT | 25.43316 |
THREAT | 24.93447 |
CELLS | 24.45496 |
GRADE | 24.08443 |
CANDIDATES | 23.54922 |
WEAPON | 23.54922 |
CONCERNING | 22.70818 |
JUDGES | 22.70818 |
BILLS | 22.30979 |
ELECTED | 22.30979 |
WELFARE | 22.07739 |
STABILITY | 21.1943 |
ADMINISTRATI | 20.60557 |
FORMER | 20.18505 |
DEFICIT | 19.86966 |
CELL | 19.52106 |
SOLVE | 18.92348 |
VOTED | 18.92348 |
PROUD | 18.54501 |
LEAD | 18.1267 |
MIDDLE | 18.1267 |
TAX | 17.82698 |
CRITICAL | 17.66192 |
FREEDOM | 17.66192 |
CONSISTENT | 17.66192 |
PRINCIPLES | 17.66192 |
TRANSFER | 17.66192 |
EXPERIMENT | 17.66192 |
INTELLIGENCE | 17.66192 |
ROMAN | 17.66192 |
SUCCEED | 17.66192 |
DISCRETION | 17.66192 |
ENEMY | 17.66192 |
STUDIED | 17.66192 |
WEALTH | 17.66192 |
COLLAPSE | 17.66192 |
CONCLUDED | 17.66192 |
CONVICTION | 17.66192 |
HUMANS | 17.66192 |
PAKISTAN | 17.66192 |
REVEAL | 17.66192 |
SEPARATION | 17.66192 |
WIN | 16.89401 |
Top 50 Words using ANC Corpus
sorted by appearance ratio
9 | KARL | |
7 | OPTIMISM | |
21 | RONALD | |
4 | CONFRONT | |
4 | SCOOTER | |
9 | ISLAMIC | |
3 | ALTERED | |
3 | CONSERVATIVES | |
3 | VETOED | |
3 | DIPLOMATIC | |
7 | REGIMES | |
7 | REPEAL | |
12 | STEM | |
4 | AISLE | |
4 | HYDE | |
4 | STRENGTHS | |
2 | BATTALIONS | |
2 | CURES | |
2 | FLATTER | |
2 | GOVERNS | |
2 | HOSTILITY | |
2 | JUSTICES | |
2 | NOMINEE | |
2 | PARDONS | |
2 | SECRECY | |
2 | UNIFY | |
2 | EXPORTS | |
6 | CELLS | |
6 | ENGAGE | |
3 | CONSERVATISM | |
3 | CRITICIZED | |
15 | ID | |
4 | IRANIANS | |
5 | BIN | |
5 | STRENGTHEN | |
5 | CELEBRATE | |
4 | MISMANAGED | |
3 | RACISM | |
2 | ABORTIONS | |
2 | COMMUNION | |
2 | CONVEY | |
2 | CROSSES | |
2 | CURING | |
2 | ENDORSE | |
2 | GOVERNED | |
2 | IMPERATIVE | |
2 | TAMPER |