Forum Posts Word Count

Topic's author

#1

badger

Debates: 0

Posts: 2,472

01.23.2023 11:20AM

So I was thinking forum post count is a bit of an abstraction away from real life. Word count, however, is something we see in a whole lot of places.

I wrote a script to get word count for people's forum posts. It's not exactly correct: I'm only splitting on whitespace, so words joined by forward slashes or colons get counted as one word. Also, I suck at web development and the only response I could get from this place was HTML, so that was a pain. (Was gonna do debates and debate comments and questions too until I realised how much of a pain it was.)
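To make the splitting caveat concrete, here is a small sketch comparing the whitespace split with a regex-based count. The function names and the token pattern are illustrative assumptions, not part of the original script, and the pattern only covers plain ASCII letters and digits.

```python
import re

def count_words_whitespace(text):
    # Same approach as the script in this thread: split on whitespace only.
    return len(text.split())

def count_words_tokens(text):
    # Alternative (an assumption, not the thread's method): count runs of
    # letters/digits, allowing inner apostrophes, so "yes/no" and
    # "key:value" are each counted as two words.
    return len(re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*", text))

sample = "yes/no and key:value count once each"
print(count_words_whitespace(sample))  # whitespace split
print(count_words_tokens(sample))      # regex tokens
```

On text with lots of slashes and colons the two counts drift apart, which is the undercount described above.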

RationalMadman has written 983,946 words on the DebateArt forums. That's roughly equivalent to 14 theses or 11 novels, or the entire Harry Potter series.

That's fun, right?

#2

RationalMadman

Debates: 573

Posts: 19,931

01.23.2023 11:30AM

I never related to the way boys or men talk so little when they speak. It's like they say shit in the most basic and dull way possible. I'm rather verbose and I like being that way.

That doesn't mean I speak amazingly 'well' to women, but the reverse is very true: when I hear the average female preacher or lecturer on a topic, I am more likely to comprehend her way of teaching. This happened to me my entire school life through to university, and I could never pinpoint why until I realised it's because women string their wording along, where men often leave gaps and expect you to fill in the blanks (men quite often literally talk in bullet points).

#3

RationalMadman

Debates: 573

Posts: 19,931

01.23.2023 11:38AM

Not that I am 'embarrassed' to post that much; I know I do that in most places where I interact verbally or textually. But I must say that your count surely includes quotes, links, etc. I am not downplaying how high the count is, but some of it is due to quoting.
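One way to leave quotes and links out of the count is sketched below. It assumes quoted text sits inside <blockquote> elements, which is a guess about the site's markup and not something confirmed in this thread. The class name, function name and URL pattern are also made up for illustration.

```python
import re
from html.parser import HTMLParser

class PostTextExtractor(HTMLParser):
    """Collects text while skipping anything inside <blockquote> elements.

    Assumption: the forum wraps quoted text in <blockquote>; the real
    markup may differ.
    """

    def __init__(self):
        super().__init__()
        self.quote_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self.quote_depth += 1

    def handle_endtag(self, tag):
        if tag == "blockquote" and self.quote_depth > 0:
            self.quote_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a quote.
        if self.quote_depth == 0:
            self.chunks.append(data)

# Bare URLs in the remaining text are dropped before counting.
URL_PATTERN = re.compile(r"https?://\S+")

def count_own_words(post_html):
    parser = PostTextExtractor()
    parser.feed(post_html)
    parser.close()
    text = URL_PATTERN.sub(" ", " ".join(parser.chunks))
    return len(text.split())
```

This only needs the standard library. Link text that is ordinary words (not a bare URL) would still be counted.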

#4

PREZ-HILTON

Debates: 18

Posts: 2,806

01.23.2023 11:55AM

@badger

"RationalMadman has written 983,946 words on the DebateArt forums"

I am working on fixing this by having Mike make the lifetime word count of an individual 500,000.

#5

Best.Korea

Debates: 417

Posts: 12,563

01.23.2023 12:44PM

"RationalMadman has written 983,946 words on the DebateArt forums. That's roughly equivalent to 14 theses or 11 novels, or the entire Harry Potter series."

That's a lot.

#6

Best.Korea

Debates: 417

Posts: 12,563

01.23.2023 12:46PM

Shila would surpass everyone if she continued her 80 posts per day.

#7

Intelligence_06

Debates: 172

Posts: 3,954

01.23.2023 01:04PM

@PREZ-HILTON

"I am working on fixing this by having Mike make the lifetime word count of an individual 500,000."

So a word limit. Upper bound or lower bound?

If I misunderstood anything, feel free to correct me.

#8

Intelligence_06

Debates: 172

Posts: 3,954

01.23.2023 01:08PM

The main point of posting here is definitely not posting for the sake of posting. On the contrary, that is spam. We are never meant to just post; we see stuff and we present our opinion(s), and that makes a post or more. That is how it works.

I suggest the default forums leaderboard ranking should be based on the likes/posts ratio. At least clickbaity titles are better than spamming videos on youtube.com.

#9

Intelligence_06

Debates: 172

Posts: 3,954

01.23.2023 01:12PM

Actually, having the leaderboard based on the aggregate number of likes is probably better.

#10

BearMan

Debates: 16

Posts: 1,067

01.23.2023 07:44PM

@badger

send script

i can help figure out debate comments + questions if u want

Topic's author

#11

badger

Debates: 0

Posts: 2,472

01.23.2023 08:26PM

@BearMan

import urllib.request
import re
from bs4 import BeautifulSoup

word_count = 0

def count_words(text):
words = text.split()
return len(words)

def get_post_text(html_string, thread_id, post_id):
soup = BeautifulSoup(html_string, 'html5lib')
post_link = soup.find('a', href=f'/forum/topics/{thread_id}/post-links/{post_id}', rel='nofollow')
post_text_div = post_link.find_next('div', class_='forum-topic-show__post-text', itemprop="text")
i = count_words(post_text_div.text)

return i

# o = urllib.request.urlopen("https://www.debateart.com/participants/RationalMadman/forum_posts")
# b = o.read()
# s = b.decode("utf-8")
# matches = re.findall("a href=\"/forum/topics/(\\d+)/post-links/(\\d+)", s, re.DOTALL)
# for match in matches:
# topic = match[0]
# post = match[1]
# url = "https://www.debateart.com" + "/forum/topics/" + str(topic) + "/post-links/" + str(post)
# o = urllib.request.urlopen(url)
# b = o.read()
# s = b.decode("utf-8")
# html_string = s
# i = get_post_text(html_string, match[0], match[1])
# word_count += i

curr = 859

while urllib.request.urlopen(f"https://www.debateart.com/participants/RationalMadman/forum_posts?page={curr}"):
o = urllib.request.urlopen(f"https://www.debateart.com/participants/RationalMadman/forum_posts?page={curr}")
b = o.read()
s = b.decode("utf-8")
matches = re.findall("a href=\"/forum/topics/(\\d+)/post-links/(\\d+)", s, re.DOTALL)
for match in matches:
topic = match[0]
post = match[1]
url = "https://www.debateart.com" + "/forum/topics/" + str(topic) + "/post-links/" + str(post)
o = urllib.request.urlopen(url)
b = o.read()
s = b.decode("utf-8")
html_string = s
i = get_post_text(html_string, match[0], match[1])
word_count += i
print(curr)
print(word_count)
curr += 1

print(word_count)

It just takes too long. The site is all PHP and HTML. All you can get back is the full page HTML on every request, which you then need to search: about 7k lines for every post.
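If most of the time goes on waiting for one request after another, downloading several pages at once might help. Below is a minimal sketch using a thread pool from the standard library. The helper names and the max_workers value are assumptions, and it would be polite to keep the worker count low so the site isn't hammered.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request

def fetch_html(url):
    # Plain blocking download, same idea as the urlopen/read/decode lines
    # in the script above.
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")

def fetch_all(urls, fetch=fetch_html, max_workers=8):
    # Download several pages concurrently; results come back in the same
    # order as the input urls. `fetch` can be swapped out for testing.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

Parsing may also be part of the cost: the Beautiful Soup documentation describes html5lib as its slowest parser, so trying 'html.parser' or 'lxml' in the BeautifulSoup(...) call could be worth a test, assuming the page still parses the same way.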

#12

sadolite

Debates: 0

Posts: 3,459

01.23.2023 10:38PM

If you wrote one word every second, it would take about 11 days to write 983,946 words. With that said, over a few years, meh.
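The arithmetic behind that figure, as a quick check (one word per second, nonstop, using the count quoted earlier in the thread):

```python
words = 983_946
seconds_per_day = 24 * 60 * 60  # 86,400 seconds in a day

# At one word per second, total seconds equals total words.
days = words / seconds_per_day
print(f"{days:.1f} days")  # prints "11.4 days"
```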

#13

BearMan

Debates: 16

Posts: 1,067

01.24.2023 10:53PM

@badger

github?

indentation is being screwed up

Topic's author

#14

badger

Debates: 0

Posts: 2,472

01.24.2023 11:47PM

@BearMan

Simple loops, dude. Indent everything under the while loop once. Indent everything under the for loop once more, down until word_count += i. The first commented-out bit is to get the first page of comments on the user profile. The while loop gets everything else from page=2. curr was set to 800 there because I did it in increments. Set it to 2 to run from the beginning.

Topic's author

#15

badger

Debates: 0

Posts: 2,472

01.24.2023 11:48PM

Everything under the for loop in the first commented-out part is indented once.

Topic's author

#16

badger

Debates: 0

Posts: 2,472

01.24.2023 11:52PM

import urllib.request
import re
from bs4 import BeautifulSoup

word_count = 0

def count_words(text):
    # Whitespace split only, so "and/or" counts as one word.
    words = text.split()
    return len(words)

def get_post_text(html_string, thread_id, post_id):
    # Find the link for this post, then the post text div that follows it.
    soup = BeautifulSoup(html_string, 'html5lib')
    post_link = soup.find('a', href=f'/forum/topics/{thread_id}/post-links/{post_id}', rel='nofollow')
    post_text_div = post_link.find_next('div', class_='forum-topic-show__post-text', itemprop="text")
    i = count_words(post_text_div.text)

    return i

# First page of posts on the profile (no ?page= parameter).
# o = urllib.request.urlopen("https://www.debateart.com/participants/RationalMadman/forum_posts")
# b = o.read()
# s = b.decode("utf-8")
# matches = re.findall("a href=\"/forum/topics/(\\d+)/post-links/(\\d+)", s, re.DOTALL)
# for match in matches:
#     topic = match[0]
#     post = match[1]
#     url = "https://www.debateart.com" + "/forum/topics/" + str(topic) + "/post-links/" + str(post)
#     o = urllib.request.urlopen(url)
#     b = o.read()
#     s = b.decode("utf-8")
#     html_string = s
#     i = get_post_text(html_string, match[0], match[1])
#     word_count += i

# Page to start from; set to 2 to run from the beginning.
curr = 859

while urllib.request.urlopen(f"https://www.debateart.com/participants/RationalMadman/forum_posts?page={curr}"):
    o = urllib.request.urlopen(f"https://www.debateart.com/participants/RationalMadman/forum_posts?page={curr}")
    b = o.read()
    s = b.decode("utf-8")
    # Each match is a (topic id, post id) pair from the profile page.
    matches = re.findall("a href=\"/forum/topics/(\\d+)/post-links/(\\d+)", s, re.DOTALL)
    for match in matches:
        topic = match[0]
        post = match[1]
        url = "https://www.debateart.com" + "/forum/topics/" + str(topic) + "/post-links/" + str(post)
        o = urllib.request.urlopen(url)
        b = o.read()
        s = b.decode("utf-8")
        html_string = s
        i = get_post_text(html_string, match[0], match[1])
        word_count += i
    # Progress output: current page and running total.
    print(curr)
    print(word_count)
    curr += 1

print(word_count)

Topic's author

#17

badger

Debates: 0

Posts: 2,472

01.24.2023 11:56PM

Also, the "while urllib.request.urlopen(f"https://www.debateart.com/participants/RationalMadman/forum_posts?page={curr}"):" condition is always true no matter the value of curr, because the site just gives a pop-up. I just found the curr for RM's number of pages of posts on his profile and took the word_count printed under it. You might want to fix that too or input it manually or whatever.
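One possible fix for that always-true condition is to stop when a page contains no post links at all. The sketch below assumes an out-of-range page (the pop-up case) comes back without any post links, which hasn't been checked against the site. iter_post_refs, fetch_page and max_pages are hypothetical names; the regex is the one from the script.

```python
import re

# Same link pattern the script searches for on each profile page.
POST_LINK = re.compile(r'a href="/forum/topics/(\d+)/post-links/(\d+)')

def iter_post_refs(fetch_page, start_page=1, max_pages=5000):
    """Yield (topic_id, post_id) pairs until a page has no post links.

    fetch_page is any function taking a page number and returning that
    page's HTML as a string (hypothetical; supply your own downloader).
    max_pages is a safety cap in case the stop condition never triggers.
    """
    page = start_page
    while page <= max_pages:
        html = fetch_page(page)
        matches = POST_LINK.findall(html)
        if not matches:
            # Assumed end of the listing: nothing to count on this page.
            break
        yield from matches
        page += 1
```

Passing the downloader in as fetch_page keeps the loop testable without hitting the site, for example with a dictionary of saved pages.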

Script is honestly not worth it; searching HTML is dumb.