I have been trying to get the search engine that indexes my website (Zoom by Wrensoft) to index the message board as well.

You put skips in so it doesn't index junk links like the login screens, cookies, profiles, etc.

sure is tough with this board.


From a knowledgeable friend:

I asked:
Good morning! I use a search engine called Zoom (wrensoft.com) to index my website. When I turn it loose on the message board it runs forever.

I can put "skips" in there to skip URLs that don't add much, like the calendar and the PM zones, but it seems like the threads are referenced more than one way. Some look like this:

http://ambergriscaye.com/forum/ubbthreads.php/topics/357702/1.html
with numbers and some like this:
http://ambergriscaye.com/forum/ubbthreads.php/topics/97134/Cave_Tubing_Attire.html

with the words of the topic name.

then again like this:
http://ambergriscaye.com/forum/ubbthreads.php/topics/227639/JeanH.html

with the name of the person who made the post.

I need to figure out how to just index each file once.

here's my skip list:
?ubb=private_message
?ubb=edit_post
?ubb=send_topic
?ubb=report_a_post
?ubb=reply
?ubb=get_ip
/ubb/get_profile/
?ubb=get_daily
?ubb=next_topic
?ubb=delete_topic
?ubb=print_topic
?ubb=close_topic
?ubb=stick_topic
?ubb=send_topic
?/ubb/my_profile.html
?/ubb/directory.html
?/ubb/search.html
?/ubb/logoff.html
ubb=poll
ubb=transfer
/forums/private.php
/forums/usercp.php
/forums/memberlist.php
/forums/calendar.php
/forums/login.php
/forums/modcp/
/forums/member.php
&topic=0&Search=true
__fav
__subscribe
__postcomment
login&lostpw
&gonew=
ubb=online
printthread
ubb=sendprivate
ubb=showday
ubb=calendar
ubb=showprofile
ubb=newreply
ubb=addfavuser
/forums/subscription.php
/forums/poll.php
/forums/sendmessage.php
/forums/printthread.php
/forums/forumdisplay.php?do=markread
/forums/showthread.php?goto=newpost
/forums/newthread.php
/forums/infraction.php
/forums/archive/
/forums/editpost.php
/forums/newreply.php
/forums/online.php
/forums/profile.php
/forums/report.php
/forums/cron.php
/forums/admincp/
=com_events&task
&uid=default&view_records=
addsubscription&threadid
sendtofriend.php?
lastpost&threadid
asc&sortfield
addlist&userlist
addsubscription&forumid
&daysprune
ubb=markallread
ubb=mycookies
/ubb/newuser
/ubb/cfrm
/ubb/calendar
showprofile
ubbthreads.php/posts/
/userposts/
addfavuser
sendprivate
ubbthreads.php/users/
activetopics
/Re_
/all/
grabnext
ubb/markallread.html
ubbthreads.php/ubb/boardrules
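
A rough sketch of why this list can't separate the thread URLs, assuming each skip entry is matched as a plain substring anywhere in the URL (that is my understanding of how the skips behave, not something checked against the Zoom docs). The function name is_skipped is made up for illustration, and only a few entries from the list are used:

```python
def is_skipped(url, skips):
    """Return True if any skip entry occurs anywhere in the URL (assumed substring match)."""
    return any(entry in url for entry in skips)

# A handful of entries copied from the skip list above.
skips = ["?ubb=reply", "ubbthreads.php/users/", "ubbthreads.php/posts/", "/Re_", "/all/"]

# The three forms of thread URL from the question.
urls = [
    "http://ambergriscaye.com/forum/ubbthreads.php/topics/357702/1.html",
    "http://ambergriscaye.com/forum/ubbthreads.php/topics/97134/Cave_Tubing_Attire.html",
    "http://ambergriscaye.com/forum/ubbthreads.php/topics/227639/JeanH.html",
]

for url in urls:
    print(is_skipped(url, skips), url)
```

None of the three is skipped, and the only part that differs between them is the last path segment, which can be a number, a title, or a user name. A substring skip has nothing fixed to latch onto there.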

Any ideas how to narrow it down so it indexes each thread just once?

They all start with
http://ambergriscaye.com/forum/ubbthreads.php/topics/

so I can't figure out a good skip.



His response:

Hmmm. 3 different views of the same data, all presented with the same URL structure using Apache URL re-writing.
http://ambergriscaye.com/forum/ubbthreads.php/topics/97134/Cave_Tubing_Attire.html
http://ambergriscaye.com/forum/ubbthreads.php/topics/97134/jsmoore.html
http://ambergriscaye.com/forum/ubbthreads.php/topics/97134/2.html
http://ambergriscaye.com/forum/ubbthreads.php/topics/97134/
http://ambergriscaye.com/forum/ubbthreads.php/topics/97134/index.html
http://ambergriscaye.com/forum/ubbthreads.php/topics/97134/xqxqxqxqxq.html

None of these are real static URLs. They must be getting rewritten by the web server into hidden dynamic URLs, which are then processed by UBB. Note the last one can’t really exist, but UBB doesn’t seem to mind and returns a page for the garbage URL.

There might be a UBB option to turn off the re-writing, but turning this off might break a lot of incoming links.

This is going to be hard to filter. Looks like someone tried to make the links look search engine friendly, but didn’t think very hard about it.
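
One idea that follows from the response above: since every variant carries the same number after /topics/, the URLs could be grouped by that number and only one kept per thread. This is a minimal sketch, not a Zoom feature. It assumes the number after /topics/ identifies the thread, that everything after it can be ignored, and that the indexer can be fed a prepared list of URLs. The names topic_id and dedupe_by_topic are made up for illustration.

```python
import re

# Assumption: the digits right after /topics/ identify the thread.
TOPIC_RE = re.compile(r"/ubbthreads\.php/topics/(\d+)(?:/|$)")

def topic_id(url):
    """Return the thread number as a string, or None if this is not a thread URL."""
    match = TOPIC_RE.search(url)
    return match.group(1) if match else None

def dedupe_by_topic(urls):
    """Keep the first URL seen for each thread number; pass other URLs through."""
    seen = set()
    kept = []
    for url in urls:
        tid = topic_id(url)
        if tid is None:
            kept.append(url)      # not a thread URL, leave it alone
        elif tid not in seen:
            seen.add(tid)
            kept.append(url)      # first variant of this thread wins
    return kept

base = "http://ambergriscaye.com/forum/ubbthreads.php/topics/97134/"
variants = [base + tail for tail in
            ["Cave_Tubing_Attire.html", "jsmoore.html", "2.html", "", "index.html", "xqxqxqxqxq.html"]]

print(dedupe_by_topic(variants))
```

With the six URLs from the response, only the Cave_Tubing_Attire.html one survives. One caveat: if a trailing number like 2.html is actually a page number within a long thread, this would drop the later pages, so that would need checking first.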


Comments? I was told by a smart person on this forum that it's all dynamic addressing, and thus really tough to filter out duplicate URLs.


Is there any way to tighten this up without turning off SEO-friendly URLs? Will that even do it?

gracias