Registered: 04/09/11
Posts: 140
#235536 - 03/11/10 07:24 PM
spidering difficult
member
Registered: 11/01/06
Posts: 181
Loc: oregon and belize
Have been trying to use a search engine (Zoom by Wrensoft) that indexes my website to index the message board too. You put "skips" in so it won't index BS links like the login screens, cookies, profiles, etc. Sure is tough with this board. From a knowledgeable friend:

I asked: Good morning! I use a search engine called Zoom (wrensoft.com) to index my website. When I turn it loose in the message board, it runs forever.
I can put "skips" in there to skip files whose URL doesn't add much, like the calendar and the PM zones and stuff, but it seems like the threads are referenced more than one way. Some like this, with numbers:
http://ambergriscaye.com/forum/ubbthreads.php/topics/357702/1.html
some like this, with the words of the topic name:
http://ambergriscaye.com/forum/ubbthreads.php/topics/97134/Cave_Tubing_Attire.html
and then again like this, with the name of the person who made each post:
http://ambergriscaye.com/forum/ubbthreads.php/topics/227639/JeanH.html
I need to figure out how to index each file just once.
Here's my skip list:
?ubb=private_message ?ubb=edit_post ?ubb=send_topic ?ubb=report_a_post ?ubb=reply ?ubb=get_ip /ubb/get_profile/ ?ubb=get_daily ?ubb=next_topic ?ubb=delete_topic ?ubb=print_topic ?ubb=close_topic ?ubb=stick_topic ?ubb=send_topic ?/ubb/my_profile.html ?/ubb/directory.html ?/ubb/search.html ?/ubb/logoff.html ubb=poll ubb=transfer /forums/private.php /forums/usercp.php /forums/memberlist.php /forums/calendar.php /forums/login.php /forums/modcp/ /forums/member.php &topic=0&Search=true __fav __subscribe __postcomment login&lostpw &gonew= ubb=online printthread ubb=sendprivate ubb=showday ubb=calendar ubb=showprofile ubb=newreply ubb=addfavuser /forums/subscription.php /forums/poll.php /forums/sendmessage.php /forums/printthread.php /forums/forumdisplay.php?do=markread /forums/showthread.php?goto=newpost /forums/newthread.php /forums/infraction.php /forums/archive/ /forums/editpost.php /forums/newreply.php /forums/online.php /forums/profile.php /forums/report.php /forums/cron.php /forums/admincp/ =com_events&task &uid=default&view_records= addsubscription&threadid sendtofriend.php? lastpost&threadid asc&sortfield addlist&userlist addsubscription&forumid &daysprune ubb=markallread ubb=mycookies /ubb/newuser /ubb/cfrm /ubb/calendar showprofile ubbthreads.php/posts/ /userposts/ addfavuser sendprivate ubbthreads.php/users/ activetopics /Re_ /all/ grabnext ubb/markallread.html ubbthreads.php/ubb/boardrules
Any ideas how to narrow it down so it just indexes each thread once? They all start with http://ambergriscaye.com/forum/ubbthreads.php/topics/ so I can't figure out a good skip.

His response: Hmmm. Three different views of the same data, all presented with the same URL structure using Apache URL rewriting:
http://ambergriscaye.com/forum/ubbthreads.php/topics/97134/Cave_Tubing_Attire.html
http://ambergriscaye.com/forum/ubbthreads.php/topics/97134/jsmoore.html
http://ambergriscaye.com/forum/ubbthreads.php/topics/97134/2.html
http://ambergriscaye.com/forum/ubbthreads.php/topics/97134/
http://ambergriscaye.com/forum/ubbthreads.php/topics/97134/index.html
http://ambergriscaye.com/forum/ubbthreads.php/topics/97134/xqxqxqxqxq.html
None of these are real static URLs. They must be getting rewritten by the web server into hidden dynamic URLs, which are then processed by UBB. Note the last one can't really exist, but UBB doesn't seem to mind and returns a page for the garbage URL. There might be a UBB option to turn off the rewriting, but turning this off might break a lot of incoming links. This is going to be hard to filter. Looks like someone tried to make the links look search engine friendly, but didn't think very hard about it.

Comments? I was told by a smart person on this forum that it's all dynamic addressing, and thus really tough to filter out duplicate URLs. Is there any way to tighten this up without turning off SEO-friendly URLs? Will that even do it? Thanks!
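The reason no skip entry helps is that every variant shares the same prefix and differs only in a trailing part that is specific to one thread. The Python sketch below assumes Zoom treats each skip entry as a plain substring match against the full URL (an assumption about Zoom, not something stated in this thread) and checks the friend's six variants against a few entries taken from the skip list above:

```python
# A handful of entries copied from the skip list in the post.
SKIP = (
    "?ubb=private_message",
    "?ubb=reply",
    "/ubb/get_profile/",
    "ubbthreads.php/posts/",
    "ubbthreads.php/users/",
    "/userposts/",
    "/Re_",
    "/all/",
)

BASE = "http://ambergriscaye.com/forum/ubbthreads.php/topics/97134/"

# The six URLs the friend found for the same thread.
VARIANTS = [
    BASE + "Cave_Tubing_Attire.html",
    BASE + "jsmoore.html",
    BASE + "2.html",
    BASE,
    BASE + "index.html",
    BASE + "xqxqxqxqxq.html",
]


def is_skipped(url: str, skip: tuple[str, ...] = SKIP) -> bool:
    """Assumed skip semantics: drop the URL if any entry occurs anywhere in it."""
    return any(entry in url for entry in skip)


if __name__ == "__main__":
    kept = [u for u in VARIANTS if not is_skipped(u)]
    # Every variant survives, so the same thread would be indexed six times.
    print(f"{len(kept)} of {len(VARIANTS)} variants kept")  # prints "6 of 6 variants kept"
```

Any substring that would catch, say, `jsmoore.html` only matches one poster's name, so under this assumption a skip list can't express "keep one URL per topic".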
#235552 - 03/12/10 10:18 AM
Re: spidering difficult
[Re: mcasado]
Post-a-holic
Registered: 06/04/06
Posts: 10164
Loc: Aberdeen, WA
Turning off the search engine friendly URLs is probably going to be your best option for now since, as Gizmo stated, the forum then won't add the subject line or poster name after the topic/post ID. But as mentioned, if you turn them off after running with them on for quite a while, any links that search engines have already indexed will stop working.
We don't do any web server URL rewriting. There are always going to be multiple URLs for each individual topic, since you'll have the topic URL and then the permalinks to individual posts in the topic. The only part of the URL that is actually used is the topic or post ID; everything after that is ignored, which is why anything will work, such as topics/97134/xqxqxqxqxq.html as was noted. But that URL would never be generated by the forum.
We will be working on cleaning this up to try and cut down on the URLs that are displayed when a search engine visits the forum.
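Since only the topic ID matters, one way to get each thread indexed once is to collapse crawled URLs by that ID before indexing them. This is a minimal Python sketch, not something UBB.threads or Zoom provides; the canonical form `.../topics/<id>/` and the helper names `canonical` and `dedupe` are hypothetical. It relies on the statement above that everything after the ID is ignored; if the trailing part ever selects a page of a long thread, this would merge pages that should stay separate. Post permalinks (`ubbthreads.php/posts/...`) are left unchanged, and the skip list in the first post already drops them.

```python
import re

# Assumption from the reply above: only the numeric id after /topics/ selects
# the page; anything after it (subject, poster name, garbage) is ignored.
TOPIC_RE = re.compile(r"^(.*?/ubbthreads\.php/topics/)(\d+)(?:[/?#]|$)")


def canonical(url: str) -> str:
    """Collapse a topic URL to the hypothetical canonical form '.../topics/<id>/'."""
    m = TOPIC_RE.match(url)
    if m is None:
        return url  # not a topic URL; leave it alone
    return f"{m.group(1)}{m.group(2)}/"


def dedupe(urls: list[str]) -> list[str]:
    """Keep the first occurrence of each canonical URL, preserving crawl order."""
    seen: set[str] = set()
    result: list[str] = []
    for url in urls:
        key = canonical(url)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


if __name__ == "__main__":
    base = "http://ambergriscaye.com/forum/ubbthreads.php/topics/"
    crawled = [
        base + "97134/Cave_Tubing_Attire.html",
        base + "97134/jsmoore.html",
        base + "97134/xqxqxqxqxq.html",
        base + "357702/1.html",
        base + "227639/JeanH.html",
    ]
    # Three topics, so three URLs are printed.
    for url in dedupe(crawled):
        print(url)
```

Keying on the ID rather than trying to enumerate bad suffixes means garbage URLs like the `xqxqxqxqxq.html` one collapse automatically, without a skip entry for each.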