Here's the exclude list I am using; none of this stuff needs to be indexed. (There is a rough sketch after the list of how I assume the matching works.)

&daysprune
ubb=private_message
ubb=edit_post
ubb=send_topic
ubb=report_a_post
ubb=reply
ubb=get_ip
ubb/get_profile/
ubb=get_daily
ubb=next_topic
ubb=delete_topic
ubb=print_topic
ubb=close_topic
ubb=stick_topic
ubb/my_profile.html
ubb/directory.html
ubb/search.html
ubb/logoff.html
ubb=poll
ubb=transfer
ubb=recent_user_posts
ubb=pntf
ubb=get_profile
ubb=email
ubb=newtopic
ubb=search
page/1/fpart/1
ubb=newpost
ubb=markallread
ubb=mycookies
/ubb/newuser
/ubb/cfrm
/ubb/calendar
/ubb/search
/ubb/faq
/fpart/all/
/ubb/showprofile
/ubb/dosearch
/showflat/sticky/
/ubb/printthread
/mode/showthreaded/
ubb=sendprivate
ubb=showday
ubb=calendar
ubb=showprofile
ubb=newreply
ubb=addfavuser
/ubb/showthreaded
All_Forums&Name=
&topic=0&Search=true
__fav
__subscribe
__postcomment
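
For what it's worth, here is how I understand the exclude list being applied: each entry is just a substring test against the candidate URL. This is a rough sketch in Python, not the spider's actual code, and only a few entries from the list above are repeated in it; the threaded-view URL in the example is hypothetical.

# Sketch only: how I assume the spider applies the exclude list
# (plain substring match against each candidate URL).

EXCLUDES = [
    "&daysprune",
    "ubb=private_message",
    "/ubb/showthreaded",
    "/mode/showthreaded/",
    # ... rest of the list above ...
]

def is_excluded(url: str) -> bool:
    """True if the URL contains any exclude entry as a substring."""
    return any(pattern in url for pattern in EXCLUDES)

# Only showflat-style URLs get past the filter:
print(is_excluded("http://www.ambergriscaye.com/forum/ubbthreads.php"
                  "/ubb/showflat/Number/214831/page/5"))       # False -> indexed
print(is_excluded("http://www.ambergriscaye.com/forum/ubbthreads.php"
                  "/ubb/showthreaded/Number/214831"))          # True  -> skipped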


The result is that only URLs of this format, the showflat ones, get indexed. But I get something like 25 lines in a row in the spider log, each named differently, yet each one goes to the same page. I have looked for a way to put in an exclude that would further reduce the duplication, but any further exclude makes the crawl stop at about 150 files, so I really think I have the exclude list as tight as I can get it. Two typical lines from the spider log:

14:09:02 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/214831/page/5
14:09:03 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/214894/page/0/fpart/1
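
The duplication seems to come from the same thread page being reachable under several spellings: with and without /fpart/..., /page/0 versus no page segment at all, and so on. What I would like is some way to collapse those to one key per thread and page. A rough sketch in Python of what I mean, just to illustrate; the regex and the variant URLs are my own guesses at the pattern, not anything from the spider:

import re

# Assumption: a showflat page is identified by its thread Number and page number;
# /fpart/... and similar trailing bits are just alternate spellings of the same page.
CANON = re.compile(r"/ubb/showflat/Number/(\d+)(?:/page/(\d+))?")

def canonical_key(url):
    m = CANON.search(url)
    if not m:
        return None
    return (m.group(1), m.group(2) or "0")

# Hypothetical variants that the spider currently treats as different URLs:
variants = [
    "http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/214894/page/0",
    "http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/214894/page/0/fpart/1",
]
print({canonical_key(u) for u in variants})   # one key, so one page worth indexing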


The exclude "/ubb/showthreaded" takes out ALL the threaded views, which really cuts down the size of the index; there is no reason to index the same content in both flat AND threaded views.

But I still have the thing bouncing around in there like a ping-pong ball; it can't get out or find an end. The previous UBB version indexed in about 25,000 files. This one runs up past 500,000 and is still climbing when I stop it after about 24 hours, once I get tired of paying for the bandwidth. It just pulls data non-stop, 8 threads at a time, forever. Man, that bleeds the bandwidth hard.

I really would like to let it run and find the end, even if that takes three days, but the results would be so stuffed with dupes, many for each page, that even if it ever did hit an end I couldn't use them.
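
If it ever does reach an end, the only way I can see to salvage a run like that is to dedupe afterwards by page content rather than by URL. A rough sketch, assuming the fetched pages are sitting in a local directory; the directory name and function are made up:

import hashlib
from pathlib import Path

def dedupe_by_content(pages_dir):
    """Keep one file per unique page body; differently named URLs that fetched
    the same HTML hash to the same digest and get flagged as duplicates."""
    seen = {}
    dupes = []
    for path in sorted(Path(pages_dir).glob("*.html")):
        digest = hashlib.sha1(path.read_bytes()).hexdigest()
        if digest in seen:
            dupes.append(path)
        else:
            seen[digest] = path
    return list(seen.values()), dupes

# Hypothetical usage against a local dump of the crawl:
# unique, dupes = dedupe_by_content("crawl_output")
# print(len(unique), "unique pages,", len(dupes), "duplicates")

In practice the forum probably stamps each page with slightly different boilerplate (timestamps, who's-online counts), so an exact hash may miss some dupes, but the idea is the same.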