"Automatically adds <a target="_blank" href=http:// and > and </a> around Internet addresses."
It's checked by default at the bottom of the post box. This feature is pretty much a MUST because of all the people on this forum who post links like "http://www.microsoft.com". It would convert that link to <a target="_blank" href=http://www.microsoft.com>http://www.microsoft.com</a> Doug
Yeah, because even going through W3T's own message boards, many users just paste links like http://www.microsoft.com, so Scream himself would benefit from adding it in. But... I could probably write a PHP hack if it doesn't get included.
I've tried adding this before but couldn't come up with a foolproof regular expression. If the URL was in a sentence with a period right after it, the period would mangle the URL, along with a few other things. I'd have no problem adding it to both versions if I could find a regex that worked flawlessly.
OK, I'll bite. How about (before processing markup): $content =~ s~<font color=red>(?<!\[url[=\]])</font color=red><font color=purple>\b</font color=purple>(https?://<font color=blue>[^\s<>"'{}\[\]]*</font color=blue><font color=green>[^,.!\s<>"'{}\[\]]</font color=green>)~\[url\]$1\[/url\]~g;
The goal is to replace http://wherever by [url]http://wherever[/url]. The red part keeps us from matching something where the http:// (or https://) is already preceded by [url] or [url=. The purple makes sure we have a "word boundary" before the http:. The blue matches a bunch of characters which may not contain: whitespace < > " ' { } [ ]. The green matches the last character in the URL. It is not allowed to match a comma, period, or exclamation point (in addition to the characters outlawed by the blue part). This will exclude punctuation which is intended to end a sentence from the URL (with the exception of a sentence that ends in a question mark--we really can't do anything about that since it is hard to know if it should be part of the URL).
Bill Dimm, <A target="_blank" HREF=http://MagPortal.com/>MagPortal.com</A> - <font color=red>free</font color=red> feeds for your site.<P ID="edit"><FONT SIZE=-1><EM>Edited by BillD on 04/15/01 01:59 PM (server time).</EM></FONT>
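For anyone who wants to try the expression above outside of W3T, here is a minimal stand-alone sketch with the color tags stripped out. The sub name autolink_urls and the sample strings are just assumptions for illustration; they are not part of W3T or of anyone's hack.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of W3T): wraps bare http:// and https://
# URLs in [url]...[/url], using the expression from the post above.
sub autolink_urls {
    my ($content) = @_;
    # (?<!\[url[=\]])  not already preceded by [url] or [url=
    # \b               word boundary before "http"
    # [^\s<>"'{}\[\]]* body of the URL
    # [^,.!\s<>"'{}\[\]] last character: no comma, period, or exclamation point
    $content =~ s~(?<!\[url[=\]])\b(https?://[^\s<>"'{}\[\]]*[^,.!\s<>"'{}\[\]])~\[url\]$1\[/url\]~g;
    return $content;
}

# Trailing period is left outside the link.
print autolink_urls('See http://www.microsoft.com.'), "\n";
# prints: See [url]http://www.microsoft.com[/url].

# URLs that are already inside markup are left alone.
print autolink_urls('[url]http://magportal.com[/url]'), "\n";
# prints: [url]http://magportal.com[/url]
```

Dropping the \b, as discussed later in the thread, would also let it pick up a URL that is glued to the end of a word.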
I think Gerrit just said that he has uploaded a 4/13/2001 hack for those. I have not checked whether it handles those situations :)
If you *do* add it, I trust it will be optional? I don't like it! It's too restricted. I like being able to just have the description showing instead of the raw URL...
I am fully aware of those complicated situations, e.g. a URL followed by periods, commas, or exclamation marks (e.g. "check out http://www.microcrap.com!"). I've played with my hack for several weeks and carefully observed what happens in which circumstances (which is why it took months before I dared to publish it). The only known bug is that it will not work if the URL is the very first thing in the post, but it seems to work fairly well otherwise. Just come by my boards and test it in the "Test" board (way at the bottom of the Forum Home).
Well, at least I am using that hack until it's in the official version :) Good job Gerrit.
Your hack should be changed to also recognize https:// and it doesn't work right if I type something like <http://magportal.com> (picks up the extra '>'). Also, if you enter something like junkhttp://magportal.com it will take it (and exclude "junk") - maybe this is a good thing, but maybe not (the regular expression I supplied before can be modified to handle this case in the same way by removing the \b that appears in purple).
Well, I just tested your code on my site... it does not accept www.mysite.com and messes up with a trailing question mark, e.g. "do you like <A target="_blank" HREF=http://www.mysite.com?>http://www.mysite.com?</A>" (it includes the ? in the hyperlink). However, I could not reproduce the trailing > error... how did you do that?
Eileen, not to worry... as it stands, it does not interfere with markup, just makes it easier on those that don't know how to use markup and just paste a URL into their post.
That it did not accept https:// was a bug in that 5.0.9 version, I got mine patched up now, and the hack just needs to have https? instead of http, that's all...
As I mentioned in my original post, that was intentional since the question mark could be part of the URL. Now that I think about it more, that was stupid. If the question mark was intended to be part of the URL (with nothing following it) it doesn't do any harm to cut it off, so might as well do so. It's an easy change--just add a question mark inside the square brackets in the green part.
Regarding the trailing <font color=blue>></font color=blue> - I just typed in <font color=blue><http://magportal.com></font color=blue>. I have a guess about what happened here: I think maybe W3T saw the > and converted it to & gt; and your hack picked up everything before the semicolon (note that the <font color=blue>;</font color=blue> which appears right after the link is not something I typed - it just "appeared").
I wasn't trying to handle www.mysite.com. I can't see a good way to do that without adding a second regular expression that handles it separately. Here is the second regular expression to add: $content =~ s~<font color=red>(?<![/\]])</font color=red><font color=purple>\b</font color=purple>(www\.<font color=blue>[^\s<>"'{}\[\]]*</font color=blue><font color=green>[^,.!?\s<>"'{}\[\]]</font color=green>)~\[url=http://$1\]$1\[/url\]~g; This changes www.site.com into [url=http://www.site.com]www.site.com[/url]. This is somewhat different from what Gerrit did - his hack makes the inserted <font color=blue>http://</font color=blue> visible in the link. The red part ensures that we don't have a <font color=blue>/</font color=blue> (indicating that there is probably a "http://") or a <font color=blue>]</font color=blue> (indicating that there is probably a [url=...]) before the www (note that we can't be much more precise about this because variable-length lookbehinds are not supported by Perl). The rest of the color coding is as before.
Another thing that I noticed--if someone types something like <font color=blue>[url=http://www.magportal.com/c/bus/]business section of www.magportal.com[/url]</font color=blue> the www.magportal.com part is going to get screwed up (using Gerrit's code or mine). This seems impossible to avoid without a lot of work.
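Here is a rough stand-alone sketch of the www expression described above, again without the color tags. The sub name autolink_www is made up for this example. The last call shows the nested-markup problem with a www address inside the description of an existing [url=...] tag.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of W3T): turns a bare www.site.com into
# [url=http://www.site.com]www.site.com[/url].
sub autolink_www {
    my ($content) = @_;
    # (?<![/\]]) skips a www that follows "/" (probably http://) or "]".
    # The last character may not be , . ! ? so sentence punctuation stays out.
    $content =~ s~(?<![/\]])\b(www\.[^\s<>"'{}\[\]]*[^,.!?\s<>"'{}\[\]])~\[url=http://$1\]$1\[/url\]~g;
    return $content;
}

print autolink_www('Do you like www.mysite.com?'), "\n";
# prints: Do you like [url=http://www.mysite.com]www.mysite.com[/url]?

# A www that is part of an http:// URL is left for the other expression.
print autolink_www('http://www.mysite.com'), "\n";
# prints: http://www.mysite.com

# The known problem: a www address inside a link description gets wrapped again.
print autolink_www('[url=http://www.magportal.com/c/bus/]business section of www.magportal.com[/url]'), "\n";
```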
I see... well, I'll give your solution another try tomorrow morning (I don't want to keep uploading w3t.pm when there's traffic on my site, so I prefer to do that early a.m.) because your solution sounds promising. I'll write up a test batch of URLs and possible troublesome situations.
About the >, as you suggested on my boards that is only when a non-admin posts... interesting... thanks for pointing that out!
A few further comments. 1) If you want to make it case-insensitive, change <font color=blue>~g;</font color=blue> to <font color=blue>~gi;</font color=blue> at the end of the statement. 2) I have not prohibited parentheses (i.e. <font color=blue>( )</font color=blue>) from appearing inside the URL since they seem to be allowed according to <A target="_blank" HREF=http://www.segue.com/html/s_tech/s_silkperformer_faq_web.htm#06>here</A> (which could be completely out of context--wound up here from a google search). Someone who is more of a URL wizard might be able to make some good adjustments to the blue and green parts of my expressions (which specify the characters that may not appear in the URL or as the last character in the URL respectively).
The hack cannot handle wwwthreads.com but is supposed to work with www.wwwthreads.com, however it turns it into <A target="_blank" HREF=http://www.wwwthreads.com>http://www.wwwthreads.com</A> (but which is fine with me...).
I've just updated my hack.
Gerrit <A target="_blank" HREF=http://www.channeling.net/forum/>SpiritBoard</A> <A target="_blank" HREF=http://www.channeling.net>http://www.channeling.net</A><P ID="edit"><FONT SIZE=-1><EM>Edited by Gerrit on 04/16/01 10:17 PM (server time).</EM></FONT>
Thanks, Gerrit. I *had* guessed right (where to put them) after all but I'd forgotten it can't be the first thing in the post so it didn't seem to be working.
Yes, right. Sorry about that (minor?) restriction, haven't found a way around, unless BillD's code can do that and also holds up to the tests... Still didn't get around to putting that in, had to work this morning, hopefully tomorrow a.m. I can check that out.
What version of Perl are you using (do 'perl -v')? The code you are using looks fine to me and I put it into a little Perl script and it seemed to work OK. The "internal error" message typically means that Perl bailed because it encountered a syntax error (unfortunately the log file doesn't tell what the error message was). Could you try putting the code into a stand-alone Perl program and run it to see what it says? For example:
<pre>
#!/usr/bin/perl
$Body = "www.mysite.com http://mysite.com";
# insert code here
print "$Body\n";
</pre>
I'm doing it the easy way and using Gerrit's hack instead. :) Maybe he'll discover what's going wrong with your code...
Hi BillD, I've been testing your code... when I use a normal user (HTML off) then I keep having the same problem with your code, that you have with mine. It includes all trailing brackets into the URL, even ")" which mine excludes. Any suggestions?
<P ID="edit"><FONT SIZE=-1><EM>Edited by Gerrit on 04/18/01 00:00 AM (server time).</EM></FONT>
Hi Gerrit, To exclude parentheses from the tail of the URL (which will make <font color=red>(http://www.mysite.com/whatever) stuff</font color=red> into <font color=red>(<a target="_blank" href=http://www.mysite.com/whatever>http://www.mysite.com/whatever</a>) stuff</font color=red>), add them to the list of characters in the green part of my code. To exclude them from the URL completely (which will make <font color=red>(http://www.mysite.com/whatever)stuff</font color=red> into <font color=red>(<a target="_blank" href=http://www.mysite.com/whatever>http://www.mysite.com/whatever</a>)stuff</font color=red> - note that there is no space or other illegal-in-a-URL character before the "stuff" this time), also add them to the blue part. I didn't do that because I wasn't sure whether it would cause problems, since URLs are apparently allowed to contain parentheses (although I can't think of anywhere that I've seen that used). My guess for why the > is showing up in the URL is that you probably inserted my code at a point where the > has already been converted to & gt;. My code really needs to run before W3T has done other processing on the input. If you have trouble finding a working location to insert it into, let me know and I'll look into it (I'll have to download the latest version of W3T and refamiliarize myself with the code though--it's been a while).<P ID="edit"><FONT><EM>Edited by BillD on 04/18/01 06:03 AM.</EM></FONT>
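To make the difference between the two options concrete, here is a small sketch of both variants side by side. The sub names link_tail_only and link_no_parens are invented for this example, and the output is shown as [url] markup rather than the final HTML link. Both variants also include the question mark in the green part, as suggested earlier in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Variant A (hypothetical name): ( and ) are added to the green part only,
# so they may appear inside a URL but not as its last character.
sub link_tail_only {
    my ($content) = @_;
    $content =~ s~(?<!\[url[=\]])\b(https?://[^\s<>"'{}\[\]]*[^,.!?()\s<>"'{}\[\]])~\[url\]$1\[/url\]~g;
    return $content;
}

# Variant B (hypothetical name): ( and ) are added to the blue part as well,
# so a parenthesis always ends the URL.
sub link_no_parens {
    my ($content) = @_;
    $content =~ s~(?<!\[url[=\]])\b(https?://[^\s<>"'{}\[\]()]*[^,.!?()\s<>"'{}\[\]])~\[url\]$1\[/url\]~g;
    return $content;
}

# With a space after the ")", variant A is enough.
print link_tail_only('(http://www.mysite.com/whatever) stuff'), "\n";
# prints: ([url]http://www.mysite.com/whatever[/url]) stuff

# Without the space, variant A swallows ")stuff" but variant B does not.
print link_tail_only('(http://www.mysite.com/whatever)stuff'), "\n";
print link_no_parens('(http://www.mysite.com/whatever)stuff'), "\n";
```

Variant B would break any URL that really does contain parentheses, which is the trade-off mentioned above.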
Hey, now I truly see the potential of your code. Since I placed this into w3t.pm, sub do_markup, it is past the point where the < and > substitutions are done, so I had to fix it differently, but your code has the advantage that it does allow placing a URL at the very beginning of a post. I had to disallow not only a trailing ) but also a trailing ; and now it seems to work like a charm :) ! Well done - how about a (c)leaner code for e-mail addresses? Any suggestions?
Things start to get a bit more complicated, but you can force it to have at least one period by making two copies of the blue part and putting a <font color=blue>\.</font color=blue> between them. It's probably also a good idea to change the <font color=blue>*</font color=blue> at the end of the blue parts into a <font color=blue>+</font color=blue> to eliminate matching something like <font color=blue>http://.com</font color=blue>.
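One way to read the suggestion above is sketched here: two copies of the blue character class with a literal period between them, each using + instead of *. This is only an interpretation of the description, and the sub name autolink_dotted is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: like the basic http expression, but the part after
# http:// must contain at least one period with something on both sides.
sub autolink_dotted {
    my ($content) = @_;
    $content =~ s~(?<!\[url[=\]])\b(https?://[^\s<>"'{}\[\]]+\.[^\s<>"'{}\[\]]+[^,.!?\s<>"'{}\[\]])~\[url\]$1\[/url\]~g;
    return $content;
}

print autolink_dotted('see http://magportal.com/ now'), "\n";
# prints: see [url]http://magportal.com/[/url] now

# Nothing before the period, so this is no longer turned into a link.
print autolink_dotted('http://.com'), "\n";
# prints: http://.com
```

Note that this also stops matching host names without any period, such as an intranet machine name, which may or may not be what you want.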
OK, here's my attempt for email addresses. It's getting late and this is complicated so I make no promises... $Body =~ s/<font color=red>(^|[^-.!#\$\%&*+~\w\]])</font color=red>(<font color=blue>[-.!#\$\%&*+~\w]</font color=blue>+\@<font color=green>[-.\w]+\.\w+</font color=green>)/$1\[email\]$2\[\/email\]/g;
<font color=blue>blue</font color=blue> - valid characters for a user name. The characters we accept come from the last paragraph in <A target="_blank" HREF=http://email.about.com/compute/email/library/weekly/aa062298.htm>this article</A>. The article also explains that you may use quotes or backslashes to permit all sorts of weird stuff in the user name--I make no attempt to allow such things here.
<font color=red>red</font color=red> - we make sure that whatever comes before the email address is either the beginning of the body (i.e. <font color=red>^</font color=red>) or it is any character that we do not allow in a user name with one exception--we also do not allow <font color=red>]</font color=red> because it might indicate that we have a [email] already in front (the ] could be there for some other reason like boldfacing but it's probably OK not to worry about that).
<font color=green>green</font color=green> - this is the host name. We require the "top level domain" part (i.e. the part after the last period) to be all alpha-numeric (e.g. "com" or "net") so the email address will terminate when we hit punctuation or whitespace.
Note that it is legal for a URL to have an @ in it (see <A target="_blank" HREF=http://www.pc-help.org/obscure.htm>obscure URLs</A>), so using the above could cause weird URLs to get messed up.
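Here is the email expression from above as a stand-alone sketch, with the color tags removed. The sub name autolink_email and the addresses are only placeholders for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of W3T): wraps bare email addresses in
# [email]...[/email].
sub autolink_email {
    my ($body) = @_;
    # Group 1: start of the body, or one character that cannot be part of a
    #          user name (and is not "]", which may end an existing [email]).
    # Group 2: user name, "@", host name, ".", alphanumeric top level domain.
    $body =~ s/(^|[^-.!#\$\%&*+~\w\]])([-.!#\$\%&*+~\w]+\@[-.\w]+\.\w+)/$1\[email\]$2\[\/email\]/g;
    return $body;
}

# The sentence-ending period stays outside the address.
print autolink_email('Mail joe@example.com.'), "\n";
# prints: Mail [email]joe@example.com[/email].

# Works at the very beginning of the body too.
print autolink_email('joe@example.com'), "\n";
# prints: [email]joe@example.com[/email]
```

Single quotes are used around the sample strings so that Perl does not try to interpolate the @ as an array.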
This part: \@[-.\w]+\.\w+ (the @ and the hostname part) allows a hostname to begin with a '-', which isn't allowed in the RFCs (822 et al).
Change it to something like: \@\w[-.\w]*\.\w+ and this problem should be fixed. Note that I also changed the '+' to a '*' just before the '.' because otherwise a single letter second level domain wouldn't match and an address like joe@a.com would not be matched, and it should because it is a perfectly legal address.
Yours,
<font color=red>Per Gøtterup</font color=red> System Administrator, NetGroup A/S
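The hostname change described above can be checked with a few lines of Perl. This is just a sketch comparing the three host patterns in isolation; the sub names and test addresses are invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Original host part: allows the host name to begin with "-".
sub host_original { return $_[0] =~ /\@[-.\w]+\.\w+/ ? 1 : 0; }

# First character forced to be a word character, but "+" kept:
# this needs at least two characters before the last period.
sub host_plus     { return $_[0] =~ /\@\w[-.\w]+\.\w+/ ? 1 : 0; }

# Suggested fix: leading word character, then "*" instead of "+".
sub host_fixed    { return $_[0] =~ /\@\w[-.\w]*\.\w+/ ? 1 : 0; }

for my $addr ('joe@-bad.com', 'joe@a.com', 'joe@example.com') {
    printf "%-16s original=%d plus=%d fixed=%d\n",
        $addr, host_original($addr), host_plus($addr), host_fixed($addr);
}
```

The address joe@-bad.com is accepted only by the original pattern, and joe@a.com is rejected only by the intermediate "+" version, which is why the "*" is needed.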
Thanks for all the tips... I think I won't go into checking for a period, because if they mistype the URL, there is little that can be done about that. I don't want to have to add a URL checker that only allows valid URLs... that's going too far... if they post a URL, I think it is their responsibility to put it in right. Checking for http://* and www.* and turning those into hyperlinks is enough convenience, IMHO... I haven't tested your e-mail code yet, because I ran into problems with [url=http://www.test.com]Test Site[/url] markup usage: it would turn out quite garbled. I've got that fixed now... I just had to add that it is not valid to have = before the http:// or www (because then <font color=red>[url=</font color=red> is assumed), which is a restriction that I could live with.
I have now updated the <A target="_blank" HREF=http://www.channeling.net/forum/showflat.pl?Cat=&Board=mods&Number=6444&page=0&view=collapsed&sb=5>hack</A>, including the improved support for e-mail addresses.