Question

I have an input string, say Please go to http://stackoverflow.com. Many browsers/IDEs/applications detect the URL part of the String and automatically add an anchor <a href=""></a> around it, so it becomes Please go to <a href='http://stackoverflow.com'>http://stackoverflow.com</a>.

I need to do the same thing in Java.

Answer 1

Use java.net.URL!!

Hey, why not use the core Java class for this, java.net.URL, and let it validate the URL?

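Although the following code violates the golden rule "use exceptions only for exceptional conditions", there is no point in trying to reinvent the wheel for something that is already mature on the Java platform.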

Here is the code:

import java.net.MalformedURLException;
import java.net.URL;

// Replaces URLs in the input with HTML anchor (<a href>) tags
public class URLInString {
    public static void main(String[] args) {
        String s = args[0];
        // Split the input on whitespace (URLs don't contain spaces)
        String[] parts = s.split("\\s+");

        // Attempt to convert each token into a URL.
        for (String item : parts) {
            try {
                // Note: the URL(String) constructor is deprecated since JDK 20,
                // but it still works for this purpose.
                URL url = new URL(item);
                // If parsing succeeds, wrap the token in an anchor...
                System.out.print("<a href=\"" + url + "\">" + url + "</a> ");
            } catch (MalformedURLException e) {
                // ...otherwise it was not a URL: print it unchanged.
                System.out.print(item + " ");
            }
        }

        System.out.println();
    }
}

With the following input:

"Please go to http://stackoverflow.com and then mailto:oscarreyes@wordpress.com to download a file from    ftp://user:pass@someserver/someFile.txt"

it produces the following output:

Please go to <a href="http://stackoverflow.com">http://stackoverflow.com</a> and then <a href="mailto:oscarreyes@wordpress.com">mailto:oscarreyes@wordpress.com</a> to download a file from <a href="ftp://user:pass@someserver/someFile.txt">ftp://user:pass@someserver/someFile.txt</a>

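Of course, different protocols can be handled in different ways. You can get all the information through the getters of the URL class, for example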

 url.getProtocol();

or the other components: port, file, query, ref, and so on (getPort(), getFile(), getQuery(), getRef(), ...).

http://java.sun.com/javase/6/docs/api/java/net/URL.html

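It handles all protocols (at least all those the Java platform knows about), and as an extra bonus, if there is any URL that Java doesn't recognize today and support for it is eventually added to the URL class (through a library update), I get it transparently!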
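The getters mentioned above make per-protocol handling a matter of switching on getProtocol(). The following is a minimal sketch. The class name and the policy are hypothetical: http/https links open in a new tab and show only the host, mailto gets a plain anchor, and any other protocol is left as text. Like the answer, it relies on the URL(String) constructor, which is deprecated since JDK 20 but still works.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class ProtocolAwareLinker {
    // Hypothetical policy: decide how to render a token based on its protocol.
    static String toHtml(String token) {
        try {
            URL url = new URL(token);
            switch (url.getProtocol()) {
                case "http":
                case "https":
                    // Open web links in a new tab and show only the host name.
                    return "<a href=\"" + url + "\" target=\"_blank\">" + url.getHost() + "</a>";
                case "mailto":
                    return "<a href=\"" + url + "\">" + url + "</a>";
                default:
                    // A known protocol (e.g. ftp), but this policy leaves it as text.
                    return token;
            }
        } catch (MalformedURLException e) {
            // Not a URL at all.
            return token;
        }
    }

    public static void main(String[] args) {
        System.out.println(toHtml("http://stackoverflow.com"));
        System.out.println(toHtml("mailto:oscarreyes@wordpress.com"));
        System.out.println(toHtml("ftp://user:pass@someserver/someFile.txt"));
    }
}
```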

Answer 2

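Although it isn't Java-specific, Jeff Atwood recently published an article about the pitfalls you may run into when trying to find and match URLs in arbitrary text: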

The Problem With URLs

It gives a good regex that can be used together with a code snippet you need in order to handle parens (more or less) correctly.

The regex:

\(?\bhttp://[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]

Paren cleanup (C#):

if (s.StartsWith("(") && s.EndsWith(")"))
{
    return s.Substring(1, s.Length - 2);
}
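The following is a Java port of the same regex and paren cleanup, as a minimal sketch. The class and method names are hypothetical. The C# Substring(start, length) call becomes Java's substring(begin, end), so the second argument changes from s.Length - 2 to s.length() - 1. Like the original regex, this only matches http://, not https://.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AtwoodUrlFinder {
    // Atwood's regex, with backslashes escaped for a Java string literal.
    private static final Pattern URL_PATTERN = Pattern.compile(
        "\\(?\\bhttp://[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]");

    // Port of the C# paren cleanup: strip one surrounding pair of parens.
    static String stripParens(String s) {
        if (s.startsWith("(") && s.endsWith(")")) {
            return s.substring(1, s.length() - 1);
        }
        return s;
    }

    // Returns every URL found in the text, after paren cleanup.
    public static List<String> findUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_PATTERN.matcher(text);
        while (m.find()) {
            urls.add(stripParens(m.group()));
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(findUrls("Please go to http://stackoverflow.com."));
        System.out.println(findUrls("Visit (http://example.com) now."));
    }
}
```

The final character class excludes ?!:,.; so a sentence-ending period is not swallowed into the URL.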

Answer 3

You can do it like this (adjust the regex to suit your needs):

String originalString = "Please go to http://www.stackoverflow.com";
String newString = originalString.replaceAll("http://.+?(com|net|org)/{0,1}", "<a href=\"$0\">$0</a>");
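To see where the regex needs adjusting, the following sketch runs it unchanged on a few inputs. The class name is hypothetical. The regex handles the question's example. However, the lazy .+? stops at the first com, net or org it sees, even inside a word. Any path after the TLD is also left outside the link.

```java
public class SimpleReplacePitfalls {
    // The answer's pattern and replacement, unchanged.
    static String link(String s) {
        return s.replaceAll("http://.+?(com|net|org)/{0,1}", "<a href=\"$0\">$0</a>");
    }

    public static void main(String[] args) {
        // Works for the question's example
        System.out.println(link("Please go to http://www.stackoverflow.com"));
        // Lazy match stops at the "com" inside "compare"
        System.out.println(link("See http://www.compare.net today"));
        // The path after ".com/" is left outside the link
        System.out.println(link("See http://example.com/questions/123"));
    }
}
```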

Answer 4

The following code makes these modifications to the "Atwood approach":

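Detects https in addition to http (adding other schemes is trivial).
Uses the CASE_INSENSITIVE flag, since HtTpS:// is valid.
Matching sets of parentheses are stripped (they can be nested to any level). Any remaining unmatched left parentheses are also stripped, but trailing right parentheses are left intact (to respect Wikipedia-style URLs).
The URL is HTML-encoded in the link text.
The target attribute is passed in via a method parameter. Other attributes can be added as needed.
It does not use \b to identify a word break before matching a URL. A URL can begin with a left paren or http[s]:// with no other requirement.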

Notes:

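Apache Commons Lang's StringUtils is used in the code below.
The call to HtmlUtil.encode() below goes to a utility that ultimately calls some Tomahawk code to HTML-encode the link text, but any similar utility will do.
See the method comment for usage in JSF or other environments where output is HTML-encoded by default.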

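This was written in response to a client's requirements, and we think it represents a reasonable compromise between the characters allowed by the RFC and common usage. It is offered here in the hope that it will be useful to others.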

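It could be extended further to allow any Unicode characters to be entered (i.e., not escaped with %XX (two-digit hex)) and hyperlinked. That would require accepting all Unicode letters plus a limited set of punctuation. The text would then be split on the "acceptable" delimiters (e.g., %, |, #, etc.), each part URL-encoded, and the parts glued back together. For example, http://en.wikipedia.org/wiki/Björn_Andrésen (which the Stack Overflow generator does not detect) would be "http://en.wikipedia.org/wiki/Bj%C3%B6rn_Andr%C3%A9sen" in the href, but would show Björn_Andrésen in the link text on the page.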
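The per-segment encoding step described above could look roughly like this. It is a sketch under stated assumptions:

- The helper name is hypothetical.
- It only handles the path, not a query string or fragment.
- It assumes the input is not already percent-encoded; an existing % would be double-encoded as %25.

It uses URLEncoder, which performs HTML form encoding. Spaces therefore come out as + and are rewritten to %20. URLEncoder also escapes characters such as ( and ), which are legal in a path.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class UnicodeHrefSketch {
    // Hypothetical helper: percent-encodes (as UTF-8) each path segment of an
    // http(s) URL, leaving the scheme, host and '/' separators intact.
    // Assumes no query string or fragment and no existing %XX escapes.
    static String encodePathSegments(String url) {
        int schemeEnd = url.indexOf("://");
        if (schemeEnd < 0) {
            return url; // no scheme: leave untouched
        }
        int pathStart = url.indexOf('/', schemeEnd + 3);
        if (pathStart < 0) {
            return url; // no path to encode
        }
        StringBuilder sb = new StringBuilder(url.substring(0, pathStart));
        // limit -1 keeps empty trailing segments (e.g. a trailing '/')
        for (String segment : url.substring(pathStart + 1).split("/", -1)) {
            sb.append('/');
            // URLEncoder turns spaces into '+'; a literal '+' becomes %2B,
            // so replacing '+' afterwards only affects former spaces.
            sb.append(URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20"));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "http://en.wikipedia.org/wiki/Bj\u00f6rn_Andr\u00e9sen";
        // The href gets the encoded form; the link text can keep the raw string.
        System.out.println(encodePathSegments(raw));
    }
}
```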

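// Requires java.util.regex.Pattern, java.util.regex.Matcher, Apache Commons Lang's
// StringUtils and an HtmlUtil.encode(String) helper (see the notes above).
// NOTES:   1) \w includes 0-9, a-z, A-Z, _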
//          2) The leading '-' is the '-' character. It must go first in character class expression
private static final String VALID_CHARS = "-\\w+&@#/%=~()|";
private static final String VALID_NON_TERMINAL = "?!:,.;";

// Notes on the expression:
//  1) Any number of leading '(' (left parenthesis) accepted.  Will be dealt with.  
//  2) s? ==> the s is optional so either [http, https] accepted as scheme
//  3) Any number of valid chars (including non-terminal punctuation), then exactly one valid terminal char
//  4) Case insensitive so that the scheme can be hTtPs (for example) if desired
private static final Pattern URI_FINDER_PATTERN = Pattern.compile("\\(*https?://["+ VALID_CHARS + VALID_NON_TERMINAL + "]*[" +VALID_CHARS + "]", Pattern.CASE_INSENSITIVE );

/**
 * <p>
 * Finds all "URL"s in the given _rawText, wraps them in 
 * HTML link tags and returns the result (with the rest of the text
 * html encoded).
 * </p>
 * <p>
 * We employ the procedure described at:
 * http://www.codinghorror.com/blog/2008/10/the-problem-with-urls.html
 * which is a <b>must-read</b>.
 * </p>
 * <p>
 * Basically, we allow any number of left parentheses (which will get stripped away)
 * followed by http:// or https://.  Then any number of permitted URL characters
 * (based on http://www.ietf.org/rfc/rfc1738.txt) followed by a single character
 * of that set (basically, those minus typical punctuation).  We remove all sets of 
 * matching left & right parentheses which surround the URL.
 *</p>
 * <p>
 * This method *must* be called from a tag/component which will NOT
 * end up escaping the output.  For example:
 * <PRE>
 * <h:outputText ... escape="false" value="#{core:hyperlinkText(textThatMayHaveURLs, '_blank')}"/>
 * </pre>
 * </p>
 * <p>
 * Reason: we are adding <code>&lt;a href="..."&gt;</code> tags to the output *and*
 * encoding the rest of the string.  So, encoding the output will result in
 * double-encoding data which was already encoded - and encoding the <code>a href</code>
 * (which will render it useless).
 * </p>
 * <p>
 * 
 * @param   _rawText  - if <code>null</code>, returns <code>""</code> (empty string).
 * @param   _target   - if not <code>null</code> or <code>""</code>, adds a target attribute to the generated link, using _target as the attribute value.
 */
public static final String hyperlinkText( final String _rawText, final String _target ) {

    String returnValue = null;

    if ( !StringUtils.isBlank( _rawText ) ) {

        final Matcher matcher = URI_FINDER_PATTERN.matcher( _rawText );

        if ( matcher.find() ) {

            final int originalLength    =   _rawText.length();

            final String targetText = ( StringUtils.isBlank( _target ) ) ? "" :  " target=\"" + _target.trim() + "\"";
            final int targetLength      =   targetText.length();

            // Counted 15 characters aside from the target + 2 of the URL (max if the whole string is URL)
            // Rough guess, but should keep us from expanding the Builder too many times.
            final StringBuilder returnBuffer = new StringBuilder( originalLength * 2 + targetLength + 15 );

            int currentStart;
            int currentEnd;
            int lastEnd     = 0;

            String currentURL;

            do {
                currentStart = matcher.start();
                currentEnd = matcher.end();
                currentURL = matcher.group();

                // Adjust for URLs wrapped in ()'s ... move start/end markers
                //      and substring the _rawText for new URL value.
                while ( currentURL.startsWith( "(" ) && currentURL.endsWith( ")" ) ) {
                    currentStart = currentStart + 1;
                    currentEnd = currentEnd - 1;

                    currentURL = _rawText.substring( currentStart, currentEnd );
                }

                while ( currentURL.startsWith( "(" ) ) {
                    currentStart = currentStart + 1;

                    currentURL = _rawText.substring( currentStart, currentEnd );
                }

                // Text since last match
                returnBuffer.append( HtmlUtil.encode( _rawText.substring( lastEnd, currentStart ) ) );

                // Wrap matched URL (HTML-encoded in both the href and the link text)
                returnBuffer.append( "<a href=\"" + HtmlUtil.encode( currentURL ) + "\"" + targetText + ">" + HtmlUtil.encode( currentURL ) + "</a>" );

                lastEnd = currentEnd;

            } while ( matcher.find() );

            if ( lastEnd < originalLength ) {
                returnBuffer.append( HtmlUtil.encode( _rawText.substring( lastEnd ) ) );
            }

            returnValue = returnBuffer.toString();
        }
    } 

    if ( returnValue == null ) {
        returnValue = HtmlUtil.encode( _rawText );
    }

    return returnValue;

}
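The following self-contained sketch of the same algorithm runs without Tomahawk or Commons Lang. The class name is hypothetical. The htmlEncode helper is a hypothetical stand-in for HtmlUtil.encode() that escapes only the five HTML special characters. This version HTML-encodes the URL in both the href and the link text, and also encodes the target value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HyperlinkTextSketch {
    // Same character sets and pattern as the answer above.
    private static final String VALID_CHARS = "-\\w+&@#/%=~()|";
    private static final String VALID_NON_TERMINAL = "?!:,.;";
    private static final Pattern URI_FINDER_PATTERN = Pattern.compile(
        "\\(*https?://[" + VALID_CHARS + VALID_NON_TERMINAL + "]*[" + VALID_CHARS + "]",
        Pattern.CASE_INSENSITIVE);

    // Hypothetical stand-in for HtmlUtil.encode(): escapes & < > " '
    static String htmlEncode(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static String hyperlinkText(String rawText, String target) {
        if (rawText == null) {
            return "";
        }
        String targetText = (target == null || target.trim().isEmpty())
            ? "" : " target=\"" + htmlEncode(target.trim()) + "\"";
        Matcher matcher = URI_FINDER_PATTERN.matcher(rawText);
        StringBuilder out = new StringBuilder(rawText.length() * 2);
        int lastEnd = 0;
        while (matcher.find()) {
            int start = matcher.start();
            int end = matcher.end();
            // Strip matching pairs of parentheses that surround the URL.
            while (rawText.charAt(start) == '(' && rawText.charAt(end - 1) == ')') {
                start++;
                end--;
            }
            // Strip any remaining unmatched leading '(' (trailing ')' are kept).
            while (rawText.charAt(start) == '(') {
                start++;
            }
            String url = rawText.substring(start, end);
            // Encoded text since the last match, then the link itself.
            out.append(htmlEncode(rawText.substring(lastEnd, start)));
            out.append("<a href=\"").append(htmlEncode(url)).append('"')
               .append(targetText).append('>')
               .append(htmlEncode(url)).append("</a>");
            lastEnd = end;
        }
        out.append(htmlEncode(rawText.substring(lastEnd)));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(hyperlinkText("Please go to http://stackoverflow.com.", null));
        System.out.println(hyperlinkText("See (http://example.com) & more", "_blank"));
    }
}
```

Stripped parentheses are not lost. Because lastEnd is set to the adjusted end, a removed ")" is emitted with the following text.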

Answer 5

I made a small library that does exactly this:

https://github.com/robinst/autolink-java

Some tricky examples and the links it detects:

http://example.com. → http://example.com (trailing period excluded)
http://example.com, → http://example.com (trailing comma excluded)
(http://example.com) → http://example.com (surrounding parentheses excluded)
(... (see http://example.com)) → http://example.com
https://en.wikipedia.org/wiki/Link_(The_Legend_of_Zelda) → https://en.wikipedia.org/wiki/Link_(The_Legend_of_Zelda) (parentheses inside the URL kept)
http://üñîçøðé.com/ → http://üñîçøðé.com/ (non-ASCII characters supported)

Answer 6

Primitive:

String msg = "Please go to http://stackoverflow.com";
String withURL = msg.replaceAll("(?:https?|ftps?)://[\\w/%.-]+", "<a href='$0'>$0</a>");
System.out.println(withURL);

This needs improvement to match proper URLs, particularly GET parameters (?foo=bar&x=25).
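The GET-parameter limitation shows up when you run the pattern on a URL with a query string. This sketch compares the original pattern with an extended one that also accepts an optional ?query made of word characters and / % . = & -. The class name and the extended pattern are hypothetical. It is still far from a complete URL grammar. Also, the & is not escaped to &amp; inside the generated href.

```java
public class QueryAwareReplace {
    // The answer's "primitive" pattern, unchanged.
    static final String PRIMITIVE = "(?:https?|ftps?)://[\\w/%.-]+";
    // Hypothetical extension: an optional query string after '?'.
    static final String WITH_QUERY = PRIMITIVE + "(?:\\?[\\w/%.=&-]*)?";

    static String link(String msg, String regex) {
        return msg.replaceAll(regex, "<a href='$0'>$0</a>");
    }

    public static void main(String[] args) {
        String msg = "See http://example.com/search?foo=bar&x=25 now";
        // Query string is left outside the link
        System.out.println(link(msg, PRIMITIVE));
        // Query string is included in the link
        System.out.println(link(msg, WITH_QUERY));
    }
}
```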

Answer 7

You're asking two questions.

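What is the best way to identify URLs in a string? See this thread.
How do you code that solution in Java? The other replies illustrating String.replaceAll usage have already covered this.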

Answer 8

A nice improvement on PhiLho's answer is:

msg.replaceAll("(?:https?|ftps?)://[\\w/%.-][/\\??\\w=?\\w?/%.-]?[/\\?&\\w=?\\w?/%.-]*", "<a href='$0'>$0</a>");

Answer 9

I wrote my own URI/URL extractor and figured someone might find it useful, since IMHO it is better than the other answers because:

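It is stream-based and can be used on large documents.
It is extensible to handle all kinds of "Atwood paren" problems through a chain of strategies.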

Since the code is a bit long for a post (though it is only one Java file), I've put it in a gist on GitHub.

Here is the signature of one of the main methods for calling it, to illustrate the points above:

public static Iterator<ExtractedURI> extractURIs(
    final Reader reader,
    final Iterable<ToURIStrategy> strategies,
    String ... schemes);

There is a default strategy chain that handles most of the Atwood problems.

public static List<ToURIStrategy> DEFAULT_STRATEGY_CHAIN = ImmutableList.of(
    new RemoveSurroundsWithToURIStrategy("'"),
    new RemoveSurroundsWithToURIStrategy("\""),
    new RemoveSurroundsWithToURIStrategy("(", ")"),
    new RemoveEndsWithToURIStrategy("."),
    DEFAULT_STRATEGY,
    REMOVE_LAST_STRATEGY);
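The gist itself isn't reproduced here, so the following is only a hypothetical sketch of the strategy-chain idea, not the author's API. Each strategy is a UnaryOperator<String> that trims one kind of wrapper from a candidate. The strategies run once, in order, and the result is parsed with java.net.URI. Because it is a single pass, order matters. For example, (http://example.com). would not be cleaned up, because the trailing period is only removed after the paren strategy has already run.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;
import java.util.Optional;
import java.util.function.UnaryOperator;

public class StrategyChainSketch {
    // Hypothetical strategy: strip an open/close pair if both are present.
    static UnaryOperator<String> removeSurroundsWith(String open, String close) {
        return s -> (s.length() >= open.length() + close.length()
                && s.startsWith(open) && s.endsWith(close))
            ? s.substring(open.length(), s.length() - close.length())
            : s;
    }

    // Hypothetical strategy: strip a single trailing suffix if present.
    static UnaryOperator<String> removeEndsWith(String suffix) {
        return s -> s.endsWith(suffix) ? s.substring(0, s.length() - suffix.length()) : s;
    }

    // Order mirrors the answer's default chain (minus its final strategies).
    static final List<UnaryOperator<String>> CHAIN = List.of(
        removeSurroundsWith("'", "'"),
        removeSurroundsWith("\"", "\""),
        removeSurroundsWith("(", ")"),
        removeEndsWith("."));

    // Applies every strategy once, in order, then tries to parse a URI.
    static Optional<URI> toUri(String candidate) {
        String s = candidate;
        for (UnaryOperator<String> strategy : CHAIN) {
            s = strategy.apply(s);
        }
        try {
            return Optional.of(new URI(s));
        } catch (URISyntaxException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(toUri("(http://example.com)"));
        System.out.println(toUri("'http://example.com.'"));
    }
}
```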

Enjoy!

Answer 10

On Android, a more convenient way as of 2017:

<TextView
    android:layout_width="wrap_content"
    android:layout_height="wrap_content"
    android:autoLink="web"
    android:linksClickable="true"/>

Or use android:autoLink="all" for all kinds of links.

Answer 11

There is a very nice JavaScript library that renders the links directly in the browser: https://github.com/gregjacobs/Autolinker.js

It supports: HTML, email addresses, phone numbers (US only), Twitter handles and hashtags.

It also renders links that lack the http:// prefix.

Answer 12

To detect a URL, you just need:

String text = yourtextview.getText().toString();
if (text.contains("www") || text.contains("http://")) {
    // your code here if the text contains a URL
}

How to detect the presence of a URL in a string
