How to implement Python's urllib.quote(str, safe='/') in Java

Published: 2020-07-22 04:03:36 Category: Python Source: Internet
I recently needed to port some Python code to Java and ran into this URL-encoding call:
urllib.quote(str, safe='/')
In Java, however, URLEncoder.encode(arg, Constant.UTF_8) converts '/' to %2F.
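To make the mismatch concrete, here is a minimal sketch that runs a path through the standard encoder. The class name UrlEncoderSlashDemo, the helper encodePath and the sample path are made up for illustration; only URLEncoder.encode itself comes from the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class UrlEncoderSlashDemo {
    // Hypothetical helper: encodes a path with the stock JDK encoder.
    static String encodePath(String path) {
        try {
            return URLEncoder.encode(path, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Every '/' comes back as %2F and the space as '+'.
        System.out.println(encodePath("/docs/a b.txt")); // prints %2Fdocs%2Fa+b.txt
    }
}
```

For the same input, Python's urllib.quote("/docs/a b.txt", safe='/') returns /docs/a%20b.txt, so the slash is not the only difference: the space is handled differently as well.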
A search online turned up no equivalent of the safe parameter in Java, so I ended up writing a class of my own:
import java.io.CharArrayWriter;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.BitSet;

public class UrlSafeEncoder {
    static final BitSet dontNeedEncoding;
    static final int caseDiff = ('a' - 'A');

    static {

        /* The list of characters that are not encoded has been
         * determined as follows:
         *
         * RFC 2396 states:
         * -----
         * Data characters that are allowed in a URI but do not have a
         * reserved purpose are called unreserved.  These include upper
         * and lower case letters, decimal digits, and a limited set of
         * punctuation marks and symbols.
         *
         * unreserved  = alphanum | mark
         *
         * mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
         *
         * Unreserved characters can be escaped without changing the
         * semantics of the URI, but this should not be done unless the
         * URI is being used in a context that does not allow the
         * unescaped character to appear.
         * -----
         *
         * It appears that both Netscape and Internet Explorer escape
         * all special characters from this list with the exception
         * of "-", "_", ".", "*". While it is not clear why they are
         * escaping the other characters, perhaps it is safest to
         * assume that there might be contexts in which the others
         * are unsafe if not escaped. Therefore, java.net.URLEncoder
         * uses the same list. It is also noteworthy that this is
         * consistent with O'Reilly's "HTML: The Definitive Guide"
         * (page 164).
         *
         * As a last note, Internet Explorer does not encode the "@"
         * character which is clearly not unreserved according to the
         * RFC. We are being consistent with the RFC in this matter,
         * as is Netscape.
         *
         * This class departs from java.net.URLEncoder in two places so
         * that the output matches Python 2's urllib.quote: "*" is
         * escaped (as %2A), and a space is escaped as %20 instead of
         * being turned into "+".
         */

        dontNeedEncoding = new BitSet(256);
        int i;
        for (i = 'a'; i <= 'z'; i++) {
            dontNeedEncoding.set(i);
        }
        for (i = 'A'; i <= 'Z'; i++) {
            dontNeedEncoding.set(i);
        }
        for (i = '0'; i <= '9'; i++) {
            dontNeedEncoding.set(i);
        }
        dontNeedEncoding.set('-');
        dontNeedEncoding.set('_');
        dontNeedEncoding.set('.');
    }

    /**
     * You can't call the constructor.
     */
    private UrlSafeEncoder() { }

    /**
     * Percent-encodes a string in the manner of Python 2's
     * {@code urllib.quote(s, safe)}, using a specific encoding scheme.
     * This method uses the supplied encoding scheme to obtain the
     * bytes for unsafe characters.
     * <p>
     * <em><strong>Note:</strong> The <a href=
     * "http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars">
     * World Wide Web Consortium Recommendation</a> states that
     * UTF-8 should be used. Not doing so may introduce
     * incompatibilities.</em>
     *
     * @param   s   {@code String} to be translated.
     * @param   enc   The name of a supported character encoding.
     * @param   safe   An extra character to leave unescaped, such as {@code '/'}.
     * @return  the translated {@code String}.
     * @exception  UnsupportedEncodingException
     *             If the named encoding is not supported
     * @see java.net.URLDecoder#decode(java.lang.String, java.lang.String)
     */
    public static String encode(String s, String enc, char safe)
        throws UnsupportedEncodingException {
        // Work on a copy: calling set(safe) on the shared static BitSet
        // would keep that character unescaped in every later call.
        BitSet unescaped = (BitSet) dontNeedEncoding.clone();
        unescaped.set(safe);
        boolean needToChange = false;
        StringBuilder out = new StringBuilder(s.length());
        Charset charset;
        CharArrayWriter charArrayWriter = new CharArrayWriter();

        if (enc == null)
            throw new NullPointerException("charsetName");

        try {
            charset = Charset.forName(enc);
        } catch (IllegalCharsetNameException e) {
            throw new UnsupportedEncodingException(enc);
        } catch (UnsupportedCharsetException e) {
            throw new UnsupportedEncodingException(enc);
        }

        for (int i = 0; i < s.length();) {
            int c = (int) s.charAt(i);
            if (unescaped.get(c)) {
                out.append((char) c);
                i++;
            } else {
                // convert to external encoding before hex conversion
                do {
                    charArrayWriter.write(c);
                    /*
                     * If this character represents the start of a Unicode
                     * surrogate pair, then pass in two characters. It's not
                     * clear what should be done if a value reserved in the
                     * surrogate pairs range occurs outside of a legal
                     * surrogate pair. For now, just treat it as if it were
                     * any other character.
                     */
                    if (c >= 0xD800 && c <= 0xDBFF) {
                        if ((i + 1) < s.length()) {
                            int d = (int) s.charAt(i + 1);
                            if (d >= 0xDC00 && d <= 0xDFFF) {
                                charArrayWriter.write(d);
                                i++;
                            }
                        }
                    }
                    i++;
                } while (i < s.length() && !unescaped.get((c = (int) s.charAt(i))));

                charArrayWriter.flush();
                String str = new String(charArrayWriter.toCharArray());
                byte[] ba = str.getBytes(charset);
                for (int j = 0; j < ba.length; j++) {
                    out.append('%');
                    char ch = Character.forDigit((ba[j] >> 4) & 0xF, 16);
                    // converting to use uppercase letter as part of
                    // the hex value if ch is a letter.
                    if (Character.isLetter(ch)) {
                        ch -= caseDiff;
                    }
                    out.append(ch);
                    ch = Character.forDigit(ba[j] & 0xF, 16);
                    if (Character.isLetter(ch)) {
                        ch -= caseDiff;
                    }
                    out.append(ch);
                }
                charArrayWriter.reset();
                needToChange = true;
            }
        }

        return (needToChange ? out.toString() : s);
    }
}
A quick test shows that it basically works.
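For comparison, the same idea can be sketched much more compactly by working on the UTF-8 bytes directly. This is only an illustration under stated assumptions, not the class above: PyQuote and quote are hypothetical names, the encoding is fixed to UTF-8, and the always-safe set is the one Python 2's urllib.quote uses (letters, digits, '_', '.', '-').

```java
import java.nio.charset.StandardCharsets;

final class PyQuote {
    // Characters Python 2's urllib.quote never escapes.
    private static final String ALWAYS_SAFE =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_.-";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Hypothetical helper: percent-encodes s, leaving ALWAYS_SAFE and
    // any ASCII character listed in safe untouched.
    static String quote(String s, String safe) {
        StringBuilder out = new StringBuilder(s.length());
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;          // treat the byte as unsigned
            char c = (char) v;
            if (v < 0x80 && (ALWAYS_SAFE.indexOf(c) >= 0 || safe.indexOf(c) >= 0)) {
                out.append(c);
            } else {
                out.append('%').append(HEX[v >> 4]).append(HEX[v & 0xF]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(quote("/docs/a b.txt", "/"));    // prints /docs/a%20b.txt
        System.out.println(quote("/caf\u00e9/x*y", "/"));   // prints /caf%C3%A9/x%2Ay
    }
}
```

Taking safe as a String rather than a single char mirrors the Python signature, where several safe characters can be passed at once. Nothing is stored in shared state, so calls with different safe sets cannot affect each other.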