Question

我正在尝试从此目录下载所有文件。但是，我只能将它作为一个文件下载。我能做什么？我试图搜索这个问题，这让人感到困惑，人们开始建议使用httpclients。感谢您的帮助，这是我的代码到目前为止。有人建议我使用输入流来获取目录中的所有文件。那么那会进入阵列吗？我在这里尝试了教程http://docs.oracle.com/javase/tutorial/networking/urls/，但这对我没有帮助。

//ProgressBar/Install
            String URL_LOCATION = "http://www.futureretrogaming.tk/gamefiles/ProfessorPhys/";
            String LOCAL_FILE = filelocation.getText() + "\\ProfessorPhys\\";
            try {
                java.net.URL url = new URL(URL_LOCATION);
                HttpURLConnection connection = (HttpURLConnection) url.openConnection(); 
                connection.addRequestProperty("User-Agent", "Mozilla/4.76"); 
                //URLConnection connection = url.openConnection();
                BufferedInputStream stream = new BufferedInputStream(connection.getInputStream());
                int available = stream.available();
                byte b[]= new byte[available];
                stream.read(b);
                File file = new File(LOCAL_FILE);
                OutputStream out  = new FileOutputStream(file);
                out.write(b);
            } catch (Exception e) {
                System.err.println(e);
            }

我还发现此代码将返回要下载的文件列表。有人可以帮我合并这两个代码吗？

public class GetAllFilesInDirectory {

public static void main(String[] args) throws IOException {

    File dir = new File("dir");

    System.out.println("Getting all files in " + dir.getCanonicalPath() + " including those in subdirectories");
    List<File> files = (List<File>) FileUtils.listFiles(dir, TrueFileFilter.INSTANCE, TrueFileFilter.INSTANCE);
    for (File file : files) {
        System.out.println("file: " + file.getCanonicalPath());
    }

}

}

Answer 1

您需要下载页面，目录列表，解析它，然后下载页面中链接的inidiviudal文件......

你可以做点像......

URL url = new URL("http:www.futureretrogaming.tk/gamefiles/ProfessorPhys");
InputStream is = null;
try {
    is = url.openStream();
    byte[] buffer = new byte[1024];
    int bytesRead = -1;
    StringBuilder page = new StringBuilder(1024);
    while ((bytesRead = is.read(buffer)) != -1) {
        page.append(new String(buffer, 0, bytesRead));
    }
    // Spend the rest of your life using String methods
    // to parse the result...
} catch (IOException ex) {
    ex.printStackTrace();
} finally {
    try {
        is.close();
    } catch (Exception e) {
    }
}

或者，您可以下载Jsoup并使用它来完成所有艰苦工作......

try {
    Document doc = Jsoup.connect("http:www.futureretrogaming.tk/gamefiles/ProfessorPhys").get();
    Elements links = doc.getElementsByTag("a");
    for (Element link : links) {
        System.out.println(link.attr("href") + " - " + link.text());
    }
} catch (IOException ex) {
    ex.printStackTrace();
}

哪个输出......

?C=N;O=D - Name
?C=M;O=A - Last modified
?C=S;O=A - Size
?C=D;O=A - Description
/gamefiles/ - Parent Directory
Assembly-CSharp-Editor-firstpass-vs.csproj - Assembly-CSharp-Edit..>
Assembly-CSharp-Editor-firstpass.csproj - Assembly-CSharp-Edit..>
Assembly-CSharp-Editor-firstpass.pidb - Assembly-CSharp-Edit..>
Assembly-CSharp-firstpass-vs.csproj - Assembly-CSharp-firs..>
Assembly-CSharp-firstpass.csproj - Assembly-CSharp-firs..>
Assembly-CSharp-firstpass.pidb - Assembly-CSharp-firs..>
Assembly-CSharp-vs.csproj - Assembly-CSharp-vs.c..>
Assembly-CSharp.csproj - Assembly-CSharp.csproj
Assembly-CSharp.pidb - Assembly-CSharp.pidb
Assembly-UnityScript-Editor-firstpass-vs.unityproj - Assembly-UnityScript..>
Assembly-UnityScript-Editor-firstpass.pidb - Assembly-UnityScript..>
Assembly-UnityScript-Editor-firstpass.unityproj - Assembly-UnityScript..>
Assembly-UnityScript-firstpass-vs.unityproj - Assembly-UnityScript..>
Assembly-UnityScript-firstpass.pidb - Assembly-UnityScript..>
Assembly-UnityScript-firstpass.unityproj - Assembly-UnityScript..>
Assembly-UnityScript-vs.unityproj - Assembly-UnityScript..>
Assembly-UnityScript.pidb - Assembly-UnityScript..>
Assembly-UnityScript.unityproj - Assembly-UnityScript..>
Assets/ - Assets/
Library/ - Library/
Professor%20Phys-csharp.sln - Professor Phys-cshar..>
Professor%20Phys.exe - Professor Phys.exe
Professor%20Phys.sln - Professor Phys.sln
Professor%20Phys.userprefs - Professor Phys.userp..>
Professor%20Phys_Data/ - Professor Phys_Data/
Script.doc - Script.doc
~$Script.doc - ~$Script.doc
~WRL0392.tmp - ~WRL0392.tmp
~WRL1966.tmp - ~WRL1966.tmp

然后，您需要为每个文件构建一个新URL，并按照您已经完成的内容进行阅读...

例如，href的{{1}}为Assembly-CSharp-Edit..>，它出现在相对链接中，因此您需要在Assembly-CSharp-Editor-firstpass-vs.csproj前加上前缀以生成新的http://www.futureretrogaming.tk/gamefiles/ProfessorPhys URL 1 {} http://www.futureretrogaming.tk/gamefiles/ProfessorPhys/Assembly-CSharp-Editor-firstpass-vs.csproj

您需要为要抓取的每个元素执行此操作

Answer 2

您是否考虑过像HTTrack这样的工具，它可以检测HTML上锚标记的存在并下载整个网站（受树级限制）。您还可以指定筛选应下载的文件等

如果这不符合您的要求，您仍然可以使用手写的Java程序，除了问题是获取URL中的文件列表（以及其中的所有子文件夹）。您需要解析HTML，收集所有锚标记并遍历它（这是HTTrack正在做的事情）

Java下载目录中的所有文件和文件夹

2 个答案: