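I am trying to download all the files from this directory. However, I can only get it to download as a single file. What can I do? I tried searching for this problem, but the results were confusing, and people started suggesting HTTP client libraries. Thanks for your help; this is my code so far. It was suggested that I use an input stream to get all the files in the directory. Would that then go into an array? I tried the tutorial at http://docs.oracle.com/javase/tutorial/networking/urls/, but it didn't help me.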
//ProgressBar/Install
String URL_LOCATION = "http://www.futureretrogaming.tk/gamefiles/ProfessorPhys/";
String LOCAL_FILE = filelocation.getText() + "\\ProfessorPhys\\";
try {
java.net.URL url = new URL(URL_LOCATION);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.addRequestProperty("User-Agent", "Mozilla/4.76");
//URLConnection connection = url.openConnection();
BufferedInputStream stream = new BufferedInputStream(connection.getInputStream());
int available = stream.available();
byte b[]= new byte[available];
stream.read(b);
File file = new File(LOCAL_FILE);
OutputStream out = new FileOutputStream(file);
out.write(b);
} catch (Exception e) {
System.err.println(e);
}
I also found this code, which returns a list of the files to download. Can someone help me combine the two pieces of code?
public class GetAllFilesInDirectory {
public static void main(String[] args) throws IOException {
File dir = new File("dir");
System.out.println("Getting all files in " + dir.getCanonicalPath() + " including those in subdirectories");
List<File> files = (List<File>) FileUtils.listFiles(dir, TrueFileFilter.INSTANCE, TrueFileFilter.INSTANCE);
for (File file : files) {
System.out.println("file: " + file.getCanonicalPath());
}
}
}
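Answer 0 (score: 5)
You need to download the page (the directory listing), parse it, and then download the individual files linked in the page...
You could do something like...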
URL url = new URL("http://www.futureretrogaming.tk/gamefiles/ProfessorPhys");
InputStream is = null;
try {
is = url.openStream();
byte[] buffer = new byte[1024];
int bytesRead = -1;
StringBuilder page = new StringBuilder(1024);
while ((bytesRead = is.read(buffer)) != -1) {
page.append(new String(buffer, 0, bytesRead));
}
// Spend the rest of your life using String methods
// to parse the result...
} catch (IOException ex) {
ex.printStackTrace();
} finally {
try {
is.close();
} catch (Exception e) {
}
}
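The parsing step left as a comment above could be sketched with a regular expression. This is only a rough illustration that assumes the listing uses plain double-quoted `href` attributes, as a typical auto-generated directory index does; `HrefExtractor` and `extractHrefs` are hypothetical names, and a regex is not a general HTML parser (it does not decode entities such as `&amp;`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefExtractor {

    // Matches <a ... href="..."> and captures the attribute value.
    // Assumption: attribute values are double-quoted.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s+[^>]*?href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns every href value found in the page, in document order.
    static List<String> extractHrefs(String html) {
        List<String> hrefs = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            hrefs.add(m.group(1));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // In the snippet above you would pass page.toString() instead.
        String page = "<a href=\"?C=N;O=D\">Name</a> <a href=\"Script.doc\">Script.doc</a>";
        System.out.println(extractHrefs(page));
    }
}
```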
Or, you could download Jsoup and let it do all the hard work...
try {
Document doc = Jsoup.connect("http://www.futureretrogaming.tk/gamefiles/ProfessorPhys").get();
Elements links = doc.getElementsByTag("a");
for (Element link : links) {
System.out.println(link.attr("href") + " - " + link.text());
}
} catch (IOException ex) {
ex.printStackTrace();
}
Which outputs...
?C=N;O=D - Name
?C=M;O=A - Last modified
?C=S;O=A - Size
?C=D;O=A - Description
/gamefiles/ - Parent Directory
Assembly-CSharp-Editor-firstpass-vs.csproj - Assembly-CSharp-Edit..>
Assembly-CSharp-Editor-firstpass.csproj - Assembly-CSharp-Edit..>
Assembly-CSharp-Editor-firstpass.pidb - Assembly-CSharp-Edit..>
Assembly-CSharp-firstpass-vs.csproj - Assembly-CSharp-firs..>
Assembly-CSharp-firstpass.csproj - Assembly-CSharp-firs..>
Assembly-CSharp-firstpass.pidb - Assembly-CSharp-firs..>
Assembly-CSharp-vs.csproj - Assembly-CSharp-vs.c..>
Assembly-CSharp.csproj - Assembly-CSharp.csproj
Assembly-CSharp.pidb - Assembly-CSharp.pidb
Assembly-UnityScript-Editor-firstpass-vs.unityproj - Assembly-UnityScript..>
Assembly-UnityScript-Editor-firstpass.pidb - Assembly-UnityScript..>
Assembly-UnityScript-Editor-firstpass.unityproj - Assembly-UnityScript..>
Assembly-UnityScript-firstpass-vs.unityproj - Assembly-UnityScript..>
Assembly-UnityScript-firstpass.pidb - Assembly-UnityScript..>
Assembly-UnityScript-firstpass.unityproj - Assembly-UnityScript..>
Assembly-UnityScript-vs.unityproj - Assembly-UnityScript..>
Assembly-UnityScript.pidb - Assembly-UnityScript..>
Assembly-UnityScript.unityproj - Assembly-UnityScript..>
Assets/ - Assets/
Library/ - Library/
Professor%20Phys-csharp.sln - Professor Phys-cshar..>
Professor%20Phys.exe - Professor Phys.exe
Professor%20Phys.sln - Professor Phys.sln
Professor%20Phys.userprefs - Professor Phys.userp..>
Professor%20Phys_Data/ - Professor Phys_Data/
Script.doc - Script.doc
~$Script.doc - ~$Script.doc
~WRL0392.tmp - ~WRL0392.tmp
~WRL1966.tmp - ~WRL1966.tmp
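You then need to build a new URL for each file and read it, just as you have already done...
For example, the href for Assembly-CSharp-Edit..> is Assembly-CSharp-Editor-firstpass-vs.csproj. It appears to be a relative link, so you would need to prefix it with http://www.futureretrogaming.tk/gamefiles/ProfessorPhys/ to make a new URL of http://www.futureretrogaming.tk/gamefiles/ProfessorPhys/Assembly-CSharp-Editor-firstpass-vs.csproj
You would need to do this for each element you want to grab.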
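As a rough sketch of that last step, the following resolves an href from the listing against the directory URL and copies the response to disk. The class and method names (`LinkDownloader`, `resolve`, `fileName`, `download`) are made up for illustration, and the local target folder is an assumption. It streams the whole response with `Files.copy` rather than sizing a buffer from `available()`, which only reports the bytes readable without blocking.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class LinkDownloader {

    // Resolve an href from the listing against the directory URL.
    // A trailing '/' is added so the last path segment is not replaced.
    static URI resolve(String baseUrl, String href) {
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        return URI.create(base).resolve(href);
    }

    // Local file name: last path segment; getPath() decodes %20 and friends.
    static String fileName(URI uri) {
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Stream the remote file into targetDir, creating the folder if needed.
    static void download(URI source, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        try (InputStream in = source.toURL().openStream()) {
            Files.copy(in, targetDir.resolve(fileName(source)),
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        String base = "http://www.futureretrogaming.tk/gamefiles/ProfessorPhys";
        URI file = resolve(base, "Professor%20Phys.exe");
        System.out.println(file + " -> " + fileName(file));
        // Hypothetical target folder; uncomment to fetch over the network.
        // download(file, Paths.get("ProfessorPhys"));
    }
}
```

You would call `download` once per href printed by the Jsoup loop. If the server needs the `User-Agent` header from the question, open an `HttpURLConnection` and set it there instead of calling `openStream()`.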
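Answer 1 (score: 0)
Have you considered a tool like HTTrack, which can detect anchor tags in the HTML and download an entire website (limited by tree depth)? You can also specify filters for which files should be downloaded, and so on.
If that doesn't meet your requirements, you can still write the Java program by hand; the hard part is getting the list of files at the URL (and in all of its subfolders). You would need to parse the HTML, collect all the anchor tags, and traverse them (which is what HTTrack does).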
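A hand-written traversal along those lines might look like the sketch below. It assumes an auto-generated index page in which subfolder links end with `/`, and it skips column-sort links (`?C=...`) and anything that points outside the current folder, such as the parent directory. `ListingCrawler`, `collectFiles`, and the `lister` callback are hypothetical names; `lister` stands in for whatever fetches a page and returns its href values (for example the Jsoup loop from the other answer).

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class ListingCrawler {

    // True when the resolved link lies strictly below dir and is not a sort link.
    static boolean isChild(URI dir, URI link) {
        return link.getQuery() == null
                && !link.equals(dir)
                && link.toString().startsWith(dir.toString());
    }

    // Collects file URLs under dir (dir must end with '/'),
    // descending at most maxDepth levels into subfolders.
    static List<URI> collectFiles(URI dir, Function<URI, List<String>> lister, int maxDepth) {
        List<URI> files = new ArrayList<>();
        for (String href : lister.apply(dir)) {
            URI link;
            try {
                link = dir.resolve(href);
            } catch (IllegalArgumentException badHref) {
                continue; // not a valid URI reference, e.g. an unencoded space
            }
            if (!isChild(dir, link)) {
                continue; // sort link, parent directory, or external site
            }
            if (link.getPath().endsWith("/")) {
                if (maxDepth > 0) {
                    files.addAll(collectFiles(link, lister, maxDepth - 1));
                }
            } else {
                files.add(link);
            }
        }
        return files;
    }
}
```

Passing the page fetcher in as a function keeps the traversal testable without a network connection; the depth limit plays the same role as HTTrack's tree-level limit.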