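Organizing files in "buckets"

Time: 2009-04-27 07:17:47

Tags: java file-io java-io

My problem is this:

I have a client-side HTTP cache, and I need to store the HTTP payloads on the file system in some way. I do not want to clutter the file system with unnecessary files.

I have written this class: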


/*
 * Copyright (c) 2008, The Codehaus. All Rights Reserved.
 *
 *   Licensed under the Apache License, Version 2.0 (the "License");
 *   you may not use this file except in compliance with the License.
 *   You may obtain a copy of the License at
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 *   Unless required by applicable law or agreed to in writing, software
 *   distributed under the License is distributed on an "AS IS" BASIS,
 *   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 *   See the License for the specific language governing permissions and
 *   limitations under the License.
 *
 */

package org.codehaus.httpcache4j.cache;

import org.apache.commons.lang.Validate;
import org.apache.commons.io.filefilter.AndFileFilter;
import org.apache.commons.io.filefilter.DirectoryFileFilter;
import org.apache.commons.io.filefilter.RegexFileFilter;

import org.codehaus.httpcache4j.util.DeletingFileFilter;

import java.io.File;
import java.io.FileFilter;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * This class is internal and should never be used by clients. 
 *
 * Responsible for creating and maintaining a "Pool" of file generations. 
 * The files are promoted when they are accessed, so we can figure out which files that are OK to delete.
 * Known Gotchas: This needs to be in sync with the size of the storage engine.
 * If you have too few generations when you have many items in the cache, you might
 * be missing some files when you try to access them.
 *
 * Note from Despot: I am looking into another way of storing files, so this class might go away at some point,
 * or change to a different form.
 *
 */
class FileGenerationManager implements Serializable {
    private static final long serialVersionUID = -1558644426181861334L;
    private final File baseDirectory;
    private final int generationSize;
    private final int numberOfGenerations;
    private final FileFilter generationFilter;

    public FileGenerationManager(final File baseDirectory, final int numberOfGenerations) {
        this(baseDirectory, numberOfGenerations, 100);
    }

    public FileGenerationManager(final File baseDirectory, final int numberOfGenerations, final int generationSize) {
        Validate.isTrue(numberOfGenerations > 0, "You may not create 0 generations");
        Validate.notNull(baseDirectory, "You may not have a null base directory");
        if (!baseDirectory.exists()) {
            Validate.isTrue(baseDirectory.mkdirs(), "Could not create base directory: " + baseDirectory);
        }
        this.baseDirectory = baseDirectory;
        this.generationSize = generationSize;
        this.numberOfGenerations = numberOfGenerations;
        generationFilter = new AndFileFilter(DirectoryFileFilter.DIRECTORY, new RegexFileFilter("[0-9]*"));
        getGenerations();
    }

    /**
     * Creates generations of the directories in the base directory.
     *
     * @return the created generations.
     */
    //TODO: Is this heavy?
    //TODO: Maybe we should do this when we miss in getFile() ?
    public synchronized List<Generation> getGenerations() {
        final List<Generation> generations = new ArrayList<Generation>();
        //handle existing generations...
        File[] directories = baseDirectory.listFiles(generationFilter);
        if (directories.length > 0) {
            for (File directory : directories) {
                generations.add(new Generation(baseDirectory, Integer.parseInt(directory.getName())));
            }
        } else {
            generations.add(new Generation(baseDirectory, 1));
        }
        Collections.sort(generations);
        Generation currentGeneration = generations.get(0);
        if (currentGeneration.getGenerationDirectory().list().length > generationSize) {
            generations.add(0, new Generation(baseDirectory, currentGeneration.getSequence() + 1));
            removeLastGeneration(generations);
        }
        while (generations.size() > numberOfGenerations) {
            removeLastGeneration(generations);
        }
        return Collections.unmodifiableList(generations);
    }

    private void removeLastGeneration(List<Generation> generations) {
        if (generations.size() > numberOfGenerations) {
            Generation generation = generations.remove(generations.size() - 1);
            generation.delete();
        }
    }

    /**
     * Returns the most recent created generation
     *
     * @return the generation with the highest sequence number
     */
    synchronized Generation getCurrentGeneration() {
        return getGenerations().get(0);
    }

    public synchronized File getFile(String fileName) {
        File target = new File(getCurrentGeneration().getGenerationDirectory(), fileName);
        for (Generation generation : getGenerations()) {
            File candidate = new File(generation.getGenerationDirectory(), fileName);
            if (candidate.exists()) {
                if (!target.equals(candidate)) {
                    //because of; http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4017593
                    target.delete();
                    if (!candidate.renameTo(target)) {
                        return candidate;
                    } else {
                        break;
                    }
                }
            }
        }
        return target;
    }

    static class Generation implements Comparable<Generation> {
        private File generationDirectory;
        private int sequence;

        public Generation(final File baseDir, final int generationNumber) {
            Validate.notNull(baseDir, "Generation directory may not be null");
            File genFile = new File(baseDir, String.valueOf(generationNumber));
            genFile.mkdirs();
            this.generationDirectory = genFile;
            this.sequence = generationNumber;
        }

        public synchronized void delete() {
            File[] undeleteableFiles = generationDirectory.listFiles(new DeletingFileFilter());
            if (undeleteableFiles == null || undeleteableFiles.length == 0) {
                generationDirectory.delete();
            } else {
                System.err.println("Unable to delete these files: " + Arrays.toString(undeleteableFiles));
            }
        }

        public File getGenerationDirectory() {
            return generationDirectory;
        }

        public int getSequence() {
            return sequence;
        }

        public int compareTo(Generation generation) {
            return 1 - (sequence - generation.sequence);
        }
    }
}

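The problem is that sometimes the files are not moved to the correct folder, and I may be leaking file descriptors.

Do you have any suggestions on how to improve this?

Is there perhaps a standard solution for this, regardless of language?

It is also quite slow, so speed improvements are welcome.

1 answer:

Answer 0 (score: 3)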

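Your performance problems (and possibly the bugs) are probably caused by over-using the file system to keep track of the generations instead of storing this information in memory. File system access is far more expensive than memory access; in particular, File.listFiles() and File.list() can be very slow. If you have a few thousand files, expect these calls to take seconds rather than milliseconds on a Windows system using NTFS.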

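If possible, all generation information should be stored and updated as objects in synchronized collections. If you use the file system only to actually store, retrieve and delete the cached data files, you can put all cache files into a single directory and name them however you like (for example, give each file a numeric or random name).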
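As a rough illustration of that idea, here is a minimal sketch that keeps the bookkeeping in a synchronized, access-ordered map and stores every payload in one flat directory under a random name. Note that it swaps the numbered generation directories for plain least-recently-used ordering, which is not what the original class does. The class name `InMemoryFileIndex`, its methods, and the `maxEntries` limit are all hypothetical.

```java
import java.io.File;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical in-memory index: cache key -> payload file in one flat directory. */
class InMemoryFileIndex {
    private final File directory;
    private final Map<String, File> index;

    InMemoryFileIndex(File directory, final int maxEntries) {
        this.directory = directory;
        // accessOrder = true: each get() moves the entry to the most-recently-used end,
        // which plays the role of "promoting" a file without touching the disk.
        this.index = Collections.synchronizedMap(
            new LinkedHashMap<String, File>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, File> eldest) {
                    if (size() > maxEntries) {
                        eldest.getValue().delete(); // best effort; the result is ignored
                        return true;
                    }
                    return false;
                }
            });
    }

    /** Returns the file registered for the key, or null on a cache miss. */
    File lookup(String key) {
        return index.get(key);
    }

    /** Reserves a randomly named file for the key and records it in the index. */
    File register(String key) {
        File file = new File(directory, UUID.randomUUID().toString());
        File previous = index.put(key, file);
        if (previous != null) {
            previous.delete(); // the old payload for this key is no longer referenced
        }
        return file;
    }

    int size() {
        return index.size();
    }
}
```

With this shape, a lookup never lists a directory: the only file system calls left are the ones that read, write, or delete a payload.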

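If the generation information needs to be persistent and safe against sudden application shutdowns, you can make the collections serializable and write them to disk periodically (for example, every 30 seconds, and once more when the application shuts down). Since it is only a cache, you can run a check at application startup that deletes the actual files that have no cache entry and removes the cache entries whose files have been deleted.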
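The persistence and startup check described above could look roughly like the following sketch. It assumes the index is a simple map from cache key to file name and that the snapshot file lives outside the cache directory (otherwise the startup check would delete it as an orphan). `PersistentIndex` and all of its method names are hypothetical.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical persistent index: cache key -> file name inside the cache directory. */
class PersistentIndex {
    private final File directory;
    private final File snapshot; // assumed to live OUTSIDE the cache directory
    private final Map<String, String> index = new ConcurrentHashMap<>();

    PersistentIndex(File directory, File snapshot) {
        this.directory = directory;
        this.snapshot = snapshot;
    }

    void put(String key, String fileName) { index.put(key, fileName); }

    String get(String key) { return index.get(key); }

    /** Writes a serialized copy of the index to the snapshot file. */
    void save() {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(snapshot))) {
            out.writeObject(new HashMap<>(index));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Loads the snapshot if there is one, then reconciles it with the directory. */
    @SuppressWarnings("unchecked")
    void loadAndReconcile() {
        if (snapshot.isFile()) {
            try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(snapshot))) {
                index.putAll((Map<String, String>) in.readObject());
            } catch (IOException | ClassNotFoundException e) {
                index.clear(); // unreadable snapshot: it is only a cache, so start empty
            }
        }
        // Remove cache entries whose files have been deleted.
        index.values().removeIf(name -> !new File(directory, name).isFile());
        // Delete actual files that have no cache entry.
        Set<String> known = new HashSet<>(index.values());
        File[] files = directory.listFiles();
        if (files != null) {
            for (File file : files) {
                if (file.isFile() && !known.contains(file.getName())) {
                    file.delete();
                }
            }
        }
    }

    /** Saves every 30 seconds on a daemon thread and once more at JVM shutdown. */
    ScheduledExecutorService startAutoSave() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread thread = new Thread(r, "cache-index-saver");
            thread.setDaemon(true);
            return thread;
        });
        scheduler.scheduleAtFixedRate(() -> {
            try {
                save();
            } catch (RuntimeException e) {
                // Swallow the error so that later scheduled runs are not cancelled.
                System.err.println("Could not save cache index: " + e);
            }
        }, 30, 30, TimeUnit.SECONDS);
        Runtime.getRuntime().addShutdownHook(new Thread(this::save));
        return scheduler;
    }
}
```

The single `listFiles()` call happens once at startup rather than on every access, which is where the original class pays most of its cost.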

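Alternatively, you might consider using an embedded database to store the entire cache. H2 and HSQLDB are pure Java, very fast and lightweight, and support in-memory databases as well as an embedded mode for even faster access. This would let you store many more cached objects and could be considerably faster, because the DBMS can keep frequently used items cached in RAM.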