Searching an address book efficiently

Time: 2011-07-08 16:42:12

Tags: java algorithm search

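Suppose I have an object (call it AddressBook) that holds a first name, a last name, and an email address. What data structure (in Java) should I use to store these objects so that I can search for last names starting with certain letters? For example, if the last names are John and Johnson and I query for John, it should return all AddressBook objects whose last name starts with John.

What is the most efficient way to do this?

Suppose there are 10 people with the last name X; then I can use X as a key in a map whose value is a list containing those 10 AddressBook objects.

Here the last name can be X, X1, X12, X13, XA. Need help.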

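4 Answers:

Answer 0: (score: 3)

Use a trie: each node is associated with the group of people whose last name starts with the string contained in that node.

[Update] Better yet, take a look at the Patricia trie. Or, better still, take this code as an example of a trie that allows nodes to be marked with the wildcard "*" at the end of a prefix, so you can look up things like "John*":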

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Tatiana trie -- a variant of the Patricia trie
 * (http://en.wikipedia.org/wiki/Patricia_tree, http://en.wikipedia.org/wiki/Trie)
 * in which each edge is associated not with a substring but with a regular expression (RE).
 * <b>Every child node's RE defines a match-set which is a proper subset of the parent node RE's match-set.</b>
 * <b>Null keys aren't permitted.</b>
 * <p/>
 * The following wildcard is accepted <b>at the end</b> of an RE:
 * * -- any string of length >= 0
 * <p/>
 * Examples of valid REs: <pre>a, abra*, </pre>
 * Example of an invalid RE: <pre>a*a</pre>
 * <p/>
 */
public class TatianaTree<T> {
    private final Node<T> root;

    /**
     * Creates tree with <code>null</code> associated with root node.
     */
    public TatianaTree() {
        this(null);
    }

    /**
     * Creates tree with given element associated with root node.
     *
     * @param el element to associate with root
     */
    public TatianaTree(T el) {
        root = new Node<T>(null, el);
    }

    public TatianaTree<T> add(Node<T> node) {
        if (null == node.key)
            throw new IllegalArgumentException("Can't add node with null key");
        root.add(node);
        return this;
    }

    public Node<T> findNode(String key) {
        return root.findNode(Node.normalize(key));
    }

    /**
     * Removes most-specific node which matches given key.
     *
     * @return element of type <code>T</code> associated with deleted node or <code>null</code>.
     *         The only case when <code>null</code> will be returned is when root node corresponds to given key.
     */
    public T remove(String key) {
        Node<T> node = findNode(key);
        if (root == node)
            return null;

        node.removeSelf();
        return node.el;
    }

    public Node<T> getRoot() {
        return root;
    }

    public static class Node<T> {
        private static final String INTERNAL_ROOT_KEY = ".*";
        private static final String ROOT_KEY = "*";
        private static final Pattern KEY_WRONG_FORMAT_PATTERN = Pattern.compile("\\*.+");

        private static final String ROOT_KEY_IN_NODE_MSG = "Can't add non-root node with root key";
        private static final String WRONG_KEY_FORMAT_MSG = "Valid format is ^[A-Za-z0-9_]+(\\*){0,1}$";

        private final String key;
        private final T el;
        private final List<Node<T>> children = new ArrayList<Node<T>>();
        private Node<T> parent;

        public Node(String key) {
            this(key, null);
        }

        public Node(String key, T el) {
            String k = INTERNAL_ROOT_KEY;
            if (null != key) {
                k = normalize(key);
            }

            this.key = k;
            this.el = el;
            this.parent = null;
        }

        /**
         * Subset-check function.
         *
         * @param s    string to check
         * @param base string to check against
         * @return <code>true</code> if base is superset of s, <code>false</code> otherwise
         */
        private boolean isSubset(String s, String base) {
            String shortestS = s.replaceFirst("\\*$", "");
            String baseRE = "^" + base;
            Pattern p = Pattern.compile(baseRE);
            return p.matcher(shortestS).matches();
        }

        public T getEl() {
            return el;
        }

        private void add(Node<T> node) {
            boolean addHere = true;

            for (Node<T> child : children) {
                if (isSubset(child.key, node.key)) {
                    insertAbove(node);
                    addHere = false;
                    break;
                } else if (isSubset(node.key, child.key)) {
                    child.add(node);
                    addHere = false;
                    break;
                }
            }
            if (addHere) {
                children.add(node);
                node.parent = this;
            }
        }

        private void insertAbove(Node<T> newSibling) {
            List<Node<T>> thisChildren = new ArrayList<Node<T>>(),
                    newNodeChildren = new ArrayList<Node<T>>();
            for (Node<T> child : children) {
                if (isSubset(child.key, newSibling.key)) {
                    newNodeChildren.add(child);
                    child.parent = newSibling;
                } else {
                    thisChildren.add(child);
                }
            }
            newSibling.children.clear();
            newSibling.children.addAll(newNodeChildren);

            this.children.clear();
            this.children.addAll(thisChildren);
            this.children.add(newSibling);
            newSibling.parent = this;
        }

        private Node<T> findNode(String key) {
            for (Node<T> child : children) {
                if (isSubset(key, child.key))
                    return child.findNode(key);
            }
            return this;
        }

        public int getChildrenCount() {
            return children.size();
        }

        private static String normalize(String k) {
            if (ROOT_KEY.equals(k))
                throw new IllegalArgumentException(ROOT_KEY_IN_NODE_MSG);
            k = k.replaceFirst("\\*$", ".*").replaceAll("\\[", "\\\\[").replaceAll("\\]", "\\\\]");
            Matcher m = KEY_WRONG_FORMAT_PATTERN.matcher(k);
            if (m.find())
                throw new IllegalArgumentException(WRONG_KEY_FORMAT_MSG);
            return k;
        }

        private void removeSelf() {
            parent.children.remove(this);
            for (TatianaTree.Node<T> child : children)
                child.parent = parent;
            parent.children.addAll(children);
        }
    }
}

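Answer 1: (score: 0)

I assume the number of letters to search for is not fixed. That is, today you might look up all last names starting with "John", but tomorrow you might look for "Joh" or "Johnb".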

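If that is the case, a hash map will not work, because it has no notion of a prefix. You hash the whole key value, and the fact that "John" is a prefix of "Johnson" says nothing about how their hash values relate.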
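To make the point concrete, here is a minimal sketch. The class name, the `sample` helper, and the email strings are made up for illustration. It shows that a `HashMap` answers exact-key lookups only, so a prefix such as "Joh" is treated as a different key.

```java
import java.util.HashMap;
import java.util.Map;

final class HashLookupDemo {
    /** Builds a small map keyed by last name (hypothetical sample data). */
    static Map<String, String> sample() {
        Map<String, String> byLastName = new HashMap<>();
        byLastName.put("John", "john@example.com");
        byLastName.put("Johnson", "johnson@example.com");
        return byLastName;
    }

    public static void main(String[] args) {
        Map<String, String> byLastName = sample();
        // An exact key is found.
        System.out.println(byLastName.get("John"));
        // A prefix is just another key that was never stored, so this prints null.
        System.out.println(byLastName.get("Joh"));
        // The hash codes of a string and of its prefix are unrelated numbers.
        System.out.println("John".hashCode() + " vs " + "Johnson".hashCode());
    }
}
```

Finding every key with a given prefix in such a map would mean scanning all keys, which is what the ordered structures discussed in the other answers avoid.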

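If you load the list once and then keep it around unchanged, I think the most practical solution is to create an object for each name, then create an array of references to those objects and sort the array by last name. Then use Arrays.binarySearch to find the first record with the given prefix and loop until you reach the last one.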
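A rough sketch of the sorted-array idea, under some simplifying assumptions: plain `String` last names stand in for the AddressBook objects, and the class and method names are hypothetical. One detail to watch is that `Arrays.binarySearch` does not promise which element it returns when several are equal, so the sketch steps back to the first one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SortedPrefixSearch {
    /** Returns every name in the sorted array that starts with the given prefix. */
    static List<String> lookup(String[] sortedNames, String prefix) {
        int pos = Arrays.binarySearch(sortedNames, prefix);
        if (pos < 0) {
            // Not found: binarySearch returns -(insertionPoint) - 1.
            pos = -pos - 1;
        } else {
            // Exact match: with duplicates any of them may be returned, so walk back to the first.
            while (pos > 0 && sortedNames[pos - 1].equals(prefix)) {
                pos--;
            }
        }
        // All names sharing the prefix are contiguous from pos onward.
        List<String> result = new ArrayList<>();
        for (int i = pos; i < sortedNames.length && sortedNames[i].startsWith(prefix); i++) {
            result.add(sortedNames[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] names = {"Johnson", "Smith", "John", "Jones", "Johnston", "John"};
        Arrays.sort(names); // sort once after loading
        System.out.println(lookup(names, "John"));
        System.out.println(lookup(names, "Jo"));
        System.out.println(lookup(names, "X"));
    }
}
```

With real AddressBook objects the same approach should work by sorting with a comparator on the last name and passing that comparator to `Arrays.binarySearch`.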

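If the list is very dynamic, the first thing that comes to mind is to create a linked list, plus a set of "index pointers" to selected points in the list, such as the first A, the first B, and so on, and search sequentially from there.

If the list is both dynamic and too large for the "index tab" approach, then I think your practical choices are either to store it in a database and use the database's indexed retrieval, or to do a whole lot of work writing a complete in-memory indexing scheme. Frankly, that sounds like overkill to me. (Maybe there is some open-source package that would handle this.) If you are really maintaining large amounts of data in memory rather than in a database, maybe you should ask yourself why. This is what databases are for.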

Answer 2: (score: 0)

This is a classic tree solution. You build a tree like this:

          root
         /    \
        x      y
       / \
      1   2
     / \
    2   3

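When you search, you walk down the tree. If you have x, you go to the x node, and all the matching names are in the subtree under that node... and so on. You can use a simple recursive method to collect the names.

I think the simplest implementation uses HashMap nodes: each node is built around a hash map that holds the child node for each character. The tree is also dynamic, so insertions, deletions, and so on are easy.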
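One way this could look, as a minimal sketch rather than a definitive implementation: the class `LastNameTrie`, its methods, and the use of plain strings as stored entries are all assumptions made for illustration. Each node keeps a `HashMap` from character to child node, and a recursive walk collects everything under the node reached by the prefix.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LastNameTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        // Entries whose last name ends exactly at this node.
        final List<String> entries = new ArrayList<>();
    }

    private final Node root = new Node();

    /** Stores an entry under the given last name, creating nodes as needed. */
    void add(String lastName, String entry) {
        Node node = root;
        for (char c : lastName.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.entries.add(entry);
    }

    /** Returns all entries whose last name starts with the prefix (in no particular order). */
    List<String> findByPrefix(String prefix) {
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return new ArrayList<>(); // no last name has this prefix
            }
        }
        List<String> result = new ArrayList<>();
        collect(node, result);
        return result;
    }

    /** Recursively gathers the entries of a node and of its whole subtree. */
    private static void collect(Node node, List<String> out) {
        out.addAll(node.entries);
        for (Node child : node.children.values()) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        LastNameTrie trie = new LastNameTrie();
        for (String name : new String[] {"X", "X1", "X12", "X13", "XA"}) {
            trie.add(name, name + "@example.com");
        }
        System.out.println(trie.findByPrefix("X1"));
    }
}
```

Because the children live in a `HashMap`, the order of the returned entries is unspecified; sort the result, or use a `TreeMap` for the children, if ordered output matters.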

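Answer 3: (score: 0)

If you only search by last name, as you wrote in the question (or by any other single field), you can use a TreeMap<> as a data structure with efficient prefix lookup.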

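import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

public class Phonebook {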
    NavigableMap<String, Collection<Record>> map = new TreeMap<>();

    public void add(String name, String phone) {
        map.computeIfAbsent(name, k -> new ArrayList<>()).add(new Record(name, phone));
    }

    public Collection<Record> lookup(String prefix) {
        if (prefix.length() == 0) {
            return Collections.emptyList(); // or all values if needed
        }

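        // Inclusive lower bound: the prefix itself.
        String from = prefix;

        // Exclusive upper bound: the prefix with its last character incremented,
        // e.g. "john" -> "joho". Note: this wraps around if the last character is Character.MAX_VALUE.
        char[] chars = prefix.toCharArray();
        chars[chars.length - 1]++;

        String to = new String(chars);

        Collection<Record> result = new ArrayList<>();

        // subMap(from, to) covers every key k with from <= k < to, i.e. every key starting with the prefix.
        map.subMap(from, to).values().forEach(result::addAll);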

        return result;
    }

    private static class Record {
        private final String name;
        private final String phone;

        public Record(String name, String phone) {
            this.name = name;
            this.phone = phone;
        }

        @Override
        public String toString() {
            return "Record{" +
                    "name='" + name + '\'' +
                    ", phone='" + phone + '\'' +
                    '}';
        }
    }

    public static void main(String... args) {
        // example
        Phonebook book = new Phonebook();
        book.add("john", "1");
        book.add("john", "2");
        book.add("johnny", "3");
        book.add("joho", "4"); // joho will be computed as a value of 'to' parameter.

        Collection<Record> records = book.lookup("john");
        System.out.println("records = " + records);
    }
}