How to use sys.path_hooks for custom loading of modules?

Time: 2017-02-01 21:33:14

Tags: python python-3.4 python-import python-module

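I hope the following question is not too long, but I can't explain what I want any more briefly:

Building on what I learned from "How to use importlib to import modules from arbitrary sources?" (my question from yesterday), I have written a specific loader for a new file type (.xxx). (In fact, an xxx file is an encrypted version of a pyc, to protect the code from being stolen.)

I would like to add an import hook for the new file type "xxx" without affecting the other types (.py, .pyc, .pyd) in any way.

The loader is ModuleLoader, which inherits from importlib.machinery.SourcelessFileLoader.

Using sys.path_hooks, the loader should be added as a hook: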

myFinder = importlib.machinery.FileFinder
loader_details = (ModuleLoader, ['.xxx'])
sys.path_hooks.append(myFinder.path_hook(loader_details))

Note: this is activated by calling modloader.activateLoader().

When loading a module named test (which is a test.xxx file), I get:

>>> import modloader
>>> modloader.activateLoader()
>>> import test
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'test'
>>>

However, when I clear the contents of sys.path_hooks before adding the hook:

sys.path_hooks = []
sys.path.insert(0, '.') # current directory
sys.path_hooks.append(myFinder.path_hook(loader_details))

it works:

>>> modloader.activateLoader()
>>> import test
using xxx class

in xxxLoader exec_module
in xxxLoader get_code: .\test.xxx
ANALYZING ...

GENERATE CODE OBJECT ...

  2           0 LOAD_CONST               0
              3 LOAD_CONST               1 ('foo2')
              6 MAKE_FUNCTION            0
              9 STORE_NAME               0 (foo2)
             12 LOAD_CONST               2 (None)
             15 RETURN_VALUE
>>>
>>> test
<module 'test' from '.\\test.xxx'>

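The module is imported correctly after the file's content has been converted to a code object.

However, I cannot load the same module from a package: import pack.test

Note: __init__.py is, of course, present as an empty file in the package directory.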

>>> import pack.test
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 2218, in _find_and_load_unlocked
AttributeError: 'module' object has no attribute '__path__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'pack.test'; 'pack' is not a package
>>>

Worse still, I can no longer load plain *.py modules from that package either. I get the same error as above:

>>> import pack.testpy
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 2218, in _find_and_load_unlocked
AttributeError: 'module' object has no attribute '__path__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'pack.testpy'; 'pack' is not a package
>>>

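As I understand it, sys.path_hooks is traversed until the last entry has been tried. So why does the first variant (without clearing sys.path_hooks) not recognize the new extension "xxx", while the second variant (clearing sys.path_hooks) does? It looks as if the machinery throws an exception instead of moving on to the next entry when an entry in sys.path_hooks is unable to recognize "xxx".

And why does the second version work for py, pyc and xxx modules in the current directory, but not in the package pack? I would expect py and pyc not to work even in the current directory, because sys.path_hooks contains only a hook for "xxx"...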

2 answers:

Answer 0 (score: 4)

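The short answer is that the default PathFinder in sys.meta_path isn't meant to have new file extensions and importers added for the same paths it already supports. But there's still hope!

Quick Breakdown

sys.path_hooks is consumed by the importlib._bootstrap_external.PathFinder class.

When an import happens, each entry in sys.meta_path is asked to find a matching spec for the requested module. The PathFinder in particular will then take the contents of sys.path and pass each entry to the factory functions in sys.path_hooks. Each factory function has a chance to either raise an ImportError (basically the factory saying "nope, I don't support this path entry") or return a finder instance for that path. The first successfully returned finder is then cached in sys.path_importer_cache. From then on, PathFinder will only ask those cached finder instances whether they can provide the requested module.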
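The selection and caching steps described above can be modelled roughly as follows. This is only a simplified sketch, not the actual PathFinder code: make_path_entry_lookup and the three toy hooks are hypothetical names invented for illustration, and plain strings stand in for finder objects.

```python
def make_path_entry_lookup(hooks):
    """Return (lookup, cache), a toy model of PathFinder's per-entry lookup."""
    cache = {}  # plays the role of sys.path_importer_cache

    def lookup(path_entry):
        if path_entry in cache:
            return cache[path_entry]
        finder = None
        for hook in hooks:  # plays the role of sys.path_hooks
            try:
                # The first hook that does not raise ImportError wins.
                finder = hook(path_entry)
                break
            except ImportError:
                continue
        cache[path_entry] = finder
        return finder

    return lookup, cache


calls = []  # records which hooks were actually consulted


def zip_only_hook(path_entry):
    calls.append(("zip_only_hook", path_entry))
    if not path_entry.endswith(".zip"):
        raise ImportError("not a zip archive", path=path_entry)
    return "zip-finder"


def any_directory_hook(path_entry):
    # Like FileFinder's hook, this one accepts (almost) everything.
    calls.append(("any_directory_hook", path_entry))
    return "directory-finder"


def late_hook(path_entry):
    calls.append(("late_hook", path_entry))
    return "late-finder"


lookup, cache = make_path_entry_lookup(
    [zip_only_hook, any_directory_hook, late_hook])
first = lookup("/some/dir")
second = lookup("/some/dir")  # answered from the cache, no hook is called
print(first, second, calls)
```

In this model late_hook is never consulted for "/some/dir", which mirrors what happens to a hook appended after the default FileFinder hook.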

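If you look at the contents of sys.path_importer_cache, you'll see that all of the directory entries from sys.path have been mapped to FileFinder instances. Non-directory entries (zip files, etc.) will be mapped to other finders.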
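One way to check this yourself is sketched below. It assumes a stock CPython setup in which no third-party hook claims the temporary directory; the module name cache_demo_module is made up for the example.

```python
import importlib
import os
import sys
import tempfile
from importlib.machinery import FileFinder

with tempfile.TemporaryDirectory() as directory:
    # A throwaway module so that the import system has to visit the directory.
    with open(os.path.join(directory, "cache_demo_module.py"), "w") as handle:
        handle.write("VALUE = 1\n")

    sys.path.insert(0, directory)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("cache_demo_module")
        # The cache is keyed by the sys.path entry exactly as it was given.
        cached_finder = sys.path_importer_cache.get(directory)
    finally:
        # Undo the changes made to global import state.
        sys.path.remove(directory)
        sys.path_importer_cache.pop(directory, None)
        sys.modules.pop("cache_demo_module", None)

is_file_finder = isinstance(cached_finder, FileFinder)
print(type(cached_finder).__name__)
```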

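Thus, if you append a new factory created via FileFinder.path_hook to sys.path_hooks, your factory will only be invoked if the previous FileFinder hook didn't accept the path. This is unlikely, since FileFinder will work on any existing directory.

Alternatively, if you insert your new factory into sys.path_hooks ahead of the existing factories, the default hook will only be used if your new factory doesn't accept the path. And again, since FileFinder is so liberal about what it accepts, this leads to only your loader being used, as you've already observed.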
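A small experiment makes both points concrete. This is a sketch under the assumption that the ".xxx" file simply holds Python source, so the stock SourceFileLoader can stand in for the question's ModuleLoader; the file names are made up.

```python
import os
import tempfile
from importlib.machinery import FileFinder, SourceFileLoader

# A hook built the same way as in the question, but for plain-source ".xxx".
xxx_hook = FileFinder.path_hook((SourceFileLoader, [".xxx"]))

with tempfile.TemporaryDirectory() as directory:
    for filename in ("plain.py", "custom.xxx"):
        with open(os.path.join(directory, filename), "w") as handle:
            handle.write("VALUE = 1\n")

    # Any existing directory is accepted, whatever it contains.
    finder = xxx_hook(directory)

    # The resulting finder only knows about the ".xxx" suffix.
    spec_custom = finder.find_spec("custom")
    spec_plain = finder.find_spec("plain")

    # Only non-directories are rejected with ImportError.
    try:
        xxx_hook(os.path.join(directory, "plain.py"))
        file_rejected = False
    except ImportError:
        file_rejected = True

print(spec_custom is not None, spec_plain is None, file_rejected)
```

Because the hook claims the directory but its finder cannot see plain.py, putting such a hook first hides ordinary modules in that directory, including a package's __init__.py.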

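Making it Work

So you can either try to adjust the existing factory to also support your file extension and importer (which is difficult, because the importers and extension string tuples are held in a closure), or do what I ended up doing, which is to add a new meta path finder.

So, for example, from my own project: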


import sys

from importlib.abc import FileLoader
from importlib.machinery import FileFinder, PathFinder
from os import getcwd
from os.path import basename

from sibilant.module import prep_module, exec_module


SOURCE_SUFFIXES = [".lspy", ".sibilant"]


_path_importer_cache = {}
_path_hooks = []


class SibilantPathFinder(PathFinder):
    """
    An overridden PathFinder which will hunt for sibilant files in
    sys.path. Uses storage in this module to avoid conflicts with the
    original PathFinder
    """


    @classmethod
    def invalidate_caches(cls):
        for finder in _path_importer_cache.values():
            if hasattr(finder, 'invalidate_caches'):
                finder.invalidate_caches()


    @classmethod
    def _path_hooks(cls, path):
        for hook in _path_hooks:
            try:
                return hook(path)
            except ImportError:
                continue
        else:
            return None


    @classmethod
    def _path_importer_cache(cls, path):
        if path == '':
            try:
                path = getcwd()
            except FileNotFoundError:
                # Don't cache the failure as the cwd can easily change to
                # a valid directory later on.
                return None
        try:
            finder = _path_importer_cache[path]
        except KeyError:
            finder = cls._path_hooks(path)
            _path_importer_cache[path] = finder
        return finder


class SibilantSourceFileLoader(FileLoader):


    def create_module(self, spec):
        return None


    def get_source(self, fullname):
        return self.get_data(self.get_filename(fullname)).decode("utf8")


    def exec_module(self, module):
        name = module.__name__
        source = self.get_source(name)
        filename = basename(self.get_filename(name))

        prep_module(module)
        exec_module(module, source, filename=filename)


def _get_lspy_file_loader():
    return (SibilantSourceFileLoader, SOURCE_SUFFIXES)


def _get_lspy_path_hook():
    return FileFinder.path_hook(_get_lspy_file_loader())


def _install():
    done = False

    def install():
        nonlocal done
        if not done:
            _path_hooks.append(_get_lspy_path_hook())
            sys.meta_path.append(SibilantPathFinder)
            done = True

    return install


_install = _install()
_install()

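SibilantPathFinder overrides PathFinder and replaces only those methods which reference sys.path_hooks and sys.path_importer_cache, using similar implementations that instead look in the _path_hooks and _path_importer_cache which are local to this module.

During import, the existing PathFinder will try to find a matching module. If it cannot, then my injected SibilantPathFinder will re-traverse sys.path and try to find a match with one of my own file extensions.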
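The same fallback idea can be tried without subclassing PathFinder. The following is a minimal sketch, not the code from the project above: SuffixFallbackFinder is a hypothetical name, the ".xxx" file is assumed to contain plain Python source so that the stock SourceFileLoader can load it, and no finder caching is attempted.

```python
import importlib
import os
import sys
import tempfile
from importlib.abc import MetaPathFinder
from importlib.machinery import FileFinder, SourceFileLoader

CUSTOM_SUFFIXES = [".xxx"]  # hypothetical suffix holding plain Python source


class SuffixFallbackFinder(MetaPathFinder):
    """Appended to sys.meta_path, so it runs only after the default finders."""

    def find_spec(self, fullname, path=None, target=None):
        # Top-level imports search sys.path; submodules search pkg.__path__.
        search_path = sys.path if path is None else path
        for entry in search_path:
            directory = entry or os.getcwd()
            if not os.path.isdir(directory):
                continue
            # A fresh FileFinder per lookup keeps the sketch short.
            finder = FileFinder(directory, (SourceFileLoader, CUSTOM_SUFFIXES))
            spec = finder.find_spec(fullname)
            if spec is not None and spec.loader is not None:
                return spec
        return None


with tempfile.TemporaryDirectory() as directory:
    with open(os.path.join(directory, "fallback_demo.xxx"), "w") as handle:
        handle.write("ANSWER = 42\n")

    fallback = SuffixFallbackFinder()
    sys.path.insert(0, directory)
    sys.meta_path.append(fallback)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("fallback_demo")
        answer = module.ANSWER
        loaded_from = os.path.basename(module.__file__)
    finally:
        # Undo the changes made to global import state.
        sys.meta_path.remove(fallback)
        sys.path.remove(directory)
        sys.path_importer_cache.pop(directory, None)
        sys.modules.pop("fallback_demo", None)

print(answer, loaded_from)
```

Since the finder sits at the end of sys.meta_path, .py, .pyc and extension modules are still resolved by the default machinery first.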

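Figuring it Out

I ended up digging into the source of the _bootstrap_external module: https://github.com/python/cpython/blob/master/Lib/importlib/_bootstrap_external.py

The _install function and the PathFinder.find_spec method are the best starting points for seeing why things work the way they do.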

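Answer 1 (score: 1)

@obriencj's analysis of the situation is correct. But I came up with a different solution to this problem that doesn't require putting anything in sys.meta_path. Instead, it installs a special hook in sys.path_hooks that acts almost as a sort of middleware between the PathFinder in sys.meta_path and the hooks in sys.path_hooks. Rather than just using the first hook that says "I can handle this path!", it tries all matching hooks in order until it finds one that actually returns a useful ModuleSpec from its find_spec method: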

import os
import sys
from importlib.abc import PathEntryFinder


@PathEntryFinder.register
class MetaFileFinder:
    """
    A 'middleware', if you will, between the PathFinder sys.meta_path hook,
    and sys.path_hooks hooks--particularly FileFinder.

    The hook returned by FileFinder.path_hook is rather 'promiscuous' in that
    it will handle *any* directory.  So if one wants to insert another
    FileFinder.path_hook into sys.path_hooks, that will totally take over
    importing for any directory, and previous path hooks will be ignored.

    This class provides its own sys.path_hooks hook as follows: If inserted
    on sys.path_hooks (it should be inserted early so that it can supersede
    anything else).  Its find_spec method then calls each hook on
    sys.path_hooks after itself and, for each hook that can handle the given
    sys.path entry, it calls the hook to create a finder, and calls that
    finder's find_spec.  So each sys.path_hooks entry is tried until a spec is
    found or all finders are exhausted.
    """

    class hook:
        """
        Use this little internal class rather than a function with a closure
        or a classmethod or anything like that so that it's easier to
        identify our hook and skip over it while processing sys.path_hooks.
        """

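        def __init__(self, basepath=None):
            # basepath=None means "handle every directory"; abspath(None)
            # would raise TypeError, so only normalize a real path.
            self.basepath = (None if basepath is None
                             else os.path.abspath(basepath))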

        def __call__(self, path):
            if not os.path.isdir(path):
                raise ImportError('only directories are supported', path=path)
            elif not self.handles(path):
                raise ImportError(
                    'only directories under {} are supported'.format(
                        self.basepath), path=path)

            return MetaFileFinder(path)

        def handles(self, path):
            """
            Return whether this hook will handle the given path, depending on
            what its basepath is.
            """

            path = os.path.abspath(path)

            return (self.basepath is None or
                    os.path.commonpath([self.basepath, path]) == self.basepath)

    def __init__(self, path):
        self.path = path
        self._finder_cache = {}

    def __repr__(self):
        return '{}({!r})'.format(self.__class__.__name__, self.path)

    def find_spec(self, fullname, target=None):
        if not sys.path_hooks:
            return None

        last = len(sys.path_hooks) - 1

        for idx, hook in enumerate(sys.path_hooks):
            if isinstance(hook, self.__class__.hook):
                continue

            finder = None
            try:
                if hook in self._finder_cache:
                    finder = self._finder_cache[hook]
                    if finder is None:
                        # We've tried this finder before and got an ImportError
                        continue
            except TypeError:
                # The hook is unhashable
                pass

            if finder is None:
                try:
                    finder = hook(self.path)
                except ImportError:
                    pass

            try:
                self._finder_cache[hook] = finder
            except TypeError:
                # The hook is unhashable for some reason so we don't bother
                # caching it
                pass

            if finder is not None:
                spec = finder.find_spec(fullname, target)
                if (spec is not None and
                        (spec.loader is not None or idx == last)):
                    # If no __init__.<suffix> was found by any Finder,
                    # we may be importing a namespace package (which
                    # FileFinder.find_spec returns in this case).  But we
                    # only want to return the namespace ModuleSpec if we've
                    # exhausted every other finder first.
                    return spec

        # Module spec not found through any of the finders
        return None

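    def invalidate_caches(self):
        for finder in self._finder_cache.values():
            # Hooks that rejected this path are cached as None.
            if finder is not None and hasattr(finder, 'invalidate_caches'):
                finder.invalidate_caches()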

    @classmethod
    def install(cls, basepath=None):
        """
        Install the MetaFileFinder in the front sys.path_hooks, so that
        it can support any existing sys.path_hooks and any that might
        be appended later.

        If given, only support paths under and including basepath.  In this
        case it's not necessary to invalidate the entire
        sys.path_importer_cache, but only any existing entries under basepath.
        """

        if basepath is not None:
            basepath = os.path.abspath(basepath)

        hook = cls.hook(basepath)
        sys.path_hooks.insert(0, hook)
        if basepath is None:
            sys.path_importer_cache.clear()
        else:
            for path in list(sys.path_importer_cache):
                if hook.handles(path):
                    del sys.path_importer_cache[path]

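This is still, depressingly, far more complicated than it should need to be. I feel like on Python 2, before the import system rewrite, it was much simpler to do this, since less of the support for the built-in module types (.py, etc.) was built on top of the import hooks themselves, so it was harder to break importing normal modules by adding hooks to import new module types. I'm going to start a discussion on python-ideas to see if there is any way we can improve this situation.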