How to organize multiple python files into a single module without it behaving like a package?

32,010

Solution 1

You can sort of do it, but it's not really a good idea and you're fighting against the way Python modules/packages are supposed to work. By importing appropriate names in __init__.py you can make them accessible in the package namespace. By deleting module names you can make them inaccessible. (For why you need to delete them, see this question). So you can get close to what you want with something like this (in __init__.py):

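# Explicit relative imports work on Python 2.5+ and are required on Python 3.
from .another_class import doit
from .another_class import dataholder
from .descriptive_name import getSomeStuff
from .descriptive_name import hold_more_data
# Unbind the submodule names that the imports added to the package namespace.
del another_class, descriptive_name
__all__ = ['doit', 'dataholder', 'getSomeStuff', 'hold_more_data']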

However, this will break subsequent attempts to import package.another_class. In general, you can't import anything from a package.module without making package.module accessible as an importable reference to that module (although with the __all__ you can block from package import module).
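As a rough sketch of what that del does, the script below builds a throwaway package in a temporary directory and inspects it after import. The package name flatpkg_demo and its contents are hypothetical, made up only for this illustration. What a later import flatpkg_demo.another_class does can differ between Python versions, so the sketch only checks the package attribute and sys.modules.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk (all names here are hypothetical).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "flatpkg_demo")
os.mkdir(pkg_dir)

with open(os.path.join(pkg_dir, "another_class.py"), "w") as f:
    f.write("def doit():\n    return 'done'\n")

with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write(
        "from .another_class import doit\n"
        "del another_class  # unbind the submodule attribute\n"
        "__all__ = ['doit']\n"
    )

sys.path.insert(0, root)
importlib.invalidate_caches()
flatpkg_demo = importlib.import_module("flatpkg_demo")

# The function is reachable from the package namespace...
has_function = hasattr(flatpkg_demo, "doit")
# ...the submodule is no longer an attribute of the package...
has_submodule_attr = hasattr(flatpkg_demo, "another_class")
# ...but the submodule object itself is still cached by the import system.
still_cached = "flatpkg_demo.another_class" in sys.modules

print(has_function, has_submodule_attr, still_cached)
```

The submodule is only unbound from the package namespace, not unloaded, which is why code that reaches it through sys.modules still works.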

More generally, by splitting up your code by class/function you are working against the Python package/module system. A Python module should generally contain stuff you want to import as a unit. It's not uncommon to import submodule components directly in the top-level package namespace for convenience, but the reverse --- trying to hide the submodules and allow access to their contents only through the top-level package namespace --- is going to lead to problems. In addition, there is nothing to be gained by trying to "cleanse" the package namespace of the modules. Those modules are supposed to be in the package namespace; that's where they belong.

Solution 2

Define __all__ = ['names', 'that', 'are', 'public'] in __init__.py, e.g.:

__all__ = ['foo']

from ._subpackage import foo

Real-world example: numpy/__init__.py.
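To see what __all__ does and does not control, here is a small sketch that assumes a hypothetical package allpkg_demo with a private submodule _impl, both created in a temporary directory just for this demonstration. It compares the names pulled in by a wildcard import with the names reported by dir().

```python
import importlib
import os
import sys
import tempfile

# Throwaway package; every name below is hypothetical.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "allpkg_demo")
os.mkdir(pkg_dir)

with open(os.path.join(pkg_dir, "_impl.py"), "w") as f:
    f.write("def foo():\n    return 'foo'\n\ndef helper():\n    return 'helper'\n")

with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("__all__ = ['foo']\n\nfrom ._impl import foo, helper\n")

sys.path.insert(0, root)
importlib.invalidate_caches()
allpkg_demo = importlib.import_module("allpkg_demo")

# Run a wildcard import inside a scratch namespace and collect what it bound.
scratch = {}
exec("from allpkg_demo import *", scratch)
star_names = sorted(n for n in scratch if not n.startswith("__"))

# dir() lists every attribute of the package, public or not.
dir_has_helper = "helper" in dir(allpkg_demo)
dir_has_submodule = "_impl" in dir(allpkg_demo)

print(star_names, dir_has_helper, dir_has_submodule)
```

Only the wildcard import is filtered by __all__; dir(package) and plain attribute access still see helper and _impl.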


You have some misconceptions about how Python packages work:

"If I do not use an __init__.py, I cannot import anything because Python doesn't descend into the folder from sys.path."

You need an __init__.py file in Python versions older than 3.3 to mark a directory as containing a Python package. From Python 3.3 on, a directory without __init__.py can be imported as a namespace package.
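A quick way to check this on Python 3.3 or newer is sketched below. It assumes a hypothetical directory nspkg_demo that contains a single module and deliberately no __init__.py; both names are invented for the example.

```python
import importlib
import os
import sys
import tempfile

# A directory with a module but NO __init__.py (names are hypothetical).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "nspkg_demo")
os.mkdir(pkg_dir)

with open(os.path.join(pkg_dir, "your_module.py"), "w") as f:
    f.write("VALUE = 42\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

# On Python 3.3+ the directory is treated as a namespace package.
your_module = importlib.import_module("nspkg_demo.your_module")
nspkg_demo = sys.modules["nspkg_demo"]

value = your_module.VALUE
is_package = hasattr(nspkg_demo, "__path__")
print(value, is_package)
```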

"If I use a blank __init__.py, when I import doit_tools it's an empty namespace with nothing in it. None of my files imported, which makes it more difficult to use."

It doesn't prevent the import:

from doit_tools import your_module

It works as expected.
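The behaviour with an empty __init__.py can be reproduced with the following sketch. The package emptyinit_demo and the function hello are hypothetical stand-ins for doit_tools and its contents.

```python
import importlib
import os
import sys
import tempfile

# Package with an empty __init__.py and one submodule (hypothetical names).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "emptyinit_demo")
os.mkdir(pkg_dir)

open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "your_module.py"), "w") as f:
    f.write("def hello():\n    return 'hi'\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

emptyinit_demo = importlib.import_module("emptyinit_demo")
# Importing the package alone does not load its submodules.
loaded_before = hasattr(emptyinit_demo, "your_module")

# Asking for the submodule explicitly imports it and binds it on the package.
from emptyinit_demo import your_module
loaded_after = hasattr(emptyinit_demo, "your_module")

greeting = your_module.hello()
print(loaded_before, loaded_after, greeting)
```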

"If I list the submodules in __all__, I can use the (frowned upon?) from thing import * syntax, but all of my classes are behind unnecessary namespace barriers again. The user has to (1) know they should use from x import * instead of import x, (2) manually reshuffle classes until they can reasonably obey line width style constraints."

(1) Your users (in most cases) should not use from your_package import * outside an interactive Python shell.

(2) You could use parentheses to break a long import line:

from package import (function1, Class1, Class2, ..snip many other names..,
                     ClassN)

"If I add from thatfile import X statements to __init__.py, I get closer but I have namespace conflicts (?) and extra namespaces for things I didn't want to be in there."

It is up to you to resolve namespace conflicts (different objects with the same name). A name can refer to any object: integer, string, package, module, class, function, etc. Python can't know which object you might prefer, and even if it could, it would be inconsistent to ignore some name bindings in this particular case when name bindings are honored in all other cases.
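The JobInfo case from the question can be sketched like this, assuming a hypothetical package shadow_demo whose submodule and class share the name JobInfo. The last binding of the name wins, so the package attribute ends up being the class, while the module stays available through sys.modules.

```python
import importlib
import inspect
import os
import sys
import tempfile

# Submodule and class deliberately share a name (all names hypothetical).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "shadow_demo")
os.mkdir(pkg_dir)

with open(os.path.join(pkg_dir, "JobInfo.py"), "w") as f:
    f.write("class JobInfo(object):\n    pass\n")

with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("from .JobInfo import JobInfo\n")

sys.path.insert(0, root)
importlib.invalidate_caches()
shadow_demo = importlib.import_module("shadow_demo")

# The import first binds the submodule, then rebinds the name to the class.
attr_is_class = inspect.isclass(shadow_demo.JobInfo)
submodule = sys.modules["shadow_demo.JobInfo"]
module_still_loaded = inspect.ismodule(submodule)
same_class = submodule.JobInfo is shadow_demo.JobInfo

print(attr_is_class, module_still_loaded, same_class)
```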

To mark names as non-public you could prefix them with _ e.g., package/_nonpublic_module.py.

Solution 3

There are perfectly valid reasons to hide the sub-structure of a package (not only when debugging), among them convenience and efficiency. When prototyping rapidly with a package, it is extremely annoying to interrupt your train of thought just to look up which sub-module a specific function or class lives in.

When everything is available at the top level of a package, the idiom:

python -c 'import pkg; help(pkg)'

displays the entire help, not just some measly module names.

You can always turn off sub-module imports for production code, or to clean up the package modules after development.

The following is the best way I have come up with so far. It maximizes convenience while trying not to suppress valid errors. See also the full source with doctest documentation.


Define package name and sub-modules to be imported to avoid error-prone duplication:

_package_ = 'flat_export'
_modules_ = ['sub1', 'sub2', 'sub3']

Use relative imports when available (this is imperative, see is_importing_package):

_loaded = False
if is_importing_package(_package_, locals()):
    for _module in _modules_:
        exec ('from .' + _module + ' import *')
    _loaded = True
    del(_module)
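If you would rather avoid exec, the same effect can be approximated with a small helper. The sketch below is an assumption of mine, not part of the flat_export source: export_public is a hypothetical function that copies a module's public names into a namespace, honoring __all__ when the module defines it. A stand-in module built with types.ModuleType keeps the example self-contained.

```python
import types


def export_public(module, namespace):
    """Copy a module's public names into namespace, like 'from module import *'."""
    names = getattr(module, "__all__", None)
    if names is None:
        # No __all__: fall back to every name without a leading underscore.
        names = [n for n in vars(module) if not n.startswith("_")]
    for name in names:
        namespace[name] = getattr(module, name)
    return sorted(names)


# Stand-in for a real sub-module such as sub1 (hypothetical contents).
sub1 = types.ModuleType("sub1")
sub1.helper = lambda: "ok"
sub1._private = 1

target = {}
exported = export_public(sub1, target)
print(exported)
```

Inside a real __init__.py the call would look roughly like export_public(importlib.import_module('.' + _module, __name__), globals()) for each entry in _modules_.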

Try importing the package, including __all__.
This happens when executing a module file as script with the package in the search path (e.g. python flat_export/__init__.py)

if not _loaded:
    try:
        exec('from ' + _package_ + ' import *')
        exec('from ' + _package_ + ' import __all__')
        _loaded = True
    except (ImportError):
        pass

As a last resort, try importing the sub-modules directly.
This happens when executing a module file as script inside the package directory without the package in the search path (e.g. cd flat_export; python __init__.py).

if not _loaded:
    for _module in _modules_:
        exec('from ' + _module + ' import *')
    del(_module)

Construct __all__ (leaving out modules), unless it has been imported before:

# __all__ is unbound unless one of the import branches above defined it.
if not globals().get('__all__'):
    __all__ = []
    _module_type = type(__import__('sys'))
    for _sym, _val in sorted(locals().items()):
        if not _sym.startswith('_') and not isinstance(_val, _module_type):
            __all__.append(_sym)
    del(_sym)
    del(_val)
    del(_module_type)
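The filtering rule used there can be tried out in isolation. In this sketch, build_all is a hypothetical helper name and the sample dictionary stands in for a package namespace; it keeps public names and skips anything that is a module.

```python
import os
import types


def build_all(namespace):
    """Return the sorted public names of a namespace, leaving out modules."""
    return sorted(
        name
        for name, value in namespace.items()
        if not name.startswith("_") and not isinstance(value, types.ModuleType)
    )


# Stand-in for a package namespace (contents are made up for the example).
sample = {"util_a": len, "JobInfo": dict, "_hidden": 1, "os": os}
names = build_all(sample)
print(names)
```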

Here is the function is_importing_package:

def is_importing_package(_package_, locals_, dummy_name=None):
    """:returns: True, if relative package imports are working.

    :param _package_: the package name (unfortunately, __package__
      does not work, since it is None, when loading ``:(``).
    :param locals_: module local variables for auto-removing function
      after use.
    :param dummy_name: dummy module name (default: 'dummy').

    Tries to do a relative import from an empty module `.dummy`. This
    avoids any secondary errors, other than::

        ValueError: Attempted relative import in non-package
    """

    success = False
    if _package_:
        import sys
        dummy_name = dummy_name or 'dummy'
        dummy_module = _package_ + '.' + dummy_name
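        if dummy_module not in sys.modules:
            import types
            # types.ModuleType(name) does what imp.new_module(name) did;
            # the imp module was removed in Python 3.12.
            sys.modules[dummy_module] = types.ModuleType(dummy_module)
        try:
            exec('from .' + dummy_name + ' import *')
            success = True
        except Exception:
            # Relative imports are unavailable (or the dummy import failed).
            pass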
    if not 'sphinx.ext.autodoc' in __import__('sys').modules:
        del(locals_['is_importing_package'])
    return success
Brian
Author by

Brian

Updated on July 09, 2022

Comments

  • Brian
    Brian almost 2 years

    Is there a way to use __init__.py to organize multiple files into a module?

    Reason: Modules are easier to use than packages, because they don't have as many layers of namespace.

    Normally it makes a package, this I get. Problem is with a package, 'import thepackage' gives me an empty namespace. Users must then either use "from thepackage import *" (frowned upon) or know exactly what is contained and manually pull it out into a usable namespace.

    What I want to have is the user do 'import thepackage' and have nice clean namespaces that look like this, exposing functions and classes relevant to the project for use.

    current_module
    \
      doit_tools/
      \
       - (class) _hidden_resource_pool
       - (class) JobInfo
       - (class) CachedLookup
       - (class) ThreadedWorker
       - (Fn) util_a
       - (Fn) util_b
       - (Fn) gather_stuff
       - (Fn) analyze_stuff
    

    The maintainer's job would be to avoid defining the same name in different files, which should be easy when the project is small like mine is.

    It would also be nice if people can do from doit_tools import JobInfo and have it retrieve the class, rather than a module containing the class.

    This is easy if all my code is in one gigantic file, but I like to organize when things start getting big. What I have on disk looks sort of like this:

    place_in_my_python_path/
      doit_tools/
        __init__.py
        JobInfo.py
          - class JobInfo:
        NetworkAccessors.py
          - class _hidden_resource_pool:
          - class CachedLookup:
          - class ThreadedWorker:
        utility_functions.py
          - def util_a()
          - def util_b()
        data_functions.py
          - def gather_stuff()
          - def analyze_stuff()
    

    I only separate them so my files aren't huge and unnavigable. They are all related, though someone (possibly me) may want to use the classes by themselves without importing everything.

    I've read a number of suggestions in various threads, here's what happens for each suggestion I can find for how to do this:

    If I do not use an __init__.py, I cannot import anything because Python doesn't descend into the folder from sys.path.

    If I use a blank __init__.py, when I import doit_tools it's an empty namespace with nothing in it. None of my files imported, which makes it more difficult to use.

    If I list the submodules in __all__, I can use the (frowned upon?) from thing import * syntax, but all of my classes are behind unnecessary namespace barriers again. The user has to (1) know they should use from x import * instead of import x, (2) manually reshuffle classes until they can reasonably obey line width style constraints.

    If I add from thatfile import X statements to __init__.py, I get closer but I have namespace conflicts (?) and extra namespaces for things I didn't want to be in there. In the below example, you'll see that:

    1. The class JobInfo overwrote the module object named JobInfo because their names were the same. Somehow Python can figure this out, because JobInfo is of type <class 'doit_tools.JobInfo.JobInfo'>. (doit_tools.JobInfo is a class, but doit_tools.JobInfo.JobInfo is that same class... this is tangled and seems very bad, but doesn't seem to break anything.)
    2. Each filename made its way into the doit_tools namespace, which makes it more confusing to look through if anyone is looking at the contents of the module. I want doit_tools.utility_functions.py to hold some code, not define a new namespace.


    current_module
    \
      doit_tools/
      \
       - (module) JobInfo
          \
           - (class) JobInfo
       - (class) JobInfo
       - (module) NetworkAccessors
          \
           - (class) CachedLookup
           - (class) ThreadedWorker
       - (class) CachedLookup
       - (class) ThreadedWorker
       - (module) utility_functions
          \
           - (Fn) util_a
           - (Fn) util_b
       - (Fn) util_a
       - (Fn) util_b
       - (module) data_functions
          \
           - (Fn) gather_stuff
           - (Fn) analyze_stuff
       - (Fn) gather_stuff
       - (Fn) analyze_stuff
    

    Also someone importing just the data abstraction class would get something different than they expect when they do 'from doit_tools import JobInfo':

    current_namespace
    \
     JobInfo (module)
      \
       -JobInfo (class)
    
    instead of:
    
    current_namespace
    \
     - JobInfo (class)
    

    So, is this just a wrong way to organize Python code? If not, what is a correct way to split related code up but still collect it in a module-like way?

    Maybe the best case scenario is that doing 'from doit_tools import JobInfo' is a little confusing for someone using the package?

    Maybe a python file called 'api' so that people using the code do the following?:

    import doit_tools.api
    from doit_tools.api import JobInfo
    

    ============================================

    Examples in response to comments:

    Take the following package contents, inside folder 'foo' which is in python path.

    foo/__init__.py

    __all__ = ['doit','dataholder','getSomeStuff','hold_more_data','SpecialCase']
    from another_class import doit
    from another_class import dataholder
    from descriptive_name import getSomeStuff
    from descriptive_name import hold_more_data
    from specialcase import SpecialCase
    

    foo/specialcase.py

    class SpecialCase:
        pass
    

    foo/descriptive_name.py

    def getSomeStuff():
        pass
    
    class hold_more_data(object):
        pass
    

    foo/another_class.py

    def doit():
        print "I'm a function."
    
    class dataholder(object):
        pass
    

    Do this:

    >>> import foo
    >>> for thing in dir(foo): print thing
    ... 
    SpecialCase
    __builtins__
    __doc__
    __file__
    __name__
    __package__
    __path__
    another_class
    dataholder
    descriptive_name
    doit
    getSomeStuff
    hold_more_data
    specialcase
    

    another_class and descriptive_name are there cluttering things up, and also have extra copies of e.g. doit() underneath their namespaces.

    If I have a class named Data inside a file named Data.py, when I do 'from Data import Data' I get a namespace conflict: Data is a class in the current namespace, while the module Data that contains it is somehow also in the current namespace. (But Python seems to be able to handle this.)

  • Brian
    Brian over 11 years
    This has the problem I mentioned above, where the public names are available but so are the file names, as namespaces, with the public names also buried in those. Is this just to be expected, and it's the best I can do?
  • Brian
    Brian over 11 years
    This has the problem I mentioned above, where the public names are available but also the file names, as namespaces. Also the namespace conflict with JobInfo that I brought up, which seems bad, and my alternative is to rename JobInfo.py so that the file contains a class of a different name than the filename. When that's done, I then have the differently-named file appearing as a namespace inside the class. It gets messy, but is it the best I can do?
  • Brian
    Brian over 11 years
    Also, should I be concerned about the namespace conflict with module JobInfo being replaced with class JobInfo in the package namespace? Is this one of those things that seems dirty but isn't, and I should let Python handle it?
  • BrenBarn
    BrenBarn over 11 years
    @Brian: I don't understand what you mean. Any names you don't import will not be available. If you want to exclude package names you can use the __all__ technique mentioned by @J.F. Sebastian. Incidentally, your example is needlessly large and confusing. Can you create a simple example and show how you want to refer to the different parts?
  • Brian
    Brian over 11 years
    Responded with example, tested in interactive terminal just now. In the example, I do not want the namespaces created by the names of the python files to be in the module, because it's cluttered. The module should expose two classes and two functions, not two classes, two functions, and two submodules each containing a class and a function. The solution of moving all code into one file seems... blunt.
  • Brian
    Brian over 11 years
    Try it using an __init__.py in a folder with files in it. Note I was stating the use of a class of the same name as the file caused what seemed to be a bad thing. Also note my reasons for naming the file along with the class were for organizational purposes, so that I knew the definition of that class lived in that file, not for syntactic reasons, and I mentioned an unpleasant side effect of changing the name to something different.
  • Brian
    Brian over 11 years
    Also note, if I remove all contents of __init__.py, the submodule namespaces do not show up on an import of foo. They must be manually retrieved.
  • jfs
    jfs over 11 years
    @Brian: __all__ is used exactly for the purpose of distinguishing public names from the names that are available by accident. You explicitly add names to __all__ that you consider to be public (note: your example foo/__init__.py doesn't define __all__).
  • jfs
    jfs over 11 years
    @Brian: whatever is defined in __init__.py takes precedence. In this particular case, follow PEP 8 and use lower case for module names (s/JobInfo/jobinfo/ for the module name).
  • Brian
    Brian over 11 years
    Adding __all__ yielded same result. Names of python files still present. Trying now with a case difference between class and module as you suggest.
  • jfs
    jfs over 11 years
    @Brian: dir(package) is not used to get public names. package.__all__ and documentation is used to get public names. from package import * introduces only names in __all__ (don't use wildcard import (*) outside a Python interactive shell or outside an __init__.py where it is used to import public names from a subpackage).
  • Brian
    Brian over 11 years
    Adding a submodule with a lower-case name containing a same-named class with an upper-case name, and adding the upper-case class name to __all__, yields a package that has both the upper-case and lower-case names (which would be very confusing for users).
  • BrenBarn
    BrenBarn over 11 years
    @Brian: See my edited answer. You need to use the __init__.py code and __all__ in the __init__.py.
  • Brian
    Brian over 11 years
    I'm attempting to avoid 'from package import *'. Sounds like you're saying, I should not care what symbols are inside the imported namespace, and using dir() to look at a module and see what's inside is wrong?
  • jfs
    jfs over 11 years
    dir(package) shows also names that are not in package.__all__.
  • Brian
    Brian over 11 years
    Should I not care about names that show up in dir()? I use this frequently to see what is available in an object, and avoid importing the entire contents of a module into the main script namespace. The goal here is to achieve import mymodule, while organizing my code into separate files, without that code organization creating a bunch of extra symbols that reveal my organization as if it were functionality.
  • jfs
    jfs over 11 years
    I've specified above where usage of from package import * is acceptable. In particular, from .subpackage import * is fine in __init__.py where you do from . import subpackage; __all__.extend(subpackage.__all__). Using dir(module) you'll get the module's attribute names in sorted order. It is not wrong to use it; you just have to know that dir() might also return names that you don't consider to be public. In your case (when you explicitly define the public interface in __init__.py) you could use package.__all__ to get public names.
  • jfs
    jfs over 11 years
    @Brian: To explicitly discourage usage of non-public names you could prefix them with _ e.g., _jobinfo. (see also the previous comment).
  • Brian
    Brian over 11 years
    About your edits: / Using __all__ does not do for me what you say it does. Could this be a Linux platform thing? / I do want parts of the module to be available to other code, in the same way it would be available if I did 'from foo import dataholder'. / I apologize for lapsing into bad style, but it doesn't change the issue. / What I want is to be able to open a file that contains a class definition I want to work on, and not see all the stuff I'm not interested in working on.
  • BrenBarn
    BrenBarn over 11 years
    When I create a package with modules, any modules that I don't put in __init__.py's __all__ don't show up in dir(package), even if I import names from those modules. I guess this could be a difference between Python versions. What version are you using?
  • Brian
    Brian over 11 years
    Python 2.6.4, on Linux. The output of dir() posted above is from an interactive python shell with that actual structure of documents.
  • Brian
    Brian over 11 years
    The issue is more that if I have a file named abstractionclass.py containing a definition class AbstractionClass(object):, if I put that in my package, and in __init__.py I add it to __all__ and do 'from abstractionclass import AbstractionClass', then if someone imports my package and does a dir() of it, they see this: [ usual_stuff, 'abstractionclass', 'AbstractionClass' ]. It sounds like this does not happen for some other people? BrenBarn says on his install this does not happen. (I am using python 2.6.4 on Linux.)
  • Brian
    Brian over 11 years
    Note: others are saying their behavior in this situation is different from what I get. This may be a difference between python versions.
  • BrenBarn
    BrenBarn over 11 years
    @Brian: It looks like you're right, I had mixed up different versions of the setup I was trying out. However, in answer to your other question, I would say no, don't worry about what shows up in dir(). Obviously a bunch of internal things like __doc__ show up in there anyway. You should focus on getting the import usage you want to work, and not worry if submodules are also in the dir(). This does mean you have to be careful about your module/class names, but there's just no way around that. Python isn't set up to cater to a one-class-per-file layout.
  • BrenBarn
    BrenBarn over 11 years
    @Brian: There is some oddness going on with intra-package imports. See yet another edit in my answer for what is apparently a way to stop the submodules from showing up in the package dict.
  • jfs
    jfs over 11 years
    @Brian: You don't need to put a public class in a separate module (it is not Java) therefore abstractionclass might be a very pure name for a module in Python. To mark/document it non-public you could prefix a module name with _: _nonpublic_module.py. dir() returns names that are not in __all__ on all Python versions I've tried: Python 2.7, Python 3.3, Jython 2.5, Pypy 1.9. I've updated my answer to address the points from your question.
  • jfs
    jfs over 11 years
    @Brian: del module in the __init__.py also prevents an ordinary usage of package.module.somename_not_in_package_all (it might be even a desirable side-effect (it is still accessible via sys.modules['package.module'].somename_not_in_package_all)). Though it might seem odd at the first glance to del name if name is not introduced explicitly.
  • jfs
    jfs over 11 years
    "then the only things in dir(package) will be those names." That is false, as you've already pointed out yourself. Otherwise it is a good answer.
  • Brian
    Brian over 11 years
    That explains what I was seeing. I suppose most people get around this kind of thing by using an IDE that lets you collapse code blocks. I've favored editing in Vim for its nice automations, so if it's in the file I see it. @J.F.Sebastian: I had wondered about del but it sounded 'wrong' somehow. Sounds like that will be the method of choice for unwanted symbols. Thanks!
  • Brian
    Brian over 11 years
    @J.F.Sebastian: If you want to submit an answer involving 'del' I'll mark it. The use of __all__ does not appear to have any effect on import mymodule, only expanding the * in from mymodule import * so I have nothing to mark.
  • jfs
    jfs over 11 years
    @Brian: I'm against (-0) using del _internal_module. It introduces unnecessary maintenance burden. None of stdlib modules or third-party Python packages installed on my machine do this as far as I can see (though they sometimes use del utility_module where utility_module is defined outside the package), but none use del on its own submodules. A proper documentation and/or __all__ are enough to point out public names.
  • jfs
    jfs over 11 years
    @Brian: I've misspelled: s/pure name/poor name/.
  • jfs
    jfs over 11 years
    @Brian: Also, if somebody tries to extend your package in ways that you've not considered, then breaking the import package._internal_module as m statement is not good.
  • BrenBarn
    BrenBarn over 11 years
    @Brian: I'm sure there are ways to handle code folding and other file-navigation features with vim (and googling turns up some possibilities). For a project of any size, you're going to quickly get lost in your directory tree if you really try to make each file only "what I want to look at at one time".
  • Brian
    Brian over 11 years
    I agree about using del(). Sounds like Python does not have a way to split up files, and relies on an IDE to provide organizational features. I disagree that organizing in this way would lead to getting lost, I think that depends on the scope of the project. I know in advance that what I'm doing will not reasonably exceed a certain size, and it seems natural to me to put e.g. code definitions for a class into a file named after the class, with code that uses it elsewhere.
  • Brian
    Brian over 11 years
    However, I see how Python is in a way trying to force me to use separate namespaces for things that I consider to be separate, and that's probably a good thing for large projects. In the end what I object to is just some details about how namespaces work in packages, and that's just how Python is so I shouldn't fight it. Ultimate solution: Put up with annoying import constructs like import doit_tools.tools, let users have to figure out that it's a package when import doit_tools gives them a blank namespace, define __all__ and don't bring things up to the package from submodules.
  • jfs
    jfs almost 11 years
    'with __all__ you can block from package import module' seems misleading: __all__ doesn't block from package import module. It just controls what is available if you do from package import *.
  • dactylroot
    dactylroot almost 6 years
    wonderful use of "del(unwanted_module)", this is the critical step