How to organize multiple Python files into a single module without it behaving like a package?
Solution 1
You can sort of do it, but it's not really a good idea and you're fighting against the way Python modules/packages are supposed to work. By importing appropriate names in __init__.py you can make them accessible in the package namespace. By deleting module names you can make them inaccessible. (For why you need to delete them, see this question). So you can get close to what you want with something like this (in __init__.py):
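# Explicit relative imports work on Python 2.5+ and are required on Python 3.
from .another_class import doit
from .another_class import dataholder
from .descriptive_name import getSomeStuff
from .descriptive_name import hold_more_data
# The imports above also bound the submodule names in the package namespace; remove them.
del another_class, descriptive_name
__all__ = ['doit', 'dataholder', 'getSomeStuff', 'hold_more_data']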
However, this will break subsequent attempts to import package.another_class. In general, you can't import anything from a package.module without making package.module accessible as an importable reference to that module (although with __all__ you can block the module from being picked up by from package import *; it does not block an explicit from package import module).
More generally, by splitting up your code by class/function you are working against the Python package/module system. A Python module should generally contain stuff you want to import as a unit. It's not uncommon to import submodule components directly in the top-level package namespace for convenience, but the reverse --- trying to hide the submodules and allow access to their contents only through the top-level package namespace --- is going to lead to problems. In addition, there is nothing to be gained by trying to "cleanse" the package namespace of the modules. Those modules are supposed to be in the package namespace; that's where they belong.
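The re-export-and-delete approach above can be tried out with a throwaway package. This is only a sketch: the package name demo_flat_pkg and its contents are made up for illustration, and it assumes Python 3.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk (all names here are hypothetical).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_flat_pkg")
os.mkdir(pkg_dir)

with open(os.path.join(pkg_dir, "another_class.py"), "w") as f:
    f.write("def doit():\n    return 'done'\n")

with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write(
        "from .another_class import doit\n"
        "del another_class  # drop the submodule name from the package namespace\n"
        "__all__ = ['doit']\n"
    )

sys.path.insert(0, root)
importlib.invalidate_caches()
import demo_flat_pkg

# Only the re-exported name is left among the non-underscore attributes.
public_names = [n for n in dir(demo_flat_pkg) if not n.startswith("_")]
print(public_names)

# The submodule was still loaded; it is just no longer an attribute of the package.
still_loaded = "demo_flat_pkg.another_class" in sys.modules
print(still_loaded)
print(hasattr(demo_flat_pkg, "another_class"))
```

Here dir() shows only doit, while the submodule stays in sys.modules. Code that later expects demo_flat_pkg.another_class as an attribute will fail, which is the breakage described above.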
Solution 2
Define __all__ = ['names', 'that', 'are', 'public'] in the __init__.py, e.g.:
__all__ = ['foo']
from ._subpackage import foo
Real-world example: numpy/__init__.py.
You have some misconceptions about how Python packages work:
If I do not use an __init__.py, I cannot import anything because Python doesn't descend into the folder from sys.path.
You need an __init__.py file in Python versions older than Python 3.3 to mark a directory as containing a Python package (from 3.3 on, a directory without one is treated as a namespace package).
If I use a blank __init__.py, when I import doit_tools it's an empty namespace with nothing in it. None of my files imported, which makes it more difficult to use.
It doesn't prevent the import:
from doit_tools import your_module
It works as expected.
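A quick way to check this is a scratch package with an empty __init__.py. The sketch below assumes Python 3; demo_blank_pkg and your_module are placeholder names, not anything from the question.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_blank_pkg")
os.mkdir(pkg_dir)

# An empty __init__.py: the package imports nothing on its own.
open(os.path.join(pkg_dir, "__init__.py"), "w").close()

with open(os.path.join(pkg_dir, "your_module.py"), "w") as f:
    f.write("VALUE = 42\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

import demo_blank_pkg
empty_before = not hasattr(demo_blank_pkg, "your_module")

# Importing the submodule explicitly works and also binds it on the package.
from demo_blank_pkg import your_module
bound_after = hasattr(demo_blank_pkg, "your_module")

print(empty_before, your_module.VALUE, bound_after)
```

The package namespace starts out empty, but the submodule can still be imported by name, and doing so adds it to the package namespace.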
If I list the submodules in __all__, I can use the (frowned upon?) from thing import * syntax, but all of my classes are behind unnecessary namespace barriers again. The user has to (1) know they should use from x import * instead of import x, (2) manually reshuffle classes until they can reasonably obey line width style constraints.
(1) Your users (in most cases) should not use from your_package import * outside an interactive Python shell.
(2) You could use () to break a long import line:
from package import (function1, Class1, Class2, ..snip many other names..,
                     ClassN)
If I add from thatfile import X statements to __init__.py, I get closer but I have namespace conflicts (?) and extra namespaces for things I didn't want to be in there.
It is up to you to resolve namespace conflicts (different objects with the same name). A name can refer to any object: integer, string, package, module, class, function, etc. Python can't know which object you might prefer, and even if it could, it would be inconsistent to ignore some name bindings in this particular case when name bindings are honored in all other cases.
To mark names as non-public you could prefix them with _, e.g., package/_nonpublic_module.py.
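To see what __all__ and the underscore prefix do and do not affect, here is a small sketch, assuming Python 3. The package demo_all_pkg, its module _impl, and the functions foo and helper are invented for the demonstration.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_all_pkg")
os.mkdir(pkg_dir)

with open(os.path.join(pkg_dir, "_impl.py"), "w") as f:
    f.write("def foo():\n    return 'foo'\n\ndef helper():\n    return 'helper'\n")

with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("__all__ = ['foo']\nfrom ._impl import foo, helper\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

# Run the star-import in a scratch namespace so its effect can be inspected.
scratch = {}
exec("from demo_all_pkg import *", scratch)
star_names = sorted(n for n in scratch if not n.startswith("__"))

import demo_all_pkg
print(star_names)
print("_impl" in dir(demo_all_pkg), hasattr(demo_all_pkg, "helper"))
```

The star-import brings in only foo, yet dir(demo_all_pkg) still lists _impl and helper. __all__ documents the public interface; it does not remove anything from the package namespace.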
Solution 3
There are perfectly valid reasons to hide the sub-structure of a package (not only when debugging). Among them are convenience and efficiency. When rapidly prototyping with a package, it is extremely annoying to interrupt your train of thought just to look up which sub-module a specific function or class lives in.
When everything is available at the top level of a package, the idiom:
python -c 'import pkg; help(pkg)'
displays the entire help, not just some measly module names.
You can always turn off sub-module imports for production code, or to clean up the package modules after development.
The following is the best way I have come up with so far. It maximizes convenience while trying not to suppress valid errors. See also the full source with doctest documentation.
Define package name and sub-modules to be imported to avoid error-prone duplication:
_package_ = 'flat_export'
_modules_ = ['sub1', 'sub2', 'sub3']
Use relative imports when available (this is imperative, see is_importing_package):
_loaded = False
if is_importing_package(_package_, locals()):
    for _module in _modules_:
        exec ('from .' + _module + ' import *')
    _loaded = True
    del(_module)
Try importing the package, including __all__. This happens when executing a module file as script with the package in the search path (e.g. python flat_export/__init__.py)
if not _loaded:
    try:
        exec('from ' + _package_ + ' import *')
        exec('from ' + _package_ + ' import __all__')
        _loaded = True
    except (ImportError):
        pass
As a last resort, try importing the sub-modules directly. This happens when executing a module file as script inside the package directory without the package in the search path (e.g. cd flat_export; python __init__.py).
if not _loaded:
    for _module in _modules_:
        exec('from ' + _module + ' import *')
    del(_module)
Construct __all__ (leaving out modules), unless it has been imported before:
if not __all__:
    _module_type = type(__import__('sys'))
    for _sym, _val in sorted(locals().items()):
        if not _sym.startswith('_') and not isinstance(_val, _module_type) :
            __all__.append(_sym)
    del(_sym)
    del(_val)
    del(_module_type)
Here is the function is_importing_package:
def is_importing_package(_package_, locals_, dummy_name=None):
    """:returns: True, if relative package imports are working.

    :param _package_: the package name (unfortunately, __package__
      does not work, since it is None, when loading ``:(``).
    :param locals_: module local variables for auto-removing function
      after use.
    :param dummy_name: dummy module name (default: 'dummy').

    Tries to do a relative import from an empty module `.dummy`. This
    avoids any secondary errors, other than::

        ValueError: Attempted relative import in non-package
    """
    success = False
    if _package_:
        import sys
        dummy_name = dummy_name or 'dummy'
        dummy_module = _package_ + '.' + dummy_name
        if not dummy_module in sys.modules:
            import imp
            sys.modules[dummy_module] = imp.new_module(dummy_module)
        try:
            exec('from .' + dummy_name + ' import *')
            success = True
        except:
            pass
    if not 'sphinx.ext.autodoc' in __import__('sys').modules:
        del(locals_['is_importing_package'])
    return success
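For comparison, the core idea (pull every public name from a list of sub-modules into the package namespace and build __all__ from them) can also be sketched with importlib.import_module instead of exec. This is a different, simplified technique, not the code above: it assumes Python 3, does not handle the run-as-script fallbacks, and the package demo_flat_export with sub-modules sub1 and sub2 is invented for the demonstration.

```python
import importlib
import os
import sys
import tempfile

# Source of the hypothetical package's __init__.py.
INIT_SOURCE = '''\
import importlib as _importlib

_modules_ = ['sub1', 'sub2']
__all__ = []

for _name in _modules_:
    _mod = _importlib.import_module('.' + _name, __name__)
    # Honor the submodule's __all__ if it has one, else take non-underscore names.
    _names = getattr(_mod, '__all__', None)
    if _names is None:
        _names = [n for n in vars(_mod) if not n.startswith('_')]
    for _sym in _names:
        globals()[_sym] = getattr(_mod, _sym)
        __all__.append(_sym)
'''

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_flat_export")
os.mkdir(pkg_dir)

files = {
    "__init__.py": INIT_SOURCE,
    "sub1.py": "def alpha():\n    return 'a'\n",
    "sub2.py": "def beta():\n    return 'b'\n",
}
for filename, source in files.items():
    with open(os.path.join(pkg_dir, filename), "w") as f:
        f.write(source)

sys.path.insert(0, root)
importlib.invalidate_caches()
import demo_flat_export

print(demo_flat_export.__all__)
print(demo_flat_export.alpha(), demo_flat_export.beta())
```

With this layout, help(demo_flat_export) and from demo_flat_export import * both see alpha and beta at the top level. The sub-module names sub1 and sub2 remain attributes of the package but are left out of __all__.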
Brian
Updated on July 09, 2022
Comments
-
Brian almost 2 years
Is there a way to use __init__.py to organize multiple files into a module?
Reason: Modules are easier to use than packages, because they don't have as many layers of namespace.
Normally it makes a package, this I get. Problem is with a package, 'import thepackage' gives me an empty namespace. Users must then either use "from thepackage import *" (frowned upon) or know exactly what is contained and manually pull it out into a usable namespace.
What I want to have is the user do 'import thepackage' and have nice clean namespaces that look like this, exposing functions and classes relevant to the project for use.
current_module
\
  doit_tools/
  \
   - (class) _hidden_resource_pool
   - (class) JobInfo
   - (class) CachedLookup
   - (class) ThreadedWorker
   - (Fn) util_a
   - (Fn) util_b
   - (Fn) gather_stuff
   - (Fn) analyze_stuff
The maintainer's job would be to avoid defining the same name in different files, which should be easy when the project is small like mine is.
It would also be nice if people can do from doit_stuff import JobInfo and have it retrieve the class, rather than a module containing the class.
This is easy if all my code is in one gigantic file, but I like to organize when things start getting big. What I have on disk looks sort of like this:
place_in_my_python_path/
  doit_tools/
    __init__.py
    JobInfo.py
      - class JobInfo:
    NetworkAccessors.py
      - class _hidden_resource_pool:
      - class CachedLookup:
      - class ThreadedWorker:
    utility_functions.py
      - def util_a()
      - def util_b()
    data_functions.py
      - def gather_stuff()
      - def analyze_stuff()
I only separate them so my files aren't huge and unnavigable. They are all related, though someone (possibly me) may want to use the classes by themselves without importing everything.
I've read a number of suggestions in various threads, here's what happens for each suggestion I can find for how to do this:
If I do not use an __init__.py, I cannot import anything because Python doesn't descend into the folder from sys.path.
If I use a blank __init__.py, when I import doit_tools it's an empty namespace with nothing in it. None of my files imported, which makes it more difficult to use.
If I list the submodules in __all__, I can use the (frowned upon?) from thing import * syntax, but all of my classes are behind unnecessary namespace barriers again. The user has to (1) know they should use from x import * instead of import x, (2) manually reshuffle classes until they can reasonably obey line width style constraints.
If I add from thatfile import X statements to __init__.py, I get closer but I have namespace conflicts (?) and extra namespaces for things I didn't want to be in there. In the below example, you'll see that:
- The class JobInfo overwrote the module object named JobInfo because their names were the same. Somehow Python can figure this out, because JobInfo is of type <class 'doit_tools.JobInfo.JobInfo'>. (doit_tools.JobInfo is a class, but doit_tools.JobInfo.JobInfo is that same class... this is tangled and seems very bad, but doesn't seem to break anything.)
- Each filename made its way into the doit_tools namespace, which makes it more confusing to look through if anyone is looking at the contents of the module. I want doit_tools.utility_functions.py to hold some code, not define a new namespace.
current_module
\
  doit_tools/
  \
   - (module) JobInfo
      \
       - (class) JobInfo
   - (class) JobInfo
   - (module) NetworkAccessors
      \
       - (class) CachedLookup
       - (class) ThreadedWorker
   - (class) CachedLookup
   - (class) ThreadedWorker
   - (module) utility_functions
      \
       - (Fn) util_a
       - (Fn) util_b
   - (Fn) util_a
   - (Fn) util_b
   - (module) data_functions
      \
       - (Fn) gather_stuff
       - (Fn) analyze_stuff
   - (Fn) gather_stuff
   - (Fn) analyze_stuff
Also someone importing just the data abstraction class would get something different than they expect when they do 'from doit_tools import JobInfo':
current_namespace
\
 JobInfo (module)
   \
    -JobInfo (class)
instead of:
current_namespace
\
 - JobInfo (class)
So, is this just a wrong way to organize Python code? If not, what is a correct way to split related code up but still collect it in a module-like way?
Maybe the best case scenario is that doing 'from doit_tools import JobInfo' is a little confusing for someone using the package?
Maybe a python file called 'api' so that people using the code do the following?:
import doit_tools.api
from doit_tools.api import JobInfo
============================================
Examples in response to comments:
Take the following package contents, inside folder 'foo' which is in python path.
foo/__init__.py
__all__ = ['doit','dataholder','getSomeStuff','hold_more_data','SpecialCase']
from another_class import doit
from another_class import dataholder
from descriptive_name import getSomeStuff
from descriptive_name import hold_more_data
from specialcase import SpecialCase
foo/specialcase.py
class SpecialCase:
    pass
foo/more.py
def getSomeStuff():
    pass
class hold_more_data(object):
    pass
foo/stuff.py
def doit():
    print "I'm a function."
class dataholder(object):
    pass
Do this:
>>> import foo
>>> for thing in dir(foo): print thing
...
SpecialCase
__builtins__
__doc__
__file__
__name__
__package__
__path__
another_class
dataholder
descriptive_name
doit
getSomeStuff
hold_more_data
specialcase
another_class and descriptive_name are there cluttering things up, and also have extra copies of e.g. doit() underneath their namespaces.
If I have a class named Data inside a file named Data.py, when I do 'from Data import Data' then I get a namespace conflict because Data is a class in the current namespace that is inside module Data, which somehow is also in the current namespace. (But Python seems to be able to handle this.)
-
Brian over 11 years
This has the problem I mentioned above, where the public names are available but also the file names, as a namespace, with the public names also buried in those. Is this just to be expected, and it's the best I can do?
-
Brian over 11 years
This has the problem I mentioned above, where the public names are available but also the file names, as namespaces. Also the namespace conflict with JobInfo that I brought up, which seems bad, and my alternative is to rename JobInfo.py so that the file contains a class of a different name than the filename. When that's done, I then have the differently-named file appearing as a namespace inside the class. It gets messy, but is it the best I can do?
-
Brian over 11 years
Also, should I be concerned about the namespace conflict with module JobInfo being replaced with class JobInfo in the package namespace? Is this one of those things that seems dirty but isn't, and I should let Python handle it?
-
BrenBarn over 11 years
@Brian: I don't understand what you mean. Any names you don't import will not be available. If you want to exclude package names you can use the __all__ technique mentioned by @J.F. Sebastian. Incidentally, your example is needlessly large and confusing. Can you create a simple example and show how you want to refer to the different parts?
-
Brian over 11 years
Responded with example, tested in interactive terminal just now. In the example, I do not want the namespaces created by the names of the python files to be in the module, because it's cluttered. The module should expose two classes and two functions, not two classes, two functions, and two submodules each containing a class and a function. The solution of moving all code into one file seems... blunt.
-
Brian over 11 years
Try it using an __init__.py in a folder with files in it. Note I was stating the use of a class of the same name as the file caused what seemed to be a bad thing. Also note my reasons for naming the file along with the class were for organizational purposes, so that I knew the definition of that class lived in that file, not for syntactic reasons, and I mentioned an unpleasant side effect of changing the name to something different.
-
Brian over 11 years
Also note, if I remove all contents of __init__.py, the submodule namespaces do not show up on an import of foo. They must be manually retrieved.
-
jfs over 11 years
@Brian: __all__ is used exactly for the purpose of distinguishing public names from the names that are available by accident. You explicitly add names to __all__ that you consider to be public (note: your example foo/__init__.py doesn't define __all__).
-
jfs over 11 years
@Brian: whatever is defined in __init__.py takes precedence. In this particular case follow PEP 8 and use lower case for module names (s/JobInfo/jobinfo/ for the module name).
-
Brian over 11 years
Adding __all__ yielded the same result. Names of Python files still present. Trying now with a case difference between class and module as you suggest.
-
jfs over 11 years
@Brian: dir(package) is not used to get public names. package.__all__ and documentation are used to get public names. from package import * introduces only names in __all__ (don't use wildcard import (*) outside a Python interactive shell or outside an __init__.py where it is used to import public names from a subpackage).
-
Brian over 11 years
Adding a submodule with a lower-case name containing a same-named class with an upper-case name, and adding the upper-case class name to __all__, yields a package that has both the upper-case and lower-case names (which would be very confusing for users).
-
BrenBarn over 11 years
@Brian: See my edited answer. You need to use the __init__.py code and __all__ in the __init__.py.
-
Brian over 11 years
I'm attempting to avoid 'from package import *'. Sounds like you're saying, I should not care what symbols are inside the imported namespace, and using dir() to look at a module and see what's inside is wrong?
-
jfs over 11 years
dir(package) also shows names that are not in package.__all__.
-
Brian over 11 years
Should I not care about names that show up in dir()? I use this frequently to see what is available in an object, and avoid importing the entire contents of a module into the main script namespace. The goal here is to achieve import mymodule, while organizing my code into separate files, without that code organization creating a bunch of extra symbols that reveal my organization as if it were functionality.
-
jfs over 11 years
I've specified above where usage of from package import * is acceptable; in particular, from .subpackage import * is fine in __init__.py where you do from . import subpackage; __all__.extend(subpackage.__all__). Using dir(module) you'll get some module attribute names in sorted order. It is not wrong to use it, you just have to know that dir() also might return names that you don't consider to be public. In your case (when you explicitly define the public interface in __init__.py) you could use package.__all__ to get public names.
-
jfs over 11 years
@Brian: To explicitly discourage usage of non-public names you could prefix them with _, e.g., _jobinfo (see also the previous comment).
-
Brian over 11 years
About your edits: / Using __all__ does not do for me what you say it does. Could this be a Linux platform thing? / I do want parts of the module to be available to other code, in the same way it would be available if I did 'from foo import dataholder'. / I apologize for lapsing into bad style, but it doesn't change the issue. / What I want is to be able to open a file that contains a class definition I want to work on, and not see all the stuff I'm not interested in working on.
-
BrenBarn over 11 years
When I create a package with modules, any modules that I don't put in __init__.py's __all__ don't show up in dir(package), even if I import names from those modules. I guess this could be a difference between Python versions. What version are you using?
-
Brian over 11 years
Python 2.6.4, on Linux. The output of dir() posted above is from an interactive python shell with that actual structure of documents.
-
Brian over 11 years
The issue is more that if I have a file named abstractionclass.py containing a definition class AbstractionClass(object):, if I put that in my package, and in __init__.py I add it to __all__ and do 'from abstractionclass import AbstractionClass', then if someone imports my package and does a dir() of it, they see this: [ usual_stuff, 'abstractionclass', 'AbstractionClass' ]. It sounds like this does not happen for some other people? BrenBarn says on his install this does not happen. (I am using python 2.6.4 on Linux.)
-
Brian over 11 years
Note: others are saying their behavior in this situation is different from what I get. This may be a difference between python versions.
-
BrenBarn over 11 years
@Brian: It looks like you're right, I had mixed up different versions of the setup I was trying out. However, in answer to your other question, I would say no, don't worry about what shows up in dir(). Obviously a bunch of internal things like __doc__ show up in there anyway. You should focus on getting the import usage you want to work, and not worry if submodules are also in the dir(). This does mean you have to be careful about your module/class names, but there's just no way around that. Python isn't set up to cater to a one-class-per-file layout.
-
BrenBarn over 11 years
@Brian: There is some oddness going on with intra-package imports. See yet another edit in my answer for what is apparently a way to stop the submodules from showing up in the package dict.
-
jfs over 11 years
@Brian: You don't need to put a public class in a separate module (it is not Java), therefore abstractionclass might be a very pure name for a module in Python. To mark/document it as non-public you could prefix a module name with _: _nonpublic_module.py. dir() returns names that are not in __all__ on all Python versions I've tried: Python 2.7, Python 3.3, Jython 2.5, Pypy 1.9. I've updated my answer to address the points from your question.
-
jfs over 11 years
@Brian: del module in the __init__.py also prevents ordinary usage of package.module.somename_not_in_package_all (it might even be a desirable side-effect; it is still accessible via sys.modules['package.module'].somename_not_in_package_all). Though it might seem odd at first glance to del name if name is not introduced explicitly.
-
jfs over 11 years
"then the only things in dir(package) will be those names." This is false, as you've pointed out yourself already. Otherwise it is a good answer.
-
Brian over 11 years
That explains what I was seeing. I suppose most people get around this kind of thing by using an IDE that lets you collapse code blocks. I've favored editing in Vim for its nice automations, so if it's in the file I see it. @J.F.Sebastian: I had wondered about del but it sounded 'wrong' somehow. Sounds like that will be the method of choice for unwanted symbols. Thanks!
-
Brian over 11 years
@J.F.Sebastian: If you want to submit an answer involving 'del' I'll mark it. The use of __all__ does not appear to have any effect on import mymodule, only expanding the * in from mymodule import *, so I have nothing to mark.
-
jfs over 11 years
@Brian: I'm against (-0) using del _internal_module. It introduces an unnecessary maintenance burden. None of the stdlib modules or third-party Python packages installed on my machine do this as far as I can see (though they sometimes use del utility_module where utility_module is defined outside the package), but none use del on their own submodules. Proper documentation and/or __all__ are enough to point out public names.
-
jfs over 11 years
@Brian: I've misspelled: s/pure name/poor name/.
-
jfs over 11 years
@Brian: Also, if somebody tries to extend your package in ways that you've not considered, then breaking the import package._internal_module as m statement is not good.
-
BrenBarn over 11 years
@Brian: I'm sure there are ways to handle code folding and other file-navigation features with vim (and googling turns up some possibilities). For a project of any size, you're going to quickly get lost in your directory tree if you really try to make each file only "what I want to look at at one time".
-
Brian over 11 years
I agree about using del(). Sounds like Python does not have a way to split up files, and relies on an IDE to provide organizational features. I disagree that organizing in this way would lead to getting lost, I think that depends on the scope of the project. I know in advance that what I'm doing will not reasonably exceed a certain size, and it seems natural to me to put e.g. code definitions for a class into a file named after the class, with code that uses it elsewhere.
-
Brian over 11 years
However, I see how Python is in a way trying to force me to use separate namespaces for things that I consider to be separate, and that's probably a good thing for large projects. In the end what I object to is just some details about how namespaces work in packages, and that's just how Python is so I shouldn't fight it. Ultimate solution: Put up with annoying import constructs like import doit_tools.tools, let users have to figure out that it's a package when import doit_tools gives them a blank namespace, define __all__ and don't bring things up to the package from submodules.
-
jfs almost 11 years
'with __all__ you can block from package import module' seems misleading: __all__ doesn't block from package import module. It just controls what is available if you do from package import *.
-
dactylroot almost 6 years
wonderful use of "del(unwanted_module)", this is the critical step