How do I protect Python code from being read by users?

370,388

Solution 1

Python, being a byte-code-compiled interpreted language, is very difficult to lock down. Even if you use an exe-packager like py2exe, the layout of the executable is well-known, and the Python byte-codes are well understood.
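
To see how little the byte-code hides, here is a minimal sketch using the standard-library `dis` module. The function `run_product` and its strings are hypothetical, invented for illustration; the point is only that the constants and instruction stream of a compiled function can be inspected directly.

```python
import dis


def run_product(license_ok):
    # Hypothetical license gate, invented for this illustration.
    if not license_ok:
        raise SystemExit("license expired")
    return "proprietary result"


code = run_product.__code__

# String literals survive compilation unchanged inside the code object.
embedded_strings = [c for c in code.co_consts if isinstance(c, str)]
print(embedded_strings)

# The instruction stream is documented and can be listed by anyone.
opnames = [ins.opname for ins in dis.get_instructions(run_product)]
print(opnames)
```

The exact opcode names vary between Python versions, but the embedded strings and the conditional jump around the license check are visible in all of them.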

Usually in cases like this, you have to make a tradeoff. How important is it really to protect the code? Are there real secrets in there (such as a key for symmetric encryption of bank transfers), or are you just being paranoid? Choose the language that lets you develop the best product quickest, and be realistic about how valuable your novel ideas are.

If you decide you really need to enforce the license check securely, write it as a small C extension so that the license check code can be extra-hard (but not impossible!) to reverse engineer, and leave the bulk of your code in Python.

Solution 2

"Is there a good way to handle this problem?" No. Nothing can be protected against reverse engineering. Even the firmware on DVD machines has been reverse engineered and the AACS Encryption key exposed. And that's in spite of the DMCA making that a criminal offense.

Since no technical method can stop your customers from reading your code, you have to apply ordinary commercial methods.

  1. Licenses. Contracts. Terms and Conditions. This still works even when people can read the code. Note that some of your Python-based components may require that you pay fees before you sell software using those components. Also, some open-source licenses prohibit you from concealing the source or origins of that component.

  2. Offer significant value. If your stuff is so good -- at a price that is hard to refuse -- there's no incentive to waste time and money reverse engineering anything. Reverse engineering is expensive. Make your product slightly less expensive.

  3. Offer upgrades and enhancements that make any reverse engineering a bad idea. When the next release breaks their reverse engineering, there's no point. This can be carried to absurd extremes, but you should offer new features that make the next release more valuable than reverse engineering.

  4. Offer customization at rates so attractive that they'd rather pay you to build and support the enhancements.

  5. Use a license key which expires. This is cruel, and will give you a bad reputation, but it certainly makes your software stop working.

  6. Offer it as a web service. SaaS involves no downloads to customers.
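
Point 5 can be sketched as follows, using only the standard library. This is an assumption-laden illustration, not a secure design: the names `SECRET`, `sign_license` and `license_is_valid` are hypothetical, and because the HMAC secret ships inside the product, anyone who can read the code can forge a license. It stops casual edits to the expiry date, nothing more.

```python
import hashlib
import hmac
from datetime import date

# Assumption: a shared secret embedded in the product. Anyone who can read
# the code can extract it, so this deters casual tampering only.
SECRET = b"replace-with-your-own-secret"


def _digest(payload):
    """Hex HMAC-SHA256 of the payload under the embedded secret."""
    return hmac.new(SECRET, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_license(customer, expires):
    """Return 'customer|YYYY-MM-DD|hexdigest' for the given expiry date."""
    payload = f"{customer}|{expires.isoformat()}"
    return f"{payload}|{_digest(payload)}"


def license_is_valid(license_text, today):
    """Check the signature first, then the expiry date."""
    try:
        customer, expires_text, digest = license_text.rsplit("|", 2)
        expires = date.fromisoformat(expires_text)
    except ValueError:
        return False  # malformed license text
    expected = _digest(f"{customer}|{expires_text}")
    # Constant-time comparison on bytes avoids leaking digest prefixes.
    if not hmac.compare_digest(digest.encode("utf-8"), expected.encode("utf-8")):
        return False
    return today <= expires
```

A caller would issue a license with `sign_license("ACME", date(2030, 1, 31))` and check it at startup with `license_is_valid(text, date.today())`. An asymmetric signature would avoid shipping the signing secret, but the check itself can still be patched out of the Python code.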

Solution 3

Python is not the tool you need

You must use the right tool to do the right thing, and Python was not designed to be obfuscated. It's the contrary; everything is open or easy to reveal or modify in Python because that's the language's philosophy.

If you want something you can't see through, look for another tool. This is not a bad thing, it is important that several different tools exist for different usages.

Obfuscation is really hard

Even compiled programs can be reverse-engineered so don't think that you can fully protect any code. You can analyze obfuscated PHP, break the flash encryption key, etc. Newer versions of Windows are cracked every time.

Having a legal requirement is a good way to go

You cannot prevent somebody from misusing your code, but you can easily discover if someone does. Therefore, it's just a casual legal issue.

Code protection is overrated

Nowadays, business models tend toward selling services instead of products. You cannot copy, pirate, or steal a service. Maybe it's time to consider going with the flow...

Solution 4

Compile python and distribute binaries!

Sensible idea:

Use Cython, Nuitka, Shed Skin or something similar to compile Python to C or C++ code, then distribute your app as Python binary libraries (pyd) instead.

That way, no Python (byte) code is left, and I think you've done as much obfuscation as anyone (i.e. your employer) could reasonably expect from regular code. (.NET or Java are less safe than this, as their bytecode is not obfuscated and can relatively easily be decompiled into reasonable source.)
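
Before shipping, it may be worth checking that no readable Python files slipped into the distribution. The following is a minimal sketch under that assumption; `find_readable_files` and `READABLE_SUFFIXES` are hypothetical helper names, not part of Cython or Nuitka.

```python
from pathlib import Path

# Suffixes that still expose Python source or bytecode.
READABLE_SUFFIXES = {".py", ".pyc", ".pyo"}


def find_readable_files(dist_root):
    """Return sorted paths under dist_root that are source or bytecode files."""
    root = Path(dist_root)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix in READABLE_SUFFIXES
    )
```

Any hits are files you may still want to compile or exclude from the package; compiled extension modules (`.pyd`, `.so`) are not reported.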

Cython is getting more and more compatible with CPython, so I think it should work. (I'm actually considering this for our product. We're already building some third-party libs as pyd/dlls, so shipping our own Python code as binaries is not an overly big step for us.)

See This Blog Post (not by me) for a tutorial on how to do it. (thx @hithwen)

Crazy idea:

You could probably get Cython to store the C-files separately for each module, then just concatenate them all and build them with heavy inlining. That way, your Python module is pretty monolithic and difficult to chip at with common tools.

Beyond crazy:

You might be able to build a single executable if you can link to (and optimize with) the python runtime and all libraries (dlls) statically. That way, it'd sure be difficult to intercept calls to/from python and whatever framework libraries you use. This cannot be done if you're using LGPL code though.

Solution 5

I understand that you want your customers to use the power of Python but do not want to expose the source code.

Here are my suggestions:

(a) Write the critical pieces of the code as C or C++ libraries and then use SIP or SWIG to expose the C/C++ APIs to the Python namespace.

(b) Use Cython instead of Python.

(c) In both (a) and (b), it should be possible to distribute the libraries as licensed binary with a Python interface.

Author by

Jordfräs

Scuba diver, cyclist and programmer. Lean and agile advocate.

Updated on July 08, 2022

Comments

  • Jordfräs
    Jordfräs almost 2 years

    I am developing a piece of software in Python that will be distributed to my employer's customers. My employer wants to limit the usage of the software with a time-restricted license file.

    If we distribute the .py files or even .pyc files it will be easy to (decompile and) remove the code that checks the license file.

    Another aspect is that my employer does not want the code to be read by our customers, fearing that the code may be stolen or at least the "novel ideas".

    Is there a good way to handle this problem?

  • Marc Stober
    Marc Stober over 15 years
    Even if the license-checking code were hard to reverse engineer because it's written in C, wouldn't it still be relatively easy to remove the calls to the license-checking code?
  • yantrab
    yantrab over 15 years
    Yes it would, depending on where the license check is performed. If there are many calls to the extension, it could be difficult to eradicate. Or you can move some other crucial part of the application into the license check as well so that removing the call to the extension cripples the app.
  • yantrab
    yantrab over 15 years
    Really, all of this work is not about preventing modification, but about increasing its difficulty so that it's no longer worth it. Anything can be reverse-engineered and modified if there's enough benefit.
  • Ali Afshar
    Ali Afshar over 15 years
    +1 For the signing; -1 for the obfuscator You can at least prevent the code from being changed.
  • ddaa
    ddaa over 15 years
    Signing does not work in this context. It's always possible to bypass the signature-checking loader. The first thing you need for useful software protection is an opaque bootstrap mechanism. Not something that Python makes easy.
  • Ali Afshar
    Ali Afshar over 15 years
    Yes, bootstrap in non-python.
  • Chii
    Chii over 15 years
    but the fact that it was gotten past meant that they failed - the bottom line is just don't try, but go for legal protection.
  • Abgan
    Abgan over 15 years
    Or validate the licence not only on startup but in several other places. Can be easily implemented, and can severely increase the time to bypass.
  • intuited
    intuited almost 14 years
    +1 (back to 0): it seems the only true solution to the problem, assuming such an approach to be practical for the setting.
  • Nick T
    Nick T over 13 years
    What if they release software to customers, and the customer modifies it internally without re-releasing it?
  • Aaron Digulla
    Aaron Digulla over 13 years
    @Nick: Doesn't change the situation in any way. See my edits.
  • Brian
    Brian over 13 years
    @Blair Conrad: Not if the license-checking code hides functionality, too. E.g. mylicensedfunction(licenseblob liblob, int foo, int bar, std::string bash)
  • TryPyPy
    TryPyPy over 13 years
    Other possibilities in the same vein: Shed Skin code.google.com/p/shedskin and Nuitka kayhayen24x7.homelinux.org/blog/nuitka-a-python-compiler
  • johndodo
    johndodo over 12 years
    Python is not the tool you need. Malbolge is. :)
  • Dasun
    Dasun over 12 years
    I think the clever way is to implement the critical parts in C and put all the license checking there. (I use a hardware dongle that can perform some calculations internally, so it's almost impossible to reverse.)
  • DevPlayer
    DevPlayer almost 12 years
    Be aware that if your licensing web server goes down or the customer's internet access is down, your customer will not be happy that they can't run their business because of loss of access to licensing checks.
  • Cees Timmerman
    Cees Timmerman over 11 years
    I think that simply bundles the .pyc files. Cython, Shed Skin, and PyPy go beyond bytecode.
  • Mitar
    Mitar over 11 years
    Is there any information published on how to get past these protection mechanisms?
  • Filipe
    Filipe about 11 years
    I just had a look at Shed Skin as suggested by TryPyPy and it appears to be really good stuff!
  • Thomas Browne
    Thomas Browne about 11 years
    Like this extreme idea. Gets it out there in a huge way and massive market share, then you have a very big customer base for support and addons. I have also been grappling with this question and all the "licensing" answers are basically bull because it doesn't protect against widespread copying, yet doesn't give you any market share advantage.
  • Jordan
    Jordan almost 11 years
    +1 for stealing ideas back. Why limit your client-serving power to your in-house solutions, when you could see how others improve on your solution and accordingly improve your own product? "If you have an apple and I have an apple and we exchange these apples then you and I will still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas."
  • Wowbagger and his liquid lunch
    Wowbagger and his liquid lunch over 10 years
    Good answer, but "casual legal issue"? Really? Where do you live that you have any legal issues that are casual?
  • Mike McKerns
    Mike McKerns about 10 years
    I've actually seen commercial python code shipped as embedded python inside of a C library. Instead of converting some parts of the code to C, they hide the entire python code inside a protective C layer. Then, if they want a module importable by python, they write a thin python extension on top of the C. Open source is a much easier way of life.
  • Jeffrey
    Jeffrey about 10 years
    @DevPlayer There are solutions to this. You could implement a local key mechanism that allows temporary access when the software cannot reach the remote licensing server.
  • sergzach
    sergzach about 9 years
    I think that if we knew how often expensive obfuscated code gets hacked, we could judge how practical it is to use Python with obfuscated code.
  • Oddthinking
    Oddthinking about 9 years
    @Jeffrey: That gets you right back to where you started: how do you protect that code? To be safer, you need to put some of the key functionality on your own server, so replacing it would involve substantial effort (at which point, why not just start an open-source competitor?)
  • m3nda
    m3nda over 8 years
    The good point of this is that it demoralizes anyone who tries to decode the functionality. Combine that with Cython and some extra encryption over modules or internet calls, and you probably win the prize.
  • m3nda
    m3nda over 8 years
    Point 2 is even more important. If buying is cheaper than reverse engineering, and yearly updates are included, no one will try; and even if someone does, no one will pay a hacker instead of the provider of the software.
  • Daniel
    Daniel over 8 years
    Would compiling with cython work with a python 3.4 Django app, or could it be made to work without a huge amount of effort?
  • ICTMitchell
    ICTMitchell over 8 years
    @Daniel: Not sure. Haven't tried on Django. Feel free to post a new question about that.
  • ICTMitchell
    ICTMitchell over 8 years
    @mlvljr FWIW, IMHO compiling to binaries is a nice tradeoff between selling all your secrets and trying to protect against NSA-class reverse engineering. Esp if you have a big python code base and reasons to be paranoid. ;)
  • Delali
    Delali over 6 years
    That's true. Reverse engineering is doable but expensive in most situations. @S.Lott, I believe point 6 holds more importance based on the question. If the source code really needs to be protected then it should be remote from the end user.
  • Delali
    Delali over 6 years
    If your code has interesting features, whoever managed to misuse it would redistribute it, @Macke
  • Skandix
    Skandix almost 6 years
    What if one of your customers re-releases your code or the ideas for free and anonymously? You can't tell who did it and sue them, and because they didn't benefit from it, you won't either. This will ruin your work while one of your customers only paid the basic price for it. (Obviously this only works if you have more than one customer for your solution.)
  • Aaron Digulla
    Aaron Digulla almost 6 years
    @Skandix How exactly would that work? Uploading your work on the Internet doesn't harm you. It would start to harm you if a lot of people would find it AND those people would be paying customers instead. Code theft is a myth. "My knowledge is for free, my time is expensive" (not sure who said that).
  • qg_java_17137
    qg_java_17137 almost 6 years
    hithwen's POST is invalid now.
  • user666412
    user666412 almost 6 years
    Is Cython a viable alternative for this?
  • code_dredd
    code_dredd over 5 years
    I know this is old, but I must note that storing crypto keys or anything sensitive inside an executable is a Bad Idea™. The fact that the executable is compiled to binary is irrelevant. You can simply run the strings command against it and you'll have the key right there in front of you.
  • A Simple Algorithm
    A Simple Algorithm about 5 years
    Question: "is there a good way to protect my family and myself from being murdered by intruders in our sleep?" Internet: "No. Anyone can be gotten to, and no dwelling is ever 100 percent impenetrable. A mortal human family is the wrong tool for the job."
  • Anonymous
    Anonymous almost 5 years
    A Windows fix would be to compile the file into a .pyd and import it later.
  • markroxor
    markroxor almost 5 years
    The only thing this package managed to accomplish is to fool the 'obfuscator' that the code is obfuscated.
  • Vicrobot
    Vicrobot over 4 years
    this was making errors when i tried. i think it mishandled the data, and didn't fully convert it.
  • TomSawyer
    TomSawyer over 4 years
    doesn't work for whole project, or template engine since it needs variable name to display on template
  • jjmontes
    jjmontes over 4 years
    Point 5 could not be applied on the same assumption that it can be reverse engineered and cracked.
  • Make42
    Make42 about 4 years
    How in the world would you "easily discover if someone does"?
  • Make42
    Make42 about 4 years
    How would I steal anything back? They just put the code into their product, don't tell anyone how it works, and sell it. How would I ever find out that they used my code in the first place?
  • Aaron Digulla
    Aaron Digulla about 4 years
    @Make42 If you never noticed how would that be different from it not happening? It simply would have no effect on you at all. So to make a case here, you must know they "stole" your code.
  • Make42
    Make42 about 4 years
    @AaronDigulla: I don't think this applies. The other people the thief sells to might have become my customers, but I would never know. Suppose A gives B some money to deliver to me, unbeknownst to me, and B steals the money by keeping it. I might not be sad, because I never knew, but I still have less money than if B had kept the promise.
  • Make42
    Make42 about 4 years
    But, the upgrades are also just give-aways... so how would they charge for that? Wouldn't it just be the support?
  • Make42
    Make42 about 4 years
    Regarding the WingIDE business model: Support is a service, software a product. Products scale, service don't. Support is only a good business model if there is no other business model - meaning, if nobody would buy your product (for whatever reason), you give the product away, so that you have a customer base that at least buys your service.
  • Aaron Digulla
    Aaron Digulla about 4 years
    @Make42 How does obfuscation help here? They can still sell your software unchanged to anyone without sharing with you. I think a good example is MongoDB and similar services where a large group of people has put a lot of effort in and GAFA "steals" their income by selling services around the products without giving back. Being open in your product is what creates a bigger market. Being able to provide services around the product generates most of the income. The product itself is just a door opener.
  • Make42
    Make42 about 4 years
    @AaronDigulla: Protecting your code (not necessarily obfuscation) helps with two points: 1) Code is rarely 100% transferable. Usually one needs to tweak a couple of things. Often this is very little (so it is worth changing), but without these changes, the code is unusable in the new project. Protecting your code protects your work from being used in projects in which you are not involved. 2) When you sell software, what you are actually selling is the license key (as you most likely well know). Protecting your code ensures that the mechanism for needing the key cannot be circumvented.
  • Make42
    Make42 about 4 years
    Regarding MongoDB: Your example is an example for my point: They opened up the code and at the end of the day, GAFA steals their income. You want to say that selling services is worth it. Sure, but MongoDB didn't.
  • Make42
    Make42 about 4 years
    Regarding "My knowledge is for free, my time is expensive": No it isn't. Some projects are finished by me in a couple of days, but it took years, if not decades, to acquire that knowledge. In fact, that is (in an ideal world) the reason why someone with a long education (say a PhD) gets paid more than someone who just received training on the job (e.g. a cleaner): They are reimbursed for the unpaid years in which they are expected to have acquired a lot of knowledge, while cleaning does not require a lot of training. (This is not meant to be disrespectful to the cleaner as a human though.)
  • Aaron Digulla
    Aaron Digulla about 4 years
    @Make42 MongoDB is a point for both of us. It's an example where most people feel this is "obvious" theft. Obfuscation wouldn't have helped since GAFA can use code as is. That said, it's a corner case because it's a database and those are meant to be useful as is. I agree that you're selling a license key but that key is impossible to protect from code - any code based protection can be circumvented; it's just a question of effort. Therefore, the license key is protected by a contract (software license or good old commercial product).
  • Aaron Digulla
    Aaron Digulla about 4 years
    My core argument is: Obfuscating code is usually a waste of time. Set up a software license or get a lawyer. Adding some basic protection like a license file demonstrates that you want to protect your code against "theft" which is good enough in court (they care about intention, not the height of technical hurdles).
  • Make42
    Make42 about 4 years
    Regarding "as is"-usefulness: Yes, that may be the case here, but for many software applications in the professional context this is not the case.
  • Make42
    Make42 about 4 years
    "It is just a question of effort" - certainly! But if the cost of the effort is larger than the cost of my actual product or service, then I have protected my code well enough, because it is more economical to pay me than to steal from me. That is the hurdle I have to set up. The license of course will have an expiration date.
  • Make42
    Make42 about 4 years
    "Obfuscating code is usually a waste of time." - when I talk about protection I mean something like compiling Python to C and C to a binary, not using pyminifier. When I hear "obfuscating" I think of pyminifier, not the former approach. (Maybe we misunderstand each other.) If using Nuitka takes me a couple of hours after a project that took me a year, that does not seem like a lot of time spent, and if it protects my code well enough (in the economic sense mentioned above), it does not seem to be a waste to me.
  • Make42
    Make42 about 4 years
    Winning a court case is expensive and difficult and harmful to everyone involved. The moment you need lawyers, you already are in deep trouble.
  • Darian
    Darian about 4 years
    Link goes to example.com.
  • Alex
    Alex almost 4 years
    Yes, but not if you distribute that exact Python version with your obfuscated code.
  • Preethi Vaidyanathan
    Preethi Vaidyanathan almost 4 years
    This library does not seem to be maintained, and gives me indentation errors. I am using Python 3.7
  • Preethi Vaidyanathan
    Preethi Vaidyanathan almost 4 years
    This is an opinion, not a technical answer. I agree that obfuscation doesn't mean your code is completely locked down, but it does prevent low level hacks and makes sense depending on your use case.
  • greendino
    greendino almost 4 years
    yep. i can confirm pyminifier is dead
  • TheTechRobo Stands for Ukraine
    TheTechRobo Stands for Ukraine almost 4 years
    pyminifier may be dead, but I found this repo last push was in May 2020... (how I found it: techgaun.github.io/active-forks/index.html#liftoff/… fork seems to add other people's fixes (probably by looking at the open pull requests on the original repo)...
  • phybarin
    phybarin over 3 years
    'Code protection is overrated': what do you think about on-premise services?
  • mathewguest
    mathewguest over 2 years
    autopy2exe compiles in and ships a portable python installation with the distributable in a single <application.exe> file format. Note: also Linux-compatible. It can be complex and a pain point to manage python installations on client computers.
  • Tejas Tank
    Tejas Tank over 2 years
    With technologies and upgrade, everything is possible.
  • user1633272
    user1633272 over 2 years
    It seems that the bytecode could be easily uncompiled using uncompyle6.