How do I protect Python code from being read by users?
Solution 1
Python, being a byte-code-compiled interpreted language, is very difficult to lock down. Even if you use an exe packager like py2exe, the layout of the executable is well known, and the Python byte-codes are well understood.
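To illustrate how little the byte-code hides, here is a minimal sketch using only the standard library. The module source, the function name `check_license` and the key string are hypothetical; the point is only that constants and logic survive compilation in a readable form.

```python
import dis
import io
import marshal
import types

# Hypothetical module source containing a "secret" the author hopes to hide.
SOURCE = '''
def check_license(key):
    return key == "TOP-SECRET-KEY"
'''

# Compile to a code object and serialize it with marshal, which is how the
# body of a .pyc file is stored (a real .pyc adds a small header in front).
code = compile(SOURCE, "licensed.py", "exec")
payload = marshal.dumps(code)


def iter_constants(code_obj):
    """Yield every constant in a code object, recursing into nested functions."""
    for const in code_obj.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_constants(const)
        else:
            yield const


# Anyone holding the bytes can load them back and inspect them.
recovered = marshal.loads(payload)
constants = list(iter_constants(recovered))

# Human-readable disassembly of the module, including nested functions.
buffer = io.StringIO()
dis.dis(recovered, file=buffer)
listing = buffer.getvalue()
```

Printing `constants` or `listing` shows the key string and the comparison against it, which is roughly what a decompiler starts from.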
Usually in cases like this, you have to make a tradeoff. How important is it really to protect the code? Are there real secrets in there (such as a key for symmetric encryption of bank transfers), or are you just being paranoid? Choose the language that lets you develop the best product quickest, and be realistic about how valuable your novel ideas are.
If you decide you really need to enforce the license check securely, write it as a small C extension so that the license check code can be extra-hard (but not impossible!) to reverse engineer, and leave the bulk of your code in Python.
Solution 2
"Is there a good way to handle this problem?" No. Nothing can be protected against reverse engineering. Even the firmware on DVD machines has been reverse engineered and the AACS Encryption key exposed. And that's in spite of the DMCA making that a criminal offense.
Since no technical method can stop your customers from reading your code, you have to apply ordinary commercial methods.
- Licenses. Contracts. Terms and Conditions. This still works even when people can read the code. Note that some of your Python-based components may require that you pay fees before you sell software using those components. Also, some open-source licenses prohibit you from concealing the source or origins of that component.
- Offer significant value. If your stuff is so good -- at a price that is hard to refuse -- there's no incentive to waste time and money reverse engineering anything. Reverse engineering is expensive. Make your product slightly less expensive.
- Offer upgrades and enhancements that make any reverse engineering a bad idea. When the next release breaks their reverse engineering, there's no point. This can be carried to absurd extremes, but you should offer new features that make the next release more valuable than reverse engineering.
- Offer customization at rates so attractive that they'd rather pay you to build and support the enhancements.
- Use a license key which expires. This is cruel, and will give you a bad reputation, but it certainly makes your software stop working.
- Offer it as a web service. SaaS involves no downloads to customers.
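The expiring license key mentioned above could look roughly like the following sketch. Everything here is an assumption for illustration: the key format, the `SECRET` value and the function names are hypothetical, and an HMAC needs the secret to be present wherever the key is verified, so a shipped copy of this check can be read or patched out like any other Python code.

```python
import hashlib
import hmac
from datetime import date

# Hypothetical vendor secret. If this ships inside the product, a customer
# can extract it and forge keys; it is here only to keep the sketch short.
SECRET = b"hypothetical-vendor-secret"


def _sign(payload: str) -> str:
    """Return a hex HMAC-SHA256 signature for the payload."""
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_license(customer: str, expires: date) -> str:
    """Build a key of the form 'customer|YYYY-MM-DD|signature'."""
    payload = f"{customer}|{expires.isoformat()}"
    return f"{payload}|{_sign(payload)}"


def is_valid(license_key: str, today: date) -> bool:
    """Accept the key only if the signature matches and it has not expired."""
    try:
        customer, expires_text, signature = license_key.rsplit("|", 2)
        expires = date.fromisoformat(expires_text)
    except ValueError:
        return False  # malformed key or unparsable date
    expected = _sign(f"{customer}|{expires_text}")
    # Constant-time comparison; encode so non-ASCII input cannot raise.
    return hmac.compare_digest(signature.encode(), expected.encode()) and today <= expires
```

For example, `is_valid(issue_license("ACME", date(2030, 1, 1)), date.today())` stays true until the expiry date passes, and editing the date inside the key invalidates the signature.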
Solution 3
Python is not the tool you need
You must use the right tool to do the right thing, and Python was not designed to be obfuscated. It's the contrary; everything is open or easy to reveal or modify in Python because that's the language's philosophy.
If you want something you can't see through, look for another tool. This is not a bad thing; it is important that several different tools exist for different usages.
Obfuscation is really hard
Even compiled programs can be reverse-engineered, so don't think that you can fully protect any code. You can analyze obfuscated PHP, break the Flash encryption key, etc. Newer versions of Windows are cracked every time.
Having a legal requirement is a good way to go
You cannot prevent somebody from misusing your code, but you can easily discover if someone does. Therefore, it's just a casual legal issue.
Code protection is overrated
Nowadays, business models tend to go for selling services instead of products. You cannot copy, pirate, or steal a service. Maybe it's time to consider going with the flow...
Solution 4
Compile Python and distribute binaries!
Sensible idea:
Use Cython, Nuitka, Shed Skin or something similar to compile Python to C code, then distribute your app as Python binary libraries (pyd) instead.
That way, no Python (byte) code is left and you've done any reasonable amount of obfuscation anyone (i.e. your employer) could expect from regular code, I think. (.NET or Java are less safe than this case, as that bytecode is not obfuscated and can relatively easily be decompiled into reasonable source.)
Cython is getting more and more compatible with CPython, so I think it should work. (I'm actually considering this for our product. We're already building some third-party libs as pyd/dlls, so shipping our own Python code as binaries is not an overly big step for us.)
See This Blog Post (not by me) for a tutorial on how to do it. (thx @hithwen)
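As a rough idea of what the Cython route involves, here is a minimal build-script sketch. It assumes Cython and setuptools are installed; the package name `mypackage`, the path `mypackage/core.py` and the helper names are placeholders, not taken from the linked post.

```python
# setup.py (hypothetical): compile selected pure-Python modules to extension
# modules (.pyd on Windows, .so elsewhere) with Cython.
from pathlib import Path

# Placeholder list: the modules you want to ship as binaries.
MODULES = ["mypackage/core.py"]


def existing_modules(paths):
    """Return only the paths that exist, so a wrong placeholder fails softly."""
    return [p for p in paths if Path(p).is_file()]


def build(paths):
    """Run the equivalent of `python setup.py build_ext --inplace`."""
    # Imported lazily so the file can be loaded without Cython installed.
    from setuptools import setup
    from Cython.Build import cythonize

    setup(
        name="mypackage",
        ext_modules=cythonize(paths, compiler_directives={"language_level": "3"}),
        script_args=["build_ext", "--inplace"],
    )


if __name__ == "__main__":
    found = existing_modules(MODULES)
    if found:
        build(found)
    else:
        print("Nothing to compile: adjust MODULES to point at your sources.")
```

After a successful build you would distribute the generated binaries and leave the corresponding `.py` files out of the package. A C compiler is needed on the build machine.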
Crazy idea:
You could probably get Cython to store the C-files separately for each module, then just concatenate them all and build them with heavy inlining. That way, your Python module is pretty monolithic and difficult to chip at with common tools.
Beyond crazy:
You might be able to build a single executable if you can link to (and optimize with) the python runtime and all libraries (dlls) statically. That way, it'd sure be difficult to intercept calls to/from python and whatever framework libraries you use. This cannot be done if you're using LGPL code though.
Solution 5
I understand that you want your customers to use the power of Python but do not want to expose the source code.
Here are my suggestions:
(a) Write the critical pieces of the code as C or C++ libraries and then use SIP or SWIG to expose the C/C++ APIs to the Python namespace.
(b) Use Cython instead of Python.
(c) In both (a) and (b), it should be possible to distribute the libraries as licensed binary with a Python interface.
Jordfräs
Updated on July 08, 2022
Comments
-
Jordfräs almost 2 years
I am developing a piece of software in Python that will be distributed to my employer's customers. My employer wants to limit the usage of the software with a time-restricted license file.
If we distribute the .py files or even .pyc files it will be easy to (decompile and) remove the code that checks the license file.
Another aspect is that my employer does not want the code to be read by our customers, fearing that the code may be stolen or at least the "novel ideas".
Is there a good way to handle this problem?
-
Marc Stober over 15 years
Even if the license-checking code were hard to reverse engineer because it's written in C, wouldn't it still be relatively easy to remove the calls to the license-checking code?
-
yantrab over 15 years
Yes it would, depending on where the license check is performed. If there are many calls to the extension, it could be difficult to eradicate. Or you can move some other crucial part of the application into the license check as well so that removing the call to the extension cripples the app.
-
yantrab over 15 years
Really, all of this work is not about preventing modification, but about increasing its difficulty so that it's no longer worth it. Anything can be reverse-engineered and modified if there's enough benefit.
-
Ali Afshar over 15 years
+1 for the signing; -1 for the obfuscator. You can at least prevent the code from being changed.
-
ddaa over 15 years
Signing does not work in this context. It's always possible to bypass the signature-checking loader. The first thing you need for useful software protection is an opaque bootstrap mechanism. Not something that Python makes easy.
-
Ali Afshar over 15 years
Yes, bootstrap in non-Python.
-
Chii over 15 years
But the fact that it was gotten past meant that they failed - the bottom line is just don't try, but go for legal protection.
-
Abgan over 15 years
Or validate the licence not only on startup but in several other places. Can be easily implemented, and can severely increase the time to bypass.
-
intuited almost 14 years
+1 (back to 0): it seems the only true solution to the problem, assuming such an approach to be practical for the setting.
-
Nick T over 13 years
What if they release software to customers, and the customer modifies it internally without re-releasing it?
-
Aaron Digulla over 13 years
@Nick: Doesn't change the situation in any way. See my edits.
-
Brian over 13 years
@Blair Conrad: Not if the license-checking code hides functionality, too. E.g.
mylicensedfunction(licenseblob liblob, int foo, int bar, std::string bash)
-
TryPyPy over 13 years
Other possibilities in the same vein: Shed Skin code.google.com/p/shedskin and Nuitka kayhayen24x7.homelinux.org/blog/nuitka-a-python-compiler
-
johndodo over 12 years
Python is not the tool you need. Malbolge is. :)
-
Dasun over 12 years
I think the clever way is to implement the critical parts in C and put all the license-checking code in there. (I use a hardware dongle which can perform some calculations inside it, so it's almost impossible to reverse.)
-
DevPlayer almost 12 years
Be aware that if your licensing web server goes down or the customer's internet access is down, your customer will not be happy that they can't run their business because of loss of access to licensing checks.
-
Cees Timmerman over 11 years
I think that simply bundles the .pyc files. Cython, Shed Skin, and PyPy go beyond bytecode.
-
Mitar over 11 years
Is there any information published on how to get past these protection mechanisms?
-
Filipe about 11 years
I just took a look at Shed Skin as suggested by TryPyPy and it appears to be really good stuff!
-
Thomas Browne about 11 years
Like this extreme idea. Gets it out there in a huge way and massive market share, then you have a very big customer base for support and addons. I have also been grappling with this question and all the "licensing" answers are basically bull because it doesn't protect against widespread copying, yet doesn't give you any market share advantage.
-
Jordan almost 11 years
+1 for stealing ideas back. Why limit your client-serving power to your in-house solutions, when you could see how others improve on your solution and accordingly improve your own product? "If you have an apple and I have an apple and we exchange these apples then you and I will still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas."
-
Wowbagger and his liquid lunch over 10 years
Good answer, but "casual legal issue"? Really? Where do you live that you have any legal issues that are casual?
-
Mike McKerns about 10 years
I've actually seen commercial Python code shipped as embedded Python inside of a C library. Instead of converting some parts of the code to C, they hide the entire Python code inside a protective C layer. Then, if they want a module importable by Python, they write a thin Python extension on top of the C. Open source is a much easier way of life.
-
Jeffrey about 10 years
@DevPlayer There are solutions to this. You could implement a local key mechanism that allows temporary access when the software cannot reach the remote licensing server.
-
sergzach about 9 years
I think that if we knew the frequency (how often expensive obfuscated code is hacked), we could judge the practicability of using Python with obfuscated code.
-
Oddthinking about 9 years
@Jeffrey: That gets you right back to where you started - how do you protect that code? To be safer, you need to put some of the key functionality on your own server, so replacing it would involve substantial effort (at which point, why not just start an open-source competitor?)
-
m3nda over 8 years
The good point of this is that it demoralizes anyone who tries to decode the functionality. Combine that with Cython and some extra encryption over modules or internet calls, and you probably get the prize.
-
m3nda over 8 years
Point 2 is even more important. If it's cheaper to buy than to reverse engineer, plus yearly updates, no one will try, and even if someone does, no one will pay a hacker instead of the provider of the software.
-
Daniel over 8 years
Would compiling with Cython work with a Python 3.4 Django app, or could it be made to work without a huge amount of effort?
-
ICTMitchell over 8 years
@Daniel: Not sure. Haven't tried on Django. Feel free to post a new question about that.
-
Daniel over 8 years
-
ICTMitchell over 8 years
@mlvljr FWIW, IMHO compiling to binaries is a nice tradeoff between selling all your secrets and trying to protect against NSA-class reverse engineering. Esp. if you have a big Python code base and reasons to be paranoid. ;)
-
Delali over 6 years
That's true. Reverse engineering is doable but expensive in most situations. @S.Lott, I believe point 6 holds more importance based on the question. If the source code really needs to be protected then it should be remote from the end user.
-
Delali over 6 years
If your code has interesting features, whoever was able to misuse it would redistribute it @Macke
-
Skandix almost 6 years
What if one of your customers re-releases your code or the ideas for free and anonymously? You can't tell who did it and sue them, and because they didn't benefit from it, you won't either. This will ruin your work while one of your customers only paid the basic price for it. (Obviously this only works if you have more than one customer for your solution.)
-
Aaron Digulla almost 6 years
@Skandix How exactly would that work? Uploading your work on the Internet doesn't harm you. It would start to harm you if a lot of people would find it AND those people would be paying customers instead. Code theft is a myth. "My knowledge is for free, my time is expensive" (not sure who said that).
-
qg_java_17137 almost 6 years
hithwen's POST is invalid now.
-
user666412 almost 6 years
Is Cython a viable alternative for this?
-
code_dredd over 5 years
I know this is old, but I must note that storing crypto keys or anything sensitive inside an executable is a Bad Idea™. The fact that the executable is compiled to binary is irrelevant. You can simply run the strings command against it and you'll have the key right there in front of you.
-
A Simple Algorithm about 5 years
Question: "is there a good way to protect my family and myself from being murdered by intruders in our sleep?" Internet: "No. Anyone can be gotten to, and no dwelling is ever 100 percent impenetrable. A mortal human family is the wrong tool for the job."
-
Anonymous almost 5 years
A Windows fix would be to compile the file into a .pyd and import it later.
-
markroxor almost 5 years
The only thing this package managed to accomplish is to fool the 'obfuscator' that the code is obfuscated.
-
Vicrobot over 4 years
This was producing errors when I tried it. I think it mishandled the data and didn't fully convert it.
-
TomSawyer over 4 years
Doesn't work for a whole project or a template engine, since it needs variable names to display in the template.
-
jjmontes over 4 years
Point 5 could not be applied on the same assumption that it can be reverse engineered and cracked.
-
Make42 about 4 years
How in the world would you "easily discover if someone does"?
-
Make42 about 4 years
How would I steal anything back? They just put the code into their product and don't tell anyone how it works and just sell it. How would I ever find out that they used my code in the first place?
-
Aaron Digulla about 4 years
@Make42 If you never noticed, how would that be different from it not happening? It simply would have no effect on you at all. So to make a case here, you must know they "stole" your code.
-
Make42 about 4 years
@AaronDigulla: I don't think this applies. The other people the thief sells to might have become my customers, but I would never know. Say someone A gives B some money to deliver to me, unbeknownst to me, and B steals the money by keeping it. I might not be sad, because I never knew, but I still have less money than if B had kept the promise.
-
Make42 about 4 years
But the upgrades are also just give-aways... so how would they charge for that? Wouldn't it just be the support?
-
Make42 about 4 years
Regarding the WingIDE business model: Support is a service, software a product. Products scale, services don't. Support is only a good business model if there is no other business model - meaning, if nobody would buy your product (for whatever reason), you give the product away, so that you have a customer base that at least buys your service.
-
Aaron Digulla about 4 years
@Make42 How does obfuscation help here? They can still sell your software unchanged to anyone without sharing with you. I think a good example is MongoDB and similar services where a large group of people has put a lot of effort in and GAFA "steals" their income by selling services around the products without giving back. Being open in your product is what creates a bigger market. Being able to provide services around the product generates most of the income. The product itself is just a door opener.
-
Make42 about 4 years
@AaronDigulla: Protecting your code (not necessarily obfuscation) helps with two points: 1) Code is rarely transferable 100%. Usually one needs to tweak a couple of things. Often this is very little (so it is worth changing), but without these changes, the code is unusable in the new project. Protecting your code protects your work from being used in projects in which you are not involved. 2) When you sell software, what you are actually selling is the license key (as you most likely well know). Protecting your code ensures that the mechanism for needing the key cannot be circumvented.
-
Make42 about 4 years
Regarding MongoDB: Your example is an example for my point: They opened up the code and at the end of the day, GAFA steals their income. You want to say that selling services is worth it. Sure, but MongoDB didn't.
-
Make42 about 4 years
Regarding "My knowledge is for free, my time is expensive": No it isn't. Some projects are finished by me in a couple of days, but it took years, if not decades, to acquire that knowledge. In fact, that is (in an ideal world) the reason why someone with a long education (say a PhD) gets paid more than someone who just received training on the job (e.g. a cleaner): They are reimbursed for the unpaid years in which they are expected to have acquired a lot of knowledge, while cleaning does not require a lot of training. (This is not meant to be disrespectful to the cleaner as a human though.)
-
Aaron Digulla about 4 years
@Make42 MongoDB is a point for both of us. It's an example where most people feel this is "obvious" theft. Obfuscation wouldn't have helped since GAFA can use code as is. That said, it's a corner case because it's a database and those are meant to be useful as is. I agree that you're selling a license key but that key is impossible to protect from code - any code-based protection can be circumvented; it's just a question of effort. Therefore, the license key is protected by a contract (software license or good old commercial product).
-
Aaron Digulla about 4 years
My core argument is: Obfuscating code is usually a waste of time. Set up a software license or get a lawyer. Adding some basic protection like a license file demonstrates that you want to protect your code against "theft" which is good enough in court (they care about intention, not the height of technical hurdles).
-
Make42 about 4 years
Regarding "as is"-usefulness: Yes, that may be the case here, but for many software applications in the professional context this is not the case.
-
Make42 about 4 years
"It is just a question of effort" - certainly! But if the cost of the effort is larger than the cost of my actual product or service, then I have protected my code well enough, because it is more economical to pay me than to steal from me. That is the hurdle I have to set up. The license of course will have an expiration date.
-
Make42 about 4 years
"Obfuscating code is usually a waste of time." - when I talk about protection I mean something like compiling Python to C and C to bytecode, not using pyminifier. When I hear "obfuscating" I think of pyminifier, not the former approach. (Maybe we misunderstand each other.) Well, I think that if using Nuitka takes me a couple of hours after a project that took me a year, then it does not seem to be a lot of time spent, and if I protect my code enough this way (in the mentioned economical sense), it does not seem to be a waste to me.
-
Make42 about 4 years
Winning a court case is expensive and difficult and harmful to everyone involved. The moment you need lawyers, you already are in deep trouble.
-
Darian about 4 years
Link goes to example.com.
-
Alex almost 4 years
Yes, but not if you distribute that exact Python version with your obfuscated code.
-
Preethi Vaidyanathan almost 4 years
This library does not seem to be maintained, and gives me indentation errors. I am using Python 3.7.
-
Preethi Vaidyanathan almost 4 years
This is an opinion, not a technical answer. I agree that obfuscation doesn't mean your code is completely locked down, but it does prevent low-level hacks and makes sense depending on your use case.
-
greendino almost 4 years
Yep. I can confirm pyminifier is dead.
-
TheTechRobo Stands for Ukraine almost 4 years
pyminifier may be dead, but I found this repo last push was in May 2020... (how I found it: techgaun.github.io/active-forks/index.html#liftoff/… fork seems to add other people's fixes (probably by looking at the open pull requests on the original repo)...
-
phybarin over 3 years
'Code protection is overrated': what do you think about on-premise services?
-
mathewguest over 2 years
autopy2exe compiles in and ships a portable Python installation with the distributable in a single <application.exe> file format. Note: also Linux-compatible. It can be complex and a pain point to manage Python installations on client computers.
-
Tejas Tank over 2 years
With technology and upgrades, everything is possible.
-
user1633272 over 2 years
It seems that the bytecode could be easily uncompiled using uncompyle6.