Is there a JavaScript library to convert HTML to Markdown?

Solution 1

I've started a project to do this:

https://github.com/domchristie/turndown

It's still in its early stages, so it has not been heavily tested, but it's a start.

Feedback/contributions welcome.

Solution 2

I have also collaborated on a project on GitHub that does this. At the moment, it has only been tested in the browser.

html2markdown

I have done a lot of testing on the web and added a ton of unit tests. It's still not perfect, but it works nicely. Feedback is welcome, and I will be happy to receive pull requests or fix any defects you find.

Solution 3

Theoretically, you can convert it back. You'd have to write your own DOM traversal code and convert the HTML back to Markdown.
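
To make the "write your own DOM traversal" idea concrete, here is a minimal sketch (not taken from Turndown or html2markdown) that recursively walks a DOM tree and emits Markdown for a few common elements. The function names `htmlToMarkdown`, `nodeToMarkdown` and `childrenToMarkdown` are hypothetical; the code relies only on standard DOM node properties (`nodeType`, `nodeName`, `childNodes`, `nodeValue`, `getAttribute`). It deliberately ignores the hard parts (escaping Markdown characters, nested lists, tables, `<pre>`), which is where a real library earns its keep.

```javascript
// Minimal HTML-to-Markdown sketch: recursively walk a DOM tree and emit
// Markdown for a few common elements. Only standard DOM node properties are
// used, so it runs on browser DOM nodes (in Node.js you would need a DOM
// implementation such as jsdom). Markdown special characters are NOT
// escaped, and nested lists, tables and <pre> blocks are not handled.

const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

function childrenToMarkdown(node) {
  return Array.from(node.childNodes).map(nodeToMarkdown).join("");
}

function nodeToMarkdown(node) {
  if (node.nodeType === TEXT_NODE) {
    // Collapse whitespace runs to one space, roughly as HTML rendering does.
    return node.nodeValue.replace(/\s+/g, " ");
  }
  if (node.nodeType !== ELEMENT_NODE) return ""; // skip comment nodes etc.

  const tag = node.nodeName.toLowerCase();
  const inner = childrenToMarkdown(node);

  // <h1>..<h6> become ATX headings ("#" .. "######").
  const heading = /^h([1-6])$/.exec(tag);
  if (heading) return `${"#".repeat(Number(heading[1]))} ${inner.trim()}\n\n`;

  switch (tag) {
    case "p": return `${inner.trim()}\n\n`;
    case "strong":
    case "b": return `**${inner}**`;
    case "em":
    case "i": return `_${inner}_`;
    case "code": return "`" + inner + "`";
    case "a": return `[${inner}](${node.getAttribute("href") || ""})`;
    case "br": return "  \n"; // two trailing spaces = hard line break
    case "ul": return `${inner}\n`;
    case "li": return `* ${inner.trim()}\n`;
    default: return inner; // unsupported tag: drop the tag, keep its content
  }
}

// Convert the children of `root` (e.g. document.body) to a Markdown string.
function htmlToMarkdown(root) {
  return childrenToMarkdown(root)
    .replace(/\n[ \t]+/g, "\n") // drop spaces left at line starts by whitespace-only text nodes
    .replace(/\n{3,}/g, "\n\n") // at most one blank line between blocks
    .trim() + "\n";
}
```

In a browser you could call `htmlToMarkdown(document.body)`. Note the design choice in the `default` branch: unsupported tags such as `<span>` are dropped and only their text is kept; converters differ on whether to do this or to pass such tags through as raw HTML.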

Generally, Markdown is thought of as the human-readable/writable source of the information, which is converted to HTML for further markup and styling.

HTML can be much more complex than Markdown: it can be arbitrarily nested and partitioned into tags. This is why writing a general-purpose converter that reliably turns HTML back into Markdown is so questionable. Just imagine all the whitespace and paragraph breaks going bye-bye and possibly leaving a terrible mess for the human eye.

My suggestion is: unless you generate the original HTML yourself and know what it contains, don't convert it back to Markdown. Keep the Markdown version at all times and convert it to HTML when needed.


Author: Ethan

Updated on January 03, 2020

Comments

  • Ethan (over 4 years ago)

    There is showdown.js to convert Markdown to HTML, and PHP Markdown to convert Markdown to and from HTML. My question is: is there a JavaScript library to convert HTML to Markdown?

    • Dean Harding (about 14 years ago)
      Correct me if I'm wrong, but I don't think either of those libraries converts HTML to Markdown. I don't think it's possible in general, since the Markdown-to-HTML conversion is lossy (that is, data that would be required to convert back again is lost in the conversion).
    • Ethan (about 14 years ago)
      Why do you think that markdown->HTML is lossy? I think HTML->markdown is lossy, because every markdown syntax has its HTML equivalent, but not vice versa.
    • Nick Craver (about 14 years ago)
      @Ethan - When whatever it was went through conversion from Markdown to HTML in the first place, it lost data, e.g. extra returns. There's no way to restore that data completely accurately; it's gone. You'll notice SO stores both the original text and the HTML version of each post; this is one of the reasons.
    • Ethan (about 14 years ago)
      @Nick: Also, some HTML tags have more than one Markdown equivalent; for example, <h2> can be written either as ## or as ----. But what I am looking for is something that can convert HTML to "standard" Markdown, i.e., stripping out extra returns and unsupported HTML tags, using ---- for all headings, and so on.
    • Justin Johnson (about 14 years ago)
      Complete and accurate restoration is not part of the OP's question. The lossiness of Markdown-to-HTML conversion is irrelevant unless the OP says otherwise. At any rate, whitespace is already lost when HTML is rendered, unless, of course, it's in a pre tag.
    • Shikiryu (over 13 years ago)
      Apparently, after much research, it doesn't exist. I should write a "DIY answer" :)
    • Marco Demaio (almost 13 years ago)
      showdown.js is gone as well as WMD! :(
    • jonschlinkert (over 7 years ago)
      I created github.com/breakdance/breakdance to do this. Every other solution I found leaves too much junk in the result. IMHO, if you're converting to Markdown, you probably aren't interested in keeping tags that don't work in Markdown.
  • Jordan Reiter (almost 13 years ago)
    One case where you'd want to convert HTML -> Markdown is in a WYSIWYG editor. Most of them provide the text as HTML or XHTML. It'd be nice to convert that into Markdown for storage.
  • aleemb (almost 12 years ago)
    Cool project, thanks very much for sharing.
  • Phoenix (over 11 years ago)
    Of all the html-to-markdown converters I've looked at, this one fit my needs best. The rest moved the links to the bottom, like annotations.
  • GaryBishop (about 11 years ago)
    Not constructive my A$$! This is great!
  • Paul Verest (almost 11 years ago)
    This is the same as Dom Christie's answer.
  • tilgovi (almost 11 years ago)
    Paul, while the projects have the same goal they are separate implementations. This is not the same answer. It was helpful to me to be able to look at both and compare them.
  • fer (about 9 years ago)
    Lovely stuff! It would be great if it finally supports GitHub Flavored Markdown too.
  • Dom Christie (about 9 years ago)
    @fer feel free to try: github.com/domchristie/to-markdown/tree/gfm. I can’t say when it’s going to be merged, but it’d definitely help to have some testing done :)
  • fer (about 9 years ago)
    Good! I will have a look and see if I can contribute... great job, Dom!
  • Admin (over 7 years ago)
    Can the library be used on the server side? I installed this package with npm install in Node.js, but it has no effect when I use it: the result doesn't change.
  • Dom Christie (over 7 years ago)
    Yes, the library should work on the server side (node version 4+), and is available on NPM. If you are having problems using it, please raise an issue on the GitHub repository. Thanks.
  • stackovermat (over 5 years ago)
    The link is broken and the project has been deprecated. The new project with the same goal can be found here: Turndown.
  • Alex G (over 3 years ago)
    Thank you so much for creating this project! Exactly what I am looking for!