mirror of https://github.com/gryf/ebook-converter.git synced 2026-03-25 11:53:33 +01:00

Compare commits


6 Commits

Author SHA1 Message Date
Vitaliy Krasnoperov
c89fc132b8 Fix unsmarten text option (#16)
* Create unsmarten.py

* Update unsmarten.py

* Update unsmarten.py

* Create unsmarten.py
2026-02-06 09:06:12 +01:00
8b8a92e9fd Removed license classifier in favor of SPDX entry. 2025-04-18 16:06:33 +02:00
6b7f796cfb README update 2025-03-19 21:28:37 +01:00
72d0858ad8 Move from setup.cfg/py to pure pyproject.toml project definition 2025-03-13 16:55:40 +01:00
4f548ec882 Merge pull request #10 from zagura/add-pyproject-toml
Add pyproject.toml
2025-03-13 12:51:51 +01:00
Michał Zagórski
0faa2c0758 Add pyproject.toml 2025-03-12 23:23:22 +01:00
8 changed files with 161 additions and 78 deletions


@@ -1,2 +0,0 @@
graft ebook_converter/data
exclude .gitignore


@@ -2,19 +2,19 @@
Ebook converter
===============
This is an impudent ripoff of the bits from `Calibre project`_, and is aimed only
for converter thing.
This is an impudent ripoff of the bits from `Calibre project`_, and is aimed
only for converter thing.
My motivation is to have only the converter for ebooks run from the commandline,
without all of those bells and whistles Calibre has, and with cleanest more
*pythonic* approach.
My motivation is to have only the converter for ebooks run from the
commandline, without all of those bells and whistles Calibre has, and with
cleanest more *pythonic* approach.
Requirements
------------
To build and run ebook converter, you'll need:
- Python 3.6 or newer
- Python 3.10 or newer
- `Liberation fonts`_
- setuptools
- ``pdftohtml``, ``pdfinfo`` and ``pdftoppm`` from `poppler`_ project for
@@ -22,6 +22,20 @@ To build and run ebook converter, you'll need:
- ``libxml2-dev`` and ``libxslt-dev`` as dependencies for format manipulation
from some of the Calibre code
and several Python packages:
- `beautifulsoup4`_
- `css-parser`_
- `filelock`_
- `html2text`_
- `html5-parser`_
- `msgpack`_
- `odfpy`_
- `pillow`_
- `python-dateutil`_
- `setuptools`_
- `tinycss`_
No Python2 support. Even if Calibre probably still is able to run on Python2, I
do not have an intention to support it.
@@ -29,9 +43,9 @@ do not have an intention to support it.
What's supported
----------------
To be able to perform some optimization and make the converter more reliable and
easy to use, first I need to remove some of the features, which are totally not
crucial in my opinion, although they might be re-added later, like, for
To be able to perform some optimization and make the converter more reliable
and easy to use, first I need to remove some of the features, which are totally
not crucial in my opinion, although they might be re-added later, like, for
instance there is no automatic language translations depending on the locale
settings.
@@ -45,9 +59,10 @@ Windows is not currently supported, because of the original spaghetti code.
This may change in the future, after cleanup of mentioned pasta would be
completed.
So called `Kindle periodical` format is not supported, since all we do care are
local files. If there would be downloaded periodical thing (using Calibre for
example), it would be treated as common book.
So called *Kindle periodical* format (which `Amazon has`_ `killed`_ anyway back
in September 2023) is not supported, since all we do care are local files. If
there would be downloaded periodical thing (using Calibre for example), it
would be treated as common book.
Input formats
@@ -108,7 +123,7 @@ managers), i.e:
$ . venv/bin/activate
(venv) $ git clone https://github.com/gryf/ebook-converter
(venv) $ cd ebook-converter
(venv) $ pip install -r requirements.txt .
(venv) $ pip install .
Simple as that. And from now on, you can issue converter:
@@ -123,9 +138,20 @@ License
This work is licensed on GPL3 license, like the original work. See LICENSE file
for details.
.. _Calibre project: https://calibre-ebook.com/
.. _pypi: https://pypi.python.org
.. _Liberation fonts: https://github.com/liberationfonts/liberation-fonts
.. _Kindle periodical: https://sellercentral.amazon.com/gp/help/external/help.html?itemID=202047960&language=en-US
.. _Amazon has: https://goodereader.com/blog/kindle/amazon-will-discontinue-newspaper-and-magazine-subscriptions-in-september
.. _killed: https://www.theverge.com/23861370/amazon-kindle-periodicals-unlimited-ended
.. _poppler: https://poppler.freedesktop.org/
.. _beautifulsoup4: https://www.crummy.com/software/BeautifulSoup
.. _css-parser: https://github.com/ebook-utils/css-parser
.. _filelock: https://github.com/tox-dev/py-filelock
.. _html2text: https://github.com/Alir3z4/html2text
.. _html5-parser: https://html5-parser.readthedocs.io
.. _msgpack: https://msgpack.org
.. _odfpy: https://github.com/eea/odfpy
.. _pillow: https://python-pillow.github.io
.. _python-dateutil: https://github.com/dateutil/dateutil
.. _setuptools: https://setuptools.pypa.io
.. _tinycss: http://tinycss.readthedocs.io


@@ -0,0 +1,28 @@
__license__ = 'GPL 3'
__copyright__ = '2011, John Schember <john@nachtimwald.com>'
__docformat__ = 'restructuredtext en'
from ebook_converter.ebooks.oeb.base import OEB_DOCS, XPath
from ebook_converter.ebooks.oeb.parse_utils import barename
from ebook_converter.utils.unsmarten import unsmarten_text
class UnsmartenPunctuation:

    def __init__(self):
        self.html_tags = XPath('descendant::h:*')

    def unsmarten(self, root):
        for x in self.html_tags(root):
            if not barename(x.tag) == 'pre':
                if getattr(x, 'text', None):
                    x.text = unsmarten_text(x.text)
                if getattr(x, 'tail', None) and x.tail:
                    x.tail = unsmarten_text(x.tail)

    def __call__(self, oeb, context):
        bx = XPath('//h:body')
        for x in oeb.manifest.items:
            if x.media_type in OEB_DOCS:
                for body in bx(x.data):
                    self.unsmarten(body)
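
Reviewer note: the traversal above rewrites both `text` and `tail` of every descendant element except `pre` itself. A minimal sketch of that walk, assuming the standard library's `xml.etree.ElementTree` in place of the project's `XPath` helper and a hypothetical `plain_quotes` function in place of `unsmarten_text`, might look like this:

```python
import xml.etree.ElementTree as ET


def plain_quotes(text):
    # Hypothetical stand-in for unsmarten_text: handles curly double quotes only.
    return text.replace('\u201c', '"').replace('\u201d', '"')


def unsmarten_tree(root):
    # Visit every descendant of root; root itself is skipped, which mirrors
    # the 'descendant::' axis used in the diff.
    for el in root.iter():
        if el is root:
            continue
        # A <pre> element keeps its own text and tail untouched.
        if el.tag != 'pre':
            if el.text:
                el.text = plain_quotes(el.text)
            if el.tail:
                el.tail = plain_quotes(el.tail)


body = ET.fromstring(
    '<body><p>\u201cHi\u201d <b>x</b> \u201ctail\u201d</p>'
    '<pre>\u201ckeep\u201d</pre></body>'
)
unsmarten_tree(body)
result = ET.tostring(body, encoding='unicode')
```

In this sketch the paragraph text and the tail after `<b>` end up with straight quotes, while the text inside `<pre>` keeps its curly quotes. Elements nested inside a `<pre>` would still be processed, which is also how the committed code behaves.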


@@ -0,0 +1,40 @@
__license__ = 'GPL 3'
__copyright__ = '2011, John Schember <john@nachtimwald.com>'
__docformat__ = 'restructuredtext en'
from ebook_converter.utils.mreplace import MReplace
_mreplace = MReplace({
    '&#8211;': '--',
    '&ndash;': '--',
    '–': '--',
    '&#8212;': '---',
    '&mdash;': '---',
    '—': '---',
    '&#8230;': '...',
    '&hellip;': '...',
    '…': '...',
    '&#8220;': '"',
    '&#8221;': '"',
    '&#8222;': '"',
    '&#8243;': '"',
    '&ldquo;': '"',
    '&rdquo;': '"',
    '&bdquo;': '"',
    '&Prime;': '"',
    '“':'"',
    '”':'"',
    '„':'"',
    '″':'"',
    '&#8216;':"'",
    '&#8217;':"'",
    '&#8242;':"'",
    '&lsquo;':"'",
    '&rsquo;':"'",
    '&prime;':"'",
    '‘':"'",
    '’':"'",
    '′':"'",
})
unsmarten_text = _mreplace.mreplace
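
Reviewer note: `MReplace` comes from `ebook_converter.utils.mreplace`, whose internals are not part of this diff. As a rough illustration of what a one-pass, table-driven replacement does, here is a sketch built on the standard `re` module. The helper name `make_multi_replace` and the shortened table are assumptions for the example, not the project's implementation:

```python
import re


def make_multi_replace(table):
    # Try longer keys first so a long key is never shadowed by a shorter one.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(k) for k in keys))
    # Each match is looked up in the table and replaced in a single pass.
    return lambda text: pattern.sub(lambda m: table[m.group(0)], text)


# Shortened, illustrative subset of the mapping in the diff.
unsmarten_sketch = make_multi_replace({
    '&ndash;': '--',
    '\u2013': '--',
    '&mdash;': '---',
    '\u2014': '---',
    '\u2026': '...',
    '\u201c': '"',
    '\u201d': '"',
    '\u2018': "'",
    '\u2019': "'",
})

sample = unsmarten_sketch(
    '\u201cWait\u2026\u201d she said \u2014 it\u2019s 1990\u20131995.'
)
```

A single compiled alternation avoids chaining dozens of `str.replace` calls, and it guarantees that text produced by one replacement is not scanned again by another.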

pyproject.toml Normal file

@@ -0,0 +1,52 @@
[build-system]
requires = ["setuptools >= 77.0"]
build-backend = "setuptools.build_meta"
[project]
name = "ebook-converter"
version = "4.12.0"
requires-python = ">= 3.10"
description = "Convert ebook between different formats"
dependencies = [
"beautifulsoup4>=4.9.3",
"css-parser>=1.0.6",
"filelock>=3.0.12",
"html2text>=2020.1.16",
"html5-parser==0.4.12",
"msgpack>=1.0.0",
"odfpy>=1.4.1",
"pillow>=8.0.1",
"python-dateutil>=2.8.1",
"setuptools>=61.0",
"tinycss>=0.4"
]
readme = "README.rst"
authors = [
{name = "gryf", email = "gryf73@gmail.com"}
]
license = "GPL-3.0-or-later"
classifiers = [
"Environment :: Console",
"Intended Audience :: Other Audience",
"Operating System :: POSIX :: Linux",
"Development Status :: 3 - Alpha",
"Programming Language :: Python",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3 :: Only",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: 3.13"
]
[project.urls]
Repository = "https://github.com/gryf/ebook-converter"
[project.scripts]
ebook-converter = "ebook_converter.main:main"
[tool.setuptools.packages.find]
exclude = ["snap"]
[tool.setuptools.package-data]
"*" = ["*.types", "*.css", "*.html", "*.xhtml", "*.xsl", "*.json"]
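
Reviewer note: the `[project.scripts]` entry `ebook-converter = "ebook_converter.main:main"` uses the `module:attribute` form, so the generated command imports the module and calls the named object. A minimal sketch of that lookup is shown below. The helper `load_entry_point` is hypothetical, and `json:dumps` stands in for the real target so the example runs without the package installed:

```python
import importlib


def load_entry_point(spec):
    # Split 'package.module:attr.path' into the module and the attribute path.
    module_name, _, attr_path = spec.partition(':')
    obj = importlib.import_module(module_name)
    # Walk dotted attributes, if any were given after the colon.
    if attr_path:
        for name in attr_path.split('.'):
            obj = getattr(obj, name)
    return obj


# 'json:dumps' is only a stand-in for 'ebook_converter.main:main'.
func = load_entry_point('json:dumps')
encoded = func([1, 2])
```

After `pip install .`, the installer writes a small wrapper script named `ebook-converter` that performs an equivalent import and then calls `main()`.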


@@ -1,11 +0,0 @@
beautifulsoup4>=4.9.3
css-parser>=1.0.6
filelock>=3.0.12
html2text>=2020.1.16
html5-parser==0.4.9 --no-binary lxml
msgpack>=1.0.0
odfpy>=1.4.1
pillow>=8.0.1
python-dateutil>=2.8.1
setuptools>=50.3.2
tinycss>=0.4


@@ -1,46 +0,0 @@
[metadata]
name = ebook-converter
version = 4.12.0
summary = Convert ebook between different formats
description-file =
    README.rst
author = gryf
author-email = gryf73@gmail.com
license = GPL3
license_file = LICENSE
url = https://github.com/gryf/ebook-converter
classifier =
    Environment :: Console
    Intended Audience :: Other Audience
    License :: OSI Approved :: GNU General Public License v3 (GPLv3)
    Operating System :: POSIX :: Linux
    Development Status :: 3 - Alpha
    Programming Language :: Python
    Programming Language :: Python :: 3
    Programming Language :: Python :: 3 :: Only
    Programming Language :: Python :: 3.6
    Programming Language :: Python :: 3.7

[options]
packages = find:
include_package_data = True
install_requires =
    filelock
    python-dateutil
    lxml
    css-parser
    beautifulsoup4
    tinycss
    pillow
    msgpack
    html5-parser
    odfpy
    setuptools
    html2text

[options.entry_points]
console_scripts =
    ebook-converter=ebook_converter.main:main

[options.package_data]
* = *.types *.css, *.html, *.xsl


@@ -1,4 +0,0 @@
import setuptools
setuptools.setup()