mirror of https://github.com/gryf/ebook-converter.git synced 2026-03-25 11:53:33 +01:00

Compare commits

10 Commits

Author SHA1 Message Date
Vitaliy Krasnoperov
c89fc132b8 Fix unsmarten text option (#16)
* Create unsmarten.py

* Update unsmarten.py

* Update unsmarten.py

* Create unsmarten.py
2026-02-06 09:06:12 +01:00
8b8a92e9fd Removed license classifier in favor of SPDX entry. 2025-04-18 16:06:33 +02:00
6b7f796cfb README update 2025-03-19 21:28:37 +01:00
72d0858ad8 Move from setup.cfg/py to pure pyproject.toml project definition 2025-03-13 16:55:40 +01:00
4f548ec882 Merge pull request #10 from zagura/add-pyproject-toml
Add pyproject.toml
2025-03-13 12:51:51 +01:00
Michał Zagórski
0faa2c0758 Add pyproject.toml 2025-03-12 23:23:22 +01:00
d37850520b Remove getsize method of PIL in favor of getbbox 2025-03-10 18:33:05 +01:00
5e56cb8c7a Merge pull request #9 from NunoSempere/master
add dependencies, fix some typos
2025-02-10 16:43:31 +01:00
NunoSempere
084e0d11ce fix a few README typos
mostly the lack of "the". I've left some others which are more
charming
2025-01-05 22:32:56 +01:00
NunoSempere
4c3c5a9e27 add missing dependencies (found in Debian 12) 2025-01-05 22:30:09 +01:00
12 changed files with 170 additions and 85 deletions

.gitignore

@@ -3,3 +3,4 @@ build/
dist/
sdist/
*.egg-info/
venv/

@@ -1,2 +0,0 @@
graft ebook_converter/data
exclude .gitignore

@@ -2,24 +2,39 @@
Ebook converter
===============
This is impudent ripoff of the bits from `Calibre project`_, and is aimed only
for converter thing.
My motivation is to have only converter for ebooks run from commandline,
without all of those bells and whistles Calibre has, and with cleanest more
*pythonic* approach.
This is an impudent ripoff of the bits from `Calibre project`_, and is aimed
only for converter thing.
My motivation is to have only the converter for ebooks run from the
commandline, without all of those bells and whistles Calibre has, and with
cleanest more *pythonic* approach.
Requirements
------------
To build and run ebook converter, you'll need:
- Python 3.6 or newer
- Python 3.10 or newer
- `Liberation fonts`_
- setuptools
- ``pdftohtml``, ``pdfinfo`` and ``pdftoppm`` from `poppler`_ project for
conversion from PDF available in ``$PATH``
- ``libxml2-dev`` and ``libxslt-dev`` as dependencies for format manipulation
from some of the Calibre code
and several Python packages:
- `beautifulsoup4`_
- `css-parser`_
- `filelock`_
- `html2text`_
- `html5-parser`_
- `msgpack`_
- `odfpy`_
- `pillow`_
- `python-dateutil`_
- `setuptools`_
- `tinycss`_
No Python2 support. Even if Calibre probably still is able to run on Python2, I
do not have an intention to support it.
@@ -28,9 +43,9 @@ do not have an intention to support it.
What's supported
----------------
To be able to perform some optimization and make converter more reliable and
easy to use, first I need to remove some of the features, which are totally not
crucial in my opinion, although they might be re-added later, like, for
To be able to perform some optimization and make the converter more reliable
and easy to use, first I need to remove some of the features, which are totally
not crucial in my opinion, although they might be re-added later, like, for
instance there is no automatic language translations depending on the locale
settings.
@@ -44,15 +59,16 @@ Windows is not currently supported, because of the original spaghetti code.
This may change in the future, after cleanup of mentioned pasta would be
completed.
So called `Kindle periodical` format is not supported, since all we do care are
local files. If there would be downloaded periodical thing (using Calibre for
example), it would be treated as common book.
So called *Kindle periodical* format (which `Amazon has`_ `killed`_ anyway back
in September 2023) is not supported, since all we do care are local files. If
there would be downloaded periodical thing (using Calibre for example), it
would be treated as common book.
Input formats
~~~~~~~~~~~~~
Currently, I've tested following input formats:
Currently, I've tested the following input formats:
- Microsoft Word 2007 and up (``docx``)
- EPUB, both v2 and v3 (``epub``)
@@ -107,7 +123,7 @@ managers), i.e:
$ . venv/bin/activate
(venv) $ git clone https://github.com/gryf/ebook-converter
(venv) $ cd ebook-converter
(venv) $ pip install -r requirements.txt .
(venv) $ pip install .
Simple as that. And from now on, you can issue converter:
@@ -122,9 +138,20 @@ License
This work is licensed on GPL3 license, like the original work. See LICENSE file
for details.
.. _Calibre project: https://calibre-ebook.com/
.. _pypi: https://pypi.python.org
.. _Liberation fonts: https://github.com/liberationfonts/liberation-fonts
.. _Kindle periodical: https://sellercentral.amazon.com/gp/help/external/help.html?itemID=202047960&language=en-US
.. _Amazon has: https://goodereader.com/blog/kindle/amazon-will-discontinue-newspaper-and-magazine-subscriptions-in-september
.. _killed: https://www.theverge.com/23861370/amazon-kindle-periodicals-unlimited-ended
.. _poppler: https://poppler.freedesktop.org/
.. _beautifulsoup4: https://www.crummy.com/software/BeautifulSoup
.. _css-parser: https://github.com/ebook-utils/css-parser
.. _filelock: https://github.com/tox-dev/py-filelock
.. _html2text: https://github.com/Alir3z4/html2text
.. _html5-parser: https://html5-parser.readthedocs.io
.. _msgpack: https://msgpack.org
.. _odfpy: https://github.com/eea/odfpy
.. _pillow: https://python-pillow.github.io
.. _python-dateutil: https://github.com/dateutil/dateutil
.. _setuptools: https://setuptools.pypa.io
.. _tinycss: http://tinycss.readthedocs.io

@@ -62,7 +62,7 @@ class PMLOutput(OutputFormatPlugin):
im = Image.open(io.BytesIO(item.data))
else:
im = Image.open(io.BytesIO(item.data)).convert('P')
im.thumbnail((300,300), Image.ANTIALIAS)
im.thumbnail((300,300), Image.LANCZOS)
data = io.BytesIO()
im.save(data, 'PNG')
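In the hunk above, `im.thumbnail((300,300), Image.LANCZOS)` shrinks the image in place so it fits inside a 300 by 300 box while keeping its aspect ratio, and it never enlarges an image that already fits. The filter argument only affects resampling quality, not the resulting size. `Image.ANTIALIAS` was an alias for the same Lanczos filter and was removed in Pillow 10, which is presumably why it was swapped out here. The pure-Python sketch below approximates only the size computation; `fit_within` is a hypothetical helper, and Pillow's own rounding may differ in edge cases.

```python
from typing import Tuple

Size = Tuple[int, int]


def fit_within(size: Size, box: Size) -> Size:
    """Approximate the target size of a thumbnail-style resize.

    Hypothetical helper: scales (width, height) down uniformly so it fits in
    `box`, never scales up, and keeps each side at least 1 pixel.
    """
    width, height = size
    max_w, max_h = box
    scale = min(max_w / width, max_h / height, 1.0)  # 1.0 cap: never enlarge
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within((1200, 600), (300, 300)))  # (300, 150)
print(fit_within((100, 50), (300, 300)))    # (100, 50): already fits
```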

@@ -1012,7 +1012,7 @@ class HTMLConverter(object):
self.image_memory.append(pt) # Neccessary, trust me ;-)
try:
im.resize((int(width), int(height)),
PILImage.ANTIALIAS).save(pt, encoding)
PILImage.LANCZOS).save(pt, encoding)
pt.close()
self.scaled_images[path] = pt
return pt.name
@@ -1970,7 +1970,7 @@ def process_file(path, options, logger):
options.cover = cf.name
tim = im.resize((int(0.75 * th), th),
PILImage.ANTIALIAS).convert('RGB')
PILImage.LANCZOS).convert('RGB')
tf = PersistentTemporaryFile(prefix=__appname__ + '_',
suffix=".jpg")
tf.close()

@@ -145,7 +145,7 @@ class Cell(object):
continue
word = token.split()
word = word[0] if word else ""
width = font.getsize(word)[0]
width = font.getbbox(word)[2]
if width > mwidth:
mwidth = width
return parindent + mwidth + 2
@@ -191,7 +191,7 @@ class Cell(object):
if (ff, fs) != (ts['fontfacename'], ts['fontsize']):
font = get_font(ff, self.pts_to_pixels(fs))
for word in token.split():
width, height = font.getsize(word)
_, _, width, height = font.getbbox(word)
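Pillow 10 removed `FreeTypeFont.getsize()`; its replacement `getbbox(text)` returns a `(left, top, right, bottom)` box rather than a `(width, height)` pair. The hunks above keep the `right` and `bottom` values, which are measured from the drawing origin, while the tight size of the box itself is `right - left` by `bottom - top`. The two readings only agree when `left` and `top` are 0. The hypothetical helpers below illustrate that arithmetic on plain tuples (the sample box is made up), so they run without Pillow or a font file.

```python
from typing import Tuple

BBox = Tuple[int, int, int, int]  # (left, top, right, bottom), as getbbox returns


def extent_from_origin(bbox: BBox) -> Tuple[int, int]:
    """What the hunks above keep: the right and bottom edges, measured from (0, 0)."""
    _, _, right, bottom = bbox
    return right, bottom


def box_size(bbox: BBox) -> Tuple[int, int]:
    """Tight width and height of the bounding box itself."""
    left, top, right, bottom = bbox
    return right - left, bottom - top


sample = (1, 4, 41, 15)  # made-up box with a small left and top offset
print(extent_from_origin(sample))  # (41, 15)
print(box_size(sample))            # (40, 11)
```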
left, right, top, bottom = add_word(width, height, left, right, top, bottom, ls, ws)
return right+3+max(parindent, 10), bottom

@@ -0,0 +1,28 @@
__license__ = 'GPL 3'
__copyright__ = '2011, John Schember <john@nachtimwald.com>'
__docformat__ = 'restructuredtext en'

from ebook_converter.ebooks.oeb.base import OEB_DOCS, XPath
from ebook_converter.ebooks.oeb.parse_utils import barename
from ebook_converter.utils.unsmarten import unsmarten_text


class UnsmartenPunctuation:

    def __init__(self):
        self.html_tags = XPath('descendant::h:*')

    def unsmarten(self, root):
        for x in self.html_tags(root):
            if not barename(x.tag) == 'pre':
                if getattr(x, 'text', None):
                    x.text = unsmarten_text(x.text)
                if getattr(x, 'tail', None) and x.tail:
                    x.tail = unsmarten_text(x.tail)

    def __call__(self, oeb, context):
        bx = XPath('//h:body')
        for x in oeb.manifest.items:
            if x.media_type in OEB_DOCS:
                for body in bx(x.data):
                    self.unsmarten(body)
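The `UnsmartenPunctuation` transform above walks every XHTML element below each `<body>` (the `descendant::` axis excludes `<body>` itself) and rewrites both `.text` and `.tail`, skipping `<pre>` elements. A rough standard-library equivalent is sketched below using `xml.etree.ElementTree` instead of the project's `XPath` helper. `unsmarten_body`, `local_name` and the toy `demo_unsmarten` replacement are hypothetical names, and unlike the `h:` prefix in the original, the sketch does not check that elements are in the XHTML namespace. Like the original, it skips both the text and the tail of a `<pre>` element, but still visits elements nested inside one.

```python
import xml.etree.ElementTree as ET
from typing import Callable


def local_name(tag: str) -> str:
    """Strip an '{namespace}' prefix: '{http://www.w3.org/1999/xhtml}pre' -> 'pre'."""
    return tag.rpartition("}")[2]


def unsmarten_body(body: ET.Element, unsmarten: Callable[[str], str]) -> None:
    """Apply `unsmarten` to the text and tail of every descendant except <pre>."""
    for el in body.iter():
        if el is body:
            continue  # mirror descendant::, which excludes the context node
        if local_name(el.tag) == "pre":
            continue  # as in the original, both text and tail of <pre> are left alone
        if el.text:
            el.text = unsmarten(el.text)
        if el.tail:
            el.tail = unsmarten(el.tail)


def demo_unsmarten(text: str) -> str:
    # Toy stand-in for unsmarten_text: curly double quotes only.
    return text.replace("\u201c", '"').replace("\u201d", '"')


body = ET.fromstring(
    "<body><p>\u201ca\u201d <i>\u201cb\u201d</i> \u201cc\u201d</p>"
    "<pre>\u201cd\u201d</pre></body>"
)
unsmarten_body(body, demo_unsmarten)
print(ET.tostring(body, encoding="unicode"))
```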

@@ -0,0 +1,40 @@
__license__ = 'GPL 3'
__copyright__ = '2011, John Schember <john@nachtimwald.com>'
__docformat__ = 'restructuredtext en'

from ebook_converter.utils.mreplace import MReplace

_mreplace = MReplace({
        '&#8211;': '--',
        '&ndash;': '--',
        '–': '--',
        '&#8212;': '---',
        '&mdash;': '---',
        '—': '---',
        '&#8230;': '...',
        '&hellip;': '...',
        '…': '...',
        '&#8220;': '"',
        '&#8221;': '"',
        '&#8222;': '"',
        '&#8243;': '"',
        '&ldquo;': '"',
        '&rdquo;': '"',
        '&bdquo;': '"',
        '&Prime;': '"',
        '“':'"',
        '”':'"',
        '„':'"',
        '″':'"',
        '&#8216;':"'",
        '&#8217;':"'",
        '&#8242;':"'",
        '&lsquo;':"'",
        '&rsquo;':"'",
        '&prime;':"'",
        '‘':"'",
        '’':"'",
        '′':"'",
})

unsmarten_text = _mreplace.mreplace
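`unsmarten_text` is the `mreplace` method of an `MReplace` built from that mapping, i.e. a single pass that swaps every listed entity or typographic character for its ASCII form. The project's `MReplace` class is not part of this diff, so the class below is a hypothetical stand-in for the general technique: one compiled regex alternation of the escaped keys, longest keys first so that a longer key wins over a shorter key that is its prefix, with each match looked up in the dict. The demo mapping is a small subset of the one above.

```python
import re
from typing import Dict


class MultiReplace:
    """Hypothetical stand-in for MReplace: replace many literal keys in one pass."""

    def __init__(self, mapping: Dict[str, str]) -> None:
        self.mapping = dict(mapping)
        # Longest keys first: an alternation takes the first alternative that
        # matches, so a key that is a prefix of a longer key must come later.
        keys = sorted(self.mapping, key=len, reverse=True)
        self.regex = re.compile("|".join(re.escape(k) for k in keys)) if keys else None

    def _lookup(self, match: "re.Match[str]") -> str:
        return self.mapping[match.group(0)]

    def mreplace(self, text: str) -> str:
        if self.regex is None:  # empty mapping: nothing to replace
            return text
        return self.regex.sub(self._lookup, text)


unsmarten = MultiReplace({
    "&mdash;": "---",
    "\u2014": "---",   # em dash
    "&hellip;": "...",
    "\u2026": "...",   # horizontal ellipsis
    "\u201c": '"',     # left double quotation mark
    "\u201d": '"',     # right double quotation mark
    "\u2019": "'",     # right single quotation mark
}).mreplace

print(unsmarten("\u201cIt\u2019s here\u201d\u2014almost&hellip;"))
# prints: "It's here"---almost...
```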

pyproject.toml

@@ -0,0 +1,52 @@
[build-system]
requires = ["setuptools >= 77.0"]
build-backend = "setuptools.build_meta"
[project]
name = "ebook-converter"
version = "4.12.0"
requires-python = ">= 3.10"
description = "Convert ebook between different formats"
dependencies = [
"beautifulsoup4>=4.9.3",
"css-parser>=1.0.6",
"filelock>=3.0.12",
"html2text>=2020.1.16",
"html5-parser==0.4.12",
"msgpack>=1.0.0",
"odfpy>=1.4.1",
"pillow>=8.0.1",
"python-dateutil>=2.8.1",
"setuptools>=61.0",
"tinycss>=0.4"
]
readme = "README.rst"
authors = [
{name = "gryf", email = "gryf73@gmail.com"}
]
license = "GPL-3.0-or-later"
classifiers = [
"Environment :: Console",
"Intended Audience :: Other Audience",
"Operating System :: POSIX :: Linux",
"Development Status :: 3 - Alpha",
"Programming Language :: Python",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3 :: Only",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: 3.13"
]
[project.urls]
Repository = "https://github.com/gryf/ebook-converter"
[project.scripts]
ebook-converter = "ebook_converter.main:main"
[tool.setuptools.packages.find]
exclude = ["snap"]
[tool.setuptools.package-data]
"*" = ["*.types", "*.css", "*.html", "*.xhtml", "*.xsl", "*.json"]

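The `[project.scripts]` table asks the installer to generate an `ebook-converter` command that imports `ebook_converter.main` and calls its `main` function. The standard library's `importlib.metadata.EntryPoint` parses that same `module:attribute` syntax; the snippet below only inspects the string and does not import the project, so it runs whether or not ebook-converter is installed.

```python
from importlib.metadata import EntryPoint

# Same value as the [project.scripts] line above; console scripts use this group.
script = EntryPoint(
    name="ebook-converter",
    value="ebook_converter.main:main",
    group="console_scripts",
)

print(script.module)  # ebook_converter.main
print(script.attr)    # main
# script.load() would import ebook_converter.main and return its main function,
# which only works once the package is installed.
```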
@@ -1,11 +0,0 @@
beautifulsoup4>=4.9.3
css-parser>=1.0.6
filelock>=3.0.12
html2text>=2020.1.16
html5-parser==0.4.9 --no-binary lxml
msgpack>=1.0.0
odfpy>=1.4.1
pillow>=8.0.1
python-dateutil>=2.8.1
setuptools>=50.3.2
tinycss>=0.4

@@ -1,46 +0,0 @@
[metadata]
name = ebook-converter
version = 4.12.0
summary = Convert ebook between different formats
description-file =
    README.rst
author = gryf
author-email = gryf73@gmail.com
license = GPL3
license_file = LICENSE
url = https://github.com/gryf/ebook-converter
classifier =
    Environment :: Console
    Intended Audience :: Other Audience
    License :: OSI Approved :: GNU General Public License v3 (GPLv3)
    Operating System :: POSIX :: Linux
    Development Status :: 3 - Alpha
    Programming Language :: Python
    Programming Language :: Python :: 3
    Programming Language :: Python :: 3 :: Only
    Programming Language :: Python :: 3.6
    Programming Language :: Python :: 3.7

[options]
packages = find:
include_package_data = True
install_requires =
    filelock
    python-dateutil
    lxml
    css-parser
    beautifulsoup4
    tinycss
    pillow
    msgpack
    html5-parser
    odfpy
    setuptools
    html2text

[options.entry_points]
console_scripts =
    ebook-converter=ebook_converter.main:main

[options.package_data]
* = *.types *.css, *.html, *.xsl

@@ -1,4 +0,0 @@
import setuptools
setuptools.setup()