mirror of
https://github.com/gryf/ebook-converter.git
synced 2026-03-25 11:53:33 +01:00
Compare commits
10 Commits
c240495c3d
...
master
| Author | SHA1 | Date | |
|---|---|---|---|
| | c89fc132b8 | | |
| | 8b8a92e9fd | | |
| | 6b7f796cfb | | |
| | 72d0858ad8 | | |
| | 4f548ec882 | | |
| | 0faa2c0758 | | |
| | d37850520b | | |
| | 5e56cb8c7a | | |
| | 084e0d11ce | | |
| | 4c3c5a9e27 | | |
1
.gitignore
vendored
@@ -3,3 +3,4 @@ build/
 dist/
 sdist/
 *.egg-info/
+venv/
@@ -1,2 +0,0 @@
-graft ebook_converter/data
-exclude .gitignore
61
README.rst
@@ -2,24 +2,39 @@
 Ebook converter
 ===============
 
-This is impudent ripoff of the bits from `Calibre project`_, and is aimed only
-for converter thing.
-
-My motivation is to have only converter for ebooks run from commandline,
-without all of those bells and whistles Calibre has, and with cleanest more
-*pythonic* approach.
+This is an impudent ripoff of the bits from `Calibre project`_, and is aimed
+only for converter thing.
+
+My motivation is to have only the converter for ebooks run from the
+commandline, without all of those bells and whistles Calibre has, and with
+cleanest more *pythonic* approach.
 
 Requirements
 ------------
 
 To build and run ebook converter, you'll need:
 
-- Python 3.6 or newer
+- Python 3.10 or newer
 - `Liberation fonts`_
-- setuptools
 - ``pdftohtml``, ``pdfinfo`` and ``pdftoppm`` from `poppler`_ project for
   conversion from PDF available in ``$PATH``
+- ``libxml2-dev`` and ``libxslt-dev`` as dependencies for format manipulation
+  from some of the Calibre code
+
+and several Python packages:
+
+- `beautifulsoup4`_
+- `css-parser`_
+- `filelock`_
+- `html2text`_
+- `html5-parser`_
+- `msgpack`_
+- `odfpy`_
+- `pillow`_
+- `python-dateutil`_
+- `setuptools`_
+- `tinycss`_
 
 No Python2 support. Even if Calibre probably still is able to run on Python2, I
 do not have an intention to support it.
@@ -28,9 +43,9 @@ do not have an intention to support it.
 What's supported
 ----------------
 
-To be able to perform some optimization and make converter more reliable and
-easy to use, first I need to remove some of the features, which are totally not
-crucial in my opinion, although they might be re-added later, like, for
+To be able to perform some optimization and make the converter more reliable
+and easy to use, first I need to remove some of the features, which are totally
+not crucial in my opinion, although they might be re-added later, like, for
 instance there is no automatic language translations depending on the locale
 settings.
 
@@ -44,15 +59,16 @@ Windows is not currently supported, because of the original spaghetti code.
 This may change in the future, after cleanup of mentioned pasta would be
 completed.
 
-So called `Kindle periodical` format is not supported, since all we do care are
-local files. If there would be downloaded periodical thing (using Calibre for
-example), it would be treated as common book.
+So called *Kindle periodical* format (which `Amazon has`_ `killed`_ anyway back
+in September 2023) is not supported, since all we do care are local files. If
+there would be downloaded periodical thing (using Calibre for example), it
+would be treated as common book.
 
 
 Input formats
 ~~~~~~~~~~~~~
 
-Currently, I've tested following input formats:
+Currently, I've tested the following input formats:
 
 - Microsoft Word 2007 and up (``docx``)
 - EPUB, both v2 and v3 (``epub``)
@@ -107,7 +123,7 @@ managers), i.e:
 $ . venv/bin/activate
 (venv) $ git clone https://github.com/gryf/ebook-converter
 (venv) $ cd ebook-converter
-(venv) $ pip install -r requirements.txt .
+(venv) $ pip install .
 
 Simple as that. And from now on, you can issue converter:
 
@@ -122,9 +138,20 @@ License
 This work is licensed on GPL3 license, like the original work. See LICENSE file
 for details.
 
-
 .. _Calibre project: https://calibre-ebook.com/
 .. _pypi: https://pypi.python.org
 .. _Liberation fonts: https://github.com/liberationfonts/liberation-fonts
-.. _Kindle periodical: https://sellercentral.amazon.com/gp/help/external/help.html?itemID=202047960&language=en-US
+.. _Amazon has: https://goodereader.com/blog/kindle/amazon-will-discontinue-newspaper-and-magazine-subscriptions-in-september
+.. _killed: https://www.theverge.com/23861370/amazon-kindle-periodicals-unlimited-ended
 .. _poppler: https://poppler.freedesktop.org/
+.. _beautifulsoup4: https://www.crummy.com/software/BeautifulSoup
+.. _css-parser: https://github.com/ebook-utils/css-parser
+.. _filelock: https://github.com/tox-dev/py-filelock
+.. _html2text: https://github.com/Alir3z4/html2text
+.. _html5-parser: https://html5-parser.readthedocs.io
+.. _msgpack: https://msgpack.org
+.. _odfpy: https://github.com/eea/odfpy
+.. _pillow: https://python-pillow.github.io
+.. _python-dateutil: https://github.com/dateutil/dateutil
+.. _setuptools: https://setuptools.pypa.io
+.. _tinycss: http://tinycss.readthedocs.io
@@ -62,7 +62,7 @@ class PMLOutput(OutputFormatPlugin):
                     im = Image.open(io.BytesIO(item.data))
                 else:
                     im = Image.open(io.BytesIO(item.data)).convert('P')
-                    im.thumbnail((300,300), Image.ANTIALIAS)
+                    im.thumbnail((300,300), Image.LANCZOS)
 
                 data = io.BytesIO()
                 im.save(data, 'PNG')
@@ -1012,7 +1012,7 @@ class HTMLConverter(object):
             self.image_memory.append(pt) # Neccessary, trust me ;-)
             try:
                 im.resize((int(width), int(height)),
-                          PILImage.ANTIALIAS).save(pt, encoding)
+                          PILImage.LANCZOS).save(pt, encoding)
                 pt.close()
                 self.scaled_images[path] = pt
                 return pt.name
@@ -1970,7 +1970,7 @@ def process_file(path, options, logger):
             options.cover = cf.name
 
             tim = im.resize((int(0.75 * th), th),
-                            PILImage.ANTIALIAS).convert('RGB')
+                            PILImage.LANCZOS).convert('RGB')
             tf = PersistentTemporaryFile(prefix=__appname__ + '_',
                                          suffix=".jpg")
             tf.close()
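The three hunks above swap `ANTIALIAS` for `LANCZOS`. In Pillow, `ANTIALIAS` was an alias of the Lanczos filter and was removed in Pillow 10, while `Image.LANCZOS` is available across the range declared in this change (`pillow>=8.0.1`). Should code ever need to run against both very old and current releases, a lookup like the one below could pick whichever name exists. This is only a sketch: `pick_lanczos` is a hypothetical helper that is not part of this repository, and the stand-in namespaces carry made-up values so the example runs without Pillow installed.

```python
from types import SimpleNamespace


def pick_lanczos(image_module):
    """Return the Lanczos resampling constant from a PIL.Image-like module."""
    # Pillow 9.1+ groups the filters in the Image.Resampling enum.
    resampling = getattr(image_module, "Resampling", None)
    if resampling is not None and hasattr(resampling, "LANCZOS"):
        return resampling.LANCZOS
    # Image.LANCZOS is the name used in the hunks above.
    if hasattr(image_module, "LANCZOS"):
        return image_module.LANCZOS
    # Last resort for releases that only know the removed alias.
    return image_module.ANTIALIAS


# Stand-ins for different Pillow generations (illustrative values only).
legacy_only = SimpleNamespace(ANTIALIAS="legacy-lanczos")
current = SimpleNamespace(
    LANCZOS="lanczos",
    Resampling=SimpleNamespace(LANCZOS="enum-lanczos"),
)

print(pick_lanczos(legacy_only))
print(pick_lanczos(current))
```

With real Pillow the call would be `pick_lanczos(PIL.Image)`; given the declared minimum version, plain `Image.LANCZOS` as used in the diff should be sufficient.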
@@ -145,7 +145,7 @@ class Cell(object):
                 continue
             word = token.split()
             word = word[0] if word else ""
-            width = font.getsize(word)[0]
+            width = font.getbbox(word)[2]
             if width > mwidth:
                 mwidth = width
         return parindent + mwidth + 2
@@ -191,7 +191,7 @@ class Cell(object):
             if (ff, fs) != (ts['fontfacename'], ts['fontsize']):
                 font = get_font(ff, self.pts_to_pixels(fs))
             for word in token.split():
-                width, height = font.getsize(word)
+                _, _, width, height = font.getbbox(word)
                 left, right, top, bottom = add_word(width, height, left, right, top, bottom, ls, ws)
         return right+3+max(parindent, 10), bottom
 
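The two `Cell` hunks above replace `font.getsize(word)` with `font.getbbox(word)`. Pillow's `getbbox` returns a `(left, top, right, bottom)` tuple, and the diff takes the `right` and `bottom` values as width and height. The sketch below contrasts that choice with the tighter "ink" size, using plain tuples so it runs without Pillow. Both helper names and the example box are hypothetical and not taken from the repository.

```python
def size_from_origin(bbox):
    """Width/height measured from the drawing origin, as the diff does."""
    _left, _top, right, bottom = bbox
    return right, bottom


def ink_size(bbox):
    """Width/height of the inked area only (offsets subtracted)."""
    left, top, right, bottom = bbox
    return right - left, bottom - top


# Hypothetical bounding box for a short word; real values depend on the font.
example_bbox = (1, 4, 38, 15)

print(size_from_origin(example_bbox))
print(ink_size(example_bbox))
```

As far as I can tell, the removed `getsize` also included the offset from the origin, so reading `right` and `bottom` keeps the layout arithmetic close to the previous behaviour, whereas `ink_size` would shrink every measurement by the left and top bearing.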
28
ebook_converter/ebooks/oeb/transforms/unsmarten.py
Normal file
@@ -0,0 +1,28 @@
+__license__ = 'GPL 3'
+__copyright__ = '2011, John Schember <john@nachtimwald.com>'
+__docformat__ = 'restructuredtext en'
+
+from ebook_converter.ebooks.oeb.base import OEB_DOCS, XPath
+from ebook_converter.ebooks.oeb.parse_utils import barename
+from ebook_converter.utils.unsmarten import unsmarten_text
+
+
+class UnsmartenPunctuation:
+
+    def __init__(self):
+        self.html_tags = XPath('descendant::h:*')
+
+    def unsmarten(self, root):
+        for x in self.html_tags(root):
+            if not barename(x.tag) == 'pre':
+                if getattr(x, 'text', None):
+                    x.text = unsmarten_text(x.text)
+                if getattr(x, 'tail', None) and x.tail:
+                    x.tail = unsmarten_text(x.tail)
+
+    def __call__(self, oeb, context):
+        bx = XPath('//h:body')
+        for x in oeb.manifest.items:
+            if x.media_type in OEB_DOCS:
+                for body in bx(x.data):
+                    self.unsmarten(body)
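The transform above walks every descendant element of each `body` and rewrites its `text` and `tail`, leaving `pre` elements alone. A rough stand-alone illustration of that traversal, using only the standard library instead of the project's `XPath` and `barename` helpers, might look like the following. The function name `apply_to_text` and the sample markup are assumptions made for the example, and namespaces are ignored for brevity.

```python
import xml.etree.ElementTree as ET


def apply_to_text(root, convert, skip_tags=("pre",)):
    """Apply convert() to text and tail of descendants, skipping some tags."""
    for element in root.iter():
        if element is root:
            continue  # only descendants are visited, as with descendant::
        if element.tag in skip_tags:
            continue  # neither text nor tail of a skipped element is touched
        if element.text:
            element.text = convert(element.text)
        if element.tail:
            element.tail = convert(element.tail)


def dots(text):
    # Minimal converter for the demo: ellipsis character to three dots.
    return text.replace("\u2026", "...")


body = ET.fromstring("<body><p>a\u2026</p><pre>b\u2026</pre>c\u2026</body>")
apply_to_text(body, dots)
print(ET.tostring(body, encoding="unicode"))
```

Note that, as in the diff, the check applies to the element's own tag only: the tail following a `pre` is left untouched, and elements nested inside a `pre` would still be converted.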
40
ebook_converter/utils/unsmarten.py
Normal file
@@ -0,0 +1,40 @@
+__license__ = 'GPL 3'
+__copyright__ = '2011, John Schember <john@nachtimwald.com>'
+__docformat__ = 'restructuredtext en'
+
+from ebook_converter.utils.mreplace import MReplace
+
+_mreplace = MReplace({
+    '&#8211;': '--',
+    '&ndash;': '--',
+    '–': '--',
+    '&#8212;': '---',
+    '&mdash;': '---',
+    '—': '---',
+    '&#8230;': '...',
+    '&hellip;': '...',
+    '…': '...',
+    '&#8220;': '"',
+    '&#8221;': '"',
+    '&#8222;': '"',
+    '&#8243;': '"',
+    '&ldquo;': '"',
+    '&rdquo;': '"',
+    '&bdquo;': '"',
+    '&Prime;': '"',
+    '“':'"',
+    '”':'"',
+    '„':'"',
+    '″':'"',
+    '&#8216;':"'",
+    '&#8217;':"'",
+    '&#8242;':"'",
+    '&lsquo;':"'",
+    '&rsquo;':"'",
+    '&prime;':"'",
+    '‘':"'",
+    '’':"'",
+    '′':"'",
+})
+
+unsmarten_text = _mreplace.mreplace
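`unsmarten_text` delegates to `MReplace`, whose implementation is not part of this diff. To illustrate the idea of replacing many typographic sequences in one pass, here is a minimal sketch built on the standard `re` module. It is a hypothetical stand-in, not the project's `MReplace`, and it uses only a small subset of keys of the kind listed in the table.

```python
import re

# Subset of the mapping, written with escapes so the source stays ASCII.
REPLACEMENTS = {
    "\u2013": "--",    # en dash
    "&ndash;": "--",
    "\u2014": "---",   # em dash
    "&mdash;": "---",
    "\u2026": "...",   # horizontal ellipsis
    "\u201c": '"',     # left double quotation mark
    "\u201d": '"',     # right double quotation mark
    "\u2018": "'",     # left single quotation mark
    "\u2019": "'",     # right single quotation mark
}

# Longest keys first so multi-character entities win over shorter matches.
_PATTERN = re.compile(
    "|".join(re.escape(key) for key in sorted(REPLACEMENTS, key=len, reverse=True))
)


def unsmarten_sketch(text):
    """Replace every known sequence in a single regex pass."""
    return _PATTERN.sub(lambda match: REPLACEMENTS[match.group(0)], text)


print(unsmarten_sketch("\u201cHi\u201d \u2014 wait\u2026 it&ndash;s"))
```

A single compiled alternation avoids rescanning the text once per key, which is presumably why a dedicated multi-replace helper is used here rather than a chain of `str.replace` calls.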
52
pyproject.toml
Normal file
@@ -0,0 +1,52 @@
+[build-system]
+requires = ["setuptools >= 77.0"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "ebook-converter"
+version = "4.12.0"
+requires-python = ">= 3.10"
+description = "Convert ebook between different formats"
+dependencies = [
+    "beautifulsoup4>=4.9.3",
+    "css-parser>=1.0.6",
+    "filelock>=3.0.12",
+    "html2text>=2020.1.16",
+    "html5-parser==0.4.12",
+    "msgpack>=1.0.0",
+    "odfpy>=1.4.1",
+    "pillow>=8.0.1",
+    "python-dateutil>=2.8.1",
+    "setuptools>=61.0",
+    "tinycss>=0.4"
+]
+readme = "README.rst"
+authors = [
+    {name = "gryf", email = "gryf73@gmail.com"}
+]
+license = "GPL-3.0-or-later"
+classifiers = [
+    "Environment :: Console",
+    "Intended Audience :: Other Audience",
+    "Operating System :: POSIX :: Linux",
+    "Development Status :: 3 - Alpha",
+    "Programming Language :: Python",
+    "Programming Language :: Python :: 3",
+    "Programming Language :: Python :: 3 :: Only",
+    "Programming Language :: Python :: 3.10",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13"
+]
+
+[project.urls]
+Repository = "https://github.com/gryf/ebook-converter"
+
+[project.scripts]
+ebook-converter = "ebook_converter.main:main"
+
+[tool.setuptools.packages.find]
+exclude = ["snap"]
+
+[tool.setuptools.package-data]
+"*" = ["*.types", "*.css", "*.html", "*.xhtml", "*.xsl", "*.json"]
@@ -1,11 +0,0 @@
-beautifulsoup4>=4.9.3
-css-parser>=1.0.6
-filelock>=3.0.12
-html2text>=2020.1.16
-html5-parser==0.4.9 --no-binary lxml
-msgpack>=1.0.0
-odfpy>=1.4.1
-pillow>=8.0.1
-python-dateutil>=2.8.1
-setuptools>=50.3.2
-tinycss>=0.4
46
setup.cfg
@@ -1,46 +0,0 @@
-[metadata]
-name = ebook-converter
-version = 4.12.0
-summary = Convert ebook between different formats
-description-file =
-    README.rst
-author = gryf
-author-email = gryf73@gmail.com
-license = GPL3
-license_file = LICENSE
-url = https://github.com/gryf/ebook-converter
-classifier =
-    Environment :: Console
-    Intended Audience :: Other Audience
-    License :: OSI Approved :: GNU General Public License v3 (GPLv3)
-    Operating System :: POSIX :: Linux
-    Development Status :: 3 - Alpha
-    Programming Language :: Python
-    Programming Language :: Python :: 3
-    Programming Language :: Python :: 3 :: Only
-    Programming Language :: Python :: 3.6
-    Programming Language :: Python :: 3.7
-
-[options]
-packages = find:
-include_package_data = True
-install_requires =
-    filelock
-    python-dateutil
-    lxml
-    css-parser
-    beautifulsoup4
-    tinycss
-    pillow
-    msgpack
-    html5-parser
-    odfpy
-    setuptools
-    html2text
-
-[options.entry_points]
-console_scripts =
-    ebook-converter=ebook_converter.main:main
-
-[options.package_data]
-* = *.types *.css, *.html, *.xsl